
Acceptance tests

A tutorial for creating and managing acceptance tests on Nextmv Cloud.

Acceptance tests are formal offline tests that verify whether a system satisfies business requirements. In an optimization context, an acceptance test determines whether a new model meets business goals or key performance indicators (KPIs), typically on a fixed set of inputs reserved for testing, and lets you decide whether or not to deploy a model update to production.

When an acceptance test is run, data is collected from runs made with the baseline and candidate instances, and the candidate instance's metrics are compared to the baseline instance's metrics. The result of each comparison is determined by the operator that specifies how the metric should be evaluated. For example, if should increase is set for a metric, the value of that metric in the candidate instance's output should be greater than the value of the same metric in the baseline instance's output.

In short, an acceptance test is based on a batch experiment. For the metrics specified in the acceptance test, it compares the results of two instances: candidate vs. baseline. For each metric (comparison), the acceptance test gives a pass/fail result based on the operator.

The ID and name of the acceptance test and the underlying batch experiment must be the same.

When using subscription apps, make sure the candidate and baseline instances do not use a major version (e.g., v1 or v2). Instead, assign a complete, specific version to your instances (e.g., v1.1.0).

Acceptance tests are designed to be visualized in the Console web interface. Go to the app and open the Experiments > Acceptance tab.

Acceptance test

There are several interfaces for creating acceptance tests: the Console web interface, the Nextmv CLI, the Python SDK, and the Cloud API. Regardless of the interface, you first need to define the metrics to test.

Defining metrics

When creating an acceptance test, you must define the metrics you want to analyze. At least one metric is required to run an acceptance test. These metrics are user-defined, though if you are using a subscription app or a custom app based on a template, some pre-defined metrics are available to you.

Metrics follow the statistics convention: any item under the statistics block of the output is a valid entry for an acceptance test metric. To specify a metric, use object dot notation for the path, relative to the parent .statistics field of the output.

To compare metrics, you must define the operator for the comparison.

Operator | Symbol | Description
eq       | ==     | Equal to
gt       | >      | Greater than
ge       | >=     | Greater than or equal to
lt       | <      | Less than
le       | <=     | Less than or equal to
ne       | !=     | Not equal to
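
As an illustration of these semantics, the following minimal Python sketch (not Nextmv's implementation; the names are hypothetical) shows how an operator turns a candidate/baseline pair of metric values into a pass/fail result.

import operator

# Map the acceptance test operators to Python comparison functions.
OPERATORS = {
    "eq": operator.eq,  # ==
    "gt": operator.gt,  # >
    "ge": operator.ge,  # >=
    "lt": operator.lt,  # <
    "le": operator.le,  # <=
    "ne": operator.ne,  # !=
}


def passes(candidate: float, baseline: float, op: str) -> bool:
    """Return True if the candidate value satisfies the operator against the baseline value."""
    return OPERATORS[op](candidate, baseline)


# Example: the candidate's metric should be less than or equal to the baseline's.
print(passes(candidate=25, baseline=27, op="le"))  # True -> the metric passes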

Consider the meal allocation output as an example.

{
  "options": {
    "solve": {
      "control": {
        "bool": [],
        "float": [],
        "int": [],
        "string": []
      },
      "duration": 10000000000,
      "mip": {
        "gap": {
          "absolute": 0.000001,
          "relative": 0.0001
        }
      },
      "verbosity": "off"
    }
  },
  "solutions": [
    {
      "meals": [
        {
          "name": "A",
          "quantity": 2
        },
        {
          "name": "B",
          "quantity": 3
        }
      ]
    }
  ],
  "statistics": {
    "result": {
      "custom": {
        "constraints": 2,
        "provider": "HiGHS",
        "status": "optimal",
        "variables": 2
      },
      "duration": 0.123,
      "value": 27
    },
    "run": {
      "duration": 0.123
    },
    "schema": "v1"
  },
  "version": {
    "go-mip": "VERSION",
    "sdk": "VERSION"
  }
}

These are valid metrics for the acceptance test:

  • result.value with le: the value of the result in the candidate must be less than or equal to the baseline.
  • result.custom.constraints with eq: the number of constraints in the candidate must be equal to the baseline.
  • result.custom.variables with eq: the number of variables in the candidate must be equal to the baseline.
  • run.duration with ge: the run duration of the candidate must be greater than or equal to the baseline.
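
As a quick illustration, a dot-notation path is simply a walk through the statistics block of the output. The resolve helper below is hypothetical, not part of any Nextmv tooling.

from typing import Any

# The statistics block from the meal allocation output above.
statistics = {
    "result": {
        "custom": {"constraints": 2, "provider": "HiGHS", "status": "optimal", "variables": 2},
        "duration": 0.123,
        "value": 27,
    },
    "run": {"duration": 0.123},
    "schema": "v1",
}


def resolve(stats: dict[str, Any], path: str) -> Any:
    """Follow a dot-notation path such as "result.custom.constraints" through the statistics block."""
    value: Any = stats
    for key in path.split("."):
        value = value[key]
    return value


print(resolve(statistics, "result.value"))               # 27
print(resolve(statistics, "result.custom.constraints"))  # 2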

Console

Go to the Console web interface, and open your app. Go to the Experiments > Acceptance tab. Click on New Acceptance Test. Fill in the fields.

Acceptance tests

A new batch experiment will be created with the same ID and name as the acceptance test.

To specify multiple metrics, it is recommended that you use the Free-form tab in the Metrics section. In this view, each metric is specified on a new line in the following format:

path: operator

For example:

result.value: le
result.custom.constraints: eq
result.custom.variables: eq
run.duration: ge

Nextmv CLI

Define the desired acceptance test ID and name. As mentioned above, an acceptance test is based on a batch experiment.

  • If you already started a batch experiment, you don't need to provide the -s, --input-set-id flag. In that case, the ID and name of the acceptance test and the underlying batch experiment must be the same.
  • If you didn't start a batch experiment, you need to provide the -s, --input-set-id flag and a new batch experiment will be created for you, with the same ID and name as the acceptance test.

Start by defining the metrics you want the acceptance test to use.

nextmv experiment acceptance init

The command will produce a metrics.json file. Edit this file to include the metrics you want to use.
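
The exact layout of the generated file is defined by the CLI, so inspect what the init command produces. As a rough sketch, each metric entry follows the same shape as the metrics array in the Cloud API payload shown further below, for example:

[
    {
        "field": "result.value",
        "metric_type": "direct-comparison",
        "params": {
            "operator": "le"
        },
        "statistic": "mean"
    },
    {
        "field": "result.custom.constraints",
        "metric_type": "direct-comparison",
        "params": {
            "operator": "eq"
        },
        "statistic": "mean"
    }
]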

Once the metrics.json file is ready, run the following command to create the acceptance test.

nextmv experiment acceptance start \
    --app-id $APP_ID \
    --experiment-id $EXPERIMENT_ID \
    --name "YOUR_EXPERIMENT_NAME" \
    --baseline-instance-id $INSTANCE_ID \
    --candidate-instance-id $INSTANCE_ID \
    --input-set-id $INPUT_SET_ID \
    --description "An optional description" \
    --metrics metrics.json \
    --confirm

Python SDK

Define the desired acceptance test ID and name. As mentioned above, an acceptance test is based on a batch experiment.

  • If you already started a batch experiment, you don't need to provide the input_set_id parameter. In that case, the ID and name of the acceptance test and the underlying batch experiment must be the same.
  • If you didn't start a batch experiment, you need to provide the input_set_id parameter and a new batch experiment will be created for you, with the same ID and name as the acceptance test.

import json
import os

from nextmv.cloud import (
    Application,
    Client,
    Comparison,
    Metric,
    MetricParams,
    MetricType,
)

client = Client(api_key=os.getenv("NEXTMV_API_KEY"))
app = Application(client=client, id=os.getenv("APP_ID"))
acceptance_test = app.new_acceptance_test(
    candidate_instance_id="latest-2",
    control_instance_id="latest",
    id=os.getenv("ACCEPTANCE_TEST_ID"),
    name=os.getenv("ACCEPTANCE_TEST_ID"),
    metrics=[
        Metric(
            field="result.value",
            metric_type=MetricType.direct_comparison,
            params=MetricParams(operator=Comparison.less_than),
            statistic="mean",
        ),
        Metric(
            field="result.custom.activated_vehicles",
            metric_type=MetricType.direct_comparison,
            params=MetricParams(operator=Comparison.greater_than),
            statistic="mean",
        ),
    ],
    # input_set_id=os.getenv("INPUT_SET_ID"), # Defining this would create a new batch experiment.
    description="An optional description",
)
print(json.dumps(acceptance_test.to_dict(), indent=2))  # Pretty print.

Cloud API

Define the desired acceptance test ID and name. As mentioned above, an acceptance test is based on a batch experiment. The acceptance test ID and name must be the same as the batch experiment ID and name, respectively.

curl -sS -L -X POST \
    "https://api.cloud.nextmv.io/v1/applications/$APP_ID/experiments/acceptance" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $NEXTMV_API_KEY" \
    -d "{
      \"id\": \"YOUR-ACCEPTANCE-TEST\",
      \"experiment_id\": \"$EXPERIMENT_ID\",
      \"name\": \"$EXPERIMENT_ID\",
      \"control\": {
        \"instance_id\": \"$INSTANCE_ID\"
      },
      \"candidate\": {
        \"instance_id\": \"$INSTANCE_ID\"
      },
      \"metrics\": [
            {
                \"field\": \"result.value\",
                \"metric_type\": \"direct-comparison\",
                \"params\": {
                    \"operator\": \"lt\"
                },
                \"statistic\": \"mean\"
            },
            {
                \"field\": \"result.custom.activated_vehicles\",
                \"metric_type\": \"direct-comparison\",
                \"params\": {
                    \"operator\": \"gt\"
                },
                \"statistic\": \"mean\"
            }
        ],
      \"description\": \"An optional description\"
    }" | jq


Results

The acceptance test results also include a Statistical Results table that can be used as an aid when interpreting the significance of the results. It shows the difference between the mean value of each metric for the candidate and the baseline instance, the percentage change of that difference, and the associated p-value.

The p-value is calculated with the Wilcoxon signed-rank test with continuity correction. It indicates whether the change in value is statistically significant, but does not account for the intended direction of the test. If there is no difference in the data, the p-value is not provided.

Note that when one of the runs from either the candidate or the baseline instance failed, the paired observation for that input is excluded from the analysis.
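
For intuition only (this is not Nextmv's implementation, and the numbers are made up), a paired Wilcoxon signed-rank test with continuity correction can be reproduced with SciPy, dropping any pair where one of the runs failed:

from scipy.stats import wilcoxon

# Hypothetical per-input values of a metric; None marks a failed run.
baseline = [27.0, 30.0, 25.0, None, 28.0]
candidate = [25.0, 29.0, 24.0, 31.0, 26.0]

# Exclude paired observations where either run failed.
pairs = [(b, c) for b, c in zip(baseline, candidate) if b is not None and c is not None]
baseline_values = [b for b, _ in pairs]
candidate_values = [c for _, c in pairs]

# Wilcoxon signed-rank test; correction=True applies the continuity
# correction when SciPy falls back to the normal approximation.
result = wilcoxon(baseline_values, candidate_values, correction=True)
print(result.pvalue)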
