
Batch experiments

A tutorial for creating and managing batch experiments on Nextmv Cloud.

Batch experiments are offline tests used to analyze the output from one or more decision models on a fixed input set. They are generally used as an exploratory test to understand the impact on business metrics (or KPIs) when updating a model with a new feature, such as an additional constraint. They can also be used to validate that a model is ready for further testing and is likely to make the intended business impact.

In short, a batch experiment takes the inputs from the input set and the specified instances and performs a run for each input-instance pair. The results are then aggregated and analyzed. The decision model must produce an output that contains a statistics block following the statistics convention.
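
As a point of reference, the sketch below (in Python, with made-up numbers) shows the general shape of an output containing a statistics block. The nested field names mirror the indicator keys referenced later on this page (result.value, result.elapsed, result.custom.*); the exact layout and the schema value are assumptions, so consult the statistics convention reference for the authoritative format.

import json

# Sketch of a model output with a top-level statistics block.
# Values and the "schema" identifier are assumptions for illustration.
output = {
    "solution": {},  # the model's solution payload goes here
    "statistics": {
        "schema": "v1",  # assumed schema identifier
        "result": {
            "value": 2175064.0,  # objective value, surfaced as result.value
            "elapsed": 4.9,  # solve time in seconds, surfaced as result.elapsed
            "custom": {"used_vehicles": 12},  # custom KPIs, e.g. result.custom.used_vehicles
        },
    },
}

print(json.dumps(output, indent=2))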

Batch experiments are designed to be visualized in the Console web interface: go to your app and open the Experiments > Batch tab.

Batch experiment

A batch experiment needs an input set to run.

There are several interfaces for creating batch experiments: Console, Nextmv CLI, Python SDK, and Cloud API. Each is described below.

In addition to an input_set_id and instance_ids, a batch experiment can also specify runs and option_sets.

  • runs. A list of objects.

    "runs": [
      {
        "instance_id": "string",
        "version_id": "string",
        "option_set": "string",
        "input_id": "string"
      }
    ]
    

    Without runs, the batch experiment will run all of the inputs in the input set on all of the instances. With runs, the batch experiment will only run the specified inputs on the specified instances. The runs list can also be used to run the same input multiple times on specific instances or versions (see the sketch after this list).

    A run can specify an option_set to be used for that run.

  • option_sets. A dictionary of objects.

    "option_sets": {
      "option_set_id": {
        "string": "string"
      }
    }
    

    option_sets is a dictionary of options that will be used for a run. An option set can be referenced in the runs list via its option_set_id. This can be useful, for example, if:

    • You want to compare the results of different solvers that have conflicting options. E.g. one uses solve.duration for the maximum runtime while the other uses duration.
    • You want to run specific inputs with different options within the same experiment.
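
To make the points above concrete, the sketch below (a Python dict mirroring the JSON payload, with a made-up input ID) runs the same input twice on one instance with different option sets and once on a second instance.

# Hypothetical runs/option_sets fragment; IDs are made up for illustration.
experiment_fragment = {
    "option_sets": {
        "long_running": {"solve.duration": "10s"},
        "short_running": {"solve.duration": "5s"},
    },
    "runs": [
        # Same input, same instance, different options.
        {"instance_id": "latest", "input_id": "input-abc123", "option_set": "long_running"},
        {"instance_id": "latest", "input_id": "input-abc123", "option_set": "short_running"},
        # Same input on a second instance.
        {"instance_id": "latest-2", "input_id": "input-abc123", "option_set": "short_running"},
    ],
}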

Console

Go to the Console web interface, and open your app. Go to the Experiments > Batch tab. Click on New Experiment. Fill in the fields.


Nextmv CLI

Define the desired batch experiment ID and name, then create the batch experiment.

nextmv experiment batch start \
    --app-id $APP_ID \
    --experiment-id $EXPERIMENT_ID \
    --name $EXPERIMENT_ID \
    --input-set-id $INPUT_SET_ID \
    --instance-ids "latest,latest-2" \
    --description "An optional description" \
    --confirm

Alternatively, you can use a JSON file that contains all of the information needed to start a batch experiment. The file must be valid JSON and match the payload of the POST request used with the Cloud API.

Using the definition file, you can fine-tune the batch experiment by specifying runs and option_sets, for example.

Consider this example using a JSON file named batch-experiment.json.

{
  "id": "experiment-1",
  "name": "experiment-1",
  "description": "An optional description",
  "input_set_id": "input-set-1",
  "option_sets": {
    "long_running": {
      "solve.duration": "10s"
    },
    "short_running": {
      "solve.duration": "5s"
    }
  },
  "runs": [
    {
      "instance_id": "latest",
      "input_id": "latest-RNBs7AKSg",
      "option_set": "long_running"
    },
    {
      "instance_id": "latest-2",
      "input_id": "latest-RNBs7AKSg",
      "option_set": "short_running"
    },
    {
      "instance_id": "latest",
      "input_id": "latest-fK_wZ0FSg",
      "option_set": "long_running"
    },
    {
      "instance_id": "latest-2",
      "input_id": "latest-fK_wZ0FSg",
      "option_set": "short_running"
    }
  ]
}


Python SDK

Define the desired batch experiment ID and name, then create the batch experiment.

import os

from nextmv.cloud import Application, Client

client = Client(api_key=os.getenv("NEXTMV_API_KEY"))
app = Application(client=client, id=os.getenv("APP_ID"))
batch_experiment_id = app.new_batch_experiment(
    id=os.getenv("BATCH_EXPERIMENT_ID"),
    name=os.getenv("BATCH_EXPERIMENT_ID"),
    input_set_id=os.getenv("INPUT_SET_ID"),
    instance_ids=["latest", "latest-2"],
    description="An optional description",
)
print(batch_experiment_id)

Defining runs and option_sets is supported as well.
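
The following sketch assumes new_batch_experiment accepts option_sets and runs keyword arguments shaped like the JSON payload shown earlier; verify the exact parameter names against your SDK version.

import os

from nextmv.cloud import Application, Client

client = Client(api_key=os.getenv("NEXTMV_API_KEY"))
app = Application(client=client, id=os.getenv("APP_ID"))
# Assumption: option_sets and runs mirror the JSON payload used by the Cloud API.
batch_experiment_id = app.new_batch_experiment(
    id=os.getenv("BATCH_EXPERIMENT_ID"),
    name=os.getenv("BATCH_EXPERIMENT_ID"),
    input_set_id=os.getenv("INPUT_SET_ID"),
    description="An optional description",
    option_sets={
        "long_running": {"solve.duration": "10s"},
        "short_running": {"solve.duration": "5s"},
    },
    runs=[
        {"instance_id": "latest", "input_id": "latest-RNBs7AKSg", "option_set": "long_running"},
        {"instance_id": "latest-2", "input_id": "latest-RNBs7AKSg", "option_set": "short_running"},
    ],
)
print(batch_experiment_id)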

Cloud API

POST https://api.cloud.nextmv.io/v1/applications/{application_id}/experiments/batch

Create and start batch experiment.

Define the desired batch experiment ID and name, then create the batch experiment.

curl -sS -L -X POST \
    "https://api.cloud.nextmv.io/v1/applications/$APP_ID/experiments/batch" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $NEXTMV_API_KEY" \
    -d "{
      \"id\": \"$EXPERIMENT_ID\",
      \"name\": \"YOUR_EXPERIMENT_NAME\",
      \"input_set_id\": \"$INPUT_SET_ID\",
      \"instance_ids\": [\"$INSTANCE_ID\", \"$INSTANCE_ID\"],
      \"description\": \"An optional description\"
    }" | jq

Result of a batch experiment - Console

Nextmv Console displays the results grouped by indicator keys and for each one includes:

  • a summary table
  • the percentile values
  • a box plot chart displaying summary values from the instances compared

Result of a batch experiment - CLI

The results can also be retrieved with the Nextmv CLI, which returns the raw data in JSON format and is useful if you would like to perform your own operations on the experiment results.

nextmv experiment batch result \
    --app-id $APP_ID \
    --experiment-id $EXPERIMENT_ID

The result of a batch experiment is returned as JSON. If you’re viewing the result in Console, the JSON is parsed and displayed in a web view; Nextmv CLI returns the complete JSON response. The response contains the batch experiment metadata and the results in the form of grouped distributional summaries.

Top-level properties

The table below summarizes the top-level properties in the response.

id: The batch experiment ID that was specified.
name: The batch experiment name that was specified.
description: The batch experiment description that was specified. If no description was specified, this field will not exist.
status: The status of the experiment. The status can be: started, completed, or failed.
created_at: The date the experiment was created and started.
input_set_id: The input set ID specified for the batch experiment.
instance_ids: An array of the instance IDs that were specified for the batch experiment.
grouped_distributional_summaries: An array that contains the results of the batch experiment. It is a collection of calculated results from the individual runs, grouped by certain keys.

Grouped distributional summaries

There are three types of summaries included in the grouped_distributional_summaries array (there could be more in the future):

  1. Version (instance)
  2. Version (instance) + input
  3. Input

Each type is included for every experiment. However, note that if you are viewing the experiment in Console, ONLY the version summaries are displayed. In the future, Console will display all types of summaries.

No matter the type, each grouped distributional summary includes the following:

group_keys: Describes the type of grouped distributional summary, which can be one of three options:
  • instanceID, versionID (the version summary)
  • inputID, instanceID, versionID (the version & input summary)
  • inputID (the input summary)
group_values: The values that correspond to the group_keys. For example, if the group keys are instanceID and versionID, the group_values will be the ID of the instance and the ID of the version.

indicator_keys

These are the statistics that are being evaluated by the batch experiment.

If you’re using the nextroute template, six statistics are automatically added for evaluation:

  • result.value
  • result.elapsed
  • metadata.duration
  • result.custom.routing.stops.unassigned
  • result.custom.routing.stops.assigned
  • result.custom.used_vehicles

Custom statistics can be specified by adding them to the output JSON’s top-level statistics block. See the Add custom statistics section of the Statistics convention reference.

indicator_distributions: An object that contains all of the values from the analysis for that particular indicator. If there are six indicator keys, for example, indicator_distributions will contain six properties; each property key corresponds to a value in the indicator_keys array, and each property value is an object with matching data (see the Indicator distributions section below).
number_of_runs_total: The number of runs that were analyzed for this particular summary. For example, if you ran an experiment with two instances and an input set with three inputs, the version summary (instanceID + versionID) will have a run total of three because it runs all three input files on that particular version. The version & input summary (instanceID + versionID + inputID) will have one run because it ran that one input file on that particular version. And the input summary (inputID) will have two runs because it ran that input file on the two instances.
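
For example, after saving the CLI result to a file, a short script like the sketch below (file name and printed fields are illustrative) can pull out the version summaries and report the mean of each indicator.

import json

# Sketch: parse a result saved from `nextmv experiment batch result ... > result.json`.
# The file name, key casing, and printed fields are illustrative.
with open("result.json") as f:
    result = json.load(f)

for summary in result["grouped_distributional_summaries"]:
    # Keep only the version summaries (grouped by instanceID and versionID).
    if sorted(summary["group_keys"]) != ["instanceID", "versionID"]:
        continue
    group = dict(zip(summary["group_keys"], summary["group_values"]))
    print(f"instance={group['instanceID']} version={group['versionID']}")
    for key in summary["indicator_keys"]:
        dist = summary["indicator_distributions"][key]
        # Values may also be the strings "nan", "+inf", or "-inf".
        print(f"  {key}: mean={dist['mean']} over {summary['number_of_runs_total']} runs")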

Indicator distributions

Each object property value in the indicator_distributions contains the values in the table below. Note that for some runs, certain values may be missing (a custom statistic for example). If you’re viewing the results in Console and a grouped distributional summary is missing values, a warning message will appear. If you are analyzing the results from the returned JSON, you must handle this check in your own systems.

When the runs are being evaluated, the final value is taken from the last solution found before the run has been terminated. The run duration can be set as an option on the experiment or will be set by the executable binary used for the run.

All values in the indicator distributions are either numbers or strings. If they are strings, they are one of three string values: nan, +inf, or -inf.

min: The minimum of the values returned from the runs for the statistic being evaluated. For example, if you are viewing the result.custom.used_vehicles indicator distribution, and there were three runs with one input file returning 50, another 40, and the other 60, the min value would be 40.
max: The maximum of the values returned from the runs for the statistic being evaluated.
count: The number of successful runs that have the specific indicator in their statistics output.
mean: The average of the values returned from the runs for the statistic being evaluated.
std: The standard deviation of the values returned from the runs for the statistic being evaluated. Uses a denominator of n−1 (see Corrected sample standard deviation).
shifted_geometric_mean: The shifted geometric mean of the values returned from the runs for the statistic being evaluated. The shift parameter is equal to 10 (see the sketch after this table).
percentiles: An object that contains the percentiles of the values returned from the runs for the statistic being evaluated. There are nine values that give the following percentiles: 1%, 5%, 10%, 25%, 50%, 75%, 90%, 95%, and 99%.
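
For reference on the shifted_geometric_mean field, the sketch below applies the standard shifted geometric mean formula with a shift of 10. It follows the textbook definition, so treat it as illustrative rather than a specification of Nextmv's exact implementation.

import math


# Shifted geometric mean with shift s: exp(mean(log(x_i + s))) - s.
# A shift of 10 matches the value stated in the table above.
def shifted_geometric_mean(values, shift=10.0):
    logs = [math.log(v + shift) for v in values]
    return math.exp(sum(logs) / len(logs)) - shift


elapsed = [4.9, 5.1, 60.0]  # e.g. result.elapsed values from three runs
print(shifted_geometric_mean(elapsed))  # less sensitive to the 60.0 outlier than the plain mean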

Experiment runs

Runs made for the experiment can be retrieved with this endpoint.

GET https://api.cloud.nextmv.io/v1/applications/{application_id}/experiments/batch/{batch_id}/runs

List batch experiment runs.

List runs used for batch experiment specified by application and batch ID.

This will return the runs of the experiment in the following format:

{
  "runs": [
    {
      "id": "staging-dehnyogSg",
      "input_id": "staging-wZ8xcIQ4R",
      "created_at": "2023-08-21T21:09:41.48243386Z",
      "application_id": "pizza-delivery",
      "experiment_id": "pizza-delivery-region-compare",
      "application_instance_id": "nyc-pizza-region",
      "application_version_id": "nyc-v1.2.13",
      "status": "succeeded",
      "statistics": {
        "status": "succeeded",
        "indicators": [
          {
            "name": "result.value",
            "value": 2175064,
          },
          ...
        ]
      }
    },
    ...
  ]
}

Where runs is an array that contains the runs made for the experiment. Each run object includes the run metadata plus any summary statistics that were specified in the executable binaries used for the experiment (see table below).
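
If you prefer to fetch this run list programmatically, the endpoint above can be called directly. The sketch below uses the requests library and the same bearer-token header as the earlier curl example; the environment variable names, and the assumption that the batch ID is the experiment ID, are illustrative.

import os

import requests

# Sketch: call the documented runs endpoint. Assumes {batch_id} is the
# experiment ID used when the experiment was created.
url = (
    "https://api.cloud.nextmv.io/v1/applications/"
    f"{os.getenv('APP_ID')}/experiments/batch/{os.getenv('EXPERIMENT_ID')}/runs"
)
response = requests.get(
    url,
    headers={"Authorization": f"Bearer {os.getenv('NEXTMV_API_KEY')}"},
    timeout=30,
)
response.raise_for_status()

for run in response.json()["runs"]:
    print(run["id"], run["application_instance_id"], run["status"])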

Experiment runs must be retrieved separately from the batch experiment details. However, in Console this run history table can be viewed at the bottom of the experiment details. Each run history item can be clicked to view the details of the run. If the app is a routing app (using either the routing or nextroute template), the run details will also include a visualization of the results.

The run history data can also be downloaded as a CSV file in Console. Click the Download CSV link in the upper right area of the experiment run history table.

Note that experiment runs are not shown in your app’s run history.

Experiment run summary data

Run history objects can include the items specified in the table below.

id: The ID for the run. This is generated automatically and cannot be changed.
created_at: The date and time the run was created.
application_id: The ID of the application in which the run was made.
application_instance_id: The ID of the instance used for the run.
application_version_id: The ID of the version used for the run.
experiment_id: The ID of the experiment for which the run was made.
input_id: The ID of the input used for the run.
status: The status of the run. This can be running, succeeded, or failed. If the status is failed, no error will be shown in the run history data, but the details of the run will contain relevant error messages. The run details for an experiment run can be viewed in Console by clicking on the run ID in the experiment run history table, or retrieved using the /applications/{application_id}/runs/{run_id} endpoint.

statistics

The statistics evaluated for the experiment and their values. The statistics field contains the following fields:

  • status
  • error
  • indicators

The status of the statistics can be none, pending, succeeded, or failed. If it's failed, then indicators is not present and error contains the error message. If the status is succeeded, then indicators contains the values and error is not present. If the status is none or pending, then both error and indicators are not present.

The indicators field contains an array of the statistic indicator evaluated for the batch experiment in the format of { name: <indicator_name>, value: <indicator_value> }.
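
As a small convenience, the name/value pairs in indicators can be flattened into a dictionary. The sketch below also shows one way to act on the statuses described above; the handling choices are illustrative.

# Sketch: flatten indicators into {name: value} and handle the statistics
# statuses described above. Handling choices are illustrative.
def indicators_as_dict(statistics: dict) -> dict:
    status = statistics.get("status")
    if status == "failed":
        raise RuntimeError(statistics.get("error", "statistics failed"))
    if status in (None, "none", "pending"):
        return {}  # nothing to report yet
    # status == "succeeded": indicators is present.
    return {ind["name"]: ind["value"] for ind in statistics["indicators"]}


stats = {"status": "succeeded", "indicators": [{"name": "result.value", "value": 2175064}]}
print(indicators_as_dict(stats))  # {'result.value': 2175064}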

In Console, the indicator values are shown in the experiment run history table only if they are present. No errors are displayed.
