Experiments are used to evaluate changes to your models by running one or more executable binaries (i.e., different model versions) and comparing their results. Experimentation is a key part of developing a good model, and Nextmv’s goal is to make experiments easy to run so you can focus on improving your model.
Nextmv Platform provides a suite of products for creating and managing different types of experiments. There are five types of experiments: scenario, batch, acceptance, shadow, and switchback. In addition, input sets and managed inputs organize the data used for experiments. Experiments are always created and managed in the context of an application; that is, each application has its own set of experiments (that you have created). See the Apps core concepts page for more information about applications.
Experiments and input sets can be created and managed with the Nextmv CLI, Nextmv Console, or the HTTP API endpoints. Created experiments are saved and can be accessed at any time. After an experiment has started, its results are aggregated and can be retrieved with the same tools. When viewing the results of an experiment, Console provides a visual interpretation, while the API and Nextmv CLI return the raw JSON.
The different types of experiments and input sets are summarized below.
Types of experiments
Scenario
Scenario tests compare the output from one or more scenarios. A scenario is composed of a model version, a collection of inputs, and any specific configuration that should be applied to the runs for that scenario. You can also configure repetitions to test for variability in the results.
You can use scenario tests as a way to explore impacts to business metrics (KPIs) based on model updates, different conditions (e.g. low demand vs. high demand), parameter tuning, and more. You can also use scenario tests as a way to validate that a model is ready for further testing and likely to make an intended business impact.
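As a concrete illustration, a scenario definition could be sketched as a small data structure that bundles a model version, a set of inputs, run options, and a repetition count. The field names below are assumptions for illustration, not Nextmv's official scenario test schema.

```python
# Illustrative sketch only: field names are assumptions, not the
# official Nextmv scenario test schema.
import json

def make_scenario(scenario_id, version_id, input_ids, options=None, repetitions=0):
    """Bundle a model version, inputs, and run options into one scenario."""
    return {
        "scenario_id": scenario_id,
        "version_id": version_id,
        "input_ids": list(input_ids),
        "options": options or {},    # run configuration for this scenario
        "repetitions": repetitions,  # extra runs to measure variability
    }

# Compare a low-demand and a high-demand scenario for the same version,
# with repetitions on the high-demand case to probe result variability.
scenarios = [
    make_scenario("low-demand", "v1.2.0", ["input-low-1", "input-low-2"]),
    make_scenario(
        "high-demand", "v1.2.0", ["input-high-1"],
        options={"solve.duration": "30s"}, repetitions=2,
    ),
]
print(json.dumps(scenarios, indent=2))
```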
Batch
Batch experiments are used to analyze the output from one or more decision models. They are generally used as an exploratory test to understand the impacts to business metrics (or KPIs) when updating a model with a new feature, such as an additional constraint. They can also be used to validate that a model is ready for further testing — and likely to make an intended business impact.
See the batch experiment reference guide for more information on batch experiments.
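A batch experiment request could look roughly like the payload below: an identifier, a description, an input set, and the instances (models) to compare. This is a hypothetical payload shape; the real HTTP API schema may differ, so consult the batch experiment reference guide for the authoritative fields.

```python
# Hypothetical payload shape for creating a batch experiment; the real
# Nextmv HTTP API schema may differ -- see the batch experiment reference.
import json

payload = {
    "id": "constraint-compare",
    "name": "Compare new capacity constraint",
    "description": "Baseline vs. version with the capacity constraint",
    "input_set_id": "weekly-inputs",            # inputs every model runs against
    "instance_ids": ["baseline", "candidate"],  # the models being compared
}
body = json.dumps(payload)
```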
Acceptance
Acceptance tests build on the core concept of a batch test, focusing on evaluating the differences between exactly two models and assigning a pass/fail label based on predefined thresholds. They are used to verify whether business or operational requirements (e.g., KPIs and OKRs) are being met. An acceptance test runs an existing production model and a new, updated model against a set of test data; you then review the results and determine whether the new model is acceptable based on criteria identified beforehand.
See the acceptance tests reference guide for more information on acceptance tests.
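The pass/fail idea can be sketched as a small function that compares two models' metrics against predefined criteria. The metric names and comparison rules here are illustrative, not Nextmv's acceptance test schema.

```python
# Sketch of the acceptance-test idea: compare a baseline and a candidate
# on predefined metrics and assign an overall pass/fail. Metric names and
# rules are illustrative, not Nextmv's schema.

def evaluate_acceptance(baseline, candidate, criteria):
    """criteria maps metric name -> "le" or "ge": the candidate must be
    <= or >= the baseline on that metric to pass it."""
    results = {}
    for metric, rule in criteria.items():
        b, c = baseline[metric], candidate[metric]
        results[metric] = (c <= b) if rule == "le" else (c >= b)
    return all(results.values()), results

baseline = {"total_cost": 1000.0, "on_time_rate": 0.92}
candidate = {"total_cost": 950.0, "on_time_rate": 0.95}
passed, detail = evaluate_acceptance(
    baseline, candidate, {"total_cost": "le", "on_time_rate": "ge"}
)
print(passed)  # True: candidate is cheaper and more on time
```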
Shadow
A shadow test is an experiment that runs in the background and compares the results of a baseline instance against a candidate instance. When the shadow test has started, any run made on the baseline instance will trigger a run on the candidate instance using the same input and options. The results of the shadow test are often used to determine if a new version of a model is ready to be promoted to production.
Shadow tests can be created using the Nextmv CLI, Nextmv Console, or the HTTP API. See the shadow test reference guide for more information on shadow tests.
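The mirroring mechanic can be sketched as follows: every baseline run triggers a candidate run with the same input and options, while production still serves the baseline result. The `run_model` function and the logged tuple shape are stand-ins, not a real Nextmv API.

```python
# Minimal sketch of the shadow-test mechanic: each baseline run is
# mirrored on the candidate with identical input and options. The
# run_model function below is a placeholder, not a real solver call.

def run_model(instance, input_data, options):
    # Placeholder for an actual model run; returns a fake result.
    return {"instance": instance, "input": input_data, "options": options}

shadow_log = []  # collected (baseline, candidate) result pairs

def run_with_shadow(input_data, options):
    baseline_result = run_model("baseline", input_data, options)
    # Mirror the exact same input and options onto the candidate.
    candidate_result = run_model("candidate", input_data, options)
    shadow_log.append((baseline_result, candidate_result))
    return baseline_result  # production still serves the baseline answer

result = run_with_shadow({"stops": 12}, {"solve.duration": "10s"})
```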
Switchback
Switchback tests for decision models let algorithm teams compare the performance of a candidate model against a baseline model using production data and conditions. While the models continue to make operational decisions, the candidate treatment is randomized over units of time.
Switchback tests are related to general A/B tests, but they are not the same. Switchback tests allow you to account for network effects, whereas A/B tests do not.
Switchback tests can be created using the Nextmv Console or the HTTP API. See the switchback test reference guide for more information on switchback tests.
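The time-unit randomization can be illustrated with a toy plan: time is divided into fixed units, and each unit is wholly assigned to either the baseline or the candidate, so network effects within a unit affect only one treatment. The unit length and assignment method below are illustrative, not Nextmv's implementation.

```python
# Toy illustration of switchback randomization: each time unit is
# randomly assigned one treatment. This is not Nextmv's implementation,
# just a sketch of the idea described above.
import random

def assign_switchback_plan(n_units, seed=42):
    """Randomly assign each time unit to baseline or candidate."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [rng.choice(["baseline", "candidate"]) for _ in range(n_units)]

# A day split into 2-hour units -> 12 units, each wholly one treatment.
plan = assign_switchback_plan(12)
print(plan)
```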
Managed inputs
Managed inputs are input data that you upload and manage directly on the platform. You can create a managed input from an uploaded file or by referencing a previous run. Managed inputs can be created and managed with Nextmv CLI, the Python SDK, Nextmv Console, or the HTTP API endpoints.
Input sets
Input sets are defined sets of inputs to use for an experiment. You can create input sets with Nextmv CLI, the Python SDK, Nextmv Console, or the HTTP API endpoints.
An input set can be composed of managed inputs, inputs from prior runs (by referencing run IDs), or inputs gathered from a date range and instance ID. Note that the maximum number of inputs allowed in an input set is 20.
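Assembling an input set from run IDs, with the 20-input maximum enforced, might look like the sketch below. The field names are illustrative; only the limit of 20 comes from the description above.

```python
# Sketch of building an input set from run IDs, enforcing the 20-input
# maximum noted above. Field names are illustrative, not Nextmv's schema.

MAX_INPUTS = 20  # maximum number of inputs allowed in an input set

def make_input_set(set_id, run_ids):
    if len(run_ids) > MAX_INPUTS:
        raise ValueError(f"input set may contain at most {MAX_INPUTS} inputs")
    return {"id": set_id, "run_ids": list(run_ids)}

input_set = make_input_set("week-14", [f"run-{i}" for i in range(20)])
```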
Custom metrics (statistics convention)
It is often useful to define custom metrics to evaluate the results of an experiment. Custom metrics are defined as part of the run output in the metrics or statistics field.
The metrics field is a flexible JSON object and can be fully customized to your needs. For more information on custom metrics, see the metrics reference guide.
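A run output carrying custom metrics might be shaped as below. The nesting under `statistics` follows the general convention described above, but the exact field names (such as `result` and `custom`) are assumptions here; see the metrics reference guide for the authoritative schema.

```python
# Hedged example of emitting custom metrics in the run output. The exact
# field names under "statistics" are assumptions -- consult the metrics
# reference guide for the authoritative schema.
import json

output = {
    "solution": {},  # the decision itself
    "statistics": {
        "result": {
            "value": 1234.5,  # headline objective value
            "custom": {       # free-form metrics you define yourself
                "unassigned_stops": 3,
                "max_route_duration_s": 5400,
            },
        },
    },
}
print(json.dumps(output))
```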
Review the results
After running an experiment from the CLI, navigate to the Nextmv Console to view the results of your experiment comparing the models. Note that large experiments may take some time to complete, so you may need to check back later to view results.
Within the Nextmv Console, you'll find your experiment under the Experiments section.