There are two main content formats supported by Nextmv when performing a new remote run of an application:
- json: the application reads from stdin, writes results to stdout, and logs to stderr. Input and output are expected to be in JSON format. This is the default content format.
- multi-file: the application reads input data from one or more files, and writes output data to one or more files. Logs are streamed to both stdout and stderr.
This documentation details how to work with the multi-file content format and the Cloud API. These are the steps that you must follow to start and get the results of a multi-file run.
- Understand the multi-file convention.
- Create the input archive.
- Upload the input files.
- Start a new run with the multi-file content format.
- Wait for the run to finish / poll for status.
- Download the output files.
1. The multi-file convention
If you want your application to work with the multi-file content format, it must follow these conventions:
- Input data must be read from one, or more, input files.
- Output data must be written to one, or more, output files.
- Output metrics (statistics) must be written to a .json file.
- Custom assets and visuals must be written to a .json file.
- Configurations (options) must be read from command-line arguments.
The app.yaml (app manifest) holds the configuration for the multi-file content format. Here is an example configuration:
The example above is the default configuration for the multi-file content format. This means that if your application follows the default file structure and naming conventions, you do not need to specify any additional configuration in the app manifest.
- input/path is the directory (folder) where the input files are expected to be found. The input files should be either UTF-8 encoded files (e.g. .csv, .json) or Microsoft Excel (.xlsx) documents. This folder is expected to be present when the application starts running. Your application can read input files from a different location, and in that case you must specify the correct path in the app manifest.
- outputs/solutions is a directory expected to contain the output of the run after the application completes processing. This output is expected to be one or more files that are either UTF-8 encoded or Microsoft Excel (.xlsx) documents. This folder is then compressed into a gzip-compressed tarball (.tar.gz) and stored on Nextmv Cloud as the output of the run. Your application can write solution files to a different location, and in that case you must specify the correct path in the app manifest.
- outputs/statistics is a directory expected to either be empty or contain statistics information for the run, which is used for experimentation on the Nextmv Cloud platform. If it is empty, no statistics information is recorded on the run on Nextmv Cloud. To output statistics to Nextmv Cloud from the run, the application should create a statistics.json file within outputs/statistics that contains a JSON object following the Nextmv statistics convention. Your application can write the statistics file to a different location, and in that case you must specify the correct path in the app manifest.
- outputs/assets is a directory expected to either be empty or contain additional assets for the run. Assets are supplemental or transformative files that generally do not add information to the output, such as (but not limited to) files that enable custom visualization of run results on Nextmv Console. If it is empty, no assets are recorded on the run on Nextmv Cloud. To output assets to Nextmv Cloud from the run, the application should create an assets.json file within outputs/assets that contains a JSON object following the Nextmv run assets convention. Your application can write the assets file to a different location, and in that case you must specify the correct path in the app manifest.
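To make the directory conventions above more concrete, here is a minimal sketch of an application that reads every file from an input directory, writes one solution file, and writes a statistics.json file. The directory names used as defaults (inputs, outputs), the option names (--input-dir, --output-dir, --duration), the solution.json file name, and the contents of both output files are assumptions made for illustration. The real statistics schema is defined by the Nextmv statistics convention, and the real paths by your app manifest.

```python
import argparse
import json
from pathlib import Path


def run(input_dir: Path, output_root: Path, options: dict) -> None:
    """Read all input files, then write a solution file and a statistics file."""
    # Read every regular file in the input directory as UTF-8 text.
    inputs = {
        p.name: p.read_text(encoding="utf-8")
        for p in sorted(input_dir.iterdir())
        if p.is_file()
    }

    solutions_dir = output_root / "solutions"
    statistics_dir = output_root / "statistics"
    solutions_dir.mkdir(parents=True, exist_ok=True)
    statistics_dir.mkdir(parents=True, exist_ok=True)

    # Placeholder "solution": the number of characters in each input file.
    solution = {name: len(text) for name, text in inputs.items()}
    (solutions_dir / "solution.json").write_text(
        json.dumps(solution), encoding="utf-8"
    )

    # Placeholder statistics (hypothetical keys, not the Nextmv schema).
    stats = {"options": options, "files_read": len(inputs)}
    (statistics_dir / "statistics.json").write_text(
        json.dumps(stats), encoding="utf-8"
    )


def parse_args(argv=None) -> argparse.Namespace:
    """Options are read from command-line arguments, as the convention requires."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-dir", default="inputs")    # hypothetical option
    parser.add_argument("--output-dir", default="outputs")  # hypothetical option
    parser.add_argument("--duration", type=int, default=30) # hypothetical option
    return parser.parse_args(argv)
```

An entry point would typically call parse_args() and then run(Path(args.input_dir), Path(args.output_dir), {"duration": args.duration}).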
This section showed how to configure the app.yaml manifest to use the multi-file content format. In addition to configuring it in the manifest, there are two other ways in which you can specify the multi-file content format so that a run can use it.
- Specify the multi-file content format in the run creation request payload. This will be addressed in step 4 of this guide.
- Specify the multi-file content format in the instance configuration.
The hierarchy of precedence for the content format configuration is as follows:
- Run creation request payload. This takes highest precedence.
- Instance configuration.
- App manifest configuration.
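The precedence order above can be pictured as a simple lookup that returns the first value that is set. The sketch below is only a model of that rule, not how Nextmv Cloud implements it. The function name and its parameters are hypothetical, and falling back to json when nothing is set follows from json being described as the default content format.

```python
from typing import Optional

DEFAULT_FORMAT = "json"  # described in this guide as the default content format


def resolve_content_format(
    run_payload: Optional[str] = None,
    instance_config: Optional[str] = None,
    app_manifest: Optional[str] = None,
) -> str:
    """Return the first format that is set, from highest to lowest precedence."""
    for candidate in (run_payload, instance_config, app_manifest):
        if candidate is not None:
            return candidate
    return DEFAULT_FORMAT
```

For example, if the app manifest says multi-file but the run creation payload says json, this model resolves to json because the payload takes highest precedence.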
2. Create the input archive
All input files should be placed within a single directory (folder). The input folder must be compressed into a gzip-compressed tar archive (.tar.gz). This archive must contain only UTF-8 encoded files or Microsoft Excel (.xlsx) documents.
Consider the following example. We will create a directory inputs/ and two files within that directory:
- file_1.csv
- file_2.json
The structure of the current working directory should look like this:
Then, we will compress the inputs directory into an archive using tar.
A file named inputs.tar.gz should have been created in the current working directory. Once the archive is created, the structure of the current working directory should look like this:
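If you prefer to build the archive from a script rather than with the tar command, the same result can be approximated with Python's standard tarfile module. This is a sketch under the assumption that the archive should contain the inputs/ directory itself, mirroring what compressing the directory with tar would produce. The function names are hypothetical.

```python
import tarfile
from pathlib import Path


def create_input_archive(input_dir: Path, archive_path: Path) -> Path:
    """Compress input_dir into a gzip-compressed tar archive (.tar.gz)."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the directory name (e.g. "inputs/...")
        # instead of the absolute path on disk.
        tar.add(input_dir, arcname=input_dir.name)
    return archive_path


def list_archive(archive_path: Path) -> list:
    """Return the sorted member names, useful for checking the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Calling create_input_archive(Path("inputs"), Path("inputs.tar.gz")) from the working directory produces the inputs.tar.gz file described above.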
3. Upload the input files
You are now going to upload the inputs.tar.gz file to Nextmv Cloud to prepare it for use on a run.
Request a presigned URL to upload your input archive to.
Retrieve a unique URL and ID for uploading files.
Note that you must specify the correct APP_ID path parameter. The result will contain an upload_id and upload_url.
This returned upload URL will only last for 15 minutes. If it expires, simply request a new one and take note of the new upload ID.
You will need the upload_id to start a run and the upload_url to actually upload the input archive.
Upload your input using the upload_url that was obtained before. We use --data-binary here to make sure curl sends the content of the file unaltered.
In the above code snippet, replace <UPLOAD_URL_OBTAINED_FROM_PREVIOUS_STEP> with the actual upload_url obtained from the previous step. Also note that if you named the .tar.gz file differently, you should replace inputs.tar.gz with the actual name of your file.
An empty response means the upload was successful. Note that this may take a moment depending on the size of the input file.
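The two points above (the 15-minute lifetime of the upload URL and sending the archive bytes unaltered) can be sketched in Python as follows. This is an illustration only: the use of an HTTP PUT is an assumption about how the presigned URL is consumed, and both function names are hypothetical. The request is built but not sent, so the sketch does not contact any server.

```python
import time
import urllib.request
from pathlib import Path
from typing import Optional

UPLOAD_URL_TTL_SECONDS = 15 * 60  # this guide states the URL lasts 15 minutes


def upload_url_expired(requested_at: float, now: Optional[float] = None) -> bool:
    """Return True once 15 minutes have passed since the URL was requested."""
    now = time.time() if now is None else now
    return now - requested_at >= UPLOAD_URL_TTL_SECONDS


def build_upload_request(upload_url: str, archive_path: Path) -> urllib.request.Request:
    """Prepare (but do not send) a request carrying the raw archive bytes."""
    # The bytes are read and sent as-is, like curl's --data-binary.
    data = archive_path.read_bytes()
    # ASSUMPTION: the presigned URL accepts an HTTP PUT.
    return urllib.request.Request(upload_url, data=data, method="PUT")
```

Sending the prepared request would be done with urllib.request.urlopen(request). If upload_url_expired(...) returns True beforehand, request a new upload URL and use the new upload ID.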
4. Start a new run with the multi-file content format
Use the following endpoint to start a new run.
Create new application run.
When creating a run that uses multi-file content format input, you must ensure these requirements are met:
- The run creation payload must include the upload_id obtained from the input upload step.
- If not specified in the app.yaml or instance configuration, the run's configuration must specify that the input content format is multi-file.
Here is a sample curl command that showcases the use of the upload_id and the multi-file content format in the run creation request.
Note that you must specify the correct APP_ID and INSTANCE_ID path parameters.
After the run is created, take note of the run_id from the response. You will need it to poll for status and download the output files.
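As a rough sketch of the two requirements above, the helper below assembles a JSON body for the run creation request. The upload_id key comes from this guide. The nesting used for the content format (configuration, format, input, type) is an assumption made for illustration and should be checked against the Cloud API reference before use. The function name is hypothetical.

```python
import json
from typing import Optional


def build_run_payload(upload_id: str, content_format: Optional[str] = "multi-file") -> str:
    """Return a JSON string for the run creation request body."""
    payload = {"upload_id": upload_id}
    if content_format is not None:
        # ASSUMPTION: this key layout is illustrative, not confirmed by this guide.
        payload["configuration"] = {"format": {"input": {"type": content_format}}}
    return json.dumps(payload)
```

Passing content_format=None leaves the format out of the payload, which corresponds to the case where the app manifest or instance configuration already specifies multi-file.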
5. Poll for status / wait for the run to finish
As described in the run methodology, you must poll for the status of the run until it is completed. The run metadata can be retrieved at any point, including while the run is still in progress. The metadata.status_v2 key contains the status of the run.
Get the status of a run without the output.
Sample curl request for obtaining metadata for a run:
Once the run has finished, you can proceed to download the output files.
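The polling step can be sketched as a loop that reads metadata.status_v2 until the run reaches a final state. In this sketch, fetch_metadata is a hypothetical callable that you would implement to call the run metadata endpoint and return the parsed JSON response. The status strings treated as final (succeeded, failed, canceled) and the default interval and timeout are assumptions, so adjust them to the values your runs actually report.

```python
import time
from typing import Callable

# ASSUMPTION: illustrative final status values; confirm against the API reference.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_run(
    fetch_metadata: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until metadata.status_v2 is final, then return that status."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_metadata()["metadata"]["status_v2"]
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout} seconds")
        sleep(interval)  # wait before asking again
```

Injecting the sleep function keeps the loop easy to test and lets you swap in a backoff strategy later.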
6. Download the output files
As with the inputs, the output consists of one or more files. Use the following endpoint to retrieve the results of the run. You must specify the format=url query parameter to indicate that you want to receive a URL to download the output files. Otherwise, the request fails with a 400 error.
Get the result of a run.
Here is a sample curl command that showcases how to get the output URL for a completed run.
Note that you must specify the correct APP_ID and RUN_ID path parameters.
With this command, you should receive a response similar to this one:
For a json content format run, the output key would contain the actual output data. However, since this is a multi-file content format run, the output key contains a url key with a presigned URL to download the output archive.
You can download the output archive using curl as follows:
Replace <OUTPUT_URL_HERE> with the actual URL from the output.url field in the previous response.
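The same two steps (reading output.url from the response and downloading the archive) can be sketched in Python with the standard library. The function names are hypothetical, and the sketch assumes the response has already been parsed into a dictionary and that the presigned URL can be fetched with a plain GET request.

```python
import shutil
import urllib.request
from pathlib import Path


def output_url_from_result(result: dict) -> str:
    """Return the presigned download URL stored under output.url."""
    output = result.get("output")
    if not isinstance(output, dict) or not output.get("url"):
        raise ValueError("run result has no output.url; was format=url requested?")
    return output["url"]


def download_output(url: str, dest: Path) -> Path:
    """Stream the content behind url into dest without loading it all in memory."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest
```

For example, download_output(output_url_from_result(result), Path("output.tar.gz")) saves the archive in the current working directory.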
An output.tar.gz file should have been created in the current working directory. You can also name it something else if you prefer. You can extract the contents of this archive using tar as follows:
This command will create an outputs/ directory and extract the contents of the archive into that directory.
The structure of the current working directory should look like this:
The outputs/ directory now contains the output files generated by the run. This example shows two sample output files but your actual output files may vary depending on your application logic.
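Extraction can also be scripted with Python's tarfile module. The sketch below extracts a downloaded archive into a destination directory and returns the files it found there. The function name is hypothetical, and the layout inside the archive depends on your application, so the returned paths may differ from the example above.

```python
import tarfile
from pathlib import Path


def extract_output_archive(archive_path: Path, dest_dir: Path) -> list:
    """Extract a .tar.gz archive into dest_dir and list the extracted files."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Use the safer "data" extraction filter on Python versions that have it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, **kwargs)
    # Report file paths relative to dest_dir, using forward slashes.
    return sorted(
        p.relative_to(dest_dir).as_posix()
        for p in dest_dir.rglob("*")
        if p.is_file()
    )
```

Calling extract_output_archive(Path("output.tar.gz"), Path("outputs")) is one way to end up with the outputs/ directory described above.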