Export Walkthrough

Getting started from the UI

The easiest way to get started is to export from the UI.

When we first arrive we see some options and an empty list:

Here it has defaulted to the directory Luckfrill.

When we click Generate, export generation starts.

Large exports may take several minutes or longer; small exports are faster. You may need to click the refresh button to see the updated status and results.

We must allow pop-ups for the download button to work. When we click Download, we see the file come up:

[Screenshot: example of downloading deep learning annotations from Diffgram]

When we first load the file it may be hard to read.

We can "un-minify" the JSON to expand it. (Your editor may do this automatically.)
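If the editor does not expand the file, the same result can be had with Python's standard json module. The following is a minimal sketch; the function name unminify and both file paths are hypothetical and chosen only for illustration.

```python
import json


def unminify(src_path, dst_path, indent=2):
    """Rewrite a minified JSON file as indented, human-readable JSON.

    The function name and both paths are illustrative, not part of any SDK.
    """
    # Load the whole export into memory as Python objects.
    with open(src_path, "r", encoding="utf-8") as src:
        data = json.load(src)

    # Write it back with indentation so nested structures are easy to scan.
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(data, dst, indent=indent, ensure_ascii=False)

    return data
```

Writing to a second file keeps the original download untouched, which can help when comparing exports later.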

And that's it! Now we have the file ready to review and use.
For details on the meaning of the file, please refer to the details here.

Completed Files Only

By default, exports from the UI include only files marked as complete.

This is optional: the export can also include all files.

🚧

Caution: File Completion Status is different from Task Status

Because a file may be worked on by multiple tasks, the file status is separate from the task status. For example, creating new tasks using existing files will reset the file's completed status. Use the check values to verify that the export contains what you expect.

Check Values

A file count is provided as a check value. We highly recommend making sure the file count matches the expected value, for example the number of completed tasks or the number of files visible in the studio.
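The same check can be repeated on the downloaded file. The sketch below is hedged: it assumes the loaded export is a dictionary in which each file is a top-level entry whose value is a dictionary containing an instance_list key. That layout is an assumption made for illustration, so adjust the test inside count_exported_files to match your file. Both function names are hypothetical.

```python
def count_exported_files(export_data):
    """Count top-level entries that look like file records.

    Assumption (not confirmed by the documentation): a file record is a
    dict that contains an "instance_list" key. Other top-level entries,
    such as metadata, are skipped.
    """
    return sum(
        1
        for value in export_data.values()
        if isinstance(value, dict) and "instance_list" in value
    )


def check_file_count(export_data, expected_count):
    """Raise ValueError when the export holds an unexpected number of files."""
    actual = count_exported_files(export_data)
    if actual != expected_count:
        raise ValueError(f"Expected {expected_count} files, found {actual}")
    return actual
```

Here expected_count would come from your own records, for example the number of tasks you know to be complete.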

For more general inspection and visibility on work see Reporting.

If you have additional ideas on check values please send feedback.

Export Troubleshooting and Alignment - Overview

There are a variety of ways data can be modified, including tasks, data-streams, and connections. There are also a variety of settings and choices for exporting. It's important to be aware of these differences.

👍

Alignment between Engineering, Project Admins and Data Science

The Export is where the "rubber hits the road" in terms of the results. Communication on expectations and understanding of settings here is important to get a good result.

Assumptions

  1. Do the settings chosen at import align with my export expectations?
    For example:
    • If a video was split, then global frame numbers will be available, and this will change how the clip is presented at export.
    • Resolution: double-check the provided width and height (inside the generated file).
  2. Was the Job/Tasks etc. completed to my expectations?
    We may visually inspect or spot-check tasks.

Export settings

Check that each setting is correct, meaning it is the right match for the above expectations:

  1. Is the correct source selected?
    For example, for generated exports, we can go to the job from the source and then visually inspect the tasks to ensure alignment.

  2. Is the correct name (sub option) selected?
    Is the {job_name, directory name} a match?

  3. Are the options as expected?
    For example, is {completed, all} set as expected?

  4. Look at check values, such as file counts.

  5. Advanced conditions
    Modified files, removed instances, etc.

By default, removed instances can optionally be shown in the UI but are not included in the export.

Data is randomly not lining up.

While bugs are always possible, in general export errors are usually more dramatic than they first seem. Often the above troubleshooting steps can help resolve issues.

In general, anything that is rendering in the studio / tasks can be exported. The system saves automatically on a regular basis, and reloads data for confirmations / inspections.

Export File Considerations

Considerations when reviewing exports:

  • Order may not be preserved.
    For example, the order of attribute groups, instance_list entries, and even files in general may differ between exports. So if a flag (such as complete/not complete) causes files to be excluded, it may appear that data is randomly missing.

  • Some resources are unique per project.
    For example, if a label "Apple" is created in two projects, it will have a different File ID in each project.

  • There are limits on various list sizes.
    For example, it's possible to hit a limit on how many instances are generated per file.
    While this is rare, if you have tried the other troubleshooting items and are still "missing" information please contact support.
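Because order may not be preserved, comparisons are more reliable when they go by key rather than by position. The sketch below shows one way to do that with plain Python sets and dictionaries. The helper names are hypothetical, and the "id" field used to index list items is an assumption; substitute whichever identifier your export actually carries.

```python
def missing_keys(reference, candidate):
    """Return keys present in reference but absent from candidate, sorted.

    Works on any two dicts, so it does not depend on entry order.
    """
    return sorted(set(reference) - set(candidate))


def index_by(items, key="id"):
    """Index a list of dicts by one field so lookups ignore list order.

    The default field name "id" is an assumption for illustration.
    """
    return {item[key]: item for item in items}
```

For example, calling missing_keys on an older and a newer export lists the entries that were excluded from the newer one, which is often the result of a flag such as complete/not complete rather than lost data.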

Other administrative considerations:

  • Am I looking at the correct file?
    Files are named with the export ID, datetime, and other information to help identify them.

Note that in data stream cases, there could be various reasons that the latest file has not been pushed to, or is not yet available in, your cloud bucket.

  • When was the file generated?
    For example, if the file was generated a few hours ago, and new instances have been added (or status changes etc), then the export will need to be regenerated.

Note that completed status does not "lock" a file, so a completed file may still be modified if needed.

Export Generation through SDK

📘

SDK Version >= 0.1.7.6

From a Job

First we get the job by the id. Then we create a new export.

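# your_job_id is the ID that was stored when the job was created
job = project.job.get_by_id(id = your_job_id)
# By default this waits for the export and returns the results as JSON
data = job.generate_export()
print(data)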

This assumes we have defined a project object as defined here.

By default it returns the results as JSON:

[Screenshot: example of printing the JSON returned from the job]

This is similar to clicking Generate in the UI, then Download, and then loading the file into our application.

📘

Must store or retrieve job ID

The assumption is that when a job is being created your system will store the job_id to retrieve the job. If the job id is not stored it can be retrieved from the web UI by navigating to the job. The URL is of the format job/job_id.

If we navigate to the web UI we can also see the export has been added here.

Return URL

Another option is to generate a URL to the resource. This can be useful for downloading the file to some other system or storage option.

your_job_id = 1
job = project.job.get_by_id(id = your_job_id)

# Ask for a URL to the export file instead of the JSON itself
url = job.generate_export(
    return_type = "url")

print(url)

Running the above example prints the URL with access to the export.

Here we set return_type to "url" and print the URL that comes back. To test it, we can paste the URL into a browser and download the JSON file.
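To move the file to another system without a browser, the URL can also be fetched from a script. The following is a minimal sketch using only Python's standard library; download_export and dest_path are hypothetical names, and the sketch assumes the URL can be fetched with a plain GET request, as the browser test above suggests.

```python
import shutil
import urllib.request


def download_export(url, dest_path):
    """Stream the resource at url into dest_path and return dest_path.

    Assumes the URL needs no extra headers or authentication.
    """
    # Copy in chunks so large exports are not held fully in memory.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

Streaming to disk rather than reading the whole response at once is a deliberate choice here, since large exports are expected.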

Trigger Export generation without blocking

📘

Returns metadata, not annotations

It returns an Export object. This is not the actual annotation data.

👍

Recommended approach for large datasets and/or repeated downloads

By default, the call that generates an export waits until export generation is complete before it returns.

However, for longer exports, or in other batch cases, you may wish to trigger export generation without blocking further processing.
We can do this by setting wait_for_export_generation = False.

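# SDK Version >= 0.1.7.6
import time

# Trigger generation and return immediately with an Export object
export = job.generate_export(
    wait_for_export_generation = False)

# Give the export a moment to finish; access_data() raises if it is not ready
time.sleep(2)

data = export.access_data(
    return_type = "url")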

Now it will return information regarding the Export object.

[Screenshot: JSON response from the API call. In the SDK it returns an object.]

We can now manually call access_data() on the Export object to get the export.

📘

Returns immediately

Because this method returns before the export is complete it's possible access_data() will throw Exception: Export not ready yet. In this example we add a time.sleep(). Alternatively you could check the export status and download upon success.
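A fixed time.sleep() may be too short for a large export. One alternative is to retry the call until it stops raising. The sketch below is a generic retry helper, not part of the SDK: access_when_ready, attempts, and delay_seconds are hypothetical names. It catches the broad Exception type only because the text above describes the error as a plain Exception; narrow it if your SDK version raises something more specific.

```python
import time


def access_when_ready(access_fn, attempts=10, delay_seconds=2.0):
    """Call access_fn until it succeeds or the attempts run out.

    access_fn is any zero-argument callable, for example a lambda that
    wraps export.access_data(...). Hypothetical helper, not an SDK method.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return access_fn()
        except Exception as error:  # broad on purpose, see the note above
            last_error = error
            # Wait before the next try, but not after the final one.
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    raise TimeoutError(
        f"Export not ready after {attempts} attempts"
    ) from last_error
```

With the objects from the example above, a call might look like access_when_ready(lambda: export.access_data(return_type = "url")).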

Access existing exports

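import time

# SDK Version >= 0.1.7.6
# List the project's existing exports and take the first one
export_list = project.export.list()
export = export_list[0]
# Brief pause in case the export is still being generated
time.sleep(2)
data = export.access_data(format = "url")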