Importing Instances Walkthrough

Support for existing instances, pre-labeling, and production error review.

Overview

Any process that can generate an Instance can be integrated with Diffgram.

Examples

  • Predictions from your deep learning algorithm, or other proprietary algorithm.
  • Use for pre-labeling, correcting production errors, real time labeling.
  • Migrate from another system

Block Diagram


Process: How to get the data into Diffgram

  1. Map your Data to Diffgram format.
  2. Attach Data to File at Import
  3. Verify Data

1. Map your Data to Diffgram format.

Goal:
Map the data structure to the structure defined here.

Depending on your existing data structure, this may be fairly straightforward or may be a lot of work. Check out the example code for ideas on starting points.

The SDK raises exceptions if the data is not in the right format, so don't worry about "breaking" it. You can keep the same string names as in your Project and the same sequence numbers you use externally.

Let's try it out! :)

Key Concepts

  • We will build up from the Instance, to a List of Instances, to (in the case of Video) the Frame Packet Map
  • Labels default to map by string name to existing Diffgram labels.
  • The data may be organized by any existing Diffgram concept, such as Project and Directory.

When to Map?

  • It's often best to map the data at run time instead of after the fact, because data in memory is usually easier to work with.
  • If the data is already in another format, it will need to be converted to the format we describe below. For example, imagine a CSV with headers like [x_min, y_min, class]. If each row represents an Instance, each row may need to be mapped to a dictionary object, e.g. {x_min : row_value_list[0]}.
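The CSV case can be sketched as follows. This is a minimal sketch under assumptions: the column names x_min, y_min, x_max, y_max and class are hypothetical (all four coordinates are included because the box format needs them), and rows_to_instance_list is an illustrative helper, not part of the Diffgram SDK.

```python
import csv
import io


def rows_to_instance_list(csv_text: str) -> list:
    """Map each CSV row to a box-style Instance dictionary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    instance_list = []
    for row in reader:
        instance_list.append({
            "name": row["class"],        # label name as a string
            "type": "box",
            "x_min": int(row["x_min"]),  # CSV values are strings, so cast
            "y_min": int(row["y_min"]),
            "x_max": int(row["x_max"]),
            "y_max": int(row["y_max"]),
        })
    return instance_list


# Hypothetical sample data standing in for an exported CSV file
example_csv = (
    "x_min,y_min,x_max,y_max,class\n"
    "10,20,110,220,cat\n"
    "5,8,50,80,dog\n"
)
instance_list = rows_to_instance_list(example_csv)
```

Casting to int at mapping time keeps type problems close to the source data, where they are usually easier to debug.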

Image format overview

An image may have more than one Instance. Therefore we relate an image to an instance_list.

Video format overview

A video has many frames, and each frame is like an image. Each frame can have many instances.


Instance Format

Here's an example of mocking a single Instance. See the full example here

Name, Label Name

Goal: Map your existing Label to match a Label Template Relationship as defined in your Diffgram Project.
AKA: {Class, target, y, id, name, label}

  • Exact match case: If you have dog (String) defined as a label in your data, and also define a Label dog in Diffgram, then name : dog is correct.
  • Extra step case: If you define it as an internal id, e.g. map = {1 : Dog}, then you need to map it back to the string name defined in Diffgram, e.g. name : map[1].
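The extra step case might look like the sketch below. The ids, the label names, and the resolve_label_name helper are hypothetical; the only point is that the value placed in name is the string defined in Diffgram, not the internal id.

```python
# Hypothetical internal class ids mapped to label names that are
# assumed to already exist in the Diffgram project.
id_to_label_name = {1: "dog", 2: "cat"}


def resolve_label_name(internal_id: int) -> str:
    """Translate an internal class id to the Diffgram label string."""
    if internal_id not in id_to_label_name:
        # Fail early instead of sending an unknown name to Diffgram
        raise ValueError(f"No label name mapped for internal id {internal_id}")
    return id_to_label_name[internal_id]


instance = {
    "name": resolve_label_name(1),
    "type": "box",
    "x_min": 400, "x_max": 500,
    "y_min": 410, "y_max": 520,
}
```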

Label Templates Must Be Defined First

  • Because the definition of Label Templates has implications for the rest of the project, undefined names will throw an error. E.g. if no Label Template cat exists in the project, importing an Instance with name : cat returns an error.
  • There is a new label method, so you can optionally use it to create labels automatically, or you can create the Label Templates in the UI.

Advanced:

  • label_file_id: A label_file_id may be provided, corresponding to the Diffgram label_file_id and it will supersede a string label name.
  • You may define a generic "object" Label in Diffgram and use it to populate only the "spatial" aspects, then use Attributes to create further meaning.

📘

Exact match for string name in Diffgram

The name provided is expected to be an exact match for an existing label in the Project.

Sequences (Video, Optional)

Sequence
Please read Video Sequences for more information on this concept.

The general idea of sequence is the following:

  • A sequence has multiple instances of the label at different frames of a video. For example, if you are tracking a bird flying, you create a label called "bird" and then create multiple instances of that label in different frames of the video. Together, those instances form a sequence of the bird flying around.
  • A Label can have many sequences. For example, if 10 birds are flying around and you want to track them all, you create 10 sequences for the label "bird".

Goal: If you have existing Sequences, you may declare them here.
AKA: {tracks, sub_sequence, external_id}
Default: If a number is not provided it will default to 1.

To represent your id for that sequence, populate number (type int).

Advanced: A sequence_id (Diffgram) supersedes a number (External).
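If your external track ids are not already integers, one possible approach is to assign each distinct id an int in first-seen order. This is only a sketch: the track id strings and the assign_sequence_numbers helper are hypothetical, and whether numbers need to be unique per label or per file is not specified here, so the sketch simply numbers across the whole input.

```python
def assign_sequence_numbers(track_ids) -> dict:
    """Give each distinct external track id an int, starting at 1."""
    numbers = {}
    for track_id in track_ids:
        if track_id not in numbers:
            # First time we see this track: take the next free number
            numbers[track_id] = len(numbers) + 1
    return numbers


# Hypothetical external ids, one per detection, in file order
external_track_ids = ["bird-a", "bird-b", "bird-a", "bird-c"]
sequence_numbers = assign_sequence_numbers(external_track_ids)

# The int is what goes into the "number" key of each instance
number_for_first_detection = sequence_numbers[external_track_ids[0]]
```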

Box

{
        "name" : str,
        "type": "box",
        "number": sequence_number, # Video, Optional, Defaults to 1 if Not Provided.
        "x_max": int,
        "x_min": int,
        "y_max": int,
        "y_min": int
}

Polygon, Point, Line

point = {'x' : int, 'y' : int}
{
        "name" : str,
        "type": "polygon",  # or (point, line)
        "number": sequence_number, # Video, Optional, Defaults to 1 if Not Provided.
        "points": [point, point, point]  # type list
}
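As an illustration of the structure above, the sketch below builds a polygon Instance from a list of (x, y) pairs. The polygon_from_xy_pairs helper and the sample coordinates are hypothetical; number is only included when a sequence number is supplied, since it is described as optional.

```python
def polygon_from_xy_pairs(name: str, xy_pairs, sequence_number: int = None) -> dict:
    """Build a polygon-style Instance dictionary from (x, y) pairs."""
    instance = {
        "name": name,
        "type": "polygon",
        # Each point becomes {'x': int, 'y': int}
        "points": [{"x": int(x), "y": int(y)} for x, y in xy_pairs],
    }
    if sequence_number is not None:
        instance["number"] = sequence_number  # Video only
    return instance


# Hypothetical triangle outline in pixel coordinates
triangle = polygon_from_xy_pairs("cat", [(10, 10), (60, 15.0), (30, 70)])
tracked = polygon_from_xy_pairs("cat", [(1, 2), (3, 4), (5, 6)], sequence_number=2)
```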

Example function to mock a Box instance

import random

def mock_box_from_external_format(
            sequence_number: int = None,
            name : str = None):
    # Returns a box Instance with random coordinates, for testing only.
    # With these ranges, x_min < x_max and y_min < y_max always hold.
    return {
        "name" : name,
        "number": sequence_number,
        "type": "box",
        "x_max": random.randint(500, 800),
        "x_min": random.randint(400, 499),
        "y_max": random.randint(500, 800),
        "y_min": random.randint(400, 499)
        }

  • It's assumed instances are machine_made unless flagged otherwise.
  • An instance must implement all of the keys shown. Additional keys will be ignored.
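Before importing, it can help to check your own data locally against the box keys shown above. The sketch below is not the SDK's validation; check_box_instance is a hypothetical helper, and the rule that each min must be less than its max is an assumption about what a sensible box looks like.

```python
REQUIRED_BOX_KEYS = {"name", "type", "x_min", "x_max", "y_min", "y_max"}


def check_box_instance(instance: dict) -> list:
    """Return a list of problems; an empty list means none were found."""
    missing = REQUIRED_BOX_KEYS - set(instance)
    if missing:
        return ["missing keys: " + ", ".join(sorted(missing))]

    problems = []
    if instance["type"] != "box":
        problems.append("type is not 'box'")
    if not isinstance(instance["name"], str):
        problems.append("name is not a string")

    coords_ok = True
    for key in ("x_min", "x_max", "y_min", "y_max"):
        if not isinstance(instance[key], int):
            problems.append(key + " is not an int")
            coords_ok = False

    # Only compare coordinates once we know they are all ints
    if coords_ok:
        if instance["x_min"] >= instance["x_max"]:
            problems.append("x_min is not less than x_max")
        if instance["y_min"] >= instance["y_max"]:
            problems.append("y_min is not less than y_max")
    return problems


good_box = {"name": "cat", "type": "box",
            "x_min": 400, "x_max": 500, "y_min": 410, "y_max": 520}
bad_box = {"name": "cat", "type": "box",
           "x_min": 400, "x_max": 500, "y_min": 410}  # y_max missing
```

Calling check_box_instance(good_box) returns an empty list, while bad_box is reported as missing y_max.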

Image format detail

instance = {}
instance_list = [instance, instance, instance]

Video format detail

A video may have instances in every frame, or only a handful of frames.
We need to relate each frame that has instances to its relevant instance_list.

We expect a dictionary that maps each frame number (int) to its instance_list (list).

Frame Packet Map

frame_packet_map = {
   0 : instance_list,   # where 0 is an example of frame 0
   6 : instance_list,
   9 : instance_list
}
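If your detections arrive as a flat list, one way to produce this structure is to group them by frame number. This is a sketch under assumptions: the (frame_number, instance) pair layout and the build_frame_packet_map helper are hypothetical, chosen only to illustrate the grouping.

```python
from collections import defaultdict


def build_frame_packet_map(detections) -> dict:
    """Group (frame_number, instance) pairs into {frame_number: instance_list}."""
    frame_packet_map = defaultdict(list)
    for frame_number, instance in detections:
        frame_packet_map[int(frame_number)].append(instance)
    # Return a plain dict so only frames that have instances appear as keys
    return dict(frame_packet_map)


def make_box(number: int) -> dict:
    """Hypothetical fixed-size box used only as sample data."""
    return {"name": "cat", "type": "box", "number": number,
            "x_min": 400, "x_max": 500, "y_min": 410, "y_max": 520}


# Two instances on frame 0, one on frame 6, none on the frames between
detections = [(0, make_box(1)), (0, make_box(2)), (6, make_box(1))]
frame_packet_map = build_frame_packet_map(detections)
```

Frames 1 through 5 have no key in the result, which is how frames without instances are represented.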

2. Attach Data to File at Import

📘

SDK Version >= 0.2.1

These examples assume a project object exists, but otherwise include sample data and mocking of data (sometimes based on previously defined functions). The "new" part is the single line shown before each example.

Image attach example

instance_list = instance_list

def test_existing_instances_image(project):

    signed_url = "https://storage.googleapis.com/diffgram_public/example_data/000000001323.jpg"

    instance_list = []

    for i in range(3):
        instance_list.append(
           mock_box_from_external_format(name = "cat"))

    result = project.file.from_url(
        signed_url,
        media_type="image",
        instance_list=instance_list
    )

Video attach example

The frame_packet_map can be passed to from_url, along with the media file:
frame_packet_map = frame_packet_map

result = project.file.from_url(
        signed_url,
        media_type="video",
        frame_packet_map=frame_packet_map
    )
  • If no key is provided for a given frame, we assume that frame has no instances.

Below, mock_frame_packet_map() is an example of creating the frame packet map. It builds on the mock_box_from_external_format() example.

This can be a starting point for converting your data to the above format.

def mock_frame_packet_map(
            number_of_frames: int,
            number_of_sequences: int = 1):

    frame_packet_map = {}
    name = "cat"    # label name (str); must match an existing label in the Project

    for i in range(number_of_frames):

        # One instance_list per frame, keyed by frame number
        frame_packet_map[i] = []

        for j in range(1, number_of_sequences + 1):

            frame_packet_map[i].append(mock_box_from_external_format(
                sequence_number = j,
                name = name))

    return frame_packet_map


def test_existing_video_instances(project):

    signed_url = "https://storage.googleapis.com/diffgram_public/example_data/challenge_videoTrim.mp4"

    frame_packet_map = mock_frame_packet_map(
            number_of_frames = 20,
            number_of_sequences = 1)

    result = project.file.from_url(
        signed_url,
        media_type="video",
        frame_packet_map=frame_packet_map
    )

See the full example here

3. Verifying the data in Diffgram

Input

Go to project Import to see the results and some forms of errors.


Example of Errors

This includes validation errors on Instances


Studio

Image verify

Note that in this example the instances were generated with the random function, so the boxes are not expected to line up with the cat pictured.

  • Verify Instances, i.e. the count of instances is correct

Video verify

  • Verify sequences count is correct
  • Verify a few frames

Machine made flag

By default the machine_made flag is set and can be verified on the instance list.
