Instance List Format Reference

When using Diffgram's Python SDK or API, you upload instances in a format we've specified for creating labels and attributes. This format is a JSON file (or a Python dictionary when using the SDK).

File format for a label instance

A label instance is an object representing a label applied to a specific position in an image/video.
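The following JSON shows all the fields you can add to a label instance. Depending on the instance type, you will use only some of them. The "//" comments are explanations only: standard JSON does not allow comments, so remove them in real files.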

{
  "name" : "Dog",
  "number": 54,
  "type": "box",
  "x_max": 500,
  "x_min": 400,
  "y_max": 500,
  "y_min": 400,
  "start_token": 0,
  "end_token": 0,
  "start_sentence": 0,
  "front_face": { // For Cuboids only
    "width": 0,
    "height": 0,
    "top_left": {"x":0, "y": 0},
    "top_right": {"x":0, "y": 0},
    "bot_left": {"x":0, "y": 0},
    "bot_right": {"x":0, "y": 0}
  },
  "rear_face": { // For Cuboids only
    "width": 0,
    "height": 0,
    "top_left": {"x":0, "y": 0},
    "top_right": {"x":0, "y": 0},
    "bot_left": {"x":0, "y": 0},
    "bot_right": {"x":0, "y": 0}
  },
  "nodes": [{"x": 0, "y": 0, "id": "id_1"}, {"x": 2, "y": 2, "id": "id_2"}], // For keypoints instances
  "edges": [{"from": "id_1", "to": "id_2"}], // For keypoints instances
  "center_x": 0, // For ellipses (ellipse center point)
  "center_y": 0, // For ellipses (ellipse center point)
  "width": 0, // For ellipses & boxes
  "height": 0, // For ellipses & boxes 
  "rotation": 0, // rotation of ellipse in radians
  "p1": {"x": 0, "y": 0}, // For quadratic curves (first point)
  "p2": {"x": 0, "y": 0}, // For quadratic curves (second point)
  "cp": {"x": 0, "y": 0}, // For quadratic curves (control point)
  "end_sentence": 0,
  "center_3d": {
      "x": 0.4629900947224047,
      "y": 0.8248468727434732,
      "z": 0.6090215354871086
  },
  "dimensions_3d": {
      "depth": 0.2751620109855608,
      "height": 0.30507509076935757,
      "width": 0.2751620109855608
  },
  "position_3d": {
      "x": 0.4629900947224047,
      "y": 0.8248468727434732,
      "z": 0.6090215354871086
  },
  "rotation_euler_angles": {
      "x": 0,
      "y": 0,
      "z": 0
  },
  "start_char": 0,
  "end_char": 0,
  "model_id": 1, // To attach the instance to a model
  "model_ref": "myref", // To attach the instance to a model
  "model_run_id": 1, // To attach the instance to a model run
  "model_run_ref": "myrunref", // To attach the instance to a model run
  "points": [{"x": 450, "y": 350}, // For points, lines & polygons
             {"x": 250, "y": 350},
             {"x": 100, "y": 350}]  // type list
}
Key | Description | Example values
--- | --- | ---
name | The name of the label you're adding to the image. It must exactly match the name of an existing label in Diffgram. Instead of this key, you can provide "label_file_id" to give the exact ID of the label. | String: "Dog", "Cat", "Flower", "Airplane"
label_file_id | The exact ID of the label inside Diffgram. See the Diffgram documentation for how to find this ID. | Number: 57, 25, 966
number | The sequence number, used when the file is part of a video sequence; not needed for images. Sequence numbers are unique per label (see the special considerations for the "number" key below). This key is optional: remove it if you don't use it or if the file is an image. | Number: 548, 587, 64
type | The instance type. Depending on the type, you need to provide additional keys for the instance to be created correctly. | "box", "polygon", "point", "geo_point", "geo_circle", "geo_polyline", "geo_polygon", "geo_box", "cuboid", "tag", "line", "text_token", "ellipse", "curve", "keypoints", "cuboid_3d", "global", "relation", "audio"
x_max | The upper bound of the x coordinates of a bounding box. Only used when "type" is "box". | Number: 635, 857, 10, 50
x_min | The lower bound of the x coordinates of a bounding box. Only used when "type" is "box". | Number: 635, 857, 10, 50
y_max | The upper bound of the y coordinates of a bounding box. Only used when "type" is "box". | Number: 635, 857, 10, 50
y_min | The lower bound of the y coordinates of a bounding box. Only used when "type" is "box". | Number: 635, 857, 10, 50
points | An array defining each point of an instance of type "line", "polygon", or "point". Each element is an object of the form {"x": int, "y": int}. | [{"x": 42, "y": 3}, {"x": 12, "y": 25}]
start_token | The starting token index of the label (for text files). | Number: 1, 4, 5
end_token | The ending token index of the label (for text files). If the label is a single word, this is the same as "start_token". | Number: 1, 4, 5
start_sentence | The starting sentence index of the label (for text files). | Number: 1, 4, 5
end_sentence | The ending sentence index of the label (for text files). If the label is inside a single sentence, this is the same as "start_sentence". | Number: 1, 4, 5
start_char | For a label on a single word of a text file: the starting character inside the word that was labeled. | Number: 1, 4, 5
end_char | For a label on a single word of a text file: the ending character inside the word that was labeled. | Number: 1, 4, 5
sentence | The sentence number where the label occurred. | Number: 1, 4, 5
front_face | An object containing the corners of the front face of a cuboid, plus its width and height. | {"width": 0, "height": 0, "top_left": {"x": 0, "y": 0}, "top_right": {"x": 0, "y": 0}, "bot_left": {"x": 0, "y": 0}, "bot_right": {"x": 0, "y": 0}}
rear_face | An object containing the corners of the rear face of a cuboid, plus its width and height. | {"width": 0, "height": 0, "top_left": {"x": 0, "y": 0}, "top_right": {"x": 0, "y": 0}, "bot_left": {"x": 0, "y": 0}, "bot_right": {"x": 0, "y": 0}}
center_x | The x coordinate of the center of an ellipse. | Number: 1, 4, 5
center_y | The y coordinate of the center of an ellipse. | Number: 1, 4, 5
rotation | The rotation of the ellipse, in radians. | Float: 0.475
p1 | The first point of a quadratic curve. | {"x": 0, "y": 0}
p2 | The second point of a quadratic curve. | {"x": 0, "y": 0}
cp | The control point of a quadratic curve. | {"x": 0, "y": 0}
nodes | The list of points in a keypoints instance. Each node has x and y coordinates and an "id"; nodes can also carry optional fields such as "occluded" (an occlusion flag), "left_or_right", "name", and "ordinal". | [{"x": 0, "y": 0, "id": "id_1"}, {"x": 2, "y": 2, "id": "id_2"}]
edges | Defines connections between the nodes of a keypoints instance. "from" contains the ID of the node where the connection starts, and "to" contains the ID of the node at the other end. | [{"from": "id_1", "to": "id_2"}]
center_3d | The center of the 3D shape (cuboid). | {"x": 0.4629900947224047, "y": 0.8248468727434732, "z": 0.6090215354871086}
dimensions_3d | The dimensions of the 3D shape (cuboid). | {"depth": 0.2751620109855608, "height": 0.30507509076935757, "width": 0.2751620109855608}
rotation_euler_angles | The rotation of the 3D shape, as Euler angles. | {"x": 0, "y": 0, "z": 0}
position_3d | The position of the 3D shape. | {"x": 0, "y": 0, "z": 0}
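If you build instances in Python for the SDK, small helper functions can keep the keys consistent with the table above. The sketch below is hypothetical: "make_box_instance", "make_points_instance" and "check_keypoint_edges" are illustrative names, not part of the Diffgram SDK, and the sanity checks (for example, x_min < x_max) are assumptions about what a valid instance looks like.

```python
# Hypothetical helpers, NOT part of the Diffgram SDK: they build plain dicts
# that follow the instance format described in the table above.


def make_box_instance(name, x_min, y_min, x_max, y_max, number=None):
    """Return a "box" instance; "number" is only added for video sequences."""
    # Assumed sanity check: a box's lower bounds must be below its upper bounds.
    if x_min >= x_max or y_min >= y_max:
        raise ValueError("expected x_min < x_max and y_min < y_max")
    instance = {
        "name": name,
        "type": "box",
        "x_min": x_min,
        "y_min": y_min,
        "x_max": x_max,
        "y_max": y_max,
    }
    if number is not None:
        instance["number"] = number
    return instance


def make_points_instance(name, instance_type, points):
    """Return a "polygon", "line" or "point" instance from (x, y) tuples."""
    if instance_type not in ("polygon", "line", "point"):
        raise ValueError(f"'points' is not used by type {instance_type!r}")
    return {
        "name": name,
        "type": instance_type,
        "points": [{"x": x, "y": y} for x, y in points],
    }


def check_keypoint_edges(instance):
    """Raise ValueError if an edge refers to a node ID missing from "nodes"."""
    node_ids = {node["id"] for node in instance["nodes"]}
    for edge in instance["edges"]:
        for end in ("from", "to"):
            if edge[end] not in node_ids:
                raise ValueError(f"edge '{end}' refers to unknown node {edge[end]!r}")


# Usage, reusing values from the examples on this page.
dog = make_box_instance("Dog", x_min=400, y_min=400, x_max=500, y_max=500)
airplane = make_points_instance("Airplane", "polygon", [(450, 350), (250, 350), (100, 350)])
skeleton = {
    "name": "person",  # hypothetical keypoints label
    "type": "keypoints",
    "nodes": [{"x": 0, "y": 0, "id": "id_1"}, {"x": 2, "y": 2, "id": "id_2"}],
    "edges": [{"from": "id_1", "to": "id_2"}],
}
check_keypoint_edges(skeleton)  # passes silently: both node IDs exist
```

Keeping the helpers as plain functions that return dictionaries means the result can be passed to the SDK or serialized to JSON without any extra conversion.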

Special considerations for some fields:

  • The "number" key (more specifically, the sequence number) in the instance

    This is a special key that you can use to group multiple instances together. Instances with the same label and the same sequence number are added to the same sequence.
    For example, say I have 2 labels, "Dog" and "Cat", and 2 sequences for each one:
    • Sequence A is tracking a bulldog.
    • Sequence B is tracking a golden retriever.
    • Sequence C is tracking a white cat.
    • Sequence D is tracking a black cat.
    From here you have 2 options:
  • First Option
    • You can create sequence numbers 1 and 2 for sequences A and B, related to the label "Dog".
    • You can create sequence numbers 1 and 2 for sequences C and D, related to the label "Cat".

It will look something like this (the following examples assume we are only using bounding boxes; other instance types may require additional keys):

// This will be attached to the Bulldog sequence: Sequence A
{
    "name": "Dog",
    "number": 1,
    "type": "box",
    "x_max": 500,
    "x_min": 400,
    "y_max": 500,
    "y_min": 400
}
// This will be attached to the Golden Retriever sequence: Sequence B
{
    "name": "Dog",
    "number": 2,
    "type": "box",
    "x_max": 700,
    "x_min": 485,
    "y_max": 750,
    "y_min": 520
}
// This will be attached to the White Cat sequence: Sequence C
{
    "name": "Cat",
    "number": 1,
    "type": "box",
    "x_max": 500,
    "x_min": 400,
    "y_max": 500,
    "y_min": 400
}
// This will be attached to the Black Cat sequence: Sequence D
{
    "name": "Cat",
    "number": 2,
    "type": "box",
    "x_max": 700,
    "x_min": 485,
    "y_max": 750,
    "y_min": 520
}
  • Second Option
    • You can create sequence numbers 1 and 2 for sequences A and B, related to the label "Dog".
    • You can create sequence numbers 3 and 4 for sequences C and D, related to the label "Cat".

It will look something like this:
// This will be attached to the Bulldog sequence: Sequence A
{
    "name": "Dog",
    "number": 1,
    "type": "box",
    "x_max": 500,
    "x_min": 400,
    "y_max": 500,
    "y_min": 400
}
// This will be attached to the Golden Retriever sequence: Sequence B
{
    "name": "Dog",
    "number": 2,
    "type": "box",
    "x_max": 700,
    "x_min": 485,
    "y_max": 750,
    "y_min": 520
}
// This will be attached to the White Cat sequence: Sequence C
{
    "name": "Cat",
    "number": 3,
    "type": "box",
    "x_max": 500,
    "x_min": 400,
    "y_max": 500,
    "y_min": 400
}
// This will be attached to the Black Cat sequence: Sequence D
{
    "name": "Cat",
    "number": 4,
    "type": "box",
    "x_max": 700,
    "x_min": 485,
    "y_max": 750,
    "y_min": 520
}

The idea is that a sequence number is always tied to a label and is unique only within that label. So you can reuse sequence numbers as long as the label is different. If you give the same sequence number to the same label name (or ID), Diffgram treats the instances as part of the same sequence.

So if you give Diffgram something like this:

// This will be attached to the Bulldog sequence: Sequence A
{
    "name": "Dog",
    "number": 1,
    "type": "box",
    "x_max": 500,
    "x_min": 400,
    "y_max": 500,
    "y_min": 400
}
// This will ALSO be attached to the Bulldog sequence: Sequence A!
// Maybe because the dog moved a little on another frame:
{
    "name": "Dog",
    "number": 1,
    "type": "box",
    "x_max": 700,
    "x_min": 485,
    "y_max": 750,
    "y_min": 520
}

You are attaching multiple instances to the same sequence, which tracks the object across multiple frames of a video.
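The rule described above, that a sequence is identified by the pair (label, sequence number), can be sketched as a simple grouping step. This is a hypothetical helper, not Diffgram's actual implementation, and it assumes every instance identifies its label with "name" rather than "label_file_id".

```python
from collections import defaultdict


def group_into_sequences(instances):
    """Group instances by (label name, sequence number).

    Hypothetical helper illustrating the rule above: the same number under
    different labels gives different sequences, while the same number under
    the same label joins one sequence. Instances without "number" are skipped.
    """
    sequences = defaultdict(list)
    for instance in instances:
        if "number" in instance:
            sequences[(instance["name"], instance["number"])].append(instance)
    return dict(sequences)


# First option from above, plus a second "Dog" #1 box (the bulldog on another frame).
instances = [
    {"name": "Dog", "number": 1, "type": "box", "x_min": 400, "y_min": 400, "x_max": 500, "y_max": 500},
    {"name": "Dog", "number": 2, "type": "box", "x_min": 485, "y_min": 520, "x_max": 700, "y_max": 750},
    {"name": "Cat", "number": 1, "type": "box", "x_min": 400, "y_min": 400, "x_max": 500, "y_max": 500},
    {"name": "Cat", "number": 2, "type": "box", "x_min": 485, "y_min": 520, "x_max": 700, "y_max": 750},
    {"name": "Dog", "number": 1, "type": "box", "x_min": 485, "y_min": 520, "x_max": 700, "y_max": 750},
]
sequences = group_into_sequences(instances)
for (label, number), members in sorted(sequences.items()):
    print(label, number, len(members))
```

Four sequences come out: "Dog" 1 and "Cat" 1 stay separate even though they share the number 1, while the two "Dog" 1 boxes end up in the same sequence.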

The "attributes" key

Attributes are special "extra" features that a label might have. You can read about them in the Attributes section.

See some examples in Uploading Files With Attributes.

You can declare 4 types of attributes:

"attribute_groups_reference": [
    // Multiple select attribute
    {
      "id": 2,
      "kind": "multiple_select",
      "is_root": true,
      "name": "carwheeltag",
      "prompt": "Please select all that apply",
      "show_prompt": true
    },
    // Single select attribute
    {
      "id": 3,
      "kind": "select",
      "is_root": true,
      "name": "selectwheel",
      "prompt": "Please select just one special thing",
      "show_prompt": true
    },
    // Free text attribute
    {
      "id": 4,
      "kind": "text",
      "is_root": true,
      "name": "freewheel",
      "prompt": "What are your thoughts on this label?",
      "show_prompt": true
    },
    // Radio button attribute
    {
      "id": 5,
      "kind": "radio",
      "is_root": true,
      "name": "clean",
      "prompt": "Select something please",
      "show_prompt": true
    }
  ]

You can then reference the attributes inside your instance list. Here's an example:

{
  "type": "box",
  "x_min": 977,
  "y_min": 549,
  "x_max": 1152,
  "y_max": 754,
  "attribute_groups": {
    "2": [
      {
        "display_name": "This is the selected option for the attribute group with ID:2",
        "id": 7,
        "name": 7
      }
    ],
    "3": {
      "display_name": "This is the selected option for the attribute group with ID:3",
      "id": 9,
      "name": 9
    },
    "4": "This is the text of the free text question..",
    "5": {
      "display_name": "This is a radio button",
      "id": 12,
      "name": 12
    }
  },
  "name": "my label"
}

You can read more in Attribute Types.
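Before uploading, it can help to check that each value in "attribute_groups" has the shape its kind expects. The expected shapes below are inferred from the example above (a list of options for "multiple_select", a single option object for "select" and "radio", plain text for "text"); "check_attribute_groups" is a hypothetical helper, not part of the Diffgram SDK.

```python
def check_attribute_groups(instance, attribute_groups_reference):
    """Return a list of problems found in instance["attribute_groups"].

    Hypothetical helper; the expected value shapes are inferred from the
    example above, not taken from Diffgram's validation code.
    """
    # Instance keys are strings ("2"), declared IDs are numbers (2): compare as strings.
    kinds = {str(group["id"]): group["kind"] for group in attribute_groups_reference}
    problems = []
    for group_id, value in instance.get("attribute_groups", {}).items():
        kind = kinds.get(str(group_id))
        if kind is None:
            problems.append(f"{group_id}: no declared attribute group with this ID")
        elif kind == "multiple_select":
            # A list of selected option objects.
            if not (isinstance(value, list) and all(isinstance(v, dict) for v in value)):
                problems.append(f"{group_id}: expected a list of option objects")
        elif kind in ("select", "radio"):
            # Exactly one selected option object.
            if not isinstance(value, dict):
                problems.append(f"{group_id}: expected a single option object")
        elif kind == "text":
            # Free text is stored as a plain string.
            if not isinstance(value, str):
                problems.append(f"{group_id}: expected a string")
    return problems


reference = [
    {"id": 2, "kind": "multiple_select"},
    {"id": 3, "kind": "select"},
    {"id": 4, "kind": "text"},
    {"id": 5, "kind": "radio"},
]
good = {"attribute_groups": {
    "2": [{"display_name": "option", "id": 7, "name": 7}],
    "3": {"display_name": "option", "id": 9, "name": 9},
    "4": "This is the text of the free text question.",
    "5": {"display_name": "This is a radio button", "id": 12, "name": 12},
}}
bad = {"attribute_groups": {"3": [{"id": 9}], "4": 12, "99": "unknown"}}
print(check_attribute_groups(good, reference))  # []
print(check_attribute_groups(bad, reference))
```

Returning a list of problems instead of raising on the first one lets you report every malformed attribute of an instance in a single pass.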

Instance List Format

It's uncommon to have just one instance on an image or video, so we group instances together in what we call an instance list. An instance list is simply a list of the instance objects described in the section above. An image can have one instance list, and a video can have one instance list per frame. Here's an example of an instance list:

{
  "instance_list": [
    {
      "name": "Dog",
      "type": "box",
      "x_max": 500,
      "x_min": 400,
      "y_max": 500,
      "y_min": 400
    },
    {
      "name": "Airplane",
      "type": "polygon",
      "points": [{"x": 450, "y": 350},
                 {"x": 250, "y": 350},
                 {"x": 100, "y": 350}]
    }
  ]
}
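With the SDK you work with Python dictionaries rather than JSON text. The sketch below builds the instance list from the example above and serializes it with Python's standard json module, which always emits valid JSON (double-quoted strings, no comments, no trailing commas). "make_instance_list" is a hypothetical helper name.

```python
import json


def make_instance_list(instances):
    """Wrap instance dicts in the {"instance_list": [...]} object shown above (hypothetical helper)."""
    return {"instance_list": list(instances)}


payload = make_instance_list([
    {"name": "Dog", "type": "box", "x_max": 500, "x_min": 400, "y_max": 500, "y_min": 400},
    {
        "name": "Airplane",
        "type": "polygon",
        "points": [{"x": 450, "y": 350}, {"x": 250, "y": 350}, {"x": 100, "y": 350}],
    },
])

# json.dumps emits standard JSON: double-quoted strings, no comments and
# no trailing commas, so the text can be saved directly as a .json file.
text = json.dumps(payload, indent=2)
print(text)
```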

Frame packet map format

Now that you know the format of an instance and an instance list, you can understand what a frame packet map is. A frame packet map is used only for video files: it relates each instance list to a specific frame of a video. To explain the format, let's say we have 3 instance lists in JSON like this:

// INSTANCE LIST A
[
  {
      "name": "flower",
      "type": "box",
      "x_max": 500,
      "x_min": 400,
      "y_max": 500,
      "y_min": 400
  },
  {
      "name": "cat",
      "type": "polygon",
      "points": [{"x": 450, "y": 350},
                 {"x": 250, "y": 350},
                 {"x": 100, "y": 350}]
  }
]
// INSTANCE LIST B
[
  {
      "name": "Dog",
      "type": "box",
      "x_max": 500,
      "x_min": 400,
      "y_max": 500,
      "y_min": 400
  },
  {
      "name": "car",
      "type": "polygon",
      "points": [{"x": 450, "y": 350},
                 {"x": 250, "y": 350},
                 {"x": 100, "y": 350}]
  }
]
// INSTANCE LIST C
[
  {
      "name": "pedestrian",
      "type": "box",
      "x_max": 500,
      "x_min": 400,
      "y_max": 500,
      "y_min": 400
  },
  {
      "name": "lion",
      "type": "polygon",
      "points": [{"x": 450, "y": 350},
                 {"x": 250, "y": 350},
                 {"x": 100, "y": 350}]
  }
]

All these instance lists represent objects that appear at different frames of the video. Now the only thing left to do is indicate on which frames each instance list should appear. Let's say, for example, that our video has 350 frames, and that:
  • Instance list A is on frames 5, 7, and 86
  • Instance list B is on frames 14 and 57
  • Instance list C is on frame 345

The frame packet map we build to represent this relation is the following. Note that we use the frame number as the key and the instance list as the value.

// Frame packet map example
{
  5: instance_list_a_json_object,
  7: instance_list_a_json_object,
  86: instance_list_a_json_object,

  14: instance_list_b_json_object,
  57: instance_list_b_json_object,

  345: instance_list_c_json_object
}

With this data structure, you can use the SDK to send Diffgram video files that already contain labeled data.
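Building the map by hand gets tedious for long videos. The sketch below turns "instance list A is on frames 5, 7 and 86" style assignments into a frame packet map; "build_frame_packet_map" is a hypothetical helper, not an SDK function, and the choice to raise on a frame that receives two instance lists (rather than merging them) is an assumption made here to avoid silently losing labels.

```python
def build_frame_packet_map(assignments):
    """Build a frame packet map from (instance_list, frame_numbers) pairs.

    Hypothetical helper. A dict key holds one value, so if two instance
    lists are assigned to the same frame we raise instead of silently
    overwriting one of them.
    """
    frame_packet_map = {}
    for instance_list, frames in assignments:
        for frame in frames:
            if frame in frame_packet_map:
                raise ValueError(f"frame {frame} already has an instance list")
            frame_packet_map[frame] = instance_list
    return frame_packet_map


# Shortened versions of instance lists A, B and C from above.
instance_list_a = [{"name": "flower", "type": "box", "x_max": 500, "x_min": 400, "y_max": 500, "y_min": 400}]
instance_list_b = [{"name": "Dog", "type": "box", "x_max": 500, "x_min": 400, "y_max": 500, "y_min": 400}]
instance_list_c = [{"name": "pedestrian", "type": "box", "x_max": 500, "x_min": 400, "y_max": 500, "y_min": 400}]

frame_packet_map = build_frame_packet_map([
    (instance_list_a, [5, 7, 86]),
    (instance_list_b, [14, 57]),
    (instance_list_c, [345]),
])
print(sorted(frame_packet_map))
```

The example map above uses integer keys, which work in a Python dictionary. If you serialize the map with json.dumps, Python converts those keys to strings ("5", "7", ...), because JSON object keys must be strings.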