Understanding Diffgram's File Format

When you upload files through Diffgram's Python SDK or API, the data must follow the format described here, which covers images, videos, labels, and attributes. The format is JSON (or a Python dictionary when using the SDK).

File format for a label instance

A label instance is an object representing a label applied to a specific position in an image or video.
The following JSON shows all the fields you can add to a label instance. Which ones you actually need depends on the instance type.

{
  "name": "Dog",
  "number": 54,
  "type": "box",
  "x_max": 500,
  "x_min": 400,
  "y_max": 500,
  "y_min": 400,
  "start_token": 0,
  "end_token": 0,
  "start_sentence": 0,
  "end_sentence": 0,
  "start_char": 0,
  "end_char": 0,
  "front_face": { // For cuboids only
    "width": 0,
    "height": 0,
    "top_left": {"x": 0, "y": 0},
    "top_right": {"x": 0, "y": 0},
    "bot_left": {"x": 0, "y": 0},
    "bot_right": {"x": 0, "y": 0}
  },
  "rear_face": { // For cuboids only
    "width": 0,
    "height": 0,
    "top_left": {"x": 0, "y": 0},
    "top_right": {"x": 0, "y": 0},
    "bot_left": {"x": 0, "y": 0},
    "bot_right": {"x": 0, "y": 0}
  },
  "nodes": [{"x": 0, "y": 0, "id": "id_1"}, {"x": 2, "y": 2, "id": "id_2"}], // For keypoints instances
  "edges": [{"from": "id_1", "to": "id_2"}], // For keypoints instances
  "center_x": 0, // For ellipses (ellipse center point)
  "center_y": 0, // For ellipses (ellipse center point)
  "width": 0, // For ellipses & boxes
  "height": 0, // For ellipses & boxes
  "rotation": 0, // Rotation of the ellipse in radians
  "p1": {"x": 0, "y": 0}, // For quadratic curves (first point)
  "p2": {"x": 0, "y": 0}, // For quadratic curves (second point)
  "cp": {"x": 0, "y": 0}, // For quadratic curves (control point)
  "model_id": 1, // To attach the instance to a model
  "model_ref": "myref", // To attach the instance to a model
  "model_run_id": 1, // To attach the instance to a model run
  "model_run_ref": "myrunref", // To attach the instance to a model run
  "points": [{"x": 450, "y": 350}, // For points, lines & polygons
             {"x": 250, "y": 350},
             {"x": 100, "y": 350}]
}
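In practice an instance carries only the keys its type needs. The following is a minimal Python sketch of two such instances, using only keys listed above; the variable names are illustrative and nothing here calls the Diffgram SDK.

```python
import json

# A bounding box needs the label name, the type, and the four bounds.
box_instance = {
    "name": "Dog",
    "type": "box",
    "x_min": 400,
    "x_max": 500,
    "y_min": 400,
    "y_max": 500,
}

# A polygon replaces the four bounds with a list of points.
polygon_instance = {
    "name": "Airplane",
    "type": "polygon",
    "points": [
        {"x": 450, "y": 350},
        {"x": 250, "y": 350},
        {"x": 100, "y": 350},
    ],
}

# json.dumps produces the JSON form of the same data.
print(json.dumps(box_instance, indent=2))
```

With the SDK the dictionaries can be used directly; `json.dumps` is only needed when the data has to be sent or stored as JSON text.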

Each entry below gives the key, its description, and example values.

name

The name of the label you're adding to the image. The name must exactly match an existing label in Diffgram.

Alternatively, you can provide label_file_id instead of this key to reference the label by its exact ID.

String values:

"Dog", "Cat", "Flower", "Airplane"

label_file_id

The exact ID of the label inside Diffgram. See how to get this ID by reading here

Numbers: 57, 25, 966

number

The sequence number, used when the file is part of a video sequence. It is not needed for images.

Each number is unique within the label you provide; see the special considerations for the "number" key later on this page.

This key is optional. Remove it if it is not used or if you are uploading an image.

Numbers: 548, 587, 64

type

The instance type you want to use. Depending on the type of the instance you will need to provide more data for the label instance to be correctly created.

"box", "polygon", "line", "point".

x_max

The upper bound of the x coordinates for a bounding box.
Only used when key "type" has the value "box"

Numbers: 635,857,10,50

x_min

The lower bound of the x coordinates for a bounding box.
Only used when key "type" has the value "box"

Numbers: 635,857,10,50

y_max

The upper bound of the y coordinates for a bounding box.
Only used when key "type" has the value "box"

Numbers: 635,857,10,50

y_min

The lower bound of the y coordinates for a bounding box.
Only used when key "type" has the value "box"

Numbers: 635,857,10,50

points

An array defining each of the points of an instance of type "line", "polygon", or "point".

This key is an array of objects, each with the following format:
{"x": int, "y": int}

[{"x": 42, "y": 3}, {"x": 12, "y": 25}]

start_token

The starting token index of the label (in the context of a text file)

Numbers: 1,4,5

end_token

The end token index of the label (in the context of a text file). If the label is just a single word you will see the same number as the start token here.

Numbers: 1,4,5

start_sentence

The starting sentence index of the label (in the context of a text file)

Numbers: 1,4,5

end_sentence

The ending sentence index of the label (in the context of a text file). If the label is inside a single sentence, this is the same number as start_sentence.

Numbers: 1,4,5

start_char

For a text label within a single word, the index of the first labeled character inside that word.

Numbers: 1,4,5

end_char

For a text label within a single word, the index of the last labeled character inside that word.

Numbers: 1,4,5

sentence

The current sentence number where the label occurred.

Numbers: 1,4,5

front_face

An object containing the corners of the front face of a cuboid, along with its width and height.

{"width": 0,
"height": 0,
"top_left": {"x": 0, "y": 0},
"top_right": {"x": 0, "y": 0},
"bot_left": {"x": 0, "y": 0},
"bot_right": {"x": 0, "y": 0}}

rear_face

An object containing the corners of the rear face of a cuboid, along with its width and height.

{"width": 0,
"height": 0,
"top_left": {"x": 0, "y": 0},
"top_right": {"x": 0, "y": 0},
"bot_left": {"x": 0, "y": 0},
"bot_right": {"x": 0, "y": 0}}

center_x

X coordinate of the center of an ellipse

Numbers: 1,4,5

center_y

Y coordinate of the center of an ellipse

Numbers: 1,4,5

rotation

Rotation (in radians) of the ellipse

Float: 0.475

p1

First point of a quadratic curve

{"x":0, "y": 0}

p2

Second point of a quadratic curve

{"x":0, "y": 0}

cp

Control point of a quadratic curve

{"x":0, "y": 0}

nodes

List of points in a keypoints instance. Each point has an x,y coordinate, a color, and an occlusion flag.

[
{
"x": this.mouse_position.x,
"y": this.mouse_position.y,
"id": uuidv4(),
"occluded": undefined,
"left_or_right": undefined,
"name": undefined,
"ordinal": undefined
}
]

edges

Defines relations between the nodes of a keypoints instance. "from" contains the ID of the node (from the nodes list) where the relation starts, and "to" contains the ID of the node at the other end.

[
{
"from": "my_id",
"to": "my_other_id"
}

]
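The per-type requirements in the entries above can be checked before uploading. This is a rough sketch, not an official schema: `REQUIRED_KEYS` and `check_instance` are hypothetical names, and only the four types listed under "type" are covered.

```python
# Keys each instance type needs, as read from the entries above.
# This mapping is an assumption of the sketch, not an official schema.
REQUIRED_KEYS = {
    "box": ("x_min", "x_max", "y_min", "y_max"),
    "polygon": ("points",),
    "line": ("points",),
    "point": ("points",),
}


def check_instance(instance):
    """Return a list of problems found in one label instance dict."""
    problems = []
    # A label is identified either by name or by label_file_id.
    if "name" not in instance and "label_file_id" not in instance:
        problems.append("missing 'name' or 'label_file_id'")
    instance_type = instance.get("type")
    if instance_type not in REQUIRED_KEYS:
        problems.append(f"unsupported type: {instance_type!r}")
        return problems
    missing = [k for k in REQUIRED_KEYS[instance_type] if k not in instance]
    problems.extend(f"missing key: {k!r}" for k in missing)
    # Bounds can only be compared when all four are present.
    if instance_type == "box" and not missing:
        if (instance["x_min"] > instance["x_max"]
                or instance["y_min"] > instance["y_max"]):
            problems.append("min bound is greater than max bound")
    return problems
```

An empty list means the instance passed these checks; it does not guarantee that Diffgram will accept it.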

Special considerations for some fields:

  • The "number" key (more specifically, the sequence number) in the file instance

    This is a special key that you can use to group multiple instances together. Every instance with the same sequence number is added as an element of that sequence.
    For example, say you have 2 labels called "Dog" and "Cat", and 2 sequences for each one:
    • Sequence A is tracking a bulldog.
    • Sequence B is tracking a golden retriever.
    • Sequence C is tracking a white cat.
    • Sequence D is tracking a black cat.

    From here you have 2 options:
  • First Option
    • You can create sequence numbers 1 and 2 for sequences A and B, related to the label dog.
    • You can create sequence numbers 1 and 2 for sequences C and D, related to the label cat.

It will look something like this:

(The following examples assume we are using only bounding boxes. If you use other instance types, you may need to add other keys.)

// This will be attached to the Bulldog sequence: Sequence A
{
        "name": "dog",
        "number": 1,
        "type": "box",
        "x_max": 500,
        "x_min": 400,
        "y_max": 500,
        "y_min": 400
}
// This will be attached to the Golden Retriever sequence: Sequence B
{
        "name": "dog",
        "number": 2,
        "type": "box",
        "x_max": 700,
        "x_min": 485,
        "y_max": 750,
        "y_min": 520
}
// This will be attached to the White Cat sequence: Sequence C
{
        "name": "cat",
        "number": 1,
        "type": "box",
        "x_max": 500,
        "x_min": 400,
        "y_max": 500,
        "y_min": 400
}
// This will be attached to the Black Cat sequence: Sequence D
{
        "name": "cat",
        "number": 2,
        "type": "box",
        "x_max": 700,
        "x_min": 485,
        "y_max": 750,
        "y_min": 520
}
  • Second Option
    • You can create sequence numbers 1 and 2 for sequences A and B, related to the label dog.
    • You can create sequence numbers 3 and 4 for sequences C and D, related to the label cat.

It will look something like this:

// This will be attached to the Bulldog sequence: Sequence A
{
        "name": "dog",
        "number": 1,
        "type": "box",
        "x_max": 500,
        "x_min": 400,
        "y_max": 500,
        "y_min": 400
}
// This will be attached to the Golden Retriever sequence: Sequence B
{
        "name": "dog",
        "number": 2,
        "type": "box",
        "x_max": 700,
        "x_min": 485,
        "y_max": 750,
        "y_min": 520
}
// This will be attached to the White Cat sequence: Sequence C
{
        "name": "cat",
        "number": 3,
        "type": "box",
        "x_max": 500,
        "x_min": 400,
        "y_max": 500,
        "y_min": 400
}
// This will be attached to the Black Cat sequence: Sequence D
{
        "name": "cat",
        "number": 4,
        "type": "box",
        "x_max": 700,
        "x_min": 485,
        "y_max": 750,
        "y_min": 520
}

The idea is that a sequence number always belongs to a label and is unique within that label. You can therefore reuse a sequence number as long as the label is different. If you add the same sequence number to the same label name (or ID), Diffgram treats the instance as part of the same sequence.

So if you give Diffgram something like this:

// This will be attached to the Bulldog sequence: Sequence A
{
        "name": "dog",
        "number": 1,
        "type": "box",
        "x_max": 500,
        "x_min": 400,
        "y_max": 500,
        "y_min": 400
}
// This will ALSO be attached to the Bulldog sequence: Sequence A!
// Maybe because the dog moved a little on another frame:
{
        "name": "dog",
        "number": 1,
        "type": "box",
        "x_max": 700,
        "x_min": 485,
        "y_max": 750,
        "y_min": 520
}

You are attaching multiple instances to the same sequence, thus tracking the object across multiple frames in a video.
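The rule that a sequence is identified by the label together with the sequence number can be illustrated with a small grouping helper. This is only a sketch of the rule as described here, not Diffgram's implementation; `group_by_sequence` is a hypothetical name.

```python
from collections import defaultdict


def group_by_sequence(instances):
    """Group instances by (label, sequence number)."""
    sequences = defaultdict(list)
    for instance in instances:
        # Use label_file_id when present, otherwise the label name.
        label = instance.get("label_file_id", instance.get("name"))
        sequences[(label, instance["number"])].append(instance)
    return dict(sequences)


instances = [
    {"name": "dog", "number": 1, "type": "box",
     "x_min": 400, "x_max": 500, "y_min": 400, "y_max": 500},
    {"name": "dog", "number": 1, "type": "box",
     "x_min": 485, "x_max": 700, "y_min": 520, "y_max": 750},
    {"name": "cat", "number": 1, "type": "box",
     "x_min": 400, "x_max": 500, "y_min": 400, "y_max": 500},
]
sequences = group_by_sequence(instances)
```

Here the two "dog" instances share one sequence, while the "cat" instance forms its own sequence even though it reuses the number 1.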

  • The "attributes" key

Attributes are special "extra" features that a label might have. You can read about them in the Attributes section.

Essentially you can declare 4 types of attributes:

"attribute_groups_reference": [
    // Multiple select attribute
    {
      "id": 2,
      "kind": "multiple_select",
      "is_root": true,
      "name": "carwheeltag",
      "prompt": "Please select all that apply",
      "show_prompt": true
    },
    // Single select attribute
    {
      "id": 3,
      "kind": "select",
      "is_root": true,
      "name": "selectwheel",
      "prompt": "Please select just one special thing",
      "show_prompt": true
    },
    // Free text attribute
    {
      "id": 4,
      "kind": "text",
      "is_root": true,
      "name": "freewheel",
      "prompt": "What are your thoughts on this label?",
      "show_prompt": true
    },
    // Radio button attribute
    {
      "id": 5,
      "kind": "radio",
      "is_root": true,
      "name": "clean",
      "prompt": "Select something please",
      "show_prompt": true
    }
  ]

You can then reference the attributes inside your instance list. Here's an example:

{
        "type": "box",
        "x_min": 977,
        "y_min": 549,
        "x_max": 1152,
        "y_max": 754,
        "attribute_groups": {
          "2": [
            {
              "display_name": "This is the selected option for the attribute group with ID:2",
              "id": 7,
              "name": 7
            }
          ],
          "3": {
            "display_name": "This is the selected option for the attribute group with ID:3",
            "id": 9,
            "name": 9
          },
          "4": "This is the text of the free text question..",
          "5": {
            "display_name": "This is a radio button",
            "id": 12,
            "name": 12
          }
        },
        "name": "my label"

      }

You can read more about Attribute Types.
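Judging from the example above, the value stored under each group ID depends on the group's kind: a list of options for "multiple_select", a single option for "select" and "radio", and a plain string for "text". The sketch below builds that structure under this assumption. `option` and `build_attribute_groups` are hypothetical helpers, and the option names are made up for illustration.

```python
def option(option_id, display_name):
    # Mirrors the example above, where "name" repeats the option id.
    return {"display_name": display_name, "id": option_id, "name": option_id}


def build_attribute_groups(groups, answers):
    """Build the "attribute_groups" value for one instance.

    groups:  dicts with "id" and "kind", as in attribute_groups_reference.
    answers: group id -> text, one (id, display_name) pair, or a list of pairs.
    """
    result = {}
    for group in groups:
        group_id = group["id"]
        if group_id not in answers:
            continue  # in this sketch, unanswered groups are left out
        answer = answers[group_id]
        if group["kind"] == "multiple_select":
            value = [option(*pair) for pair in answer]
        elif group["kind"] in ("select", "radio"):
            value = option(*answer)
        elif group["kind"] == "text":
            value = str(answer)
        else:
            raise ValueError(f"unknown attribute kind: {group['kind']!r}")
        # Group ids are used as string keys, as in the example above.
        result[str(group_id)] = value
    return result


groups = [
    {"id": 2, "kind": "multiple_select"},
    {"id": 3, "kind": "select"},
    {"id": 4, "kind": "text"},
    {"id": 5, "kind": "radio"},
]
answers = {2: [(7, "Alloy")], 3: (9, "Front left"), 4: "Looks worn", 5: (12, "Yes")}
attribute_groups = build_attribute_groups(groups, answers)
```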

Instance List Format

It's pretty uncommon to have just one instance on an image or video, so we group them together in what we call an instance list. An instance list is simply a list of the instance objects described in the section above. An image can have one instance list, and a video can have one instance_list on each frame. Here's an example of an instance list:

{
  "instance_list": [
    {
      "name": "Dog",
      "type": "box",
      "x_max": 500,
      "x_min": 400,
      "y_max": 500,
      "y_min": 400
    },
    {
      "name": "Airplane",
      "type": "polygon",
      "points": [{"x": 450, "y": 350},
                 {"x": 250, "y": 350},
                 {"x": 100, "y": 350}]
    }
  ]
}
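When working from Python, one way to avoid quoting mistakes is to build the instance list as plain dictionaries and let the standard `json` module produce the JSON text. A minimal sketch, with illustrative variable names:

```python
import json

dog = {
    "name": "Dog",
    "type": "box",
    "x_min": 400,
    "x_max": 500,
    "y_min": 400,
    "y_max": 500,
}
airplane = {
    "name": "Airplane",
    "type": "polygon",
    "points": [{"x": 450, "y": 350}, {"x": 250, "y": 350}, {"x": 100, "y": 350}],
}

# The instance list is a plain list of instance dictionaries.
payload = {"instance_list": [dog, airplane]}

# json.dumps always emits double-quoted keys and strings, as JSON requires.
encoded = json.dumps(payload, indent=2)

# Reading the text back gives the same structure.
decoded = json.loads(encoded)
```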

Frame packet map format

Now that you know the format of an instance and an instance list, you can understand what a frame packet map is. A frame packet map is used only for video files. This object relates an instance_list with a specific frame on a video. To explain the format let's say we have 3 instance lists in JSON like this:

// INSTANCE LIST A
[
  {
      "name": "flower",
      "type": "box",
      "x_max": 500,
      "x_min": 400,
      "y_max": 500,
      "y_min": 400
  },
  {
      "name": "cat",
      "type": "polygon",
      "points": [{"x": 450, "y": 350},
                 {"x": 250, "y": 350},
                 {"x": 100, "y": 350}]
  }
]
// INSTANCE LIST B
[
  {
      "name": "Dog",
      "type": "box",
      "x_max": 500,
      "x_min": 400,
      "y_max": 500,
      "y_min": 400
  },
  {
      "name": "car",
      "type": "polygon",
      "points": [{"x": 450, "y": 350},
                 {"x": 250, "y": 350},
                 {"x": 100, "y": 350}]
  }
]
// INSTANCE LIST C
[
  {
      "name": "pedestrian",
      "type": "box",
      "x_max": 500,
      "x_min": 400,
      "y_max": 500,
      "y_min": 400
  },
  {
      "name": "lion",
      "type": "polygon",
      "points": [{"x": 450, "y": 350},
                 {"x": 250, "y": 350},
                 {"x": 100, "y": 350}]
  }
]

All these instance lists represent objects that appear at different frames of the video. The only thing left to do is indicate on which frames each instance list should appear. Let's say, for example, that our video has 350 frames, and that:
Instance list A is on frames 5, 7, and 86
Instance list B is on frames 14 and 57
Instance list C is on frame 345

The frame packet map we build to represent this relation is the following. Note that the frame number is the key and the instance list is the value.

// Frame packet map example
{
5: instance_list_a_json_object,
7: instance_list_a_json_object,
86: instance_list_a_json_object,
  
14: instance_list_b_json_object,
57: instance_list_b_json_object,
  
345: instance_list_c_json_object,
  
}
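The same mapping can be built in Python from (instance list, frame numbers) pairs. This is a sketch under the assumptions stated in the comments: `build_frame_packet_map` is a hypothetical helper, not an SDK function, and the instance lists are shortened to one instance each.

```python
import json


def build_frame_packet_map(assignments):
    """Map each frame number to the instances shown on that frame.

    assignments: iterable of (instance_list, frame_numbers) pairs.
    """
    frame_packet_map = {}
    for instance_list, frame_numbers in assignments:
        for frame in frame_numbers:
            # Each frame gets its own list; lists assigned to the same
            # frame are merged (a choice made for this sketch).
            frame_packet_map.setdefault(frame, []).extend(instance_list)
    return frame_packet_map


instance_list_a = [{"name": "flower", "type": "box",
                    "x_min": 400, "x_max": 500, "y_min": 400, "y_max": 500}]
instance_list_b = [{"name": "Dog", "type": "box",
                    "x_min": 400, "x_max": 500, "y_min": 400, "y_max": 500}]
instance_list_c = [{"name": "pedestrian", "type": "box",
                    "x_min": 400, "x_max": 500, "y_min": 400, "y_max": 500}]

frame_packet_map = build_frame_packet_map([
    (instance_list_a, [5, 7, 86]),
    (instance_list_b, [14, 57]),
    (instance_list_c, [345]),
])

# json.dumps turns the integer frame keys into strings, as JSON requires.
encoded = json.dumps(frame_packet_map)
```

In a Python dictionary the frame numbers can stay as integers; once serialized to JSON they become string keys such as "5".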

With this data structure you will be able to send video files to Diffgram using the SDK that already contain labeled data.

