Instance List Format Reference
When using Diffgram's Python SDK or API, you will have to upload instances following a format we've specified for creating labels and attributes. This format is a JSON file (or a Python dictionary when using the SDK).
File format for a label instance
A label instance is an object representing a label applied to a specific position in an image/video.
The following JSON shows all the possible fields you can add to a label instance. Depending on the instance type, you will use only some of them.
{
"name" : "Dog",
"number": 54,
"type": "box",
"x_max": 500,
"x_min": 400,
"y_max": 500,
"y_min": 400,
"start_token": 0,
"end_token": 0,
"start_sentence": 0,
"front_face": { // For Cuboids only
"width": 0,
"height": 0,
"top_left": {"x":0, "y": 0},
"top_right": {"x":0, "y": 0},
"bot_left": {"x":0, "y": 0},
"bot_right": {"x":0, "y": 0}
},
"rear_face": { // For Cuboids only
"width": 0,
"height": 0,
"top_left": {"x":0, "y": 0},
"top_right": {"x":0, "y": 0},
"bot_left": {"x":0, "y": 0},
"bot_right": {"x":0, "y": 0}
},
"nodes": [{"x": 0, "y": 0, "id": "id_1"}, {"x": 2, "y": 2, "id": "id_2"}], // For keypoints instances
"edges": [{"from": "id_1", "to": "id_2"}], // For keypoints instances
"center_x": 0, // For ellipses (ellipse center point)
"center_y": 0, // For ellipses (ellipse center point)
"width": 0, // For ellipses & boxes
"height": 0, // For ellipses & boxes
"rotation": 0, // rotation of ellipse in radians
"p1": {"x": 0, "y": 0}, // For quadratic curves (first point)
"p2": {"x": 0, "y": 0}, // For quadratic curves (second point)
"cp": {"x": 0, "y": 0}, // For quadratic curves (control point)
"end_sentence": 0,
"center_3d": {
"x": 0.4629900947224047,
"y": 0.8248468727434732,
"z": 0.6090215354871086
},
"dimensions_3d": {
"depth": 0.2751620109855608,
"height": 0.30507509076935757,
"width": 0.2751620109855608
},
"position_3d": {
"x": 0.4629900947224047,
"y": 0.8248468727434732,
"z": 0.6090215354871086
},
"rotation_euler_angles": {
"x": 0,
"y": 0,
"z": 0
},
"start_char": 0,
"end_char": 0,
"model_id": 1, // To attach the instance to a model
"model_ref": "myref", // To attach the instance to a model
"model_run_id": 1, // To attach the instance to a model run
"model_run_ref": "myrunref", // To attach the instance to a model run
"points": [{"x": 450, "y": 350}, // For points, lines & polygons (type list)
{"x": 250, "y": 350},
{"x": 100, "y": 350}]
}
Key | Description | Example Values |
---|---|---|
name | The name of the label you're adding to the image. This name must exactly match an existing label in Diffgram. Alternatively, you can provide the label_file_id key instead to give the exact ID of the label. | String values: "Dog", "Cat", "Flower", "Airplane" |
label_file_id | The exact ID of the label inside Diffgram. See how to get this ID by reading here | Numbers: 57, 25, 966 |
number | The sequence number, used when the instance belongs to a sequence in a video. It is not needed for images. Each number is unique within the label you provide; read the special considerations for the "number" key below. | Optional; omit it if unused or when labeling an image. Numbers: 548, 587, 64 |
type | The instance type you want to use. Depending on the type of the instance you will need to provide more data for the label instance to be correctly created. | 'box', 'polygon', 'point', 'geo_point', 'geo_circle', 'geo_polyline', 'geo_polygon', 'geo_box', 'cuboid', 'tag', 'line', 'text_token', 'ellipse', 'curve', 'keypoints', 'cuboid_3d', 'global', 'relation', 'audio' |
x_max | The upper bound of the x coordinates for a bounding box. Only used when key "type" has the value "box" | Numbers: 635,857,10,50 |
x_min | The lower bound of the x coordinates for a bounding box. Only used when key "type" has the value "box" | Numbers: 635,857,10,50 |
y_max | The upper bound of the y coordinates for a bounding box. Only used when key "type" has the value "box" | Numbers: 635,857,10,50 |
y_min | The lower bound of the y coordinates for a bounding box. Only used when key "type" has the value "box" | Numbers: 635,857,10,50 |
points | An array defining each of the points of an instance of type "line", "polygon", or "point". Each element is an object with the format {"x": int, "y": int} | [{"x": 42, "y": 3}, {"x": 12, "y": 25}] |
start_token | The starting token index of the label (in the context of a text file) | Numbers: 1, 4, 5 |
end_token | The ending token index of the label (in the context of a text file). If the label is just a single word, this is the same number as start_token. | Numbers: 1, 4, 5 |
start_sentence | The starting sentence index of the label (in the context of a text file) | Numbers: 1, 4, 5 |
end_sentence | The ending sentence index of the label (in the context of a text file). If the label is inside a single sentence, this is the same number as start_sentence. | Numbers: 1, 4, 5 |
start_char | For a text label on a single word, the index of the first labeled character inside that word | Numbers: 1, 4, 5 |
end_char | For a text label on a single word, the index of the last labeled character inside that word | Numbers: 1, 4, 5 |
sentence | The sentence number where the label occurred. | Numbers: 1, 4, 5 |
front_face | An object containing the corners of the front face of a cuboid, plus its width and height. | {"width": 0, "height": 0, "top_left": {"x": 0, "y": 0}, "top_right": {"x": 0, "y": 0}, "bot_left": {"x": 0, "y": 0}, "bot_right": {"x": 0, "y": 0}} |
rear_face | An object containing the corners of the rear face of a cuboid, plus its width and height. | {"width": 0, "height": 0, "top_left": {"x": 0, "y": 0}, "top_right": {"x": 0, "y": 0}, "bot_left": {"x": 0, "y": 0}, "bot_right": {"x": 0, "y": 0}} |
center_x | X coordinate of the center of an ellipse | Numbers: 1,4,5 |
center_y | Y coordinate of the center of an ellipse | Numbers: 1,4,5 |
rotation | Rotation (in radians) of the ellipse | Float: 0.475 |
p1 | First point of a quadratic curve | {"x": 0, "y": 0} |
p2 | Second point of a quadratic curve | {"x": 0, "y": 0} |
cp | Control point of a quadratic curve | {"x": 0, "y": 0} |
nodes | List of points in a keypoints instance. Each node has x and y coordinates, an id (referenced by edges), and optional fields such as a color, an occlusion flag (occluded), left_or_right, name, and ordinal. | [{"x": 0, "y": 0, "id": "id_1", "occluded": null, "left_or_right": null, "name": null, "ordinal": null}] |
edges | Defines relations between the nodes of a keypoints instance. "from" contains the ID of the node (from the nodes list) where the relation starts, and "to" contains the node at the other end. | [{"from": "my_id", "to": "my_other_id"}] |
center_3d | The center of the 3D shape (Cuboid) | { "x": 0.4629900947224047, "y": 0.8248468727434732, "z": 0.6090215354871086 } |
dimensions_3d | The dimensions of the 3D shape (Cuboid) | { "depth": 0.2751620109855608, "height": 0.30507509076935757, "width": 0.2751620109855608 } |
rotation_euler_angles | Rotation of the 3D shape | { "x": 0, "y": 0, "z": 0 } |
position_3d | Position of the 3D Shape | { "x": 0, "y": 0, "z": 0 } |
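Because many keys only apply to certain instance types, it can help to check an instance before uploading it. The sketch below is a hypothetical helper (`check_instance` and `REQUIRED_KEYS` are not part of the Diffgram SDK). Which keys each type needs is an assumption inferred from the table above, so adjust it to the types your project actually uses. It also catches inverted box bounds and keypoint edges that reference unknown node IDs.

```python
from typing import Any

# Assumption: the keys each instance type needs, inferred from the table above.
# This is a hypothetical pre-upload check, not part of the Diffgram SDK.
REQUIRED_KEYS: dict[str, set[str]] = {
    "box": {"x_min", "y_min", "x_max", "y_max"},
    "polygon": {"points"},
    "line": {"points"},
    "point": {"points"},
    "ellipse": {"center_x", "center_y", "width", "height"},
    "curve": {"p1", "p2", "cp"},
    "keypoints": {"nodes", "edges"},
}


def check_instance(instance: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one label instance (empty if none)."""
    problems: list[str] = []
    # A label is identified either by its exact name or by label_file_id.
    if "name" not in instance and "label_file_id" not in instance:
        problems.append("missing 'name' or 'label_file_id'")
    kind = instance.get("type")
    if kind is None:
        problems.append("missing 'type'")
        return problems
    missing = sorted(REQUIRED_KEYS.get(kind, set()) - set(instance))
    problems.extend(f"type '{kind}' is missing '{key}'" for key in missing)
    if missing:
        return problems  # The geometry checks below need every required key.
    if kind == "box":
        # Bounds must describe a non-empty rectangle.
        if instance["x_min"] >= instance["x_max"]:
            problems.append("x_min must be smaller than x_max")
        if instance["y_min"] >= instance["y_max"]:
            problems.append("y_min must be smaller than y_max")
    elif kind == "keypoints":
        # Every edge must point at node IDs declared in "nodes".
        node_ids = {node["id"] for node in instance["nodes"]}
        for edge in instance["edges"]:
            if edge["from"] not in node_ids or edge["to"] not in node_ids:
                problems.append(f"edge {edge['from']} -> {edge['to']} uses an unknown node id")
    return problems


box = {"name": "Dog", "type": "box", "x_min": 400, "y_min": 400, "x_max": 500, "y_max": 500}
print(check_instance(box))  # []

swapped = dict(box, y_min=750, y_max=520)
print(check_instance(swapped))  # ['y_min must be smaller than y_max']
```

Unknown or geometry-free types (for example "tag") simply get no key checks here, which keeps the helper permissive rather than rejecting types it does not model.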
Special considerations for some fields:
- The "number" key (or, more specifically, the sequence number) in the file instance
This is a special key that you can use to group multiple instances of a label together. Each instance that has the same sequence number is added as an element of that sequence.
For example:
Say I have 2 labels called "dog" and "cat", and 2 sequences for each one:
- Sequence A is tracking a bulldog.
- Sequence B is tracking a golden retriever.
- Sequence C is tracking a white cat.
- Sequence D is tracking a black cat.
From here you have 2 options:
- First option:
  - Create sequence numbers 1 and 2 for sequences A and B, related to the label dog.
  - Create sequence numbers 1 and 2 for sequences C and D, related to the label cat.
It will look something like this (the following examples assume we are only using bounding boxes; if you use other instance types, you may need to add other keys):
// This will be attached to the Bulldog sequence: Sequence A
{
"name" : "dog",
"number": 1,
"type": "box",
"x_max": 500,
"x_min": 400,
"y_max": 500,
"y_min": 400
}
// This will be attached to the Golden Retriever sequence: Sequence B
{
"name" : "dog",
"number": 2,
"type": "box",
"x_max": 700,
"x_min": 485,
"y_max": 750,
"y_min": 520
}
// This will be attached to the White Cat sequence: Sequence C
{
"name" : "cat",
"number": 1,
"type": "box",
"x_max": 500,
"x_min": 400,
"y_max": 500,
"y_min": 400
}
// This will be attached to the Black cat sequence: Sequence D
{
"name" : "cat",
"number": 2,
"type": "box",
"x_max": 700,
"x_min": 485,
"y_max": 750,
"y_min": 520
}
- Second option:
  - Create sequence numbers 1 and 2 for sequences A and B, related to the label dog.
  - Create sequence numbers 3 and 4 for sequences C and D, related to the label cat.
It will look something like this:
// This will be attached to the Bulldog sequence: Sequence A
{
"name" : "dog",
"number": 1,
"type": "box",
"x_max": 500,
"x_min": 400,
"y_max": 500,
"y_min": 400
}
// This will be attached to the Golden Retriever sequence: Sequence B
{
"name" : "dog",
"number": 2,
"type": "box",
"x_max": 700,
"x_min": 485,
"y_max": 750,
"y_min": 520
}
// This will be attached to the White Cat sequence: Sequence C
{
"name" : "cat",
"number": 3,
"type": "box",
"x_max": 500,
"x_min": 400,
"y_max": 500,
"y_min": 400
}
// This will be attached to the Black cat sequence: Sequence D
{
"name" : "cat",
"number": 4,
"type": "box",
"x_max": 700,
"x_min": 485,
"y_max": 750,
"y_min": 520
}
The idea is that the sequence number is always tied to a label and is unique in the context of that label. So you can repeat a sequence number as long as the label is different. If you add the same sequence number to the same label name (or ID), Diffgram treats it as part of the same sequence.
So if you give Diffgram something like this:
// This will be attached to the Bulldog sequence: Sequence A
{
"name" : "dog",
"number": 1,
"type": "box",
"x_max": 500,
"x_min": 400,
"y_max": 500,
"y_min": 400
}
// This will ALSO be attached to the Bulldog sequence: Sequence A!
// Maybe because the dog moved a little on another frame:
{
"name" : "dog",
"number": 1,
"type": "box",
"x_max": 700,
"x_min": 485,
"y_max": 750,
"y_min": 520
}
You are attaching multiple instances to the same sequence, thus tracking the object across multiple frames of the video.
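To reason about which instances end up in the same sequence, you can group them by the pair (label, sequence number), which is the rule described above. This is a minimal sketch with a hypothetical `group_into_sequences` helper, not a Diffgram API. It keys on the label name; keying on the label ID instead is assumed to behave the same way.

```python
from collections import defaultdict
from typing import Any

Instance = dict[str, Any]


def group_into_sequences(instances: list[Instance]) -> dict[tuple[str, int], list[Instance]]:
    """Group video instances by (label name, sequence number).

    A sequence number is only unique within its label, so the grouping key
    is the (name, number) pair rather than the number alone.
    """
    sequences: dict[tuple[str, int], list[Instance]] = defaultdict(list)
    for instance in instances:
        sequences[(instance["name"], instance["number"])].append(instance)
    return dict(sequences)


instances = [
    {"name": "dog", "number": 1, "type": "box"},  # Sequence A (bulldog)
    {"name": "dog", "number": 2, "type": "box"},  # Sequence B (golden retriever)
    {"name": "cat", "number": 1, "type": "box"},  # Sequence C (white cat)
    {"name": "dog", "number": 1, "type": "box"},  # Sequence A again, on another frame
]
for key, members in group_into_sequences(instances).items():
    print(key, len(members))
# ('dog', 1) 2
# ('dog', 2) 1
# ('cat', 1) 1
```

Note how ("dog", 1) and ("cat", 1) stay separate even though they share the number 1, while the two ("dog", 1) instances form one sequence.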
The "attributes" key
Attributes are special "extra" features that a label might have. You can read about them in the Attributes section.
See some examples here: Uploading Files With Attributes
Essentially, you can declare 4 types of attributes:
"attribute_groups_reference": [
// Multiple select attribute
{
"id": 2,
"kind": "multiple_select",
"is_root": true,
"name": "carwheeltag",
"prompt": "Please select all that apply",
"show_prompt": true
},
// Single select attribute
{
"id": 3,
"kind": "select",
"is_root": true,
"name": "selectwheel",
"prompt": "Please select just one special thing",
"show_prompt": true
},
// Free text attribute
{
"id": 4,
"kind": "text",
"is_root": true,
"name": "freewheel",
"prompt": "What are your thoughts on this label?",
"show_prompt": true
},
// Radio button attribute
{
"id": 5,
"kind": "radio",
"is_root": true,
"name": "clean",
"prompt": "Select something please",
"show_prompt": true
}
]
You can then reference the attributes inside your instance list. Here's an example:
{
"type": "box",
"x_min": 977,
"y_min": 549,
"x_max": 1152,
"y_max": 754,
"attribute_groups": {
"2": [
{
"display_name": "This is the selected option for the attribute group with ID:2",
"id": 7,
"name": 7
}
],
"3": {
"display_name": "This is the selected option for the attribute group with ID:3",
"id": 9,
"name": 9
},
"4": "This is the text of the free text question..",
"5": {
"display_name": "This is a radio button",
"id": 12,
"name": 12
}
},
"name": "my label"
}
You can read more about them in Attribute Types.
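If you generate attributes programmatically, note that the value shape depends on the attribute kind: a list of option objects for multiple_select, a single option object for select and radio, and a plain string for text, with the group ID as a string key. The sketch below builds the "attribute_groups" object following that pattern. The helper names and option display names are hypothetical, and the option fields simply mirror the example above (where "name" repeats the option ID).

```python
from typing import Any

# Attribute group definitions, copied from the "attribute_groups_reference" example.
ATTRIBUTE_GROUPS = [
    {"id": 2, "kind": "multiple_select", "name": "carwheeltag"},
    {"id": 3, "kind": "select", "name": "selectwheel"},
    {"id": 4, "kind": "text", "name": "freewheel"},
    {"id": 5, "kind": "radio", "name": "clean"},
]


def option_entry(option_id: int, display_name: str) -> dict[str, Any]:
    # Mirrors the example above, where "name" repeats the option ID.
    return {"display_name": display_name, "id": option_id, "name": option_id}


def build_attribute_groups(answers: dict[int, Any]) -> dict[str, Any]:
    """Build an instance's "attribute_groups" value from {group_id: answer}.

    Answers: a list of (id, display_name) pairs for multiple_select,
    one (id, display_name) pair for select/radio, and a string for text.
    """
    kinds = {group["id"]: group["kind"] for group in ATTRIBUTE_GROUPS}
    result: dict[str, Any] = {}
    for group_id, answer in answers.items():
        kind = kinds[group_id]
        if kind == "multiple_select":
            value = [option_entry(*option) for option in answer]
        elif kind in ("select", "radio"):
            value = option_entry(*answer)
        elif kind == "text":
            value = str(answer)
        else:
            raise ValueError(f"unsupported attribute kind: {kind}")
        result[str(group_id)] = value  # Group IDs become string keys.
    return result


instance = {
    "name": "my label",
    "type": "box",
    "x_min": 977, "y_min": 549, "x_max": 1152, "y_max": 754,
    "attribute_groups": build_attribute_groups({
        2: [(7, "Steel wheels")],  # hypothetical option names
        3: (9, "Spare tire"),
        4: "This is the text of the free text question..",
        5: (12, "Clean"),
    }),
}
```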
Instance List Format
It's pretty uncommon to have just one instance on an image or video, so to group them together we use what we call an instance list. An instance list is simply a list of the instance objects described in the section above. An image can have one instance list, and a video can have one instance list on each frame. Here's an example of an instance list:
{
"instance_list": [
{
"name" : "Dog",
"type": "box",
"x_max": 500,
"x_min": 400,
"y_max": 500,
"y_min": 400
},
{
"name" : "Airplane",
"type": "polygon",
"points": [{"x": 450, "y": 350},
{"x": 250, "y": 350},
{"x": 100, "y": 350}]
}]
}
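A common case is converting detections from your own model into an instance list. This sketch assumes a hypothetical detector that returns (label, x_min, y_min, x_max, y_max) tuples, and rounds coordinates to whole pixels since the examples on this page use integers. Serializing with Python's `json.dumps` also guarantees strict JSON (double quotes, no comments, no trailing commas), which the annotated reference example above is not, since it contains comments.

```python
import json
from typing import Any, Iterable

# Hypothetical detector output: (label name, x_min, y_min, x_max, y_max) in pixels.
Detection = tuple[str, float, float, float, float]


def detections_to_instance_list(detections: Iterable[Detection]) -> dict[str, Any]:
    """Wrap box detections into the {"instance_list": [...]} structure above."""
    instances = []
    for name, x_min, y_min, x_max, y_max in detections:
        instances.append({
            "name": name,
            "type": "box",
            # Assumption: coordinates are sent as whole pixels, as in the examples.
            "x_min": round(x_min),
            "y_min": round(y_min),
            "x_max": round(x_max),
            "y_max": round(y_max),
        })
    return {"instance_list": instances}


payload = detections_to_instance_list([("Dog", 400.2, 400.0, 499.7, 500.0)])
# json.dumps emits strict JSON that any JSON parser can read back.
print(json.dumps(payload))
```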
Frame packet map format
Now that you know the format of an instance and an instance list, you can understand what a frame packet map is. A frame packet map is used only for video files: it relates an instance list to a specific frame of a video. To explain the format, let's say we have 3 instance lists in JSON like this:
// INSTANCE LIST A
[
{
"name" : "flower",
"type": "box",
"x_max": 500,
"x_min": 400,
"y_max": 500,
"y_min": 400
},
{
"name" : "cat",
"type": "polygon",
"points": [{"x": 450, "y": 350},
{"x": 250, "y": 350},
{"x": 100, "y": 350}]
}
]
// INSTANCE LIST B
[
{
"name" : "Dog",
"type": "box",
"x_max": 500,
"x_min": 400,
"y_max": 500,
"y_min": 400
},
{
"name" : "car",
"type": "polygon",
"points": [{"x": 450, "y": 350},
{"x": 250, "y": 350},
{"x": 100, "y": 350}]
}
]
// INSTANCE LIST C
[
{
"name" : "pedestrian",
"type": "box",
"x_max": 500,
"x_min": 400,
"y_max": 500,
"y_min": 400
},
{
"name" : "lion",
"type": "polygon",
"points": [{"x": 450, "y": 350},
{"x": 250, "y": 350},
{"x": 100, "y": 350}]
}
]
All these instance lists represent objects that will appear at different frames of the video. So the only thing left to do is indicate on which frames each instance list should appear. Let's say, for example, that our video has 350 frames, and that:
- Instance list A is on frames 5, 7, and 86
- Instance list B is on frames 14 and 57
- Instance list C is on frame 345
The frame packet map we build to represent this relation is the following. Note that we put the frame number as the key and the instance list as the value.
// Frame packet map example
{
5: instance_list_a_json_object,
7: instance_list_a_json_object,
86: instance_list_a_json_object,
14: instance_list_b_json_object,
57: instance_list_b_json_object,
345: instance_list_c_json_object,
}
With this data structure, you can use the SDK to send Diffgram video files that already contain labeled data.
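Building the frame packet map by hand gets tedious for long videos. This sketch uses a hypothetical `build_frame_packet_map` helper (not a Diffgram SDK function) to turn "which frames does each instance list appear on" into the frame-number-keyed map described above. One design choice to note: if two instance lists land on the same frame, it merges them so every frame still has a single instance list. Also, if you serialize the map with Python's `json` module, the integer frame keys become strings (for example "5"), because JSON object keys are always strings; whether your upload path accepts that is an assumption to verify.

```python
import json
from typing import Any

InstanceList = list[dict[str, Any]]


def build_frame_packet_map(
    instance_lists: dict[str, InstanceList],
    frames_by_list: dict[str, list[int]],
) -> dict[int, InstanceList]:
    """Map each frame number to the instance list that appears on it.

    If two lists are assigned to the same frame, their instances are merged
    so each frame still has exactly one instance list.
    """
    packet_map: dict[int, InstanceList] = {}
    for list_name, frame_numbers in frames_by_list.items():
        for frame in frame_numbers:
            packet_map.setdefault(frame, []).extend(instance_lists[list_name])
    return dict(sorted(packet_map.items()))  # Sorted by frame for readability.


box = {"type": "box", "x_min": 400, "y_min": 400, "x_max": 500, "y_max": 500}
lists = {
    "A": [dict(box, name="flower")],
    "B": [dict(box, name="Dog")],
    "C": [dict(box, name="pedestrian")],
}
frame_packet_map = build_frame_packet_map(lists, {"A": [5, 7, 86], "B": [14, 57], "C": [345]})
print(list(frame_packet_map))  # [5, 7, 14, 57, 86, 345]
```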