Importing Instances Walkthrough
Support for existing instances, pre-labeling, production error reviewing.
Overview
Any process that can generate an Instance can be integrated with Diffgram.
Examples
- Predictions from your deep learning algorithm, or other proprietary algorithm.
- Use for pre-labeling, correcting production errors, real time labeling.
- Migrate from another system
Block Diagram
Process: How to get the data into Diffgram
- Map your Data to Diffgram format.
- Attach Data to File at Import
- Verify Data
1. Map your Data to Diffgram format.
Goal:
Map the data structure to the structure defined here.
Depending on your existing data structure this may be fairly straightforward, or may be a lot of work. Check out the example code for ideas on starting points.
The SDK will throw a variety of exceptions if the data is not in the right format, so don't worry about "breaking" it. You can keep the same string names as in your Project, and same sequence numbers you are using externally.
Let's try it out! :)
Key Concepts
- We will build up from the Instance, to a List of Instances, to (in the case of Video) the Frame Packet Map
- Labels default to map by string name to existing Diffgram labels.
- The data may be Organized by all existing Diffgram concepts, such as Project and Directory
When to Map?
- It's often best to map the data at run time, instead of after the fact, because data in memory is usually easier to work with.
- If the data is already in another format, it will need to be converted to the format we describe below. E.g. imagine a CSV with headers like [x_min, y_min, class]. If each row represents an Instance, it may require mapping each row to a dictionary object, e.g. {x_min : row_value_list[0]}.
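As a rough sketch of the CSV case, assuming a hypothetical file with the columns x_min, y_min, x_max, y_max, class: the column names and the helpers row_to_instance and csv_to_instance_list are illustrative only and are not part of the Diffgram SDK.

```python
import csv
import io

# Hypothetical CSV content; the column names are assumptions for illustration.
CSV_TEXT = """x_min,y_min,x_max,y_max,class
10,20,110,220,cat
30,40,130,240,dog
"""

def row_to_instance(row: dict) -> dict:
    """Map one CSV row (a dict keyed by header name) to a box Instance dict."""
    return {
        "name": row["class"],        # label name as a string
        "type": "box",
        "x_min": int(row["x_min"]),  # CSV values arrive as strings
        "y_min": int(row["y_min"]),
        "x_max": int(row["x_max"]),
        "y_max": int(row["y_max"]),
    }

def csv_to_instance_list(csv_text: str) -> list:
    """Read every row of the CSV text and return a list of Instance dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row_to_instance(row) for row in reader]

instance_list = csv_to_instance_list(CSV_TEXT)
```

For a real file, the same DictReader loop works on an open file handle instead of io.StringIO.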
Image format overview
An image may have more than one Instance. Therefore we relate an image to an instance_list.
Video format overview
A video has many frames, and each frame is like an image. Each frame can have many instances.
Instance Format
Here's an example of mocking a single Instance. See the full example here.
Name, Label Name
Goal: Map your existing Label to match a Label Template Relationship as defined in your Diffgram Project.
AKA: {Class, target, y, id, name, label}
- Exact match case: If you have dog (String) defined as a label in your data, and also define a Label dog in Diffgram, then name : dog is correct.
- Extra step case: If you define it as an internal id, e.g. map = {1 : Dog}, then you need to map it back to the string name defined in Diffgram, e.g. name : map[1].
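A minimal sketch of the extra step case, assuming your model emits integer class ids. The ID_TO_LABEL_NAME table and the label_name_for helper are hypothetical names for illustration; the label strings must match whatever is defined in your own Diffgram project.

```python
# Hypothetical mapping from internal class ids to Diffgram label names.
ID_TO_LABEL_NAME = {1: "Dog", 2: "Cat"}

def label_name_for(class_id: int) -> str:
    """Translate an internal class id into the label name string."""
    try:
        return ID_TO_LABEL_NAME[class_id]
    except KeyError:
        # Fail early with a readable message instead of sending a bad name.
        raise ValueError(f"No label name mapped for class id {class_id}") from None

instance = {
    "name": label_name_for(1),
    "type": "box",
    "x_min": 0,
    "y_min": 0,
    "x_max": 10,
    "y_max": 10,
}
```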
Label Templates Must Be Defined First
- Because the definition of Label Templates has implications in the rest of the project, undefined names will throw an error. E.g. if no Label Template cat exists in the project, it returns an error if an Instance with name : cat is attempted.
- There is a new label method, so you could optionally use that to automatically create labels if desired, or create the Label Templates in the UI.
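Since undefined names are rejected, it can help to check your data locally before importing. The sketch below is not an SDK feature: find_undefined_label_names is a hypothetical helper, and known_label_names is assumed to be a list you maintain yourself (for example, copied from the labels you created in the project).

```python
def find_undefined_label_names(instance_list: list, known_label_names: list) -> list:
    """Return the label names used by instances but absent from the known names."""
    used = {instance["name"] for instance in instance_list}
    return sorted(used - set(known_label_names))

# Assumed to mirror the Label Templates defined in your project.
known_label_names = ["dog", "bird"]

candidate_instances = [
    {"name": "dog", "type": "box", "x_min": 1, "y_min": 1, "x_max": 5, "y_max": 5},
    {"name": "cat", "type": "box", "x_min": 2, "y_min": 2, "x_max": 6, "y_max": 6},
]

undefined = find_undefined_label_names(candidate_instances, known_label_names)
```

An empty result means every name has a matching entry in your list. Matching is exact, so Dog and dog count as different names here.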
Advanced:
- label_file_id: A label_file_id may be provided, corresponding to the Diffgram label_file_id, and it will supersede a string label name.
- You may define a generic "object" Label in Diffgram, and use this to populate only the "spatial" aspects. Then use Attributes to create further meaning.
Exact match for string name in Diffgram
The name provided is expected to be an exact match for an existing label in the Project.
Sequences (Video, Optional)
Sequence
Please read Video Sequences for more information on this concept.
The general idea of sequence is the following:
- A sequence has multiple instances of the label at different frames of a video. For example, if you are tracking a bird flying, you create a label called "bird", and then you create multiple instances of that label in the different frames of the video. Together, those instances form a sequence of the bird flying around.
- A Label can have many sequences. For example, you can have 10 birds flying around and you want to track them all. Then you just create 10 sequences for the label "bird".
Goal: If you have existing Sequences, you may declare them here.
AKA: {tracks, sub_sequence, external_id}
Default: If a number is not provided it will default to 1.
To represent your id for that sequence, populate number. (Type int)
Advanced: A sequence_id (Diffgram) supersedes a number (External).
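If your external track ids are already integers, you can pass them straight through as number. The sketch below covers the other case, where track ids are strings, and assigns each (label name, track id) pair an integer starting at 1 per label. The track_id field and the build_sequence_numbers helper are assumptions for illustration.

```python
from collections import defaultdict

def build_sequence_numbers(detections: list) -> dict:
    """Assign an int sequence number to each (label name, external track id) pair.

    Numbering starts at 1 for each label, in order of first appearance.
    """
    next_number = defaultdict(lambda: 1)  # next free number per label name
    assigned = {}
    for detection in detections:
        key = (detection["name"], detection["track_id"])
        if key not in assigned:
            assigned[key] = next_number[detection["name"]]
            next_number[detection["name"]] += 1
    return assigned

# Hypothetical external detections with string track ids.
detections = [
    {"name": "bird", "track_id": "a"},
    {"name": "bird", "track_id": "b"},
    {"name": "bird", "track_id": "a"},
    {"name": "cat", "track_id": "x"},
]

sequence_numbers = build_sequence_numbers(detections)
```

Each instance's number can then be looked up with sequence_numbers[(name, track_id)].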
Box
{
"name" : str,
"type": "box",
"number": sequence_number, # Video, Optional, Defaults to 1 if Not Provided.
"x_max": int,
"x_min": int,
"y_max": int,
"y_min": int
}
Polygon, Point, Line
point = {'x' : int, 'y' : int}
{
"name" : str,
"type": "polygon", # or (point, line)
"number": sequence_number, # Video, Optional, Defaults to 1 if Not Provided.
"points": [point, point, point] # type list
}
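A small sketch of building a polygon Instance from a list of (x, y) pairs, which is a common shape for contour output. The polygon_instance helper is hypothetical; it rounds coordinates to int to match the point format above and only sets number when a sequence number is given.

```python
def polygon_instance(name: str, xy_pairs: list, sequence_number: int = None) -> dict:
    """Build a polygon Instance dict from an iterable of (x, y) pairs."""
    instance = {
        "name": name,
        "type": "polygon",
        # Each point becomes {'x': int, 'y': int}.
        "points": [{"x": int(round(x)), "y": int(round(y))} for x, y in xy_pairs],
    }
    if sequence_number is not None:
        instance["number"] = sequence_number  # video only
    return instance

triangle = polygon_instance("cat", [(10, 10), (50.2, 12.7), (30, 40)])
tracked = polygon_instance("cat", [(0, 0), (5, 0), (5, 5)], sequence_number=2)
```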
Example function to mock a Box instance
import random

def mock_box_from_external_format(
        sequence_number: int = None,
        name: str = None):
    # Random coordinates stand in for real data; min values always fall below max values.
    return {
        "name": name,
        "number": sequence_number,
        "type": "box",
        "x_max": random.randint(500, 800),
        "x_min": random.randint(400, 499),
        "y_max": random.randint(500, 800),
        "y_min": random.randint(400, 499)
    }
- It's assumed instances are machine_made unless flagged otherwise.
An instance must implement all of the keys shown. Additional keys will be ignored.
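The SDK does its own validation, but a local pre-check can surface problems before anything is uploaded. This is a sketch under the assumption that the required box keys are the ones in the Box format above (with number treated as optional). The check_box_instance helper is hypothetical and is not part of the SDK.

```python
REQUIRED_BOX_KEYS = ("name", "type", "x_min", "y_min", "x_max", "y_max")
COORDINATE_KEYS = ("x_min", "y_min", "x_max", "y_max")

def check_box_instance(instance: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key in REQUIRED_BOX_KEYS:
        if key not in instance:
            problems.append(f"missing key: {key}")
    for key in COORDINATE_KEYS:
        if key in instance and not isinstance(instance[key], int):
            problems.append(f"{key} should be an int")
    if "type" in instance and instance["type"] != "box":
        problems.append("type should be 'box'")
    return problems

good_box = {"name": "cat", "type": "box", "x_min": 1, "y_min": 2, "x_max": 10, "y_max": 20}
bad_box = {"name": "cat", "type": "box", "x_min": 1.5, "y_min": 2, "x_max": 10}

good_problems = check_box_instance(good_box)
bad_problems = check_box_instance(bad_box)
```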
Image format detail
instance = {}
instance_list = [instance, instance, instance]
Video format detail
A video may have instances in every frame, or only a handful of frames.
We need to relate each frame that has instances to its relevant instance_list.
We expect a dictionary with a key value map of keyframe_number : instance_list (int : list).
Frame Packet Map
frame_packet_map = {
0 : instance_list, # where 0 is an example of frame 0
6 : instance_list,
9 : instance_list
}
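If your existing data is a flat list of per-detection records, grouping by frame number produces this structure. The sketch below assumes each record carries a frame field and an already mapped instance dict; both field names and the build_frame_packet_map helper are hypothetical.

```python
from collections import defaultdict

def build_frame_packet_map(records: list) -> dict:
    """Group instance dicts by integer frame number."""
    frame_packet_map = defaultdict(list)
    for record in records:
        frame_number = int(record["frame"])  # keys must be ints
        frame_packet_map[frame_number].append(record["instance"])
    return dict(frame_packet_map)  # plain dict; frames without instances get no key

box_a = {"name": "cat", "type": "box", "number": 1,
         "x_min": 1, "y_min": 1, "x_max": 5, "y_max": 5}
box_b = {"name": "cat", "type": "box", "number": 1,
         "x_min": 2, "y_min": 2, "x_max": 6, "y_max": 6}
box_c = {"name": "cat", "type": "box", "number": 2,
         "x_min": 7, "y_min": 7, "x_max": 9, "y_max": 9}

records = [
    {"frame": 0, "instance": box_a},
    {"frame": 6, "instance": box_b},
    {"frame": 0, "instance": box_c},
]

frame_packet_map = build_frame_packet_map(records)
```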
2. Attach Data to File at Import
SDK Version >= 0.2.1
These examples assume a project object exists, but otherwise include sample data and mocking of data (sometimes based on previously defined functions). The "new" part is the single line shown before each example.
Image attach example
instance_list = instance_list
def test_existing_instances_image(project):
    signed_url = "https://storage.googleapis.com/diffgram_public/example_data/000000001323.jpg"
    instance_list = []
    for i in range(3):
        instance_list.append(
            mock_box_from_external_format(name = "cat"))
    result = project.file.from_url(
        signed_url,
        media_type="image",
        instance_list=instance_list
    )
Video attach example
This can be passed to from_url, along with the media file
frame_packet_map = frame_packet_map
result = project.file.from_url(
signed_url,
media_type="video",
frame_packet_map=frame_packet_map
)
- We assume there are no instances for a given frame if no key is provided for that frame
Below, mock_frame_packet_map() is an example of creating the frame packet map that builds on the mock_box_from_external_format() example.
This can be a starting point for converting your data to the above format.
def mock_frame_packet_map(
        number_of_frames: int = None,
        number_of_sequences: int = 1):
    frame_packet_map = {}
    name = "cat"  # a single label name (str), matching the Instance format
    for i in range(number_of_frames):
        frame_packet_map[i] = []
        # One box per sequence on every frame; sequence numbers start at 1.
        for j in range(1, number_of_sequences + 1):
            frame_packet_map[i].append(mock_box_from_external_format(
                sequence_number = j,
                name = name))
    return frame_packet_map
def test_existing_video_instances(project):
    signed_url = "https://storage.googleapis.com/diffgram_public/example_data/challenge_videoTrim.mp4"
    frame_packet_map = mock_frame_packet_map(
        number_of_frames = 20,
        number_of_sequences = 1)
    result = project.file.from_url(
        signed_url,
        media_type="video",
        frame_packet_map=frame_packet_map
    )
3. Verifying the data in Diffgram
Input
Go to project Import to see the results and some forms of errors.
Example of Errors
This includes validation errors on Instances
Studio
Image verify
Note: in this example we used the random module to generate instances, so there is no expected correlation between the box positions and the cat pictured.
- Verify Instances, i.e. that the count of instances is correct.
Video verify
- Verify sequences count is correct
- Verify a few frames
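To know which counts to expect when checking the Studio, you can summarize the frame packet map locally before import. This is a sketch, not an SDK feature: summarize_frame_packet_map is a hypothetical helper, and it treats a missing or None number as sequence 1, following the default described earlier.

```python
def summarize_frame_packet_map(frame_packet_map: dict) -> dict:
    """Count frames with instances, total instances, and distinct sequences."""
    sequences = set()
    instance_count = 0
    frames_with_instances = 0
    for instance_list in frame_packet_map.values():
        if instance_list:
            frames_with_instances += 1
        for instance in instance_list:
            instance_count += 1
            number = instance.get("number")
            if number is None:
                number = 1  # documented default when no number is given
            # A sequence is identified by its label name plus its number.
            sequences.add((instance["name"], number))
    return {
        "frames_with_instances": frames_with_instances,
        "instance_count": instance_count,
        "sequence_count": len(sequences),
    }

example_map = {
    0: [{"name": "cat", "number": 1}, {"name": "cat", "number": 2}],
    6: [{"name": "cat", "number": 1}],
    9: [{"name": "bird"}],
}

summary = summarize_frame_packet_map(example_map)
```

The instance dicts here are trimmed to the keys the summary reads; real instances would also carry type and coordinates.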
Machine made flag
By default the machine_made flag is set and can be verified on the instance list.