Compound Files Ingestion

A Guide For Uploading Compound Files In Diffgram.

Compound files are a concept in Diffgram that lets you combine multiple media assets into a single File entity. This makes it easier to reason about groups of related data and to manage large annotation projects. Some example use cases for compound files:

  • Pictures of a multi-page document
  • Multi-camera annotation scenes
  • Pictures that are related in some meaningful way to the business
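Before creating compound files you need a rule for deciding which files belong together. The sketch below is one way to do that for the multi-page document case, assuming a hypothetical naming convention of "<document>_page<N>.<ext>" (for example "invoice42_page1.jpg"). The helper group_pages_by_document is illustrative only and not part of the Diffgram SDK; each group it returns is a candidate for one compound file.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical naming convention: "<document>_page<N>.<ext>", e.g. "invoice42_page1.jpg".
PAGE_PATTERN = re.compile(r"^(?P<doc>.+)_page(?P<page>\d+)$")


def group_pages_by_document(paths):
    """Group page image paths into {document_name: [paths sorted by page number]}.

    Paths whose file name does not follow the convention are skipped.
    """
    groups = defaultdict(list)
    for path in paths:
        match = PAGE_PATTERN.match(Path(path).stem)
        if match is None:
            continue  # not a page image under this convention
        groups[match.group("doc")].append((int(match.group("page")), path))
    # Sort numerically so that page10 comes after page2.
    return {doc: [p for _, p in sorted(pages)] for doc, pages in groups.items()}
```

For example, ["scans/invoice42_page10.jpg", "scans/invoice42_page2.jpg", "scans/receipt7_page1.jpg"] yields one group for invoice42 (page 2, then page 10) and one for receipt7. Sorting on the integer page number rather than the file name avoids the usual "page10 before page2" ordering problem.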

Supported Child Files

Images are fully supported.
Text is supported starting with version 1.18.0.
We plan to support all media types.

Uploading Compound Files Using The SDK

To upload compound files with the SDK, you need Diffgram SDK version 0.10.2 or later. We will start by uploading a two-page document:

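from diffgram import Project
from diffgram.file.compound_file import CompoundFile

# Connect to your project (replace the placeholders with your own credentials).
project = Project(host="https://diffgram.com",
                  project_string_id="replace_with_project_string",
                  client_id="replace_with_client_id",
                  client_secret="replace_with_client_secret")

# Create the parent compound file in the project's default directory.
parent = CompoundFile(
    project=project,
    name='myFirstCompoundFile',
    directory_id=project.default_directory.id
)

# Add one child per page. These calls only register the files locally.
parent.add_child_from_local(path='path/to/your_file.jpg')
parent.add_child_from_local(path='path/to/your_second_file.jpg')

# Send the compound file and its children to Diffgram.
parent.upload()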
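If all the pages of a document sit in one folder, calling add_child_from_local by hand for each file gets repetitive. The following is a minimal sketch of a hypothetical helper, add_children_from_directory (not part of the Diffgram SDK), that adds every image in a folder using the same add_child_from_local(path=...) call shown above. The set of image extensions is an assumption; adjust it to your data.

```python
from pathlib import Path

# Assumed image extensions to pick up; adjust to your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def add_children_from_directory(parent, directory, extensions=IMAGE_EXTENSIONS):
    """Add every matching file in `directory` as a child of `parent`, sorted by name.

    `parent` is expected to be a CompoundFile (or any object exposing
    add_child_from_local(path=...)). Nothing is uploaded here; call
    parent.upload() afterwards. Returns the list of paths that were added.
    """
    paths = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
    for path in paths:
        parent.add_child_from_local(path=str(path))
    return [str(p) for p in paths]
```

Usage would look like add_children_from_directory(parent, 'path/to/document_pages') followed by parent.upload(). Sorting by file name keeps the order in which children are added deterministic between runs.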

Uploading Files With Connections to Storage Providers

If your files are stored with a storage provider that is connected to Diffgram, you can add them by blob path using the add_child_from_blob_path method:

# Replace the connection ID with the Diffgram Connection ID of your storage provider.
parent.add_child_from_blob_path(blob_path = 'my/path/to/blob', bucket_name = "my_bucket", connection_id = 25, instance_list = [])
parent.upload()

Here, instance_list lets you attach labels and attributes to the file during upload. For more information, check the following guides:

Full example

from diffgram import Project
from diffgram.file.compound_file import CompoundFile

project = Project(host="https://diffgram.com",
                  project_string_id="replace_with_project_string",
                  client_id="replace_with_client_id",
                  client_secret="replace_with_client_secret")

parent = CompoundFile(
    project=project,
    name='myFirstCompoundFile',
    directory_id=project.default_directory.id
)

# Each child points to a different blob in the connected bucket.
# Replace the connection ID with the Diffgram Connection ID of your storage provider.
parent.add_child_from_blob_path(blob_path='my/path/to/blob_1', bucket_name="my_bucket", connection_id=25, instance_list=[])
parent.add_child_from_blob_path(blob_path='my/path/to/blob_2', bucket_name="my_bucket", connection_id=25, instance_list=[])

parent.upload()
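When a compound file has many children in the same bucket, the repeated add_child_from_blob_path calls in the full example can be folded into a loop. This is a minimal sketch of a hypothetical helper (not part of the Diffgram SDK) that reuses exactly the keyword arguments from the example above and passes an empty instance_list, so no labels are attached.

```python
def add_children_from_blob_paths(parent, blob_paths, bucket_name, connection_id):
    """Add one child per blob path, all from the same bucket and connection.

    `parent` is expected to be a CompoundFile (or any object exposing
    add_child_from_blob_path with the keyword arguments used above).
    Returns the number of children added.
    """
    blob_paths = list(blob_paths)  # accept any iterable, including generators
    for blob_path in blob_paths:
        parent.add_child_from_blob_path(
            blob_path=blob_path,
            bucket_name=bucket_name,
            connection_id=connection_id,
            instance_list=[],  # no labels or attributes attached at upload time
        )
    return len(blob_paths)
```

For example, add_children_from_blob_paths(parent, ['docs/page_1.jpg', 'docs/page_2.jpg'], "my_bucket", 25) followed by parent.upload() is equivalent to the two explicit calls in the full example, with your own blob paths.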

Upload

The add_child_from_local and add_child_from_blob_path methods only add child files to the local SDK CompoundFile object.
Calling upload() is what actually sends the data to the API.
So, once you have added all the child files you need, start the upload:

# Upload the files.
parent.upload()

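The response is a Python dict describing the new compound file (its id, type, state, and other metadata), with the related input record, including the input id and batch id, nested under input.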

{
  "ann_is_complete": "None",
  "bucket_name": "None",
  "connection_id": "None",
  "count_instances_changed": "None",
  "created_time": "2023-01-24T18:31:34.129171",
  "hash": "38183bfec2fbed86968ac4d8f7eb305338f9c67318ff016811854463b15f10eb",
  "id": 2967,
  "input": {
    "archived": "False",
    "batch_id": "None",
    "created_time": "Tue, 24 Jan 2023 18:31:34 GMT",
    "description": "None",
    "directory": {
      "id": 106,
      "nickname": "Default"
    },
    "file_id": 2967,
    "id": 2227,
    "instance_list": "None",
    "media_type": "compound",
    "mode": "None",
    "newly_copied_file_id": "None",
    "original_filename": "myFirstCompoundFile",
    "parent_file_id": "None",
    "percent_complete": 100,
    "processing_deferred": "False",
    "raw_data_blob_path": "None",
    "retry_count": 0,
    "retry_log": "None",
    "source": "from_compound",
    "status": "success",
    "status_text": "None",
    "task_id": "None",
    "time_last_attempted": "None",
    "time_updated": "None",
    "total_time": "None",
    "update_log": {
      "error": {},
      "info": {},
      "success": "False"
    },
    "video_split_duration": 30,
    "video_was_split": "None"
  },
  "original_filename": "myFirstCompoundFile",
  "state": "added",
  "time_last_updated": "2023-01-24T18:31:34.149508",
  "type": "compound",
  "video_id": "None",
  "video_parent_file_id": "None"
}
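Usually only a few fields of this response matter after an upload. The sketch below pulls them out, assuming the response has the shape of the example above (the compound file at the top level, the input record under "input"). It also normalizes the "None" placeholder strings that appear in the example to Python None. summarize_upload_response is a hypothetical helper, not an SDK function.

```python
def summarize_upload_response(response):
    """Extract the main IDs and status from a compound-file upload response.

    Field names follow the example response shown above.
    """
    def clean(value):
        # The example response renders missing values as the string "None".
        return None if value in (None, "None") else value

    input_info = response.get("input") or {}
    return {
        "file_id": clean(response.get("id")),
        "input_id": clean(input_info.get("id")),
        "batch_id": clean(input_info.get("batch_id")),
        "status": clean(input_info.get("status")),
        "type": clean(response.get("type")),
    }
```

Applied to the example response above, this gives a file_id of 2967, an input_id of 2227, a batch_id of None, a status of "success", and a type of "compound".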

Uploading Using Direct API Calls

When calling the API directly, the process has two steps:

  1. Create the parent file using the Compound File Create Endpoint.
  2. Once you have the ID of the compound file, upload your files with the usual API calls, passing the compound file ID in the parent_id parameter of the upload endpoint you use.
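The two steps can be sketched as a small orchestration function. To avoid assuming any endpoint URLs or payloads, the actual HTTP calls are injected as two hypothetical callables: create_compound_file(name), which wraps the Compound File Create Endpoint and returns the new compound file ID, and upload_file(path, parent_id), which wraps whichever upload endpoint you use. Neither is a Diffgram-provided function.

```python
from typing import Any, Callable, Iterable, List, Tuple


def upload_compound_via_api(
    create_compound_file: Callable[[str], int],
    upload_file: Callable[[str, int], Any],
    name: str,
    child_paths: Iterable[str],
) -> Tuple[int, List[Any]]:
    """Create the parent compound file, then upload each child with its parent_id.

    Returns the compound file ID and whatever each upload call returned.
    """
    # Step 1: create the parent compound file and keep its ID.
    parent_id = create_compound_file(name)
    # Step 2: upload every child, passing the compound file ID as parent_id.
    results = [upload_file(path, parent_id) for path in child_paths]
    return parent_id, results
```

Keeping the HTTP layer outside the function makes the ordering constraint explicit: no child upload can start until step 1 has produced a parent ID.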

🚧

Parent ID must be a compound file

Using the ID of a file that is not a compound file may cause unexpected behavior. Make sure to use the ID returned by the Compound File Create Endpoint.
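One way to enforce this in your own client code is to check a file's type before using its ID as parent_id. The sketch assumes you have a dict shaped like the upload response shown earlier, with top-level "id" and "type" keys, where compound files have type "compound". require_compound_parent is a hypothetical helper, not a Diffgram API.

```python
def require_compound_parent(file_info):
    """Return the file ID if `file_info` describes a compound file, else raise ValueError."""
    if file_info.get("type") != "compound":
        raise ValueError(
            f"File {file_info.get('id')!r} has type {file_info.get('type')!r}; "
            "parent_id must be the ID of a compound file."
        )
    return file_info["id"]
```

Failing fast here is cheaper than discovering later that children were attached to a regular image or text file.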

Obtaining Annotations From Compound Files

Once you have annotated your compound files, you can extract the annotations in two ways:

  1. Via Exports
  2. Using the SDK data streamer

Note: for each compound file you will have N+1 different IDs, where N is the number of children in the compound file. For example, a compound file with 2 images has 3 IDs: one for the root compound file and one for each of the 2 child image files.

You can see the file IDs by clicking on the input row in the "Import" section of your project.
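The N+1 relationship can be made concrete with a little bookkeeping. This sketch assumes a hypothetical {child_file_id: compound_file_id} mapping that you collect yourself (for example, from the Import section); neither helper is part of the Diffgram SDK, and the IDs in the example are made up.

```python
from collections import defaultdict


def group_ids_by_compound(child_to_parent):
    """Group child file IDs under their root compound file ID."""
    groups = defaultdict(list)
    for child_id, parent_id in child_to_parent.items():
        groups[parent_id].append(child_id)
    return {parent: sorted(children) for parent, children in groups.items()}


def count_ids(groups):
    """Total file IDs: for each compound file, 1 root ID plus N child IDs (N + 1)."""
    return sum(1 + len(children) for children in groups.values())
```

For a compound file 10 with two child images 11 and 12, group_ids_by_compound({11: 10, 12: 10}) returns {10: [11, 12]} and count_ids gives 3, matching the two-image example in the note above.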