Compound Files Ingestion

A Guide For Uploading Compound Files In Diffgram.

Compound files are a Diffgram concept that lets you combine multiple media assets into a single File entity, making it easier to reason about and group your data and to manage large annotation projects. Some example use cases:

  • Multiple Page Document Pictures
  • Multi Camera Annotation Scenes
  • Pictures that are related in some meaningful way to the business.

🚧

Compound Files Are Supported For Images Only

Later versions of Diffgram will add support for text, audio, video and combinations of those.

Uploading Compound Files Using The SDK

To upload files with the SDK, you need Diffgram SDK version 0.10.2 or greater. We will start by uploading a two-page document:

from diffgram import Project
from diffgram.file.compound_file import CompoundFile

project = Project(
    host="https://diffgram.com",
    project_string_id="replace_with_project_string",
    client_id="replace_with_client_id",
    client_secret="replace_with_client_secret",
)

parent = CompoundFile(
    project=project, 
    name='myFirstCompoundFile', 
    directory_id=project.default_directory.id
)

parent.add_child_from_local(path='path/to/your_file.jpg')
parent.add_child_from_local(path='path/to/your_second_file.jpg')

parent.upload()
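
For documents with many pages, adding children one call at a time gets repetitive. The following is a minimal sketch, not part of the Diffgram SDK: add_pages_from_directory is a hypothetical helper that collects the image files in a folder, sorts them by file name, and calls add_child_from_local for each one. The extension set is an assumption, and this guide does not confirm that Diffgram preserves the order in which children are added.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the formats your project uses.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def add_pages_from_directory(parent, directory):
    """Add every image in `directory` to `parent`, sorted by file name.

    `parent` is expected to be a CompoundFile, or any object exposing
    add_child_from_local(path=...). Returns the list of paths added.
    """
    pages = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    if not pages:
        raise ValueError(f"No image files found in {directory}")
    for page in pages:
        # Only registers the child locally; nothing is sent yet.
        parent.add_child_from_local(path=str(page))
    return [str(page) for page in pages]
```

With the parent object from the example above, calling add_pages_from_directory(parent, 'path/to/pages') followed by parent.upload() would replace the individual add_child_from_local calls.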

Uploading Files With Connections to Storage Providers

You can add files by storage connection and blob path using the add_child_from_blob_path method:

# Replace the connection ID with the Diffgram Connection ID of your storage provider.
parent.add_child_from_blob_path(blob_path = 'my/path/to/blob', connection_id = 25)
parent.upload()
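
When several blobs come from the same storage connection, the calls can be wrapped in a loop. This is a sketch under assumptions: add_blobs is a hypothetical helper, not an SDK function, and the check that rejects empty or folder-like paths is an illustrative guess at a sensible precondition rather than a documented Diffgram rule.

```python
def add_blobs(parent, blob_paths, connection_id):
    """Queue several blobs from one storage connection as children.

    Uses only the add_child_from_blob_path call shown above. Nothing is
    transmitted until parent.upload() is called.
    """
    blob_paths = list(blob_paths)
    # Validate everything first so a bad path does not leave the
    # compound file half populated.
    for blob_path in blob_paths:
        if not blob_path or blob_path.endswith("/"):
            raise ValueError(f"Not a blob path: {blob_path!r}")
    for blob_path in blob_paths:
        parent.add_child_from_blob_path(
            blob_path=blob_path,
            connection_id=connection_id,
        )
    return blob_paths
```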

Upload

The add methods only register child files locally on the SDK's compound file object.
Calling upload() triggers the actual API transmission.
Once you have added all the necessary child files, start the upload.

# Upload the files.
parent.upload()

The response is a Python dict with the input ID, batch ID, and other metadata.

{
  "ann_is_complete": "None",
  "bucket_name": "None",
  "connection_id": "None",
  "count_instances_changed": "None",
  "created_time": "2023-01-24T18:31:34.129171",
  "hash": "38183bfec2fbed86968ac4d8f7eb305338f9c67318ff016811854463b15f10eb",
  "id": 2967,
  "input": {
    "archived": "False",
    "batch_id": "None",
    "created_time": "Tue, 24 Jan 2023 18:31:34 GMT",
    "description": "None",
    "directory": {
      "id": 106,
      "nickname": "Default"
    },
    "file_id": 2967,
    "id": 2227,
    "instance_list": "None",
    "media_type": "compound",
    "mode": "None",
    "newly_copied_file_id": "None",
    "original_filename": "myFirstCompoundFile",
    "parent_file_id": "None",
    "percent_complete": 100,
    "processing_deferred": "False",
    "raw_data_blob_path": "None",
    "retry_count": 0,
    "retry_log": "None",
    "source": "from_compound",
    "status": "success",
    "status_text": "None",
    "task_id": "None",
    "time_last_attempted": "None",
    "time_updated": "None",
    "total_time": "None",
    "update_log": {
      "error": {},
      "info": {},
      "success": "False"
    },
    "video_split_duration": 30,
    "video_was_split": "None"
  },
  "original_filename": "myFirstCompoundFile",
  "state": "added",
  "time_last_updated": "2023-01-24T18:31:34.149508",
  "type": "compound",
  "video_id": "None",
  "video_parent_file_id": "None"
}
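
Most callers only need a few of these fields. The sketch below shows one way to pull them out; summarize_upload is a hypothetical helper, and it assumes the response has the shape of the sample above. Because the sample renders missing values as the string "None", the helper treats that string like a real None.

```python
def summarize_upload(response):
    """Pull the fields most callers need out of the upload response dict."""

    def clean(value):
        # The sample response shows missing values as the string "None";
        # treat that the same as a real None.
        return None if value in (None, "None") else value

    upload_input = response.get("input") or {}
    return {
        "file_id": clean(response.get("id")),
        "input_id": clean(upload_input.get("id")),
        "batch_id": clean(upload_input.get("batch_id")),
        "status": clean(upload_input.get("status")),
        "is_compound": response.get("type") == "compound",
    }
```

Called as summarize_upload(parent.upload()), this gives a small dict that is convenient for logging or for checking that the status is "success" before continuing.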

Uploading Using Direct API Calls

For direct API calls, the process is divided into two steps.

  1. First, create the parent file using the Compound File Create Endpoint.
  2. Once you have an ID for the compound file, upload your files with the usual API calls, now passing the compound file ID in the parent_id parameter of any of the endpoints below.
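
The ordering of the two steps can be sketched as follows. This guide does not list the endpoint URLs or payloads, so the sketch does not guess at them: create_parent and upload_child are hypothetical callables that you supply as wrappers around the real HTTP requests, and upload_compound_via_api is an illustrative name only.

```python
def upload_compound_via_api(name, child_paths, create_parent, upload_child):
    """Run the two-step flow: create the parent, then attach each child.

    `create_parent(name)` must return the new compound file's ID.
    `upload_child(path, parent_id)` must upload one file with that parent_id.
    Both are caller-supplied wrappers around the real HTTP endpoints.
    """
    # Step 1: the parent must exist before any child references it.
    parent_id = create_parent(name)
    if parent_id is None:
        raise RuntimeError("Compound file creation did not return an ID")
    # Step 2: every child upload carries the same parent_id.
    results = [upload_child(path, parent_id) for path in child_paths]
    return parent_id, results
```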

🚧

Parent ID must be a compound file

Using the ID of a file that is not compound may cause unexpected behavior. Make sure to use the ID returned by the Compound File Create Endpoint.
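
A small client-side check can catch this mistake before any child is uploaded. The sketch below is an assumption-based example: require_compound_parent is a hypothetical helper, and it assumes the file data you hold has "id" and "type" keys like the SDK response shown earlier in this guide, with "type" equal to "compound" for compound files.

```python
def require_compound_parent(file_data):
    """Return the file ID if `file_data` describes a compound file.

    Assumes `file_data` has "id" and "type" keys like the SDK response.
    Raises ValueError for any other file type.
    """
    file_type = file_data.get("type")
    if file_type != "compound":
        raise ValueError(
            f"File {file_data.get('id')} has type {file_type!r}, "
            "expected 'compound'"
        )
    return file_data["id"]
```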