From URL (from_url) is the preferred method of input over from_local.
From Python SDK
url, string, valid url to resource
media_type, string, in ["image", "video"], default "image"
instance_list, dictionary, the structure for an image, describing the labels that should be added with the file
frame_packet_map, dictionary, the structure for a video, describing the labels that should be added with the file.
result = project.file.from_url(signed_url)

result = project.file.from_url(
    signed_url,
    media_type="video"
)
A common pattern is to attach each file to a job as it is imported. A short example is below, and a long form example is here.
job = project.job.new()

for signed_url in signed_url_list:
    result = project.file.from_url(
        signed_url,
        job = job
    )
Returns Immediately, Long Running Background Operation
It returns the input_id and places the input in the queue.
Inputs are processed in the background by our distributed workers. Depending on factors such as type of file, account type, and system load, your file may be available in Diffgram almost immediately or may take several hours.
# pip install google-cloud-storage
from google.cloud import storage
import time
import os

from diffgram import Project

project = Project(
    client_id = "",
    client_secret = "",
    project_string_id = ""
)

"""
Docs
https://googleapis.github.io/google-cloud-python/latest/storage/blobs.html?highlight=generate_signed_url#google.cloud.storage.blob.Blob.generate_signed_url

To get the service account .json
https://cloud.google.com/iam/docs/creating-managing-service-accounts

Storage bucket is your storage bucket name
and path is as shown in google, starting from bucket, as shown
"""

# Path to the service account file, relative to this script.
# Forward slashes avoid invalid escape sequences such as "\s" and "\h".
SERVICE_ACCOUNT = "shared/helpers/your_account.json"
CLOUD_STORAGE_BUCKET = "diffgram-sandbox"

# Helper function: build a client from the service account .json
def get_gcs_service_account(gcs):
    path = os.path.join(
        os.path.dirname(os.path.realpath(__file__)),
        SERVICE_ACCOUNT)
    return gcs.from_service_account_json(path)

# Pass the Client class itself, so no default credentials are needed
gcs = get_gcs_service_account(storage.Client)
bucket = gcs.get_bucket(CLOUD_STORAGE_BUCKET)

# Path to file in cloud storage
path_list = [
    "dog_ate_my_homework.mov",
    "nuclear_launch_codes.mov",
    "lions_tigers_bears_oh_my.mov"]

# Expiry as seconds since the epoch, 30 days from now
blob_expiry = int(time.time() + (60 * 60 * 24 * 30))

for path in path_list:

    blob = bucket.blob("directory_example/" + path)

    # Generate signed url (string) using google sdk
    # this gives temporary access to the resource
    signed_url = blob.generate_signed_url(expiration=blob_expiry)

    # Let's see what it looks like
    print(signed_url)

    # Use the Diffgram from_url() method
    result = project.file.from_url(
        signed_url,
        media_type="video")

    print(result)
Smaller units of work improve annotation quality
A video file sent with a video_split_duration will be split into separate video files based on the duration given.
For example, a 30 minute video with a video_split_duration of 30 seconds will be split into 60 videos.
- Valid range is 2 to 180 seconds. We recommend 30 seconds or less.
- Limit of 100 videos created.
- Parsed videos inherit properties like job and directory from their parent.
# SDK >= 0.1.7.5
result = project.file.from_url(
    signed_url,
    video_split_duration = 30
)

# Full
result = project.file.from_url(
    signed_url,
    media_type="video",
    job = job,
    video_split_duration = 30
)
Note: the status of the parent video shows success once it has been split into clips. The status of each clip can then be tracked separately.
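The limits listed above can be checked locally before a video is sent. The following is a minimal sketch, not part of the Diffgram SDK: the helper names `estimate_clip_count` and `fits_clip_limit` are hypothetical, and it assumes a trailing partial segment counts as one clip, which the text above does not state. How the service behaves when the 100 video limit is exceeded is also not covered here.

```python
import math

MIN_SPLIT_SECONDS = 2     # lower bound of the valid range listed above
MAX_SPLIT_SECONDS = 180   # upper bound of the valid range listed above
MAX_CLIPS = 100           # limit of videos created, as listed above


def estimate_clip_count(video_seconds, video_split_duration):
    """Estimate how many clips a split would produce.

    Assumption: a trailing partial segment counts as one clip,
    so the result is rounded up.
    """
    if not MIN_SPLIT_SECONDS <= video_split_duration <= MAX_SPLIT_SECONDS:
        raise ValueError(
            "video_split_duration must be between "
            f"{MIN_SPLIT_SECONDS} and {MAX_SPLIT_SECONDS} seconds")
    return math.ceil(video_seconds / video_split_duration)


def fits_clip_limit(video_seconds, video_split_duration):
    """Return True if the estimated clip count stays within MAX_CLIPS."""
    return estimate_clip_count(video_seconds, video_split_duration) <= MAX_CLIPS


# A 30 minute video split every 30 seconds, as in the example above
print(estimate_clip_count(30 * 60, 30))

# A 2 hour video split every 30 seconds would need 240 clips
print(fits_clip_limit(2 * 60 * 60, 30))
```

A check like this can run before calling from_url, so that a longer video_split_duration is chosen for long videos.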
We will attempt to determine the type automatically in this order:
- By the URL. The filename starts after the last "/" and the extension is whatever follows the first "." in that filename.
These are both valid:
- https: ... a/b/c/filename.extension?otherstuff...
- https: ... a/b/c/filename.extension
- Fall back to the metadata 'Content-Type' header.
Example: the URL was https: ... a/b/c/filename?something (no . after the last /), so the extension is treated as None, triggering the fallback.
If the media type cannot be determined, an error is thrown on the Input object.
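The detection order described above can be sketched in a few lines. This is an illustration under stated assumptions, not Diffgram's actual implementation: the function names and the two extension tables are hypothetical, and the sketch also falls back to Content-Type when an extension is present but unrecognized, a case the text above does not cover.

```python
from urllib.parse import urlsplit

# Hypothetical extension tables, for illustration only
IMAGE_EXTENSIONS = {"jpg", "jpeg", "png"}
VIDEO_EXTENSIONS = {"mp4", "mov", "avi"}


def extension_from_url(url):
    """Return the extension found in the URL, or None if there is none."""
    path = urlsplit(url).path            # drops the ?query and #fragment
    filename = path.rsplit("/", 1)[-1]   # text after the last "/"
    if "." not in filename:
        return None
    return filename.split(".", 1)[1].lower()  # text after the first "."


def guess_media_type(url, content_type=None):
    """Try the URL extension first, then the Content-Type header value."""
    extension = extension_from_url(url)
    if extension in IMAGE_EXTENSIONS:
        return "image"
    if extension in VIDEO_EXTENSIONS:
        return "video"

    # Fallback: use the main type of Content-Type, e.g. "video/mp4" -> "video"
    if content_type:
        main_type = content_type.split("/", 1)[0].strip().lower()
        if main_type in ("image", "video"):
            return main_type

    # Undetermined; the caller decides how to report the error
    return None


print(guess_media_type("https://example.com/a/b/c/filename.mov?otherstuff"))
print(guess_media_type("https://example.com/a/b/c/filename?something", "image/png"))
```

Parsing the URL with urlsplit before looking for the extension keeps dots inside a signed URL's query string from being mistaken for a file extension.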
Why is URL ahead of 'Content-Type'?
We have found surprisingly often that Content-Type is not set, or is set to a confusing value, such as application/octet-stream when it cannot be determined (by the host, e.g. Google).
Placing the URL first means it "just works" even when Content-Type exists but is invalid. It still allows expert users to purposely exclude an extension and set a Content-Type.