Audio Annotation Guide

A guide to using the audio annotation interface.

This guide will help you upload audio data to Diffgram and label it. Finally, we will generate a JSON export, one of many export options, that you can ingest into whatever model training pipeline you use.

Pre-Requisites

  1. A working Diffgram installation (either with Docker or directly on diffgram.com). See Install.
  2. An audio file; you can download a sample from here. So far, Diffgram supports the .mp3, .wav and .flac formats.
  3. An existing project. See Project Concept 101.

The Diffgram Python SDK is optional.

1. Uploading a file to Diffgram

The easiest way to start with Diffgram audio annotation is to upload audio files through the Diffgram UI.
To start importing data, click the "Project" button in the main menu and find the "Import" button.

On the Import page, click the "Start New Data Upload" button and follow the instructions. Keep in mind that, for now, audio annotation supports only .mp3, .wav and .flac files.

After a few seconds, your files will appear on the Import page.

To start annotating, simply click the "File ID" of the file you want to open.
For production use, see Tasks.
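
Since only .mp3, .wav and .flac files are accepted, it can help to sort a folder of recordings before starting an upload. The following is a minimal sketch of such a check, not part of Diffgram: the function name `split_by_support` and the sample file names are made up for illustration, and matching extensions case-insensitively is an assumption of this sketch.

```python
from pathlib import Path

# Extensions taken from this guide; everything else here is illustrative.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac"}


def split_by_support(paths):
    """Return (supported, unsupported) lists of paths, judged by extension only."""
    supported, unsupported = [], []
    for raw in paths:
        path = Path(raw)
        # Assumption: compare case-insensitively, so "TALK.WAV" counts as .wav.
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = split_by_support(["interview.mp3", "TALK.WAV", "notes.txt", "clip.ogg"])
    print("ready to upload:", [p.name for p in ok])
    print("convert first:", [p.name for p in skipped])
```

The check looks only at the file name, so it will not catch a file whose content does not match its extension.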

2. Overview of the interface

If you have worked with Diffgram before, the audio annotation interface is similar to the rest of the interfaces. Easy!

If you are completely new to Diffgram, the screen is divided into 3 main parts:

  • Toolbar - panel with all the available tools
  • Sidebar - where you can see a list of the created instances
  • Annotation field - where your file is displayed and where you annotate it

Toolbar
The toolbar is deliberately minimal. It lets you perform the following operations: undo/redo, select a label schema, select a label, check the save status, move to the previous or next file, and see the available hotkeys.

Sidebar
The sidebar lists all the created instances and lets you modify them. Each entry in the instance list includes the following data:

  • Id (visible only to super admins) - the unique database id of the instance
  • Type - the type of the instance ("audio"), shown in the color of the instance's label
  • Name - the label name
  • Action - the actions available for the instance. So far there are two: "Change Label Template" and "Delete Instance"

Annotation field
In this part of the screen, you can see the uploaded audio (with a zoom slider and audio play buttons) and all the instances you have created.

3. Audio annotation

At this point, we assume you have uploaded the audio file and are familiar with the UI of the audio interface, so we can jump into annotating our audio file.

To create an audio region annotation, double-click at the time where the annotation should start. The other border of the region will then follow your mouse. When it is in the right place, click once more and the annotation is created.

To play the annotated part, just click the region in the annotation field.

To modify an existing annotation, press the border you want to move and drag it to the right place.
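
To make the idea of a region concrete, here is a minimal sketch of how one could model it in your own code: a label plus two borders on the audio timeline, where dragging a border changes one of the two times. This is a hypothetical model, not Diffgram's internal representation. The class name `AudioRegion`, its fields, the use of seconds, and the clamping and border-swapping behavior are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AudioRegion:
    """Hypothetical model of one annotated region; times are in seconds."""
    label: str
    start: float
    end: float

    def __post_init__(self):
        # Assumption: borders may be placed in either order, so store them sorted.
        if self.start > self.end:
            self.start, self.end = self.end, self.start

    @property
    def duration(self):
        return self.end - self.start

    def move_border(self, border, new_time, audio_length):
        """Move the 'start' or 'end' border, keeping it inside the audio."""
        if border not in ("start", "end"):
            raise ValueError("border must be 'start' or 'end'")
        # Clamp the new position to the range [0, audio_length].
        clamped = min(max(new_time, 0.0), audio_length)
        setattr(self, border, clamped)
        # Re-sort if the dragged border crossed the other one.
        if self.start > self.end:
            self.start, self.end = self.end, self.start


if __name__ == "__main__":
    region = AudioRegion(label="speech", start=4.0, end=1.5)
    region.move_border("end", 12.0, audio_length=10.0)
    print(region.start, region.end, region.duration)
```

A model like this can be handy when post-processing labeled regions, for example to filter out regions shorter than some minimum duration.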

4. Export The Data

To generate export files, click the "Project" button in the main menu, where you will see the "Export" option.

On the Export page, select the dataset you want to export and press "Generate".

For other export concepts, see Export.
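
This guide does not describe the layout of the export JSON, so before writing a loader for your training code it is worth looking at what the file actually contains. The sketch below prints an outline of the keys and value types of any parsed JSON document. It makes no assumption about Diffgram's export schema: the helper names `outline` and `outline_file` are made up, and the sample data is a placeholder rather than real export output.

```python
import json


def outline(value, depth=0, max_depth=4):
    """Return indented lines describing the keys and types in parsed JSON."""
    pad = "  " * depth
    if depth >= max_depth:
        return [pad + "..."]
    lines = []
    if isinstance(value, dict):
        for key, child in value.items():
            lines.append(f"{pad}{key}: {type(child).__name__}")
            if isinstance(child, (dict, list)):
                lines.extend(outline(child, depth + 1, max_depth))
    elif isinstance(value, list) and value:
        # Describe only the first element, assuming the rest share its shape.
        first = value[0]
        lines.append(f"{pad}[0]: {type(first).__name__}")
        if isinstance(first, (dict, list)):
            lines.extend(outline(first, depth + 1, max_depth))
    return lines


def outline_file(path, max_depth=4):
    """Load a JSON file from disk and return its outline."""
    with open(path, encoding="utf-8") as handle:
        return outline(json.load(handle), max_depth=max_depth)


if __name__ == "__main__":
    # Placeholder data only; use outline_file() on your downloaded export instead.
    sample = json.loads('{"a": [{"b": 1, "c": []}], "d": "x"}')
    print("\n".join(outline(sample)))
```

Once the outline shows where the labels and region times live, you can write a loader that reads exactly those fields.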

5. Future development and contribution

The Diffgram team is still working to deliver the best audio annotation interface possible. If you encounter any issues, you can always create an issue on GitHub or send us a message on our Slack.

Features in the pipeline:

  • Implement attributes for the labels
  • Implement user scripts
  • Support audio transcription
  • Render labels on top of the regions

Join our Slack community and take part in the conversation! See Community.

