Working on Production Deep Learning?
Create the highest quality training data with Diffgram


91% take weeks or months to get to a "Seed" dataset
And most take months to go from seed to Beta.
August 2020 Survey
Virtually everyone plans to iterate
Yet only 14% have dedicated solutions
Introducing Diffgram - Patent Pending Database & Control System
What if your team could get the first version ready in 1/10 the time? And use that same process for Production Systems?
Diffgram does this by making the process "non-blocking". Patent Pending. 3-4x ROI on staff time.*
*Based on an internal study comparing work with and without Diffgram.
Where does it sit in my stack?
Diffgram is software that sits between raw input data and deep learning algorithms. Diffgram covers all 5 major subcategories.
What is the value?
- Immediate Value
- Ongoing Value
- Long Term Value
Turns manual processes on the order of weeks into a few days
Without Diffgram, blocking manual processes take on the order of weeks to complete. With Diffgram, your data science team can start working on algorithms as soon as a "Seed" set is completed, which turns a multi-week startup process into days. This benefit applies to every single dataset!
Literal Annotation Interface
Diffgram provides a best-in-class concrete interface to view, curate, and quality-assure data, and to do original annotation work.
Data pipelines
These graphs are dynamically created based on simple user inputs. The core principle is the ability to "watch" a dataset. Task templates are created by your team. The Diffgram task system manages the rest.
Diffgram includes both the literal annotation interface and pipelines.
1) Data Pipelines
Super easy UI based setup. (API & SDK too)
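The "watch" principle can be pictured with a minimal sketch in plain Python. This is not the Diffgram SDK: `WatchedDataset`, `Task`, `make_task_creator`, and the template name `box_signs` are hypothetical names chosen for illustration. The idea shown is that adding a file to a watched dataset immediately creates a task from a template, so nothing waits for a full batch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    template: str   # name of the task template (hypothetical)
    file_name: str  # file the task was created for


@dataclass
class WatchedDataset:
    """Hypothetical dataset that notifies its watchers when a file is added."""
    name: str
    files: List[str] = field(default_factory=list)
    _watchers: List[Callable[[str], None]] = field(default_factory=list)

    def watch(self, callback: Callable[[str], None]) -> None:
        # Register a callback to run for every new file.
        self._watchers.append(callback)

    def add_file(self, file_name: str) -> None:
        self.files.append(file_name)
        for callback in self._watchers:
            callback(file_name)


task_queue: List[Task] = []


def make_task_creator(template: str) -> Callable[[str], None]:
    # Returns a callback that queues one task per incoming file.
    def create_task(file_name: str) -> None:
        task_queue.append(Task(template=template, file_name=file_name))
    return create_task


incoming = WatchedDataset(name="incoming")
incoming.watch(make_task_creator("box_signs"))
incoming.add_file("frame_0001.jpg")
incoming.add_file("frame_0002.jpg")
```

After the two `add_file` calls, `task_queue` holds one task per file, each created the moment its file arrived.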


2) World Class Training Data for Vision and NLP
Customizable tooling supports spatial types including Box, Polygon, Lines, and more. Attribute support for groups of nested labels, free text, and multiple select. Video up to 60 FPS and 4K.
Diffgram partners with leading providers for NLP interfaces. Use the power of Diffgram for your data and state of the art NLP interfaces.
One click Cloud Integrations
Compatible with your cloud data in AWS, Google, Azure, or a private cloud. Use frameworks like TensorFlow and PyTorch, or commercial AutoML services. Connect your models for pre-labeling.
Update your data - Stop wasting datasets!
Imagine, for example, that you have a class "sign" and it's performing at, say, ~80% average precision.
How do you improve it?
One approach is to break it into multiple subclasses, e.g.
sign_yield
sign_highway
sign_warning
etc.
Then you can re-run the data and measure each subclass:
sign_yield -> 80%
sign_highway -> 50% ouch
sign_warning -> 70%
And repeat
sign_highway_small
sign_highway_rare
sign_highway_n
But how do you actually do that in code? With Training Data? In Production?
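As a rough illustration of the relabeling step, here is a minimal sketch in plain Python. It does not use Diffgram's API: the record layout, the `kind` attribute, and the helper names `split_class` and `per_class_precision` are assumptions made for this example. It also uses simple per-class precision as a stand-in for average precision, which additionally requires ranked confidence scores.

```python
from collections import Counter

# Hypothetical annotation records; the field names are assumptions.
annotations = [
    {"id": 1, "label": "sign", "attributes": {"kind": "yield"}},
    {"id": 2, "label": "sign", "attributes": {"kind": "highway"}},
    {"id": 3, "label": "sign", "attributes": {"kind": "warning"}},
    {"id": 4, "label": "sign", "attributes": {}},
    {"id": 5, "label": "car", "attributes": {}},
]


def split_class(records, parent, attribute):
    """Return copies of the records with `parent` split into `parent_<value>`.

    Records that lack the attribute keep the parent label, so they can be
    sent back for annotation instead of being discarded.
    """
    updated = []
    for record in records:
        new_record = dict(record)  # leave the original record untouched
        value = record["attributes"].get(attribute)
        if record["label"] == parent and value:
            new_record["label"] = f"{parent}_{value}"
        updated.append(new_record)
    return updated


def per_class_precision(pairs):
    """pairs: iterable of (predicted_label, true_label) -> {label: precision}."""
    predicted = Counter()
    correct = Counter()
    for pred, true in pairs:
        predicted[pred] += 1
        if pred == true:
            correct[pred] += 1
    return {label: correct[label] / predicted[label] for label in predicted}


relabeled = split_class(annotations, parent="sign", attribute="kind")
labels = [record["label"] for record in relabeled]
```

Calling `split_class` again on `sign_highway` with a finer attribute gives the next round of the loop (`sign_highway_small`, `sign_highway_rare`, and so on), and `per_class_precision` shows which subclass to target next.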
A mature product with an AMAZING Roadmap
With over 6,500 commits to the core code base, over 30 million annotations completed, and thousands of projects, Diffgram is the most mature, modern system available.
Our plan of Intent for 2021 and 2022 is to:
- Be the easiest and most powerful data system, including integrations, pipelines, and tasks
- Be further integrated with other providers, so that Diffgram is a one-stop shop for your data
- Be the most powerful video labeling studio
Help shape our plan of Intent!
Understanding the Total Cost of Ownership
Executive Summary
It's commonly assumed that the literal annotation is the biggest cost center. In fact, there are many other costs involved, such as Administering Datasets, Data Prep, Curation, Set Iteration, and more, that cumulatively far exceed annotation. Annotation is, in fact, one of the ways value is added to the system.
Read more on Total Cost of Ownership
Compatibility and Common Questions
Yes - you can run the software anywhere you desire - just like a database
- This is "Core" to us, so we want to build it in house no matter the cost
- We handle {insert large number} of records - how do we know it can scale
- We can build this ourselves in {6, 12, ...n} months