What is Diffgram?
A complete training data platform for machine learning, delivered as a single application.
Training Data Platform in One Application
Diffgram provides multiple training data tools in a single application.
Ingest
Store
Workflow
Annotation
Annotation Automation
Stream to Training
Explore
Debug
Secure and Private
Who is Diffgram for?
Project Admins, Data Engineers, Machine Learning Leaders, AI Experts, Software Engineers, Data Scientists, Data Annotators and Subject Matter Experts.
Why Diffgram?
Training data work usually spans a complex toolchain of separate tools. Diffgram solves this by bringing all the functions of that toolchain directly into one application.
Solution Example - How it fits in your stack
Here is an example with SageMaker. Note that you can replace the buckets with GCP, Azure, or your own preferred storage. The "Brain" icon could be a data scientist with a local notebook.
Secure Data Labeling: Kubernetes Install or Use our Cloud
Diffgram installs in your secure environment. See Security Policies.
Context
In a world where algorithms are becoming shared knowledge, the training data is the primary differentiator. As projects evolve, it becomes clear that iterating on data, creating multiple datasets, and refreshing data are all key to project success.
In this context it helps to think of a training data platform as something akin to {git, a code editor, and a database}. What works for teams that are exploring and learning is different from what's needed for Proofs of Concept and Production systems.
Diffgram helps your team ship and maintain supervised deep learning systems faster, with better performance and less risk.
Pain Points
Blocked or Stalled AI projects
Is your project always "6 months away"? Are you seeing diminishing returns in your efforts?
Annotation effort overrunning your team
Is your team spending more time managing datasets and outsourcing operations than on Data Science?
Unclear path to production performance
Do you have a great prototype and model serving infrastructure, but don't have a clear path to get the performance you need in production?
Vendor lock-in
Are you pinning your production system on an API call to a single, hard-to-replace vendor? Or on a single outsourced team?
Getting started
Do you have a new project and want to get started with this new paradigm from day 1?
Solutions
Shift your manual Data Science, Data Engineering, and MLOps concepts into Diffgram. This gives you new degrees of freedom to ship your system.
Gain Retraining - the big lever for performance improvement
Retrain faster by connecting the data flow from your application to annotators and back to your training system. Increase your retraining frequency from months to weeks, or even to daily. Continuously improve your models.
Ship your project faster
Deploy early and often with a human in the loop. Combined with retraining, this creates a safety net that lets you ship earlier.
Improve performance with the Data Stream Paradigm
Bring your data closer to the production distribution by turning datasets into data streams. Data streams are smaller sliding windows of data that are more relevant to the exact usage. Engineers integrate the data streams directly into their application, and Data Scientists can experiment with many data stream configurations.
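To make the sliding-window idea concrete, here is a minimal sketch in Python. It is not Diffgram's API: `DataStream`, `push`, and `snapshot` are hypothetical names, and the window is assumed to be a fixed count of the most recent examples rather than, for example, a time range.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class DataStream:
    """Hypothetical sliding-window view over incoming labeled examples."""

    window_size: int
    _items: Deque[dict] = field(init=False)

    def __post_init__(self) -> None:
        # A deque with maxlen evicts the oldest item once the window is full.
        self._items = deque(maxlen=self.window_size)

    def push(self, example: dict) -> None:
        # New examples (e.g. fresh production data) enter the window here.
        self._items.append(example)

    def snapshot(self) -> List[dict]:
        # Return a copy so training code cannot mutate the live window.
        return list(self._items)


stream = DataStream(window_size=3)
for i in range(5):
    stream.push({"id": i})
print(stream.snapshot())  # [{'id': 2}, {'id': 3}, {'id': 4}]
```

Because only the most recent examples are kept, a training job that reads `snapshot()` always sees data closer to what the application is currently handling; trying different `window_size` values is one simple way to "experiment with many data stream configurations."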
Control your Training Data
Free up your Data Science teams to focus on controlling the project, and less on the administrative details.
Reduce lock-in with multi-vendor approach
Host Diffgram on your cloud. Use a hybrid of your own team, outsourced staff, and API calls. Work with many outsourced vendors from one secure environment.
Path to get started
Deep Learning always starts with Data. With Diffgram your team can get started with a well defined Data Pipeline and User Interface. Customize the components as your needs grow, and even define your own Interfaces.
Not on a Data Science team? Some general info:
Deep Learning represents a new way to think about software. It complements traditional logic (i.e., if statements) with teachable logic. This fills a significant void and dramatically changes the types of problems that can be automated.
Training Data
Training data is data that's ready to be used by AI systems. It's created by combining raw data with human-centered meaning, for example combining an image with a box identifying an object. The encoded meaning can be relatively simple, such as a single bounding box, or complex, such as a time series video with a graph of attributes.
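As an illustration of the simple case (an image plus a box identifying an object), here is a minimal Python sketch of one training example. The class and field names (`BoundingBox`, `TrainingExample`, `image_path`, pixel-coordinate corners) are hypothetical and do not reflect Diffgram's actual data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoundingBox:
    """Hypothetical box annotation in pixel coordinates."""

    label: str  # the human-provided meaning, e.g. "car"
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        # Clamp at zero so a malformed box never reports negative area.
        return max(0, self.x_max - self.x_min) * max(0, self.y_max - self.y_min)


@dataclass
class TrainingExample:
    """Raw data (an image reference) combined with its annotations."""

    image_path: str
    boxes: List[BoundingBox] = field(default_factory=list)

    def labels(self) -> List[str]:
        return [box.label for box in self.boxes]


example = TrainingExample("images/street_001.jpg", [BoundingBox("car", 10, 20, 110, 80)])
print(example.labels())  # ['car']
```

More complex annotations, such as video with a graph of attributes, would extend this record with frames, time ranges, and relationships between objects.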