# Development Cycles & Benefits

> Note: This doc is older and may not align with our brand standards.
## The Cycle

Core system areas:
- Data pipeline
- Human supervision for AI
- Team communication and administration
## Diffgram Benefits
Feature | With Diffgram | Without a system | Improvement
---|---|---|---
Frequency of retraining | Hourly, daily, or weekly | Every 2-6+ months | >72x
Number of datasets | 100s to 1000s | 1-10 | >100x
Near real time options | Yes | No | 0 -> 1
Diffgram shifts these concepts to become functions of your system (the exact gains depend on your level of integration with Diffgram). You can experiment with many datasets and combinations, and improve performance by fine-tuning sets for each store, location, or sub-system, as in the sketch below.
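As a rough illustration of that per-location fine-tuning loop, here is a minimal Python sketch. The `Dataset` class and the `train` callable are hypothetical placeholders, not part of Diffgram's SDK:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Dataset:
    """Hypothetical stand-in for a labeled dataset slice (not a Diffgram type)."""
    name: str
    examples: List[dict]

def fine_tune_per_location(
    train: Callable[[Dataset], object],   # your own training routine
    datasets: Dict[str, Dataset],         # one dataset per store/location
) -> Dict[str, object]:
    """Fine-tune one model per store/location from that location's own dataset."""
    return {location: train(dataset) for location, dataset in datasets.items()}
```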
The net effect is that the Data Science team can scale deep learning products. A system can ship with a human fallback (the near real time option). Then, as new stores and hard-to-label cases are found, datasets are split and only the rare, hard cases are worked on (see the sketch below).
Your system can also undergo less initial evaluation: concerns about ongoing performance are reduced because there is a clear path to continued retraining.
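A minimal sketch of the human-fallback pattern, assuming a model that returns a label with a confidence score. The queue and threshold here are illustrative assumptions, not Diffgram APIs:

```python
from typing import Callable, List, Optional, Tuple

CONFIDENCE_THRESHOLD = 0.85      # illustrative cutoff, tune for your use case
review_queue: List[object] = []  # stand-in for a human labeling/review queue

def predict_with_fallback(
    model: Callable[[object], Tuple[str, float]], item: object
) -> Optional[str]:
    label, confidence = model(item)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label               # confident prediction: use it directly
    review_queue.append(item)      # hard case: route to a human for labeling
    return None                    # caller falls back to manual handling
```

Items that land in the review queue are exactly the rare, hard cases; once labeled, they feed the next retraining cycle.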
## Development & Operation Cycles

### Lean MLOps

#### New Paradigm
Diffgram does not provide model evaluation or deployment. The diagram is included to convey the overall mindset and mental concepts at play.
Area of Focus | New | Old | Trade-offs
---|---|---|---
Model training & optimization | Tight connection to training data.<br>Expectation that the model will be regularly retrained.<br>Training data input/output is part of the core application. | Manual data input/output.<br>Manual organization.<br>Training is "one off". | **Pros:** Performance becomes primarily a function of training data. Better performance. Better path to fix data issues.<br>**Cons:** Requires a system (like Diffgram). Requires more engineering effort in general.
Model evaluation | Set conditions for a valid automatic deploy (i.e. above a threshold).<br>Sliding window approach on data for statistical evaluation methods, i.e. the last 30, 60, or 180 days of data (see the sketch after this table).<br>Strong human integration: in addition to statistical analysis, stronger "one off" and reasonableness checks. | Manually assess validity, i.e. looking at an area under the curve chart or a confusion matrix. | **Pros:** Can maintain the same rigor in terms of statistical checks, with tighter control on data relevancy. Supports high frequency retraining and frequent deploys.<br>**Cons:** Requires more engineering effort in general.
Model deployment | Deploy early in a fail-safe way, i.e. starting as internal recommendations only.<br>Deploy often with a human in the loop.<br>Heuristics to check for correctness. | Mental model of trying to reach a desired level of accuracy prior to deploying. | **Pros:** Projects become unstuck, no longer trying to achieve "perfect". Projects more closely align with real world data. Reduces costs by shipping sooner and improves performance.<br>**Cons:** Requires clear communication on expected failure modes. Increases operational effort.
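As referenced in the model evaluation row above, here is a minimal sketch of a sliding-window deploy gate. The record format, window length, and accuracy threshold are assumptions for illustration, not Diffgram defaults:

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Tuple

def passes_deploy_gate(
    records: Iterable[Tuple[datetime, bool]],  # (timestamp, prediction was correct)
    window_days: int = 30,
    min_accuracy: float = 0.95,
) -> bool:
    """Allow an automatic deploy only if recent accuracy clears the threshold."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=window_days)
    recent = [correct for ts, correct in records if ts >= cutoff]
    if not recent:
        return False               # no recent data: fail safe, do not deploy
    return sum(recent) / len(recent) >= min_accuracy
```

Changing `window_days` from 30 to 60 or 180 trades responsiveness to recent data drift for statistical stability, which is the core tuning decision in the sliding-window approach.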
## Streaming Data - No Static Datasets
In this new paradigm there are no real "datasets" in the sense that the data is static only at the exact moment of training and evaluation. Data becomes more like a dynamic "channel": there is a continuous stream of new training data, and each training run uses a conditional slice of it. The initial training process is effectively just a larger slice of the ongoing stream.
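One way to picture such a "conditional slice", assuming each example in the stream is a dict with `created_at` and `label` fields (an assumption for illustration, not a Diffgram schema):

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, Optional, Set

def training_slice(
    stream: Iterable[dict],
    days: int = 60,
    labels: Optional[Set[str]] = None,
) -> Iterator[dict]:
    """Yield the examples that fall inside this training run's slice of the stream."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    for example in stream:
        if example["created_at"] < cutoff:
            continue               # outside the sliding time window
        if labels is not None and example["label"] not in labels:
            continue               # outside the conditional label filter
        yield example              # part of this run's "dataset"
```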