Classic and New Supervised Machine Learning
Human Supervised Training Data is different from Classic Training Data. It’s new. It has different goals, involves different skill sets, and uses different algorithmic approaches.
Discovery vs Automation
In the Classic case we usually don’t know what the answer is and we wish to discover it.
In the New Supervised case we already know what’s correct, and we wish to structure this understanding so that an AI/ML program can repeat it.
Examples of Classic cases:
- We don’t know what movie preferences someone has so we wish to discover them.
- We don’t know what is causing a weather pattern and we want to discover it.
Examples of New Supervised cases and areas:
- Understand or "read" a document and repeat common actions on it.
- Cashier-less Checkout
- Visual Sports Analysis
- Self Driving
- Computer Vision
- NLP, driven by humans adding annotations
- Time Series
Spreadsheets vs Unstructured Data
In the Classic case (the spreadsheet), the data is already fixed and there is little to label. In the New case, by contrast, the raw data doesn’t mean anything by itself. Humans must add labels to control the meaning.
For example, videos of a street don't mean anything without human labels.
This means the New case has more degrees of freedom and more capabilities. In other words, in the Classic context there is only indirect human control, whereas in the New context there is direct human control.
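A minimal sketch of this contrast, assuming hypothetical names throughout (`classic_row`, `BoxAnnotation`, `VideoSample`, and the file name `street_clip.mp4` are illustrative only and do not come from any particular tool): a tabular record already carries its meaning in its columns, while a raw video only becomes usable once a human attaches labels to it.

```python
from dataclasses import dataclass, field

# Classic case: the row's meaning is already fixed by its column names.
classic_row = {"user_id": 17, "movie_id": 204, "rating": 4.5}


@dataclass
class BoxAnnotation:
    """A human-drawn bounding box on one video frame (hypothetical structure)."""
    label: str
    frame: int
    x: int
    y: int
    width: int
    height: int


@dataclass
class VideoSample:
    """Raw media plus whatever meaning humans have attached to it."""
    path: str
    annotations: list[BoxAnnotation] = field(default_factory=list)

    def labels(self) -> set[str]:
        # The only "meaning" available is what annotators have added.
        return {a.label for a in self.annotations}


# New case: the raw video alone carries no labels at all.
street = VideoSample(path="street_clip.mp4")
labels_before = street.labels()

# A human adds labels, which directly controls what the data means.
street.annotations.append(
    BoxAnnotation("pedestrian", frame=12, x=40, y=80, width=32, height=96)
)
street.annotations.append(
    BoxAnnotation("car", frame=12, x=200, y=120, width=150, height=90)
)
labels_after = street.labels()
```

Here `labels_before` is empty and `labels_after` contains only what the annotator chose to mark, which is the direct human control described above.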
| | Classic | New Supervised |
|---|---|---|
| Use case | Recommender systems | Visual Sports Analysis |
| Human Team | Data Science Primary | End User, Labeler and Subject Matter Expert; Data Science as Partner; Deeper Business and Normal Engineering engagement |
| Feature Store | Feature Store Software | Supervised Training Data Catalog. This fulfills a similar conceptual idea of organizing and searching existing work. |
| Data Prep ETL | Classic ETL tools | Supervised Training Data ETL |
| Data Formats | Tabular, logs, text, series | Videos, images, audio files, geo-spatial, point clouds, unstructured text |
| | Generated by an existing separate system. Generally can’t “change” the raw data, beyond “Cleaning” it. | Capture of raw data + human supervision. Can capture “new” data “after the fact”. Can generate novel data by adding novel Schema. |
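The point about generating novel data by adding novel Schema can be illustrated with a small sketch. Everything here is an assumption made for illustration: the schema is modeled as a plain set of label names, and `labeling_tasks`, the label names, and the frame file names are hypothetical. The raw frames never change, yet extending the schema creates new annotation work and therefore new training data.

```python
# Hypothetical schema: the set of labels humans are asked to apply.
schema_v1 = {"player", "ball"}
schema_v2 = schema_v1 | {"referee"}  # a novel label added after capture

# The raw data is captured once and is not modified afterwards.
raw_frames = ["match_frame_001.jpg", "match_frame_002.jpg"]


def labeling_tasks(frames, schema):
    """Return every (frame, label) pair a human could annotate."""
    return [(frame, label) for frame in frames for label in sorted(schema)]


tasks_v1 = labeling_tasks(raw_frames, schema_v1)
tasks_v2 = labeling_tasks(raw_frames, schema_v2)

# Work that exists only because the schema grew, not because the data did.
new_tasks = [task for task in tasks_v2 if task not in tasks_v1]
```

In the Classic case there is no equivalent step: the columns produced by the upstream system are the only meaning available.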
Best Practices with Architecture
It's important to have a different process for Supervised Training Data. The end users and implementation details are very different. Supervised Training Data must have its own named processes to be successful.