Vertex AI: Train model
Train on the Vertex AI platform with your Diffgram data
Overview
Vertex AI is Google Cloud's managed machine learning platform for training, deploying, and maintaining AI models in one place.
Conceptually, the integration is:
- Integrated. You do not have to write custom scripts to move and transform data from Diffgram.
- Event driven. Training can run when other steps complete, such as pre-labeling, human tasks, or ingestion.
While in theory Vertex AI handles the "hard" part, we have found that getting data to it in the right format can be surprisingly difficult. Diffgram handles this for you, including:
- Migrating data from any cloud provider to a GCP bucket
- Creating a .jsonl file with annotations to import into Vertex AI
- Creating a Vertex AI dataset and importing the files and annotations
- Triggering model training
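To make the .jsonl step concrete, here is a minimal sketch of what one import record might look like. It assumes an object detection task with bounding boxes, which this page does not state explicitly. The field names (`imageGcsUri`, `boundingBoxAnnotations`, `displayName`, `xMin`, and so on) follow the Vertex AI import schema for image object detection; the input shape of `boxes`, the function name, and the bucket path are hypothetical and are not Diffgram's actual export format.

```python
import json


def to_vertex_jsonl_line(gcs_uri, width, height, boxes):
    """Build one JSONL record for a Vertex AI image object detection import.

    `boxes` is a list of dicts with pixel coordinates, for example
    {"label": "car", "x_min": 20, "y_min": 40, "x_max": 100, "y_max": 200}.
    This input shape is an assumption made for illustration.
    """
    annotations = []
    for box in boxes:
        annotations.append({
            "displayName": box["label"],
            # Vertex AI expects coordinates normalized to the range [0, 1].
            "xMin": box["x_min"] / width,
            "yMin": box["y_min"] / height,
            "xMax": box["x_max"] / width,
            "yMax": box["y_max"] / height,
        })
    return json.dumps({
        "imageGcsUri": gcs_uri,
        "boundingBoxAnnotations": annotations,
    })


# Hypothetical image and bucket path, used only to show the record shape.
line = to_vertex_jsonl_line(
    "gs://my-training-bucket/images/0001.jpg",
    width=200,
    height=400,
    boxes=[{"label": "car", "x_min": 20, "y_min": 40, "x_max": 100, "y_max": 200}],
)
print(line)
```

Each image becomes one line in the file, and the file is uploaded to the same bucket as the images it references.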
Diffgram is open source. If you want to contribute more Vertex AI tools as Diffgram Actions, feel free to open a PR in our GitHub repository and follow the Actions Dev guide.
End result:
A trained model on the Vertex AI platform.
Prerequisites
- A working Diffgram installation (either with Docker or directly on diffgram.com)
- A labeled image dataset in Diffgram
- A Google Cloud account with billing enabled
Google Billing
Be aware that Google bills any model training charges directly to your Google account. Diffgram handles the integration only.
1. Create a connection
Vertex AI requires your entire dataset to be stored in Google Cloud Storage. Because Diffgram supports various storage providers, the first step is to create a GCP connection: open the Project context menu and click Connections.
Click + to create a new connection, and select "Vertex AI" from the list:
Fill in all the required fields, and press Test to verify that your connection is working:
2. Set up the workflow
Open the Project menu again and click New Workflow:
To train your Vertex AI model, you need an annotated dataset. For this demo the Default dataset is used, but you can use any Diffgram dataset you want.
- Model name - name that the model will have on Vertex AI
- Bucket name - name of the bucket that will store your dataset. Note that all files in the bucket will be used for model training, so we recommend creating a new, empty bucket
- Connection - select the Vertex AI connection that was created in step 1
- Training node hours - default is set to 20
- Select model type - default is set to MOBILE_TF_VERSATILE_1
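For readers who want to see how these settings might map onto Vertex AI itself, here is a rough sketch using the `google-cloud-aiplatform` Python SDK. This is not Diffgram's implementation; it assumes an object detection task, and the function names, the `jsonl_name` parameter, and the project and location arguments are hypothetical. The SDK is imported lazily so the small conversion helper can be tried without it installed.

```python
def milli_node_hours(node_hours):
    """Convert node hours to the milli node hours the SDK expects (1 node hour = 1,000)."""
    if node_hours <= 0:
        raise ValueError("node_hours must be positive")
    return int(node_hours * 1000)


def train_from_bucket(project, location, bucket_name, jsonl_name, model_name,
                      node_hours=20, model_type="MOBILE_TF_VERSATILE_1"):
    """Create a Vertex AI image dataset from a .jsonl file and train on it.

    Hypothetical helper: requires the google-cloud-aiplatform package and
    valid Google Cloud credentials, and starts a billable training job.
    """
    from google.cloud import aiplatform  # imported lazily, see lead-in

    aiplatform.init(project=project, location=location)

    # Import the images and annotations listed in the .jsonl file.
    dataset = aiplatform.ImageDataset.create(
        display_name=model_name,
        gcs_source=f"gs://{bucket_name}/{jsonl_name}",
        import_schema_uri=aiplatform.schema.dataset.ioformat.image.bounding_box,
    )

    # Assumption: object detection; the model type matches the workflow default.
    job = aiplatform.AutoMLImageTrainingJob(
        display_name=model_name,
        prediction_type="object_detection",
        model_type=model_type,
    )

    # Blocks until training finishes and returns the trained model.
    return job.run(
        dataset=dataset,
        model_display_name=model_name,
        budget_milli_node_hours=milli_node_hours(node_hours),
    )


print(milli_node_hours(20))  # the workflow default of 20 node hours
```

Only `milli_node_hours` runs in the snippet above; calling `train_from_bucket` would start a real training job.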
Node Hours Billing
Google may bill a significant amount even for a seemingly small number of node hours. Check this value before you start training. We suggest trying lower values to start.
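As a quick sanity check before training, you can estimate an upper bound on the cost by multiplying the node hours by the current price per node hour. The sketch below is only that arithmetic: the function name is hypothetical, and the rate in the example is a placeholder, not a real Google price. Look up the current rate on the Vertex AI pricing page for your model type and region.

```python
def estimate_max_training_cost(node_hours, price_per_node_hour):
    """Return an upper bound on training cost for a given node hour budget.

    The node hour budget is a ceiling, so the actual charge may be lower
    if training finishes early.
    """
    if node_hours < 0 or price_per_node_hour < 0:
        raise ValueError("inputs must be non-negative")
    return node_hours * price_per_node_hour


# 3.0 is a placeholder rate for illustration only, not a real price.
placeholder_rate = 3.0
print(estimate_max_training_cost(20, placeholder_rate))
```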
Activate the workflow and click Train. If you refresh the page, you should see a new action run in the list with the status "Running".
After the action is done, its status should change from "Running" to "Finished":
Context
You can connect this action to other steps, like the completion of human tasks.
Inspecting Results
Navigate to Vertex AI to view the results. You can deploy and use the model as you see fit.
Future Work
We are exploring various model prediction methods based on these models. Join the conversation on Slack or GitHub.