Vertex AI: Train model

Train on the Vertex AI platform with your Diffgram data

Overview

Vertex AI is a managed machine learning platform that provides you with all of Google’s cloud services in one place to deploy and maintain AI models.

Conceptually the idea is:

  • Integrated. Saves having to write custom scripts to move and transform data from Diffgram
  • Event driven. Runs when other steps complete, such as pre-labeling, human tasks, ingestion, etc.

While in theory Vertex AI handles the "hard" part, we have found that actually getting data to it in the right format can be surprisingly difficult. Diffgram handles this for you, including:

  • Migrate data from any cloud provider to a GCP bucket
  • Create .jsonl file with annotations to import to Vertex AI
  • Create Vertex AI dataset and import files and annotations
  • Trigger model training
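
As a rough illustration of the .jsonl step, the sketch below builds one import line for an image object detection dataset. It assumes the Vertex AI import schema for bounding boxes (`imageGcsUri` plus `boundingBoxAnnotations` with coordinates normalized to the 0 to 1 range). The function name and the shape of the input boxes are hypothetical and are not taken from Diffgram's backend code.

```python
import json


def to_vertex_jsonl_line(bucket, blob_path, width, height, boxes):
    """Build one JSONL line for a Vertex AI image object detection import.

    `boxes` is a hypothetical input shape: a list of dicts with a label
    and pixel coordinates (x_min, y_min, x_max, y_max).
    """
    annotations = []
    for box in boxes:
        # Vertex AI expects coordinates relative to the image size (0 to 1).
        annotations.append({
            "displayName": box["label"],
            "xMin": box["x_min"] / width,
            "yMin": box["y_min"] / height,
            "xMax": box["x_max"] / width,
            "yMax": box["y_max"] / height,
        })
    record = {
        "imageGcsUri": f"gs://{bucket}/{blob_path}",
        "boundingBoxAnnotations": annotations,
    }
    return json.dumps(record)


if __name__ == "__main__":
    # Example with made-up values: one 200x100 image with a single box.
    print(to_vertex_jsonl_line(
        "my-bucket", "images/cat.jpg", 200, 100,
        [{"label": "cat", "x_min": 50, "y_min": 25, "x_max": 150, "y_max": 75}],
    ))
```

Each image becomes one line in the file. Other annotation types, such as classification, use different fields, so treat this as a sketch for the bounding box case only.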

View the backend code for this action.

Diffgram is open source. If you want to contribute more Vertex AI tools as Diffgram Actions, feel free to open a PR in our GitHub repository and follow the Actions Dev guide.

End result:

A trained model on the Vertex AI platform.

Prerequisites

  1. A working Diffgram installation (either with Docker or directly on diffgram.com)
  2. A labeled image dataset on Diffgram
  3. A Google Cloud account with billing enabled

🚧

Google Billing

Be aware that any charges for model training are billed directly by Google to your Google account. Diffgram handles the integration only.

1. Creating connection

Vertex AI requires you to store your entire dataset on Google Cloud Storage. Because Diffgram supports several storage backends, the first step is to create a GCP connection: open the Project context menu and click Connections.

Click + to create a new connection, and select "Vertex AI" from the list.

Fill in all the required fields, and press Test to verify that your connection is working.

2. Set up the workflow

Open the Project menu again and click New Workflow.

To train your Vertex AI model, you need an annotated dataset. For demo purposes, the Default dataset is used here, but you can use any Diffgram dataset you want.

  • Model name - name of the model that will be used on Vertex AI
  • Bucket name - name of the bucket that will be used to store your dataset. Note that all the files in the bucket will be used for model training, so we recommend creating a new, empty bucket
  • Connection - select the Vertex AI connection that was created in step 1
  • Training node hours - default is set to 20
  • Select model type - default is set to MOBILE_TF_VERSATILE_1

🚧

Node Hours Billing

Google may bill a lot for a seemingly small number of node hours. Be aware of this value. We suggest trying lower values to start.
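
To make the node hours setting concrete, here is a minimal sketch of the workflow fields as a plain Python object. It relies on one fact about Vertex AI: the Python SDK expresses AutoML training budgets in milli node hours (the `budget_milli_node_hours` parameter), so 20 node hours corresponds to 20000. The `TrainingConfig` class and its field names are hypothetical and only mirror the form above; they are not Diffgram's internal representation.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    """Hypothetical mirror of the workflow form fields."""
    model_name: str
    bucket_name: str
    node_hours: int = 20                       # default shown in the form
    model_type: str = "MOBILE_TF_VERSATILE_1"  # default shown in the form

    def __post_init__(self):
        # Catch an obviously invalid budget before any job is started.
        if self.node_hours <= 0:
            raise ValueError("node_hours must be a positive number")

    def budget_milli_node_hours(self) -> int:
        # 1 node hour = 1000 milli node hours.
        return self.node_hours * 1000


if __name__ == "__main__":
    # Start with a lower budget than the default, as suggested above.
    config = TrainingConfig(model_name="demo-model",
                            bucket_name="my-empty-bucket",
                            node_hours=8)
    print(config.budget_milli_node_hours())
```

Checking the converted value before activating the workflow is a cheap way to confirm the budget you are about to be billed for.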

Activate the workflow and click Train. If you refresh the page, you should see a new action run in the list with the status "Running".

After the action is done, its status should change from "Running" to "Finished".

Context

You can connect this action to other steps, like the completion of human tasks.

Inspecting Results

Navigate to Vertex AI to view the results. You can deploy and use the model as you see fit.

Future Work

We are exploring various model prediction methods based on these models. Join the conversation on Slack or GitHub.