Ingesting Instances from MongoDB Tutorial

Add instances and files stored in MongoDB directly into Diffgram

To get started, we'll use PyMongo, the official MongoDB driver for Python, and begin by connecting to the database.


Prerequisites: a MongoDB server running on localhost, plus the pymongo and diffgram Python packages. If you need help setting up a MongoDB server, check out this guide.

1. Connect to the Database

from pymongo import MongoClient

# create a client instance of MongoClient
client = MongoClient('localhost', 27017)

# access the database by name
db = client.mydatabase
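If your MongoDB server requires authentication, MongoClient also accepts a connection string instead of a host and port. PyMongo's documentation notes that usernames and passwords containing special characters must be percent-escaped with urllib.parse.quote_plus. The sketch below builds such a string; build_mongo_uri is a hypothetical helper, and the credentials are placeholders.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Build a mongodb:// connection string (hypothetical helper).

    Credentials are percent-escaped with quote_plus so characters such as
    '@' or ':' in a password do not break the URI.
    """
    if username is not None and password is not None:
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    else:
        credentials = ""
    return f"mongodb://{credentials}{host}:{port}/"


# Placeholder credentials, for illustration only.
uri = build_mongo_uri(username="alice", password="p@ss:word")
print(uri)  # mongodb://alice:p%40ss%3Aword@localhost:27017/
```

You would then create the client with MongoClient(uri) and select the database exactly as above.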

2. (Optional) Create Some Test Data

# select the collection
chat_conversations = db.chat_conversations

# create multiple documents to be inserted
conversations = [
    {"sender": "Alice", "message": "Hello Bob, how are you?", "timestamp": "2022-05-01 10:00:00", "blob_path": "conversation1.blob"},
    {"sender": "Bob", "message": "I'm doing great Alice, thanks for asking!", "timestamp": "2022-05-01 10:05:00", "blob_path": "conversation1.blob"},
    {"sender": "Alice", "message": "That's great to hear Bob!", "timestamp": "2022-05-01 10:10:00", "blob_path": "conversation1.blob"},
    {"sender": "Bob", "message": "How about you? How are you doing?", "timestamp": "2022-05-01 10:15:00", "blob_path": "conversation1.blob"},
    {"sender": "Alice", "message": "I'm doing well too, thanks for asking!", "timestamp": "2022-05-01 10:20:00", "blob_path": "conversation1.blob"}
]

# insert the documents into the collection
result = chat_conversations.insert_many(conversations)

# print the inserted documents' IDs
print("Inserted documents IDs:", result.inserted_ids)
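The sample documents store timestamp as a plain string. If you want MongoDB to treat it as a real date (for example in range queries), you can convert it before inserting, since PyMongo stores Python datetime objects as BSON dates. This is a minimal sketch, assuming the "%Y-%m-%d %H:%M:%S" format used in the test data; with_datetime_timestamps is a hypothetical helper.

```python
from datetime import datetime

# Format of the timestamp strings in the sample documents (assumption).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def with_datetime_timestamps(documents):
    """Return copies of the documents with 'timestamp' parsed into a datetime."""
    converted = []
    for doc in documents:
        new_doc = dict(doc)  # shallow copy so the original dicts stay unchanged
        new_doc["timestamp"] = datetime.strptime(doc["timestamp"], TIMESTAMP_FORMAT)
        converted.append(new_doc)
    return converted


sample = [{"sender": "Alice", "message": "Hello Bob, how are you?",
           "timestamp": "2022-05-01 10:00:00", "blob_path": "conversation1.blob"}]
print(with_datetime_timestamps(sample)[0]["timestamp"])  # 2022-05-01 10:00:00
```

You would then insert the converted list with chat_conversations.insert_many(with_datetime_timestamps(conversations)).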

3. Query the Data

Now we can query the collection and load the documents into a Python variable, ready for ingestion into Diffgram.

# select the collection
chat_conversations = db.chat_conversations

# query all the data from the collection and put it in a variable
conversations_data = list(chat_conversations.find())

# print the conversations data
print(conversations_data)
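Each document returned by find() also carries the _id field MongoDB added on insert, and without a sort the results come back in no guaranteed order. PyMongo can handle both on the server with a projection and Cursor.sort, for example chat_conversations.find({}, {"_id": 0}).sort("timestamp", 1). If you would rather tidy the list you already have, here is a client-side sketch; prepare_for_ingestion is a hypothetical helper, and it relies on zero-padded "YYYY-MM-DD HH:MM:SS" strings sorting chronologically as plain text.

```python
def prepare_for_ingestion(documents):
    """Drop MongoDB's _id field and order messages by their timestamp string."""
    cleaned = [
        {key: value for key, value in doc.items() if key != "_id"}
        for doc in documents
    ]
    # Zero-padded "YYYY-MM-DD HH:MM:SS" strings sort in chronological order.
    return sorted(cleaned, key=lambda doc: doc["timestamp"])


# Integer _id values stand in for the ObjectId values MongoDB would return.
docs = [
    {"_id": 2, "sender": "Bob", "message": "I'm doing great Alice, thanks for asking!",
     "timestamp": "2022-05-01 10:05:00", "blob_path": "conversation1.blob"},
    {"_id": 1, "sender": "Alice", "message": "Hello Bob, how are you?",
     "timestamp": "2022-05-01 10:00:00", "blob_path": "conversation1.blob"},
]
for doc in prepare_for_ingestion(docs):
    print(doc["timestamp"], doc["sender"])
```

In the tutorial flow you would apply it as conversations_data = prepare_for_ingestion(conversations_data).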

4. Connect to Your Diffgram Project

from diffgram import Project
# connect to your Diffgram project with your API credentials
project = Project(
    project_string_id="<your_project_id>",
    client_id="<your_client_id>",
    client_secret="<your_client_secret>"
)
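Hard-coding the client ID and secret in a script makes them easy to leak, for example by committing the file. A common alternative is to read them from environment variables. The sketch below assumes the hypothetical variable names DIFFGRAM_PROJECT_STRING_ID, DIFFGRAM_CLIENT_ID and DIFFGRAM_CLIENT_SECRET; Diffgram does not require these names.

```python
import os

# Hypothetical environment variable names mapped to Project() keyword arguments.
ENV_VARS = {
    "project_string_id": "DIFFGRAM_PROJECT_STRING_ID",
    "client_id": "DIFFGRAM_CLIENT_ID",
    "client_secret": "DIFFGRAM_CLIENT_SECRET",
}


def load_diffgram_credentials(environ=os.environ):
    """Collect the Project() keyword arguments from environment variables."""
    missing = [name for name in ENV_VARS.values() if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {arg: environ[name] for arg, name in ENV_VARS.items()}
```

With the variables set, the connection becomes project = Project(**load_diffgram_credentials()). Passing a plain dict as environ makes the helper easy to check without touching the real environment.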

5. Ingest the Data into Diffgram

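# upload each MongoDB document's message as a text file in Diffgram
for conversation in conversations_data:
    file_name = conversation['blob_path']  # file name to use in Diffgram
    text_data = conversation['message']    # text content of the file
    project.file.from_text_data(file_name=file_name, text_data=text_data)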
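The loop above calls from_text_data once per message, so the five sample messages become five separate uploads that all use the name conversation1.blob. If you want one file per conversation instead, you can group the messages by blob_path first. A minimal sketch, assuming the documents are already in chronological order; group_messages_by_file is a hypothetical helper, and prefixing each line with the sender is an illustrative choice, not something Diffgram requires.

```python
from collections import defaultdict


def group_messages_by_file(documents):
    """Join each conversation's messages into one text, keyed by blob_path."""
    grouped = defaultdict(list)
    for doc in documents:
        # One "Sender: message" line per document, in the order given.
        grouped[doc["blob_path"]].append(f'{doc["sender"]}: {doc["message"]}')
    return {path: "\n".join(lines) for path, lines in grouped.items()}


messages = [
    {"sender": "Alice", "message": "Hello Bob, how are you?", "blob_path": "conversation1.blob"},
    {"sender": "Bob", "message": "I'm doing great Alice, thanks for asking!", "blob_path": "conversation1.blob"},
]
print(group_messages_by_file(messages)["conversation1.blob"])
```

You would then loop over the returned dictionary's items and call project.file.from_text_data(file_name=..., text_data=...) once per conversation.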

You should now see the data in the Import section of your Diffgram project, ready to be annotated!