Ingesting Instances from MongoDB Tutorial

Add Instances and Files from MongoDB Directly into Diffgram

To get started, we'll use pymongo, the MongoDB Python driver, and begin by connecting to the database.

Pre-Requisites

A MongoDB service running on localhost. If you need help setting up a MongoDB server, check out this guide.
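
Before moving on, it can help to confirm that something is actually listening on the MongoDB port. The following is a minimal sketch using only the Python standard library. `is_port_open` is a hypothetical helper name, and 27017 is assumed because it is the port used in the connection step of this tutorial. It only checks TCP reachability, not that the listener is really MongoDB.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 27017, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers connection refused, timeouts, and name-resolution failures
        return False


if __name__ == "__main__":
    print("MongoDB port reachable:", is_port_open())
```

If this prints `False`, the server is probably not running or is bound to a different host or port.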

1. Connect to the Database

from pymongo import MongoClient

# create a client instance of MongoClient
client = MongoClient('localhost', 27017)

# access the database by name
db = client.mydatabase

2. (Optional) Create Some Test Data

# select the collection
chat_conversations = db.chat_conversations

# create multiple documents to be inserted
conversations = [
    {"sender": "Alice", "message": "Hello Bob, how are you?", "timestamp": "2022-05-01 10:00:00", "blob_path": "conversation1.blob"},
    {"sender": "Bob", "message": "I'm doing great Alice, thanks for asking!", "timestamp": "2022-05-01 10:05:00", "blob_path": "conversation1.blob"},
    {"sender": "Alice", "message": "That's great to hear Bob!", "timestamp": "2022-05-01 10:10:00", "blob_path": "conversation1.blob"},
    {"sender": "Bob", "message": "How about you? How are you doing?", "timestamp": "2022-05-01 10:15:00", "blob_path": "conversation1.blob"},
    {"sender": "Alice", "message": "I'm doing well too, thanks for asking!", "timestamp": "2022-05-01 10:20:00", "blob_path": "conversation1.blob"}
]

# insert the documents into the collection
result = chat_conversations.insert_many(conversations)

# print the inserted documents' IDs
print("Inserted documents IDs:", result.inserted_ids)

3. Query the Data

Now we can query the data and load it into a Python variable, ready for ingestion into Diffgram.

# select the collection
chat_conversations = db.chat_conversations

# query all the data from the collection and put it in a variable
conversations_data = list(chat_conversations.find())

# print the conversations data
print(conversations_data)
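
In the sample data, all five messages share the same `blob_path`, so one possible preparation step is to combine the messages for each `blob_path` into a single text before ingestion. The sketch below is one way to do that with plain Python. `group_messages_by_blob` is a hypothetical helper, and the `sender: message` line format is an assumption, not something Diffgram requires.

```python
from collections import defaultdict


def group_messages_by_blob(documents):
    """Map each blob_path to one text with its messages in timestamp order."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[doc["blob_path"]].append(doc)

    texts = {}
    for blob_path, docs in grouped.items():
        # "YYYY-MM-DD HH:MM:SS" strings, as in the test data, sort chronologically as text
        docs.sort(key=lambda d: d["timestamp"])
        texts[blob_path] = "\n".join(f'{d["sender"]}: {d["message"]}' for d in docs)
    return texts


if __name__ == "__main__":
    # two documents shaped like the ones returned by chat_conversations.find()
    sample = [
        {"sender": "Bob", "message": "I'm doing great Alice, thanks for asking!",
         "timestamp": "2022-05-01 10:05:00", "blob_path": "conversation1.blob"},
        {"sender": "Alice", "message": "Hello Bob, how are you?",
         "timestamp": "2022-05-01 10:00:00", "blob_path": "conversation1.blob"},
    ]
    for path, text in group_messages_by_blob(sample).items():
        print(path)
        print(text)
```

Whether to ingest one file per message or one file per conversation depends on how the data should be annotated. This grouping is only one option.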

4. Connect to Your Diffgram Project

from diffgram import Project

project = Project(
    project_string_id = "<your_project_id>",
    client_id = "<your_client_id>",
    client_secret = "<your_client_secret>"
)
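
Rather than pasting credentials into the script, they can be read from environment variables. This is a minimal sketch. The variable names `DIFFGRAM_PROJECT_STRING_ID`, `DIFFGRAM_CLIENT_ID`, and `DIFFGRAM_CLIENT_SECRET`, and the helper `load_diffgram_credentials`, are made up for illustration and are not defined by the Diffgram SDK.

```python
import os

# hypothetical environment variable names, mapped to the Project keyword arguments
ENV_TO_KWARG = {
    "DIFFGRAM_PROJECT_STRING_ID": "project_string_id",
    "DIFFGRAM_CLIENT_ID": "client_id",
    "DIFFGRAM_CLIENT_SECRET": "client_secret",
}


def load_diffgram_credentials(env=None):
    """Build the keyword arguments for Project from environment variables."""
    env = os.environ if env is None else env
    # collect every missing or empty variable so the error names all of them at once
    missing = [name for name in ENV_TO_KWARG if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[name] for name, kwarg in ENV_TO_KWARG.items()}
```

The result can then be passed along as `Project(**load_diffgram_credentials())`.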

5. Ingest the Data into Diffgram

for conversation in conversations_data:
    file_name = conversation['blob_path']
    text_data = conversation['message']
    project.file.from_text_data(file_name=file_name, text_data=text_data)
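
If one upload raises an error, the loop above stops at that document. A possible variation, sketched below, keeps going and records what failed. To keep the example runnable without a Diffgram project, the upload call is passed in as a function. `ingest_conversations` and its `upload` parameter are hypothetical names. In practice, `project.file.from_text_data` from the loop above would be passed as `upload`.

```python
def ingest_conversations(documents, upload):
    """Upload each document's message; return (uploaded_count, failures)."""
    uploaded = 0
    failures = []
    for doc in documents:
        try:
            upload(file_name=doc["blob_path"], text_data=doc["message"])
            uploaded += 1
        except Exception as exc:
            # broad on purpose: the SDK's exception types are not assumed here,
            # and a document missing a key (KeyError) is recorded the same way
            failures.append((doc.get("_id"), repr(exc)))
    return uploaded, failures
```

A call might look like `uploaded, failures = ingest_conversations(conversations_data, project.file.from_text_data)`, after which `failures` lists the MongoDB `_id` and error for each document that was not uploaded.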

You should now see the data in the Import section of your Diffgram project, ready to be annotated!

