Ingesting Instances from MongoDB Tutorial
Add instances and files from MongoDB directly into Diffgram
To get started, we'll use PyMongo, the MongoDB Python driver, and begin by connecting to the database.
Prerequisites
A MongoDB service running on localhost. If you need help setting up a MongoDB server, check out this guide.
1. Connect to the Database
from pymongo import MongoClient
# create a client instance of MongoClient
client = MongoClient('localhost', 27017)
# access the database by name
db = client.mydatabase
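MongoClient also accepts a connection string in place of separate host and port arguments, which helps when the server is not on localhost or requires authentication. The following is a minimal sketch of building such a string; the helper name build_mongo_uri and the credentials are hypothetical, and the only firm requirement it illustrates is that usernames and passwords containing reserved characters must be percent-escaped.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Build a MongoDB connection string, escaping credentials if given."""
    if username is not None and password is not None:
        # Reserved characters such as '@' or ':' must be percent-escaped.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    else:
        credentials = ""
    return f"mongodb://{credentials}{host}:{port}"


# Default values match the localhost setup used in this tutorial.
local_uri = build_mongo_uri()

# Hypothetical credentials, for illustration only.
secured_uri = build_mongo_uri(username="app_user", password="p@ss:word")

print(local_uri)
print(secured_uri)
```

The resulting string can be passed as the single argument to MongoClient, for example MongoClient(local_uri).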
2. (Optional) Create Some Test Data
# select the collection
chat_conversations = db.chat_conversations
# create multiple documents to be inserted
conversations = [
    {"sender": "Alice", "message": "Hello Bob, how are you?", "timestamp": "2022-05-01 10:00:00", "blob_path": "conversation1.blob"},
    {"sender": "Bob", "message": "I'm doing great Alice, thanks for asking!", "timestamp": "2022-05-01 10:05:00", "blob_path": "conversation1.blob"},
    {"sender": "Alice", "message": "That's great to hear Bob!", "timestamp": "2022-05-01 10:10:00", "blob_path": "conversation1.blob"},
    {"sender": "Bob", "message": "How about you? How are you doing?", "timestamp": "2022-05-01 10:15:00", "blob_path": "conversation1.blob"},
    {"sender": "Alice", "message": "I'm doing well too, thanks for asking!", "timestamp": "2022-05-01 10:20:00", "blob_path": "conversation1.blob"}
]
# insert the documents into the collection
result = chat_conversations.insert_many(conversations)
# print the inserted documents' IDs
print("Inserted documents IDs:", result.inserted_ids)
3. Query the Data
Now we can query the data and load it into a Python variable to prepare it for ingestion into Diffgram.
# select the collection
chat_conversations = db.chat_conversations
# query all the data from the collection and put it in a variable
conversations_data = list(chat_conversations.find())
# print the conversations data
print(conversations_data)
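Documents returned by find() include MongoDB's automatically generated "_id" field and come back in no guaranteed chronological order, so it can help to tidy them before ingestion. Below is a minimal sketch of one way to do that in plain Python. The helper prepare_records is hypothetical, and sample_documents is a stand-in for conversations_data (with integers in place of real ObjectId values) so the snippet runs without a database.

```python
from datetime import datetime

# Stand-in for list(chat_conversations.find()); real documents carry an
# ObjectId in "_id" rather than the integers used here.
sample_documents = [
    {"_id": 2, "sender": "Bob", "message": "I'm doing great Alice, thanks for asking!",
     "timestamp": "2022-05-01 10:05:00", "blob_path": "conversation1.blob"},
    {"_id": 1, "sender": "Alice", "message": "Hello Bob, how are you?",
     "timestamp": "2022-05-01 10:00:00", "blob_path": "conversation1.blob"},
]


def prepare_records(documents):
    """Drop the "_id" field and sort the messages chronologically."""
    cleaned = [{k: v for k, v in doc.items() if k != "_id"} for doc in documents]
    return sorted(
        cleaned,
        key=lambda d: datetime.strptime(d["timestamp"], "%Y-%m-%d %H:%M:%S"),
    )


records = prepare_records(sample_documents)
for record in records:
    print(record["timestamp"], record["sender"], record["message"])
```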
4. Connect to Your Diffgram Project
from diffgram import Project
# connect to the project using your API credentials
project = Project(
    project_string_id = "<your_project_id>",
    client_id = "<your_client_id>",
    client_secret = "<your_client_secret>"
)
5. Ingest the Data into Diffgram
for conversation in conversations_data:
    file_name = conversation['blob_path']
    text_data = conversation['message']
    project.file.from_text_data(file_name=file_name, text_data=text_data)
You should now see the data in the Import section of your Diffgram project, ready to be annotated!
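Note that in the test data from step 2, every message shares the same blob_path, so the loop in step 5 calls from_text_data once per message with an identical file name. If you would rather ingest one text file per conversation, a possible approach is to group the messages by blob_path first. The sketch below is an assumption about how you might do this, not a Diffgram requirement: build_transcripts is a hypothetical helper, and the "conversation2.blob" entry is invented to show the grouping.

```python
from collections import defaultdict

# Stand-in for conversations_data; the conversation2.blob entry is made up.
sample_documents = [
    {"sender": "Bob", "message": "I'm doing great Alice, thanks for asking!",
     "timestamp": "2022-05-01 10:05:00", "blob_path": "conversation1.blob"},
    {"sender": "Alice", "message": "Hello Bob, how are you?",
     "timestamp": "2022-05-01 10:00:00", "blob_path": "conversation1.blob"},
    {"sender": "Carol", "message": "Is anyone there?",
     "timestamp": "2022-05-02 09:00:00", "blob_path": "conversation2.blob"},
]


def build_transcripts(documents):
    """Return a dict mapping each blob_path to one chronological transcript."""
    grouped = defaultdict(list)
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
    for doc in sorted(documents, key=lambda d: d["timestamp"]):
        grouped[doc["blob_path"]].append(f'{doc["sender"]}: {doc["message"]}')
    return {path: "\n".join(lines) for path, lines in grouped.items()}


transcripts = build_transcripts(sample_documents)
for file_name, text_data in transcripts.items():
    print(file_name)
    print(text_data)
    # With a connected project, each transcript could then be ingested with:
    # project.file.from_text_data(file_name=file_name, text_data=text_data)
```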