Production Installation
Overview
In this article we will cover:
- Planning
- Key technical resources
- Step-by-step guides for AWS, GCP, and Azure.
Planning
Diffgram is installed on your hardware. You control the installation process.
Resources:
- Conceptual overview of Install Setup tasks.
- Backend Architecture overview.
An initial TODO list looks something like:
- Read this page and the relevant linked resources
- Choose your hardware plan, for example spinning up a new K8s cluster or installing on an existing one
- Follow the steps in the tutorials
Kubernetes (K8s) Resources
We recommend Kubernetes for production installations.
Kubernetes Helm Chart Example
Kubernetes GitHub Action Example
Database Assumptions
For production, we currently support only a managed PostgreSQL database service. A database hosted inside the cluster is supported only for testing and staging.
During the Helm installation process, the database schema is created automatically.
During updates, migrations are attempted automatically using Alembic. You may also run migrations manually using the provided Alembic scripts in shared/alembic.
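A manual migration run can be sketched as follows. This is a minimal sketch and not Diffgram's own tooling: it assumes the standard `alembic` command-line tool is installed, and that `workdir` (a hypothetical parameter) points at the directory holding the project's `alembic.ini`. The helper names `alembic_command` and `run_migration` are illustrative only. By default the sketch only renders the command, so it can be reviewed before anything touches the database.

```python
import shlex
import subprocess
from typing import List


def alembic_command(action: str = "upgrade", revision: str = "head") -> List[str]:
    """Build the argv for an Alembic CLI call, e.g. `alembic upgrade head`."""
    if action not in ("upgrade", "downgrade"):
        raise ValueError(f"unsupported action: {action}")
    return ["alembic", action, revision]


def run_migration(workdir: str, dry_run: bool = True) -> str:
    """Render the upgrade command and, unless dry_run is set, execute it.

    Assumption: `workdir` is the directory that contains alembic.ini.
    """
    argv = alembic_command()
    rendered = shlex.join(argv)
    if not dry_run:
        # check=True raises CalledProcessError if the migration fails.
        subprocess.run(argv, cwd=workdir, check=True)
    return rendered


if __name__ == "__main__":
    # Dry run: prints the command without executing it.
    print(run_migration(".", dry_run=True))
```

Taking a database backup or snapshot through your managed database service before running with `dry_run=False` is a sensible precaution.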
Rolling Updates
The system is generally resilient to rolling updates, but we still recommend performing upgrades during non-peak hours.
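One way to act on this recommendation is to gate upgrades on an off-peak window and then watch the rollout. The sketch below is illustrative only: the window hours, the deployment name, and the namespace are assumptions rather than Diffgram defaults, and the function names are hypothetical. It builds a standard `kubectl rollout status` command but does not run it.

```python
from datetime import datetime
from typing import List


def in_maintenance_window(now: datetime, start_hour: int = 1, end_hour: int = 5) -> bool:
    """Return True if `now` falls inside the off-peak window [start_hour, end_hour).

    Windows that wrap past midnight (e.g. 22 to 4) are handled as well.
    The default hours are placeholders; pick ones that match your own traffic.
    """
    if start_hour <= end_hour:
        return start_hour <= now.hour < end_hour
    return now.hour >= start_hour or now.hour < end_hour


def rollout_status_command(deployment: str, namespace: str, timeout_seconds: int = 300) -> List[str]:
    """Build a `kubectl rollout status` argv that waits for a rollout to finish."""
    return [
        "kubectl", "rollout", "status",
        f"deployment/{deployment}",
        "--namespace", namespace,
        f"--timeout={timeout_seconds}s",
    ]


if __name__ == "__main__":
    if in_maintenance_window(datetime.now()):
        # "example-service" and "diffgram" are hypothetical names.
        print(" ".join(rollout_status_command("example-service", "diffgram")))
    else:
        print("Outside the off-peak window; consider postponing the upgrade.")
```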
Production Data Protection
Before starting, please take care to consider hardware resources and configuration, and read the production installation guidance above. Misconfigured setups can lead to data loss.
Step By Step Guides
Please note that because systems change over time, the tutorials may not reflect the exact UI or steps required. Contact us with any support questions here.
AWS
Azure
Bare-metal
You may run Diffgram on any infrastructure that supports Python/Docker/K8s. A storage option for bare-metal is MinIO.
After Installation
After installation, your team can start to set up and configure Diffgram.
We also recommend setting up:
- Monitoring and alerting
- Practicing accessing logs
- Practicing monitoring resource usage
- Dry run of updating to the latest version
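For a Kubernetes installation, the practice items above map onto a few standard `kubectl` and `helm` commands. The sketch below only assembles those commands so they can be reviewed or copied. The namespace, deployment, release, and chart values are hypothetical placeholders, not names taken from the Diffgram chart, and `kubectl top` assumes a metrics server is running in the cluster.

```python
from typing import Dict, List


def practice_commands(namespace: str, deployment: str, release: str, chart: str) -> Dict[str, List[str]]:
    """Return one command per post-install practice task."""
    return {
        # Accessing logs: last 100 lines from the deployment's pods.
        "logs": [
            "kubectl", "logs", f"deployment/{deployment}",
            "--namespace", namespace, "--tail=100",
        ],
        # Monitoring resource usage: CPU and memory per pod.
        "resource_usage": ["kubectl", "top", "pods", "--namespace", namespace],
        # Dry run of an update: render the upgrade without applying it.
        "upgrade_dry_run": [
            "helm", "upgrade", release, chart,
            "--namespace", namespace, "--dry-run",
        ],
    }


if __name__ == "__main__":
    # All four argument values below are placeholders for illustration.
    commands = practice_commands("diffgram", "example-service", "my-release", "./path/to/chart")
    for task, argv in commands.items():
        print(f"{task}: {' '.join(argv)}")
```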
Updates
Resources
Open Issues
https://github.com/diffgram/diffgram/issues/853
Additional Context
Why Kubernetes for Production?
We recommend K8s, in part because:
- It supports rolling updates, so it is easy to get the latest version and update, for example with GitHub Actions.
- It scales compute resources: you can easily scale up for ingestion or more users, and automatically scale back down.
- It provides a standard security model, including ingress and setup for an optional HTTP site for easy access.
And more.
K8s is a complicated system, but in general you do not need to know much about it to use it with Diffgram. Especially on Azure and GCP, you can be up and running in a few hours and there is very little maintenance required in most cases.
While there is a small hardware cost for the K8s control node, for most production installations this cost will save a lot of administrative headaches and give you a much better overall experience.
We are here to support you. With Enterprise Training Data Platform, optional services are available, including:
- One-time initial setup of your K8s cluster (planning support only, or turnkey)
- Ongoing, fully managed service of your K8s cluster(s)
Docker Compose Not Supported for Production
Use of Docker Compose in production voids all SLAs and support terms.
K8s Alternatives
K8s alternatives may be supported under sales agreements.