Production Installation

Overview

In this article we will cover:

  • Planning
  • Key technical resources
  • Step-by-step guides for AWS, GCP, and Azure.

Planning

Diffgram is installed on your hardware. You control the installation process.

Resources:

An initial TODO list looks something like:

  • Reading this page and relevant linked resources
  • Choosing your hardware plan, for example spinning up a new K8s cluster or installing on an existing one
  • Following the steps in the Tutorials.

Kubernetes (K8s) Resources

We recommend Kubernetes for production installations.

Kubernetes Helm Chart Example

Kubernetes Helm

Kubernetes GitHub Action Example

GitHub Actions

Database Assumptions

For production, we currently support only a managed PostgreSQL database service. A database hosted inside the cluster is supported only for testing and staging.
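A deployment script can guard against accidentally pointing a production install at a cluster-hosted database. The sketch below is a heuristic and is not part of Diffgram. It assumes the database host is available as a plain hostname string and treats bare service names and names ending in the default Kubernetes in-cluster DNS suffixes as cluster-hosted. The function names and the `environment` labels are hypothetical.

```python
# Heuristic preflight check (illustrative only; not part of Diffgram).
# Assumption: in-cluster services resolve through Kubernetes DNS names such as
# "<service>.<namespace>.svc.cluster.local" or a bare service name.

IN_CLUSTER_SUFFIXES = (".svc", ".svc.cluster.local", ".cluster.local")


def looks_cluster_hosted(db_host: str) -> bool:
    """Return True if the hostname looks like a Kubernetes in-cluster service."""
    host = db_host.strip().lower().rstrip(".")
    if "." not in host:
        # A bare name such as "postgres" is typically a same-namespace service.
        return True
    return host.endswith(IN_CLUSTER_SUFFIXES)


def check_database_host(db_host: str, environment: str) -> None:
    """Raise ValueError if production points at a cluster-hosted database."""
    if environment == "production" and looks_cluster_hosted(db_host):
        raise ValueError(
            f"{db_host!r} looks like an in-cluster database; "
            "production requires a managed PostgreSQL service."
        )
```

Because this is only a naming heuristic, a passing check does not prove the database is a managed service. It catches the common misconfiguration early and nothing more.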

During the Helm installation process, the database schema is created automatically.

During updates, the system attempts to run migrations automatically using Alembic. You may also run migrations manually using the provided Alembic scripts in shared/alembic.
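A manual migration can be sketched as a thin wrapper around the standard Alembic CLI command `alembic upgrade head`. This is a minimal sketch, not Diffgram's own tooling. It assumes Alembic is installed, that `working_dir` is whichever directory in your checkout holds the Alembic configuration for the scripts in shared/alembic, and that the database connection is already configured. The function names and the `dry_run` flag are hypothetical.

```python
import subprocess
from typing import List


def alembic_upgrade_command(revision: str = "head") -> List[str]:
    """Build the Alembic CLI invocation that upgrades to `revision`."""
    return ["alembic", "upgrade", revision]


def run_migration(working_dir: str, revision: str = "head",
                  dry_run: bool = True) -> List[str]:
    """Run (or, with dry_run=True, only return) the upgrade command.

    working_dir is an assumption: the directory containing the Alembic
    configuration file for your installation.
    """
    cmd = alembic_upgrade_command(revision)
    if not dry_run:
        # check=True raises CalledProcessError if the migration fails.
        subprocess.run(cmd, cwd=working_dir, check=True)
    return cmd
```

Defaulting to `dry_run=True` makes it possible to review the command before it touches the database. Taking a backup first is a sensible precaution.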

Rolling Updates

The system is generally resilient to rolling updates, but we still recommend performing upgrades during non-peak hours.
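An upgrade script can enforce the non-peak recommendation with a simple time-window guard. The sketch below is illustrative only. The 01:00 to 05:00 window is a hypothetical placeholder, so pick hours that match your own traffic. The helper is not part of Diffgram.

```python
from datetime import datetime, time
from typing import Optional

# Hypothetical maintenance window in local time; adjust to your traffic.
WINDOW_START = time(1, 0)
WINDOW_END = time(5, 0)


def in_maintenance_window(now: Optional[datetime] = None,
                          start: time = WINDOW_START,
                          end: time = WINDOW_END) -> bool:
    """Return True if `now` falls inside the [start, end) window."""
    current = (now or datetime.now()).time()
    if start <= end:
        return start <= current < end
    # The window wraps past midnight, for example 22:00 to 04:00.
    return current >= start or current < end
```

A deploy job could call `in_maintenance_window()` first and exit early, or ask for explicit confirmation, when it returns False.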

Production Data Protection

Before starting, carefully consider your hardware resources and configuration, and read the production installation guide above. Misconfigured setups can lead to data loss.

Step By Step Guides

Please note that because systems change over time, the tutorials may not reflect the exact UI or steps required. Contact us with any support questions here.

AWS

AWS Guide

Azure

Azure Guide

Bare-metal

You may run Diffgram on any infrastructure that supports Python/Docker/K8s. A storage option for bare-metal is MinIO.

After Installation

After installation, your team can start to set up and configure Diffgram.

We also recommend setting up:

  • Monitoring and alerting
  • Practicing accessing logs
  • Practicing monitoring resource usage
  • Dry run of updating to the latest version
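For practicing log access and resource monitoring on a K8s install, the standard `kubectl logs` and `kubectl top pods` commands are a reasonable starting point. The sketch below only builds and runs those commands. The workload and namespace names in the usage note are hypothetical placeholders, and `kubectl top` assumes a metrics server is available in the cluster.

```python
import subprocess
from typing import List


def logs_command(workload: str, namespace: str, tail: int = 100) -> List[str]:
    """Build a `kubectl logs` call for a workload such as 'deployment/<name>'."""
    return ["kubectl", "logs", workload, "--namespace", namespace,
            f"--tail={tail}"]


def top_pods_command(namespace: str) -> List[str]:
    """Build a `kubectl top pods` call (needs a metrics server in the cluster)."""
    return ["kubectl", "top", "pods", "--namespace", namespace]


def run(cmd: List[str]) -> str:
    """Run a command and return its standard output as text."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run(logs_command("deployment/my-diffgram-service", "diffgram"))` would print the last 100 log lines. Both names here are placeholders, so substitute the deployment and namespace from your own installation.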

Updates

Update Production

Resources

Open Issues

https://github.com/diffgram/diffgram/issues/853

Additional Context

Why Kubernetes for Production?

We recommend K8s, in part because it:

  1. Provides support for rolling updates. It's easy to get the latest version and update. See the GitHub Actions example.
  2. Scales compute resources: easily scale up for ingestion or more users, and auto-scale down.
  3. Provides a standard security model, including ingress and setup for an optional HTTP site for easy access.
    And more.

K8s is a complicated system, but in general you do not need to know much about it to use it with Diffgram. Especially on Azure and GCP, you can be up and running in a few hours and there is very little maintenance required in most cases.

While there is a small hardware cost for the K8s control node, for most production installations this small cost will save a lot of human admin headaches and give you a much better overall experience.

We are here to support you. With the Enterprise Training Data Platform, optional services are available, including:

  1. One-time initial setup of your K8s cluster (planning support only or turnkey)
  2. Ongoing, fully managed service of your K8s cluster(s)

Docker Compose Not Supported for Production

Use of Docker Compose in production voids all SLAs and support terms.

K8s Alternatives

Alternatives to K8s may be supported under sales agreements.