Capacity Planning
Audience
This document is for engineers.
Introduction
To capacity plan your Diffgram cluster, there are a few things to consider:
- End Users
- Media Processing Profile
- Workflow
- Database
- Node Pools and AutoScaling
The intent of these specifications is to balance safe load handling against reasonable cost as a starting point. They can then be refined to your specific needs.
End Users
Factors
- Number of simultaneously active users
- Media type
Guideline for K8s compute:
1 vCPU per 10 simultaneous users for heavy usage
1 vCPU per 20 simultaneous users for moderate usage
Memory
2 GB for every 1 vCPU
Replica to vCPU ratio:
4 vCPUs per Replica
So if 20 vCPUs are needed, we suggest 5 replicas with 4 vCPUs each.
Minimum
2 vCPU cores
4 GB memory
24 GB disk
Simultaneous Users
For example, if you have a team of 100 people but only 20 working at any given time, you can use 20 to scope capacity. Put the other way around: a system planned for 20 simultaneous users can easily serve hundreds of total users, so long as only 20 are using it at any given moment.
AutoScale
We recommend using AutoScale. For example, you may need 10x the capacity during working hours and approaching zero capacity at night.
Database
The database is usually the heaviest component for end-user operations.
1 vCPU per 5 simultaneous users for heavy usage
1 vCPU per 10 simultaneous users for moderate usage
Storage
100 GB per 100 million annotations
So for example, for heavy usage, for 100 simultaneously active users, we recommend 20 vCPU cores for the database. 20 simultaneously active users is 4 vCPU cores.
In general, memory settings will follow CPU settings, at roughly 5 GB of memory per vCPU.
Note that the CPU-to-memory ratio differs between the application services and the database.
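The database guidelines above can be sketched the same way. The helper name and rounding are illustrative:

```python
import math

def size_database(simultaneous_users, annotations, heavy=True):
    """Rough database sizing from the guidelines above."""
    users_per_vcpu = 5 if heavy else 10   # heavy vs moderate usage
    vcpus = math.ceil(simultaneous_users / users_per_vcpu)
    memory_gb = vcpus * 5                 # ~5 GB memory per vCPU
    storage_gb = 100 * math.ceil(annotations / 100_000_000)  # 100 GB per 100M annotations
    return vcpus, memory_gb, storage_gb

# e.g. 100 heavy users and 250M annotations -> 20 vCPUs, 100 GB memory, 300 GB storage
print(size_database(100, 250_000_000))
```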
Database Connections
Each instance of the default or walrus service can use one or more database connections. The connection pool size is controlled by the DATABASE_CONNECTION_POOL_SIZE environment variable.
The ideal size for a connection pool is
- large enough that no client has to wait for a connection during normal load
- as small as possible within the above limit
You can read more about optimizing Postgres connections here: https://www.cybertec-postgresql.com/en/estimating-connection-pool-size-with-postgresql-database-statistics/
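As a back-of-the-envelope check, assuming each Gunicorn worker in each pod can fill its own pool of up to DATABASE_CONNECTION_POOL_SIZE connections, the total demand on Postgres can be estimated as follows. The helper name is hypothetical:

```python
def total_db_connections(pods, workers_per_pod, pool_size):
    # Upper bound: every worker in every pod may fill its own connection pool.
    return pods * workers_per_pod * pool_size

# e.g. 2 pods x 3 Gunicorn workers x DATABASE_CONNECTION_POOL_SIZE=5
# -> up to 30 connections. Compare this against the Postgres
# max_connections setting (default 100), leaving headroom for
# other services and admin sessions.
print(total_db_connections(2, 3, 5))
```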
Gunicorn
The Gunicorn entry point, including the number of workers, is defined in the Dockerfiles.
Workers
Each worker acts as a replica, so if you have 3 walrus workers per pod and 2 pods, you will have 6 total copies.
ENTRYPOINT gunicorn --bind :8080 --timeout 120 --worker-class sync --workers 3 --no-sendfile main:app
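The worker-replica relationship above amounts to a simple multiplication. The helper name is illustrative:

```python
def total_service_copies(pods, workers_per_pod):
    # Each Gunicorn worker acts as a replica of the service.
    return pods * workers_per_pod

# 2 pods x 3 walrus workers -> 6 total copies, as in the example above
print(total_service_copies(2, 3))
```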
Optimizing
These recommendations are just meant as a rough benchmark to collect data and then optimize for your specific case.
It's unlikely you will need substantially more than this, and quite likely you can reduce these numbers once usage patterns are established.
If you are getting into a really high tier of hardware and it feels off, please create a support ticket. There may be queries specific to your case that we can optimize to reduce hardware load.
Media Processing (Walrus)
Factors
- Pass by reference, or Diffgram processing the media
- Media type
- Expected throughput
- Event driven or more batch focused
In general, if you are using a pass by reference method, the Media Processing (Walrus) service should not need to scale very much.
Memory settings are dependent on media type.
At a minimum, memory must be double the size of the largest single file (e.g. for a 10 GB video, at least 20 GB). For image cases this is usually not relevant, since a 5 MB image is 200x smaller than 1 GB of memory.
Guideline:
60 images per minute per vCPU
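The memory and throughput guidelines above can be sketched together. The helper name is illustrative; Diffgram does not ship this function:

```python
import math

def size_walrus(largest_file_gb, images_per_minute):
    """Rough Media Processing (Walrus) sizing from the guidelines above."""
    memory_gb = 2 * largest_file_gb            # at least double the largest single file
    vcpus = math.ceil(images_per_minute / 60)  # ~60 images per minute per vCPU
    return memory_gb, vcpus

# e.g. 10 GB max video and 300 images/minute -> 20 GB memory, 5 vCPUs
print(size_walrus(10, 300))
```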
Workflow
If you are using machine learning workflows installed directly on Diffgram, you may need to configure hardware on a per-installed-application basis.
Node Pools
The default recommendation for each node is Standard_D16as_v4 or similar: 16 vCPUs and 64 GB of memory.
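Given totals from the earlier sections, a rough node count for a Standard_D16as_v4-class pool can be sketched as follows. The helper name is illustrative:

```python
import math

def nodes_needed(total_vcpus, total_memory_gb, node_vcpus=16, node_memory_gb=64):
    """Node count for a Standard_D16as_v4-class pool (16 vCPU / 64 GB each)."""
    by_cpu = math.ceil(total_vcpus / node_vcpus)
    by_mem = math.ceil(total_memory_gb / node_memory_gb)
    return max(by_cpu, by_mem, 2)   # keep at least 2 nodes for rolling updates

# e.g. 40 vCPUs and 100 GB of memory -> 3 nodes
print(nodes_needed(40, 100))
```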
AutoScaling
The default recommendation is to use AutoScaling to handle variance in load.
It should be able to scale to a minimum of 2 nodes to do updates; depending on your load factors, you may need many more.