Capacity Planning
Audience
This document is for engineers.
Introduction
To capacity plan your Diffgram cluster, there are a few things to consider:
- End Users
- Media Processing Profile
- Workflow
- Database
- Node Pools and AutoScaling
The intent of these specifications is to balance safe load handling against reasonable cost as a starting point. They can then be refined to your specific needs.
End Users
Factors
- Number of simultaneously active users
- Media type
Guideline for K8s compute:
1 vCPU per 10 simultaneous users for heavy usage
1 vCPU per 20 simultaneous users for moderate usage
Memory
2 GB for every 1 vCPU
Replica to vCPU ratio:
4 vCPUs per Replica
So if 20 vCPUs are needed, we suggest 5 replicas with 4 vCPUs each.
Minimum
2 vCPU cores
4 GB memory
24 GB disk
Simultaneous Users
For example, if you have a team of 100 people but only 20 working at any given time, you can use 20 to scope capacity. Put the other way around: a system planned for 20 simultaneous users can easily serve hundreds of total users, so long as only 20 are using it at any given moment.
AutoScale
We recommend using AutoScale. For example, you may need 10x the capacity during working hours and approaching zero capacity at night.
Database
The database is usually the heaviest component for end-user operations.
1 vCPU per 5 simultaneous users for heavy usage
1 vCPU per 10 simultaneous users for moderate usage
Storage
100 GB per 100 million annotations
So for example, for heavy usage, for 100 simultaneously active users, we recommend 20 vCPU cores for the database. 20 simultaneously active users is 4 vCPU cores.
In general, memory settings will follow CPU settings, at roughly 5 GB of memory per vCPU.
Note that the CPU-to-memory ratio differs between the application services and the database.
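The database guidelines above can be sketched the same way. The helper name and rounding are illustrative:

```python
import math

def size_database(simultaneous_users, annotations, heavy=True):
    """Rough database sizing from the guidelines above."""
    users_per_vcpu = 5 if heavy else 10   # heavy vs moderate usage
    vcpus = math.ceil(simultaneous_users / users_per_vcpu)
    memory_gb = vcpus * 5                 # ~5 GB memory per vCPU
    storage_gb = 100 * math.ceil(annotations / 100_000_000)  # 100 GB per 100M annotations
    return vcpus, memory_gb, storage_gb

# e.g. 100 heavy users and 250M annotations -> 20 vCPUs, 100 GB memory, 300 GB storage
print(size_database(100, 250_000_000))
```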
Database Connections
Each instance of the default or walrus service can use one or more database connections. The connection pool size is controlled by the DATABASE_CONNECTION_POOL_SIZE environment variable.
The ideal size for a connection pool is
- large enough that no client has to wait for a connection during normal load
- as small as possible within the above limit
You can read more about optimizing Postgres connections here: https://www.cybertec-postgresql.com/en/estimating-connection-pool-size-with-postgresql-database-statistics/
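As a back-of-the-envelope check, assuming each Gunicorn worker in each pod can fill its own pool of up to DATABASE_CONNECTION_POOL_SIZE connections, the total demand on Postgres can be estimated as follows. The helper name is hypothetical:

```python
def total_db_connections(pods, workers_per_pod, pool_size):
    # Upper bound: every worker in every pod may fill its own connection pool.
    return pods * workers_per_pod * pool_size

# e.g. 2 pods x 3 Gunicorn workers x DATABASE_CONNECTION_POOL_SIZE=5
# -> up to 30 connections. Compare this against the Postgres
# max_connections setting (default 100), leaving headroom for
# other services and admin sessions.
print(total_db_connections(2, 3, 5))
```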
Gunicorn
The Gunicorn entry point, including the number of workers, is defined in the Dockerfiles.
Workers
Each worker acts as a replica, so if you have 3 walrus workers per pod and 2 pods, you will have 6 total copies.
ENTRYPOINT gunicorn --bind :8080 --timeout 120 --worker-class sync --workers 3 --no-sendfile main:app
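The worker-replica relationship above amounts to a simple multiplication. The helper name is illustrative:

```python
def total_service_copies(pods, workers_per_pod):
    # Each Gunicorn worker acts as a replica of the service.
    return pods * workers_per_pod

# 2 pods x 3 walrus workers -> 6 total copies, as in the example above
print(total_service_copies(2, 3))
```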
Optimizing
These recommendations are just meant as a rough benchmark to collect data and then optimize for your specific case.
It's unlikely you will need substantially more than this, and quite likely you can reduce these numbers once usage patterns are established.
If you are getting into a really high tier of hardware and it feels off, please create a support ticket. There may be queries specific to your case that we can optimize to reduce hardware load.
Media Processing (Walrus)
Factors
- Pass by reference, or Diffgram processing the media
- Media type
- Expected throughput
- Event driven or more batch focused
In general, if you are using a pass by reference method, the Media Processing (Walrus) service should not need to scale very much.
Memory settings are dependent on media type.
At a minimum, memory must be double the size of the largest single file (e.g. for a 10 GB video, at least 20 GB). For image cases this is usually not relevant, since a 5 MB image is 200x smaller than 1 GB of memory.
Guideline:
60 images per minute per vCPU
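The memory and throughput guidelines above can be sketched together. The helper name is illustrative; Diffgram does not ship this function:

```python
import math

def size_walrus(largest_file_gb, images_per_minute):
    """Rough Media Processing (Walrus) sizing from the guidelines above."""
    memory_gb = 2 * largest_file_gb            # at least double the largest single file
    vcpus = math.ceil(images_per_minute / 60)  # ~60 images per minute per vCPU
    return memory_gb, vcpus

# e.g. 10 GB max video and 300 images/minute -> 20 GB memory, 5 vCPUs
print(size_walrus(10, 300))
```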
Workflow
If you are using machine learning workflows installed directly on Diffgram, you may need to configure hardware on a per-installed-application basis.
Node Pools
The default recommendation for each node is Standard_D16as_v4 or similar: 16 vCPUs and 64 GB of memory.
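Given totals from the earlier sections, a rough node count for a Standard_D16as_v4-class pool can be sketched as follows. The helper name is illustrative:

```python
import math

def nodes_needed(total_vcpus, total_memory_gb, node_vcpus=16, node_memory_gb=64):
    """Node count for a Standard_D16as_v4-class pool (16 vCPU / 64 GB each)."""
    by_cpu = math.ceil(total_vcpus / node_vcpus)
    by_mem = math.ceil(total_memory_gb / node_memory_gb)
    return max(by_cpu, by_mem, 2)   # keep at least 2 nodes for rolling updates

# e.g. 40 vCPUs and 100 GB of memory -> 3 nodes
print(nodes_needed(40, 100))
```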
AutoScaling
The default recommendation is to use AutoScaling to handle variance in load.
It should be able to scale to a minimum of 2 nodes to do updates; depending on your load factors, you may need many more.