Old Concerns & Objections

🚧 This doc is out of date and pending removal.

This is "Core" to us, so we want to build it in-house no matter the cost

The Core vs Non-Core discussion is an important one, so let's consider the context. An automotive brand may differentiate on application-level features, such as the ability to drive on a multi-lane highway. In 2020 it is unlikely that it would build its own version of Postgres.

The database is common across industries and within any single industry, including among automotive competitors. While a brand could create its own version of Postgres, it would be uneconomical and impractical. The database supports the Core features, but the database technology itself is Non-Core.

Diffgram is similar to a database here. While having Diffgram gives an advantage over not having it, building an "in-house" Diffgram is unlikely to confer the level of strategic advantage that is normally the concern of Core vs Non-Core.

Our data is important - we can't trust a 3rd party with it

Diffgram is deployable on your private cloud. The source code can be inspected to ensure there are no remote access points or other liabilities. We provide upgrade and migration scripts for regular improvements, but for private cloud installs it is the customer's responsibility to actually execute them.

We can build this ourselves in {6, 12, ...n} months

If you devote a team to this, perhaps they could. Before you go down that path, consider a few things:

  • With Diffgram you can start now. If your competitor doesn't have Diffgram yet (or worse, they do!), this is a big advantage.
  • Diffgram is continually updated. Major Version 2 is scheduled for release in Oct 2020. Do you really want to devote a team of engineers, on an ongoing basis, to a non-core part of your product?
  • Diffgram is affordable, likely costing less than that team. Consider that Diffgram is used in every industry, which means we can invest engineering cycles once and share that cost across industries.

We handle {insert large number} of records - how do we know it can scale?

Feel free to try the shared version anytime to get a general feel for the features. We are happy to run Proofs of Concept to demonstrate and test with limited risk.

We can also provide the following general guidance.

  • Because training data generally involves human review, and human review takes time, the volume of data actually recorded in the database is lower than in some other types of products. The scaling concerns are more about the architecture, which is where Diffgram's years of experience pay off.

  • Scaling has a variety of dimensions. Diffgram has been used on multi-million-instance sets and is tested for billion-instance sets. We intend to support the largest datasets required. Diffgram has robust and flexible capabilities across a well-defined set of areas of responsibility and external boundaries.

  • Diffgram has been used for large-scale sets since 2019. While this is a new area, Diffgram is one of the more mature modern platforms in the space.

  • Diffgram has a project-level abstraction that fits with Postgres 12's partitioning system. It can do high-QPS lookups across 8192 partitions, and each partition can store on the order of hundreds of millions of records. GCP can host 31 TB per instance, Diffgram can support multiple DB instances per Diffgram instance, and you can run multiple Diffgram instances. Diffgram works for the largest F100 companies and will work for your workload too (see the partitioning sketch after this list).

  • Test cases with millions of records run within acceptable performance bounds, and concurrent high-throughput video loads have been tested. The file importing system has robust monitoring and retry capabilities, and it is distributed to multiple worker nodes by default (see the retry sketch after this list).
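
To make the partitioning point above concrete, here is a minimal sketch of the general Postgres 12 hash-partitioning pattern. It is an illustration only, not Diffgram's actual schema: the `instance` table, the `project_id` column, and the partition count of 8 are hypothetical names and numbers chosen for readability.

```python
"""Illustration only: a generic Postgres 12 hash-partitioning layout,
NOT Diffgram's actual schema. Table and column names are hypothetical."""

PARTITIONS = 8  # kept small for readability; the text above mentions up to 8192


def build_ddl(partitions=PARTITIONS):
    """Generate DDL for a parent table plus its hash partitions."""
    statements = [
        # Parent table: rows are routed to a partition by hashing project_id.
        "CREATE TABLE instance (\n"
        "    id         bigserial,\n"
        "    project_id bigint NOT NULL,\n"
        "    label      text,\n"
        "    created_at timestamptz DEFAULT now(),\n"
        "    PRIMARY KEY (id, project_id)\n"
        ") PARTITION BY HASH (project_id);"
    ]
    for i in range(partitions):
        statements.append(
            f"CREATE TABLE instance_p{i} PARTITION OF instance "
            f"FOR VALUES WITH (MODULUS {partitions}, REMAINDER {i});"
        )
    return statements


if __name__ == "__main__":
    for ddl in build_ddl():
        print(ddl)
    # Queries that filter on project_id let the planner prune to a single
    # partition, which is what keeps per-project lookups fast at high QPS:
    print("SELECT count(*) FROM instance WHERE project_id = 42;")
```

The key design point is that a project identifier appears in every hot query, so lookups touch a single partition instead of scanning one very large table.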
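
Similarly, the importer's retry behaviour described above can be understood as the common retry-with-exponential-backoff pattern. The sketch below is a generic illustration under that assumption, not Diffgram's actual importer code; every name in it is hypothetical.

```python
"""Illustration only: a generic retry-with-backoff wrapper for a file import
task, NOT Diffgram's importer code. All names here are hypothetical."""
import random
import time


def import_file_with_retry(import_fn, file_ref, max_attempts=5, base_delay=1.0):
    """Call import_fn(file_ref), retrying with exponential backoff plus
    jitter, and re-raise the last error if all attempts fail so monitoring
    can flag the file for review."""
    for attempt in range(1, max_attempts + 1):
        try:
            return import_fn(file_ref)
        except Exception as error:  # real code would catch only transient errors
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.5)
            print(f"Attempt {attempt} failed for {file_ref}: {error}; "
                  f"retrying in {delay:.1f}s")
            time.sleep(delay)


if __name__ == "__main__":
    # Simulated flaky import: fails twice, then succeeds on the third attempt.
    state = {"calls": 0}

    def flaky_import(file_ref):
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("temporary network error")
        return f"imported {file_ref}"

    print(import_file_with_retry(flaky_import, "gs://bucket/video_001.mp4"))
```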

