Q: What is the amount of data that we need to store?
Answer: Let's assume a few 100 TB.
Q: Do we need to support updates?
A: Yes.
Q: Can the size of the value for a key increase with updates?
A: Yes. In other words, its possible a sequence of keys could co-exist on one server previously, but with time, they grew to a size where all of them don't fit on a single machine.
Q: Can a value be so big that it does not fit on a single machine?
A: No. Let's assume that there is an upper cap of 1GB to the size of the value.
Q: What would the estimated QPS be for this DB?
A: Let's assume around 100k
Total storage size : 100 TB as estimated earlier
Total estimated QPS : Around 10M
Q: What is the minimum number of machines required to store the data?
A: Assuming a machine has 10TB of hard disk, we would need minimum of 100TB / 10 TB = 10 machines to store the said data. Do note that this is bare minimum. The actual number might be higher if we decide to have replication or more machines incase we need more shards to lower the QPS load on every shard.
Q: Is Latency a very important metric for us?
A: No, but it would be good to have a lower latency.
Q: Consistency vs Availability?
A: As the question states, we need tight consistency and partitioning. Going by the CAP theorem ( Nicely explained at http://robertgreiner.com/2014/08/cap-theorem-revisited/\, we would need to compromise with availability if we have tight consistency and partitioning. As is the case with any storage system, data loss is not acceptable.
Q: Is sharding required?
A: Lets look at our earlier estimate about the data to be stored. 100TB of data can’t be stored on a single machine.
Let's say that we somehow have a really beefy machine which can store that amount of data, that machine would have to handle all of the queries ( All of the load ) which could lead to a significant performance hit.
Tip: You could argue that there can be multiple copies of the same machine, but this would not scale in the future. As my data grows, its possible that I might not find a big beefy enough machine to fit my data.
So, the best course of action would be to shard the data and distribute the load amongst multiple machines.
Q: Should the data stored be normalized?
http://www.studytonight.com/dbms/database-normalization.php
Q: Can I shard the data so that all the data required for answering my most frequent queries live on a single machine? 0
A: Most applications are built to store data for a user ( consider messaging for example. Every user has his / her own mailbox ). As such, if you shard based on every user as a row, its okay to store data in a denormalized fashion so that you won’t have to query information across users. In this case, lets say we go with storing data in denormalized fashion.
A: If the data is normalized, then we need to join across tables and across rows to fetch data. If the data is already sharded across machine, any join across machines is highly undesirable ( High latency, Less indexing support ).
With storing denormalized information however, we would be storing the same fields at more than one place. However, all information related to a row ( or a key ) would be on the same machine. This would lead to lower latency.
However, if the sharding criteria is not chosen properly, it could lead to consistency concerns ( After all, we are storing the same data at multiple places ).
Q: How many machines per shard ? How does a read / write look in every shard?
Q: Can we keep just one copy of data? 1
A: Since there is only one copy of the data, reading it should be consistent. As long as there are enough shard to ensure a reasonable load on each shard, latency should be acceptable as well. Reads and writes would work exactly how they work with a single DB just that there would be a row -> shard -> machine IP ( Given a row, tell me the shard it belongs to and then given the shard, give me the machine I should be querying / writing to ) resolution layer in between.
There is just one tiny problem with this model. What if the machine in the shard goes down? Our shard will be unavailable ( which is fine as governed by the CAP theorem ). However, what if the machine dies and its hard disk becomes corrupt. We suddenly run into the risk of losing the data which is not acceptable. Imagine losing all your messages because your shard went down and the hard disk got corrupted. That means we definitely need more than one copy of data being written with us.