Even Data Can't Escape Physics: It Has Temperature, and It Has Gravity!
And these hidden forces impact your costs.
Data often seems abstract—a flow of bits and bytes we can't touch. People upload photos, stream movies, and send emails without thinking about what happens behind the scenes.
However, understanding the properties of data is critical when you build the systems behind those actions, because ignoring them wastes money.
The first concept to understand is Temperature, which is associated with Storage & Access.
Hot, Warm, and Cold data
Storing data is not just about saving it and forgetting about it. You need to understand how often your users access the data and how long you should keep it.
Depending on how often you need to access data, you can categorize it into:
Hot Data
What is it? Data that you need often and fast.
Where is it Stored? On fast storage like SSDs or even in memory.
Examples: Things like product recommendations or cached search results.
Cost: Storing hot data is expensive, but accessing it is cheap because it's always ready to go.
Warm Data
What is it? Data that's accessed occasionally, like once a month.
Where is it Stored? On slower but still accessible storage, e.g., Amazon S3 Standard-Infrequent Access (S3 Standard-IA) or Google Cloud Storage Nearline.
Examples: Older logs or data that are not as frequently needed. This could be data that you use for reporting or analytics.
Cost: It is cheaper to store than hot data, but accessing it costs a bit more.
Cold Data
What is it? Data that is rarely accessed and kept primarily for long-term storage.
Where is it Stored? On the cheapest storage options, like HDDs or cloud archive services.
Examples: Old backups or records that you keep for compliance reasons.
Cost: It is very cheap to store but can be slow and expensive to access.
Ideally, you have a balanced data distribution, with most data cold, some data warm, and only a small portion hot. The idea is to optimize both performance and storage costs.
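To make the three tiers concrete, here is a minimal sketch that labels a piece of data by how recently it was accessed. The 7-day and 90-day thresholds and the function name are assumptions for illustration only; real cutoffs depend on your own access patterns and your provider's pricing.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds: tune these to your own access patterns.
HOT_MAX_AGE = timedelta(days=7)    # touched within the last week
WARM_MAX_AGE = timedelta(days=90)  # touched within the last quarter


def classify_temperature(last_accessed: datetime, now: datetime) -> str:
    """Return 'hot', 'warm', or 'cold' based on time since last access."""
    age = now - last_accessed
    if age <= HOT_MAX_AGE:
        return "hot"
    if age <= WARM_MAX_AGE:
        return "warm"
    return "cold"


if __name__ == "__main__":
    now = datetime(2024, 6, 1)
    # Made-up objects and last-access dates, for illustration only.
    last_access = {
        "search_cache": datetime(2024, 5, 30),
        "april_logs": datetime(2024, 4, 15),
        "2019_backup": datetime(2020, 1, 1),
    }
    for name, accessed in last_access.items():
        print(name, "->", classify_temperature(accessed, now))
```

A job like this could run periodically and feed a lifecycle rule that moves objects between tiers, so the hot tier stays small.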
Deciding How Long to Keep Data: Retention
Retention, or how long you should keep data, is a completely different animal, and it is based on four pillars:
Value
Is this data critical for you, or can it be recreated if needed? You should keep important data for longer.
Time
For data stored in fast-access places like memory, set a time to live (TTL) that limits how long it stays there before it is moved to cheaper storage.
Compliance
Some laws require you to keep data for a certain amount of time or delete it after a specific period. Make sure your data storage practices follow these rules.
Cost
Storing data costs money. To save on storage costs, you can automate deleting or archiving data when it’s no longer needed.
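The four pillars can be combined into a single retention decision. The sketch below is one possible ordering, not a standard: compliance rules are checked first, then value, then age-based archiving for cost. The `Record` fields, the function name, and the 90-day and 365-day defaults are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    name: str
    age_days: int                  # Time: how old the data is
    recreatable: bool              # Value: can it be rebuilt if lost?
    legal_hold_days: int           # Compliance: minimum days to keep (0 if none)
    legal_max_days: Optional[int]  # Compliance: must be deleted after this (None if no limit)


def retention_action(
    record: Record,
    archive_after_days: int = 90,   # hypothetical default
    delete_after_days: int = 365,   # hypothetical default
) -> str:
    """Return 'keep', 'archive', or 'delete' for a record."""
    # Compliance: a legal deletion deadline overrides everything else.
    if record.legal_max_days is not None and record.age_days >= record.legal_max_days:
        return "delete"
    # Compliance: data under a legal hold can be archived but never deleted.
    if record.age_days < record.legal_hold_days:
        return "archive" if record.age_days >= archive_after_days else "keep"
    # Value: data that can be recreated is deleted once it is old enough.
    if record.recreatable and record.age_days >= delete_after_days:
        return "delete"
    # Cost: everything else moves to cheaper storage as it ages.
    if record.age_days >= archive_after_days:
        return "archive"
    return "keep"


if __name__ == "__main__":
    # Made-up records for illustration.
    for r in [
        Record("thumbnail_cache", 400, True, 0, None),
        Record("invoice_2023", 400, False, 2555, None),
        Record("user_profile_export", 800, False, 0, 730),
        Record("last_week_orders", 10, False, 0, None),
    ]:
        print(r.name, "->", retention_action(r))
```

Running a rule like this on a schedule is one way to automate the deleting and archiving described under Cost.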
Yes, Data Costs Money to Move: Gravity
The final piece of the puzzle is Gravity. "The larger the data, the stronger its pull." As a dataset grows, more services, applications, and even additional data are drawn to it and interact with it locally, which makes the dataset harder to move.
Moving data, especially out of a cloud provider's network, can be expensive. These costs are called Data Egress Fees and are a practical manifestation of data's "gravitational force."
For example, when Zoom chose Oracle as its cloud provider in 2020, one big reason was cost. If Zoom had used Amazon Web Services (AWS), they might have paid over $11 million a month just in data egress fees. With Oracle, they spent less than $2 million.
This shows how data egress fees can affect big business decisions.
A practical example of Data Egress Fees.
In this setup, you pay no data transfer charges for replication to read replicas within the same Region. You are charged, however, for replication to read replicas deployed in other Regions ($0.02/GB out).
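As a rough worked example of that pricing, the sketch below estimates a monthly cross-Region replication bill. Only the $0.02/GB rate comes from the text above; the function name, the 30-day month, and the traffic figures are assumptions, and actual rates vary by provider and Region pair.

```python
# Rate quoted above for replication to read replicas in another Region.
CROSS_REGION_RATE_USD_PER_GB = 0.02


def monthly_replication_cost(
    gb_replicated_per_day: float,
    cross_region_replicas: int,
    days: int = 30,  # assumed billing month length
    rate_usd_per_gb: float = CROSS_REGION_RATE_USD_PER_GB,
) -> float:
    """Estimate the monthly transfer cost of cross-Region replication.

    Same-Region replicas are left out because, per the text above,
    replication to them incurs no data transfer charge.
    """
    total_gb = gb_replicated_per_day * days * cross_region_replicas
    return total_gb * rate_usd_per_gb


if __name__ == "__main__":
    # Hypothetical workload: 50 GB of changes per day, 2 replicas in other Regions.
    cost = monthly_replication_cost(gb_replicated_per_day=50, cross_region_replicas=2)
    print(f"Estimated monthly cross-Region replication cost: ${cost:.2f}")
```

With these made-up numbers, 50 GB/day x 30 days x 2 replicas = 3,000 GB, which at $0.02/GB comes to about $60 a month. The cost scales linearly with both write volume and the number of remote replicas, which is the "gravity" effect in billing form.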
Final Thoughts
Data isn't just an abstract idea; it's a real thing that uses storage and energy and costs money. You need to understand this to make better decisions about storing and managing your data.
Data can't escape the laws of physics or economics, and neither can we.
Don't just store data—manage it!
References
Fundamentals of Data Engineering, By Joe Reis & Matt Housley
Exploring Data Transfer Costs for AWS Managed Databases