Many of us fall into the trap of trying to perfect every part of the system. But perfection isn't practical, and most of the time, it just wastes resources and money.
Not every feature of your system needs to be lightning-fast or perfectly resilient. Instead, focus on what matters most for each part of your application.
If you follow user behavior and prioritize correctly, you can make your system both efficient and cost-effective.
Let's break down a problem:
Imagine you need to design a hotel reservation system. You need to support the following features:
• Search for rooms
• View room details
• Book a room
• View confirmation details
But here's the thing: not all of these features need to work at the same speed. Each one has different demands based on how users interact with it.
User Flow and Priorities
Most users begin by searching for rooms. This needs to be super fast because it's their first impression, and let's be honest, no one likes waiting.
After the search, about 10% of users click on a room to see its details.
And only around 3% of those users actually book a room.
This means that the bulk of users drop off after the search stage.
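To make the funnel concrete, here is a small worked calculation. It is only a sketch: the 100,000 searches are an illustrative number, `funnel` is a hypothetical helper, and it reads the 3% as applying to the users who viewed room details.

```python
def funnel(searches: int, detail_rate: float = 0.10, booking_rate: float = 0.03) -> dict[str, int]:
    """Estimate how many requests reach each stage of the user flow."""
    detail_views = round(searches * detail_rate)    # about 10% click through to a room
    bookings = round(detail_views * booking_rate)   # about 3% of those viewers book
    return {"search": searches, "details": detail_views, "booking": bookings}


traffic = funnel(100_000)
print(traffic)  # {'search': 100000, 'details': 10000, 'booking': 300}
```

Under these assumptions, search handles over 300 times more requests than booking, which is why the two paths deserve different design priorities.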
Here's how the design should accommodate these user behaviors:
Search and View Details need low latency. Results have to be instant to keep people interested.
Booking and Confirmation need to focus on resilience, availability, and transactional reliability. Booking isn't just about speed—it's about making sure everything works right, from payment to inventory updates.
Designing Each Feature
API Gateway
The API Gateway serves as the front door of the system—a single entry point for all requests.
This gateway keeps everything organized and makes it easier to enforce security, logging, and monitoring consistently across every feature.
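As a rough illustration of the "front door" idea, the sketch below routes requests by path prefix and applies the same logging and security check to every feature. The class name, the paths, and the `api_key` check are all hypothetical; a real deployment would more likely use an off-the-shelf gateway than hand-written code.

```python
from typing import Callable

Request = dict
Response = dict
Handler = Callable[[Request], Response]


class ApiGateway:
    """Single entry point that applies shared checks before routing."""

    def __init__(self) -> None:
        self._routes: dict[str, Handler] = {}
        self.access_log: list[str] = []  # stand-in for real logging and monitoring

    def register(self, prefix: str, handler: Handler) -> None:
        self._routes[prefix] = handler

    def handle(self, path: str, request: Request) -> Response:
        self.access_log.append(path)        # every request is logged in one place
        if not request.get("api_key"):      # hypothetical security check
            return {"status": 401, "body": "missing api_key"}
        # Longest matching prefix wins, so more specific routes take priority.
        for prefix in sorted(self._routes, key=len, reverse=True):
            if path.startswith(prefix):
                return self._routes[prefix](request)
        return {"status": 404, "body": "no route"}


gateway = ApiGateway()
gateway.register("/search", lambda req: {"status": 200, "body": "search results"})
gateway.register("/bookings", lambda req: {"status": 200, "body": "booking accepted"})
```

Because the checks live in `handle`, adding a new feature only means registering another handler.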
Search API
For the Search API, speed is everything.
Users want instant results, so you need something quick and efficient. You can use a specialized search index—a super-organized digital filing cabinet that makes finding room results nearly instant.
It makes all the difference for that first user experience.
To achieve this, you can leverage Elasticsearch and Kafka.
Elasticsearch is a powerful open-source distributed search engine with near real-time search capabilities. It allows you to index room data in a way that optimizes query speed. It can handle a large number of search requests at the same time, ensuring users always receive fast and relevant results.
Elasticsearch also means you can provide features like:
• Autocomplete
• Filters
• Sorting
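Here is a minimal sketch of what a filtered, sorted room search could look like as an Elasticsearch request body. The structure (`bool`, `match`, `range`, `sort`) follows the Elasticsearch Query DSL, but the field names `city`, `price`, and `capacity` are assumptions about how rooms might be indexed, and `build_room_search` is a hypothetical helper. It only builds the body; sending it to a cluster is left out.

```python
def build_room_search(city: str, max_price: float, min_guests: int, sort_by: str = "price") -> dict:
    """Build a search request body for rooms (field names are hypothetical)."""
    return {
        "query": {
            "bool": {
                # Full-text match on the city the user typed.
                "must": [{"match": {"city": city}}],
                # Filters narrow results without affecting relevance scoring.
                "filter": [
                    {"range": {"price": {"lte": max_price}}},
                    {"range": {"capacity": {"gte": min_guests}}},
                ],
            }
        },
        "sort": [{sort_by: {"order": "asc"}}],
        "size": 20,  # one page of results
    }


body = build_room_search("Lisbon", max_price=150, min_guests=2)
```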
To keep the search index up to date, you can use Kafka along with Change Data Capture (CDC).
Whenever there is a change in the booking or room inventory in the main database, the CDC mechanism captures this change and publishes an event to Kafka.
Then, using a Kafka Connect sink connector for Elasticsearch, you can keep the search index updated in near real time.
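The flow above can be sketched without any infrastructure. In the example below, a plain dict stands in for the search index and a list stands in for the Kafka topic; the event shape (`op`, `room_id`, `after`) is a made-up simplification, not the format of any particular CDC tool.

```python
def apply_change(index: dict[int, dict], event: dict) -> None:
    """Apply one change event to the search index (here, a plain dict)."""
    room_id = event["room_id"]
    if event["op"] == "delete":
        index.pop(room_id, None)         # idempotent: deleting twice is harmless
    else:                                # "upsert": insert or overwrite the document
        index[room_id] = event["after"]


search_index: dict[int, dict] = {}
events = [
    {"op": "upsert", "room_id": 1, "after": {"city": "Lisbon", "available": 4}},
    {"op": "upsert", "room_id": 2, "after": {"city": "Porto", "available": 2}},
    {"op": "upsert", "room_id": 1, "after": {"city": "Lisbon", "available": 3}},  # a booking happened
    {"op": "delete", "room_id": 2},
]
for event in events:  # in production, a sink connector would consume these from a topic
    apply_change(search_index, event)

print(search_index)  # {1: {'city': 'Lisbon', 'available': 3}}
```

Applying events in order and making each one idempotent means the index converges to the database state even if an event is delivered more than once.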
Room Details Service
When users select a specific room, the details should load fast too.
The Room Details Service relies heavily on caches and CDNs (Content Delivery Networks). Serving images and static descriptions from a CDN lets you deliver data without constantly hitting the primary database or storage.
This reduces load times and helps keep users engaged. Plus, since Search and Details are closely related, you can put them in the same API.
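One common way to avoid hitting the primary database on every read is a cache-aside lookup with a time-to-live. The sketch below is an in-process illustration only; `RoomDetailsCache`, the five-minute TTL, and the `load_from_db` callback are all hypothetical, and a real service would typically use a shared cache instead.

```python
import time
from typing import Callable


class RoomDetailsCache:
    """Cache-aside reads: check the cache first, fall back to the database."""

    def __init__(self, load_from_db: Callable[[int], dict], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[int, tuple[float, dict]] = {}  # room_id -> (expires_at, details)

    def get(self, room_id: int) -> dict:
        now = self._clock()
        entry = self._entries.get(room_id)
        if entry is not None and entry[0] > now:
            return entry[1]                     # cache hit: no database call
        details = self._load(room_id)           # miss or expired: read the source of truth
        self._entries[room_id] = (now + self._ttl, details)
        return details
```

The TTL is the trade-off knob: a longer value means fewer database reads but staler details, which is usually acceptable for descriptions and photos.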
Booking and Confirmation API
Booking is where things get serious.
People who book are making a commitment, and you need to ensure everything works correctly.
This service needs to handle booking transactions, manage inventory, send booking confirmations, and process payments. Transactional integrity is crucial: either every step succeeds or none of them takes effect.
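A minimal sketch of the all-or-nothing behavior, using Python's built-in `sqlite3` module. The schema and the `book_room` helper are hypothetical, and payment is left out; the point is only that the booking row and the inventory update commit together or not at all.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE rooms (
            id INTEGER PRIMARY KEY,
            available INTEGER NOT NULL CHECK (available >= 0)
        );
        CREATE TABLE bookings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            room_id INTEGER NOT NULL,
            guest TEXT NOT NULL
        );
    """)


def book_room(conn: sqlite3.Connection, room_id: int, guest: str) -> bool:
    """Record a booking and decrement inventory in one transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("INSERT INTO bookings (room_id, guest) VALUES (?, ?)", (room_id, guest))
            # The CHECK constraint rejects going below zero, which aborts the whole transaction.
            conn.execute("UPDATE rooms SET available = available - 1 WHERE id = ?", (room_id,))
        return True
    except sqlite3.IntegrityError:
        return False  # sold out: the booking row inserted above was rolled back
```

With one room left, the first call succeeds and the second returns `False` without leaving a stray booking row behind.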
For this, you need to use retry logic and queues to handle issues without losing user requests.
The priority here is resilience.
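The retry-and-queue idea can be sketched as follows. Everything here is illustrative: `TransientError`, `process_with_retry`, and `drain` are hypothetical names, the queue is an in-memory `deque` rather than a durable broker, and the delays are arbitrary.

```python
import time
from collections import deque
from typing import Callable


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, brief outages)."""


def process_with_retry(request: dict, handler: Callable[[dict], str], max_attempts: int = 3,
                       base_delay: float = 0.1, sleep: Callable[[float], None] = time.sleep) -> str:
    """Call handler, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(request)
        except TransientError:
            if attempt == max_attempts:
                raise                                  # out of attempts: let the caller decide
            sleep(base_delay * 2 ** (attempt - 1))     # 0.1s, 0.2s, 0.4s, ...


def drain(queue: deque, handler: Callable[[dict], str], dead_letters: list, **retry_options) -> list[str]:
    """Process queued requests; park exhausted ones instead of dropping them."""
    results = []
    while queue:
        request = queue.popleft()
        try:
            results.append(process_with_retry(request, handler, **retry_options))
        except TransientError:
            dead_letters.append(request)               # kept for later inspection or replay
    return results
```

One caveat with this pattern: retries are only safe if the handler is idempotent, so a booking request that is retried after a timeout should not charge or reserve twice.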
You can also benefit from Kafka here since it can act as an event streaming platform that allows services to communicate asynchronously and reliably.
This enables other services, like inventory management or notifications, to respond in real-time to updates.
Putting it all together
The design has several benefits, particularly in how different services can scale independently and meet user needs effectively:
Elasticsearch, paired with Kafka for updates, can scale independently. This approach keeps the index updated in near real time without sending search traffic to the main database, which is critical when handling a large number of simultaneous search queries.
The Booking API should be focused on transactional integrity and resilience, so it scales differently to ensure consistency and reliability. It utilizes Kafka for event-driven updates, which helps distribute the load across multiple services.
Kafka with Change Data Capture (CDC) keeps the different parts of the system updated in near real time without direct, point-to-point synchronization. This event-driven approach keeps the system eventually consistent while letting services like inventory management, notifications, and search react to changes as they occur.
To build an effective system, you need to understand how users behave at different stages of the process:
Search and View Details: Focus on speed and low latency. Caching and indexing can make these features quick and responsive.
Booking and Confirmation: Emphasize resilience and transactional integrity. Reliability should be the main focus because booking involves money and availability.
Aligning the system's design with real user needs is how you build better systems, improve the user experience, and make efficient use of your resources at the same time.