Rohit Modi

System Design: Rate-Limiting Strategy at the Application Level

Hi, a warm welcome to tech-maze. In today's article we will explore how limiting the rate of requests helps an application perform well and reliably.


Rate limiting can be applied at various levels depending on the situation, but the following two approaches are the ones most applications use, so those are the two we will examine in this article.


1. Rate limiting at the network level: here we limit the rate at which requests are passed to the application at the network level. A few algorithms are available to implement this, and one of the best known is the Leaky Bucket algorithm. For more details about this algorithm, please visit the earlier article published on this platform; the link is shared below.


2. Rate limiting at the application level:


To limit the processing of requests at the application level, one or more of the following strategies can be applied.

Rate limiting with a caching tool.

We all know that caching tools such as Redis, Gemfire, Memcached etc. are very useful for caching frequently used data; they save time and resources by avoiding a database hit on every request.

Caching tools also provide a feature that deletes data after a certain time period, which makes them an ideal choice for rate limiting.

Each object the application saves in cache memory is saved with a TTL (time to live) attribute. The TTL indicates the duration for which the object will remain accessible and alive in cache memory; when this duration elapses, the object is automatically removed from the cache, creating space for new objects.

Now the question is how we choose the TTL duration for these objects, and the answer is that we have to look at the user's activity metrics.

For this we can consider, at a minimum, the following two measurements:

- how many times a user visits the application

- how much time a user spends on the application on average

Based on this data we can decide the time-to-live attribute for each object saved in cache memory; in general this is configured to be between 12 and 18 minutes.
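To make the idea concrete, here is a minimal sketch of storing an object with a TTL, assuming a local Redis instance and the redis-py client; the key name and the 15-minute TTL (picked from the 12-18 minute range above) are illustrative choices, not fixed values.

```python
import redis

# Assumes a Redis server on localhost:6379 and the redis-py client.
cache = redis.Redis(host="localhost", port=6379)

TTL_SECONDS = 15 * 60  # illustrative value from the 12-18 minute range

def save_with_ttl(key: str, value: str) -> None:
    # SETEX stores the value and attaches the TTL in one call;
    # Redis removes the key automatically once the TTL elapses.
    cache.setex(key, TTL_SECONDS, value)

def fetch(key: str):
    # Returns the cached bytes, or None if the key never existed
    # or its TTL has already expired.
    return cache.get(key)
```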

Let's visualize what we have discussed above with the following diagram.




Steps:

a. The client sends a request to the application.

b. The API gateway of the application receives the request.

c. Once validation and verification are done at the API gateway, it forwards the request to the request-processing service.

d. The request-processing service makes a request to fetch the object from cache memory:


d1. If the object exists and has not expired, the cache returns the object to the request-processing service, which delegates it to the API gateway, which in turn forwards the object as the response to the client.

d2. If the object does not exist or has expired, the request-processing service forwards the request to the load balancer, which selects a server based on the implemented algorithm (Round Robin, Least Connections etc.) to process the request.

e. The server returns the response to the request-processing service, which saves this object, tagged with a TTL, into cache memory and forwards the object to the API gateway.

f. The API gateway returns this object to the client as the response.
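Putting steps d through f together, here is a minimal sketch that reuses the illustrative Redis helpers from the previous snippet; process_on_server is a hypothetical stand-in for the load-balanced backend call.

```python
def process_on_server(key: str) -> str:
    # Hypothetical placeholder: in a real system the load balancer
    # would pick a server (Round Robin, Least Connections, ...) and
    # that server would compute the response.
    return f"response-for-{key}"

def handle_request(key: str) -> str:
    cached = fetch(key)                  # step d: try the cache first
    if cached is not None:               # step d1: cache hit, still alive
        return cached.decode()
    response = process_on_server(key)    # step d2: cache miss or expired
    save_with_ttl(key, response)         # step e: store with TTL for next time
    return response                      # step f: goes back via the API gateway
```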

Rate limiting with a message queue.


This is another technique to limit the rate at which requests flow into the application. Let's look at the diagram below for an overview, and then break down the details of request processing.




Steps:

a. The client sends a request to the application, which is received by the API gateway.

b. The API gateway forwards each incoming request to a queue which is large in size and can hold a great number of requests, so it accepts all incoming traffic.

c. Requests stored in the large queue move to a smaller queue at a particular rate, let's say 30 requests per second. This value is configurable, so we can tune it to our application's capacity and requirements.

d. Requests are then picked up by the load balancer and distributed among the available servers.

In this way we can make sure that even if the volume of incoming requests is huge, the application processes only a limited number of requests per second/minute/hour.
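Here is a minimal sketch of the two-queue idea in a single process; the queue size and the 30 requests/second drain rate are the configurable values mentioned above, not fixed numbers.

```python
import queue
import threading
import time

large_queue = queue.Queue()            # accepts everything the gateway sends
small_queue = queue.Queue(maxsize=30)  # feeds the load balancer

DRAIN_RATE = 30  # requests per second moved from the large to the small queue

def drain() -> None:
    # Move requests at a fixed rate; if the small queue is full,
    # put() blocks, which throttles the flow even further.
    while True:
        request = large_queue.get()
        small_queue.put(request)
        time.sleep(1.0 / DRAIN_RATE)

threading.Thread(target=drain, daemon=True).start()
```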

Rate limiting with multiple APIs.


This is one of the techniques used by many applications to avoid failure or downtime. This approach is quite easy to implement, and since it is developer-centric it is a preferred choice among developers.




In this approach we need to come up with a threshold value, let's say 10,000; requests are then routed as follows:

service-1 - if the number of requests does not surpass the threshold value, the request is redirected to this service by the API gateway.

service-2 - if the number of requests exceeds the threshold value, the request is processed by this service.


The threshold value can be decided by analyzing 6 months to 1 year of data: on average, how many requests does our application receive per second/minute/hour/day and process successfully? We then compute a number that is 90% of that total (the percentage is our choice of value, so decide carefully) and take it as the threshold value.

For example, if on average our application successfully processes 10,000 requests per hour, then 9,000 will be our threshold value.
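A minimal sketch of the gateway-side routing decision is below; the in-memory counter and the hourly window are illustrative, and a production gateway would keep the count in shared storage (for example Redis) instead.

```python
import time

THRESHOLD = 9000       # 90% of the 10,000/hour average from the example
WINDOW_SECONDS = 3600  # the hour over which requests are counted

window_start = time.monotonic()
request_count = 0

def route() -> str:
    global window_start, request_count
    now = time.monotonic()
    if now - window_start >= WINDOW_SECONDS:
        window_start, request_count = now, 0  # start a fresh window
    request_count += 1
    # Below the threshold, traffic goes to the primary service;
    # overflow is diverted to the secondary service.
    return "service-1" if request_count <= THRESHOLD else "service-2"
```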

So we have seen rate-limiting strategies at the application level. All requests that come to the application pass through these strategies; this is a fundamental need of high-performance web apps. But there are special cases as well, where the rate-limiting algorithm does not run all the time: it runs only when a certain set of rules is satisfied.


Let's take the example of WhatsApp. During the non-festive season the load on the app is usual: messages are processed from one user to another user or group in the normal manner, and each user gets all kinds of notifications, such as sent, delivered, read, status etc.

But during festive seasons such as New Year, Christmas or Deepawali, everyone tends to send messages to all their contacts, so requests increase in huge numbers, which in turn increases the load on WhatsApp's servers.

In this case WhatsApp's rate-limiting algorithms get enabled and start processing requests in the following manner (a simplified sketch follows the list):

- WhatsApp filters out the logic that notifies users with delivered/read notifications; the selection of which users/groups to skip takes place in random fashion.

- During the festive season, the main intention/requirement of users is to be able to send messages, so users get sent/received notifications in real time, while the remaining notifications are either dropped or delivered at a later point in time.

- The time to live for images/files is reduced so that file storage does not get flooded by images/files.
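The sketch below is a purely hypothetical illustration of these rules, not WhatsApp's actual implementation; the peak-load flag, TTL values, and 50% sampling are invented for the example.

```python
import random

PEAK_LOAD = True  # hypothetically flipped on when festive traffic is detected

NORMAL_MEDIA_TTL = 30 * 24 * 3600  # illustrative: 30 days for images/files
PEAK_MEDIA_TTL = 7 * 24 * 3600     # illustrative: shortened under peak load

def notifications_to_send(peak: bool) -> list:
    if not peak:
        return ["sent", "delivered", "read", "status"]
    # Keep the essential "sent" notification in real time; defer or drop
    # delivered/read for a randomly selected subset of users/groups.
    extras = ["delivered", "read"] if random.random() < 0.5 else []
    return ["sent"] + extras

def media_ttl(peak: bool) -> int:
    return PEAK_MEDIA_TTL if peak else NORMAL_MEDIA_TTL
```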


This kind of strategy is followed by many applications to make sure the user experience is not compromised.

That's all, folks, for this article. Keep visiting, feel free to share this article, and share your suggestions, thoughts, or any other approach in the comment section.


A big thank you for checking out my article; this really encourages me to come up with more topics.
