If you write code in Java and have tried to build services with predictable tail latencies, then you know what I am talking about. One of the major reasons for observing random latency spikes in your service calls is garbage collection.
What is garbage collection? It is the process by which the JVM frees up heap memory that is no longer needed, and it allows you to write Java code without worrying about manually managing memory.
When is an object considered garbage? When there are no longer any references to it, at which point it becomes unreachable and can be removed. Take the following Java code:
import java.util.ArrayList;
import java.util.List;

public final class Utils {
    public static void printRandomList() {
        // 'values' is allocated on the heap; the reference to it lives on the stack
        List<Integer> values = new ArrayList<>();
        values.add(1);
        values.add(2);
        values.add(3);
        System.out.println("Values: " + values);
    }
}
When the method is called, a pointer to the list of integers (values) on the heap is created on the stack. After the method finishes executing, that pointer is removed from the stack, which leaves the heap memory occupied. This is where the garbage collector comes in: it finds that the list of integers on the heap is no longer accessible, since there are no pointer references to it, and de-allocates the memory. But this is not so straightforward, and it can also affect the overall performance of your system; more on this below.
Java Garbage Collector
Java uses a generational garbage collector. It is based on the hypothesis that most objects in a program become garbage early and only very few objects live for most of the program's lifetime. So the JVM splits heap memory into two generations, young and old. The young generation holds objects that are usually cleaned up within a short span of time; collections there happen frequently and are inexpensive. Objects that survive multiple clean-up cycles in the young generation (i.e. continue to have direct or indirect references) are promoted to the old generation, where GC happens less frequently and is relatively more expensive.

In a JVM-managed program, the user has no control over when GC runs. It typically executes in the background, but it has the side effect of pausing application threads to free up memory. These are called Stop-The-World (STW) pauses, and they can cause latency spikes: your application threads may be paused for anywhere from roughly 10ms to 100ms depending on whether it is a minor or major GC cycle (i.e. a young-generation vs an old-generation clean-up).
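You can observe these pauses directly by turning on GC logging. A minimal sketch, assuming JDK 9+ unified logging (my-service.jar is a placeholder for your application):

java -Xlog:gc*:file=gc.log -jar my-service.jar

Each pause event in the log is recorded with its duration, so you can correlate GC activity with the latency spikes you see in your metrics. On JDK 8 and earlier, the rough equivalent was -XX:+PrintGCDetails together with -Xloggc:gc.log.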
This lack of control over when GC runs, and the resulting pauses in application threads, can lead to unpredictable latencies. That can be a bummer if your clients have strict latency requirements.
How can I reduce the effect of GC activity on tail latency?
- Use Rust - If having a predictable tail latency is crucial (e.g. you are building a database), then maybe a garbage-collected language like Java is not the right choice. A language like Rust will allow you to extract more predictable performance, as heap allocation/de-allocation is handled explicitly via memory ownership and borrowing rules.
- Upgrade JDK/Try another GC algorithm - G1GC has been the default garbage collector since JDK 9 and attempts to balance latency and throughput. However, if your heap is really large, you could try ZGC or Shenandoah, newer low-pause collectors (experimental in earlier JDK releases) with relatively lower pause times. Example flags are shown after this list.
- Increase heap size - If you increase the heap size, or change how much memory is allocated to the young generation versus the old generation, you can control how much minor and major GC activity occurs in your application, which has a direct impact on your latency. Example sizing flags follow this list.
- Heap dumps - Take regular heap dumps to analyze your heap memory. They allow you to detect leaks and understand how the heap grows as the program evolves, which also helps you tune the GC to your liking. An example jmap command follows this list.
- Tune the garbage collector - The JVM exposes a bunch of parameters that you can use to trade latency off against throughput; a few common ones are shown after this list.
- Request Hedging - Another case where horizontal scaling beats vertical scaling. If you have multiple hosts, the load balancer (or the client itself) can send the same request to more than one host and use the quickest response to reply to the client. This helps flatten the heavy tail of your latency distribution; a minimal sketch follows this list.
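The collector is chosen with a startup flag. A sketch of the options mentioned above, assuming a recent JDK (on JDK 11-14, ZGC also required -XX:+UnlockExperimentalVMOptions, and not every JDK build ships Shenandoah):

java -XX:+UseG1GC -jar my-service.jar          # the default collector
java -XX:+UseZGC -jar my-service.jar           # low-pause collector
java -XX:+UseShenandoahGC -jar my-service.jar  # low-pause collector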
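For heap sizing, the standard flags set the total heap and the young/old split. The values here are illustrative, not recommendations:

java -Xms8g -Xmx8g -XX:NewRatio=2 -jar my-service.jar

-Xms/-Xmx pin the heap at 8 GB, and -XX:NewRatio=2 makes the old generation twice the size of the young one. With G1, though, it is usually better to leave the young generation size alone, since fixing it interferes with G1's pause-time heuristics.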
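To take a heap dump, jmap ships with the JDK (<pid> is your application's process id):

jmap -dump:live,format=b,file=heap.hprof <pid>

The live option triggers a full GC first so the dump contains only reachable objects, and the resulting .hprof file can be opened in a tool like Eclipse MAT or VisualVM to hunt for leaks. You can also pass -XX:+HeapDumpOnOutOfMemoryError to the JVM to capture a dump automatically when the heap is exhausted.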
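As for tuning parameters, two commonly adjusted G1 knobs are the pause-time target and the heap occupancy at which concurrent marking starts; again, the values are illustrative:

java -XX:MaxGCPauseMillis=50 -XX:InitiatingHeapOccupancyPercent=35 -jar my-service.jar

-XX:MaxGCPauseMillis is a soft goal (G1 tries to stay under it but does not guarantee it), and lowering -XX:InitiatingHeapOccupancyPercent makes old-generation work start earlier, trading some CPU for fewer long pauses.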
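Finally, here is a minimal sketch of request hedging using CompletableFuture. The callReplica method is a hypothetical stand-in for whatever HTTP/RPC call your client makes; a production implementation would usually delay the hedged request briefly (e.g. until the p95 latency has elapsed) rather than fanning out immediately, to avoid doubling the load on your fleet:

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public final class HedgedClient {

    // Hypothetical call to a single backend replica.
    private static String callReplica(String host) {
        // ... issue the request to 'host' and block for the response ...
        return "response from " + host;
    }

    // Send the same request to every replica and return whichever
    // response arrives first; the slower responses are simply ignored.
    public static String hedgedCall(List<String> hosts) {
        List<CompletableFuture<String>> attempts = hosts.stream()
                .map(host -> CompletableFuture.supplyAsync(() -> callReplica(host)))
                .collect(Collectors.toList());
        return (String) CompletableFuture.anyOf(
                attempts.toArray(new CompletableFuture[0])).join();
    }
}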