Hello Atlassian Community.
Thiago Masutti here from the Atlassian Premier Support team.
Running a Java-based product, such as Jira, that serves thousands of users is inherently prone to incidents. In such cases, swiftly identifying the root cause is crucial for resolution.
Java-related issues can lead to significant disruptions. As support engineers we encounter these daily across various instances. As your instance grows in size and complexity, encountering such issues becomes inevitable.
Common examples of those problems include (not an exhaustive list) out-of-memory errors, excessive garbage collection activity, CPU spikes, threads piling up, and slow database queries.
Sometimes these problems disrupt all users of a Jira instance; other times they affect only the subset of users whose requests land on an affected node of a Jira Data Center cluster.
Being able to quickly identify the cause is key to a faster resolution.
Today I'm sharing an example of how JAGS can help identify one type of problem: the Out of Memory Error (OOM).
Imagine an instance outage. After restarting, a Support Zip file is generated and parsed with JAGS.
The landing dashboard already gives us important insights into the possible problem.
Within the Overall Metrics section we can see some characteristics of the problem:
- A spike in CPU utilization.
- Full GC activity.
- Tomcat threads piling up.
- Database query latency spiking.
These metrics alone might not pinpoint the issue, but the General Problems section reveals:
- 25 OOM errors in the logs (see the toy example below).
- A heap dump file might have been generated.
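As a quick aside, the errors counted here are occurrences of java.lang.OutOfMemoryError, which the JVM throws when it cannot satisfy an allocation even after garbage collection. The toy program below is not related to Jira or JAGS; it simply reproduces the common "Java heap space" flavour of the error when run with a deliberately small heap:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only -- never run against a production JVM.
// Launch with a small heap, e.g.: java -Xmx64m OomDemo
// It allocates until the heap is exhausted and the JVM throws
// java.lang.OutOfMemoryError: Java heap space.
public class OomDemo {
    public static void main(String[] args) {
        List<byte[]> hoard = new ArrayList<>();
        while (true) {
            hoard.add(new byte[1024 * 1024]); // keep 1 MB blocks reachable so GC can't reclaim them
        }
    }
}
```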
Let's first check some details from the Garbage collection, heap and memory usage dashboard.
This dashboard gives us details about the following:
- Heap and off-heap memory utilization.
- Garbage collection (GC) activity.

In this example we can see the following characteristics:
- High heap consumption.
- Full GC (G1 old generation) activity.
If we extend the time window a bit to check heap utilization, we can see it was healthy over the past days, but something has recently started requiring more memory.
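As a side note, the signals this dashboard plots, heap occupancy and garbage collection activity, are also exposed by the JVM itself through standard JMX beans. The minimal sketch below is plain Java (not a JAGS or Jira API) and prints a one-off snapshot; the collector names in the comment assume the G1 collector, which is what this example instance runs:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Minimal sketch: a one-off snapshot of heap usage and GC activity,
// the same signals charted on the heap and garbage collection dashboard.
public class HeapAndGcSnapshot {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("Heap used: %d MB of %d MB max%n",
                heap.getUsed() / (1024 * 1024), heap.getMax() / (1024 * 1024));

        // With G1 there is one bean per generation, typically
        // "G1 Young Generation" and "G1 Old Generation"; the latter increments on Full GC.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```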
With that information we can move to the "Out of memory errors" dashboard.
At a glance, it gives us relevant information:
- Whether JVM properties relevant to the troubleshooting are set (see the sketch after this list).
- The distribution of OOM errors and heap dump files generated on each node.
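To make "relevant JVM properties" concrete: for OOM troubleshooting the usual suspects are -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath, which control whether and where the JVM writes a heap dump when the error occurs. The sketch below is a plain-Java illustration (not how JAGS does it) that reads those options through the HotSpot diagnostic MXBean; note that it reports the options of the JVM it runs in, so it only tells you about Jira if executed inside, or attached to, the Jira JVM:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: read the heap-dump-related JVM options of the current JVM.
// A standalone run only reports its own options; to inspect Jira you would
// need to run this in-process or query the bean remotely over JMX.
public class CheckOomOptions {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        for (String name : new String[] {"HeapDumpOnOutOfMemoryError", "HeapDumpPath"}) {
            System.out.println(name + " = \"" + diag.getVMOption(name).getValue() + "\"");
        }
    }
}
```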

Expanding the other sections, we can see more details about the OOM errors over time.
An important panel is the one showing log entries about the heap dump generated.
The heap dump generated during the OOM is paramount to continuing the investigation and identifying the root cause. This panel would show that file, as well as its size.
In this case, however, the heap dump couldn't be saved because of a lack of permissions on the chosen directory.
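If you run into the same permission problem, a rough sketch like the one below can help verify the fix: it checks that the dump directory is writable by the current user (run it as the OS user that runs Jira) and, if so, triggers a heap dump of whatever JVM it runs in through the HotSpot diagnostic bean instead of waiting for the next OOM. The directory path is a placeholder, not a Jira default.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch, not an official Jira procedure: verify the dump directory is
// writable and trigger an on-demand heap dump of the current JVM.
public class ManualHeapDump {
    public static void main(String[] args) throws Exception {
        Path dumpDir = Path.of("/var/atlassian/heap-dumps"); // placeholder path
        if (!Files.isWritable(dumpDir)) {
            System.err.println("Cannot write to " + dumpDir + " - fix ownership/permissions first");
            return;
        }
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // "true" = dump only live objects, which keeps the .hprof file smaller
        diag.dumpHeap(dumpDir.resolve("manual-heap.hprof").toString(), true);
        System.out.println("Heap dump written to " + dumpDir);
    }
}
```

A dump of a healthy JVM won't explain a past OOM, but it does confirm the path and permissions are correct before the next incident.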
JAGS aids in quickly identifying OOM errors as incident causes. For root cause analysis, heap dump examination is necessary.
Partner with Atlassian Support for assistance.
Learn more about Atlassian Support here.
Kind regards,
Thiago Masutti
Premier Support Engineer | Atlassian