Blocking due to application problems: Case Study

Performance Monitor
23/5/2024
Michael Roedeske

Initial situation

Using the DBPLUS PERFORMANCE MONITOR software, let's take a look at query duration on April 20. We immediately note that while the query duration usually varies between 12 and 15 seconds per snapshot, in some periods it jumps significantly, to more than 2000 seconds.

Such an anomaly can have a serious impact on the overall performance of the database. The extended query processing time suggested a deeper, possibly systemic problem. This initial finding called for a more detailed examination of the causes of these delays, steering the investigation towards specific performance bottlenecks.

On the way to identifying the problem

After identifying the anomalies in DBPLUS PERFORMANCE MONITOR, the next logical step was to look for a possible correlation between locks and executions, disk reads, buffer writes, and so on. Only after analyzing the lock statistics did the pattern become clear: the moments when query duration increased overlapped significantly with the moments when blocking problems increased.
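Outside the tool, one rough way to check whether lock waits rise together with query duration is to sample the server's wait statistics at intervals. A minimal sketch, assuming the monitored instance is Microsoft SQL Server and using a hypothetical pyodbc connection string (neither the script nor the names in it are part of DBPLUS PERFORMANCE MONITOR):

```python
import time
import pyodbc

# Hypothetical connection string; adjust server, database and credentials.
CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes")

# Cumulative lock-related wait statistics since the last server restart.
WAIT_SQL = """
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'LCK_%'
"""

def sample_lock_waits(conn):
    """Return {wait_type: (waiting_tasks_count, wait_time_ms)}."""
    rows = conn.cursor().execute(WAIT_SQL).fetchall()
    return {r.wait_type: (r.waiting_tasks_count, r.wait_time_ms) for r in rows}

with pyodbc.connect(CONN_STR) as conn:
    before = sample_lock_waits(conn)
    time.sleep(60)                       # one sampling interval
    after = sample_lock_waits(conn)
    for wait_type, (cnt, ms) in after.items():
        d_cnt = cnt - before.get(wait_type, (0, 0))[0]
        d_ms = ms - before.get(wait_type, (0, 0))[1]
        if d_ms > 0:
            print(f"{wait_type}: +{d_cnt} waits, +{d_ms} ms in the last minute")
```

Intervals in which the LCK_* counters grow sharply at the same time as query duration spikes point in the same direction as the correlation described above.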

Detailed analysis of the lock

To start a thorough analysis, we go to the lock history tab, which records all blocking events on a given day. The purpose of this section is to give us a chronology of events. We click on individual points on the timeline to access comprehensive data on each lock event.

How we carry out the analysis of the blocking:

  • Access to detailed information: By selecting a specific point on the timeline, we get more details about each lock event. This includes the time the lock occurred, its duration, and the specific database objects that were affected.
  • Analysis of blocking and waiting sessions: What matters is not only the sessions that actively hold locks, but also those that are waiting because of them. For each session, critical data such as the session ID, status (active, pending, dormant), and the exact transaction start time are displayed.
  • Analysis of waiting times: An essential feature of the lock history tab is the ability to show how long sessions have been waiting because of blocking. This is especially useful for identifying sessions that cause significant delays, providing clear targets for further investigation or immediate action. (One way to query the same information directly from the database is sketched after this list.)
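For comparison, the information the lock history tab presents, which session is blocking which and for how long, can also be pulled straight from SQL Server's dynamic management views. A minimal sketch under the same assumptions (SQL Server, hypothetical connection string):

```python
import pyodbc

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes")

# Each row: a request that is currently blocked, the session blocking it,
# the wait type, and how long (ms) it has already been waiting.
BLOCKING_SQL = """
SELECT
    r.session_id          AS blocked_session,
    r.blocking_session_id AS blocking_session,
    r.wait_type,
    r.wait_time           AS wait_time_ms,
    s.status,
    t.text                AS blocked_statement
FROM sys.dm_exec_requests r
JOIN sys.dm_exec_sessions s ON s.session_id = r.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC
"""

with pyodbc.connect(CONN_STR) as conn:
    for row in conn.cursor().execute(BLOCKING_SQL):
        print(f"session {row.blocked_session} blocked by {row.blocking_session} "
              f"({row.wait_type}, {row.wait_time_ms} ms): {row.blocked_statement[:80]}")
```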

Analysis of the session

The session in question was consistently labeled “dormant” across many snapshots. This status indicates that although the session was connected and a transaction had been initiated, no active queries were executed for an extended period of time. The session started its transaction at 11:55 but showed no activity for 257 seconds in the initial observation. Subsequent snapshots indicated that this inactivity continued and the session was still not executing any queries. This prolonged inactivity kept resources locked and blocked other transactions, leading to longer waiting times for other sessions.
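A session in exactly this state, idle yet still holding an open transaction, can also be located directly in the database. A minimal sketch, again assuming SQL Server and a hypothetical connection string:

```python
import pyodbc

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes")

# Sessions that are idle (sleeping/dormant) yet still hold an open
# user transaction -- the pattern described above.
IDLE_TX_SQL = """
SELECT
    s.session_id,
    s.status,
    s.program_name,
    tx.transaction_begin_time,
    DATEDIFF(SECOND, tx.transaction_begin_time, GETDATE()) AS open_for_seconds
FROM sys.dm_exec_sessions s
JOIN sys.dm_tran_session_transactions st ON st.session_id = s.session_id
JOIN sys.dm_tran_active_transactions tx ON tx.transaction_id = st.transaction_id
WHERE s.status IN ('sleeping', 'dormant')
  AND st.is_user_transaction = 1
ORDER BY tx.transaction_begin_time
"""

with pyodbc.connect(CONN_STR) as conn:
    for row in conn.cursor().execute(IDLE_TX_SQL):
        print(f"session {row.session_id} ({row.status}, {row.program_name}) "
              f"has held a transaction open for {row.open_for_seconds} s")
```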

Root Cause Analysis

The analysis showed that the user's application did not properly manage its transactions. The dormant state of the transaction indicated that it was initiated but not actively processed: no queries were executed and no command to finalize the transaction, such as a commit or a rollback, was issued. This mismanagement of transactions suggests that the application may lack robust error handling or transaction timeout policies that could prevent prolonged inactivity.
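On the application side, the usual remedy for this pattern is to keep every transaction short and to guarantee that it ends in an explicit commit or rollback even when an error occurs. A minimal sketch of that discipline in Python; the connection string, table, and statement are placeholders rather than the actual application code:

```python
import pyodbc

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes")

def update_order(order_id: int, quantity: int) -> None:
    """Illustrative unit of work: open late, commit or roll back, never linger."""
    conn = pyodbc.connect(CONN_STR, autocommit=False, timeout=30)
    try:
        cur = conn.cursor()
        # Do all the work inside one short transaction (placeholder statement).
        cur.execute(
            "UPDATE Orders SET quantity = ? WHERE order_id = ?",
            quantity, order_id,
        )
        conn.commit()          # release locks as soon as the work is done
    except Exception:
        conn.rollback()        # never leave an open transaction holding locks
        raise
    finally:
        conn.close()           # the connection does not sit idle mid-transaction
```

Equally important is to avoid slow application-side work, such as user interaction or calls to external services, between opening a transaction and committing it.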

Impact on the system

Improper handling of transactions can lead to several problems in the database environment, mainly by leaving locks unreleased. In this case, the application's failure to manage transaction states properly resulted in unnecessary locks on resources that were not actively used. This caused other transactions to be blocked and degraded the overall performance of the system. The situation worsened during periods of peak workload, when the cumulative effect of many inefficient transactions slowed the system down.

How to deal with it?

There is a lot that can be done to prevent such situations in the future:

  • Implement query optimization: Review and optimize slow-running queries. Use indexing strategies and tune SQL statements to reduce execution time and resource locking.
  • Adjust the lock timeout settings: Set appropriate timeout settings for transactions to prevent long wait times. This can help automatically end transactions that cause blocking if they exceed a certain time threshold (a configuration sketch follows this list).
  • Carry out load balancing: Spread the database load across multiple servers or instances. This helps mitigate the impact of heavy queries and reduces blocking during peak hours.
  • Concurrency control mechanisms: Implement advanced concurrency control mechanisms. Solutions such as optimistic concurrency or row versioning help manage data consistency without the need for broad locking.
  • Regular performance monitoring: Use monitoring tools such as DBPLUS PERFORMANCE MONITOR more often. They are essential for identifying performance bottlenecks early and taking proactive action before they affect system performance.
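To illustrate the lock timeout and concurrency control points above, the sketch below shows how such settings might look on SQL Server, issued here through Python for consistency with the earlier examples. The database objects are hypothetical, SET LOCK_TIMEOUT applies only to the session that issues it (so the application would set it on its own connections), and other database engines use different syntax:

```python
import pyodbc

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes")

with pyodbc.connect(CONN_STR, autocommit=True) as conn:
    cur = conn.cursor()

    # Lock timeout (per session): a statement waiting on a lock for more than
    # 5 seconds fails with error 1222 instead of queueing indefinitely.
    cur.execute("SET LOCK_TIMEOUT 5000;")

    # Row versioning: readers see a consistent snapshot instead of waiting on
    # writers' locks (optimistic concurrency for readers).
    cur.execute("ALTER DATABASE CURRENT SET READ_COMMITTED_SNAPSHOT ON "
                "WITH ROLLBACK IMMEDIATE;")

    # Indexing: an index on a hypothetical hot column so that frequent queries
    # touch fewer rows and hold fewer locks.
    cur.execute("CREATE INDEX IX_Orders_Status ON dbo.Orders (status) "
                "INCLUDE (quantity);")
```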
