Performance problems rarely occur in isolation. A single issue can trigger a cascade of related problems, each amplifying the overall impact on the system. This case study analyzes such a scenario in the T10 database on g1rush Server.
On June 21, 2024 at 13:04:32, DBPLUS PERFORMANCE MONITOR flagged a critical error: ORA-1652, unable to extend a temporary segment by 128 blocks in the TEMPORARY_DATA tablespace of the T10 database on g1rush Server. The error occurred because concurrent sessions exhausted the temporary tablespace, together consuming more than 30 GB.
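When this error appears, a common first check is which sessions and statements are currently holding temporary space. The query below is a generic sketch against Oracle's V$TEMPSEG_USAGE and V$SESSION views; the tablespace name TEMPORARY_DATA comes from this case, but the query itself is illustrative and not the one used by the monitor.

    -- Which sessions are consuming space in the temporary tablespace?
    -- (Illustrative check; not the exact query used by DBPLUS Performance Monitor.)
    SELECT s.sid,
           s.username,
           u.segtype,
           u.sql_id,
           ROUND(u.blocks * t.block_size / 1024 / 1024 / 1024, 2) AS used_gb
    FROM   v$tempseg_usage u
           JOIN v$session s
             ON s.saddr = u.session_addr
           JOIN dba_tablespaces t
             ON t.tablespace_name = u.tablespace
    WHERE  u.tablespace = 'TEMPORARY_DATA'
    ORDER  BY u.blocks DESC;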
The spike in resource utilization highlighted in the chart immediately caught the company's attention. The overflow disrupted ongoing operations and pointed to inefficient resource management.
Further investigation revealed persistently high CPU load, even after the number of active processes was reduced. The cyclical nature of the load, peaking during periods of high activity, suggested deeper problems with query optimization and resource allocation.
The graph above shows Busy time (CPU), System time (CPU), and User time (CPU) with a clear pattern of cyclic peaks. This persistently high load required us to examine the specific queries and processes consuming CPU resources.
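The busy, system, and user CPU series that the monitor plots correspond to host-level counters Oracle itself exposes. As a hedged sketch, the same figures can be sampled manually from V$OSSTAT, whose values are cumulative hundredths of a second since instance startup:

    -- Cumulative host CPU counters (centiseconds since instance startup).
    -- Sample twice and subtract to get a utilization rate over an interval.
    SELECT stat_name, value
    FROM   v$osstat
    WHERE  stat_name IN ('BUSY_TIME', 'USER_TIME', 'SYS_TIME', 'IDLE_TIME')
    ORDER  BY stat_name;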
The next step was to analyze the specific queries contributing to the high CPU load. In particular, the query with identifier 3445255751 showed significant changes in execution time and execution plan. We suspected these changes were related to the increased execution time of another query, ID 904000402, since both statements used a shared function.
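A ranking of this kind can be reproduced outside the monitor by sorting the cursor cache by CPU time. The identifiers quoted in this case (3445255751, 904000402) look like numeric hash values, which is an assumption about how the monitor labels statements, so the sketch below exposes both HASH_VALUE and SQL_ID:

    -- Top statements by CPU time in the cursor cache (illustrative ranking).
    SELECT *
    FROM  (SELECT sql_id,
                  hash_value,
                  plan_hash_value,
                  executions,
                  ROUND(cpu_time     / 1e6, 1) AS cpu_sec,
                  ROUND(elapsed_time / 1e6, 1) AS elapsed_sec
           FROM   v$sql
           ORDER  BY cpu_time DESC)
    WHERE  ROWNUM <= 10;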
The analysis showed that query 3445255751 experienced run-time fluctuations that affected overall performance. Comparing its execution plans, we identified plan hash value 232983816 as a likely source of the inefficiency.
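One way to compare how the same statement behaves under different plans is to group its AWR history by plan hash value. The sketch below assumes the statement's alphanumeric SQL_ID is known (shown as a substitution placeholder) and that the Diagnostics Pack license covering AWR views is in place:

    -- Average elapsed time per execution, broken down by plan hash value.
    -- '&sql_id' is a placeholder; AWR views require the Diagnostics Pack.
    SELECT plan_hash_value,
           SUM(executions_delta)                              AS execs,
           ROUND(SUM(elapsed_time_delta) / 1e6, 1)            AS elapsed_sec,
           ROUND(SUM(elapsed_time_delta) / 1e6
                 / NULLIF(SUM(executions_delta), 0), 3)       AS sec_per_exec
    FROM   dba_hist_sqlstat
    WHERE  sql_id = '&sql_id'
    GROUP  BY plan_hash_value
    ORDER  BY sec_per_exec;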
This comparison highlighted the need to stabilize query performance by forcing a specific execution plan. The plan change also underscored the need for ongoing monitoring and timely plan corrections.
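A common Oracle mechanism for forcing a specific plan is a SQL plan baseline. The block below is only a sketch of that approach, not necessarily the fix applied in this case: DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE takes the statement's alphanumeric SQL_ID (the value shown is a hypothetical placeholder) together with the desired plan hash value, 232983816 in this investigation.

    -- Pin the preferred plan as a fixed SQL plan baseline (sketch).
    -- 'abcd1234efgh5' is a hypothetical SQL_ID; replace with the real one.
    DECLARE
      l_loaded PLS_INTEGER;
    BEGIN
      l_loaded := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
                    sql_id          => 'abcd1234efgh5',
                    plan_hash_value => 232983816,
                    fixed           => 'YES');
      DBMS_OUTPUT.PUT_LINE('Plans loaded: ' || l_loaded);
    END;
    /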
As the investigation continued, we noticed heavy network load, with data transfers as high as 52 GB per hour. Such a large flow of data may well have exacerbated the observed performance problems.
Analysis of network traffic showed large volumes of bytes received and sent via SQL*Net, highlighting the impact of data transfer on overall performance.
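The instance-level counters behind this observation are available in V$SYSSTAT; the statistic names below are standard, although attributing the traffic to individual sessions or programs, as the monitor does, would need V$SESSTAT joined to V$SESSION instead:

    -- Cumulative SQL*Net traffic since instance startup, in GB.
    SELECT name,
           ROUND(value / 1024 / 1024 / 1024, 2) AS gb
    FROM   v$sysstat
    WHERE  name IN ('bytes sent via SQL*Net to client',
                    'bytes received via SQL*Net from client')
    ORDER  BY name;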