Performance is one of the big watchwords in the IT industry; nowhere is it more often used than in the database arena, and it’s usually not a good sign. End users know nothing, really, of the inner workings of the database yet they are the first to point the ‘finger of blame’ at the database when the processing speed slows or queries take longer than they would like to return data. [For end users, performance is time, and the longer a process takes the more dismal performance seems to be. Given that perspective their complaints seem justified.] Yes, there ARE times when the database is at fault — ‘missing’ indexes or stale statistics can cause execution plans to change, as can inserts and updates to heavily transacted tables (which many times goes back to stale statistics since they now no longer accurately reflect the data and its distribution in the table). Bugs in the database software can also cause issues by causing the gathered statistics to miscalculate values or generate ‘useless’ frequency histograms, such as on single-valued not null columns. Such histograms can direct the optimizer to choose a path it wouldn’t normally choose when the statistics and histograms contain useful data.
Here is where the problem can get complicated: it may not be the database at fault when performance is down. Yes, it’s the database server you’re getting data from. Yes, things were running just fine until today. By ‘logical’ deduction it’s the database at fault, right? Not necessarily. The database also has to deal with the operating system, the storage system, the available memory and the network to get its work done. Problems in any of those areas are likely to adversely affect database performance. So it’s not necessarily the database causing the problem. How do you know where the problem originates? Oracle does provide some tools to help in assessing where a problem is likely to be, but you may also need to involve System Administrators, Network Administrators and/or Storage Administrators to fully investigate some issues. Let’s take a look at the Oracle supplied tools, along with some operating system utilities that can be used to diagnose performance issues.
At the first sign of trouble many look to the DBA to investigate the database for bottlenecks or other performance-robbing possibilities that surely must exist. Oracle provides at least two tools in releases 10.1 and later to investigate the database for performance problems — the AWR report, a general overall ‘picture’ of the database ‘health’ that can give direction to such a quest and the ASH report, a more detailed ‘picture’ for a given window of activity that provides more detailed wait information. These utilities should be used at the first report of performance issues to determine if the problem is database-related or caused by another area, such as the operating system or network. Starting with the AWR (Automatic Workload Repository) report the two sections to begin an investigation, in my estimation, are Wait Events Statistics and SQL Statistics. The first provides a system-wide view of the waits encountered and the times those waits consumed. Background Wait Events and the Wait Event Histogram sections can provide some insight into what was occurring during the report window; waits to watch are I/O related waits (reads, writes) which can signal issues with a query plan or issues with the storage (issues not addressable by the DBA). The SQL Statistics list SQL statements in order of several areas, including overall elapsed time, consumed CPU time, I/O wait time and physical reads. The SQL statistics can usually point to ‘bad’ plans, making performance problems a database issue because such plans are the result of bad or stale statistics. In 11.2 and later releases dynamic sampling can mitigate missing statistics by automatically sampling the data prior to generating a query execution plan. We won’t get into the details of using these reports to effect database tuning procedures as that’s a topic for another article. The intent here is to provide a means to determine IF the problem is a database issue or if it’s caused by outside ‘forces’. For releases earlier than 10.1 Oracle provides Statspack, the pre-cursor to AWR and the next step after the original database snapshot utilities utlbstat and utlestat. This utility can provide a very good snapshot of database activity, although not quite as robust as the AWR report. It is still available in 11.2 if you would like to install it. AWR and ASH are utilities which are part of the Diagnostic and Tuning Pack for OEM, and thus need to be licensed if they are to be used. Statspack doesn’t require such a license.
The “Top 5 Timed Foreground Events” section gives an overview of the wait events which consumed the most time during the report window:
Top 5 Timed Foreground Events
|Event||Waits||Time(s)||Avg wait (ms)||% DB time||Wait Class|
|log file sync||8||0||11||30.72||Commit|
|Disk file operations I/O||1||0||1||0.33||User I/O|
|os thread startup||0||0||0.00||Concurrency|
|log file single write||0||0||0.00||System I/O|
Host CPU (CPUs: 4 Cores: 2 Sockets: 1)
This provides a quick look at the ‘worst offenders’ in the wait event category (although they may not be performance-robbing events). Start here then work down to the Operating System Statistics section:
Operating System Statistics – Detail
The Wait Event Histogram Detail can be very informative:
Wait Event Histogram Detail (64 msec to 2 sec)
- Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
- Units for % of Total Waits: ms is milliseconds s is 1024 milliseconds (approximately 1 second)
- % of Total Waits: total waits for all wait classes, including Idle
- % of Total Waits: value of .0 indicates value was <.05%; value of null is truly 0
- Ordered by Event (only non-idle events are displayed)
|% of Total Waits|
|Event||Waits 64ms to 2s||<32ms||<64ms||<1/8s||<1/4s||<1/2s||<1s||<2s||>=2s|
|ADR block file read||3||81.3||12.5||6.3|
|Disk file operations I/O||1||98.2||1.8|
|control file parallel write||15||98.7||.8||.3||.1|
|control file sequential read||139||94.0||4.7||.8||.3||.1||.0|
|db file parallel read||2||50.0||50.0|
|db file parallel write||185||91.4||5.7||1.8||.8||.1||.2|
|db file scattered read||63||93.5||5.0||1.2||.3|
|db file sequential read||142||94.2||4.5||1.2||.1||.0|
|direct path read||1||100.0|
|enq: CR – block range reuse ckpt||16||82.2||15.6||1.1||1.1|
|log file parallel write||82||92.5||4.6||2.6||.3||.1|
|log file switch completion||2||50.0||50.0|
|log file sync||1||90.0||10.0|
Wait Event Histogram Detail (4 sec to 2 min)
No data exists for this section of the report.
Wait Event Histogram Detail (4 min to 1 hr)
No data exists for this section of the report.
- ordered by DB Time
|Service Name||DB Time (s)||DB CPU (s)||Physical Reads (K)||Logical Reads (K)|
Such histograms can show where the preponderance of waits lie making it easier to determine if they are problematic.
Next on the list is the SQL Statistics report. There are several sections, the one shown is ordered by elapsed time:
SQL ordered by Elapsed Time
- Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
- % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
- %Total – Elapsed Time as a percentage of Total DB time
- %CPU – CPU Time as a percentage of Elapsed Time
- %IO – User I/O Time as a percentage of Elapsed Time
- Captured SQL account for 1.1E+04% of Total DB Time (s): 0
- Captured PL/SQL account for 2.3E+03% of Total DB Time (s): 0
|Elapsed Time (s)||Executions||Elapsed Time per Exec (s)||%Total||%CPU||%IO||SQL Id||SQL Module||SQL Text|
|5.88||1||5.88||2070.31||15.66||79.60||acc988uzvjmmt||delete from WRH$_SYSMETRIC_HIS…|
|2.49||44||0.06||876.07||10.66||83.39||2mp99nzd9u1qp||delete from histgrm$ where obj…|
|2.14||769||0.00||753.49||4.38||96.01||db78fxqxwxt7r||select /*+ rule */ bucket, end…|
|1.57||212||0.01||551.64||21.91||74.49||43ruajmwv9fus||insert into histgrm$(obj#, int…|
|1.42||1||1.42||501.54||7.67||86.00||0sfsrf0qzvsjv||delete from WRH$_SYSMETRIC_SUM…|
|1.19||22||0.05||418.37||2.63||93.41||d89c1mh5pvbkz||MMON_SLAVE||SELECT /*+ rule */ U.NAME, OT….|
|0.83||1||0.83||291.84||13.18||77.73||2whm2vvjb98k7||delete from WRH$_SQL_PLAN tab …|
|0.66||1||0.66||231.35||2.38||98.81||28bm5y53nujvz||delete from WRH$_RESOURCE_LIMI…|
|0.54||1||0.54||188.64||0.00||91.89||4y3yvmbav7uwb||delete /*+ UNNEST(@SEL$2) UNNE…|
|0.53||22||0.02||185.34||0.00||96.67||63kf3an2j0pkc||insert into sys.wri$_optstat_h…|
|0.51||1||0.51||179.21||9.20||75.31||96w2cbx7ntmtv||delete from WRH$_BG_EVENT_SUMM…|
|0.50||1||0.50||175.27||6.27||94.54||31sdu97k3baq1||delete from WRH$_SHARED_POOL_A…|
|0.44||1||0.44||156.14||17.60||80.45||1h7zt6jks80pa||delete from WRH$_ENQUEUE_STAT …|
|0.42||1||0.42||146.27||15.03||94.38||bzscyq07w79ab||delete /*+ dynamic_sampling(4)…|
|0.38||2,361||0.00||133.20||24.75||80.65||96g93hntrzjtr||select /*+ rule */ bucket_cnt,…|
|0.35||1||0.35||123.71||4.44||94.32||9v9n97qj8z1dg||delete /*+ dynamic_sampling(4)…|
|0.35||22||0.02||122.04||27.02||66.45||2rn08n06qr261||update histgrm$ set obj# = dec…|
|0.34||1||0.34||121.06||0.00||96.73||6ajkhukk78nsr||begin prvt_hdm.auto_execute( :…|
|0.31||1||0.31||107.67||0.00||97.51||48ffykkmk7sgc||delete /*+ dynamic_sampling(4)…|
|0.28||1||0.28||99.65||0.00||79.94||350myuyx0t1d6||insert into wrh$_tablespace_st…|
|0.27||375||0.00||94.36||17.47||87.22||bvn085xhxfn9f||insert into sys.wri$_optstat_h…|
|0.26||161||0.00||90.72||6.06||90.14||74anujtt8zw4h||select o.owner#, o.name, o.nam…|
|0.25||1||0.25||86.79||6.33||95.67||47x8f8z2hx1a2||delete from WRH$_MUTEX_SLEEP t…|
|0.24||1||0.24||85.22||6.45||92.64||az8cj7dfnpp45||delete from WRH$_MEM_DYNAMIC_C…|
|0.23||1||0.23||81.60||6.73||41.10||3mqvkt9as1phq||merge /*+ dynamic_sampling(mm…|
|0.22||1||0.22||76.17||0.00||87.96||du2bqfpfj9au2||delete from WRH$_LIBRARYCACHE …|
|0.21||1||0.21||73.67||14.92||93.36||b7ba0zt64wtc7||delete from WRH$_RSRC_CONSUMER…|
|0.20||44||0.00||71.59||7.68||71.51||as3uq6ggb3gx6||delete from hist_head$ where o…|
|0.20||89||0.00||71.52||30.73||92.39||3ktacv9r56b51||select owner#, name, namespace…|
|0.20||10||0.02||71.14||0.00||99.04||c6awqs517jpj0||select /*+ index(idl_char$ i_i…|
|0.20||1,098||0.00||70.92||23.24||58.47||3c1kubcdjnppq||update sys.col_usage$ set equa…|
|0.19||22||0.01||68.12||56.47||8.79||9dy457uf4uxzd||MMON_SLAVE||SELECT /*+ all_rows */ ‘”‘||UI…|
|0.19||1||0.19||67.57||0.00||0.00||f318xdxdn0pdc||insert into wrh$_log (snap_id,…|
|0.19||113||0.00||67.01||0.00||84.34||59vjj34vugaav||delete from obj$ where obj# = …|
|0.18||10||0.02||64.60||0.00||96.35||cvn54b7yz0s8u||select /*+ index(idl_ub1$ i_id…|
|0.18||375||0.00||63.34||43.38||53.87||95mpkn5xz9001||insert into hist_head$(obj#, i…|
|0.18||22||0.01||62.59||0.00||71.91||dr277b6yv83uy||MMON_SLAVE||SELECT COLUMN_NAME FROM DBA_PA…|
|0.18||671||0.00||62.17||17.68||0.00||5ms6rbzdnq16t||select job, nvl2(last_date, 1,…|
|0.18||1||0.18||61.81||8.89||96.19||7f8bfqbjxdssr||delete from WRH$_LOG tab where…|
|0.17||1||0.17||61.43||17.89||95.77||7g732rx16j8jc||insert into WRH$_SERVICE_STAT …|
|0.16||1||0.16||54.88||80.11||0.00||dayq182sk41ks||insert into wrh$_memory_target…|
|0.15||252||0.00||54.58||30.21||39.65||7ng34ruy5awxq||select i.obj#, i.ts#, i.file#,…|
|0.15||1||0.15||54.51||80.65||0.00||bm2pwrpcr8ru6||select sga_size s, sga_size_fa…|
|0.15||1||0.15||51.42||106.87||0.00||bunssq950snhf||insert into wrh$_sga_target_ad…|
|0.14||1||0.14||49.59||0.00||93.41||084z6qq3wwkdc||delete from WRH$_SQL_WORKAREA_…|
|0.14||1||0.14||48.68||0.00||90.79||17g3cqghntwb6||delete from WRH$_PGA_TARGET_AD…|
|0.13||22||0.01||44.60||0.00||35.38||7kc5p5bw4uz55||update tabpart$ set dataobj# =…|
|0.12||1||0.12||41.02||26.79||91.44||83hpfc7nhjpvg||delete from WRH$_IOSTAT_FILETY…|
|0.12||22||0.01||40.72||121.45||16.37||267s83vd49cfa||MMON_SLAVE||SELECT PARTITION_NAME FROM DBA…|
|0.11||66||0.00||40.29||13.64||87.09||9fg7fjbwmv7v8||insert into sys.wri$_optstat_t…|
|0.11||1||0.11||39.34||13.97||65.25||9xcfxn794wp5h||select 1 from WRH$_LATCH_CHILD…|
|0.11||1||0.11||37.29||0.00||98.77||3hy3nssu9shxj||select /*+ FIRST_ROWS(1) PARAL…|
|0.10||6||0.02||36.11||0.00||63.61||9wncfacx0nj9h||insert into smon_scn_time (thr…|
|0.10||2||0.05||35.13||0.00||1.98||934ur8r7tqbjx||SELECT DBID FROM V$DATABASE|
|0.10||1||0.10||33.93||0.00||74.29||1v44r7vam2wbt||delete from WRH$_IOSTAT_FUNCTI…|
|0.10||1||0.10||33.68||0.00||96.85||cs78htw2dhb0x||delete from WRH$_IC_DEVICE_STA…|
|0.09||66||0.00||33.17||0.00||92.58||gc7b0drtzbyc6||select max(intcol#) from hist_…|
|0.09||89||0.00||33.07||0.00||92.64||8swypbbr0m372||select order#, columns, types …|
|0.09||1||0.09||32.97||16.66||86.94||8h77nkt8bnukh||delete from WRH$_IOSTAT_DETAIL…|
|0.09||1||0.09||31.42||17.49||83.52||7gfhbhxrxx8wf||delete /*+ dynamic_sampling(4)…|
This section can help determine which queries may be creating performance issues; using the sql_id and text you can retrive the plans to see if a rewrite may improve the situation. It can also show where statistics may not be current, although it would also be a good idea to run a 10053 trace against each suspect query to report what statistics the optimizer is using.
The ASH, or Active Session History, report digs deeper into important wait events and timings, providing a better picture of the wait activity during the requested window; a section of this report is shown below:
|SQL ID||Planhash||Sampled # of Executions||% Activity||Event||% Event||Top Row Source||% RwSrc||SQL Text|
|572fbaj0fdw2b||1648993636||1||7.69||CPU + Wait for CPU||5.77||SELECT STATEMENT||5.77||select output from table(dbms_…|
|572fbaj0fdw2b||1648993636||1||7.69||db file sequential read||1.92||** Row Source Not Available **||1.92|
|6ajkhukk78nsr||1||3.85||CPU + Wait for CPU||3.85||** Row Source Not Available **||3.85||begin prvt_hdm.auto_execute( :…|
|1bx8mgs8by25x||2223429142||1||1.92||CPU + Wait for CPU||1.92||HASH JOIN||1.92||select coalesce( p1s.parameter…|
|2asgk01xtb6p0||3702291220||1||1.92||control file sequential read||1.92||FIXED TABLE – FULL||1.92||select d.dbid dbid , d.name db…|
|317v5hnvvd49h||1656119837||1||1.92||CPU + Wait for CPU||1.92||UPDATE||1.92||UPDATE wrh$_seg_stat_obj ob S…|
Again, use this report to initially determine if the problem is database-related; if that is the case then this report should provide a wealth of data aimed at directing your database/query tuning efforts. If there is no indication of database problems then other areas are likely at fault. Waits to examine from both reports include SQL*Net more data to client, Disk file operations I/O and os thread startup. The first wait in that list can be an indicator of network problems, the second storage issues (array problems, pending disk failures) or CPU issues (excessive time for O/S thread startups can indicate over-taxed CPUs which may be caused by other, non-database processes).
After examining reports generated by AWR, ASH or Statspack you find the database is not the source of the performance issue you may be able to discern which area outside of the database is the likely suspect. SQL*Net more data to client waits, if the time per wait is relatively high, can indicate problems in the network layer. Utilities such as netstat can show configured network interfaces and network traffic data on those interfaces. Example output is shown below:
$ netstat -ginw Kernel Interface table Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg bondeth0 1500 0 5084620269 0 0 0 3923352780 0 0 0 BMmRU bondeth0:1 1500 0 - no statistics available - BMmRU bondeth0:3 1500 0 - no statistics available - BMmRU bondeth0:4 1500 0 - no statistics available - BMmRU eth0 1500 0 42182099 0 0 0 21397203 0 0 0 BMRU eth4 1500 0 305021 0 0 0 281488 0 0 0 BMsRU eth5 1500 0 5084315248 0 0 0 3923071292 0 0 0 BMsRU ib0 64000 0 2419856775 0 1 0 2178061942 3 271 0 BMsRU ib1 64000 0 21305668 0 0 0 0 0 0 0 BMsRU lo 16436 0 4495752311 0 0 0 4495752311 0 0 0 LRU $
The utility reports transmit and receive statistics, including dropped packets and packet errors. Running netstat -ginw may show which local interface is having issues; send such output to the Network Administrator so he or she can interpret the results and take appropriate action.
What if the problem isn’t the database, and it isn’t the network? The next area that can adversely affect performance is the storage tier. Statspack and ASH/AWR reports also provide disk read and write wait information. Again, if the wait time for an individual read or write wait is high (for example, a single wait over 300 milliseconds) it may be the storage that’s at fault. As disks age their performance can deteriorate as the alignment the platter had when new is compromised by wear and tear around the spindle. Firmware can also be an issue, especially when disks are upgraded and the firmware is not. This is not an area for the DBA to assess, but it is one he or she can report to the Storage Administrator for further investigation.
Network-related issues may also be a problem with either the local network interface card (NIC) or the cabling. This is, again, a task for the Network Administrator to assess and address, but you may be able to ‘point’ him or her in the right direction with output from the ping utility. Provided you have two or more systems that can ‘see’ the same endpoint a ping comparison may show network problems local to the database server. For example if server A can ping server C with response times less than 20 milliseconds and server B, using the same subnet, reports ping times in the 100 millisecond or greater range this could indicate either cabling issues or a NIC that may be misconfigured or failing. This isn’t likely to be valid if the two servers are on different subnets which is why it’s necessary to get the Network Administrator involved as soon as you suspect a network issue. He or she can provide needed configuration data and can better analyze and diagnose such problems.
And yet another area where performance can suffer is at the operating system level. A system starved for CPU, memory or both because of a temporarily high load can severly slow down database processes along with many other non-database-related services. The System Administrator may need to be called in on the issue to report what processes were running during a particular period when response time was slow. Remember that the end users equate performance with response time; a large, resource-intensive report that runs once a month can seriously slow down other database activity during that same period. System backups can also consume resources normally available for user and database processes. Knowing the period of poor performance can greatly assist the System Administrator investigating the issue. If it’s a defined window, for example from 8 PM to Midnight on Tuesdays, the System Administrator can track down the process causing the slowdown by monitoring the system activity during that period. It may be a process that can’t be stopped or rescheduled, but at least the end users will know what is causing their normally fast transactions to run much slower than usual. As mentioned before when you know it’s not a database issue get the Administrators of the suspected systems involved, because the earlier you get more eyes on the problem the faster the cause can be found and a solution put in place.
The DBA is the ‘go-to’ person when the database is to blame for performance issues (long run times for specific queries, as an example). But, when the DBA finds nothing awry in the database, other parties may need to get involved. Again, just because it seems to be the database at fault doesn’t mean that the database IS at fault; it could be other, related areas affecting performance. Knowing what database reports can be of use is the first step in being able to assess a situation so other areas, such as network and storage, can be brought in if need be. Knowing who to involve is just as important as knowing that a performance issue exists. Communication is key in these situations; a network or storage administrator may ask for additional information regarding the problem; this is where the AWR and/or Statspack reports may be helpful. Learn how to generate these reports now (if you don’t already know) so that when you need them you can concentrate on the issues at hand rather than how to generate a report.
Performance is a term that can have a broad interpretation, depending upon who is doing the interpreting. End users may more often report performance issues when a report that ran fine yesterday now takes much longer to complete because all they may see is their process running slowly. Knowing when the problem is the database and when it isn’t is key to getting such issues diagnosed and resolved. Hopefully this article has given you some tools to help in that regard; use them wisely and communicate well so that you can be part of the solution by knowing who else to involve when it isn’t the database that is at fault.
Now, on with the show!