Oracle Tips and Tricks — David Fitzjarrell

July 1, 2014

What A Performance!

Filed under: General,Performance,stats — dfitzjarrell @ 10:09

Performance is one of the big watchwords in tht IT industry; nowhere is it more often used than in the database arena, and it’s usually not a good sign. End users know nothing, really, of the inner workings of the database yet they are the first to point the ‘finger of blame’ at the database when the processing speed slows or queries take longer than they would like to return data. [For end users performance is time, and the longer a process takes the more dismal performance seems to be.] Yes, there ARE times when the database is at fault — ‘missing’ indexes or stale statistics can cause execution plans to change, as can inserts and updates to heavily transacted tables (which many times goes back to stale statistics since they now no longer accurately reflect the data and its distribution in the table). Bugs in the database software can also cause issues by causing the gathered statistics to miscalculate values or generate ‘useless’ frequency histograms, such as on single-valued not null columns. Such histograms can direct the optimizer to choose a path it wouldn’t normally choose when the statistics and histograms contain useful data.

Here is where the problem can get complicated: it may not be the database at fault when performance is down. Yes, it’s the database server you’re geting data from. Yes, things were running just fine until today. By ‘logical’ deduction it’s the database at fault, right? Not necessarily. The database also has to deal with the operating system, the storage system, the available memory and the network to get its work done. Problems in any of those areas are likely to adversely affect database performance. So it’s not necesarily the database causing the problem. How do you know where the problem originates? Oracle does provide some tools to help in assessing where a problem is likely to be, but you may also need to involve System Administrators, Network Administrators and/or Storage Administrators to fully investigate some issues. Let’s take a look at the Oracle supplied tools, along with some operating system utilities that can be used to diagnose performance issues.

At the first sign of trouble many look to the DBA to investigate the database for bottlenecks or other performance-robbing possibilities that surely must exist. Oracle provides at least two tools in releases 10.1 and later to investigate the database for performance problems — the AWR report, a general overall ‘picture’ of the database ‘health’ that can give direction to such a quest and the ASH report, a more detailed ‘picture’ for a given window of activity that provides more detailed wait information. These utilities should be used at the first report of performance issues to determine if the problem is database-related or caused by another area, such as the operating system or network. Starting with the AWR (Automatic Workload Repository) report the two sections to begin an investigation, in my estimation, are Wait Events Statistics and SQL Statistics. The first provides a system-wide view of the waits encountered and the times those waits consumed. Background Wait Events and the Wait Event Histogram sections can provide some insight into what was occurring during the report window; waits to watch are I/O related waits (reads, writes) which can signal issues with a query plan or issues with the storage (issues not addressable by the DBA). The SQL Statistics list SQL statements in order of several areas, including overall elapsed time, consumed CPU time, I/O wait time and physical reads. The SQL statistics can usually point to ‘bad’ plans, making performance problems a database issue because such plans are the result of bad or stale statistics. In 11.2 and later releases dynamic sampling can mitigate missing statistics by automatically sampling the data prior to generating a query execution plan. We won’t get into the details of using these reports to effect database tuning procedures as that’s a topic for another article. The intent here is to provide a means to determine IF the problem is a database issue or if it’s caused by outside ‘forces’. For releases earlier than 10.1 Oracle provides Statspack, the pre-cursor to AWR and the next step after the original database snapshot utilities utlbstat and utlestst. This utiility can provide a very good snapshot of database activity, although not quite as robust as the AWR report. It is still available in 11.2 if you would like to install it. AWR and ASH are utilities which are part of the Diagnostic and Tuning Pack for OEM, and thus need to be licensed if they are to be used. Statspack doesn’t require such a license.

The “Top 5 Timed Foreground Events” section gives an overview of the wait events which consumed the most time during the report window:

Top 5 Timed Foreground Events

Event Waits Time(s) Avg wait (ms) % DB time Wait Class
DB CPU   0 82.42  
log file sync 8 0 11 30.72 Commit
Disk file operations I/O 1 0 1 0.33 User I/O
os thread startup 0 0   0.00 Concurrency
log file single write 0 0   0.00 System I/O

Host CPU (CPUs: 4 Cores: 2 Sockets: 1)

This provides a quick look at the ‘worst offenders’ in the wait event category (although they may not be performance-robbing events). Start here then work down to the Operating System Statistics section:

Operating System Statistics – Detail

Snap Time Load %busy %user %sys %idle %iowait
20-Jun 07:04:42 0.00          
20-Jun 08:00:45 0.00 7.12 4.65 2.48 92.88 0.00

The Wait Event Histogram Detail can be very informative:

Wait Event Histogram Detail (64 msec to 2 sec)

  • Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
  • Units for % of Total Waits: ms is milliseconds s is 1024 milliseconds (approximately 1 second)
  • % of Total Waits: total waits for all wait classes, including Idle
  • % of Total Waits: value of .0 indicates value was <.05%; value of null is truly 0
  • Ordered by Event (only non-idle events are displayed)
    % of Total Waits
Event Waits 64ms to 2s <32ms <64ms <1/8s <1/4s <1/2s <1s <2s >=2s
ADR block file read 3 81.3 12.5 6.3          
Disk file operations I/O 1 98.2 1.8            
control file parallel write 15 98.7 .8 .3 .1        
control file sequential read 139 94.0 4.7 .8 .3 .1   .0  
db file parallel read 2   50.0   50.0        
db file parallel write 185 91.4 5.7 1.8 .8 .1 .2    
db file scattered read 63 93.5 5.0 1.2 .3        
db file sequential read 142 94.2 4.5 1.2 .1 .0      
direct path read 1     100.0          
enq: CR – block range reuse ckpt 16 82.2 15.6   1.1   1.1    
log file parallel write 82 92.5 4.6 2.6 .3 .1      
log file switch completion 2     50.0 50.0        
log file sync 1 90.0 10.0            

Wait Event Histogram Detail (4 sec to 2 min)

No data exists for this section of the report.

Wait Event Histogram Detail (4 min to 1 hr)

No data exists for this section of the report.

Service Statistics

  • ordered by DB Time
Service Name DB Time (s) DB CPU (s) Physical Reads (K) Logical Reads (K)
SYS$USERS 0 0 0 2
SYS$BACKGROUND 0 0 8 436
smedley 0 0 0 0
smedleyXDB 0 0 0 0

Such histograms can show where the preponderance of waits lie making it easier to determine if they are problematic.

Next on the list is the SQL Statistics report. There are several sections, the one shown is ordered by elapsed time:

SQL ordered by Elapsed Time

  • Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
  • % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
  • %Total – Elapsed Time as a percentage of Total DB time
  • %CPU – CPU Time as a percentage of Elapsed Time
  • %IO – User I/O Time as a percentage of Elapsed Time
  • Captured SQL account for 1.1E+04% of Total DB Time (s): 0
  • Captured PL/SQL account for 2.3E+03% of Total DB Time (s): 0
Elapsed Time (s) Executions Elapsed Time per Exec (s) %Total %CPU %IO SQL Id SQL Module SQL Text
5.88 1 5.88 2070.31 15.66 79.60 acc988uzvjmmt   delete from WRH$_SYSMETRIC_HIS…
4.60 22 0.21 1618.80 27.16 64.39 70vs1d7ywk5m0 MMON_SLAVE begin dbms_stats.copy_table_st…
2.49 44 0.06 876.07 10.66 83.39 2mp99nzd9u1qp   delete from histgrm$ where obj…
2.14 769 0.00 753.49 4.38 96.01 db78fxqxwxt7r   select /*+ rule */ bucket, end…
1.57 212 0.01 551.64 21.91 74.49 43ruajmwv9fus   insert into histgrm$(obj#, int…
1.42 1 1.42 501.54 7.67 86.00 0sfsrf0qzvsjv   delete from WRH$_SYSMETRIC_SUM…
1.19 22 0.05 418.37 2.63 93.41 d89c1mh5pvbkz MMON_SLAVE SELECT /*+ rule */ U.NAME, OT….
0.83 1 0.83 291.84 13.18 77.73 2whm2vvjb98k7   delete from WRH$_SQL_PLAN tab …
0.80 22 0.04 281.90 25.34 67.82 8sd37ztuwkyyc MMON_SLAVE begin dbms_swrf_internal.massa…
0.66 1 0.66 231.35 2.38 98.81 28bm5y53nujvz   delete from WRH$_RESOURCE_LIMI…
0.54 1 0.54 188.64 0.00 91.89 4y3yvmbav7uwb   delete /*+ UNNEST(@SEL$2) UNNE…
0.53 22 0.02 185.34 0.00 96.67 63kf3an2j0pkc   insert into sys.wri$_optstat_h…
0.51 1 0.51 179.21 9.20 75.31 96w2cbx7ntmtv   delete from WRH$_BG_EVENT_SUMM…
0.50 1 0.50 175.27 6.27 94.54 31sdu97k3baq1   delete from WRH$_SHARED_POOL_A…
0.49 22 0.02 171.58 12.81 66.96 480utgnq0t6mf MMON_SLAVE begin dbms_stats.unlock_partit…
0.44 1 0.44 156.14 17.60 80.45 1h7zt6jks80pa   delete from WRH$_ENQUEUE_STAT …
0.42 1 0.42 146.27 15.03 94.38 bzscyq07w79ab   delete /*+ dynamic_sampling(4)…
0.38 2,361 0.00 133.20 24.75 80.65 96g93hntrzjtr   select /*+ rule */ bucket_cnt,…
0.35 1 0.35 123.71 4.44 94.32 9v9n97qj8z1dg   delete /*+ dynamic_sampling(4)…
0.35 22 0.02 122.04 27.02 66.45 2rn08n06qr261   update histgrm$ set obj# = dec…
0.34 1 0.34 121.06 0.00 96.73 6ajkhukk78nsr   begin prvt_hdm.auto_execute( :…
0.31 1 0.31 107.67 0.00 97.51 48ffykkmk7sgc   delete /*+ dynamic_sampling(4)…
0.28 1 0.28 99.65 0.00 79.94 350myuyx0t1d6   insert into wrh$_tablespace_st…
0.27 375 0.00 94.36 17.47 87.22 bvn085xhxfn9f   insert into sys.wri$_optstat_h…
0.26 161 0.00 90.72 6.06 90.14 74anujtt8zw4h   select o.owner#, o.name, o.nam…
0.25 1 0.25 86.79 6.33 95.67 47x8f8z2hx1a2   delete from WRH$_MUTEX_SLEEP t…
0.24 1 0.24 85.22 6.45 92.64 az8cj7dfnpp45   delete from WRH$_MEM_DYNAMIC_C…
0.23 1 0.23 81.60 6.73 41.10 3mqvkt9as1phq   merge /*+ dynamic_sampling(mm…
0.22 1 0.22 76.17 0.00 87.96 du2bqfpfj9au2   delete from WRH$_LIBRARYCACHE …
0.21 1 0.21 73.67 14.92 93.36 b7ba0zt64wtc7   delete from WRH$_RSRC_CONSUMER…
0.20 44 0.00 71.59 7.68 71.51 as3uq6ggb3gx6   delete from hist_head$ where o…
0.20 89 0.00 71.52 30.73 92.39 3ktacv9r56b51   select owner#, name, namespace…
0.20 10 0.02 71.14 0.00 99.04 c6awqs517jpj0   select /*+ index(idl_char$ i_i…
0.20 1,098 0.00 70.92 23.24 58.47 3c1kubcdjnppq   update sys.col_usage$ set equa…
0.19 22 0.01 68.12 56.47 8.79 9dy457uf4uxzd MMON_SLAVE SELECT /*+ all_rows */ ‘"’||UI…
0.19 1 0.19 67.57 0.00 0.00 f318xdxdn0pdc   insert into wrh$_log (snap_id,…
0.19 113 0.00 67.01 0.00 84.34 59vjj34vugaav   delete from obj$ where obj# = …
0.18 10 0.02 64.60 0.00 96.35 cvn54b7yz0s8u   select /*+ index(idl_ub1$ i_id…
0.18 375 0.00 63.34 43.38 53.87 95mpkn5xz9001   insert into hist_head$(obj#, i…
0.18 22 0.01 62.59 0.00 71.91 dr277b6yv83uy MMON_SLAVE SELECT COLUMN_NAME FROM DBA_PA…
0.18 671 0.00 62.17 17.68 0.00 5ms6rbzdnq16t   select job, nvl2(last_date, 1,…
0.18 1 0.18 61.81 8.89 96.19 7f8bfqbjxdssr   delete from WRH$_LOG tab where…
0.17 1 0.17 61.43 17.89 95.77 7g732rx16j8jc   insert into WRH$_SERVICE_STAT …
0.16 48 0.00 56.37 48.74 48.27 4h8danbxj5nz6   begin dbms_utility.validate(:b…
0.16 1 0.16 54.88 80.11 0.00 dayq182sk41ks   insert into wrh$_memory_target…
0.15 252 0.00 54.58 30.21 39.65 7ng34ruy5awxq   select i.obj#, i.ts#, i.file#,…
0.15 1 0.15 54.51 80.65 0.00 bm2pwrpcr8ru6   select sga_size s, sga_size_fa…
0.15 1 0.15 51.42 106.87 0.00 bunssq950snhf   insert into wrh$_sga_target_ad…
0.14 1 0.14 49.59 0.00 93.41 084z6qq3wwkdc   delete from WRH$_SQL_WORKAREA_…
0.14 1 0.14 48.68 0.00 90.79 17g3cqghntwb6   delete from WRH$_PGA_TARGET_AD…
0.13 22 0.01 44.60 0.00 35.38 7kc5p5bw4uz55   update tabpart$ set dataobj# =…
0.12 1 0.12 41.02 26.79 91.44 83hpfc7nhjpvg   delete from WRH$_IOSTAT_FILETY…
0.12 22 0.01 40.72 121.45 16.37 267s83vd49cfa MMON_SLAVE SELECT PARTITION_NAME FROM DBA…
0.11 66 0.00 40.29 13.64 87.09 9fg7fjbwmv7v8   insert into sys.wri$_optstat_t…
0.11 1 0.11 39.34 13.97 65.25 9xcfxn794wp5h   select 1 from WRH$_LATCH_CHILD…
0.11 1 0.11 37.29 0.00 98.77 3hy3nssu9shxj   select /*+ FIRST_ROWS(1) PARAL…
0.10 6 0.02 36.11 0.00 63.61 9wncfacx0nj9h   insert into smon_scn_time (thr…
0.10 2 0.05 35.13 0.00 1.98 934ur8r7tqbjx   SELECT DBID FROM V$DATABASE
0.10 1 0.10 33.93 0.00 74.29 1v44r7vam2wbt   delete from WRH$_IOSTAT_FUNCTI…
0.10 1 0.10 33.68 0.00 96.85 cs78htw2dhb0x   delete from WRH$_IC_DEVICE_STA…
0.09 66 0.00 33.17 0.00 92.58 gc7b0drtzbyc6   select max(intcol#) from hist_…
0.09 89 0.00 33.07 0.00 92.64 8swypbbr0m372   select order#, columns, types …
0.09 1 0.09 32.97 16.66 86.94 8h77nkt8bnukh   delete from WRH$_IOSTAT_DETAIL…
0.09 1 0.09 31.42 17.49 83.52 7gfhbhxrxx8wf   delete /*+ dynamic_sampling(4)…

This section can help determine which queries may be creating performance issues; using the sql_id and text you can retrive the plans to see if a rewrite may improve the situation. It can also show where statistics may not be current, although it would also be a good idea to run a 10053 trace against each suspect query to report what statistics the optimizer is using.

The ASH, or Active Session History, report digs deeper into important wait events and timings, providing a better picture of the wait activity during the requested window; a section of this report is shown below:

SQL ID Planhash Sampled # of Executions % Activity Event % Event Top Row Source % RwSrc SQL Text
572fbaj0fdw2b 1648993636 1 7.69 CPU + Wait for CPU 5.77 SELECT STATEMENT 5.77 select output from table(dbms_…
572fbaj0fdw2b 1648993636 1 7.69 db file sequential read 1.92 ** Row Source Not Available ** 1.92
6ajkhukk78nsr   1 3.85 CPU + Wait for CPU 3.85 ** Row Source Not Available ** 3.85 begin prvt_hdm.auto_execute( :…
1bx8mgs8by25x 2223429142 1 1.92 CPU + Wait for CPU 1.92 HASH JOIN 1.92 select coalesce( p1s.parameter…
2asgk01xtb6p0 3702291220 1 1.92 control file sequential read 1.92 FIXED TABLE – FULL 1.92 select d.dbid dbid , d.name db…
317v5hnvvd49h 1656119837 1 1.92 CPU + Wait for CPU 1.92 UPDATE 1.92 UPDATE wrh$_seg_stat_obj ob S…

Again, use this report to initially determine if the problem is database-related; if that is the case then this report should provide a wealth of data aimed at directing your database/query tuning efforts. If there is no indication of database problems then other areas are likely at fault. Waits to examine from both reports include SQL*Net more data to client, Disk file operations I/O and os thread startup. The first wait in that list an be an indicator of network problems, the second storage issues (array problems, pending disk failures) or CPU issues (excessive time for O/S thread startups can indicate over-taxed CPUs which may be caused by other, non-database processes).

After examining reports generated by AWR, ASH or Statspack you find the database is not the source of the performance issue you may be able to discern which area outside of the database is the likely suspect. SQL*Net more data to client waits, if the time per wait is relatively high, can indicate problems in the network layer. Utilities such as netstat can show configured network interfaces and network traffic data on those interfaces. Example output is shown below:

$ netstat -ginw
Kernel Interface table
Iface       MTU   Met      RX-OK RX-ERR RX-DRP RX-OVR      TX-OK TX-ERR TX-DRP TX-OVR Flg
bondeth0     1500   0 5084620269      0      0      0 3923352780      0      0      0 BMmRU
bondeth0:1   1500   0      - no statistics available -                                BMmRU
bondeth0:3   1500   0      - no statistics available -                                BMmRU
bondeth0:4   1500   0      - no statistics available -                                BMmRU
eth0         1500   0   42182099      0      0      0   21397203      0      0      0 BMRU
eth4         1500   0     305021      0      0      0     281488      0      0      0 BMsRU
eth5         1500   0 5084315248      0      0      0 3923071292      0      0      0 BMsRU
ib0         64000   0 2419856775      0      1      0 2178061942      3    271      0 BMsRU
ib1         64000   0   21305668      0      0      0          0      0      0      0 BMsRU
lo          16436   0 4495752311      0      0      0 4495752311      0      0      0 LRU
$

The utility reports transmit and receive statistics, invluding dropped packets and packet errors. Running netstat -ginw may show which local interface is having issues; send such output to the Network Administrator so he or she can interpret the results and take appropriate action.

What if the problem isn’t the database, and it isn’t the network? The next area that can adversely affect performance is the storage tier. Statspack and ASH/AWR reports also provide disk read and write wait information. Again, if the wait time for an individual read or write wait is high (for example, a single wait over 300 milliseconds) it may be the storage that’s at fault. As disks age their performance can deteriorate as the alignment the platter had when new is compromised by wear and tear around the spindle. Firmware can also be an issue, especially when disks are upgraded and the firmware is not. This is not an area for the DBA to assess, but it is one he or she can report to the Storage Administrator for further investigation.

Network-related issues may also be a problem with either the local network interface card (NIC) or the cabling. This is, again, a task for the Network Administrator to assess and address, but you may be able to ‘point’ him or her in the right direction with output from the ping utility. Provided you have two or more systems that can ‘see’ the same endpoint a ping comparison may show network problems local to the database server. For example if server A can ping server C with response times less than 20 milliseconds and server B, using the same subnet, reports ping times in the 100 millisecond or greater range this could indicate either cabling issues or a NIC that may be misconfigured or failing. This isn’t likely to be valid if the two servers are on different subnets which is why it’s necessary to get the Network Administrator involved as soon as you suspect a network issue. He or she can provide needed configuration data and can better analyze and diagnose such problems.

And yet another area where performance can suffer is at the operating system level. A system starved for CPU, memory or both because of a temporarily high load can severly slow down database processes along with many other non-database-related services. The System Administrator may need to be called in on the issue to report what processes were running during a particular period when response time was slow. Remember that the end users equate performance with response time; a large, resource-intensive report that runs once a month can seriously slow down other database activity during that same period. System backups can also consume resources normally available for user and database processes. Knowing the period of poor performance can greatly assist the System Administrator investigating the issue. If it’s a defined window, for example from 8 PM to Midnight on Tuesdays, the System Administrator can track down the process causing the slowdown by monitoring the system activity during that period. It may be a process that can’t be stopped or rescheduled, but at least the end users will know what is causing their normally fast transactions to run much slower than usual. As mentioned before when you know it’s not a database issue get the Administrators of the suspected systems involved, because the earlier you get more eyes on the problem the faster the cause can be found and a solution put in place.

The DBA is the ‘go-to’ person when the database is to blame for performance issues (long run times for specific queries, as an example). But, when the DBA finds nothing awry in the database, other parties may need to get involved. Again, just because it seems to be the database at fault doesn’t mean that the database IS at fault; it could be other, related areas affecting performance. Knowing what database reports can be of use is the first step in being able to assess a situation so other areas, such as network and storage, can be brought in if need be. Knowing who to involve is just as important as knowing that a performance issue exists. Communication is key in these situations; a network or storage administrator may ask for additional information regarding the problem; this is where the AWR and/or Statspack reports may be helpful. Learn how to generate these reports now (if you don’t already know) so that when you need them you can concentrate on the issues at hand rather than how to generate a report.

Performance is a term that can have a broad interpretation, depending upon who is doing the interpreting. End users may more often report performance issues when a report that ran fine yesterday now takes much longer to complete because all they may see is their process running slowly. Knowing when the problem is the database and when it isn’t is key to getting such issues diagnosed and resolved. Hopefully this article has given you some tools to help in that regard; use them wisely and communicate well so that you can be part of the solution by knowing who else to involve when it isn’t the database that is at fault.

Now, on with the show!

About these ads

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 665 other followers

%d bloggers like this: