In 2008 Oracle introduced Exadata, and data warehousing, business intelligence and OLTP applications have benefited from this integrated technology stack. Exadata is not like anything most DBAs have ever seen or managed; it’s a tuned combination of database server, storage server and private interconnects that truly can outperform commodity hardware configurations. Let’s take a peek inside Exadata and get a glimpse of what Oracle hath wrought.
An Exadata machine is more than just the parts that comprise it. In its smallest configuration the rack houses two database servers running Oracle Enterprise Linux, a storage server managing storage ‘cells’ (running its operating systemn known as ‘cellsrv’) and a private interconnect known as InfiniBAND to allow the database servers to talk, albeit indirectly, to the storage server. Typically in this configuration the raw storage adds up to 104 terabytes, and it’s configured with normal redundancy providing, all told, 52 terabytes of storage. Accounting for the operating system and other management software the usable storage runs around 39 TB divided among diskgroups earmarked for database storage, recovery and an interesting feature known as DBFS (more on that later). How does all of this hardware, software, memory and storage work together? Quite nicely as Oracle has done an excellent job of creating a database appliance that can dramatically improve performance over conventional setups.
Exadata is built to efficiently use the resources to return results to the end user. Toward that end it has been designed to offload query tasks to the storage server, when possible, to reduce the amount of data passed to the calling session. Certain criteria must be met to offload to the storage server:
* Full table or index scans must be used
* Direct path reads must be used to access the data
If the first requirement is not met another option can trigger the offloading mechanism:
* Parallel query is in use
Parallel query, by default, uses direct path reads so even if full scans aren’t used offloading can occur. What is offloading? Exadata is smart enough to know that some operations can be transferred to the storage server and serviced by the cells directly resulting in far less data traversing the InfiniBAND ‘pipe’. Offloading itself doesn’t do this, it’s what Exadata does with the offloaded tasks that makes the difference. Offloading triggers a new execution path element called Smart Scans, which use another new item, Storage Indexes. Storage indexes are created in memory (so they’re resident as long as cellsrv is not restarted) and can contain up to 8 columns per storage segment. For Storage Indexes the storage cells are divided into segments of 1 MB and each segment has its own Storage Index. This configuration allows Exadata to create Storage Indexes such that on one segment 8 columns of table A are indexed, on the next segment 8 columns of table B are indexed, on the next segment 8 different columns from table A are indexed, and so on. Each storage index records the lowest and highest values from each column in the index, and also include NULL values. Indexing NULLs gives Storage Indexes their true power, as these are not ‘where do I find this value’ indexes as normal B-Tree indexes are. Storage indexes are used to tell Exadata where to NOT look for data; Exadata will skip over storage segments that have no possibility of containing the desired values and this dramatically speeds searches and data retrieval. If a Smart Scan is effected it can return data in mere seconds rather than the minutes it could take using a conventional database configuration (database server, switch, network storage array); remember, though, that not all queries submitted to Exadata will qualify for Smart Scans. That being the case the integrated design of the Exadata machine enables it to outperform conventional setups in most cases, with or without using a Smart Scan.
How do you know you’ve used a Smart Scan? The ‘cell smart table scan’ wait event or the ‘cell smart index scan’ wait event will show activity indicating a Smart Scan occurred. Nothing in the execution plan would be displayed to prove a Smart Scan was used; querying v$waitstat would be the only way to confirm that a Smart Scan was implemented.
Another interesting aspect of Exadata is that the database servers have no direct access to the storage cells. ASM is configured to be the bridge between the database and storage servers. The storage server has its own operating system and Oracle database which is how Smart Scans are executed. All of the components are integrated, which can be a challenge for organizations which separate UNIX/Linux system administration duties, Oracle DBA responsibilities, network configuration and storage duties as patching and maintenance for the Exadata machine is performed using bundles that ensure no software/firmware mismatches occur. Such mismatches could result in Exadata features not working properly or, at the worst case, failing to work at all. Oracle suggests a new role, the DMA (Database Machine Administrator) that has both Oracle admin rights and O/S admin rights (root) so that patching can be performed and verified by personnel trained in Exadata machine management. It is possible to divide the responsibilities provided there is good communication between the groups during maintenance outages.
Getting back to parallel query there is a new way, in 11.2, to manage parallel executions across sessions which is extremely effective in the Exadata environment — parallel statement queuing. This works in concert with another new feature, Auto Degree of Parallelism, which can dynamically reconfigure the parallelism during query execution. Auto DOP is controlled by the parallel_degree_policy parameter and when set to auto will implement auto DOP. No objects need to have DEGREE set as Oracle will determine, at run time, the optimal DOP for the query provided it will take longer than parallel_min_time_threshold, which defaults to 10 seconds. (As usual this parameter can also be set to a different value to enable more, or fewer, queries to run in parallel.) Oracle also provides some measure of control on how it determines the ideal DOP with the parallel_degree_limit parameter which defaults to CPU but can be set to CPU, IO, AUTO or an integer value. None of this will work on 11.2 if I/O calibration has not been completed. This is accomplished with the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure; it takes two IN parameters, the number of disks and the maximum disk latency, and three OUT parameters, placeholders for latency, maximum i/o per second and the maximum MB per second. The procedure can take as long as 15 minutes to run as it generates a random workload spread across all nodes in the RAC cluster. Once completed Auto DOP can be enabled and will function as expected. If the calibration has not been done then Oracle will bypass Auto DOP and report in the execution plan that the IO calibrate statistics are missing. Getting back to parallel statement queuing once parallel_degree_policy is set to auto and parallel_servers_target is set to a ‘reasonable’ value (‘reasonable’ meaning the desired maximum number of parallel servers you want to have running at one time) you’re ready to go. Queries will run using Auto DOP until the system hits the limit you set with parallel_servers_target at which time the remaining queries will be queued until resources (parallel query slaves) are available. By default it’s a First In / First Out queue — this behavior can be changed but I won’t cover that in this article. As slaves are made available by prior queries completing the next query in line is run and this continues until the system has freed enough resources to run the remaining queries in the queue or the queue comes to end. Monitoring the queue progress is fairly simple as the information is found in the V$SQL_MONITOR view; if the STATUS is QUEUED then the query is waiting for resources to be made available. V$PX_PROCESS_SYSSTAT can report how many slaves are in use; using both views will enable you to determine which queries are queued and when the next query will come out of the queue to be executed.
DBFS is a unique aspect of Exadata that allows Oracle to utilize a dedicated tablespace as a file system. It’s configured through the clusterware and mounted to a given directory at the O/S level to provide additional storage space for items such as expdp dump files, documents, patch bundle zip files, etc. It operates, at the O/S level, as a regular file system. One caveat of this configuration is the tablespace must be monitored for free space; if the free space drops below, say, 300 GB in a 2 TB file system the df -h call can hang and access to the mounted directories will be compromised. A simple extension of that datafile restores dbfs to it’s speedy self; since DBFS uses bigfile tablespaces (one per mounted file system) you can’t add another datafile to the tablespace to increase the storage (it’s designed that way to prevent ‘striping’ the O/S data across multiple datafiles which could be a performance hit on reads and writes). On reboots of the server some manual intervention is necessary to start the DBFS resource and mount the file system but this not difficult as the shell script to mount the file system is created in the DBFS setup and configuration procedures. More information on configuring DBFS is found in this Metalink document.
There is so much more to Exadata, more than can be put into a single article which is why there are several books on the subject. (My personal choice is “Expert Oracle Exadata”, from Apress, written by Kerry Osborne, Randy Johnson and Tanel Poder.) The intent of this article is to provide a starting point for the journey into Exadata by discussing some of the features and, well, wonders, of the Exadata machine and I hope my efforts have been successful. I am curently on a project where a client is migrating Oracle ERP databases to Exadata and I’m learning new areas of Exadata every day, and enjoying it immensely. Where once Exadata was a novelty it’s now a proven technology, implemented by many organizations world-wide. Some day soon it may come to you (if it hasn’t already); what I’ve covered here should prepare you for your first experience with Exadata.
X-Men all started somewhere.