DB2 Update: "DB2 and the Procedural DBA"

Craig S. Mullins Database Performance Management
Return to Home Page
As published in eServer Magazine, Mainframe Edition, April 2002
	A DB2 for z/OS Performance Roadmap By Craig S. Mullins Assuring optimal performance is one of a database administrator's (DBA's) biggest ongoing challenges. The loudest complaints about poor performance come from the users who must wait longer than they are used to waiting for their applications to respond. No one likes waiting, especially one who's unaccustomed to waiting. But what causes those once-fast applications to stall and deliver sub-par performance? If there were an easy answer to that question, many DBAs would be out of work. There's no short answer to this question, either. But, as a starting point, let’s examine the fundamentals of performance management and optimization and come up with a roadmap for tuning DB2 databases and applications. Every database application, at its core, requires three components in order to operate: the system, the database and the application. To deliver performance, the DBA must be able to monitor and tune each of these components. This is easier said than done. Tuning the System The *system* consists of the system software and hardware required for the application to provide service. This includes the computer itself, its disk subsystems, network connections and all peripherals. From a software perspective, the system includes the operating system, the file system, the DBMS itself, networking protocols and any related middleware such as transaction processors or message queues. To deliver consistent system performance, the DBA must have the resources to monitor, manage and optimize the performance of these disparate pieces of hardware and software. Some of the tasks required for system tuning include the proper allocation and management of memory structures (e.g., buffer pools, program cache area, etc.), storage management, integration of the DBMS with other system software, proper usage of database logs, and coordination of the operating system resources used by the DBMS. Additionally, the DBA must control the installation, configuration and migration of the DBMS software. If the system isn't performing properly, everything that uses the system will perform poorly. In other words, a poorly performing system impacts every database application. When managing DB2 system performance, the DBA's first job is ensuring that all of the allied agent address spaces are tuned and connected properly. This includes, but is not limited to, CICS, IMS/TM, TSO, WebSphere, and DB2 Connect. If not configured and tuned appropriately, all of these system software components interact with, and can affect the performance of, DB2 applications. For example, when DB2 data is accessed using CICS, multiple threads can be active simultaneously, giving multiple users concurrent access to a DB2 subsystem of a single CICS region. A mechanism named the CICS Attach Facility connects CICS with DB2. Using the CICS Attach Facility, you can connect each CICS region to only one DB2 subsystem at a time. You can connect each DB2 subsystem, however, to multiple CICS regions simultaneously. DB2 provides services to CICS via MVS TCBs. All of these TCBs reside in the CICS address space and perform cross-memory instructions to execute the SQL code in the DB2 database services address space (DSNDBM1). Furthermore, the resource control table (RCT) must be configured for each DB2 program that will run under CICS. The RCT applies only to CICS transactions that access DB2 data; it defines the manner in which DB2 resources will be used by CICS transactions. In particular, the RCT defines a plan for each transaction that can access DB2. Additionally, it defines parameters detailing the number and type of threads available for application plans and the DB2 command processor. DB2 DBAs also must ensure that appropriate DB2 system parameters are set using DB2 commands and DSNZPARMs. One of the most important areas for tuning here is memory usage. DB2 uses memory for buffer pools, the EDM pool, RID pool and sort pools to cache data and structures in memory. The better memory is allocated to these structures, the better DB2 will perform. When allocating DB2 buffer pools, keep these rules of thumb in mind: Don't allocate everything to a single buffer pool (e.g., BP0); use a multiple buffer pool strategy. Explicitly specify a buffer pool for every table space and index. Isolate the DB2 Catalog in BP0; put user and application DB2 objects into other buffer pools. Consider separating indexes from table spaces with each in their own dedicated buffer pools. Consider isolating heavily hit data into its own buffer pool to better control performance. Consider isolating sorts into a single buffer pool and tuning for mostly sequential access (e.g. BP7). Consider separating DB2 objects into separate buffer pools that have been configured for sequential verses random access. Forget about trying to follow a cookie-cutter approach to buffer pool management. Every shop must create and optimize a buffer pool strategy for its own data and application mix. DB2 offers the following buffer pool tuning “knobs” that can be used to configure virtual buffer pools to the type of processing they support: DWQT –this value is the deferred write threshold; it is expressed as a percentage of the virtual buffer pool that might be occupied by unavailable pages. When this threshold is reached DB2 will start to schedule write I/Os to externalize data. The default is 50%, which is likely to be too high for most shops. VDWQT – this value is the vertical deferred write threshold; it is basically the same as DWQT, but for individual data sets. The default is 10%, which once again is quite likely to be too high for many shops. VPSEQT – this value is the sequential steal threshold; it is a expressed as a percentage of the virtual buffer pool that can be occupied by sequentially accessed pages. Tune buffer pools for sequential access such as scans and sorting by modifying VPSEQT to a larger value. The default is 80%. VPPSEQT – this value is the sequential steal threshold for parallel operations; the default value is 50%. VPXPSEQT – this value is assisting parallel sequential threshold; it is basically the VPPSEQT for operations from another DB2 subsystem in the data sharing group. These parameters can be changed using the ALTER BUFFERPOOL command. Additionally, hiperpools can be created to back up DB2 virtual buffer pools with additional memory. DB2 provides several tuning knobs for hiperpools, too, including HPSIZE to adjust the size of hiperpools and HPSEQT to adjust the hiperpool sequential steal threshold. The EDM pool is used for caching internal structures used by DB2 programs. This includes DBDs, SKCTs, CTs, SKPTs, and PTs. It also includes the authorization cache for plans and packages, as well as the cache for dynamic SQL mini-plans. As a general rule, you should shoot for an 80 percent hit rate with the EDM pool; this means that only one out every five times should a structure need to be loaded from disk into the EDM pool. Finally, remember that buffer and EDM pool tuning are in-depth subjects that cannot be adequately covered in a high-level article such as this. Additionally, there is much more to proper DB2 system performance tuning than allied agent and memory tuning. Other system elements requiring attention include locking, logging, and Parallel Sysplex configuration and management for DB2 data-sharing shops. Tuning the Databases The second component of DB2 performance tuning is making sure the *database* is optimally created and maintained. The database stores the data that is used by the application. When the application needs to access data, it does so through DB2 to the database of choice. If the database is not optimally organized or stored, the data it contains will be difficult or slow to access. The performance of every application that requires this data will be negatively impacted. The first component of database optimization is assuring an optimal database design. Database design is the process of transforming a logical data model into a physical database design and then implementing the physical model as an actual database. Proper database design requires up-front data modeling and normalization – in other words, a logical data model is required before you can even begin to design a physical database. Assuming a logical model exists, it should be used as the basis for creating the physical database. The physical database is created by transforming the logical data model into a physical implementation based on an understanding of the DBMS to be used for deployment. But a one-to-one mapping of logical entity to physical table is unlikely to result in an optimal physical database design. Successfully developing a physical database design requires a good working knowledge of DB2's features. More precisely, the DBA must possess in-depth knowledge of physical database objects supported by DB2 as well as the physical structures and files required to support those objects. This must include knowledge of the manner in which DB2 supports indexing, referential integrity, constraints, data types and other features that augment the functionality of database objects. Armed with the correct information, the DBA can create an effective and efficient database from a logical data model. The DBA creating the physical DB2 database implementation should keep these rules of thumb in mind when building the databases: Keep the physical database as normalized as possible. However, performance should win out over aesthetics. In other words, don’t let data modelers dictate “physical” design. If you need to reorder columns in the table to optimize logging, do it. In general, put each table in its own table space. Exceptions can be made for very small tables such as code and reference tables. In general, favor partitioned and segmented table spaces over simple table spaces. And don’t specify a DSSIZE greater than 4 GB unless you really need a large table space (doing so will waste space). Do not create base table views. Use NULL sparingly. Use appropriate DB2 data types (e.g., use DATE for dates instead of CHAR or numeric data types). Consider using DB2 compression instead of using VARCHAR columns. With compression, there is less overhead and no programmatic handling is required. Favor using DB2 declarative RI instead of programmatic RI. It will usually perform better and is easier to administer. However, it's generally a bad idea to use RI for lookup and reference tables. Avoid the DDL defaults – they are usually wrong. Explicitly code every single DDL option when creating DB2 objects. Calculate the appropriate amount of free space for both PCTFREE and FREEPAGE based on the frequency of modification. Do not just default every table space to 10 percent free space. DB2 DBAs should always keep in mind that a proper indexing strategy can be the primary factor to ensure optimal performance of DB2 applications. To create appropriate indexes, the DBA must have the SQL that is to be used against the DB2 objects in order to understand the access patterns. But many times the DBA is called upon to create indexes well in advance of the creation of any SQL. This can be a challenging dilemma – how to create indexes to optimize SQL performance without any knowledge of the SQL. DBAs can develop some basic rules of thumb though. These steps can be used as a basic roadmap for DB2 index creation: First, take care of unique and primary key constraints by creating unique indexes on those columns. Then consider creating indexes for each foreign key to optimize RI. The next step is to examine the most heavily used queries. Look at the predicates and build indexes to support these queries. Consider overloading indexes by adding columns to encourage index only access. Next, examine queries that invoke sorts. Look for ORDER BY, GROUP BY, UNION, and SELECT DISTINCT. Consider building indexes to support these clauses so DB2 can avoid initiating an expensive sort. Look at your indexes and make sure you've chosen the first column wisely. In general, the first column of the index should have a high cardinality. In general, consider avoiding indexes on variable columns because DB2 will expand the VARCHAR column to its full length in the index. After coming up with a “first stab” indexing strategy consider the insert, update, and delete implications of those indexes. If the columns of those indexes must be modified in any way, DB2 will incur additional overhead keeping the indexes up-to-date. Over time, as data is modified and updated, DB2 will have to move the data around within the database. Such activity causes the data to become fragmented and inefficiently ordered. The longer the database remains online and the more changes made to the data, the more inefficient database access can become. To overcome disorganized and fragmented databases, the DBA can run a reorganization utility to refresh the data and make the database efficient once again. But the key to successful reorganization is to reorganize only when the database requires it; instead, some companies over-reorganize by scheduling regular database reorganization jobs to be run whether the database is fragmented, or not. This wastes valuable CPU cycles. But reorganization is only one of many database performance tasks performed by the DBA. Others include data set placement, partitioning for parallel access, managing free space, and assuring optimal compression. Tuning the Applications The third, and final, component of database performance is the *application* itself. Indeed, as much as 80 percent of all database performance problems are caused by inefficient application code. The application code consists of two parts: the SQL code and the host language code in which the SQL is embedded. SQL is simple to learn and easy to start using. But SQL tuning and optimization is an art that takes years to master. Some general rules of thumb for creating efficient SQL statements include: Let SQL do the work instead of the application program. For example, code an SQL join instead of two cursors and a programmatic join. Simpler is generally better, but complex SQL can be very efficient. Retrieve only the columns required, never more. Retrieve the absolute minimum number of rows by specifying every WHERE clause that is appropriate. When joining tables, always provide join predicates. In other words, avoid Cartesian products. Favor using Stage 1 and Indexable predicates. Avoid sorting if possible by creating indexes for ORDER BY, GROUP BY, and DISTINCT operations. Avoid black boxes – that is, avoid I/O routines that are called by programs instead of using embedded SQL. Avoid deadlocks by updating tables in the same sequence in every program. Issue data modification statements (INSERT, UPDATE, DELETE) as close as possible to the COMMIT statement as possible. Be sure to build a COMMIT strategy into every batch program that changes data. Failing to COMMIT can cause locking problems. Every DBMS provides a method of inspecting the actual access paths that will be used to satisfy SQL requests. The DBA must thoroughly understand the different types of access paths, and know which ones are best in a given situation. For DB2, the DBA must be able to interpret the output of the access path explanation produced by EXPLAIN. This information is encoded in the PLAN_TABLE and must be interpreted. To make matters more difficult, the PLAN_TABLE doesn't contain 100 percent of the information required to determine if the SQL will perform efficiently. The DBA (and, indeed, programmers too) must be able to read the PLAN_TABLE in conjunction with the SQL code, host language code, and information from the DB2 catalog to judge the efficiency and effectiveness of each SQL statement. Host language code refers to the application programs written in C, COBOL, Java, Visual Basic or the programming language du jour. SQL statements are usually embedded into host language code and it is quite possible to have finely tuned SQL inside of inefficient host language code. And, of course, that would cause a performance problem. The Bottom Line DBAs must understand all three aspects of database performance management. The system, the databases, and the applications all must be monitored and tuned to assure efficient DB2 performance. Furthermore, the DBA must be able to identify performance problems as they occur. Once identified, each problem must be analyzed to determine its cause. And only then can a proper tuning strategy be deployed to rectify the problem. The guidelines in this article can help, but remember – each performance management situation is different and general rules of thumb may not apply. This article only touches the tip of the iceberg when it comes to DB2 performance tuning and management. Database performance management is a complex and ever-changing discipline that requires rigorous attention and focus to master. And then, just when you seem to understand things, that new version of DB2 will come out that changes everything! But this is good. It might make life more difficult as you learn the latest DB2 nuances, but new releases invariably help to make DB2 applications more efficient. Good luck with DB2 for z/OS and happy performance tuning! From eServer Magazine, Inaugural Mainframe Edition, April 2002. © 2002 Craig S. Mullins, All rights reserved. Home.