Craig S. Mullins
              
Database Performance Management

Return to Home Page

September 1999

 

What's In A Name
A New Approach to Database Naming Standards

By Craig S. Mullins

The establishment and enforcing of naming conventions is often one of the first duties to be tackled when implementing new software.  Adequate thought and preparation is required in order for such a task to be successful.  What amount of effort should be extended in the creation of appropriate database naming standards?  Are current industry standards acceptable?  Shakespeare, many, many years ago, may have said it best when he wrote:

"What's in a name?  That which we call a rose

by any other name would smell as sweet." 

But, if that is true, then why do those of us in the DP industry spend so much time and effort developing and enforcing naming standards?  Maybe what something is called is not quite so trivial a matter after all!

Well, we know what Shakespeare was trying to convey:  the name by which we call something has no affect upon the actual object.  Calling a desk a lamp will not turn it into a lamp.  It is still a desk.  Sometimes DP professionals, and database developers in particular, forget this.  This article will provide a novel approach to database naming standards by ignoring all of the basic generalizations that were foisted upon us in Database 101 class and taking a fresh look at the naming of relational database objects.  The article will concentrate on DB2 objects, but the arguments made in the article are just as valid for other relational databases as well.

Tables and Their Cousins

In a relational database management system the primary data store is the table.  A table consists of multiple rows, each with a fixed and unchanging number of defined columns.  However, most RDBMS provide alternate means of accessing data from these tables.  For example, DB2 allows the following:

ALIAS

An alternate name that can be used in SQL statements to refer to a table or a view in the same or a remote DB2 subsystem.

SYNONYM

An alternate name that can be used in SQL statements to refer to a table or a view in the same DB2 subsystem.  Synonyms are accessible only by the synonym owner.

VIEW

An alternate representation of data from one or more table or views.

Each of these alternate means of access are similar in one way:  they all present data by means of values in rows and columns.  An end user need not know whether he is querying a table, an alias, a synonym, or a view.  To the user the results are the same – data represented by values in rows and columns.

This brings us to our first revision to the common wisdom for database naming standards:

Revision #1:  Use the exact same naming convention for tables, views, aliases, and synonyms.  These four objects all logically refer to the same thing in the relational model – a representation of data in terms of columns and rows. 

It is common for most shops to implement different naming conventions for each of these objects.  Does it really make sense for them to named differently?  Let's examine the pros and cons.

Consider your current table naming conventions.  If your shop is typical, you will find a convention that is similar to the one depicted in Figure 1.  If they are different, pause for a moment to ask yourself why.  This is almost an industry standard for table naming.  You most surely do not want force every DB2 table to begin with a T (or have a strategically embedded T within the table name).  The name of a DB2 table should accurately and succinctly convey the contents of the data it contains.  The naming convention displayed in Figure 1 accomplishes this.

Figure 1.  Typical Table Naming Standard

Prototype:                    loc.cr.aaa_tab
Example:                    CHICAGO.CREATOR.MKT_CONTACT

 

Each table name is composed of the following segments:

 

     loc.              Location Identification;  Unique Remote or Local
                         subsystem identifier as it appears in the
                         SYSIBM.SYSLOCATIONS table in the CDB.

                         Note separator (.) between location and creator.

                         Example:     CHICAGO     (for remote processing facility in Chicago)

 

     cr.                Creator Identification;  creator or owner.
                         Note separator (.) between creator and application system.

                         Example:     PTMKTP01     (for Marketing production database)

 

     aaa_            Application System;  3 characters.

                         Note separator ( _ ) between appl system and table identification.

                         Example:     MKT             (for Marketing system)

 

     tab               Table Identification;  limited to 14 characters.

                         Example:     CONTACT

 

 

So this brings us to our second rule for revising database naming conventions:

Revision #2:  Avoid embedding a "T", or any other character , into table names to indicate that the object is a table.  Likewise, indicator characters should be avoided for any other table-like object (alias, synonym, view).

Although most shops avoid embedding a "T" in table name, many of these same shops do embed a character into alias, synonym, and view names.  The primary reason given is that the character makes it easy to determine what type of object is being accessed just by looking at the name.  There are two reasons why this is a bad idea.  The first is a semantic reason; the second a flexibility issue. 

In semantic terms, an object's name need only identify the object, not the object's type.  Consider the following arguments:

     How are people named?  Usually one can ascertain the gender of someone simply by knowing their name but would you banish all males named Chris, Pat, or Terry?  Or maybe all females named Chris, Pat, and Terry?  After all, men and women are different.  Shouldn't we make sure that all men's names are differentiated from women's names?  Maybe we should start all men's names with an M and all women's names with a W?  If we did, we'd sure have a lot of Marks and Wendys, wouldn't we?  The point here is that context enables us to differentiate men from women, when it is necessary.  The same can be said of database objects.

How are COBOL program variables named?  Do you name your 01, 05, and 77 level variable names differently in your COBOL programs?  For example, do all 01 levels start with "O" (for one), all 05 levels start with "F", and all 77 levels start with "S"?  No?  Why not?  Isn't this the same as forcing views to start with V (or having a strategically imbedded V within the name)?

      What about the naming of pets?  Say I have a dog, a cat, and a bird.  Now, I wouldn't want to get them confused, so I'll make sure that I start all of my dog names with a D, cat names with a C, and bird names with a B.  So, I'll feed C_FELIX in the morning, take D_ROVER for a walk after work, and make sure I cover B_TWEETY's cage before I go to bed.  Sound ridiculous, doesn't it?

The whole point of this tirade is that if we don't manufacture hokey names in the real world, why would we want to do it with our DB2 objects?  There is really no reason to embed special characters into DB2 objects names to differentiate them from one another.  It is very practical and desirable to name DB2 objects in a consistent manner, but that consistent manner should be well thought out and should utilize the system to its fullest capacity wherever possible.

The second reason for this rule is to increase flexibility.  Say, for example, that we have a table that for some reason is significantly altered, dropped, or renamed.  If views are not constrained by rigid naming conventions requiring an embedded "V" in the name, then a view can be constructed that resembles the way the table used to look.  Furthermore, this view can be given the same name as the old table.  This increases system flexibility.

Most users don't care whether they are using a table, view, synonym, or alias.  They simply want the data.  And, in a relational database, tables, views, synonyms, and aliases all logically appear to be identical to the end user: as rows and columns.  It is true that there are certain operations that can not be performed on certain types of views, and users who need to know this will generally be sophisticated users.  For example, very few shops allow end users to update any table they want using QMF, SPUFI, or some other tool that uses dynamic SQL.  Updates, deletions, and insertions (the operations which are not available to some views) are generally coded into application programs and executed in batch or via on-line transactions.  The end user does need to query tables dynamically.  Now you tell me, which name will your typical end user remember more readily when he needs to access his marketing contacts: 

MKT_CONTACT or VMKTCT01? 

Further Arguments For Indicators

Some shops believe they have very valid reasons for embedding an object type indicator character into database objects; view names, in particular.  Let's examine these arguments:

Point            Embedding a V into our view names enables our DBAs to quickly determine which objects are views and which are tables.

Counterpoint            Many of these shops do not embed a T into the table name, but feel that a V in the view name is necessary.  It is believed that the DBA will be able to more easily discern views from tables.  But, rarely do these shops force an S into synonym names or an A into alias names.  Even if they do, it is usually overkill.  Any good DBA already knows which objects are tables and which are views, and if s/he doesn't, a simple query against the system catalog will clarify the matter. 

For example, in DB2, this query will list all table-like objects:

                        SELECT NAME, CREATOR, "TABLE"

            FROM SYSIBM.SYSTABLES

            WHERE TYPE = "T"

            UNION ALL

            SELECT NAME, CREATOR, "ALIAS"

            FROM SYSIBM.SYSTABLES

            WHERE TYPE = "A"

            UNION ALL

            SELECT NAME, CREATOR, "SYNONYM"

            FROM SYSIBM.SYSSYNONYMS

            UNION ALL

            SELECT NAME, CREATOR, "VIEW"

            FROM SYSIBM.SYSVTREE

            ORDER BY 3, 1

 Point   It is necessary to code view names differently so that users understand that they are working with a view and not all operations can be performed on the view.

Counterpoint            But all operations can be performed on some views.  And not all operations can be performed on all tables either!  What if the user does not have the security to perform the operation?  For example, what is the difference, from the user's perspective, between accessing a non-updateable view and accessing a table where only the SELECT privilege has been granted?

Use It or Lose It

Another common problem with database naming conventions is unnecessary size restrictions.  Using DB2 as an example, most objects can have a name up to 18 characters long.  But, in many instances, shops establish naming standards that do not utilize all of the characters available.  This is usually unwise.

Revision #3:  Unless a compelling reason exits, ensure that your standards allow for the length of database object names to utilize every character available.

Refer to Figure 2 for maximum and recommended DB2 object name lengths.  Notice that except for indexes, the recommended length is equal to the maximum length for each object. 

Figure 2.  DB2 Object Reference

Object

Maximum

Recommended

System  Catalog

Information

Type

Name Length

Name Length

Table Name

Column Name

Database

8

8

SYSDATABASE

NAME

Tablespace

8

8

SYSTABLESPACE

NAME

Table

18

18

SYSTABLES

NAME *

View

18

18

SYSVTREE

NAME

Alias

18

18

SYSTABLES

NAME **

Synonym

18

18

SYSSYNONYMS

NAME

Column

18

18

SYSCOLUMNS

NAME

Index

18

8

SYSINDEXES

NAME

Index Space

8

8

SYSINDEXES

INDEXSPACE

Foreign Key

8

8

SYSRELS

RELNAME

Stogroup

8

8

SYSSTOGROUP

NAME

Package

8

8

SYSPACKAGE

NAME

Plan

8

8

SYSPLAN

NAME

         * where TYPE = T

       **  where TYPE = A

Why are indexes singled out in DB2?  This is an example of a compelling reason to bypass the general recommendation.  Developers can explicitly name DB2 indexes, but they can not explicitly name DB2 index spaces.  Yet, every DB2 index requires an index space name.  The index space name will be implicitly generated by DB2 from the index name.  If the index name is 8 characters or less in length, then the index space name will be the same as the index name.  However, if the index name is greater than 8 characters long, DB2 will use an internal, proprietary algorithm to generate a unique, 8 byte index space name.  As this name can not be determined prior to index creation, it is wise to limit the length of index names to 8 characters.  This is a good example of the maxim that there are exceptions to every rule.

Embedded Meaning

One final troublesome naming convention is embedding specialized meaning into database object names.  The name of an object should reflect what that object is or represents.  However, it should not attempt to define the object. 

Revision #4:  Do not embed specialized meaning into database object names.

Let's examine this revised rule by means of an example.  Some shops enforce DB2 index naming conventions such that the type of index is embedded in the index name.  Consider the standard shown in Figure 3.

Figure 3.  Typical Index Naming Standard (Not Recommended)

Prototype:                    cr.aaaXtttl
Example:                    CREATOR.MKTXCONU

Each index name is composed of the following segments:

 

     cr.                Creator Identification;  creator or owner.
                         Note separator (.) between creator and application system.

                         Example:     PTMKTP01     (for Marketing system database)

 

     aaa              Application System;  3 characters.

 

                         Example:     MKT             (for Marketing system)

 

     X                  Index Indicator; constant X.

 

     ttt                 Table Name Identification;  3 characters.

                         abbreviation of corresponding table name.

                         Example:     CON            (for MKT_CONTACT)

 

     l                   Index Letter;  1 character.

                         P             for index corresponding to Primary Key

                         F             for index corresponding to a Foreign Key

                         U             for unique index

                         N             for non-unique index

Note two potential problem areas with this standard:

1.   An embedded X identifies this object as an index.

2.   Embedded meaning in the form of indicators detailing the type of index. 

The embedded indicator character "X", although unnecessary, is not as evil as indicator characters embedded in table-like objects.  Indexes are not explicitly accessed by users.  Therefore, obscure or difficult to remember naming conventions are not a problem.  The same arguments hold true for table space names.   

In fact, indicator characters may actually be helpful to ensure that table spaces and indexes are never named the same.  Table spaces and indexes, within the same database, can never be named the same.  DB2 uses a name generation algorithm to enforce uniqueness if it is attempted.

This exception results in another revised rule:

Revision #5:  If you must use indicator characters in database names, use them only in objects which are never explicitly accessed by end users.

 The second potential problem area poses quite a bit of trouble.  Consider the following cases which would cause the embedded meaning in the index name to be incorrect:

 1.   The primary key is dropped.

2.   A foreign key is dropped.

3.   The index is altered from non-unique to unique (or vice versa) using a database alteration tool.

4.   What if an index is defined for a foreign key, but is also unique?  Should we use an "F" or a "U"?  Or do we need another character?

I have also seen naming conventions which indicate whether the index is clustering ("C") or not ("N").  This is not a good idea either.  Misconceptions can occur.  For example, in DB2, if no clustering index is explicitly defined, DB2 will use the first index created as a clustering index.  Should this index be named with an embedded "C" or not?

Let's look at one final example from the real world to better illustrate why it is a bad idea to embed specialized meaning into names.  Consider what would happen if we named corporations based upon what they produce.  When IBM began, they produced typewriters.  If we named corporation like we name database objects, the company could have been named based upon the fact that they manufactured typewriters when they began.  IBM might have been called TIBM (the "T" is for typewriters).  And guess what, they don't make typewriters any longer.  What would we do?  Rename TIBM or live with a name that is no longer relevant?

Synopsis

Naming conventions evoke a lot of heated discussion.  Everybody has their opinion as to what is the best method for naming database objects.  Remember, though, that it is best to keep an open mind.  Johnny Cash may have been upset that his father gave him a girl's name in the hit song "A Boy Named Sue," but that was before he knew why.  If this article caused you to think about naming conventions from a different perspective, then I will consider it a success.

From Database Trends, September 1999.
 
© 1999 Craig S. Mullins,  All rights reserved.
Home.