Share on Facebook Share on Twitter Share on Digg Share on Stumble Upon Share via e-mail Print

Removing Superfluous Spaces

by Craig S. Mullins

Although data integrity is a pervasive problem, there are some data integrity issues that can be cleaned up using a touch of SQL. Consider the common data entry problem of extraneous spaces in a name field. Not only is it annoying, sometimes it can cause the system to ignore relationships between data elements. For example, "Craig Mullins" is not equivalent to "Craig Mullins"; the first one has two spaces between the first and last name whereas the second one only has one.

You can write an SQL UPDATE statement to clean up the problem, if you know how to use the REPLACE function. REPLACE does what it sounds like it would do: it reviews a source string and replaces all occurrences of a one string with another. For example, to replace all occurrences of Z with A in the string BZNZNZ you would code:


     REPLACE(‘BZNZNZ’,’Z’,’A’)


And the result would be BANANA. But a simple REPLACE is not sufficient for the task at hand, so let’s create a SQL statement to get rid of any number of unwanted spaces in the NAME column of our EMPLOYEE table:


     UPDATE EMPLOYEE
            SET NAME = REPLACE(
                          REPLACE(
                              REPLACE(NAME, SPACE(1), '<>')
                              '><', SPACE(0))
                         '<>', SPACE(1));


What are all of those less-thans and greater-thans and why do you need them? Let me explain. The inside REPLACE statement takes the NAME column and converts every occurrence of a single space into a left/right carat. The next REPLACE (working outward), takes the string we just created, and removes every occurrence of a right/left carat combination by replacing it with a zero length string. The final REPLACE function takes that string and replaces any left/right carats with a single space. The reversal of the carats is the key to removing all spaces except one – remember, we want to retain a single space anywhere there was a single space as well as anywhere that had multiple spaces. Try it, it works.

Of course, you can use any two characters you like, but the left and right carat characters work well visually. Be sure that you do not choose to use characters that occur naturally in the string that you are acting upon.

Finally, the SPACE function was used for clarity. You could have used strings encased in single quotes, but the SPACE function is easier to read. It simply returns a string of spaces the length of which is specified as the integer argument.



From Database Trends and Applications, July 2012.

© 2012 Craig S. Mullins,  

July 2012

DBA Corner