Thursday, April 20, 2006

SQL For Developers

Why is it that so many otherwise skilled developers skimp so much on their SQL knowledge. Databases are a huge part of many applications yet until a few years ago I was as guilty as anyone of avoiding learning too much about SQL.

I guess most of us know how to string together a basic SELECT, INSERT or DELETE statement, but how many of us so often use a SQL Cursor when writing a stored procedure or trigger ?

Any of Joe Celkos books are a great way of getting into more advanced SQL but the best way is to decide you want to learn about SQL and not just avoid it as much as possible. Often it offers the easiest way to speed up your code.

Some big speed increases can be gained with very minor changes.

I have also seen SQL which checks if there are any unprocessed messages like this :
SELECT COUNT(*) FROM MyMessages WHERE Processed = 0
This SQL is usually built up in a hardcoded string and executed using the relevant objects for that language. Then the program looks at the number returned and checks if its greater than zero.

Its usually much faster to execute this
SELECT CASE WHEN EXISTS(SELECT MessageID FROM MyMessages WHERE Processed = 0 ) THEN 'YES' ELSE 'NO' END

In the first case the SQL engine has to read through all rows on the table and decide if they match. In the second case the SQL engine can stop looking at the table as soon as it finds a single match. Big speed difference.

Telling the database to create an index on the column Processed can also speed this up enormously.

Also consider putting more SQL into stored procedures rather than hardcoding it into your application. The engine will probably optimize it further, it allows you to restrict attack surface to hackers ( code only needs access to the proc not the underlying table ), also if you need to change the business logic its a SQL script to run not a full application build.

SQL Cursors code runs at least one order of magnitude slower than writing your SQL without them. That means your program runs *much* slower than it needs to. The reasons they get used so much is developers are conditioned to think in terms of looping thru a set of results. They can usually be removed by thinking about the pr oblem in terms of processing SETS of data and avoiding loops.

Replacing cursors with SET based operations is not an easy hack, you need to learn to think about the data processing in a different way. But once you can make this shift in the way you think about the problems the benefits of tearing out those cursors is vast speed increases.

A simple example, I would always have thought a cursor was needed to find duplicate rows. Maybe I would have a cursor running thru the table and some variables to load the row into. If this row matched what was already in the variables I had found a duplicate. But this SET based SQL runs *much* faster :

-- Show duplicate customer names in customer table
SELECT
CustomerName, COUNT(*)
FROM Customer
GROUP BY CustomerName
HAVING COUNT(*) > 1

If you are running Microsoft SQL Server try typing the SQL into query analyzer, highlight the code and press CTRL-L, this shows you the approach the engine is taking to process the SQL. It shows you the indexes and joins it will use and what percentage of time it thinks each part of the operation will take. Often this shows you where changes to the SQL will have the biggest speed increase.

SQL isnt just for dba's, mastering it will improve your employability and enable you to make big speed increases to your application.

No comments: