Tuesday, January 4, 2011

Avoiding Cursor in MSSQL server: The magic of set based operations

Hello World,

Right, so it has been a happy new year so far. That helped push up my optimism to a level that every problem seemed to come with a solution : )
Now the problem I am going to describe is pretty commonplace - you have a source table with n columns of interest and you want to sync a target table with the data from the source table where:
1. If the target table already has data, update it
2. Otherwise insert data into the target table

Most of us application developers would think of this as a source result-set that needs to be traversed row-by-row (pain!) against the target result-set; updating or inserting data as described above. That would inevitably lead one to code a solution using cursors. Now all of us know that cursors are resource intensive and messy. Every "best-practices" guru would tell you to avoid them as far as possible. This led me to search for an alternative way of achieving the desired result.
For starters, database queries are meant to be declarative - meaning: you declare your intent and the database engine will find a way to do it in the most optimized way known to it. Application programming on the other hand is mostly instructive: we tell the system to do something and how to do it.
My search then landed me on this excellent resource: http://wiki.lessthandot.com/index.php/Cursors_and_How_to_Avoid_Them

E.g. You have an Employees_sync table that is the source for the data to be synchronized and Employees table where the update/insert (jargon alert: UPSERT) should happen, in a SQL 2008 environment you would write a query as below:
MERGE dbo.Employees AS Target
USING (SELECT EmployeeNumber, FirstName, LastName, EmailID FROM Employees_sync EI)
    AS Source
ON (Target.EmployeeNumber = Source.EmployeeNumber)
WHEN MATCHED THEN
    UPDATE
    SET Target.FirstName = Source.FirstName,
    Target.LastName = Source.LastName,
    Target.EmailID =Source.EmailID
WHEN NOT MATCHED BY TARGET THEN
    INSERT(EmployeeNumber, FirstName, LastName, EmailID)
    VALUES( Source.EmployeeNumber, Source.FirstName, Source.LastName, Source.EmailID);

(Column names should be self-explanatory)

If you have SQL 2005 or below, you would have to split this up into two batches:
-- Update
UPDATE Employees
SET
FirstName = tm.FirstName,
LastName = tm.LastName,
EmailID = tm.EmailID
FROM Employees_sync tm
LEFT JOIN Employees m ON tm.EmployeeNumber = m.EmployeeNumber

--INSERT
INSERT INTO Employees (EmployeeNumber, FirstName, LastName, EmailID)
SELECT EmployeeNumber, FirstName, LastName, EmailID FROM Employees_sync
WHERE EmployeeNumber not in (SELECT EmployeeNumber FROM Employees)

Happy Coding!