Anyone who has spent time analyzing someone else’s SQL code has had the horrific experience of trying to make sense out of poorly formatted (or completely unformatted) code; indiscriminate case usage, lack of tabs or indentation, and wrapping text with no carriage returns are just a few of the readability problems you can encounter. This is enough of a challenge when trying to determine the purpose of a given piece of code. Then you have to factor in that the SQL you are reading was more than likely written to take function, syntactically, with a specific RDBMS, such as Microsoft SQL Server, DB2, or Oracle. Additionally, each of these implementations has its own set of “current” syntax conventions, as well as older conventions supported purely for “backwards compatibility”. Given this, it can take hours to evaluate what the code is even trying to do, much less figure out how to fix it, much less performance tune it.
Obviously, one of the major purported solutions to this problem is to always use ANSI standardized SQL.
/*
For the uninitiated, ANSI is an acronym for the American National Standards Institute, an organization in the US that sets technical standards. They have published a series of standards relating to SQL in the last few decades, notably SQL-92, SQL:1999, SQL:2003, and SQL:2006. Each version generally includes updates to the standard that reflects new functionality, i.e. SQL:2006 introduced standardization around concurrently accessing traditional SQL data as well as XML data inside a relational database. ISO (the International Standards Organization) generally ratifies these standards as well, since ANSI is a US based organization.
*/
Since many of the major RDBMSs meet at least ANSI-92, if not later, using pure ANSI standard syntax will work for queries that do literally nothing but SELECT, UPDATE, INSERT, or DELETE data. That is, ANSI is used to standardize ONLY the data access and manipulation aspect of what DBAs and developers tend to think of as “SQL”. However, that standard does NOT include anything procedural, such as control-of-flow syntax. This is where the usage of ANSI SQL starts to break down.
Try to remember some of the recent SQL you’ve written. Was there a CONVERT statement? How about a FOR or WHILE loop?
None of these are ANSI compliant; they are extensions supported by specific vendors.
Now, can you imagine a world where none of your SQL had any procedural extensions? You’d be returning raw recordsets back to applications for sorting, traversing, etc. Doesn’t sound like much fun, does it?
Additionally, learning and using vendor specific SQL extensions can have a huge impact on performance. There are always nuances in query design that you can take advantage of when customizing your SQL for a specific RDBMS. But remember, the trade-off is portability and maintainability. If you use ANSI SQL, your code will likely run on any RDBMS with little or no changes. Also, *most* other developers will be able to understand your code, at least at a basic level, without having to dive into the vendor documentation.
So, like any other debate, there are multiple sides, and the answer is always “it depends”. But this is a choice that all database developers should consider when writing new SQL code. Be sure to evaluate each project and environment accordingly to choose what’s best for that situation.
I'm interested to hear what people think on this topic; we've all heard/read about various opinions on this, but what are the people in the trenches actually doing? Post a comment and let me know what you think!