Many people think that DB abstraction is cool. I get emails about this topic all the time. People suggesting we add a DB abstraction layer to b2evolution, or offering to do so themselves.

So why haven’t we added one yet? Well, simply because DB abstraction doesn’t work! :!:

DB abstraction layers will mostly hide the specific interface semantics between the application and the database. Thus you don’t have to worry about the specific system call to connect to each particular DB. OK nice. So what? That’s only the easiest part in porting an app from a DBMS to another… (BTW, we use a class (/evocore/db.class) in b2evolution for this too. You just have to slighlty alter this class if you want to connect to another DBMS than MySQL).

More sophisticated layers will also translate between different datatypes from one DBMS to another. But since datatypes tend to standardize among all popular DBMSes this is getting less of an issue as time passes by…

Finally, we get to the point where it hurts: the SQL syntax! There are vitually no two DBMSes that share the exact same SQL syntax, except for the most basic SELECT, INSERT, UPDATE and DELETE statements! >:-[ And it doesn’t seem to be standardizing really… some still won’t support even the minimalistic SQL 92 standard! >:XX

Just check it out with your 2 favorite DBMSes:

  • How do you perform a LEFT OUTER JOIN?
  • How do you concatenate columns into a single result string?
  • How do you limit results in the WHERE clause to a regular expression?
  • How do you handle the fact that when you INSERT and the primary key already exists, you want to UPDATE instead? (MySQL's REPLACE INTO syntax...)?

These are just a few common examples, but they’re probably already enough to show you why you DB abstraction layer won’t magically translate from you first favorite DBMS to your second… You’ll have to rewrite many queries too… and sometimes you won’t find equivalent functionnality (subqueries?) and you’ll have to write extra application code…

Not to mention triggers and stored procedures where you wouldn’t even dare to dream about some kind of compatibility.

The most advanced DB abstraction I have ever worked with was ODBC. Yes that thing actually did SQL syntax translation! But it depended on specific drivers to implement translation against a standardized syntax defined by Microsoft.

So today when I see some open source library pretending to perform DB abstraction and I can’t find anything closely related to SQL syntax translation, I won’t even consider it… It’s useless to me. The real difficulty with handling multiple DBs is rewriting the SQL queries, triggers and stored procedures… it’s not connecting to yet another fancy DBMS and SELECT ‘hello’… |-|


Comments from long ago:

Comment from: dAniel hAhler

There is Propel (http://propel.phpdb.org), which “is a full-service object persistence and query toolkit for PHP5”, which “allows you to treat your database as a set of objects, providing a simple API for storing and querying data.”

2005-04-25 12-05

Comment from: Jean-Michel Pouré

Follow the SQL 99 syntax. PostgreSQL, Oracle and MSSQL Server are mostly SQL 99 compliant. PostgreSQL itself can be seen as a reference development for SQL 99 compliancy.

If your application runs under PostgreSQL, it should run anywhere. Do not use too many server-side objects and programming.

The problem with MySQL is that it is not compliant with SQL 99. Therefore, everything written is likely to be incompatible with legacy SQL 99 systems.

2005-06-06 13-12

Comment from: Rob

Well I suppose a DB abstraction layer does mean that someone else (another programmer) will know exactly what modules or classes to to change when a new database needs to be supported… theoretically. Wouldn’t all the SQL to change be in the classes that comprise the abstracton layer?

It’s a pattern. Engineers love design patterns.

2005-07-20 07-45

Comment from: ike

I’m not sure about the PHP world, but I wouldn’t say that DB abstraction (and I’m talking only about sql syntax) is impossible (or even unlikely) …

I started down this road with my own solution for ColdFusion a while ago and am currently putting another revision on it actually right now.

Concatenation - easy - the API accepts || and converts that (which isn’t standard sql syntax for anything else afaik) into the appropriate + (mssql,mysql) or & (access) if necessary.

Left joins were handled about the same time - all data coming into the abstraction layer is “meta-data”, so the layer can do whatever it likes with the array of join statements to produce the appropriate syntax.

I don’t use regular expressions in SQL because they’re not widely enough supported.

I handle “replace into” in my update mechanism by optionally performing an equivalent select statement before the update to determine if the record exists.

There are a host of other issues I’ve covered, like automatically converting all the variables on the url (or any other structure) into ORM-mapped filters in your select query, so you don’t have to edit the sql-code when you add a column to your table or escaping reserved words (mssql/access [], oracle “”, mysql``) which I didn’t want to implement initially, but ended up doing it anyway because I found myself in an office where I’m working on a legacy system designed by someone who didn’t know better. If you didn’t want to use that feature, all you’d have to do is extend sqlagent.cfc or the appropriate cfc (class) for your db server (mssql.cfc, mysql.cfc, etc) and override the escape method.

The latest revision makes the abstraction layer even much easier to use… datasource.getStatement(“select”).init(“mytable”,“columns”).collectionFilter(url).filter(“column”,content).execute() … there’s also a getSyntax() method to return the generated syntax instead of executing the query… so … long and short of it is, I just think you haven’t seen the right abstraction layer.

2006-01-27 00-51

Comment from: ike

I just realized the last couple paragraphs might not have been entirely intuitive… :) with regard to not changing the sql-syntax when you add a column to your table, I’m talking about making search results functionality easier to produce. Once I have a search page made, I can narrow the results however I like just by adding the column name to the url. I still have to add the column to the table in my search display, but I only have to make 2 changes (html & database) not 3 (select statement).

2006-01-27 00-58

Comment from: François Planque

Sure… of course you can add tons of code to abstract everything including the syntax like ODBC does. (ODBC is actually a fairly intersting abstraction layer…)

You could also code regular expressions into the abstraction layer in order to handle those lousy DBMSes which still don’t support them yet.

At the end of the day, my point is that there is too much variations between DBMSes for DB abstraction to work in an efficient manner.

In other words, you’re bound to end up with a major implementation of what Joel Spolsky calls a leaky abstraction.

2006-01-27 13-09

Comment from: ike

umm… okay…

That’s an argument for never doing anything to make your life (programming work) easier, because doing something to make things easier is an abstraction and therefore would leak…

I’ll stick to shortening my development times…

2006-01-30 04-43