First, a minor confession, I dislike loathe databases. The only thing I may loathe more are the self-appointed high-priests of databases, DBA's. However, They are a
necessary evil. Invariably, most software solutions need one, so I begrudgingly
accept its role. My major complaint is that the industrial solutions that exist
today are all some form of Relational Database Management System (RDMS).
Working with them always is a hassle. The software I write has nice little
object abstractions for the entities I deal with it – it's great. I don’t worry
much about how they do what they do – they just do it. That is until I must
bridge the great impedance mismatch to get them in and out of the RDMS.
Why must I:
- Maintain data layout information in two places?
- Know some less than conformant version of SQL syntax?
- Deal with a mess of SQL scripts as I adjust the definition of objects over time?
In my previous commercial existence, I and some very
talented folks have waded right into this problem. The first time we tackled
the problem, it was to provide an RDMS like view of network data. As things
went forward, we realized we wanted to address this problem in such away as the
backing store did not matter. In other words, the objects we requested
information from could be backed up by a database, network calls, excel
spreadsheets, etc. You get the idea!
Before you all start posting comments, yes I have heard of
ODBC, ADO.NET, WMI, CSLA, and Hibernate. Hold your horses. Each of these
doesn’t really help solve the problem I want solved. I DON”T WANT TO DEAL WITH
THE DATABASE.
Further, when querying I am a huge fan of Query by Example (QBE). Let me show you an object and you give me back a list of them that look like it.
Formula exemplar = new Formula();
IStore store = ServiceFactory.Instance.Retrieve<IStore,Formula>();
exemplar.Email = ‘mwebb@tribalpizza.com’;
Console.WriteLine (“My formulas”);
foreach (Formula f in store.Retrieve (exemplar))
{
Console.WriteLine (f.Display);
}
Yes, this is how our current system works. Short, sweet, and
to the point.
In dealing with this problem, I want to work with three real
abstractions:
- An entity
- An entity’s definition
- A store
An entity is just that -- well let’s say something like a
Pizza Formula. I want to describe its properties, their constraints, relevant
data-types, and be done. We’ll call this description an entity’s definition.
A store is a place to store and retrieve entities from.
Here is how I want life to work:
- I describe an entity declaratively
- A code generator creates me a magic wrapper for ease of dealing with the entity. This is not entirely necessary but damn handy. The system needs to be able to retrieve and operate on entities for which these wrappers don’t exist though!
- A store is selected based on an implementation of the service factory and the type of identity. This gives me great flexibility in changing out backing store as well as potentially selecting the type (and characteristics) of the backing store based on entity.
- A store must be able to read an entities’ definition and create the necessary backing store for the entity. Period! I don’t want to help it. Read the definition and create the store. I’m not creating your SQL scripts, got it.
- A store must be able to read an entity’s definition and realize that a newer description exists. The store must be able to take reasonable actions to upgrade the backing store to realize these changes. When I say reasonable, I say the store should be smart enough to handle the additions of columns. Changing the data-type and/or deleting of columns may be beyond the capability of the store. I generally try to avoid these operations anyways.
- A store must be able to perform relevant read, delete, and update operations.
- I want to be able to access existing data without having the original data definitions. How? Well, easy, the backing store is also responsible for keeping track of the data definitions used to create objects. Wow! This has huge implications. More to come on this in the future.
Now, I know I loose some things:
- I may not be able to eeck out every micro-second of performance that a perfectly tuned SQL script could offer. However, I intend to deal with this problem another way. The selection of which store to use is done by requesting a store from a service factory based on entity. This allows me to *tune* up specialized stores for performance critical entities that have special characteristics. I shouldn’t need to do this much, but, I have a back door if needed.
- I am accepting some amount of lowest common denominator in backing store capabilities. Again, if this is a challenge see my solution to the problem above.
I’ve made some pretty good progress on a solution that works
for us. This is the first part in my series of explaining the design philosophy and its inner workings. Eventually,
we will be sharing an open source solution to this problem.
What are your thoughts on the great impedance mismatch and my approach to the problem?
Obviously the data you work on isn't very important. Otherwise, you would care about data integrity which is the problem relational databases solve.
Posted by: Kaizyn | September 25, 2007 at 02:44 AM
Another fine example of the "I want things to work my own special way, and anything that doesn't work that way sucks" style of programmer whining. People usually outgrow this kind of immaturity before they turn 10. Maybe your needs and what you want are really so new and important that relational databases suck by comparison, but I doubt it.
RDBMSs pre-date OOP, so it's pretty stupid to complain that databases suck because you can't be bothered to learn how to design or use them. Why not complain that your tools don't have any robust way to write and read objects to/from permanent storage? Do you think that OOP might be flawed because it can't do simple things like query a collection of objects for matching instances? You have it backwards -- databases don't suck, something else in your equation does.
I have some helpful tips on relational databases here if you're interested in upgrading your database skills:
http://typicalprogrammer.com/databases/abject-oriented-databases/
Posted by: gregjor | September 25, 2007 at 02:45 AM
Most data access layers look simple and elegant for simple problems. The true test of them comes when you have to do something more complicated (self-joins, outer joins, compound boolean expressions, grouping and aggregation).
The funny thing is that none of these APIs really works as well as SQL does. In the end, your better off spending your time learning how to use SQL effectively, then trying to learn some other API (or invent your own).
Posted by: David Avraamides | September 25, 2007 at 08:43 AM
I really appreciate your feedback. Actually, I am fairly versed in three versus of SQL (SQL Server 2000, SQL Server 2005 and MySQL).
I guess I was confused on the purpose of RDBMS. I thought an RDBMS was one that realized some or the entire relational model proposed by E.F. Codd. Data integrity (if you mean as to apply constraints to data and relations and enforce them) is part of that model. If you mean use of transactions, it is not; however it is an important feature for data integrity and one of the key reasons to use an RDBMS even if it doesn't quite match.
Obviously, my attempt at levity when dealing with the impedance mismatch that exists between our different tool sets wasn't well received. Too bad. This isn't that funny of a problem when you consider the big dollars that have been thrown at trying to solve the problem that OOP and RDBMS (and XML) don't play that well together. We as developers collectively spend lots of time writing boiler plate code to overcome these barriers.
Many in the software industry are devoting big resources to tackling the difficult impedance mismatches between OOP, RDBMS, and XML.
One need only look at toolkits like CSLA, Hibernate, the work in SQL Server 2005, and the upcoming capabilities in LINQ for C# 3.0 to see that many realize this is a problem.
Yes, the real challenge to any O-R mapping is dealing with joins, expressions, and aggregation. Some of these will fall to the way side. The one's I don't need or use won't be supported.
Posted by: Michael Webb | September 25, 2007 at 09:40 AM
The problem with ORM is not the tech, its the approach.
Suppose you have 3 tables, Foo,Bar and Baz. Foo has many Bars, and Foo has Many Baz.
The first thing a typical hibernate/ORM programmer would do is make Foo, Bar and Baz into three objects, write mappings, and DAOs(if needed) to handle access.
The problem is typical usage of Foo,Bar and Baz *does not* need all the data from any of them.
A typical UI shows a subset of data being manipulated in the database. This concept is so prevalent SQL has a keyword for this *join*.
The representation of the data does not necessarily match the view of it, and often doesn't.
A better approach is make entity objects which match your user interface. The other method doesn't utilize the RDBMS it duplicates it in Java/C#.
Posted by: DavidM | September 25, 2007 at 03:36 PM