

Moved. See https://slott56.github.io. All new content goes to the new site. This is a legacy site, and it will likely be dropped five years after the last post in Jan 2023.

Showing posts with label ORM. Show all posts

Sunday, November 15, 2009

ORM magic

The ORM layer is magic, right?

The ORM layer "hides" the database, right?

We never have to think about persistence, right? It just magically "happens."

Wrong.

Here are some quotes from a recent email:

"Somehow people are surprised that we would have performance issues. Somehow people are surprised that now that we are putting humpy/dumpy together that we would have to go back and look at how we have partitioned the system."

I'm not sure what all of that means except that it appears that the author thinks mysterious "people" think performance considerations are secondary.

I don't have a lot of technical details, just a weird ranting list of complaints, including the following.

"... the root cause of the performance issue was that each call to the component did a very small amount of work. So, they were having to make 10 calls to 10 different components to gather useful info. Even though each component calls was quick (something like 0.1 second), to populate the gui screen, they had to make 15 of them."

Read the following Stack Overflow questions: Optimizing this Django Code?, and Overhead of a Round-trip to MySql?

ORM Is A "Silver Bullet" -- It Solves All Our Problems

If you think that you can adopt some architectural component and then program without further regard for what that component actually does, stop coding now and find another job. Seriously.

If you think you don't have to consider performance, please save us from having to clean up your mess.

I'm repeatedly shocked at people who claim that some particular ORM (e.g., Hibernate) was unacceptable because of poor performance.

ORMs like Hibernate, iBatis, SQLAlchemy, Django ORM, etc., are not performance problems. They're solutions to specific problems. And like all solution technology, they're very easy to misuse.

Hint 1: ORM == Mapping. Not Magic. Mapping.

The mapping is from low-rent relational row-column (with no usable collections) to object instances. That's all. Just mapping rows to objects. No magic. Object collections and SQL foreign keys are cleverly exchanged using specific techniques that must be understood to be used.
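To make "just mapping" concrete, here's a minimal hand-rolled sketch using only the standard library. The `master` table, the `Master` dataclass, and the `fetch_masters` function are hypothetical names invented for this illustration; a real ORM does far more, but the core job is this translation from rows to instances.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Master:
    """Hypothetical class standing in for an ORM-mapped model."""
    id: int
    name: str


def fetch_masters(conn):
    """Map each (id, name) row to a Master instance. No magic involved."""
    rows = conn.execute("SELECT id, name FROM master ORDER BY id")
    return [Master(id=row[0], name=row[1]) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO master (id, name) VALUES (?, ?)",
    [(1, "first"), (2, "second")],
)

masters = fetch_masters(conn)
print(masters)
```

Every call to `fetch_masters` is a query. Hiding the SELECT inside a function doesn't make it free.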

Hint 2: Encapsulation != Ignorance.

OO design frees us from "implementation details". This does not mean that it frees us from performance considerations. Performance is not an "implementation detail". The performance considerations of class encapsulation are central to the very idea of encapsulation.

One central reason we have object-oriented design is to separate performance from programming nuts and bolts. We want to be able to pick and choose alternative class definitions based on performance considerations.

ORM's Role.

ORM saves writing mappings from column names to class instances. It saves us from writing SQL. It doesn't remove the need to think about what's actually going on.

If an attribute is implemented as a property that actually does a query, we need to pay attention to this. We need to read the API documentation, know what features of a class do queries, and think about how to manage this.
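Here's a small sketch of what "a property that actually does a query" can look like, again with the standard library rather than any particular ORM. The `Master` class, its `details` property, the `run` helper, and the table layout are all assumptions made up for the example. The point is only that an innocent-looking attribute access can cost a round-trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (id INTEGER PRIMARY KEY);
    CREATE TABLE detail (
        id INTEGER PRIMARY KEY,
        master_id INTEGER REFERENCES master(id)
    );
    INSERT INTO master (id) VALUES (1), (2), (3);
    INSERT INTO detail (id, master_id) VALUES (10, 1), (11, 1), (12, 2);
""")

executed = []  # every SQL statement we send, in order


def run(sql, params=()):
    """Hypothetical helper: record the statement, then execute it."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


class Master:
    def __init__(self, id):
        self.id = id

    @property
    def details(self):
        # Looks like a plain attribute, but each access runs a SELECT.
        rows = run(
            "SELECT id FROM detail WHERE master_id = ? ORDER BY id",
            (self.id,),
        )
        return [row[0] for row in rows]


masters = [Master(row[0]) for row in run("SELECT id FROM master ORDER BY id")]
details_by_master = {m.id: m.details for m in masters}

print(len(executed))  # 4: one query for the masters, plus one per master
```

Whether four queries is a problem depends on the application. Not knowing that there are four is the problem.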

If we don't know, we need to write experiments and spikes to demonstrate what is happening. Reading the SQL logs should be done early in the architecture definition.
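A spike of this kind can be very small. The sketch below uses `sqlite3.Connection.set_trace_callback` from the standard library to capture the statements the database actually sees, then compares two ways of gathering the same information. The `component` table and the variable names are assumptions for illustration. With a real ORM you'd turn on that ORM's own SQL logging instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE component (id INTEGER PRIMARY KEY, info TEXT);
    INSERT INTO component (id, info) VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")

sql_log = []
conn.set_trace_callback(sql_log.append)  # record each statement SQLite runs


def count_selects(log):
    """Count logged statements that are SELECTs."""
    return sum(1 for s in log if s.lstrip().upper().startswith("SELECT"))


# Style 1: one small query per piece of information.
one_by_one = [
    conn.execute("SELECT info FROM component WHERE id = ?", (i,)).fetchone()[0]
    for i in (1, 2, 3)
]
selects_one_by_one = count_selects(sql_log)

sql_log.clear()

# Style 2: one query that fetches everything the screen needs.
batched = [row[0] for row in conn.execute("SELECT info FROM component ORDER BY id")]
selects_batched = count_selects(sql_log)

conn.set_trace_callback(None)  # stop logging
print(selects_one_by_one, selects_batched)
```

Both styles produce the same data. The log shows how many round-trips each one cost, and that's the number to look at before anyone complains about performance.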

You can't write random code and complain that the performance isn't very good.

If you think you should be able to write code without thinking and understanding what you're doing, you need to find a new job.

Friday, July 31, 2009

Object Models and Relational Joins -- Endless Confusion

Check out this list of questions from Stack Overflow: [Django] join.

These are all folks trying to do joins or outer joins even though they have objects fetched through the ORM.

How does this confusion arise? Easy. Folks work with SQL as if the relational world-view is Important and Universal. It isn't. SQL isn't even a programming language, per se.

Here's the important thing for Django developers to know: SQL is a Hack; Leave it Behind.

The bad news is that all those years spent mastering the ins and outs of the SELECT statement don't have as much enduring value as I'd hoped they would have. [Yes, I was a DBA in Ingres and Oracle. I know my SQL.]

The good news is that Object Navigation replaces much of the hideousness of SQL. To an extent. Let's look at some cases.

Joins in General

SQL SELECT statements are an algebraic specification of a result set. The database is free to use any algorithm to build the required set.

SQL imposes the Join hack because SQL is a completely consistent set algebra system. A simple SELECT returns a row-column set of data. A join between tables has to construct a fake row-column set so that everything is consistent.

A Join is nothing more than navigation from an object to associated objects. In the OO world, this is simply object containment; the navigation is simply the name of a related object. Nothing more.

Master-Detail (1:m) Joins

A master-detail join in SQL works with a foreign key reference on the children.

In Django, this has to be declared in a SQL-friendly way so that the ORM will work.

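from django.db import models

class Master(models.Model):
    pass  # fields omitted in the original

class Detail(models.Model):
    # on_delete is required since Django 2.0; CASCADE was the older implicit default
    master = models.ForeignKey(Master, on_delete=models.CASCADE)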

The "Join" query is simply this. The "detail_set" name is deduced by Django from the class that contains the foreign key.

# process() is a placeholder for whatever the application does
for m in Master.objects.all():
    process(m)
    for d in m.detail_set.all():
        process(d)

"But wait!" the SQL purist cries, "isn't that inefficient?" The answer is "rarely". It's possible that the RDBMS, using a "merge-join" algorithm to build the entire result set, might be quicker than this.

As a practical matter, however, the rest of the web transaction -- including the painfully slow download -- will dominate the timeline.
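For anyone who wants to see the equivalence without a Django project, here's a rough standard-library sketch. It builds the same master-detail pairs twice: once as a flat SQL join, and once by starting at each master and navigating to its details. The table layout and sample rows are assumptions invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE detail (
        id INTEGER PRIMARY KEY,
        master_id INTEGER REFERENCES master(id),
        item TEXT
    );
    INSERT INTO master (id, name) VALUES (1, 'm1'), (2, 'm2');
    INSERT INTO detail (id, master_id, item)
        VALUES (10, 1, 'd10'), (11, 1, 'd11'), (12, 2, 'd12');
""")

# SQL view of the world: one flat row-column set built by a join.
joined = conn.execute("""
    SELECT m.name, d.item
    FROM master AS m JOIN detail AS d ON d.master_id = m.id
    ORDER BY m.id, d.id
""").fetchall()

# Object view of the world: start at each master, follow it to its details.
navigated = []
masters = conn.execute("SELECT id, name FROM master ORDER BY id").fetchall()
for master_id, name in masters:
    details = conn.execute(
        "SELECT item FROM detail WHERE master_id = ? ORDER BY id",
        (master_id,),
    )
    for (item,) in details:
        navigated.append((name, item))

print(joined == navigated)
```

The navigation version issues one query per master instead of one query overall. If measurement shows that this matters for a particular page, that's the time to look at it.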

Association (m:m) Joins

An association in SQL requires an intermediate table to carry the combinations of foreign keys.

In Django, this has to be declared in a SQL-friendly way so that the ORM will work.

from django.db import models

class This(models.Model):
    pass  # fields omitted in the original

class That(models.Model):
    these = models.ManyToManyField(This)

The navigation, however, is simply following the relationships. There's no complicated SQL join required.

# process() is a placeholder for whatever the application does
for this in This.objects.all():
    for that in this.that_set.all():
        process(this, that)

Here's the other side of the navigation.

for that in That.objects.all():
    # a related manager is not iterable by itself; ask it for a queryset
    for this in that.these.all():
        process(this, that)

Outer Joins

An Outer Join is a "Join with Null for Missing Relationships". It's navigation with an if-statement or an exception clause.

for that in That.objects.all():
    try:
        # get() assumes at most one related This and raises
        # This.DoesNotExist when there is none
        this = that.these.get()
    except This.DoesNotExist:
        this = None
    process(this, that)
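The same idea can be checked outside Django. The sketch below is a standard-library illustration under simplifying assumptions: the tables `that_item` and `this_item` are hypothetical, and the relationship is reduced to "zero or one related row" instead of a full many-to-many association. It compares a LEFT OUTER JOIN with plain navigation plus an if-statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE that_item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE this_item (
        id INTEGER PRIMARY KEY,
        that_id INTEGER REFERENCES that_item(id),
        name TEXT
    );
    INSERT INTO that_item (id, name) VALUES (1, 'that1'), (2, 'that2');
    INSERT INTO this_item (id, that_id, name) VALUES (10, 1, 'this10');
""")

# SQL: the outer join pads the missing relationship with NULL.
outer = conn.execute("""
    SELECT a.name, b.name
    FROM that_item AS a LEFT OUTER JOIN this_item AS b ON b.that_id = a.id
    ORDER BY a.id
""").fetchall()

# Navigation: look up the related row and substitute None when it is absent.
navigated = []
thats = conn.execute("SELECT id, name FROM that_item ORDER BY id").fetchall()
for that_id, that_name in thats:
    row = conn.execute(
        "SELECT name FROM this_item WHERE that_id = ?", (that_id,)
    ).fetchone()
    this_name = row[0] if row is not None else None  # the "if-statement"
    navigated.append((that_name, this_name))

print(outer == navigated)
```

The NULL in the join result and the `None` in the navigated result mean the same thing: no related object.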

There isn't any "join" in object-oriented programming. The ORM layer removes the need.