S.Lott-Software Architect: Django


Tuesday, October 6, 2015

Today's Milestone: Refactoring and Django Migrations

Once upon a time, when today's old folks were young, we'd debate the two project strategies: Hard Part Do Later (HPDL) vs. Hard Part First (HPF).

The HPDL folks argued that you could pick away at the hard part until -- eventually -- it wasn't hard any more. This doesn't often work out well in practice, but a lot of people like it. Sometimes the attempt to avoid the hard part makes it harder.

The HPF folks, on the other hand, recognized that solving the hard problem correctly may make the easy problems even easier. It may not, but either way, the hard part was done.

The debate would shift to what -- exactly -- constituted the hard part. Generally, what one person finds hard, another person has already done several times before. It's the part that no one has done before that eventually surfaces as being truly hard.

Young kids today (get off my lawn!) often try to make the case that an Agile approach finesses the "hard part" problem. We define a Minimum Viable Product (MVP) and we (magically) don't have to worry about doing the hard part first or last.

They're wrong.

If the MVP happens to include the hard part, we're back at HPF. If the MVP tries to avoid the hard part, we're looking at HPDL.

The Novelty Factor

Agile methods don't change things. We still have to Confront the Novelty (CTN™). Either it's new technology or it's a new problem domain or a new solution to an existing problem domain. Something must be novel, or we wouldn't be writing software, we'd be downloading it.

I'm a HPF person. If you set the hard part aside to do later, all the things you do instead become constraints, limiting your choices for solving the hard part that comes later. In some rare cases, you can decompose the hard part and solve it in pieces. The decomposition is simply Hard Part First through Decomposition (HPFtD™) followed by Prioritize the Pieces (PtP™) and another round of Hard Part First.

Today, we're at a big milestone in the HPF journey.

The application's data model is simple. However.

The application has a complex pipeline of processing to get from source data to the useful data model.

A strict (and dumb) MVP approach would skip building the complex pipeline and assume that it was magically implemented somehow.

A slightly smarter MVP approach uses some kind of technical spike solution to handle the complex pipeline. We do that manually until we get past MVP and decide to implement the pipeline in something more final and complete.

My HPF strategy tackles the complex pipeline because we have to build it anyway and it's hard. We don't have to build all of it. Just enough to lay out the happy path.

The milestone?

It's time to totally refactor because -- even doing the hard part first -- we have the wrong things in the wrong places. Django application boundaries generally follow the "resources". It's a lot like designing a RESTful API. Define the resources, cluster them together in some kind of ontology that provides a meaningful hierarchy.

Until -- of course -- you get past the problem domain novelty and realize that some portion of the hierarchy is going to become really lopsided. It needs to be restructured so we have a flat group of applications.

Wait. What?

Flatten?

Yes.

When we have a Django application model that's got eleventy-kabillion classes, it's too big. Think the magic number 7±2: there's a limit to our ability to grasp a complex model.

Originally, we thought we'd have apps "A", "B", and "C". However. "A" turned out to be more complex than it seemed when we initially partitioned the apps. Based on the way the classes are named and clustered in the model file, it's clear that we have an internal structure that is struggling to emerge. There are too many comments and high-level organizational hints in the docstrings.

It looks like this might be the model that's emerging:

Former A

A1
Conceptual A2
    A2a
    A2b
A3

This means that there will be classes in A3 that depend on separate apps A2a and A2b. Further, A2 is really just a concept that unifies the design; it doesn't need to be implemented as a proper app. Both A2a and A2b depend on A1. A3 depends on A2a, A2b, and A1.
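One way to annotate those dependencies, as a minimal sketch: the dictionary below restates the relationships just described, and the standard library's graphlib module (Python 3.9 or later) works out an order in which the flat apps could be built. The app names are the placeholders from this post; APP_DEPENDENCIES and build_order are names made up for illustration.

```python
from graphlib import TopologicalSorter

# Each app maps to the set of apps it depends on (placeholder names from the post).
APP_DEPENDENCIES = {
    "A1": set(),
    "A2a": {"A1"},
    "A2b": {"A1"},
    "A3": {"A1", "A2a", "A2b"},
}


def build_order(dependencies):
    """Return the apps in an order where every app follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(build_order(APP_DEPENDENCIES))
```

A cycle among the apps would raise graphlib.CycleError, which is a cheap early warning that the flat structure has become tangled.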

Ugh. Refactoring. And the associated migrations.

Django allows us to have nested apps. But. Do we really want to go there? Is a nested collection of packages really all that helpful?

Or.

Would it be better to flatten the whole thing, and simply annotate the dependencies among apps?

The Zen Of Python suggests that Flat is Better than Nested.

The hidden benefit of Flat is that the Liskov Substitution Principle is actually a bit easier to exploit. Yes, we have a tangled web of dependencies, but we're slightly less constrained when all of the Django apps are peers. Yes, many things will depend on the A1 app, but that will be less of a problem than the current pile of classes is.

The important part here is to start again. This means I need to discard the spike database and discard the history of migrations to date. I always hate disrupting my development databases, since they have test cases I know and remember.

That's the disruptive milestone for me: discarding the old database and starting again.

Thursday, January 29, 2015

Bottle vs. Flask vs. Django vs. a dozen others

There are times when a "micro framework" is actually useful. I wasn't easily convinced that this could be true. Big framework or die trying. Right?

Maybe not so right.

My primary example of a micro framework's value is a quick demo site to show how some API's are going to be consumed.

I've been avoiding an Angular.js app. We're going to be supporting the user experience with a sophisticated Angular.js app. But, as a back-end developer, I don't want to try to write demo pages in the proper app framework. There are too many ways to screw this up because I'll miss some key UX feature. Instead, I want to write fake pages that show a considerably simplified version of consuming an API. Sort of "suggestion" pages to clarify how the API's fit together.

To make it even more complex than necessary, I'm not interested in learning Angular.js, and I'm not interested in figuring out how it works. Running node.js, bower, npm, etc., etc., is too much.

[And Angular.js is just one version of the front-end. There's also mobile web, mobile native, tablet native, and God-alone-only-knows-what-all-else, to name a few. Each unique.]

Several sprints back, I slapped together two fake forms using bootstrap for layout and hosted them with Bottle. The navigation and framework look were simply copied from a screenshot and provided as static graphics. Lame, but acceptable. All that matters is getting the proper logo to show up.

The problem is that the sequence of API requests has grown since then. The demo grew to where we need a session so that alternative sequences will provide proper parameters to the APIs. We're beyond "Happy Path" interactions and into "what-if" demonstrations to show how to get (or avoid) a 404.

Bottle started with the significant advantage of fitting entirely into a single .py module. The barrier to entry was very low. But then the forms got awkwardly complex and Jinja2 was required. Now that sessions are required, the single module benefit isn't as enticing as it once was.

I've been forced to upgrade from Bottle to Flask. This exercise points out that I should have started with Flask in the first place. Few things are small enough for Bottle. In some ways, the two are vaguely similar. The @route() decorator is the biggest similarity. In other ways, of course, the two are wildly different. There's only a single Flask, but we can easily combine multiple Bottles into a larger, more comprehensive site. I like the composability of Bottles, and wish Flask had this.

The Flask Blueprints might be a good stand-in for composing multiple Bottles. Currently, though, each functional cluster of API's has its own unique feature set. The bigger issue is updating the configuration to track the API's through the testing and QA servers as they march toward completion. Since they don't move in lock-step, the configuration is complex and dynamic.

The transparent access to session information is a wonderful thing. I built a quick-and-dirty session management in Bottle. It used shelve and a "simple" cookie. But it rapidly devolved to a lot of code to check for the cookie and persist the cookie. Each superficially simple Bottle @route() needed a bunch of additional functionality.
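To give a feel for what every route had to carry around, here is a minimal sketch of a shelve-backed session store. This is not the code from the demo site; the ShelveSessions class and its method names are invented for illustration, and the cookie handling itself (reading and setting the cookie on the request and response) is left out.

```python
import os
import shelve
import tempfile
import uuid


class ShelveSessions:
    """Hypothetical session store: a cookie value keys a dict kept in a shelve file."""

    def __init__(self, path):
        self.path = path

    def load(self, cookie):
        """Return (session_id, data); a missing or unknown cookie starts a fresh session."""
        with shelve.open(self.path) as db:
            if cookie is not None and cookie in db:
                return cookie, db[cookie]
        return uuid.uuid4().hex, {}

    def save(self, session_id, data):
        """Persist the session data under its id."""
        with shelve.open(self.path) as db:
            db[session_id] = data


if __name__ == "__main__":
    store = ShelveSessions(os.path.join(tempfile.mkdtemp(), "sessions"))
    sid, data = store.load(None)      # first request: no cookie yet
    data["last_api_call"] = "search"
    store.save(sid, data)             # the id would go back to the browser as a cookie
    print(store.load(sid)[1])
```

Every route needs the load step at the top and the save step at the bottom. That repetition is the part Flask's session object makes go away.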

The whole point was to quickly show how the API's fit together. Not reinvent Yet Another Web Framework Based On Jinja2 and Werkzeug.

Django seems like too much for this job. We don't have a model; and that's the exact point. Lacking a database model doesn't completely break Django, but it makes a large number of Django features moot. We just have some forms that get filled in for different kinds of events and transactions and searches and stuff. And we need a simple way to manage stateful sessions.

Omitted from consideration are the other dozen-or-so frameworks listed here: http://codecondo.com/14-minimal-web-frameworks-for-python/. This is a great resource for comparing and contrasting the various choices. Indeed, this was how I found Bottle to begin with.

Thursday, February 27, 2014

Django and REST -- Tastypie vs. Django REST

Ouch. What a difficult question.

Lazyweb: Django REST Framework vs. Tastypie. Thoughts? #django
— Joe Dougherty (@modusjonens) February 17, 2014

This isn't easy.

Comparing http://django-tastypie.readthedocs.org/en/latest/ with http://www.django-rest-framework.org is hard. They're both outstanding projects with a long history.

Trivial Follow-up Question 1: What are the requirements?

I happen to know, however, a bit about the context, so I suspect that the requirements center around super-flexible data access and numerous serialization formats.

History

My initial reaction is "Django-REST" of course. Mostly because I started with this several years ago and spent some time tweaking and adjusting my local copy. Our requirements involved adapting Django (and Django-REST) to use ForgeRock OpenAM for authentication.

One feature that we didn't need was a sophisticated set of built-in transactions that covered the full REST spectrum of GET, PUT, POST and DELETE. 90% of our processing was GET with an occasional POST.

The other feature we didn't need was a trivial mapping from the Django object model. Our GET processing required view functions as mediation between our database models and the "published" model available through the RESTful API.

Since we needed so little, we hacked out the essential serialization feature set to support our GET operations.

Serialization

Considering the context of the initial question, I think that serialization is the deciding factor. Comparing the serialization features seems to indicate that the following summary may be relevant.

Tastypie serialization is simpler. The support for XML, YAML, JSON, etc., is simple.

Django-REST serialization+render is quite a bit more sophisticated and more flexible. The process is explicitly decomposed into serialization (for breaking down the model objects) and rendering in some external representation like XML, JSON, YAML, etc.

This two-step breakdown in Django-REST seems to make an open data project work out nicely. The developers should find it easier to integrate and publish data from a variety of sources.
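The two-step idea can be sketched in plain Python. This is not the Django REST Framework API; the function names and the Reading class are made up to show the shape of the decomposition: one step reduces an object to primitives, and separate renderers turn those primitives into an external representation.

```python
import json
import xml.etree.ElementTree as ET


def serialize(obj, fields):
    """Step 1: break an object down into plain Python primitives."""
    return {name: getattr(obj, name) for name in fields}


def render_json(data):
    """Step 2a: render the primitives as JSON."""
    return json.dumps(data, sort_keys=True)


def render_xml(data, root_tag="object"):
    """Step 2b: render the same primitives as XML."""
    root = ET.Element(root_tag)
    for name, value in sorted(data.items()):
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


class Reading:
    """Stand-in for a model object; hypothetical."""

    def __init__(self, station, temperature):
        self.station = station
        self.temperature = temperature


if __name__ == "__main__":
    data = serialize(Reading("KORF", 21.5), ["station", "temperature"])
    print(render_json(data))
    print(render_xml(data))
```

Adding another output format means adding one more renderer; the serialization step and the data sources don't change.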

Thursday, February 23, 2012

Is Django Suitable?

I got a long list of requirements from a firm that's looking to build a related family of web sites. They were down to a Django vs. Ruby-on-Rails decision.

As you can see, they've done their homework in thinking through their needs.

I grouped their "high-level requirements" into several categories. I summarized the fit with Django here, and provided the details separately.

Authentication. Django supports flexible logins and Python makes it easy to adapt other security API's. Django and Python assure that this is a solid 10.
Shared Code. This is handled through Python features that are central to the Django framework. Shared code management -- with appropriate overrides and customization -- is part of Python and a 10.
Database Access. While Django provides the necessary access features, database scalability depends on the implementation of the database engine itself. There are numerous parallelization features that must all be used to maximize database throughput. Even though the real responsibility for performance is outside Django, the Django flexibility results in a 10.
AJAX and JavaScript. Django supports the necessary RESTful API's. However, Django treats JavaScript as simple static content, offering little specific support. Since JavaScript support is not an essential part of Django, perhaps this is only a 5.
Applications. The various applications described in the requirements are more-or-less irrelevant to Django. They can be built easily, but are not first-class features of Django. In the sense of easy-to-develop, this is a solid 10. In the sense of already-existing-applications, this may be a 5 if the applications are part of a community like Pinax. Because the applications do not already exist, this may also be a 0.
API. Python allows use of any API. Django's transparent use of Python makes it easy to build API's. This is a feature of Python and scores 10 out of 10.
Usability and Developer Skills. Django's ease-of-use is a direct consequence of the Python programming language. The developers of Django make excellent use of the Python language, giving this a 10.
Performance, Access and Scalability. For the most part, Django performance comes from consideration of the purpose of all layers of the architecture. Principal design features include keeping static content separate from dynamic content (reducing Django's workload), and optimizing the database (to handle concurrent access well). Django provides several internal design features to minimize memory. Django encourages proper separation of concerns, giving a 10.

In each of these areas, it's possible to dive into considerable depth. It was tempting to offer up proof-of-concept code for some of the questions.

Tuesday, December 14, 2010

Code Base Fragmentation -- Again

Check this out: "Stupid Template Languages".

Love this: "The biggest annoyance I have with smart template languages (Mako, Genshi, Jinja2, PHP, Perl, ColdFusion, etc) is that you have the capability to mix core business logic with your end views, hence violating the rules of Model-View-Controller architecture."

Yes, too much power in the template leads to code base fragmentation: critical information is not in the applications, but is pushed into the presentation. This also happens with stored procedures and triggers.

I love the questions on Stack Overflow (like this one) asking how to do something super-sophisticated in the Django Template language. And the answer is often "Don't. That's what view functions are for."

Wednesday, October 13, 2010

Real Security Models

Lots of folks like to wring their hands over the Big Vague Concept (BVC™) labeled "security".

There's a lot of quibbling. Let's move beyond BVC to the interesting stuff.

I've wasted hours listening to people identify risks and costs of something that's not very complex. I've been plagued by folks throwing up the "We don't know what we don't know" objection to a web services interface. This objection amounts to "We don't know every possible vulnerability; therefore we don't know how to secure it; therefore all architectures are bad and we should stop development right now!" The OWASP top-ten list, for some reason, doesn't sway them into thinking that security is actually manageable.

What's more interesting than quibbling over BVC is determining the authorization rules.

Basics

Two of the pillars of security are Authentication (who are you?) and Authorization (what are you allowed to do?)

Authentication is not something to be invented. It's something to be used. In our case, with an Apache/Django application, the Django authentication system works nicely for identity management. It supports a simple model of users, passwords and profiles.

We're moving to Open SSO. This takes identity management out of Django.

The point is that authentication is -- largely -- a solved problem. Don't invent. It's solved and it's easy to get wrong. Download or License an established product for identity management and use it for all authentication.

Authorization

The Authorization problem is always more nuanced, and more interesting, than Authentication. Once we know who the user is, we still have to determine what they're really allowed to do. This varies a lot. A small change to the organization, or a business process, can have a ripple effect through the authorization rules.

In the case of Django, there is a "low-level" set of authorization tests that can be attached to each view function. Each model has an implicit set of three permissions (add, change and delete). Each view function can test to see if the current user has the required permission. This is done through a simple permission_required decorator on each view function.

However, that's rarely enough information for practical — and nuanced — problems.

The auth profile module can be used to provide additional authorization information. In our case, we just figured out that we have some "big picture" authorizations. For sales and marketing purposes, some clusters of features are identified as "products" (or "features" or "options" or something). They aren't smallish things like Django models. They aren't largish things like whole sites. They're intermediate things based on what customers like to pay for (and not pay for).

Some of these "features" map to Django applications. That's easy. The application view functions can all simply refuse to work if the user's contract doesn't include the option.

Sadly, however, some "features" are part of an application. Drat. We have two choices here.

Assure that there's a "default" option and configure the feature or the default at run time. For a simple class (or even a simple module) this isn't too hard. Picking a class to instantiate at run time is pretty standard OO programming.
Rewrite the application to refactor it into two applications: the standard version and the optional version. This can be hard when the feature shows up as one column in a displayed list of objects or one field in a form showing object details. However, it's very Django to have applications configured dynamically in the settings file.
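The first choice, picking a class at run time, can be sketched like this. The class names and the REPORT_CLASSES mapping are hypothetical; in a Django project the mapping (or just the option name) might live in the settings file.

```python
class StandardReport:
    """The default behavior every customer gets."""

    def columns(self):
        return ["name", "date"]


class PremiumReport(StandardReport):
    """The optional feature: one extra column in the displayed list."""

    def columns(self):
        return super().columns() + ["forecast"]


# Hypothetical configuration: option name -> class to instantiate.
REPORT_CLASSES = {"standard": StandardReport, "premium": PremiumReport}


def make_report(option):
    """Instantiate the class for the customer's option, falling back to the default."""
    return REPORT_CLASSES.get(option, StandardReport)()


if __name__ == "__main__":
    print(make_report("premium").columns())
    print(make_report("no-such-option").columns())
```

The view code only ever calls make_report(); it never needs to know which contract the customer signed.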

Our current structure is simple: all customers get all applications. We have to move away from that to mix-and-match applications on a per-customer basis. And Django supports this elegantly.

Security In Depth

This leads us to the "Defense in Depth" buzzword bingo. We have SSL. We have SSO. We have high-level "product" authorizations. We have fine-grained Django model authorizations.

So far, all of this is done via Django group memberships, allowing us to tweak permissions through the auth module. Very handy. Very nice. And we didn't invent anything new.

All we invented was our high-level "product" authorization. This is a simple many-to-many relationship between the Django Profile model and a table of license terms and conditions with expiration dates.
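Here is a framework-free sketch of that idea, assuming a profile that holds a list of licenses, each with a product name and an expiration date. None of these names come from our code or from Django; License, Profile, NotLicensed and product_required are invented for illustration, and a real view would take a request and return an error response rather than raise.

```python
import datetime
import functools
from dataclasses import dataclass, field


@dataclass
class License:
    """One row of license terms: a product name and an expiration date."""
    product: str
    expires: datetime.date


@dataclass
class Profile:
    """Stand-in for the user profile; one profile holds many licenses."""
    username: str
    licenses: list = field(default_factory=list)

    def has_product(self, product, today=None):
        today = today or datetime.date.today()
        return any(
            lic.product == product and lic.expires >= today
            for lic in self.licenses
        )


class NotLicensed(Exception):
    """Raised when the user's contract does not include the product."""


def product_required(product):
    """Decorator: the view refuses to work unless the profile is licensed."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(profile, *args, **kwargs):
            if not profile.has_product(product):
                raise NotLicensed(product)
            return view(profile, *args, **kwargs)
        return wrapper
    return decorator


@product_required("forecasting")
def forecast_view(profile):
    return "forecast for " + profile.username
```

The decorator has the same shape as permission_required, so the coarse "product" check and the fine-grained model check can be stacked on one view.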

Django rocks. The nuanced part is fine-tuning the available bits and pieces to match the marketing and sales pitch and the legal terms and conditions in the contracts and statements of work.

Monday, April 5, 2010

Getting Started Creating Web Pages

Got this question recently.

I’m looking for an HTML editor that fits into my price range (free of course). I don’t need to do anything fancy, just vanilla HTML to run on an Apache server ..., and maybe some PHP down the line. Can you recommend any open source or shareware software that would run on Windows?

What to do?

First, civilized folks don’t edit HTML any more. That’s so 1999.

You have a spectrum of choices if you want to try and edit HTML.

General-purpose text editors. Good ones do HTML syntax coloring. This is the hardy, forge-through-the-forest way to go. Raw text editing. Like when we were kids. http://en.wikipedia.org/wiki/List_of_text_editors. In Windows world, I use Notepad++.
HTML-specific editors. http://en.wikipedia.org/wiki/List_of_HTML_editors. Note that WYSIWYG HTML Editing is more trouble than you’d believe possible. It’s always fun for the first few months, but then you try to do something that confuses the GUI interface and you wind up with an entire paragraph in italics and can’t figure out why. Or you want to move a punctuation outside a link and discover that the editor just can’t figure out where the tag is supposed to fall and puts everything inside it. Most of us do not try to use WYSIWYG HTML editors because it slowly becomes annoying once you get beyond the trivial basics.
IDE’s. To produce HTML sensibly, you have to also write .CSS style sheets, and you often have a number of related pages. Essentially, a “project”. An IDE is usually a better choice than an editor. All the good IDE’s are free: Eclipse, NetBeans and Komodo Edit. I use ActiveState Komodo Edit heavily.

While NetBeans or Komodo Edit seems like overkill, it will (eventually) pay out as you move into developing more than static HTML pages.

Better Than HTML

Instead of creating HTML, many of us use “Lightweight Markup”, which is much, much easier to cope with, plus simple tools to produce HTML from the markup. http://en.wikipedia.org/wiki/Lightweight_markup_language

I use reStructuredText instead of HTML. I use the DocUtils project, which has an rst2html.py tool that converts my RST into HTML for me. I also use rst2s5.py to create power-point-like presentations from my reStructuredText. If you want to see the power of RST, you can look at my personal site and my books. 100% RST. No manual HTML anywhere. I use Sphinx to create really complex documents like the books.

For some tasks, I use HTML templates and simple scripts to process data and create static HTML from the data. You’d be surprised how effective this is. Few things require up-to-the-second web applications. Many things can be done as nightly batch programs that emit static HTML and FTP the HTML up to the web page. No PHP.
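A minimal sketch of that kind of script, using only the standard library. The template, the build_page function, the file name and the sample data are all invented for illustration; the upload step is left out.

```python
import html
from string import Template

PAGE = Template("""<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<ul>
$items
</ul>
</body>
</html>
""")


def build_page(title, rows):
    """Fill the template with escaped data and return the finished HTML text."""
    items = "\n".join("<li>{}</li>".format(html.escape(row)) for row in rows)
    return PAGE.substitute(title=html.escape(title), items=items)


if __name__ == "__main__":
    # A nightly batch job would write this file and then upload it.
    with open("report.html", "w", encoding="utf-8") as target:
        target.write(build_page("Nightly Report", ["alpha", "beta & gamma"]))
```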

Application Development

For web development, PHP is fine. It will – before long – create holes in your head because it’s so badly thought out. But for getting started, it’s fun. Real companies (like Google) don’t waste their time with it because of the numerous problems PHP causes.

“Problems?” you say. “What problems?”

PHP’s world view (HTML + code in a single package) is a terrible architecture. It’s horribly slow and leads to very muddled, inflexible designs. Everyone who tries to make a global change to their site's “look and feel” finds that PHP is inflexible and a regrettable platform. Even folks who simply want consistency among several different pages within their site find that the PHP world view is more headache than solution.

But it’s fun when you first build a site that works.

Frameworks

Generally, most folks find that a “framework” is absolutely essential for debugging, consistency and separating Content, Processing and Presentation. Even a simple Blog or Forum or Visitor Registration has separate Content, Processing and Presentation; PHP muddles these. A framework can help unmuddle them.

I use Django as framework and Python as programming language. Your hosting site may not support this, in which case you may be in trouble.

The Web Frameworks list on Wikipedia is good. Zend and CodeIgniter are highly recommended in places like StackOverflow. However, here's a good Django vs. PHP comparison: The Onion Uses Django, And Why It Matters To Us.

"Because

Cleaner. Much cleaner. Proper unit testing. Real reusable components across applications. An ORM rather than a just a series of functional query helpers...."

Summary

Get an IDE to edit your pages. Komodo Edit.
Consider using RST and tools instead of raw HTML. Installing Python + DocUtils and using rst2html.py is easier than learning HTML.
Try to avoid PHP’s numerous pitfalls; ideally by avoiding PHP. Use Django + Python and create a real application that clearly separates the content (data model) from processing (view functions) from presentation (HTML templates)

Wednesday, October 28, 2009

Painful Python Import Lessons

Python's packages and modules are -- generally -- quite elegant.

They're relatively easy to manage. The __init__.py file (to make a directory of modules into a package) is very elegant. And stuff can be put into the __init__.py file to create a kind of top-level or header module in a larger package of modules.

To a limit.

It took hours, but I found the edge of the envelope. The hard way.

We have a package with about 10 distinct Django apps. Each Django app is -- itself -- a package. Nothing surprising or difficult here.

At first, just one of those apps used a couple of fancy security-related functions to assure that only certain people could see certain things in the view. It turns out that merely being logged in (and a member of the right group) isn't enough. We have some additional context choices that you must make.

The view functions wind up with a structure that looks like this.

from django.contrib.auth.decorators import login_required

@login_required
def someView(request, object_id, context_from_URL):
    # Each check returns an error response, or None when everything is fine.
    no_good = check_other_context(context_from_URL)
    if no_good is not None:
        return no_good
    still_no_good = check_session()
    if still_no_good is not None:
        return still_no_good
    # you get the idea

At first, just one app had this feature.

Then, it grew. Now several apps need to use check_session and check_other_context.

Where to Put The Common Code?

So, now we have the standard architectural problem of refactoring upwards. We need to move these functions somewhere accessible. It's above the original app, and into the package of apps.

The dumb, obvious choice is the package-level __init__.py file.

Why this is dumb isn't obvious -- at first. This file is implicitly imported. Doesn't seem like a bad thing. With one exception.

The settings.

If the settings file is in a package, and the package-level __init__.py file has any Django stuff in it -- any at all -- that stuff will be imported before your settings have finished being imported. Settings are loaded lazily -- as late as possible. However, in the process of loading settings, there are defaults, and Django may have to use those defaults in order to finish the import of your settings.

This leads to the weird situation that Django is clearly ignoring fundamental things like DATABASE_ENGINE and similar settings. You get the dummy database engine. Yet a basic from django.conf import settings; print settings.DATABASE_ENGINE shows that you should have your expected database.

Moral Of the Story

Nothing with any Django imports can go into the package-level __init__.py files that may get brought in while importing settings.
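The implicit import is easy to see outside Django. Here's a minimal sketch using only the standard library: it builds a throwaway package, imports just the settings submodule, and reports what ran. The names demo_site and LOADED are made up for the demonstration.

```python
import importlib
import os
import sys
import tempfile


def import_order_demo():
    """Create a throwaway package, import one submodule, and return the load order."""
    for name in ("demo_site.settings", "demo_site"):
        sys.modules.pop(name, None)              # start clean on repeated runs
    root = tempfile.mkdtemp()
    package = os.path.join(root, "demo_site")    # hypothetical package name
    os.mkdir(package)
    with open(os.path.join(package, "__init__.py"), "w") as f:
        f.write("LOADED = ['__init__']\n")
    with open(os.path.join(package, "settings.py"), "w") as f:
        # This line only works because the package was already imported.
        f.write("import sys\nsys.modules['demo_site'].LOADED.append('settings')\n")
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        importlib.import_module("demo_site.settings")
    finally:
        sys.path.remove(root)
    return sys.modules["demo_site"].LOADED


if __name__ == "__main__":
    print(import_order_demo())
```

The package's __init__.py runs before the first line of settings.py. Anything at the top of that __init__.py, Django imports included, runs at that same early point.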

Friday, October 16, 2009

Django Capacity Planning -- Reading the Meta Model

I find that some people spend way too much time doing "meta" programming. I prefer to use someone's framework rather than (a) write my own or (b) extend theirs. I prefer to learn their features (and quirks).

Having disclaimed an interest in meta programming, I do have to participate in capacity planning.

Capacity planning, generally, means canvassing applications to track down disk storage requirements.

Back In The Day

Back in the day, when we wrote SQL by hand, we were expected to carefully plan all our table and index use down to the kilobyte. I used to have really sophisticated spreadsheets for estimating -- to the byte -- Oracle storage requirements.

Since then, the price of storage has fallen so far that I no longer have to spend a lot of time carefully modelling the byte-by-byte storage allocation. The price has fallen so fast that some people still spend way more time on this than it deserves.

Django ORM

The Django ORM obscures the physical database design. This is a good thing.

For capacity planning purposes, however, it would be good to know row sizes so that we can multiply by expected number of rows and cough out a planned size.

Here's some meta-data programming to extract Table and Column information for the purposes of size estimation.


import importlib

import django
from django.conf import settings
from django.db import connection
from django.db.models.base import ModelBase

class Table(object):
    """A database table: a name, a comment and a mapping of columns."""
    def __init__(self, name, comment=""):
        self.name = name
        self.comment = comment
        self.columns = {}
    def add(self, column):
        self.columns[column.name] = column
    def row_size(self):
        # Sum of the column sizes plus one byte of overhead per column.
        return sum(c.size for c in self.columns.values()) + len(self.columns)

class Column(object):
    """A column: its name, database type and estimated size in bytes."""
    def __init__(self, name, type, size):
        self.name = name
        self.type = type
        self.size = size

# Rough size estimates, in bytes, keyed by the database type name.
sizes = {
    'integer': 4,
    'bool': 1,
    'datetime': 32,
    'text': 255,
    'smallint unsigned': 2,
    'date': 24,
    'real': 8,
    'integer unsigned': 4,
    'decimal': 40,
}

def get_size(db_type, max_length):
    # A declared max_length wins; otherwise fall back to the per-type estimate.
    if max_length is not None:
        return max_length
    return sizes[db_type]

def get_schema():
    tables = {}
    for app in settings.INSTALLED_APPS:
        print(app)
        try:
            mod = importlib.import_module(app + ".models")
            if mod.__doc__ is not None:
                print(mod.__doc__.splitlines()[:1])
            for obj in vars(mod).values():
                if isinstance(obj, ModelBase):
                    t = Table(obj._meta.db_table, obj.__doc__)
                    for fld in obj._meta.fields:
                        # Current Django versions need the connection to work out the column type.
                        db_type = fld.db_type(connection)
                        c = Column(fld.attname, db_type, get_size(db_type, fld.max_length))
                        t.add(c)
                    tables[t.name] = t
        except (ImportError, AttributeError) as e:
            # Not every installed app has a models module.
            print(e)
    return tables

if __name__ == "__main__":
    django.setup()  # assumes DJANGO_SETTINGS_MODULE is set
    tables = get_schema()
    for name, table in tables.items():
        print(name, table.row_size())

This shows how we can get table and column information without too much pain. This will report an estimated row size for each DB table that's reasonably close.

You'll have to add storage for indexes, also. Further, many databases leave free space within each physical block, making the actual database much larger than the raw data.

Finally, you'll need extra storage for non-database files, logs and backups.
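The arithmetic itself is small. A rough sketch, where the index overhead and block fill factor are placeholder assumptions to be replaced with figures for your own database, and planned_bytes is a made-up name:

```python
def planned_bytes(row_size, expected_rows, index_overhead=0.5, block_fill=0.8):
    """Rough table size: rows times row size, padded for indexes and free block space.

    index_overhead and block_fill are placeholder assumptions, not measured values.
    """
    raw = row_size * expected_rows
    return round(raw * (1 + index_overhead) / block_fill)


if __name__ == "__main__":
    # Example: 120-byte rows, one million rows expected.
    print(planned_bytes(120, 1_000_000))
```

Non-database files, logs and backups still have to be added on top of this.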

Friday, July 31, 2009

Object Models and Relational Joins -- Endless Confusion

Check out this list of questions from Stack Overflow: [Django] join.

These are all folks trying to do joins or outer joins even though they have objects fetched through the ORM.

How does this confusion arise? Easy. Folks work with SQL as if the relational world-view is Important and Universal. It isn't. SQL isn't even a programming language, per se.

Here's the important thing for Django developers to know: SQL is a Hack; Leave it Behind.

The bad news is that all those years spent mastering the ins and outs of the SELECT statement don't have as much enduring value as I'd hoped they would have. [Yes, I was a DBA in Ingres and Oracle. I know my SQL.]

The good news is that Object Navigation replaces much of the hideousness of SQL. To an extent. Let's look at some cases.

Joins in General

SQL SELECT statements are an algebraic specification of a result set. The database is free to use any algorithm to build the required set.

SQL imposes the Join hack because SQL is a completely consistent set algebra system. A simple SELECT returns a row-column set of data. A join between tables has to construct a fake row-column set so that everything is consistent.

A Join is nothing more than navigation from an object to associated objects. In OO world, this is simply object containment; the navigation is simply the name of a related object. Nothing more.
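A small sketch in plain Python (no Django, no database) shows the difference. The Master and Detail classes here are bare stand-ins, and joined_rows is a made-up helper that builds the fake row-column set a SQL join would hand back.

```python
from dataclasses import dataclass, field


@dataclass
class Detail:
    item: str


@dataclass
class Master:
    name: str
    details: list = field(default_factory=list)   # containment replaces the foreign key


def joined_rows(masters):
    """Flatten the object graph into the row-column set a SQL join would return."""
    return [(m.name, d.item) for m in masters for d in m.details]


if __name__ == "__main__":
    order = Master("order-1", [Detail("widget"), Detail("gadget")])
    for detail in order.details:        # navigation: just follow the attribute
        print(order.name, detail.item)
    print(joined_rows([order]))
```

The flattened rows repeat the master's data once per detail. Navigation never has to build that redundant structure.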

Master-Detail (1:m) Joins

A master-detail join in SQL works with a foreign key reference on the children.

In Django, this has to be declared in a SQL-friendly way so that the ORM will work.


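from django.db import models

class Master(models.Model):
    pass  # fields omitted

class Detail(models.Model):
    # on_delete is required from Django 2.0 onward
    master = models.ForeignKey(Master, on_delete=models.CASCADE)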

The "Join" query is simply this. The "detail_set" name is deduced by Django from the class that contains the foreign key.


# process() stands for whatever the application does with each object
for m in Master.objects.all():
    process(m)
    for d in m.detail_set.all():
        process(d)

"But wait!" the SQL purist cries, "isn't that inefficient?" The answer is "rarely". It's possible that the RDBMS, doing a "merge-join" algorithm to build the entire result set might be quicker than this.

As a practical matter, however, the rest of the web transaction -- including the painfully slow download -- will dominate the timeline.

Association (m:m) Joins

An association in SQL requires an intermediate table to carry the combinations of foreign keys.