Excerpt from Python Descriptors: A Comprehensive Guide – Chapter 6: Which Methods Are Needed?

Intro

Last week, I posted that I would put up some excerpts from my book, Python Descriptors: A Comprehensive Guide, this week and next week. This is the first of two.

This chapter of the book has one of my greatest epiphanies in it: “unbound attributes”. For context, I decided to provide the entire chapter here, instead of just putting the excerpt about unbound attributes in. Despite the context, this chapter assumes you at least know the descriptor protocol and its basics. If not, check out Raymond Hettinger’s relatively short article on descriptors. Enjoy!

Chapter 6: Which Methods Are Needed?

When designing a descriptor, you must decide which methods to include. It can sometimes help to decide right away whether the descriptor should be a data or non-data descriptor, but sometimes it works better to “discover” which kind of descriptor it is.

__delete__() is rarely needed, even on a data descriptor. That doesn’t mean it should never be included, however. If the descriptor is going to be released publicly, it wouldn’t hurt to add the __delete__() method simply for completeness, for cases when a user decides to call del on the attribute. If you don’t, an AttributeError will be raised when someone tries to delete it.
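As a rough illustration of that last point, here is a minimal sketch of a data descriptor that defines __get__() and __set__() but no __delete__(). The Stored and Point names are hypothetical and exist only for this example.

```python
class Stored:
    """Hypothetical data descriptor: __get__ and __set__, but no __delete__."""

    def __set_name__(self, owner, name):
        # Keep the value on the instance under a private name.
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        setattr(instance, self.storage_name, value)


class Point:
    x = Stored()


p = Point()
p.x = 1

deletion_error = None
try:
    del p.x  # the descriptor has no __delete__(), so this fails
except AttributeError as exc:
    deletion_error = exc
```

After the failed del, the stored value is untouched, so p.x still returns 1.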

__get__() is almost always needed, for both data and non-data descriptors. It is required for non-data descriptors. For data descriptors, the typical case where it isn’t required is when __set__() assigns the data into the instance dictionary under the same name as the descriptor (what I call set-it-and-forget-it descriptors). Otherwise, unless the data is write-only, a __get__() method is necessary for retrieving the data that was set. Keep in mind that if a descriptor doesn’t have a __get__() method and the instance doesn’t have anything in its __dict__ under the same name as the descriptor, the descriptor object itself will be returned.
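The set-it-and-forget-it idea can be sketched as follows. This is only one possible shape for such a descriptor; the Validated and Point names and the integer check are assumptions made for the example.

```python
class Validated:
    """Hypothetical set-it-and-forget-it descriptor: __set__ only, no __get__."""

    def __set_name__(self, owner, name):
        self.name = name

    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise TypeError(f"{self.name} must be an int")
        # Store under the descriptor's own name so normal lookup finds it.
        instance.__dict__[self.name] = value


class Point:
    x = Validated()


p = Point()
before = p.x  # nothing in p.__dict__ yet, so the descriptor object comes back
p.x = 3
after = p.x   # found in p.__dict__ without any __get__() being involved
```

Here before is the Validated object itself, which is the fallback behavior described above, while after is the stored integer.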

Just like __delete__(), __set__() is only used for data descriptors. Unlike __delete__(), however, __set__() is rarely optional. Because __delete__() goes unused in the most common cases, __set__() is nearly a requirement for creating data descriptors (which need either __set__() or __delete__()). If the descriptor’s status as data or non-data is being “discovered”, __set__() is generally the deciding factor. Even if the data is meant to be read-only, __set__() should be included and should raise an AttributeError in order to enforce the read-only nature.
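A minimal sketch of the read-only case might look like the following. The ReadOnly and Circle names are hypothetical, and storing the value on the descriptor is just the simplest choice for a short example.

```python
class ReadOnly:
    """Hypothetical read-only descriptor; the value is fixed at creation."""

    def __init__(self, value):
        self.value = value

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.value

    def __set__(self, instance, value):
        # Defining __set__ makes this a data descriptor, so assignment
        # can never shadow it with an entry in the instance __dict__.
        raise AttributeError("this attribute is read-only")


class Circle:
    sides = ReadOnly(0)


c = Circle()
error = None
try:
    c.sides = 4
except AttributeError as exc:
    error = exc
```

Without the __set__() method, the assignment would quietly place 4 in the instance dictionary and hide the descriptor for that instance.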

When `__get__()` is Called Without `instance`

A descriptor’s __get__() method is often its most complicated method, because there are two different ways it can be called: with or without an instance argument (although “without” means that None is given instead of an instance).

When the descriptor is a class-level descriptor (usually non-data), implementing __get__() without using instance is trivial, since that’s the intended use. But when a descriptor is meant for instance-level use, and the descriptor is not being called from an instance, it can be difficult to figure out what to do.

Here, I present a few options.

Raise Exception or Return `self`

The first thing that may come to mind is to raise an exception, since class-level access is not intended, but this should be avoided. A common attitude in Python, in the spirit of EAFP (it is Easier to Ask for Forgiveness than for Permission), is that just because something isn’t used as intended, it doesn’t mean that usage should be disallowed. If the use will break invariants and cause problems, raising an exception is fine; otherwise, there are other, better options to consider. The conventional solution is to simply return self. If the descriptor is being accessed from the class level, it’s likely that the user realizes that it’s a descriptor and wants to work with it. Doing so can be a sign of inappropriate use, but Python allows freedom, and so should its users, to a point. As an example, the property built-in returns self (the property object) when accessed from the class.
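The property behavior is easy to check, and the same convention takes one extra branch in a hand-written descriptor. In this sketch, Temperature, ReturnsSelf, and Holder are hypothetical names, and the returned 42 is only a placeholder for a real instance-level lookup.

```python
class Temperature:
    def __init__(self, celsius):
        self._celsius = celsius

    @property
    def celsius(self):
        return self._celsius


class ReturnsSelf:
    """Hypothetical instance-level descriptor following the same convention."""

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class-level access: hand back the descriptor
        return 42        # placeholder for the real instance-level lookup


class Holder:
    answer = ReturnsSelf()


class_level = Temperature.celsius            # the property object itself
instance_level = Temperature(21.5).celsius   # the stored value
```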

“Unbound” Attributes

Another solution, which is used by methods, is to have an “unbound” version of the attribute be returned. When accessing a function from the class level, the function’s __get__() detects that it does not have an instance, and so just returns the function itself. In Python 2, it actually returned an “unbound” method, which is where the name comes from. In Python 3, though, they changed it to just the function.
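The Python 3 behavior can be observed directly. The Greeter class below is a hypothetical example written only to show it.

```python
class Greeter:
    def greet(self):
        return "hello"


g = Greeter()
from_class = Greeter.greet   # Python 3: the plain function
from_instance = g.greet      # a bound method wrapping the same function

# The class-level result must be handed an instance explicitly.
explicit_call = from_class(g)
```

The bound method keeps references to both pieces: from_instance.__func__ is the function, and from_instance.__self__ is g.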

This can work for non-callable attributes as well. It’s a little strange, since it turns the attribute into a callable that must receive an instance to return the value. This makes it into a specific attribute lookup, akin to len() and iter(), where you just need to pass in the instance to receive the wanted value.

Here is a stripped-down __get__() implementation that works this way.

def __get__(self, instance, owner):
   if instance is None:
      def unboundattr(inst):
         return self.__get__(inst, owner)
      return unboundattr
   else:
      ...

The inner unboundattr() function could instead repeat the code from the else block, but delegating back to __get__() shows a universal way to define unboundattr() and avoids duplicating code. Here’s a reusable implementation, though, which can be used in any descriptor.

class UnboundAttribute:
   def __init__(self, descriptor, owner):
      self.descriptor = descriptor
      self.owner = owner

   def __call__(self, instance):
      return self.descriptor.__get__(instance, self.owner)

Using this class, a __get__() method that uses unbound attributes can be implemented like this:

def __get__(self, instance, owner):
   if instance is None:
      return UnboundAttribute(self, owner)
   else:
      ...

The original version relies on a closure over self and owner, so it can’t be reused other than by copying and pasting. The class instead takes those two values in its constructor and stores them on a new instance.

The really interesting (and useful) thing about this technique is that the unbound attribute can be passed into any higher-order function that expects a function, such as map(). It avoids having to write a getter method or an ugly lambda. For example, if there were a class like this:

class Class:
   attr = UnbindableDescriptor()

A map() call to a list of Class objects like this:

result = map(lambda c: c.attr, aList)

could be replaced with this:

result = map(Class.attr, aList)

Instead of passing in a lambda to do the work of accessing the attribute of the Class instances, Class.attr is passed in, which returns the “unbound” version of the attribute – a function that receives the instance in order to look up the attribute on the descriptor. In essence, the descriptor provides an implicit getter method to the reference of the attribute.
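Putting the pieces together, a runnable version of this comparison could look like the sketch below. UnboundAttribute is the class from earlier in the chapter; the body of UnbindableDescriptor, the storage under a private name, and the Class constructor are assumptions made so the example can run.

```python
class UnboundAttribute:
    def __init__(self, descriptor, owner):
        self.descriptor = descriptor
        self.owner = owner

    def __call__(self, instance):
        return self.descriptor.__get__(instance, self.owner)


class UnbindableDescriptor:
    """Hypothetical body for the descriptor named in the text."""

    def __set_name__(self, owner, name):
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return UnboundAttribute(self, owner)
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        setattr(instance, self.storage_name, value)


class Class:
    attr = UnbindableDescriptor()

    def __init__(self, attr):
        self.attr = attr


a_list = [Class(1), Class(2), Class(3)]
with_lambda = list(map(lambda c: c.attr, a_list))
with_unbound = list(map(Class.attr, a_list))
```

Both calls produce the same list of values; the second simply lets the descriptor supply the getter.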

This is a very useful technique for implementing a descriptor’s __get__() method, but it has one major drawback: returning self is so prevalent that not doing so is highly unexpected. Hopefully, this idea gets some traction in the community and becomes the new standard. Also, as seen in the upcoming chapter on read-only descriptors, there needs to be a way to access the descriptor object. Luckily, all you need to do is get the descriptor attribute from the returned UnboundAttribute.
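Getting back to the descriptor object can be sketched like this. The Constant and Config names are hypothetical; UnboundAttribute is repeated from earlier only so that the example is self-contained.

```python
class UnboundAttribute:
    def __init__(self, descriptor, owner):
        self.descriptor = descriptor
        self.owner = owner

    def __call__(self, instance):
        return self.descriptor.__get__(instance, self.owner)


class Constant:
    """Hypothetical descriptor that returns unbound attributes from the class."""

    def __init__(self, value):
        self.value = value

    def __get__(self, instance, owner=None):
        if instance is None:
            return UnboundAttribute(self, owner)
        return self.value


class Config:
    retries = Constant(3)


unbound = Config.retries          # an UnboundAttribute, not the descriptor
descriptor = unbound.descriptor   # the Constant object stored on the class
```

Looking the name up in the class dictionary, as in Config.__dict__["retries"], bypasses __get__() and reaches the same object.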

Even though it’s not the expected behavior for attributes, functions (which are themselves descriptors) already work this way, so it shouldn’t be too difficult for users to get used to it. People expect “unbound method” functions when accessing methods from the class level, so applying the convention to attributes shouldn’t be a huge stretch.
