The Law of Demeter, and a question of design

I was reading the comment thread on Hacker News about Joe Armstrong's critique of object-oriented style. One commentor by the name nessus42 asked a thought-provoking question: how would one model a book with chapters, sections, paragraphs, and sentences. The problem is such: it's easy to model a book as being an object of Book type, which has a method for producing a list of Chapter objects, each of which can produce a Section object, each of which has Paragraph objects in side, whence can be extracted Sentence objects. This is all well and good, and a reasonable OO model. But with this model, changes to implementation can force changes in the interface; we'll see in a second why.

To make things concrete, suppose we have the following problem: we want to, given a character index into a book (or, say, a specific word), produce a citation for that character. That citation may be just the chapter that that character is in, the section and chapter, or the paragraph, section, and chapter. Let's also assume that this is intended to be a web service, where millions of users use it simultaneously, so performance is a big concern.

So, why do we take pains to produce models like the chapter, section, paragraph heirarchy? Well, one of the main goals of this sort of design is encapsulation: if we define the book as above, we don't necessarily know, when we use the book abstraction, whether the chapters are implemented as a linked list or and array or a keyed hash table; and if we want to change the parser that produces this Book to use a different algorithm, or to parallelize it, we can do that without introducing changes.

But nessus42 asks the following question: what happens if we want to introduce the notion of subsections? The natural thing to do is to make Section objects be a list of Subsection objects; but now this breaks existing code, because that code will expect Paragraph objects instead. You could make a Section contain a list of Paragraph objects who when queried for parents are Subsection objects, but this is ugly too since now the parent of the child of a thing is not that thing. With this heirarchical chapter, section, paragraph organization, adding levels to the heirarchy forces us to change existing code.

Now nessus42 reminds us of the Demeter system, which, given methods to drill down one step, produced methods to drill down to a specific level. So, it would add a method for chapter.sentence(idx), which gave you a sentence containing a given index. This way, if the book model gained subsections, your code wouldn't change, since asks for sentences specifically, not for just "the next level". The problem with doing this in anything but Demeter is that you end up writing hundreds of these methods; chapter.sentence(idx) calls section.sentence(idx) which calls paragraph.sentence(idx) which…

After thinking about this a while, I think there's another viewpoint onto this solution that may clear things up, and also actaully be possible to implement in most languages. We started off by talking about a book as a rigid tower of chapters of sections of paragraphs of sentences – this was a mistake. Books certainly have that structure, but that structure isn't the only valid structure you can impose on a book. Perhaps better would be to think of a model of a book that allowed this heirarchical structure, but also encompassed others; that way, when we changed our model to add subsections, we wouldn't have to change the interface.

In this case, I'm drawn to the observation that chapters, sections, subsections, paragraphs, and sentences are all contiguous spans of text. So maybe instead of making a book chapters of sections of …, we instead want to say that a book is a sequence of words; and that further, the book consists of various ranges of words that have semantic meaning. We don't put sentences and chapters on different footing: a book contains both individual sentences, and full chapters, as meaningful ranges of text.

Now, this is fine, but we do need to somehow distinguish between chapters and sections; if you want to find what chapter a certain point in the text is in, you want the chapter, not the section or paragraph or sentence; how do you specify? Or do you just always get an arbitrary one? Well, first, we'll have a function to give us all the text ranges that a certain sentence belongs in, which we can do quickly with a binary search; and then we'll take this set, presumably of TextRange objects, and filter it to get only those that are actually objects of the Chapter subclass. If we ever add subsections to our model of a book, the set of text ranges changes, but the filtered variant does not.

How did I arrive at this model? Well, what the Demeter system did was produce identical methods on multiple classes: the same sentence(idx) call was available on the Section, Chapter, and Paragraph classes. But instead of duplicating this call, I just imagined that this was implemented by some superclass that all of Section, Chapter, and Paragraph inherited from in parallel; text ranges were such a class.

So perhaps we don't need a Demeter system after all, but instead just need to take more care about unifying seemingly-different classes and inheriting common functionality from there.

By Pavel Panchekha

15 July 2012

The Law of Demeter, and a question of design