The Law of Demeter, and a question of design
I was reading the comment thread on Hacker News about Joe Armstrong's
critique of object-oriented style. One commentor by the name nessus42
asked a thought-provoking question: how would one model a book with
chapters, sections, paragraphs, and sentences. The problem is such:
it's easy to model a book as being an object of Book
type, which has
a method for producing a list of Chapter
objects, each of which can
produce a Section
object, each of which has Paragraph
objects in
side, whence can be extracted Sentence
objects. This is all well
and good, and a reasonable OO model. But with this model, changes to
implementation can force changes in the interface; we'll see in a
second why.
To make things concrete, suppose we have the following problem: we want to, given a character index into a book (or, say, a specific word), produce a citation for that character. That citation may be just the chapter that that character is in, the section and chapter, or the paragraph, section, and chapter. Let's also assume that this is intended to be a web service, where millions of users use it simultaneously, so performance is a big concern.
So, why do we take pains to produce models like the chapter, section,
paragraph heirarchy? Well, one of the main goals of this sort of
design is encapsulation: if we define the book as above, we don't
necessarily know, when we use the book abstraction, whether the
chapters are implemented as a linked list or and array or a keyed hash
table; and if we want to change the parser that produces this Book
to use a different algorithm, or to parallelize it, we can do that
without introducing changes.
But nessus42 asks the following question: what happens if we want to
introduce the notion of subsections? The natural thing to do is to
make Section
objects be a list of Subsection
objects; but now this
breaks existing code, because that code will expect Paragraph
objects instead. You could make a Section
contain a list of
Paragraph
objects who when queried for parents are Subsection
objects, but this is ugly too since now the parent of the child of a
thing is not that thing. With this heirarchical chapter, section,
paragraph organization, adding levels to the heirarchy forces us to
change existing code.
Now nessus42 reminds us of the Demeter system, which, given methods to
drill down one step, produced methods to drill down to a specific
level. So, it would add a method for chapter.sentence(idx)
, which
gave you a sentence containing a given index. This way, if the book
model gained subsections, your code wouldn't change, since asks for
sentences specifically, not for just "the next level". The problem
with doing this in anything but Demeter is that you end up writing
hundreds of these methods; chapter.sentence(idx)
calls
section.sentence(idx)
which calls paragraph.sentence(idx)
which…
After thinking about this a while, I think there's another viewpoint onto this solution that may clear things up, and also actaully be possible to implement in most languages. We started off by talking about a book as a rigid tower of chapters of sections of paragraphs of sentences – this was a mistake. Books certainly have that structure, but that structure isn't the only valid structure you can impose on a book. Perhaps better would be to think of a model of a book that allowed this heirarchical structure, but also encompassed others; that way, when we changed our model to add subsections, we wouldn't have to change the interface.
In this case, I'm drawn to the observation that chapters, sections, subsections, paragraphs, and sentences are all contiguous spans of text. So maybe instead of making a book chapters of sections of …, we instead want to say that a book is a sequence of words; and that further, the book consists of various ranges of words that have semantic meaning. We don't put sentences and chapters on different footing: a book contains both individual sentences, and full chapters, as meaningful ranges of text.
Now, this is fine, but we do need to somehow distinguish between
chapters and sections; if you want to find what chapter a certain
point in the text is in, you want the chapter, not the section or
paragraph or sentence; how do you specify? Or do you just always get
an arbitrary one? Well, first, we'll have a function to give us all
the text ranges that a certain sentence belongs in, which we can do
quickly with a binary search; and then we'll take this set, presumably
of TextRange
objects, and filter it to get only those that are
actually objects of the Chapter
subclass. If we ever add
subsections to our model of a book, the set of text ranges changes,
but the filtered variant does not.
How did I arrive at this model? Well, what the Demeter system did was
produce identical methods on multiple classes: the same
sentence(idx)
call was available on the Section
, Chapter
, and
Paragraph
classes. But instead of duplicating this call, I just
imagined that this was implemented by some superclass that all of
Section
, Chapter
, and Paragraph
inherited from in parallel; text
ranges were such a class.
So perhaps we don't need a Demeter system after all, but instead just need to take more care about unifying seemingly-different classes and inheriting common functionality from there.