Book review: Formalizing debugging

This is the third in a series of reviews of debugging books.

The Science of Debugging (Matt Telles and Yuan Hsieh, Coriolis, 495 pp., 2001)
Why Programs Fail: A Guide to Systematic Debugging, 2nd Edition (Andreas Zeller, Morgan Kaufmann, 400 pp., 2009)

Given all the time and effort collectively spent debugging by the software industry, it makes sense to explore the notion of formalizing the practice. This involves reflecting on the process and breaking it down into its constituent parts. To complicate matters, as a field of study debugging is a cross between empirical and formal sciences. The artifacts that you deal with (programs) are complicatedly deterministic, birthed by social forces and maintained in the political realm of human interaction. You can’t debug in a vacuum of technological processes.

The two books I talk about in this review take a stab at turning debugging into an activity based on evidence and formalism, beyond that of “this worked for me.” There is the wonderful Why Programs Fail and the woeful Science of Debugging. We’ll start with the woeful.

Studying The Science of Debugging was a slog. For one thing, it’s got a lot of text.¹ The physical copy I read was typical textbook size with surprisingly small margins. Not only is there a lot of text, you’re confronted with walls of it. It was mildly intimidating.

That’s a superficial nitpick, though. A more substantial problem was the editing (if one can call it that). It was a constant distraction. First, there were seemingly minor issues that ended up being serious distractions, like using the word “low” when “high” was intended. Such things make you do a double-take during the act of reading, and you start to question if it’s an editing mishap, or if you or the authors are misunderstanding something. When it happens with the frequency it does in this book, you have to start checking other sources to verify what is being said. There was also an abundance of meandering sentences. It felt like a first draft for long stretches. The more grievous editing mistake, though, was allowing the same stories to be repeated ad nauseam. While reading this book I felt that I was duped into partaking in a lecture-rant. I lost track of how many times it was said that if you add a printf and the bug goes away, you haven’t solved anything. For as much as the case of the magical printf was mentioned, you would think it to be an epidemic. And if you want regular reminders about the race conditions and buffer problems of a digital TV system from the 90s, by all means subject yourself to this book.

All of this is forgivable, though. I mention it for selfish reasons: it’s hard for me to contain how much I didn’t like The Science of Debugging. The core problem is that the book fails to make its case of defining debugging as a profession.

There are good parts to the book (with that many words, you’re bound to get something right!). One of those is a survey of some infamous bugs. By now you’ve probably heard of most of them — Intel Pentium FDIV, Ariane 5, Therac-25, Mars Climate Orbiter, AT&T outages — and they are still good case studies. In fact, the overview occurs at the beginning of the book and made me hopeful for the rest. Those hopes were dashed once it started to set down something approaching the science in the title.

Discussing the philosophy of science is too much for this review, but it’s safe to say that part of science is the act of classifying. Taxonomy is often a contentious practice, but there is general agreement that there are similarities among groups of things, and it is useful to try to categorize them. Defects and bugs are no exception. Diagnosing and correcting a defect becomes easier as you gain experience solving them, and recognizing similarities can guide an investigation.

Telles and Hsieh examine this idea by discussing bug definitions and life cycles, leading to a bug taxonomy. It’s a fundamental aspect of the book because it tries to establish a groundwork for a debugging profession. A taxonomy would provide that profession with a knowledge base. Finding new entries or refining the definitions would be part of the science.

The taxonomy starts with “classes of bugs”: requirement, design, implementation, process, build, deployment, future planning, and documentation. It’s a reasonable breakdown given that it roughly aligns with the phases of software development. It then immediately describes the “bug classes,” making the classic mistake of overloading the terminology. It’s confusing to the reader: how are “bug classes” different from “classes of bugs”? Muddying the waters is the fact that the “bug classes” consist almost entirely of things that occur directly in code. That is, they seem to fall into the class of implementation bugs, thus completely ignoring the other seven classes.

It’s a deeply unsatisfying narrative. I give the authors credit for ambition but the execution really misses the mark. Worse, the range covered by the bug classes varies wildly, and overlaps in confounding ways. For example, one class, “Hard-Coded Length/Sizes,” is remarkably narrow, whereas another, “Distributed Application Errors,” is wide open. There is “Memory or Resource Leaks” and “Allocation/Deallocation Errors,” where the latter literally sets itself up as a subset of the former. In the description of what constitutes an allocation or deallocation error, it says “This is a classic memory leak.” So, is an allocation/deallocation error a subclass of memory or resource leaks? It sure seems like it, but that relationship is ignored. As mentioned earlier, taxonomy is contentious, but the breakdown presented seems hastily concocted, bordering on careless.

When it comes to the idea of debugging as a profession, I will admit I was skeptical but willing to listen to the argument. In the end I wasn’t convinced, but that doesn’t mean it is a bad argument to put forth. What The Science of Debugging presents, though, is not much of an argument. I struggled to get at just why their treatise doesn’t work, beyond the fact that I have about 17 years of hindsight and industry knowledge to draw from since it was published.

After the taxonomy is laid out, Telles and Hsieh go on at (great) length about how to debug, different debugging techniques and situations, postmortem analysis, testing, maintenance, and something called “prebugging,” which is a term they seem to have coined for describing decent design and implementation decisions. Most of the discussion is the kind of thing you’d find in any reasonable software engineering book of today or even at the time. The organization of it may be questionable, but aside from one thing (see below) it’s more of the usual advice. Its greatest weakness is the lack of concision. It truly hampers the message.² The descriptions drag, the stories are repetitive, and there is a distinct air of pomposity in the war stories.

At the end of the book they make the case for debugging as a profession. After such an exasperating read, I was hoping for some redemption and, again, was left wanting. It turns out that the profession of debugging, as laid out in the book, is what we know as “software development.” The book ends with a “typical day in the life of a professional debugger.” To get a flavour, here is the first entry in the imagined logbook.

8:00 A.M. I arrive at work, check my email, and look for new problems or reports. There are two new bugs reported in my email. One is of moderate severity, the other of low severity. I decide to leave them for a bit while I read the remainder of my email.

This is followed by similar descriptions of things developers do every day: try to reproduce bugs, find possible enhancements, talk to others about issues, and attend meetings, complete with details about when the debugger does a quick crossword puzzle or gets a glass of water (that’s just a taste of the verbosity in the book’s 495 pages).

Ultimately, I think the book fails because the authors have a twisted take on software engineering and development. If that is the life of a debugger, what is the life of a developer? Do they only churn out code? How can that be a reasonable team dynamic? If developers only churn out code, with testing left to testers, design left to architects, and debugging left to debuggers, then you get dangerously close to the siloed waterfall strawman now used to prop up the egos of Agile evangelists. You’d be hard pressed to find someone who seriously advocates for this kind of approach, even at the time, yet the book seems to be doing just that. Put curtly, it’s bad.

Two other aspects of The Science of Debugging must be called out. First, the bug puzzles. Each chapter ends with a “bug puzzle” that describes a situation to be debugged. They don’t illuminate the ideas, nor do they add to the discussion. Reading the solutions in the appendix, you’re left wondering what the clues in the puzzle actually were since the solution is so far removed from the puzzle description. They come across as arrogant showmanship.

This pales in comparison to the blatant provocation put forth when talking about defensive programming: asserts are evil. The opening paragraph of the section “The Evils of Asserts” (p. 380) lays out what can only be described as a nuclear take:

When we were surveying books and articles written about debugging and bug prevention, one technique that stood out in all of these books and articles was the encouragement to use asserts. Frankly, we do not see the value of asserts in producing quality software. In fact, we believe that usage of asserts is the mark of a lazy programmer as we discussed in Chapter 4. We believe that asserts can increase the odds of bug formation and introduction, and that usage of asserts should be avoided.

Their argument, in a nutshell, is that asserts are removed in production builds, and just because the situation doesn’t occur in development doesn’t mean it won’t occur in production. The problem is that they equate assert usage with error handling. Asserts as error handling is a genuinely bad idea, but apparently the authors could envision no other usage of asserts, nor had they experienced any other approach. Even worse, they double down and claim — with no evidence! in a science book! — that asserts actually produce more bugs. It’s actively bad advice derived from anecdotal data.
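To make the distinction concrete, here is a minimal sketch of my own (it is not taken from either book, and the function names `parse_percentage` and `scale` are hypothetical). It assumes Python, where `assert` statements are skipped when the interpreter runs with `-O`, which mirrors the "removed in production builds" situation the authors worry about. Input from the outside world gets real error handling; the assert only documents an internal contract that should never be violated if the rest of the code is correct.

```python
def parse_percentage(text: str) -> int:
    """Validate untrusted input with error handling that survives `python -O`."""
    try:
        value = int(text)
    except ValueError:
        raise ValueError(f"not a whole number: {text!r}") from None
    if not 0 <= value <= 100:
        raise ValueError(f"percentage out of range: {value}")
    return value


def scale(amount: int, percent: int) -> int:
    """Internal helper: callers are expected to pass an already validated percent."""
    # This assert is a statement about the program's own logic, not about user
    # input. If it ever fires, the defect is in the calling code.
    assert 0 <= percent <= 100, "caller passed an unvalidated percentage"
    return amount * percent // 100


if __name__ == "__main__":
    print(scale(200, parse_percentage("25")))
```

If the assert disappears in an optimized run, the validation in `parse_percentage` is still there. The assert was never the safety net, only an automated observer of an assumption.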

The Science of Debugging is the first tech book, let alone debugging book, that grated on my nerves so much that I can’t recommend it in any way. You’ll get the same information elsewhere and it will probably be presented better.

At the complete opposite end of the recommendation spectrum is Zeller’s book. It acted as a blissful antidote.

Comparing Why Programs Fail to The Science of Debugging is an unfair exercise. For one thing, Zeller’s book is laser-focussed. Zeller addresses a single aspect of formal debugging: given a reproducible scenario that can induce a program failure, how can you determine the defect(s) that caused the failure? It’s the technical essence of debugging and deftly avoids the softer parts of the process. Those softer things, such as bug tracking, are not ignored, but nor are they explored beyond what is necessary. Telles and Hsieh, on the other hand, addressed a much broader scope with considerably less formality. It was not my intent to compare books of such contrasting quality, but the best laid plans of mice and men often go awry.³

What Zeller does in his book is formalize the process of diagnosing and fixing defects that good software engineers seem to intuit. He dubs it the delta debugging algorithm. It’s one of those algorithms that seems obvious after you read it. The presentation and explanation are so clear that, by the time you reach the end, you’re left wondering how you never thought of it yourself.

Although delta debugging makes perfect sense, I often wondered if Zeller underappreciates the effort that could be involved in automating it. The purpose of his formalization is to automate the debugging process as much as possible, and the book includes an implementation of delta debugging (it’s not that complicated). However, it’s not one of those algorithms you can just download and use. There are a variety of environmental factors that come into play in any attempt to use it. This does not undermine its value (ignore the algorithm at your peril!) but as I read about it and imagined using it in my day-to-day work, it did not seem as easy as the text was suggesting. And on the other side, there are times when automating is probably overkill. These caveats, though, speak to the brilliance of the work: the automation is not a necessary component for it to be genuinely useful.
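For readers who have not seen it, here is a rough sketch of the input-minimizing flavour of the idea. This is my own simplified Python, not Zeller’s implementation: the name `ddmin` follows common usage, but the signature, the `fails` callback, and the toy failure condition are assumptions for illustration. It also assumes the test is deterministic and reports only pass or fail, ignoring the "unresolved" outcomes that a real setup has to deal with.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ddmin(failing_input: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Shrink failing_input to a smaller input on which `fails` still returns True."""
    current = list(failing_input)
    assert fails(current), "precondition: the starting input must fail"
    n = 2  # granularity: roughly how many chunks to split the input into
    while len(current) >= 2:
        size = max(1, len(current) // n)
        chunks = [current[i:i + size] for i in range(0, len(current), size)]
        reduced = False
        # Step 1: does any single chunk reproduce the failure on its own?
        for chunk in chunks:
            if fails(chunk):
                current, n, reduced = chunk, 2, True
                break
        # Step 2: otherwise, can one chunk be dropped with the failure remaining?
        if not reduced:
            for i in range(len(chunks)):
                rest = [x for j, c in enumerate(chunks) if j != i for x in c]
                if fails(rest):
                    current, n, reduced = rest, max(n - 1, 2), True
                    break
        # Step 3: otherwise, split more finely, or stop at single elements.
        if not reduced:
            if n >= len(current):
                break
            n = min(len(current), n * 2)
    return current


if __name__ == "__main__":
    # Toy failure: the "program" breaks only when 3 and 7 are both present.
    print(ddmin(list(range(10)), lambda xs: 3 in xs and 7 in xs))  # prints [3, 7]
```

Even in this toy form you can see where the environmental factors creep in: everything hinges on having a cheap, repeatable `fails` function, and for real software that is usually the hard part.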

As with any book that is older, the tools used are dated but it does not rely on them to get the point across, aside from some of the basics. There is also a somewhat distracting practice of using all-caps for most program names (you’ll see a lot of “MOZILLA”). Even in the tool use, though, Zeller is the consummate academic and informs the reader to proceed with caution as the tools may not be available anymore. It’s a refreshing change from other books that seem to think that URLs are forever.

Zeller’s book also dedicates a whole chapter to usage of asserts, which made for an entertaining contrast to Telles and Hsieh’s abysmal advice. Zeller’s arguments are convincing as he frames asserts as automating observation. More importantly, he outlines where they are useful: for checking invariants, pre- and postconditions, and as a kind of specification.⁴
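As a small illustration of those three uses, here is a sketch of my own (not an example from the book; `integer_sqrt` is a hypothetical function chosen because its contract is easy to state). The asserts observe the program on every run and double as a compact specification of what the function promises.

```python
def integer_sqrt(n: int) -> int:
    """Return the largest integer r such that r * r <= n."""
    # Precondition: what the caller must guarantee.
    assert n >= 0, "precondition: n must be non-negative"
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
        # Loop invariant: r never overshoots the true root.
        assert r * r <= n
    # Postcondition: what the function guarantees in return.
    assert r * r <= n < (r + 1) * (r + 1), "postcondition violated"
    return r


if __name__ == "__main__":
    print(integer_sqrt(17))  # prints 4
```

Read top to bottom, the three asserts say more about the intended behaviour than the docstring does, and they are checked automatically every time the code runs in a development build.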

The latter part of the book is more academic, so those with a practical bent may find it less than engaging. Zeller warns the reader that the ideas are not fully evaluated — a nice touch — but that should not discourage you from reading it. Although I have not heard of any tools that implement the ideas he outlines, it’s important stuff.

Why Programs Fail is a stellar book. It is well organized, clearly written, illuminating, and thoughtful. It is worth your time and attention, and will be useful for many years to come.

Footnotes:

¹ By a rough estimate, the text contains about 220,000 words. I picked a page that struck me as a representative page of text and counted the words on five random lines. Each one had 14 words. There were 39 lines on the page, for a total estimate of 546 words. As a rough guess, if you took out code samples and figures, you had about 400 pages of text for a total of 218,400 words.

² My wife will attest to this because of all the cursing that I weaved into the phrase “Get to the point!”, which was exclaimed many times as I read the book.

³ I do these reviews by trying to pick books that appear to be thematically similar. I was hoping to do a more thorough comparison of attempts at formalism, but instead got served with polar opposites. It meant I had to write a much different review than I planned.

⁴ Reading that chapter, my mind wandered into an imaginary world where the authors of the two books try to convince me of their position through a cooking competition. In one case I get fresh pasta, smooth tomato sauce, and homemade bread; in the other I get a microwaved TV dinner with a cigarette butt in it.