Undergraduate Capstone Open Source Projects

Rich, Famous, and Popular

Posted by Greg Wilson on 2010/01/17

Almost everyone who joins a new project says it sooner or later: “More documentation, please.” No one can make sense of 30,000 lines of code in one gulp: everyone wants an overview or roadmap to help them find their way around. So why don’t such overviews exist?

  1. Almost by definition, by the time you can write that document, you don’t need it yourself. By then you probably also have a dozen tickets assigned to you, all of which really, really need to be fixed for next week’s release.
  2. Overviews are much harder to write than lower-level Javadoc-style explanations of what individual methods do. The latter is just an assemblage of facts; the former requires story-telling skills, and good storytellers are rare in any field (not just programming).
  3. Anyone who has ever written an overview document knows that it will rust pretty quickly. Design decisions will change, code will be refactored, and before long that 30-page tutorial you sweated over is so far out of date that it’s actually doing as much harm as good. Keeping it up to date is a never-ending struggle, and it’s not like people have stopped assigning you tickets…

Jacob Kaplan-Moss (from the Django team) wrote a good post a while back about writing great documentation. It’s worth reading, and he’s right: after a certain point, investing effort in documentation and discussion actually pays a bigger dividend for open source projects than investing effort in code. It’s still an open research problem, though; anyone who ever figures out how to generate, check, and update narrative explanations of how code is structured, what it does, and (most importantly) why, will be rich, famous, and popular. Lemme know how it goes…

One Response to “Rich, Famous, and Popular”

  1. tedkirkpatrick said

    Is the title of this post a self-description, Greg? 😉

    I think the difficulty of writing high-level narratives, their brittleness in the face of ongoing change, and the limitations of algorithms for supporting this activity all stem from the same dilemma: We need high-level narratives precisely because there is an enormous gap between the detailed operational description of behaviour provided by code and the more qualitative issues that matter to humans as we plan changes to that code. There are some kinds of human meaning that are likely inherently beyond algorithms, and that is exactly why we need them to understand algorithms and also why code-analysis algorithms can’t generate that meaning for us.

    Code analysis tools can still support the process, and I think there’s probably more that we can do. But I suspect it will be a long time, if ever, before such tools can give us the high-level “story” we want. For the foreseeable future, only humans will be able to do that.
