... but please do repeat me

February 14, 2009

The hard choice between duplication, paralysis and chaos

A common programmer credo is “Don’t Repeat Yourself” (Pragmatic Programmer) or “Once and only once” (Extreme Programming). As with all credos, we risk following it even when it is not appropriate.

The larger truth is that we have a choice between three evils:

We can duplicate our code, thus duplicating effort and understanding, and being forced to hunt down every bug twice.
We can share code and affect everyone who shares the code every time we change the code to better fit our needs. If this is a large number of people, this translates into lots of extra work. If you’re on a large project, you might’ve experienced code storms: Days where you’re unable to get any work done as you’re chasing the consequences of other people’s changes.
We can keep shared code unchanging, thus forgoing improvements. Most code I (and I expect, you) write is not initially fit for its purpose, so this means leaving bad code to cause more harm.

I expect there is no perfect answer to this dilemma. When the number of people involved is low, we might accept the noise of people changing code that’s used by others. As the number of people in a project grows, this becomes increasingly painful to everyone involved. At some point, large projects start experiencing paralysis.

If we’re not happy with the state of the code when paralysis sets in, it might be that there’s really only one option left: To eschew the advice of the masters and duplicate the code.

Comments:

dagblakstad - Feb 18, 2009

Important question, and perhaps it’s all about choosing your evil in the end. There is no silver bullet or easy way out of this, I think.

But: The evil chosen should be the one least capable of making a mess in your project. This depends on:

How many software clients do you have that will have to be changed when stirring things up?
Sometimes two evils can be better than one: Support multiple simultaneous interfaces or versions (4th evil?) so that not everyone must adapt at the same time
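
One way to picture the "4th evil" of supporting several interface versions at once is a thin adapter: the old interface stays available but delegates to the new implementation, so clients can migrate one at a time. The following is only a sketch under assumptions; the function names `format_name_v1` and `format_name_v2` are hypothetical and do not come from the discussion.

```python
# Hypothetical shared module that offers two interface versions side by side.


def format_name_v2(first: str, last: str, *, last_name_first: bool = False) -> str:
    """Current interface: takes the name parts separately."""
    first, last = first.strip(), last.strip()
    return f"{last}, {first}" if last_name_first else f"{first} {last}"


def format_name_v1(full_name: str) -> str:
    """Old interface, kept as a thin adapter so existing clients keep working."""
    first, _, last = full_name.strip().partition(" ")
    return format_name_v2(first, last)


# A client on the old interface and a migrated client can coexist.
old_result = format_name_v1("  Ada Lovelace ")
new_result = format_name_v2("Ada", "Lovelace", last_name_first=True)
```

The cost is that two interfaces must be kept working until the last client has moved, which is why it probably counts as an evil of its own.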

Shared code requires a lot more forethought and careful crafting than unshared code. Having good test coverage will help to avoid breaking the contract accidentally, but these are no new thoughts (at least I hope they are not)
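
As an illustration of tests that guard a contract rather than an implementation, here is a minimal sketch. The shared helper `normalize_email` and its contract are assumptions made up for this example.

```python
def normalize_email(address: str) -> str:
    """Hypothetical shared helper.

    Contract: the result is lower-case, has no surrounding whitespace,
    and normalizing an already normalized address changes nothing.
    """
    return address.strip().lower()


def check_normalize_email_contract() -> None:
    """Checks that every change to the shared helper must keep passing."""
    samples = ["  Alice@Example.COM ", "bob@example.com", "\tCarol@Example.com\n"]
    for sample in samples:
        result = normalize_email(sample)
        assert result == result.strip(), "no surrounding whitespace"
        assert result == result.lower(), "lower-case only"
        assert normalize_email(result) == result, "idempotent"


check_normalize_email_contract()
```

Because the checks only state what clients may rely on, the implementation can still be improved freely as long as they keep passing.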

jhannes - Feb 15, 2009

Projects do go through such cycles. And your point is very important: At different stages, different choices will be better.

However, I’ve seen many instances of code never reaching sufficient maturity before the project becomes too large to avoid a (possibly de facto) code freeze. It is important to recognize this likely failure mode and treat it seriously if it occurs.

I’ve seen a few examples where domain objects in particular end up with this fate. The open-closed principle is often not the way to go for domain entities. This means that a stable domain model may soon feel anemic.

Eivind - Feb 15, 2009

This subject has interested me for a long time. Thank you for bringing it up. Isn’t there a life cycle here, and won’t changing options over time help?

You sketch three different approaches, which I attempt to rephrase as follows:

Duplicate code, so that change at one place does not affect code somewhere else, but at the cost of duplicated changes of common logic
Share code, to avoid duplicated changes of common logic, but at the cost of requiring change in all the clients when the logic of any of the suppliers changes
Do not allow changes, to avoid both of the inconveniences above, but at the cost of conserving unsatisfactory solutions

I tend to think that the solution will change over time.

Initial code is unstable and has only a few clients. Apply approach 2.
As code matures and stabilizes, fix its contract (precondition and postcondition). Apply approach 3 and the open/closed principle.
A subsequent change may or may not break the contract.

If it does not (weaker precondition, stronger postcondition), there is nothing to worry about - clients are not affected by the change. All tests will pass unchanged.
If it does, apply approach 1 and define a new function for the new contract, possibly making the old definition deprecated to leave a transition period for the clients.
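
The two cases above can be sketched in code. This is only an illustration under assumptions: the functions `parse_port` and `try_parse_port` are hypothetical, and Python's standard `warnings` module is used to mark the old definition as deprecated.

```python
import warnings
from typing import Optional


def try_parse_port(text: str) -> Optional[int]:
    """New contract: returns None for invalid input instead of raising."""
    stripped = text.strip()
    if stripped.isdecimal() and len(stripped) <= 5 and 1 <= int(stripped) <= 65535:
        return int(stripped)
    return None


def parse_port(text: str) -> int:
    """Old contract: raises ValueError for invalid input.

    A compatible change has been applied here: surrounding whitespace is
    now accepted (a weaker precondition), so no existing caller is affected.
    The function is deprecated in favour of try_parse_port, whose different
    error behaviour would have broken callers that rely on the exception.
    """
    warnings.warn(
        "parse_port is deprecated; use try_parse_port",
        DeprecationWarning,
        stacklevel=2,
    )
    port = try_parse_port(text)
    if port is None:
        raise ValueError(f"not a valid port: {text!r}")
    return port
```

Old clients keep calling `parse_port` and see a deprecation warning during the transition period; new clients call `try_parse_port` directly. Once the last old client has moved, the deprecated definition can be removed.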

Just some thoughts.

Christian Rørdam - Feb 17, 2009

An important discussion, and I think you summarize the dilemma quite well.

May I add a point:
Sometimes you need a new module that starts out identical to some other module, but is actually totally independent of it and will most likely diverge more and more from it. In that case you will be better off just copying the whole thing and letting it live its own life. I think.