A canonical XML test
I recently did a few days of TDD training for a client. They asked me to help them test and refactor a class that created XML from an internal domain model. This gave me the opportunity to examine a bigger pattern.
I wondered where the domain model came from. Looking through the code base, I found that the same or similar data structures were dealt with in many places. As is often the case, I also found a bit of code that parsed an XML structure and output the domain model. This made it possible to use my favorite way of testing mapping code: round-tripping.
The general pattern: To test translation code, you can test the encoding and decoding as one. These tests will often give you a lot of bang for your buck, both in terms of readability and in terms of error detection rates. Their main limitation is that they may not exercise all paths of the code well. If this is a problem, you should supplement them with more fine-grained tests.
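To make the pattern concrete, here is a minimal sketch that uses only the JDK's built-in DOM and Transformer classes. The `Person` class, the `decode` and `encode` methods and the XML layout are all made up for illustration; they are not the client's code and not Eaxy's API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class RoundTripSketch {

    /** Hypothetical domain object standing in for a real domain model. */
    static final class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    /** Decoder: XML text to domain object. */
    static Person decode(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return new Person(text(root, "firstName"), text(root, "lastName"));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Returns the text of the first descendant element with the given tag. */
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    /** Encoder: domain object to XML text (no XML declaration, no indentation). */
    static String encode(Person person) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("person");
            doc.appendChild(root);
            append(doc, root, "firstName", person.firstName);
            append(doc, root, "lastName", person.lastName);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }

    /** Adds a child element with the given tag and text value. */
    static void append(Document doc, Element parent, String tag, String value) {
        Element child = doc.createElement(tag);
        child.setTextContent(value);
        parent.appendChild(child);
    }

    /** The round trip: decode, then encode again. */
    static String roundTrip(String xml) {
        return encode(decode(xml));
    }
}
```

A test then only has to compare `roundTrip(xml)` with `xml`. Anything the decoder drops or the encoder mangles shows up as a difference between the two, without the test having to know anything about the domain model in between.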
As I have dealt with this sort of problem a few times before, I decided to create my own XML library, Eaxy (as you do). I introduced the library in the tests, but the production code continued to use a combination of DOM and JAXB. Here’s a reasonable reproduction of the test:
@Test
public void shouldReadHtml() {
    Element input =
        el("people",
            el("person",
                el("name",
                    el("firstName", "Johannes"),
                    el("lastName", "Brodwall")),
                el("contactInfo",
                    el("address", "Test Street 42"),
                    el("postalCode", "4242"),
                    el("phoneNumber", "5552224444"))));
    File testFile = createTmpFile();
    input.writeTo(testFile);

    // The names of the domain objects are poor on purpose, to reflect
    // that this technique is often useful with legacy code
    PersonListExchanger exchange = new XmlToPersonService().read(testFile);
    Element output = new PersonXmlDataCreator().createXml(exchange);

    // Actual value first, expected value second
    assertThat(output.toIndentedXml())
        .isEqualTo(input.toIndentedXml());
}
When I introduced this test to the existing code base, we discovered a few interesting things:
1. There were internal dependencies in the XML file that the developers were unaware of, as all the canned test data consisted of huge files that nobody would read.
2. A field was decoded from base64, but treated internally as if it was still encoded, so it ended up double-encoded in the output.
3. The output structure was slightly different from the input structure.
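The double-encoding finding is easy to reproduce in miniature. The sketch below is a hypothetical reconstruction using `java.util.Base64`, not the client's code: `read`, `writeCorrect` and `writeBuggy` are invented names, and the buggy writer simply encodes twice to mimic the effect we saw.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64RoundTripSketch {

    /** Reader: decodes the base64 field from the input into plain text. */
    static String read(String encodedField) {
        byte[] raw = Base64.getDecoder().decode(encodedField);
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Correct writer: encodes the plain text exactly once. */
    static String writeCorrect(String value) {
        return Base64.getEncoder().encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }

    /** Buggy writer (hypothetical): the value is encoded twice on the way out. */
    static String writeBuggy(String value) {
        return writeCorrect(writeCorrect(value));
    }

    /** A round-trip check: does writing what we read give back the input? */
    static boolean survivesRoundTrip(String encodedField, boolean useBuggyWriter) {
        String value = read(encodedField);
        String output = useBuggyWriter ? writeBuggy(value) : writeCorrect(value);
        return output.equals(encodedField);
    }
}
```

A test that only checks the writer against canned output can easily bake the bug into its expectations. The round-trip check cannot, because the expected value is the input itself.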
The test, combined with coverage measurements, gave us enough confidence to refactor some pretty crufty code that the team will rely on in the future. Round-trip testing can give you a lot of bang for your buck.