Sunday, April 27, 2008

Change Whitespace, or change functionality but not both.

When changing software, do yourself a huge favor. For every commit, pick one of these two things to do:
  • Modify the implementation and supporting documentation
  • Modify nothing about the implementation, but instead prettify the code
It is a huge mental effort for humans to process a patch with both style and functional modifications.

This rule has three corollaries:
  • Patches that modify both induce extra effort for reviewers
  • Applying whitespace changes in a place that is likely to be diff'ed across often should be avoided if at all possible.
  • Whitespace should be done correctly the first time, when the code is implemented.
The simple problem is that computers are great at comparing two blocks of flat text, and humans are really awful at it. When diff'ing the code, it's very difficult to spot the functionality changes that are embedded inside of the areas that also have whitespace changes. In my experience, it is easier for the person making the changes to modify the whitespace first, and then make the necessary functionality changes. That is the case because the whitespace changes should make the code easier to read (otherwise, why change it?). However, it makes it harder for someone to merge their changes in.

Let's setup a scenario:
  r2 - r3 - r4
\
wc1
Developer 1 checked out r2, and made a functional change r3, and whitespace changes r4. Meanwhile wc1 is r2 plus the functional changes by developer 2. There are a couple of ways to approach this. Generally the easiest to do is the following:
cd wc1
svn diff > /tmp/patch.tmp
cd ..
mkdir wc2
cd wc2
svn co -r 3 http://repo/project
patch -p1 -i /tmp/patch.tmp
At this point, there are no whitespace changes in the merge, so all of the merge conflicts are real problems. Next, resolve any merge conflicts. The scenario will look like:
r2 - r3 - r4
\
wc2
Then continue the standard workflow:
cd wc2
svn update
At this point, resolve all merge conflicts by choosing the one from the workspace. Choose the one from the workspace, you will want to modify all of the new code to conform to the new whitespace rules. Before committing, review the diff to ensure that there are only functionality differences.

When making a whitespace-only change, clearly mark it, and ensure that using diff with the ignoring whitespace option shows no changes. Reviewers should validate that it is a whitespace only change.

This will save a great deal of trouble when reviewing. It should be stated that prettifying the code should be avoided - the whitespace being wrong should be something that reviewers catch and enforce prior to committing.

In code that is likely to be merged or branched, it is generally best to avoid prettyifing the code. It is always a hassle to read a diff that crosses a significant amount of whitespace changes. It is not uncommon for the author to diff up to the whitespace change, and diff after whitespace changes, and then apply both changes (by hand if necessary).

"Listen To Your Tools"

This is the first post to this blog. Never written one, so this will be a bit of an adventure.

My first thought is:
Listen to your tools.

So, let me tell you a little story. There I was, about a year into working on C code professionally. The project I was working on was an Airbus A-320 simulator that was originally written in QNX 2.0 and was being forward ported to QNX 4.2x. It was a really nice little project. The problem I was solving was the last in a long line of problems; one of our senior developers had run out of time, and it was left to me. We were modifying a video driver that used DMA and I/O ports. All that was done, and everything worked except for the fonts. The screen display had circles, boxes, lines, and raster images. The problem was that all the letters showed up as vertical lines.

So I compiled up the code, and 30 screens of warnings go flying by. It was a really large project, with hundreds of files. Most of the files were modified by an automated tool called MIG24 (Migrate from QNX 2 to QNX 4). So I listened to everybody who told me, "Don't worry about those warnings, they're all over the place". One of the warnings in the files I was was working on was: "No prototype for 'round()'". This concerned me as I knew we used the round function. I asked around, and was told 'round' was a standard C function and not to worry about it.

I ground up against this problem for about three weeks. I assumed we were sending the wrong commands to the hardware. I spent a great deal of time looking into that. And then more and more time; poring over that area of code, using the debugger, spending lots of time writing little test drivers, compiling and re-compiling that code over and over again. I watched the screens full of errors, happily ignoring them for a week or two. Eventually I finally figured the hardware commands were right, otherwise all the other stuff wouldn't work. So I started to debug the font rendering code directly. After a day or two of that, I checked the coordinates of the fonts. Huh? The coordinates that we are pushing to the hardware should draw vertical lines. Another important lesson:

Computers almost always do exactly what you tell them to.

So I start poking around some more, and look at the coordinate computations. They are all clearly correct. They are using the scaling ratio to take a standard size letter and scale it to the right size. So I go check the code that's reading in the fonts, and I check the source data of the font. Hmmm, that all looks good. So I go look at that "obviously correct" scaling code:


float scaleFactor = /* floating point computations. */
int scaledX = round(initX * scaleFactor);


So after printing out scaledX, initX and scaleFactor, I finally figure out that my whole problem was in "round". I tool around some more and find out that C compilers (or at least this one), generate a prototype that assumes all arguments are ints. So it would take "initX * scaleFactory", which produces a float, and pass it using the integer calling conventions, because there was no prototype. So after three weeks of chasing DMA, reading hardware specs, re-confirming the I/O bit-twiddling, chasing file I/O and validating input, I find out that all I had to do was go to the header file for the helper math routines and add

int round(float value);

That was it. Three whole weeks of hard work assuming it was the dirtiest, nastiest problem it could be. The whole time, from the very first compile, my tools told me exactly what the problem was. I just didn't want to listen.

This was my hard knock lesson in listening to my tools. The people who write tools are generally fairly clever folks. They do not warn about things for no good reason, especially at the default warning level. When a tool generates a warning, the first order of business is to research why it does, what the implications of that warning are, and to then fix that warning.

When a tool, especially a high quality tool, prints out a warning - no matter what the tools, be it a C/C++ compiler, Python, the make build system, etc - take the time to track it down. Who knows? It might save you three weeks of hard knocks.