I just ran a couple of tests with real bitmaps from 40GB and 250GB drives in various states of fullness and found that my fear was correct: with all of the extra conditionals checked on each loop iteration, it takes statistically significantly longer to run when we check for 0x00 and 0xFF.
But when I then ran it on a real 1TB partition bitmap, it completed in an average of 0.392 seconds (n=50, 95% CI 0.387s-0.398s), whereas the normal version averaged 1.130 seconds (n=50, 95% CI 1.106s-1.155s).
Either way, for normal systems it really doesn't seem to matter in any appreciable way, and adding a status indicator would just slow it down (it would have to do a comparison every loop to see whether it is time to display an update), and you wouldn't even get to see it. The log files, by the way, are still quite small: for the 1TB drive the log ended up being only 3.3MiB, so nothing to worry about. It also occurs to me that using 32-bit integers to store the position could be a problem, but that won't happen with drives smaller than 16TiB.
I did run a test switching them to unsigned long long, but it does cost time (0.542s [0.534s-0.552s] and 1.675s [1.662s-1.687s], respectively), so that is something to consider. Then again, it would cost nothing when compiled for 64-bit...
Anyway, let me know if there is anything else you would like out of it. Or, of course, you can play around with all of those things yourself. As I mentioned, all you should need are the gcc and libc6-dev packages; compile it with:
gcc -D_FILE_OFFSET_BITS=64 -O3 -Wall processbitmap.c -o processbitmap