At least part of the problem here is that pdftoppm and hocr2pdf (used by pdfocr) default to a low resolution, such as 150 DPI. If the input file has a higher resolution, the recognized text does not line up with the page image. Here is how to...