It still doesn't work properly. The best solution I have found so far for PDF files with a searchable text layer is gscan2pdf 0.9.31; with ocropus as the engine, the recognition is pretty good and the matching between image and text is very accurate.

I have developed another solution that produces DjVu files, see http://wiki.ubuntuusers.de/xsane2djvu , a wrapper for xsane text recognition. The page is in German, but the script is annotated, so it should not be too difficult to use...

so long
clasikowski AKA hank