  • Scan Tailor does a very decent job of slicing up scanned books.
  • gscan2pdf, if you like graphical things, probably does what you want with scanned images …
  • An alternative to the cbz format that is readable as a PDF yet keeps the JPEG files intact: collate with img2pdf, then burst apart again with pdfimages -j … from Poppler.
  • Other OCR software: Cuneiform used to be a contender and may still be useful. Not so sure about GNU Ocrad, gocr/jocr, …
  • tesseract:
    • tesseract -psm 0 … (spelled --psm 0 in Tesseract 4 and later) will output orientation and script-detection heuristics, useful if your scanner spits out duplex pages flipped. It's a bit slow, but faster than manually digging through a PDF object stream looking at the individual transformation matrices.
    • training tesseract is definitely worth it if you have a limited domain, say line-printed numerals. (You might have less success with the completely illegible old German Sütterlin cursive, though.)
  • JPEG 2000: The least painful way I found of making JPEG 2000 files from colour images was with ImageMagick/GraphicsMagick's convert command: convert -define 'jp2:rate=0.008' in.png out.jp2. More (possibly dated) ways of mucking about with JP2s are here: JPEG 2000 on Ubuntu — without anyone getting stabbed
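
For scripting the commands mentioned in the list, here is a minimal Python sketch that only builds the argument lists; it does not run anything unless you pass the result to subprocess.run yourself. The function names, file names, and the "stdout" output base for tesseract are my own assumptions for illustration, and the elided options ("…") in the notes above are not filled in.

```python
import shlex


def collate_cmd(jpegs, pdf_path):
    """img2pdf: wrap the JPEG files into one PDF without re-encoding them."""
    return ["img2pdf", *jpegs, "-o", pdf_path]


def burst_cmd(pdf_path, prefix):
    """pdfimages -j (Poppler): write embedded JPEG streams back out as .jpg."""
    return ["pdfimages", "-j", pdf_path, prefix]


def osd_cmd(image, legacy_flag=False):
    """tesseract page segmentation mode 0: orientation and script detection.

    Older 3.x releases spell the option -psm; Tesseract 4 and later use --psm.
    "stdout" as the output base is an assumption so results print to the
    terminal instead of going to a file.
    """
    flag = "-psm" if legacy_flag else "--psm"
    return ["tesseract", image, "stdout", flag, "0"]


def jp2_cmd(src, dst, rate="0.008"):
    """ImageMagick convert with the jp2:rate define quoted in the notes."""
    return ["convert", "-define", f"jp2:rate={rate}", src, dst]


if __name__ == "__main__":
    # Print the commands in copy-and-paste form (shlex.join needs Python 3.8+).
    for cmd in (
        collate_cmd(["p001.jpg", "p002.jpg"], "book.pdf"),
        burst_cmd("book.pdf", "page"),
        osd_cmd("p001.jpg"),
        jp2_cmd("in.png", "out.jp2"),
    ):
        print(shlex.join(cmd))
```

To actually execute one of them, something like subprocess.run(collate_cmd(files, "book.pdf"), check=True) should do, assuming the tool is installed and on your PATH.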

