gs -dPDFA -dBATCH -dNOPAUSE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -sOutputFile=output_filename.pdf input_filename.pdf
If you need to work out whether a document is in colour, Ghostscript's inkcov device can tell you whether any colour was used at all.
tesseract -psm 0 …
will output orientation and language heuristics, which is useful if your scanner spits out duplex pages flipped. (Newer Tesseract releases spell the flag --psm.) It's a bit slow, but faster than manually digging through a PDF object stream and inspecting the individual transformation matrices.
convert -define 'jp2:rate=0.008' in.png out.jp2
converts a PNG to JPEG 2000 with ImageMagick. More (possibly dated) ways of mucking about with JP2s are here: JPEG 2000 on Ubuntu — without anyone getting stabbed
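The inkcov check mentioned above can be sketched roughly as follows. This is a minimal sketch, not a definitive recipe: it assumes gs is on the PATH and that inkcov prints one "C M Y K CMYK OK" line per page. The helper name has_colour is made up for illustration, and the file name comes from the first script argument. Be warned that greys stored as RGB, or rich blacks, may well register as colour.

```shell
#!/bin/sh
# Sketch: decide whether a PDF uses any colour, via Ghostscript's inkcov device.
# Assumption: inkcov prints one line per page of the form "C M Y K CMYK OK",
# where each value is an ink coverage fraction.

# has_colour (hypothetical helper): reads inkcov output on stdin and succeeds
# (exit 0) if any page has non-zero cyan, magenta or yellow coverage.
has_colour() {
    awk '$5 == "CMYK" && ($1 + $2 + $3 > 0) { found = 1 }
         END { exit !found }'
}

# Only run Ghostscript when a file name is given as the first argument.
if [ "$#" -ge 1 ]; then
    if gs -q -o - -sDEVICE=inkcov "$1" | has_colour; then
        echo "colour"
    else
        echo "greyscale"
    fi
fi
```

Saved as, say, iscolour.sh (a placeholder name), it would be run as sh iscolour.sh input_filename.pdf. Since the exit status of the pipeline is awk's, a file that Ghostscript cannot read will also be reported as greyscale, so check gs separately if that matters.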