Differences
This shows you the differences between two versions of the page.
meeting:2017-04 [2017/04/15 12:44] scruss [Notes] - added sample file |
meeting:2017-04 [2018/01/15 08:21] scruss [Moar Notes] |
||
---|---|---|---|
Line 11: | Line 11: | ||
===== Notes ===== | ===== Notes ===== | ||
+ | * Video! [[https://www.youtube.com/watch?v=EH_txB_hJWw|A Bit More Than Mostly Searchable: Scanned Paper You Can Find with Stewart Russell - YouTube]] | ||
* Slide deck: [[https://scruss.com/talks/02017/gtalug201704-ABitMoreThanMostlySearchable.odp|ABitMoreThanMostlySearchable.odp]] | * Slide deck: [[https://scruss.com/talks/02017/gtalug201704-ABitMoreThanMostlySearchable.odp|ABitMoreThanMostlySearchable.odp]] | ||
Line 18: | Line 19: | ||
==== Moar Notes ==== | ==== Moar Notes ==== | ||
- | * [[http://gscan2pdf.sourceforge.net/|gscan2pdf]], if you like graphical things, //probably// does what you want … | + | * [[http://scantailor.org/|Scan Tailor]] does a decent job of slicing up scanned books. |
+ | |||
+ | * [[http://gscan2pdf.sourceforge.net/|gscan2pdf]], if you like graphical things, //probably// does what you want with scanned images … | ||
* alternative to [[https://en.wikipedia.org/wiki/Comic_book_archive|cbz]] format that's readable as a PDF yet retains JPEG files intact: collate with [[https://gitlab.mister-muffin.de/josch/img2pdf|img2pdf]], burst apart with ''pdfimages -j …'' from [[https://poppler.freedesktop.org/|Poppler]]. | * alternative to [[https://en.wikipedia.org/wiki/Comic_book_archive|cbz]] format that's readable as a PDF yet retains JPEG files intact: collate with [[https://gitlab.mister-muffin.de/josch/img2pdf|img2pdf]], burst apart with ''pdfimages -j …'' from [[https://poppler.freedesktop.org/|Poppler]]. | ||
Line 26: | Line 29: | ||
* PDF/A archival format: [[https://ghostscript.com/doc/current/Ps2pdf.htm#PDFA|Creating a PDF/A document from PostScript in ghostscript]], [[https://stackoverflow.com/questions/1659147/how-to-use-ghostscript-to-convert-pdf-to-pdf-a-or-pdf-x/9343820#9343820|converting a PDF to PDF/A]]: ''gs -dPDFA -dBATCH -dNOPAUSE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=output_filename.pdf input_filename.pdf''. If you need to work out if the document is colour or not, ghostscript's [[https://ghostscript.com/doc/current/Devices.htm|inkcov]] device can tell you if colour was used at all. | * PDF/A archival format: [[https://ghostscript.com/doc/current/Ps2pdf.htm#PDFA|Creating a PDF/A document from PostScript in ghostscript]], [[https://stackoverflow.com/questions/1659147/how-to-use-ghostscript-to-convert-pdf-to-pdf-a-or-pdf-x/9343820#9343820|converting a PDF to PDF/A]]: ''gs -dPDFA -dBATCH -dNOPAUSE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=output_filename.pdf input_filename.pdf''. If you need to work out if the document is colour or not, ghostscript's [[https://ghostscript.com/doc/current/Devices.htm|inkcov]] device can tell you if colour was used at all. | ||
* Cryptographic signing: Rough process here, using an X.509 certificate issued by ARRL: [[http://scruss.com/blog/2011/10/09/creating-secure-digital-qsl-cards-with-your-lotw-certificate/|Creating secure digital QSL cards with your LoTW certificate]]. I wonder if you could use P12 certs issued by [[https://letsencrypt.org/|Let’s Encrypt]]? | * Cryptographic signing: Rough process here, using an X.509 certificate issued by ARRL: [[http://scruss.com/blog/2011/10/09/creating-secure-digital-qsl-cards-with-your-lotw-certificate/|Creating secure digital QSL cards with your LoTW certificate]]. I wonder if you could use P12 certs issued by [[https://letsencrypt.org/|Let’s Encrypt]]? | ||
- | |||
- | * [[http://scantailor.org/|Scan Tailor]] does a decent job of slicing up books. | ||
* other OCR software: [[https://launchpad.net/cuneiform-linux|Cuneiform]] used to be a contender and may still be useful. Not so sure about GNU Ocrad, gocr/jocr, … | * other OCR software: [[https://launchpad.net/cuneiform-linux|Cuneiform]] used to be a contender and may still be useful. Not so sure about GNU Ocrad, gocr/jocr, … | ||
Line 36: | Line 37: | ||
* JPEG 2000: The least painful way I found of making JPEG 2000 files from colour images was with ImageMagick/GraphicsMagick's **convert** command: ''convert -define 'jp2:rate=0.008' in.png out.jp2''. More (possibly dated) ways of mucking about with JP2s are here: [[http://scruss.com/blog/2014/06/16/jpeg-2000-on-ubuntu-without-anyone-getting-stabbed/|JPEG 2000 on Ubuntu — without anyone getting stabbed]] | * JPEG 2000: The least painful way I found of making JPEG 2000 files from colour images was with ImageMagick/GraphicsMagick's **convert** command: ''convert -define 'jp2:rate=0.008' in.png out.jp2''. More (possibly dated) ways of mucking about with JP2s are here: [[http://scruss.com/blog/2014/06/16/jpeg-2000-on-ubuntu-without-anyone-getting-stabbed/|JPEG 2000 on Ubuntu — without anyone getting stabbed]] | ||
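The img2pdf / ''pdfimages -j'' round trip from the cbz-alternative note above can be sketched roughly as below. This is only a sketch, assuming img2pdf and Poppler's pdfimages are installed; the helper names (''run'', ''collate'', ''burst'') and the ''DRY_RUN'' switch are made up for illustration and are not part of either tool.

```shell
#!/bin/sh
# Sketch: pack JPEG pages into a PDF, then get the JPEG files back out.
# Assumes img2pdf and Poppler's pdfimages are on the PATH.
# run, collate, burst and DRY_RUN are hypothetical names for this example.

# Run a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Collate: first argument is the output PDF, the rest are the JPEG pages.
# img2pdf embeds JPEG data as-is rather than re-encoding it.
collate() {
    out=$1
    shift
    run img2pdf --output "$out" "$@"
}

# Burst: first argument is the PDF, second is the output filename prefix.
# -j makes pdfimages write JPEG (DCT) images out as .jpg files.
burst() {
    run pdfimages -j "$1" "$2"
}
```

Something like ''collate book.pdf pages/*.jpg'' followed by ''burst book.pdf out/page'' would be the typical use (the file and directory names here are placeholders). Setting ''DRY_RUN=1'' first just prints the commands, which is handy for checking the page order before writing anything.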
+ | |||
+ | ==== Post meeting observations ==== | ||
+ | |||
+ | * Chris B. suggested [[https://github.com/oniony/TMSU|oniony/TMSU: TMSU lets you tag your files and then access them through a nifty virtual filesystem from any other application.]] | ||
+ | * [[http://weasyprint.org/|WeasyPrint]] is the rather clever Python-based web→PDF printer application I mentioned | ||
+ | * If you're running a GNOME-based system and the [[https://wiki.gnome.org/Projects/Tracker/|Tracker]] desktop search engine is running amok, it's probably stuck indexing a particular file. I posted a workaround here: [[https://askubuntu.com/questions/914581/no-progress-updates-from-gnome-tracker/914602#914602|no progress updates from gnome tracker - Ask Ubuntu]] | ||
===== Meta ===== | ===== Meta ===== |