Differences

This shows you the differences between two versions of the page.

meeting:2017-04 [2017/04/15 12:09]
scruss [Notes]
meeting:2017-04 [2020/12/14 19:49] (current)
scruss [Moar Notes]
Line 11: Line 11:
 ===== Notes =====
  
 +  * Video! [[https://www.youtube.com/watch?v=EH_txB_hJWw|A Bit More Than Mostly Searchable: Scanned Paper You Can Find with Stewart Russell - YouTube]]
   * Slide deck: [[https://scruss.com/talks/02017/gtalug201704-ABitMoreThanMostlySearchable.odp|ABitMoreThanMostlySearchable.odp]]
  
   * Good-enough-for-me //scan → OCR → PDF// script: [[https://scruss.com/talks/02017/dwim-ocr.sh|dwim-ocr.sh]]. Uses ghostscript, tesseract, pdfbeads (Ruby) and (GNU) parallel. The original idea came from an outline script on the [[https://forum.diybookscanner.org/index.php?sid=0be98f164e262176b6689f9892aa5a64|DIY Book Scanner]] forum circa 2010.
 +  * sample multi-page duplex document: https://scruss.com/talks/02017/EPSON012.PDF
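
The //scan → OCR → PDF// flow named above can be sketched roughly as follows. This is a minimal illustration and not the contents of dwim-ocr.sh: the function names, the ''page-NNNN'' file naming, the 300 dpi bilevel TIFF settings and the ''DRY_RUN'' switch are all assumptions made for the example, and it assumes pdfbeads finds each page's hOCR file by its matching basename.

```shell
#!/bin/sh
# Hypothetical sketch of a scan -> OCR -> PDF pipeline; this is NOT dwim-ocr.sh.
# Set DRY_RUN=1 to print each command instead of running it.

# run CMD [ARGS...]: run a command, or just print it in dry-run mode.
run() {
    if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# run_to FILE CMD [ARGS...]: like run, but send the command's stdout to FILE.
run_to() {
    dest=$1
    shift
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s > %s\n' "$*" "$dest"
    else
        "$@" > "$dest"
    fi
}

# searchable_pdf INPUT.pdf OUTPUT.pdf
# Works in the current directory and leaves page-NNNN.tif/.hocr files behind.
searchable_pdf() {
    # 1. ghostscript: one 300 dpi bilevel TIFF per page of the scan (assumed settings).
    run gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -r300 \
        -sOutputFile=page-%04d.tif "$1" || return 1
    # 2. tesseract via GNU parallel: page-0001.tif -> page-0001.hocr, and so on.
    run parallel tesseract '{}' '{.}' hocr ::: page-*.tif || return 1
    # 3. pdfbeads: merge the images with their hOCR text; the PDF goes to stdout.
    run_to "$2" pdfbeads page-*.tif
}
```

Running ''DRY_RUN=1; searchable_pdf EPSON012.PDF searchable.pdf'' prints the three commands without touching any files, which is a cheap way to check the plan before letting it loose on a real scan.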
  
 ==== Moar Notes ====
 +
 +  * [[http://scantailor.org/|Scan Tailor]] does a very decent job of slicing up scanned books.
 +
 +  * [[http://gscan2pdf.sourceforge.net/|gscan2pdf]], if you like graphical things, //probably// does what you want with scanned images …
  
   * alternative to [[https://en.wikipedia.org/wiki/Comic_book_archive|cbz]] format that's readable as a PDF yet retains JPEG files intact: collate with [[https://gitlab.mister-muffin.de/josch/img2pdf|img2pdf]], burst apart with ''pdfimages -j …'' from [[https://poppler.freedesktop.org/|Poppler]].
Line 23: Line 29:
     * PDF/A archival format: [[https://ghostscript.com/doc/current/Ps2pdf.htm#PDFA|Creating a PDF/A document from PostScript in ghostscript]], [[https://stackoverflow.com/questions/1659147/how-to-use-ghostscript-to-convert-pdf-to-pdf-a-or-pdf-x/9343820#9343820|converting a PDF to PDF/A]]: ''gs -dPDFA -dBATCH -dNOPAUSE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=output_filename.pdf input_filename.pdf''. If you need to work out if the document is colour or not, ghostscript's [[https://ghostscript.com/doc/current/Devices.htm|inkcov]] device can tell you if colour was used at all.
     * Cryptographic signing: Rough process here, using an X.509 certificate issued by ARRL: [[http://scruss.com/blog/2011/10/09/creating-secure-digital-qsl-cards-with-your-lotw-certificate/|Creating secure digital QSL cards with your LoTW certificate]]. I wonder if you could use P12 certs issued by [[https://letsencrypt.org/|Let’s Encrypt]]?
- 
-  * [[http://scantailor.org/|Scan Tailor]] does a decent job of slicing up books.
  
   * other OCR software: [[https://launchpad.net/cuneiform-linux|Cuneiform]] used to be a contender and may still be useful. Not so sure about GNU ocrad, gocr/jocr, …
Line 33: Line 37:
  
   * JPEG 2000: The least painful way I found of making JPEG 2000 files from colour images was with ImageMagick/GraphicsMagick's **convert** command: ''convert -define 'jp2:rate=0.008' in.png out.jp2''. More (possibly dated) ways of mucking about with JP2s are here: [[http://scruss.com/blog/2014/06/16/jpeg-2000-on-ubuntu-without-anyone-getting-stabbed/|JPEG 2000 on Ubuntu — without anyone getting stabbed]]
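
The img2pdf / ''pdfimages -j'' round trip from the list above might look something like this as a pair of shell helpers. The function names and the ''DRY_RUN'' switch are hypothetical, added only for illustration; the two underlying commands are the ones the notes already name.

```shell
#!/bin/sh
# Sketch of the img2pdf / pdfimages round trip; function names are hypothetical.
# Set DRY_RUN=1 to print each command instead of running it.

# run CMD [ARGS...]: run a command, or just print it in dry-run mode.
run() {
    if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# collate OUTPUT.pdf IMAGE.jpg...: wrap the JPEGs in a PDF, one image per page.
collate() {
    out=$1
    shift
    run img2pdf "$@" -o "$out"
}

# burst INPUT.pdf PREFIX: write embedded JPEGs as PREFIX-000.jpg, PREFIX-001.jpg, ...
burst() {
    run pdfimages -j "$1" "$2"
}
```

For example, ''collate book.pdf page-*.jpg'' followed by ''burst book.pdf extracted'' should leave ''extracted-000.jpg'' and friends next to the PDF; comparing those against the originals with ''cmp'' is a reasonable way to check that nothing was re-encoded along the way.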
 +
 +==== Post meeting observations ====
 +
 +  * Chris B. suggested [[https://github.com/oniony/TMSU|oniony/TMSU: TMSU lets you tags your files and then access them through a nifty virtual filesystem from any other application.]]
 +  * [[http://weasyprint.org/|WeasyPrint]] is the rather clever Python-based web→PDF printer application I mentioned
 +  * If you're running a GNOME-based system and the [[https://wiki.gnome.org/Projects/Tracker/|Tracker]] desktop search engine is running amuck, it's probably getting stuck indexing a particular file. I posted a workaround here: [[https://askubuntu.com/questions/914581/no-progress-updates-from-gnome-tracker/914602#914602|no progress updates from gnome tracker - Ask Ubuntu]]
 +  * [[https://code-industry.net/free-pdf-editor/|Master PDF Editor for Linux]]: though the website looks dodgy af, this closed-source program seems to be quite a capable PDF editor.
  
 ===== Meta =====
  
-  * **Dinner**:
+  * **Dinner**: Doner Kebab House, 391 Yonge Street
   * **Attendance**: 15
   * [[operations:meeting:2017-04|Ops Notes]]
  • meeting/2017-04.1492272560.txt.gz
  • Last modified: 7 years ago
  • by scruss