davidphilliposter

Steve Jasik wrote Mon, 9 May 2022 21:50:10 -0700:

> Live text enables you to take a picture of text and extract the text to use it in an email,
> password, ...
>
> https://www.nytimes.com/wirecutter/blog/practical-utility-of-iphones-live-text/

I use the Mac app Simple Comic to read graphic novels and comic books in .cbz format. I looked up the most active fork of it on github.com, forked it, and added Live Text to it. Today the maintainer accepted my pull request, so it is now part of the main code base.

my fork: https://github.com/DavidPhillipOster/Simple-Comic

the official code base: https://github.com/MaddTheSane/Simple-Comic

Writing the actual text extraction was the easy part, thanks to Apple's Vision framework. https://developer.apple.com/documentation/vision/recognizing_text_in_images is a nice one-page write-up on it.
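Here is a minimal Swift sketch of the kind of request that write-up describes. It is an illustration, not the code in Simple Comic. `RecognizedLine` and `recognizeLines(in:)` are hypothetical names. The sketch assumes macOS 10.15 or later (where `VNRecognizeTextRequest` exists) and the macOS 12 SDK (Xcode 13), where the request's `results` are typed as `[VNRecognizedTextObservation]?`. It runs the request synchronously on the calling thread.

```swift
import CoreGraphics
import Vision

/// One recognized line of text plus its four corners, in Vision's normalized
/// coordinates (0...1, origin at the lower-left of the image). Hypothetical type.
struct RecognizedLine {
    let text: String
    let topLeft: CGPoint
    let topRight: CGPoint
    let bottomRight: CGPoint
    let bottomLeft: CGPoint
}

/// Runs Vision's text recognizer on one page image (hypothetical helper).
func recognizeLines(in image: CGImage) throws -> [RecognizedLine] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate      // slower but more accurate than .fast
    request.usesLanguageCorrection = true

    // perform(_:) is synchronous: the results are ready when it returns.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    let observations = request.results ?? []
    return observations.compactMap { observation in
        // topCandidates(1) yields the single most likely transcription.
        guard let best = observation.topCandidates(1).first else { return nil }
        // A text observation is a quadrilateral, so keep all four corners.
        return RecognizedLine(text: best.string,
                              topLeft: observation.topLeft,
                              topRight: observation.topRight,
                              bottomRight: observation.bottomRight,
                              bottomLeft: observation.bottomLeft)
    }
}
```

Keeping all four corners, rather than only `boundingBox`, matters because recognized lines are not necessarily axis-aligned.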

The tricky part was showing the selected regions within the image (you get "lines of text" that might not be aligned with the axes of the coordinate system) and, worst of all, tracking the mouse to animate the selection as you sweep out a rectangle.
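The selection hit-test can be sketched in plain Swift. This is a hypothetical illustration, not Simple Comic's implementation. `Point`, `Rect`, `viewPoint(_:pageFrame:)` and `lineIsSelected(corners:pageFrame:drag:)` are made-up names standing in for `CGPoint`/`CGRect` code. The sketch makes two simplifying assumptions: the page is drawn in an unflipped view (origin at lower left, like Vision's normalized coordinates), and each line is tested by the axis-aligned bounding box of its tilted quadrilateral, which can over-select slightly.

```swift
// Minimal geometry types so the sketch needs no frameworks.
struct Point: Equatable { var x: Double; var y: Double }

struct Rect {
    var minX: Double, minY: Double, maxX: Double, maxY: Double

    /// The rectangle swept out between two mouse locations, in either drag direction.
    init(from a: Point, to b: Point) {
        minX = min(a.x, b.x); maxX = max(a.x, b.x)
        minY = min(a.y, b.y); maxY = max(a.y, b.y)
    }

    func intersects(_ other: Rect) -> Bool {
        minX <= other.maxX && other.minX <= maxX &&
            minY <= other.maxY && other.minY <= maxY
    }
}

/// Maps a normalized corner (0...1, origin lower-left) into the frame where the page is drawn.
func viewPoint(_ p: Point, pageFrame: Rect) -> Point {
    Point(x: pageFrame.minX + p.x * (pageFrame.maxX - pageFrame.minX),
          y: pageFrame.minY + p.y * (pageFrame.maxY - pageFrame.minY))
}

/// True if the bounding box of a (possibly tilted) text line touches the drag rectangle.
func lineIsSelected(corners: [Point], pageFrame: Rect, drag: Rect) -> Bool {
    let pts = corners.map { viewPoint($0, pageFrame: pageFrame) }
    guard let first = pts.first else { return false }
    var box = Rect(from: first, to: first)
    for p in pts.dropFirst() {
        box.minX = min(box.minX, p.x); box.maxX = max(box.maxX, p.x)
        box.minY = min(box.minY, p.y); box.maxY = max(box.maxY, p.y)
    }
    return box.intersects(drag)
}
```

In an AppKit view, each `mouseDragged(with:)` event would rebuild the drag `Rect` from the mouse-down point and the current location and re-run the test over all lines. Drawing each selected line as a path through its four corners, rather than its bounding box, keeps tilted lines looking right.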

Simple Comic supports side-by-side two-page spreads and page rotation; even so, I got my OCR integration working with all of that.
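With rotation involved, a mouse location on the displayed (rotated) page has to be mapped back into the unrotated image before it can be compared with Vision's results. Below is a hypothetical sketch of that mapping for quarter-turn rotations. `QuarterTurns` and `unrotatedPoint(u:v:rotation:width:height:)` are illustrative names, and Simple Comic's own code may organize this differently. The sketch assumes pixel coordinates with the origin at the lower left (y grows upward) and clockwise rotation.

```swift
/// Clockwise rotation applied to a page when it is displayed (hypothetical type).
enum QuarterTurns: Int { case upright = 0, cw90 = 1, cw180 = 2, cw270 = 3 }

/// Maps a point (u, v) on the displayed, rotated page back to the unrotated image.
/// Origin is lower-left, y grows upward; `width` and `height` are the size of the
/// *unrotated* image, so a 90° or 270° page is displayed `height` wide.
func unrotatedPoint(u: Double, v: Double, rotation: QuarterTurns,
                    width: Double, height: Double) -> (x: Double, y: Double) {
    switch rotation {
    case .upright: return (u, v)
    case .cw90:    return (width - v, u)
    case .cw180:   return (width - u, height - v)
    case .cw270:   return (v, height - u)
    }
}
```

For a two-page spread, the point would first be made relative to whichever page the mouse is over (subtracting that page's origin). Dividing the result by `width` and `height` then gives Vision's normalized coordinates.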

One nice side effect: Apple's Preview disables Live Text for PDFs that are just wrappers around images, but such PDFs open nicely in Simple Comic, and the OCR additions work on them just fine.

Currently, once you have selected text, you can copy it to the clipboard or use the Edit menu to have the selection read aloud with text-to-speech (which could be useful for someone who isn't fully comfortable reading English).

In addition to Amazon, https://www.humblebundle.com/books is a cheap source of graphic novels.

Next steps:

Apple supports recognition in en-US, fr-FR, it-IT, de-DE, es-ES, pt-BR, zh-Hans, and zh-Hant, but I currently have no UI for choosing the source language.
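Rather than hardcoding that list, a request can be asked which languages it supports, and a language picker could feed its choice into `recognitionLanguages`. Here is a minimal sketch assuming macOS 12 or later, where the instance method `supportedRecognitionLanguages()` is available. `makeTextRequest(languageTag:)` is a hypothetical helper, and falling back to Vision's default when the tag is unsupported is a design choice made for this example.

```swift
import Vision

/// Builds a text-recognition request for a language the user picked, such as
/// "fr-FR" (hypothetical helper). Unsupported tags leave Vision's default in place.
@available(macOS 12.0, *)
func makeTextRequest(languageTag: String) throws -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    // The supported set depends on the request's revision and recognition level.
    let supported = try request.supportedRecognitionLanguages()
    if supported.contains(languageTag) {
        request.recognitionLanguages = [languageTag]
    }
    return request
}
```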

Now that Simple Comic can OCR, it needs a Find command (I'm thinking of something like Terminal's, with Next and Previous buttons).
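A Find command over the OCR results could start from something like this hypothetical Swift sketch. `FindHit` and `FindState` are invented names. The sketch assumes each page's recognized lines are already stored as strings in reading order. It does a case-insensitive substring match per line, so a phrase split across two recognized lines won't be found. Next and Previous wrap around at the ends of the book, which is a choice made for this sketch.

```swift
import Foundation

/// A match: which page, and which recognized line on that page.
struct FindHit: Equatable { let page: Int; let line: Int }

/// Find state over OCR text; `pages[p][l]` is the text of line `l` on page `p`.
struct FindState {
    let hits: [FindHit]
    private(set) var current: Int? = nil

    init(pages: [[String]], query: String) {
        var found: [FindHit] = []
        if !query.isEmpty {
            for (p, lines) in pages.enumerated() {
                for (l, text) in lines.enumerated()
                where text.range(of: query, options: .caseInsensitive) != nil {
                    found.append(FindHit(page: p, line: l))
                }
            }
        }
        hits = found
    }

    /// Moves to the next hit, wrapping from the last back to the first.
    mutating func next() -> FindHit? {
        guard !hits.isEmpty else { return nil }
        let index = current.map { ($0 + 1) % hits.count } ?? 0
        current = index
        return hits[index]
    }

    /// Moves to the previous hit, wrapping from the first to the last.
    mutating func previous() -> FindHit? {
        guard !hits.isEmpty else { return nil }
        let index = current.map { ($0 - 1 + hits.count) % hits.count } ?? hits.count - 1
        current = index
        return hits[index]
    }
}
```

The returned `FindHit` tells the viewer which page to show and which line's quadrilateral to highlight.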

Once it has a Find command, it needs a Spotlight importer to integrate it with Apple's desktop search.

More speculative: Apple has sample code for training an on-device machine learning model to recognize page layouts. That could be used to group the recognized lines into panels, and perhaps further into word balloons, for a smarter grouping of the recognized text.

On the Mac, if you select some text in Safari or TextEdit and Control-click to get the contextual menu, you get the option to Look Up or Translate the selection. I haven't found a way to get access to those menu items. (I tried creating an NSTextView, grabbing its contextual menu, and popping that up over my own selection; Copy and Speak worked that way, but Look Up and Translate were missing.) Please reply to me personally, not the list, if you know how to fix this.

I'm morally convinced that I haven't broken Simple Comic for old machines, but when I try to test on macOS High Sierra (10.13), the app can't open _any_ documents: it fails with an inconsistency while initializing its Core Data model (colliding ImageGroups). I have to debug this in lldb from the command line because the current Xcode and macOS SDK are too new to build with on that machine. Please reply to me personally, not the list, if you know how to debug and fix this.

Live Text is a delightful surprise.