|Imgclip on GitHub|
Recently I had to extract the text from a series of images into a .doc file, a task that is simple but tedious. At first I thought, “Really, how hard can it be?” Once I realized that there were over 120 images and that transcribing each one would cost me a few minutes, I started looking for alternatives.
I tried several methods without success. The most promising one was converting the .jpeg images into .PDF files with a simple conversion tool I wrote in C# some time ago, then using string-selection and clipboard-control techniques to paste the extracted content into a .doc file. It proved to be as inefficient as the other methods in terms of accuracy and processing time, so I decided to look for a better solution on the internet.
Trying to avoid commercial software and tools with inefficient or inadequate requirements, I finally found a great little tool on GitHub that does exactly what I need.
It is called Imgclip, and it is a command-line utility that extracts text from an image into the system clipboard. It uses the tesseract.js library to automate the image processing and text extraction. tesseract.js is one of the most powerful OCR libraries to date, and it’s completely open source, just like Imgclip.
It simply takes the image file as an argument, along with the language, and copies the recognized text to your clipboard.
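For a pile of 120+ images, the same "image plus language in, text out" idea can be wrapped in a small batch loop. The following is only a minimal sketch in Python, not part of Imgclip: the names `find_images`, `extract_all`, and `fake_ocr` are hypothetical, and the `ocr` parameter is a placeholder for whatever OCR call you actually use (for example, a wrapper around a Tesseract binding). The stand-in `fake_ocr` does no real recognition; it only shows the flow.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Assumption: these are the image types worth scanning; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(folder: Path) -> List[Path]:
    """Return the image files in `folder`, sorted by name for a stable order."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def extract_all(
    images: Iterable[Path],
    ocr: Callable[[Path, str], str],
    lang: str = "eng",
) -> str:
    """Run `ocr` on every image and join the results, one labeled section per file."""
    sections = []
    for image in images:
        text = ocr(image, lang).strip()
        sections.append(f"[{image.name}]\n{text}")
    return "\n\n".join(sections)


def fake_ocr(image: Path, lang: str) -> str:
    """Hypothetical stand-in for a real OCR engine; returns dummy text."""
    return f"text from {image.stem} ({lang})"
```

Calling `extract_all(find_images(Path("scans")), fake_ocr)` returns one string with a labeled section per image, which can then be written to a file or pasted into a document. Swapping `fake_ocr` for a real OCR function is the only change needed, since the loop does not care how the text is produced.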
You can install the whole tool via npm, and the install is pretty simple. The package is only a few KB in size, and it comes with a JS file that you can browse on GitHub. What impresses me most is the quality and speed of this tool. It really does pull accurate text, and it’s one of the simplest tools to use.