GNU/Linux Desktop Survival Guide
by Graham Williams

MLHub Pipelines

A general MLHub philosophy is that command output uses a well-defined, consistent text format (generally CSV, i.e., commas separating the fields), so that follow-on processes within a pipeline (or even a spreadsheet such as Excel) can format or otherwise utilise the output easily. The MLHub commands focus on their specific task rather than trying to solve every problem, so any extra formatting is left to other specialist tools:

$ ml ocr azcv handwriting.jpg

103.0 28.0 1982.0 52.0 1980.0 141.0 101.0 116.0,My cats name is freckles . She like's to cl...
65.0 184.0 2051.0 207.0 2049.0 298.0 63.0 274.0,high. She is 2 years old. She likes to play...

$ ml ocr azcv handwriting.jpg | sed 's/,/    /'

103.0 28.0 1982.0 52.0 1980.0 141.0 101.0 116.0    My cats name is freckles . She like's to...
65.0 184.0 2051.0 207.0 2049.0 298.0 63.0 274.0    high. She is 2 years old. She likes to p...
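
The sed expression above has no g flag, so it replaces only the first comma on each line, which is the one separating the bounding box from the text. Any commas inside the recognised text are left alone. A minimal sketch of this behaviour follows. The sample_line function and its contents are hypothetical, standing in for one line of real OCR output so that the example runs without mlhub installed:

```shell
# Hypothetical stand-in for one line of OCR output:
# bounding box, a comma, then text that itself contains a comma.
sample_line() {
  printf '%s\n' '10.0 20.0 30.0 20.0 30.0 40.0 10.0 40.0,Hello, world'
}

# Without the g flag, sed replaces only the first comma on the line,
# so the comma inside the text is preserved.
sample_line | sed 's/,/    /'
```

The bounding box and the text end up separated by four spaces, while "Hello, world" keeps its comma.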

If you do not need the bounding boxes then simply remove them, keeping everything from the second field onwards:

$ ml ocr azcv handwriting.jpg | cut -d, -f2-

My cats name is freckles . She like's to climb up 
high. She is 2 years old. She likes to play a lot of games.
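
The same field-based approach works in the other direction and feeds naturally into further tools. The sketch below is illustrative only: sample_ocr is a hypothetical function that replays the two lines shown above (bounding boxes from the first example, full text from the last), so that the pipeline can be tried without an image or the azcv model.

```shell
# Hypothetical stand-in for `ml ocr azcv handwriting.jpg`,
# replaying the two output lines shown above.
sample_ocr() {
  printf '%s\n' \
    "103.0 28.0 1982.0 52.0 1980.0 141.0 101.0 116.0,My cats name is freckles . She like's to climb up" \
    "65.0 184.0 2051.0 207.0 2049.0 298.0 63.0 274.0,high. She is 2 years old. She likes to play a lot of games."
}

# Keep only the bounding boxes (field 1).
sample_ocr | cut -d, -f1

# Keep only the text. -f2- selects field 2 onwards, so any commas
# within the recognised text are retained.
sample_ocr | cut -d, -f2-

# Pass the text on to another tool, here counting the words.
sample_ocr | cut -d, -f2- | wc -w
```

With real output, replace sample_ocr with the ml command itself.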


Hosted by Togaware, a pioneer of free and open source software since 1984.
Copyright © 1995-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.