tesseract-ocr

Diff

This page shows the differences between two versions of the page.

--- tesseract-ocr [2020/01/06 15:16] – nabezo
+++ tesseract-ocr [2020/12/11 17:10] (current) – nabezo
@@ Line 1: / Line 1: @@
 ====== tesseract ocr ======
-References
+===== References =====
-Character recognition with Tesseract OCR  https://gihyo.jp/admin/serial/01/ubuntu-recipe/0577
+Character recognition with Tesseract OCR  https://gihyo.jp/admin/serial/01/ubuntu-recipe/0577  \\
-\\
+Reading segment displays [[ssocr]]   \\
-Reading segment displays [[ssocr]]
+Reading segment displays https://github.com/adrianlazaro8/Tesseract_sevenSegmentsLetsGoDigital \\
-\\
+Basic usage of Tesseract 4 from Python: running OCR through the API and the CLI  https://valmore.work/how-to-use-tesseract4-with-python/   \\
+===== Japanese trained data =====
+Ubuntu package (appears to give the best accuracy)  https://packages.ubuntu.com/focal/tesseract-ocr-jpn \\
+GitHub  https://github.com/tesseract-ocr/tessdata_best \\
+===== Installation =====
+  sudo apt install gimagereader tesseract-ocr
+  sudo apt install tesseract-ocr-jpn
+===== Usage =====
+  tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]
+  tesseract infile outfile --psm 6 -l jpn
+  tesseract infile.png stdout  -l jpn_katakana
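The synopsis above maps directly onto an argument list. The following is a minimal sketch, assuming Python 3.7+ and a `tesseract` binary on the PATH; the helper names `build_tesseract_cmd` and `run_ocr` are invented for illustration and are not part of any Tesseract API. It assembles the same command lines as the examples above and runs them with `subprocess`.

```python
import subprocess
from typing import List, Optional


def build_tesseract_cmd(image: str, outputbase: str = "stdout",
                        lang: Optional[str] = None,
                        oem: Optional[int] = None,
                        psm: Optional[int] = None) -> List[str]:
    """Assemble a tesseract argument list (hypothetical helper)."""
    cmd = ["tesseract", image, outputbase]
    if lang is not None:
        cmd += ["-l", lang]          # recognition language, e.g. "jpn"
    if oem is not None:
        cmd += ["--oem", str(oem)]   # OCR engine mode
    if psm is not None:
        cmd += ["--psm", str(psm)]   # page segmentation mode
    return cmd


def run_ocr(image: str, **options) -> str:
    """Run tesseract on an image and return the recognized text.

    Assumes tesseract is installed; raises CalledProcessError on failure.
    """
    cmd = build_tesseract_cmd(image, "stdout", **options)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_ocr("infile.png", lang="jpn", psm=6)` would invoke `tesseract infile.png stdout -l jpn --psm 6`. Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.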
+===== Preprocessing images to improve recognition accuracy =====
+Extracting images from a PDF
+  pdfimages orgpic.pdf orgpic
+Binarization
+  convert orgpic.png -threshold 9000 convpic.png
+Changing the resolution
+  convert orgpic.png -resize 200% convpic.png
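The two `convert` steps above can also be combined into one call. The sketch below rests on several assumptions: ImageMagick 6 (where the command is named `convert`), a Q16 build (quantum range 0 to 65535, so the raw threshold 9000 corresponds to roughly 13.7%), and helper names (`threshold_percent`, `build_preprocess_cmd`) invented for illustration. It only builds the argument list; actually running it requires ImageMagick.

```python
from typing import List

# Assumption: ImageMagick compiled with 16-bit quantum depth (Q16).
QUANTUM_RANGE_Q16 = 65535


def threshold_percent(raw: int, quantum_range: int = QUANTUM_RANGE_Q16) -> float:
    """Convert a raw -threshold value into a percentage of the quantum range."""
    return 100.0 * raw / quantum_range


def build_preprocess_cmd(src: str, dst: str,
                         threshold_raw: int = 9000,
                         scale_percent: int = 200) -> List[str]:
    """Build one convert call that upscales first, then binarizes.

    Hypothetical helper: thresholding after resizing keeps the output
    strictly black and white, because resizing an already binarized
    image would reintroduce gray pixels through interpolation.
    """
    pct = threshold_percent(threshold_raw)
    return [
        "convert", src,
        "-resize", f"{scale_percent}%",
        "-threshold", f"{pct:.1f}%",
        dst,
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Writing the threshold as a percentage keeps it independent of the quantum depth of the installed ImageMagick build.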
+===== Recognizing Japanese =====
+https://github.com/tesseract-ocr/langdata/tree/master/jpn
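+apt install tesseract-ocr-jpn installs the Japanese data in the first directory below; the second is the corresponding directory for a Homebrew installation on macOS
+  /usr/share/tesseract-ocr/4.00/tessdata
+  /usr/local/Cellar/tesseract/4.1.0/share/tessdata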
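To check whether the Japanese data is actually present, one can look for `jpn.traineddata` in the directories listed above. This is a minimal sketch, assuming the trained data file is named `<lang>.traineddata` and that the install uses one of these two paths; the function name `find_traineddata` is invented for illustration, and other versions or distributions may use different directories.

```python
import os
from typing import Iterable, Optional

# The two tessdata directories mentioned above (version-specific paths).
CANDIDATE_DIRS = (
    "/usr/share/tesseract-ocr/4.00/tessdata",
    "/usr/local/Cellar/tesseract/4.1.0/share/tessdata",
)


def find_traineddata(lang: str = "jpn",
                     dirs: Iterable[str] = CANDIDATE_DIRS) -> Optional[str]:
    """Return the path of <lang>.traineddata in the first directory
    that contains it, or None if it is not found in any of them."""
    filename = f"{lang}.traineddata"
    for directory in dirs:
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            return path
    return None
```

Running `tesseract --list-langs` is another way to confirm which languages the installed binary can see.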
+===== Retraining =====
+https://qiita.com/aki_abekawa/items/418e069038fbdb77c59e
+Training the Tesseract OCR character recognition engine (jTessBoxEditor)
+http://danglingfarpointer.hatenablog.com/entry/2015/01/28/215629
+Summary of retraining and fine-tuning procedures for Tesseract 4
+http://laplace-daemon.com/training-tesseract/

tesseract-ocr.1578291367.txt.gz · Last modified: 2020/01/06 15:16 by nabezo