Install Tesseract OCR on AlmaLinux 9: Best OCR Library
In this guide, we show you how to install Tesseract OCR on AlmaLinux 9. Tesseract is an open-source optical character recognition (OCR) engine and one of the most widely used OCR libraries. OCR finds text within images and converts it into machine-readable characters.
Tesseract functions by identifying patterns in pixels, representing letters, words, and sentences. It employs a two-stage adaptive recognition process. The initial stage focuses on character recognition, followed by a second stage that refines the results by considering the context of words and sentences to improve accuracy.
Follow the steps below on the Orcacore website to Install Tesseract OCR on AlmaLinux 9.
Before proceeding, ensure you are logged in to your AlmaLinux 9 server as a non-root user with sudo privileges. If you haven’t already, you can follow our guide on Initial Server Setup with AlmaLinux 9 to configure this.

1. Tesseract OCR Setup on AlmaLinux 9
This section will guide you through installing Tesseract on AlmaLinux 9 from the source code.
First, update your installed packages:
sudo dnf update -y
Install required packages and Dependencies
Install the necessary packages for building the Tesseract OCR Library on AlmaLinux 9:
sudo dnf install git automake make autoconf libtool clang gcc-c++.x86_64 wget -y
Install the leptonica dependencies:
sudo dnf install zlib zlib-devel libjpeg libjpeg-devel libwebp libwebp-devel libtiff libtiff-devel libpng libpng-devel -y
Copy the shared libraries to /usr/local/lib:
cd /usr/local/lib
sudo cp /usr/lib64/libjpeg.so.62 .
sudo cp /usr/lib64/libwebp.so.7 .
sudo cp /usr/lib64/libtiff.so.5 .
sudo cp /usr/lib64/libpng16.so.16 .
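The version suffixes in the file names above can differ between releases, so it may help to confirm that each file exists before copying. The following is a minimal sketch; check_libs is a hypothetical helper name, and the library list simply mirrors the commands above.

```shell
# Report which of the expected shared libraries are missing from a directory.
# check_libs is a hypothetical helper; the directory defaults to /usr/lib64.
check_libs() {
    libdir=${1:-/usr/lib64}
    rc=0
    for lib in libjpeg.so.62 libwebp.so.7 libtiff.so.5 libpng16.so.16; do
        if [ ! -e "$libdir/$lib" ]; then
            echo "missing: $libdir/$lib" >&2
            rc=1
        fi
    done
    # Non-zero exit status means at least one library was not found.
    return "$rc"
}
```

Running check_libs with no arguments checks /usr/lib64. If it reports a missing file, list the directory to see which version of that library is actually installed.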
Clone Leptonica From GitHub

Clone leptonica from git:
cd ~
git clone https://github.com/DanBloomberg/leptonica.git --depth 1
Switch to your Leptonica directory:
cd leptonica
Compile and Build Leptonica
Compile leptonica:
./autogen.sh
./configure --prefix=/usr/local --disable-shared --enable-static --with-zlib --with-jpeg --with-libwebp --with-libtiff --with-libpng --disable-dependency-tracking
sudo make
sudo make install
sudo ldconfig
Download Tesseract OCR on AlmaLinux 9
After completing the Leptonica installation, download the latest version of Tesseract OCR on AlmaLinux 9 from GitHub.
cd ~
VER=$(curl -s https://api.github.com/repos/tesseract-ocr/tesseract/releases/latest | grep tag_name | cut -d '"' -f 4)
wget https://github.com/tesseract-ocr/tesseract/archive/refs/tags/$VER.tar.gz -O tesseract-5.tar.gz
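If the GitHub API request fails (for example, because of rate limiting), VER ends up empty and wget requests a URL that does not exist. The sketch below wraps the same two commands with a guard. The helper names latest_tag and fetch_tesseract are hypothetical, not part of any tool.

```shell
# Read a GitHub "latest release" JSON document on stdin and print its tag_name.
latest_tag() {
    grep '"tag_name"' | cut -d '"' -f 4
}

# Look up the latest tag and download the matching source archive.
# Stops early if the lookup returned nothing.
fetch_tesseract() {
    ver=$(curl -fsS https://api.github.com/repos/tesseract-ocr/tesseract/releases/latest | latest_tag)
    if [ -z "$ver" ]; then
        echo "could not determine the latest Tesseract release" >&2
        return 1
    fi
    wget "https://github.com/tesseract-ocr/tesseract/archive/refs/tags/$ver.tar.gz" -O tesseract-5.tar.gz
}
```

Calling fetch_tesseract from your home directory produces the same tesseract-5.tar.gz file as the commands above, or an error message if the tag could not be read.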
Extract the downloaded file:
tar zxvf tesseract-5.tar.gz
Switch to your Tesseract directory on AlmaLinux 9:
cd tesseract-*/
Compile Tesseract OCR
Generate the build scripts and configure the build:
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
./autogen.sh
./configure --prefix=/usr/local --disable-shared --enable-static --with-extra-libraries=/usr/local/lib/ --with-extra-includes=/usr/local/include/
Build and Install Tesseract OCR
Build and install Tesseract on AlmaLinux 9:
sudo make
sudo make install
sudo ldconfig
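Before moving on, you may want to confirm that the shell finds the newly built binary. This is a small sketch; verify_tesseract is a hypothetical helper name, and it assumes /usr/local/bin is on your PATH.

```shell
# Print which tesseract the shell resolves and the first line of its version output.
verify_tesseract() {
    bin=$(command -v tesseract) || {
        echo "tesseract not found on PATH; is /usr/local/bin included?" >&2
        return 1
    }
    echo "using $bin"
    # 2>&1 because some older releases print the version to stderr.
    tesseract --version 2>&1 | head -n 1
}
```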
After the installation, load Tesseract languages.
Load Tesseract Languages
Create a directory for the language data in your home directory:
mkdir -p ~/tess/traineddata
Export the Tesseract path by adding the following line to ~/.bashrc:
export TESSDATA_PREFIX=/home/$USER/tess/traineddata
Note: Replace $USER with the actual username.
Source the profile:
source ~/.bashrc
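If you script this step, appending the export line on every run would leave duplicates in ~/.bashrc. One way to avoid that is sketched below; add_line_once is a hypothetical helper name.

```shell
# Append a line to a file only if that exact line is not already present.
add_line_once() {
    line=$1
    file=$2
    # -x matches whole lines, -F treats the text literally (no regex).
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For example, add_line_once 'export TESSDATA_PREFIX=/home/$USER/tess/traineddata' ~/.bashrc can be run repeatedly without changing the file after the first call.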
Download the trained data files you need from the tessdata repository on GitHub into that directory:
cd $TESSDATA_PREFIX
wget https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/main/fra.traineddata
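To fetch several languages at once, the same wget call can be placed in a loop. The sketch below assumes TESSDATA_PREFIX is set as described above; fetch_langs is a hypothetical helper name.

```shell
# Download the traineddata file for each language code given as an argument.
fetch_langs() {
    base=https://github.com/tesseract-ocr/tessdata/raw/main
    for lang in "$@"; do
        target="$TESSDATA_PREFIX/$lang.traineddata"
        # Skip languages that are already present.
        [ -f "$target" ] && continue
        wget -q "$base/$lang.traineddata" -O "$target" || {
            echo "failed to download $lang" >&2
            rm -f "$target"    # do not leave an empty or partial file behind
        }
    done
}
```

For example, fetch_langs eng fra ces downloads English, French, and Czech data, skipping any file that already exists.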
Now, let’s see how to use Tesseract OCR once it is installed.
2. How To Use Tesseract OCR on AlmaLinux 9?
Now that Tesseract OCR has been installed on AlmaLinux 9, you can extract text from scanned documents or images.
To convert an image to a text file, use the following syntax:
tesseract <image_name> <output_file_name>
For example:
tesseract image.png new
This will create a text file named new.txt containing the extracted text from image.png.
Specify the language using the -l flag. The matching traineddata file must be present in your TESSDATA_PREFIX directory. For example, to use Czech:
tesseract image.png new -l ces
Multiple languages can also be specified:
tesseract image.png new -l ces+eng
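The same command can be applied to a whole directory of scans. This is a minimal sketch that handles PNG files only; ocr_dir is a hypothetical helper name, and the language defaults to eng.

```shell
# Run OCR on every PNG in a directory, writing NAME.txt next to each NAME.png.
ocr_dir() {
    dir=$1
    lang=${2:-eng}
    for img in "$dir"/*.png; do
        # If no PNG files matched, the pattern is left unexpanded; skip it.
        [ -e "$img" ] || continue
        # Tesseract appends .txt to the output base name itself.
        tesseract "$img" "${img%.png}" -l "$lang"
    done
}
```

For example, ocr_dir ~/scans ces+eng processes each PNG in ~/scans with Czech and English.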
Conclusion
This guide has walked you through the process to Install Tesseract OCR on AlmaLinux 9. Tesseract OCR allows you to extract text from images and documents lacking a text layer, converting them into searchable text files, PDFs, or other popular formats.
Alternative Installation Methods for Tesseract OCR on AlmaLinux 9
While the previous method compiled Tesseract OCR from source, which provides greater control over the build process, there are alternative, often simpler, ways to install Tesseract OCR on AlmaLinux 9: using the dnf package manager, or containerization with Docker.
1. Using the DNF Package Manager
AlmaLinux 9, being a derivative of RHEL, benefits from a robust package management system via dnf. If Tesseract and its dependencies are available in the standard or enabled repositories, installation becomes significantly easier.
Explanation:
This method leverages pre-built packages, eliminating the need for manual compilation. The package manager handles dependency resolution, ensuring all required libraries are installed correctly. However, the version of Tesseract available through dnf might not always be the latest.
Code Example:
First, search for Tesseract packages to confirm availability:
sudo dnf search tesseract
If Tesseract is found, install it using:
sudo dnf install tesseract tesseract-langpack-eng
The tesseract-langpack-eng package provides the English language data. Install other language packs as needed.
After installation, verify the installation:
tesseract --version
This method provides a quick and easy way to Install Tesseract OCR on AlmaLinux 9.
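After installing language packs, you can check whether a particular language is available by reading the output of tesseract --list-langs. The sketch below wraps that check; has_lang is a hypothetical helper name. The commented-out line reflects an assumption that the Tesseract packages for AlmaLinux 9 come from the EPEL repository, so verify this on your system.

```shell
# Assumption: if "dnf search tesseract" finds nothing, the packages may live in EPEL.
# sudo dnf install epel-release

# Succeed if the given language code is listed by tesseract --list-langs.
has_lang() {
    # -x matches the whole line, so "eng" does not match a longer name containing it.
    tesseract --list-langs 2>&1 | grep -qx "$1"
}
```

For example, has_lang eng && echo "English data is installed" prints the message only when the English data is present.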
2. Using Docker Containerization
Docker provides a way to encapsulate Tesseract OCR and its dependencies within a container, ensuring consistent operation across different environments. This approach avoids modifying the host system and simplifies deployment.
Explanation:
Docker images contain everything needed to run an application: code, runtime, system tools, system libraries, and settings. Using a pre-built Tesseract OCR Docker image or creating your own offers portability and reproducibility. This is particularly useful if you need a specific version of Tesseract or have complex dependency requirements.
Code Example:
First, ensure Docker is installed and running on your AlmaLinux 9 system. Then, pull a pre-built Tesseract OCR image from Docker Hub. One such image is jbarratt/tesseract.
docker pull jbarratt/tesseract
To run Tesseract OCR on an image (e.g., image.png) within the container, mount the directory containing the image to the container and specify the output directory:
docker run --rm -v /path/to/your/images:/data jbarratt/tesseract /data/image.png /data/output
Replace /path/to/your/images with the actual path to the directory containing your image. The output text file (output.txt) will be created in the same directory.
To specify a language, you can use the -l flag:
docker run --rm -v /path/to/your/images:/data jbarratt/tesseract /data/image.png /data/output -l eng
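Typing the mount path each time is error-prone, so the docker run command can be wrapped in a small function. This is a sketch only; ocr_docker is a hypothetical helper name, and it assumes, as the commands above imply, that the image's entrypoint is tesseract.

```shell
# Run the containerized Tesseract on one image file.
# The output file is written next to the image, with the same base name.
ocr_docker() {
    img=$1
    lang=${2:-eng}
    # Docker needs an absolute host path for the -v bind mount.
    dir=$(cd "$(dirname "$img")" && pwd) || return 1
    name=$(basename "$img")
    docker run --rm -v "$dir:/data" jbarratt/tesseract \
        "/data/$name" "/data/${name%.*}" -l "$lang"
}
```

For example, ocr_docker ~/scans/image.png fra writes image.txt to ~/scans. On hosts with SELinux in enforcing mode, the container may be denied access to the mounted directory. Appending :Z to the volume argument is the usual remedy, though whether it is needed depends on your setup.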
These two methods offer alternative approaches to Install Tesseract OCR on AlmaLinux 9 that can be more convenient than compiling from source, depending on your specific needs and environment.