What is Hugging Face Hub?
The Hugging Face Hub is a platform where developers can store and share code and collaborate on machine learning projects. It hosts Git-based repositories, a type of version-controlled storage where developers can keep all of their project files. On the Hub, developers can upload and access cutting-edge models for natural language processing, computer vision, and audio tasks. It also provides datasets across many domains and modalities. Finally, developers can explore interactive apps that showcase ML models directly in the browser.
To learn more about the Hugging Face Hub, check out the documentation.
What is a Hugging Face Space?
Spaces is a Hub platform that lets developers quickly create and showcase ML demo apps. It supports two Python SDKs, Gradio and Streamlit, which make it simple to build apps in a short time. Users can also create static Spaces: plain HTML, CSS, and JavaScript web pages hosted inside a Space. Visit the Spaces documentation if you want to find out more about Spaces and how to create your own. You can also upgrade a Space to run on a GPU or other accelerated hardware.
Let’s get a quick idea about Optical Character Recognition (OCR).
Optical Character Recognition
Optical Character Recognition (OCR) is a technique for recognizing text in images such as scanned documents and photos. Modern OCR systems typically use a deep learning model, often a convolutional neural network, to detect text regions in an image; a recognition engine trained on words and characters then decodes the text in those regions. The engine’s output is used to produce a text version of the original image. OCR is commonly used to extract text from images in order to automate data entry and document management processes.
There are many libraries and techniques for OCR. Here we are going to implement text recognition with three of them: PaddleOCR, keras-ocr, and EasyOCR.
In this tutorial, we will see how to host our OCR app on Hugging Face Spaces. First, you need to create a repository on Hugging Face Spaces by following the steps below.
Steps for creating a repository on a Hugging Face (🤗) Space:
Step 1: Create an account on the 🤗 Hub and create a new Space. Go to Files and versions; you will see a generated README.md file for the project.
Step 2: We have set the metadata shown in the image below in our README.md file. You can replace the metadata values as per your requirements and save them. You can find the full metadata reference in the 🤗 Spaces configuration documentation.
Step 3: Now you can create new files or upload the project files from your local system as shown below. Add all the required libraries to the requirements.txt file; the 🤗 server will automatically install them. Another way to upload the entire project is to use huggingface_hub; for this, make sure you are logged in to 🤗 from your system. Then you can follow the huggingface_hub steps to upload your local folder to your 🤗 Space.
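As a sketch of the huggingface_hub approach, the snippet below wraps `HfApi.upload_folder` in a small helper. The helper name `push_to_space` and the repo id `your-username/your-space` are placeholders of ours, not part of the original project.

```python
from typing import Optional

def push_to_space(local_dir: str, repo_id: str, token: Optional[str] = None) -> None:
    """Upload every file in local_dir to a Hugging Face Space repo."""
    # Imported lazily so the helper can be defined even before
    # huggingface_hub is installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)  # token may also come from `huggingface-cli login`
    api.upload_folder(folder_path=local_dir, repo_id=repo_id, repo_type="space")

# Example (placeholder repo id):
# push_to_space(".", "your-username/your-space")
```

With a valid login, calling the helper once from your machine pushes the whole project folder, after which the Space rebuilds automatically.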
Step 4: Now let’s start with the code. We will write our code in the app.py file.
Let’s start our code implementation
1. Import all libraries
import os
import cv2
import json
import easyocr
import datasets
import socket
import requests
import keras_ocr
import numpy as np
import gradio as gr
import pandas as pd
import tensorflow as tf
import re as r
from PIL import Image  # note: do not also `from datasets import Image`, it would shadow PIL's Image
from datetime import datetime
from paddleocr import PaddleOCR
from urllib.request import urlopen
from huggingface_hub import Repository, upload_file
2. We have written separate OCR generation functions for all three methods.
Code for Paddle OCR:
"""
Paddle OCR
"""
def ocr_with_paddle(img):
    finaltext = ''
    ocr = PaddleOCR(lang='en', use_angle_cls=True)
    result = ocr.ocr(img)
    for i in range(len(result[0])):
        text = result[0][i][1][0]
        finaltext += ' ' + text
    return finaltext
Code for Keras OCR:
"""
Keras OCR
"""
def ocr_with_keras(img):
    output_text = ''
    pipeline = keras_ocr.pipeline.Pipeline()
    images = [keras_ocr.tools.read(img)]
    predictions = pipeline.recognize(images)
    first = predictions[0]
    for text, box in first:
        output_text += ' ' + text
    return output_text
Code for Easy OCR:
"""
Easy OCR
"""
# Convert to grayscale
def get_grayscale(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Thresholding / binarization
def thresholding(src):
    return cv2.threshold(src, 127, 255, cv2.THRESH_TOZERO)[1]

def ocr_with_easy(img):
    gray_scale_image = get_grayscale(img)
    thresholded_image = thresholding(gray_scale_image)
    cv2.imwrite('image.png', thresholded_image)
    reader = easyocr.Reader(['th', 'en'])
    # paragraph expects a boolean, not the string "False";
    # detail=0 returns plain strings
    bounds = reader.readtext('image.png', paragraph=False, detail=0)
    return ' '.join(bounds)
3. Create a common function for all three OCR methods that takes an image as input and returns the text generated from it.
"""
Generate OCR
"""
def generate_ocr(Method, input_image):
    text_output = ''
    if input_image is not None and input_image.any():
        print("Method:", Method)
        if Method == 'EasyOCR':
            text_output = ocr_with_easy(input_image)
        if Method == 'KerasOCR':
            text_output = ocr_with_keras(input_image)
        if Method == 'PaddleOCR':
            text_output = ocr_with_paddle(input_image)
        # ip_address and location are gathered elsewhere in the full app
        flag(Method, input_image, text_output, ip_address, location)
        return text_output
    else:
        raise gr.Error("Please upload an image!")
4. After all these functions, let’s use Gradio to integrate our code with a user interface.
Gradio
Gradio is a useful tool for developers because it allows them to quickly and easily build interactive user interfaces for their machine learning models. This can be especially useful for demonstrating a model’s capabilities to others, or for gathering user feedback on a model’s performance. Additionally, Gradio demos can be shared with others through a simple public link, making it a great tool for collaboration. If you want to learn more about Gradio, please follow this link.
Basically, we can launch a Gradio demo in two ways: using gr.Blocks or gr.Interface.
There are three main parameters in gr.Interface:
1. Function: the function that handles the interface’s main logic
2. Inputs: the type of input component(s)
3. Outputs: the type of output component(s)
The final section of the code launches the interface. It is made up of various components such as the function, inputs, outputs, title, description, and more. This link lists all the interface components.
image = gr.Image(shape=(300, 300))
method = gr.Radio(["PaddleOCR", "EasyOCR", "KerasOCR"], value="PaddleOCR", elem_id="radio_div")
output = gr.Textbox(label="Output", elem_id="opbox")

demo = gr.Interface(
    generate_ocr,
    [method, image],
    output,
    title="Optical Character Recognition",
    css=".gradio-container {background-color: #C0E1F2} #radio_div {background-color: #ADA5EC; font-size: 40px;} #btn {background-color: #94D68B; font-size: 20px;} #opbox {background-color: #ADA5EC;}",
    article="""Feel free to give us your feedback and contact us at
    letstalk@pragnakalp.com And don't forget to check out more interesting
    NLP services we are offering.
    Developed by: Pragnakalp Techlabs
    """,
)
demo.launch()
Saving data and logs in a Hugging Face Hub dataset
After creating your application, if you want to log the user input and the results, you can follow the steps below. Here, we use a Hugging Face dataset to store the logs.
Step 1: To save/store logs or data, create a new dataset on 🤗 Datasets. You can refer to the Datasets documentation for detailed information.
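If you prefer to create the dataset repository from code rather than through the website, huggingface_hub’s `create_repo` can do it. The helper name and the repo id below are placeholders of ours.

```python
def create_log_dataset(repo_id: str, token: str) -> None:
    """Create a dataset repo for storing logs (no-op if it already exists)."""
    # Lazy import keeps the helper importable without huggingface_hub installed.
    from huggingface_hub import create_repo

    create_repo(repo_id, repo_type="dataset", token=token, exist_ok=True)

# Example (placeholder repo id and token):
# create_log_dataset("your-username/OCR-img-to-text", token="hf_...")
```

`exist_ok=True` makes the call safe to run on every app start, so the Space does not crash if the dataset was already created.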
Step 2: To connect to your dataset, use the code snippet below.
HF_TOKEN = os.environ.get("HF_TOKEN")
print("is none?", HF_TOKEN is None)

DATASET_NAME = "OCR-img-to-text"
DATASET_REPO_URL = f"https://huggingface.co/datasets/pragnakalp/{DATASET_NAME}"
DATASET_REPO_ID = "pragnakalp/OCR-img-to-text"
REPOSITORY_DIR = "data"
LOCAL_DIR = 'data_local'
os.makedirs(LOCAL_DIR, exist_ok=True)

repo = Repository(
    local_dir="ocr_data", clone_from=DATASET_REPO_URL, use_auth_token=HF_TOKEN
)
repo.git_pull()
Here, HF_TOKEN is a 🤗 User Access Token, the most common way of authenticating an application or notebook to 🤗 services. Note: when creating your token, set its role to “write”. After generating the access token, copy it and save it in your Space’s Settings → Repository secrets, keeping the name “HF_TOKEN”.
DATASET_REPO_ID is the path to your dataset.
REPOSITORY_DIR is the folder name under which the data is saved.
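Since the token is read from the environment, it helps to fail fast with a clear message when the secret is missing. A small sketch (the helper name is ours, not part of the original app):

```python
import os

def get_write_token(var: str = "HF_TOKEN") -> str:
    """Return the write token from the environment, or raise a clear error."""
    token = os.environ.get(var)
    if token is None:
        raise RuntimeError(
            f"{var} is not set; add it under your Space's "
            "Settings -> Repository secrets"
        )
    return token
```

Failing at startup like this is easier to debug than a 401 error surfacing later inside `upload_file`.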
Step 3: Write a function for saving data.
"""
Save generated details
"""
def dump_json(thing, file):
    with open(file, 'w+', encoding="utf8") as f:
        json.dump(thing, f)

def flag(Method, input_image, text_output, ip_address, location):
    try:
        print("saving data...")
        metadata_name = datetime.now().strftime('%Y-%m-%d %H-%M-%S')
        SAVE_FILE_DIR = os.path.join(LOCAL_DIR, metadata_name)
        os.makedirs(SAVE_FILE_DIR, exist_ok=True)
        image_output_filename = os.path.join(SAVE_FILE_DIR, 'image.png')
        try:
            Image.fromarray(input_image).save(image_output_filename)
        except Exception:
            raise Exception("Had issues saving PIL image to file")

        # Write metadata.jsonl to file
        json_file_path = os.path.join(SAVE_FILE_DIR, 'metadata.jsonl')
        metadata = {'id': metadata_name, 'method': Method,
                    'File_name': 'image.png', 'generated_text': text_output,
                    'ip_address': ip_address, 'loc': location}
        dump_json(metadata, json_file_path)

        # Upload the image using the hub's upload_file
        repo_image_path = os.path.join(REPOSITORY_DIR, metadata_name, 'image.png')
        _ = upload_file(path_or_fileobj=image_output_filename,
                        path_in_repo=repo_image_path,
                        repo_id=DATASET_REPO_ID,
                        repo_type='dataset',
                        token=HF_TOKEN)

        # Upload the metadata
        repo_json_path = os.path.join(REPOSITORY_DIR, metadata_name, 'metadata.jsonl')
        _ = upload_file(path_or_fileobj=json_file_path,
                        path_in_repo=repo_json_path,
                        repo_id=DATASET_REPO_ID,
                        repo_type='dataset',
                        token=HF_TOKEN)

        repo.git_pull()
        return "Logs saved successfully!"
    except Exception as e:
        return "Error while saving logs --> " + str(e)
You can see the log dataset preview in the image below.
Try it on Hugging Face Spaces
We have set up the code and demo of this OCR app on Hugging Face Spaces; you can try your own images and check the output. You can also easily set up the code on your own system using the Space.