What is the WeChat OCR API?

The WeChat OCR API is a Dockerized REST API service that wraps the WeChat OCR functionality, allowing you to perform optical character recognition on images using WeChat's OCR capabilities.

What are the key features of this API?

Key features include Dockerized deployment, a RESTful interface, Base64 image support, and JSON response format.

What image formats does the API support?

Currently, the API primarily supports PNG images.

How do I deploy the WeChat OCR API?

You can deploy it using Docker with simple commands: `docker pull golangboyme/wxocr` and `docker run -d -p 5000:5000 --name wechat-ocr-api golangboyme/wxocr`.

How do I use the API?

Send a POST request to `/ocr` with a JSON payload containing your base64-encoded image data.

What kind of response does the API return?

The API returns a JSON response containing the extracted text, bounding box coordinates, and confidence scores.

Is this API free to use?

The API itself is open-source and free to use, but it is intended for learning and communication purposes only and should not be used for commercial activities.

Can I use this API for commercial purposes?

No, this API is not intended for commercial activities. It is for learning and communication purposes only.

What are the limitations of this API?

Limitations include support primarily for PNG images and dependence on WeChat's OCR binaries, which may be updated by WeChat.

Where can I find the Python client example?

The Python client example is provided in the documentation and allows you to easily send images to the API and retrieve results.

What is UBOS and how does this API fit into the UBOS platform?

UBOS is a full-stack AI Agent Development Platform. The WeChat OCR API enhances UBOS by providing powerful OCR capabilities to AI Agents.

Can I contribute to this project?

Yes, contributions are welcome! You can submit a Pull Request with your improvements.

What should I do if I believe this project infringes upon my copyright or intellectual property rights?

Please contact the repository owner immediately, and the repository will be promptly removed.

UBOS Asset Marketplace: WeChat OCR API - Empowering AI with Accessible OCR

In the rapidly evolving landscape of Artificial Intelligence and Machine Learning, Optical Character Recognition (OCR) plays a pivotal role. The ability to convert images of text into machine-readable text is fundamental to a wide range of applications, from automating data entry to enhancing document accessibility. Recognizing this critical need, UBOS presents the WeChat OCR API, a powerful and easily deployable solution available on the UBOS Asset Marketplace, designed to democratize access to advanced OCR technology.

The Power of WeChat OCR, Now Accessible

The WeChat OCR API harnesses the robust OCR capabilities developed by WeChat, a leading global communication platform. WeChat’s OCR engine is renowned for its accuracy and efficiency, making it a preferred choice for various text recognition tasks. This API encapsulates that power within a simple, Dockerized REST service, enabling developers and businesses to seamlessly integrate WeChat’s OCR functionality into their own applications.

This project is built upon the excellent work of the open-source project wechat-ocr, which reverse-engineered and created a usable interface for WeChat’s OCR functionality. This API essentially containerizes that work, making it even easier to deploy and use.

Key Features

Dockerized Deployment: The API is packaged as a Docker container, ensuring consistent performance and effortless deployment across different environments.
RESTful Interface: A simple and intuitive REST API allows for easy integration with various programming languages and platforms.
Base64 Image Support: Accepts images encoded in Base64 format, simplifying the process of sending image data to the API.
JSON Response: Returns OCR results in a structured JSON format, facilitating easy parsing and utilization of the extracted text.

Use Cases

The WeChat OCR API opens up a plethora of possibilities across diverse industries. Here are some compelling use cases:

Automated Data Entry: Streamline data entry processes by automatically extracting text from scanned documents, invoices, and forms.
Mobile Applications: Integrate OCR capabilities into mobile apps to enable users to quickly capture and digitize text from images.
Accessibility: Enhance accessibility for visually impaired users by converting images of text into readable formats.
Document Management: Improve document management systems by automatically indexing and categorizing documents based on their textual content.
Robotic Process Automation (RPA): Automate tasks that involve extracting information from images, such as processing insurance claims or verifying identity documents.
AI-Powered Chatbots: Use OCR to understand images sent to chatbots, enabling more intelligent and context-aware interactions.
Content Moderation: Identify and flag inappropriate content in images by extracting and analyzing the text they contain.
Inventory Management: Quickly scan and record product information from images of labels and packaging.

Technical Deep Dive

The API is a Flask application that exposes a REST endpoint for OCR requests. When an image is submitted:

The base64-encoded image is decoded.
A temporary file is created to store the image.
The WeChat OCR engine processes the image via the wcocr Python binding.
The OCR results are structured into a JSON format.
The JSON response, containing extracted text and bounding box information, is returned to the user.
Temporary files are cleaned up to maintain system efficiency.
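
The steps above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the service's actual handler: the real service uses Flask and the wcocr binding, which are replaced here by an injected `run_engine` callable so the sketch runs without the WeChat binaries. The names `handle_ocr_request` and `run_engine`, and the error shape with `errcode` 1, are hypothetical.

```python
import base64
import binascii
import os
import tempfile
from typing import Callable


def handle_ocr_request(payload: dict, run_engine: Callable[[str], dict]) -> dict:
    """Process one request body and return a JSON-serializable result.

    run_engine stands in for the OCR engine call: it takes an image path
    and returns the structured result (hypothetical interface).
    """
    encoded = payload.get("image")
    if not encoded:
        return {"errcode": 1, "error": "missing 'image' field"}  # assumed error shape

    # Decode the base64-encoded image.
    try:
        image_bytes = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError):
        return {"errcode": 1, "error": "invalid base64 data"}  # assumed error shape

    # Store the image in a temporary file.
    fd, temp_path = tempfile.mkstemp(suffix=".png")
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(image_bytes)
        # Run the engine on the file and return its structured result.
        return run_engine(temp_path)
    finally:
        # Clean up the temporary file whether or not the engine succeeded.
        os.remove(temp_path)
```

Putting the cleanup in a `finally` block means the temporary file is removed even when the engine raises, which matches the intent of the last step.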

Getting Started

Integrating the WeChat OCR API into your project is remarkably straightforward:

Pull the Docker image:

```bash
docker pull golangboyme/wxocr
```

or

```bash
docker pull programnotes/ocr:latest
```

Run the Docker container:

```bash
docker run -d -p 5000:5000 --name wechat-ocr-api golangboyme/wxocr
```

Send a POST request to /ocr:

```bash
curl -X POST http://localhost:5000/ocr \
  -H "Content-Type: application/json" \
  -d '{"image": "BASE64_ENCODED_IMAGE_DATA"}'
```

Replace BASE64_ENCODED_IMAGE_DATA with the base64 representation of your image.
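
The base64 string for a real image is usually too long to paste into a shell command. One possible workaround, sketched below with the Python standard library, is to write the request body to a file first. The function names `build_payload` and `write_payload` and the file name `payload.json` are illustrative choices, not part of the API.

```python
import base64
import json


def build_payload(image_bytes: bytes) -> str:
    """Return the JSON request body for the /ocr endpoint as a string."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded})


def write_payload(image_path: str, payload_path: str = "payload.json") -> None:
    """Read an image file and write the matching request body to disk."""
    with open(image_path, "rb") as image_file:
        body = build_payload(image_file.read())
    with open(payload_path, "w", encoding="ascii") as payload_file:
        payload_file.write(body)
```

After calling `write_payload("ocrtest.png")`, the file can be passed to curl with `-d @payload.json` in place of the inline JSON string.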

Example Response

```json
{
  "errcode": 0,
  "height": 72,
  "width": 410,
  "imgpath": "temp/5726fe7b-25d6-43a6-a50d-35b5f668fbb6.png",
  "ocr_response": [
    {
      "text": "aacss",
      "left": 80.63632202148438,
      "top": 29.634929656982422,
      "right": 236.47093200683594,
      "bottom": 55.28932189941406,
      "rate": 0.9997046589851379
    },
    {
      "text": "xxzsa",
      "left": 312.625,
      "top": 30.75,
      "right": 395.265625,
      "bottom": 55.09375,
      "rate": 0.997739315032959
    }
  ]
}
```

For each recognized word or phrase, the response includes the extracted text (`text`), the bounding box coordinates (`left`, `top`, `right`, `bottom`), and a confidence score (`rate`).
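
A response of this shape can be reduced to plain text with a few lines of Python. The sketch below assumes that `errcode` 0 means success (the sample shows 0, but the source does not document other values) and uses a hypothetical helper name, `extract_text`. The sample data is taken from the example response, trimmed to the fields the helper reads.

```python
def extract_text(result: dict, min_rate: float = 0.0) -> list:
    """Return recognized strings in reading order, dropping low-confidence items."""
    if result.get("errcode") != 0:  # assumption: 0 signals success
        raise ValueError(f"OCR failed with errcode {result.get('errcode')}")
    items = [item for item in result.get("ocr_response", []) if item["rate"] >= min_rate]
    # Order top-to-bottom, then left-to-right, using the top-left corner of each box.
    items.sort(key=lambda item: (item["top"], item["left"]))
    return [item["text"] for item in items]


sample = {
    "errcode": 0,
    "ocr_response": [
        {"text": "aacss", "left": 80.63632202148438, "top": 29.634929656982422,
         "rate": 0.9997046589851379},
        {"text": "xxzsa", "left": 312.625, "top": 30.75,
         "rate": 0.997739315032959},
    ],
}

print(extract_text(sample, min_rate=0.99))  # prints ['aacss', 'xxzsa']
```

Sorting on the raw `top` value is a simplification: items on the same visual line can differ slightly in `top`, so a production client may want to group boxes into lines before ordering them.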

Python Client

```python
import base64
import os

import requests


def ocr_recognize(image_path=None, image_url=None, api_url="http://localhost:5000/ocr"):
    """
    Send an image to the OCR API service and get the recognition results.
    Use either image_path or image_url (one is required).
    """
    # Get image data
    if image_path:
        if not os.path.exists(image_path):
            print(f"Error: Local image not found: {image_path}")
            return None
        with open(image_path, "rb") as image_file:
            img_data = image_file.read()
    elif image_url:
        try:
            response = requests.get(image_url)
            response.raise_for_status()
            img_data = response.content
        except Exception as e:
            print(f"Failed to download image: {str(e)}")
            return None
    else:
        print("Please provide either image_path or image_url")
        return None

    # Convert image to base64
    base64_image = base64.b64encode(img_data).decode("utf-8")

    # Send request to API
    try:
        response = requests.post(api_url, json={"image": base64_image})
        response.raise_for_status()
        return response.json()
    except Exception as e:
        print(f"API request failed: {str(e)}")
        return None


# Example usage
if __name__ == "__main__":
    # Local image example
    result = ocr_recognize(image_path="ocrtest.png")
    if result:
        print(result)

    # URL image example (uncomment to use)
    # result = ocr_recognize(image_url="https://example.com/image.png")
```

Limitations and Considerations

Image Format: Currently, the API primarily supports PNG images. Support for other formats can be extended based on demand.
WeChat Dependency: The API relies on WeChat’s OCR binaries, which are subject to updates by WeChat. While the UBOS team strives to maintain compatibility, occasional updates may be required.
Ethical Use: It is crucial to use this technology responsibly and ethically, respecting privacy and intellectual property rights. The creators of this project disclaim any liability for misuse of the API.
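
Because the API primarily supports PNG images, a client may want to check the format before sending a request. A minimal sketch, assuming a check on the fixed 8-byte PNG file signature is sufficient; the helper name `is_png` is illustrative and not part of the API.

```python
# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(data: bytes) -> bool:
    """Return True if the raw bytes begin with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)
```

The check can run on the raw file bytes just before they are base64-encoded. Converting other formats to PNG would need an image library and is not shown here.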

Why Choose the WeChat OCR API on UBOS?

The UBOS platform offers a comprehensive ecosystem for AI Agent development, and the WeChat OCR API perfectly complements this vision. By providing access to powerful OCR capabilities, UBOS empowers developers to create more intelligent and versatile AI Agents that can interact with the visual world.

UBOS: Your Partner in AI Agent Development

UBOS is a full-stack AI Agent Development Platform that focuses on bringing AI Agents to every business department. Our platform helps you:

Orchestrate AI Agents: Design and manage complex workflows involving multiple AI Agents.
Connect to Enterprise Data: Seamlessly integrate AI Agents with your existing data sources.
Build Custom AI Agents: Create bespoke AI Agents tailored to your specific needs, using your own LLMs.
Develop Multi-Agent Systems: Build sophisticated AI solutions that involve collaboration between multiple AI Agents.

By choosing the WeChat OCR API on UBOS, you gain access to a powerful OCR solution and a comprehensive platform for building the next generation of AI Agents. Join the UBOS community today and unlock the potential of AI-powered automation!

Embracing the Future of OCR

The WeChat OCR API on the UBOS Asset Marketplace represents a significant step towards democratizing access to cutting-edge OCR technology. By simplifying deployment and integration, this API empowers developers and businesses to leverage the power of WeChat’s OCR engine in a wide range of applications.

As AI continues to evolve, the ability to understand and interact with the visual world will become increasingly critical. The WeChat OCR API, combined with the UBOS platform, provides the tools and resources you need to stay ahead of the curve and build innovative AI solutions that transform your business.
