Generate multi-line text images for OCR or PDF parsing
Still struggling with a lack of data for OCR and PDF parsing?
This tool generates multi-line text images for OCR and PDF parsing.
It supports multiple themes, multiple code formats (for example plain text and SQL), and multiple languages (for example Chinese and English).
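The README does not describe how rendering works internally. Based only on the listed dependencies (Pygments, imgkit, wkhtmltopdf) and the `htmlfile`, `imgfile`, `theme` and `file_suffix` parameters, one plausible pipeline is to highlight the text into an HTML page and then rasterize that page. The sketch below is an assumption, not the anygenocrdata source: `render_html` and `render_image` are hypothetical names, and treating `theme` as a Pygments style name is a guess.

```python
# Hypothetical sketch, NOT the anygenocrdata implementation. Assumes:
# file_suffix selects a Pygments lexer, theme is a Pygments style name,
# and imgkit (wkhtmltoimage) turns the HTML page into an image.
from typing import Optional

from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_for_filename
from pygments.util import ClassNotFound


def render_html(content: str, file_suffix: str = "txt",
                theme: Optional[str] = None) -> str:
    """Return content as a standalone, syntax-highlighted HTML page."""
    try:
        # Pick a lexer from the suffix, e.g. "sql" -> SQL, "txt" -> plain text
        lexer = get_lexer_for_filename(f"sample.{file_suffix}")
    except ClassNotFound:
        lexer = get_lexer_for_filename("sample.txt")  # unknown suffix: plain text
    # full=True emits a complete page: <html>, a <style> block for the theme, and the body
    formatter = HtmlFormatter(full=True, style=theme or "default")
    return highlight(content, lexer, formatter)


def render_image(content: str, htmlfile: str, imgfile: str,
                 file_suffix: str = "txt", theme: Optional[str] = None) -> None:
    """Write the HTML page to htmlfile, then rasterize it to imgfile."""
    import imgkit  # lazy import: requires the wkhtmltoimage binary on the system

    with open(htmlfile, "w", encoding="utf-8") as f:
        f.write(render_html(content, file_suffix, theme))
    # imgkit forwards this as wkhtmltoimage's --encoding option (relevant for Chinese text)
    imgkit.from_file(htmlfile, imgfile, options={"encoding": "UTF-8"})
```

Keeping `render_html` free of imgkit lets the HTML stage be inspected on its own, which helps when a theme or lexer choice does not look as expected in the final image.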
Demo
1. Install wkhtmltopdf from https://wkhtmltopdf.org/ (imgkit uses the wkhtmltoimage tool it ships with).
2. Install the Python dependencies: pip install pygments imgkit pillow

from PIL import Image
from anygenocrdata import AnyGenOCRData
model = AnyGenOCRData()
## Generate Chinese text
text1 = """
欢迎来到 gpt-oss 系列,OpenAI 的开放权重模型 旨在提供强大的推理能力、代理任务和多样的开发者使用场景。
我们发布了这两种开放模型:
gpt-oss-120b — 适用于生产环境、通用用途和高推理需求的场景,可以放入单个 80GB GPU(如 NVIDIA H100 或 AMD MI300X)中(117B 参数,5.1B 活动参数)
gpt-oss-20b — 适用于低延迟和本地或特定用途的场景(21B 参数,3.6B 活动参数)
这两种模型都经过了我们的 和谐响应格式 训练,仅应与和谐格式一起使用,否则将无法正常工作。
"""
model.invoke(
    content=text1,
    htmlfile='./assets/1.html',
    imgfile='./assets/1.png',
    file_suffix='txt',
)
Image.open('./assets/1.png')  # displays inline as the last line of a Jupyter cell; use .show() in a script
## Generate English text
text2 = """
Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.
We’re releasing two flavors of these open models:
gpt-oss-120b — for production, general purpose, high reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5.1B active parameters)
gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters)
Both models were trained on our harmony response format and should only be used with the harmony format as it will not work correctly otherwise.
"""
model.invoke(
    content=text2,
    htmlfile='./assets/2.html',
    imgfile='./assets/2.png',
    theme=None,
    file_suffix='txt',
)
Image.open('./assets/2.png')  # displays inline as the last line of a Jupyter cell; use .show() in a script
## Generate code (SQL)
text3 = """
CREATE TABLE Beds (State VARCHAR(50), Beds INT);
INSERT INTO Beds (State, Beds) VALUES ('California', 100000), ('Texas', 85000), ('New York', 70000);
"""
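model.invoke(
    content=text3,
    htmlfile='./assets/3.html',
    imgfile='./assets/3.png',
    theme=None,
    file_suffix='sql',  # selects SQL syntax highlighting
)
Image.open('./assets/3.png')  # displays inline as the last line of a Jupyter cell; use .show() in a script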
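To build a larger dataset, the same call can be repeated over many texts while recording which text each image shows, since OCR training data is usually consumed as image/label pairs. The sketch below assumes only the invoke keywords used in the demo (content, htmlfile, imgfile, file_suffix); `generate_dataset` and the `labels.tsv` layout are hypothetical and not part of anygenocrdata.

```python
# Hypothetical batch driver (not part of anygenocrdata): calls an
# invoke-style function once per sample and writes a labels.tsv that
# pairs every image path with the text it was rendered from.
import csv
import os
from typing import Callable, Iterable, List, Tuple


def generate_dataset(
    invoke: Callable[..., object],
    samples: Iterable[Tuple[str, str]],
    out_dir: str = "./assets",
) -> List[Tuple[str, str]]:
    """samples: (content, file_suffix) pairs. Returns (imgfile, content) rows."""
    os.makedirs(out_dir, exist_ok=True)  # make sure the output folder exists
    rows: List[Tuple[str, str]] = []
    for i, (content, suffix) in enumerate(samples):
        htmlfile = os.path.join(out_dir, f"{i}.html")
        imgfile = os.path.join(out_dir, f"{i}.png")
        invoke(content=content, htmlfile=htmlfile, imgfile=imgfile, file_suffix=suffix)
        rows.append((imgfile, content))
    # Multi-line texts are quoted by the csv module, so csv.reader restores them intact
    with open(os.path.join(out_dir, "labels.tsv"), "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerows(rows)
    return rows
```

With the demo's model, a call might look like `generate_dataset(model.invoke, [(text1, "txt"), (text2, "txt"), (text3, "sql")])`. Passing the function in, rather than importing the library inside the helper, keeps the sketch usable with any renderer that accepts the same keywords.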

