How It Works
Nick edited this page Dec 18, 2025
Chart2CSV uses computer vision and OCR to extract data from chart images.
Image → Preprocess → Detect Axes → OCR Labels → Extract Points → Output CSV
Preprocessing:
- Convert to grayscale
- Enhance contrast
- Remove noise
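A minimal sketch of these three preprocessing steps in plain Python, assuming the image is already loaded as rows of (r, g, b) tuples. The BT.601 luma weights, the min-max contrast stretch, and the 3x3 median filter are common choices picked for illustration. They are not necessarily what Chart2CSV uses, and the function names are hypothetical. A real implementation would more likely rely on an image library such as OpenCV or Pillow.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
Gray = List[List[float]]


def to_grayscale(rgb: Sequence[Sequence[RGB]]) -> Gray:
    """Convert RGB pixels to luma using the ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def stretch_contrast(gray: Gray) -> Gray:
    """Linearly rescale intensities so the darkest pixel is 0 and the brightest is 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        return [row[:] for row in gray]  # flat image: nothing to stretch
    scale = 255.0 / (hi - lo)
    return [[(v - lo) * scale for v in row] for row in gray]


def median_filter3(gray: Gray) -> Gray:
    """3x3 median filter; border pixels use only the neighbors that exist."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                gray[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            )
            out[y][x] = window[len(window) // 2]
    return out


def preprocess(rgb: Sequence[Sequence[RGB]]) -> Gray:
    """Grayscale, then contrast stretch, then denoise."""
    return median_filter3(stretch_contrast(to_grayscale(rgb)))
```

On a white image with a single black pixel, the median filter removes the isolated pixel, which is the kind of speckle noise this step targets.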
Plot area detection:
- Find the chart boundaries
- Crop to the plot region itself
- Ignore titles, legends, and margins
Axis detection:
- Find the X and Y axis lines using line detection
- Record the axis positions in pixels
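The boundary and axis steps can be approximated without a full line detector. The sketch below swaps in a simpler technique than the line detection described above: it thresholds the grayscale image, takes the row and the column with the longest unbroken run of dark pixels as the X and Y axes, and crops to the region above the X axis and to the right of the Y axis. The dark threshold of 128 and all function names are assumptions for illustration. This sketch does not trim a title above the plot or a legend inside it.

```python
from typing import List, Tuple

Gray = List[List[float]]


def longest_run(values: List[bool]) -> int:
    """Length of the longest run of consecutive True values."""
    best = cur = 0
    for v in values:
        cur = cur + 1 if v else 0
        best = max(best, cur)
    return best


def find_axes(gray: Gray, dark: float = 128.0) -> Tuple[int, int]:
    """Return (x_axis_row, y_axis_col): the row and the column holding the
    longest unbroken run of dark pixels. Ties go to the first candidate."""
    mask = [[v < dark for v in row] for row in gray]
    x_axis_row = max(range(len(mask)), key=lambda y: longest_run(mask[y]))
    columns = [[row[x] for row in mask] for x in range(len(mask[0]))]
    y_axis_col = max(range(len(columns)), key=lambda x: longest_run(columns[x]))
    return x_axis_row, y_axis_col


def crop_plot_region(gray: Gray, x_axis_row: int, y_axis_col: int) -> Gray:
    """Keep only pixels above the X axis and to the right of the Y axis
    (image row indices grow downward)."""
    return [row[y_axis_col + 1:] for row in gray[:x_axis_row]]
```

Cropping this way drops the tick labels below the X axis and to the left of the Y axis, so their pixel positions should be recorded before the crop if OCR runs afterwards.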
OCR of tick labels (two backends are available):
Tesseract (offline):
- Free and runs locally
- Works best on clear, high-resolution images
Mistral Vision (cloud):
- Uses an AI vision model to read the text
- More accurate, especially with difficult fonts
- Requires an API key
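Whichever backend reads the labels, the result is text, and it has to become a number before it can be used for calibration. A minimal sketch, assuming each OCR hit arrives as a (pixel position, text) pair; `parse_tick_label` and `numeric_ticks` are hypothetical helpers, not part of Chart2CSV's documented interface. Stripping commas assumes English-style thousands separators, so a decimal comma such as "0,5" would be misread.

```python
from typing import List, Optional, Tuple


def parse_tick_label(text: str) -> Optional[float]:
    """Turn an OCR'd tick label into a float, or None if it is not numeric."""
    cleaned = (
        text.strip()
        .replace("\u2212", "-")  # typographic minus sign often used in plots
        .replace(",", "")        # thousands separators such as "1,000"
        .replace(" ", "")        # OCR sometimes splits digits with spaces
    )
    try:
        return float(cleaned)    # also accepts scientific notation like "1e3"
    except ValueError:
        return None              # misread label or a non-numeric category


def numeric_ticks(ocr_ticks: List[Tuple[float, str]]) -> List[Tuple[float, float]]:
    """Keep only the (pixel, value) pairs whose text parsed as a number."""
    pairs = []
    for pixel, text in ocr_ticks:
        value = parse_tick_label(text)
        if value is not None:
            pairs.append((pixel, value))
    return pairs
```

Dropping unparsable labels rather than guessing keeps a single OCR misread from skewing the axis mapping.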
Calibration:
- Map pixel positions to data values
- Build the mapping from the detected tick labels
- Handle both linear and logarithmic scales
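One way to build the pixel-to-value mapping is a least-squares line fit through the (pixel, value) pairs of the detected ticks; for a logarithmic axis the fit is done on log10 of the values instead. This is a sketch under that assumption, since the exact fitting method Chart2CSV uses isn't described here, and `fit_axis` is a hypothetical name.

```python
import math
from typing import Callable, List, Tuple


def fit_axis(ticks: List[Tuple[float, float]], log_scale: bool = False) -> Callable[[float], float]:
    """Least-squares fit of value = slope * pixel + intercept.

    For log axes the fit is on log10(value), so all tick values must be > 0.
    """
    if len(ticks) < 2:
        raise ValueError("need at least two tick labels to calibrate an axis")
    xs = [p for p, _ in ticks]
    ys = [math.log10(v) if log_scale else v for _, v in ticks]
    n = len(ticks)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("tick labels must be at distinct pixel positions")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def to_value(pixel: float) -> float:
        y = slope * pixel + intercept
        return 10 ** y if log_scale else y  # undo the log for log-scale axes

    return to_value
```

Because image rows grow downward, a Y-axis fit simply comes out with a negative slope, so no special case is needed. Using every tick rather than just two also averages out small OCR position errors.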
Point extraction depends on the chart type:
Scatter plots: Detect colored dots using blob detection
Line charts: Trace the line using skeletonization
Bar charts: Detect rectangles and measure heights
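For scatter plots, blob detection can be sketched as connected-component labelling on a mask of marker-colored pixels: each group of touching pixels is one dot, and its centroid is the dot's position in pixels. This sketch assumes 4-connectivity and a simple per-channel color tolerance; the names are hypothetical, and line tracing and bar measurement are not covered.

```python
from collections import deque
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def color_mask(rgb: Sequence[Sequence[RGB]], target: RGB, tol: int = 40) -> List[List[bool]]:
    """True where every channel is within `tol` of the target marker color."""
    return [[all(abs(c - t) <= tol for c, t in zip(px, target)) for px in row] for row in rgb]


def blob_centroids(mask: List[List[bool]], min_size: int = 1) -> List[Tuple[float, float]]:
    """Return (x, y) pixel centroids of 4-connected True regions, in scan order."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill collects every pixel of this blob.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_size:  # skip specks smaller than a marker
                cx = sum(x for _, x in pixels) / len(pixels)
                cy = sum(y for y, _ in pixels) / len(pixels)
                centroids.append((cx, cy))
    return centroids
```

The centroids are still in pixel coordinates; the calibration mapping is what turns them into data values.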
Output:
- CSV file with the extracted x,y coordinates
- JSON metadata (confidence, warnings, parameters)
- Visual overlay showing the detected points
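Writing the CSV and JSON outputs needs only the standard library. A minimal sketch: the `x,y` columns follow the description above, but the JSON keys (including `n_points`) and the function name are assumptions about the metadata layout, and the overlay image is not covered. The function returns strings so the caller decides where to write them.

```python
import csv
import io
import json
from typing import Dict, List, Tuple


def write_outputs(
    points: List[Tuple[float, float]],
    confidence: float,
    warnings: List[str],
    params: Dict[str, object],
) -> Tuple[str, str]:
    """Return (csv_text, json_text) for a list of extracted (x, y) points."""
    csv_buf = io.StringIO()
    writer = csv.writer(csv_buf, lineterminator="\n")
    writer.writerow(["x", "y"])  # header row
    writer.writerows(points)
    meta = {
        "confidence": confidence,
        "warnings": warnings,
        "parameters": params,
        "n_points": len(points),
    }
    return csv_buf.getvalue(), json.dumps(meta, indent=2)
```

Using the `csv` module rather than string formatting keeps quoting correct if a column ever holds text, such as bar-chart category names.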
Chart2CSV reports a confidence score (0.0 to 1.0):
| Score | Confidence | Recommended action |
|---|---|---|
| ≥ 0.7 | High | Trust the results |
| 0.4 to < 0.7 | Medium | Check the overlay |
| < 0.4 | Low | Use manual calibration |
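The table maps directly onto a small helper, sketched here with a hypothetical name. A score of exactly 0.4 counts as medium and exactly 0.7 as high, matching the "≥ 0.7" and "< 0.4" rows.

```python
def recommended_action(confidence: float) -> str:
    """Map a Chart2CSV-style confidence score (0.0 to 1.0) to the suggested next step."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= 0.7:
        return "trust the results"
    if confidence >= 0.4:
        return "check the overlay"
    return "use manual calibration"
```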
OCR results are cached to speed up repeated extractions:
- Cache location: `~/.cache/chart2csv/ocr/`
- Separate caches for Tesseract and Mistral
- Use `--no-cache` to bypass the cache
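A content-addressed cache is one way to get this behavior: hash the image bytes, keep one directory per backend, and reuse a stored result on a hit. The sketch below assumes that layout (`<cache root>/<backend>/<sha256>.json`) and a caller-supplied `run_ocr` function; neither is documented Chart2CSV behavior, and `cached_ocr` is a hypothetical name. Here `use_cache=False` plays the role of `--no-cache` and skips both reading and writing the cache.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable

# Assumed default root, taken from the cache location above.
DEFAULT_CACHE_ROOT = Path.home() / ".cache" / "chart2csv" / "ocr"


def cached_ocr(
    image_bytes: bytes,
    backend: str,
    run_ocr: Callable[[bytes], dict],
    cache_root: Path = DEFAULT_CACHE_ROOT,
    use_cache: bool = True,
) -> dict:
    """Return OCR results for an image, reusing a per-backend on-disk cache."""
    if not use_cache:
        return run_ocr(image_bytes)  # bypass: no lookup, no write
    key = hashlib.sha256(image_bytes).hexdigest()  # same image -> same key
    path = cache_root / backend / f"{key}.json"     # one subdirectory per backend
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = run_ocr(image_bytes)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Keying on the image contents rather than the file name means a renamed copy of the same chart still hits the cache, while separate backend directories keep Tesseract and Mistral results from overwriting each other.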