Overview
Pipeline for extracting entities from daily content and sourcing visual assets (icons/logos). Seeking collaboration to improve coverage and methodology.
Related PR: #26
Current Pipeline
Daily Facts → Entity Extraction (LLM) → Inventory → Asset Matching → Coverage Report
↓
CoinGecko (tokens)
Manual curation (others)
Scripts
| Script | Purpose |
|---|---|
| scripts/etl/extract-entities.py | Extract entities via LLM |
| scripts/posters/fetch-icons.py | Fetch token icons from CoinGecko |
| scripts/posters/generate-asset-checklist.py | Generate coverage report |
Current Coverage
| Category | Coverage |
|---|---|
| Tokens | 20% (19/96) |
| Platforms | 17% (33/189) |
| Tech | 11% (18/157) |
| Projects | 14% (34/244) |
| Plugins | 30% (53/175) |
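
For reference, the percentages above follow directly from the matched/total counts. The snippet below is a minimal sketch of that arithmetic, not the actual logic in generate-asset-checklist.py; the `coverage` helper and the `counts` dictionary are illustrative names.

```python
def coverage(matched: int, total: int) -> str:
    """Format coverage as 'NN% (matched/total)', rounded to the nearest percent."""
    pct = round(100 * matched / total) if total else 0
    return f"{pct}% ({matched}/{total})"


# Counts copied from the coverage table; the dict layout is an assumption.
counts = {
    "Tokens": (19, 96),
    "Platforms": (33, 189),
    "Tech": (18, 157),
    "Projects": (34, 244),
    "Plugins": (53, 175),
}

report = {category: coverage(m, t) for category, (m, t) in counts.items()}

for category, line in report.items():
    print(f"{category}: {line}")
```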
Strengths
- Automated extraction - LLM identifies entities from unstructured content
- Normalization - --normalize-only dedupes without re-extraction (saves API calls)
- CoinGecko integration - Reliable token icons with rate limiting
- Fuzzy matching - Containment matching reduces false negatives
- Pre-scan efficiency - Checks existing files before making API calls
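
To make the normalization and containment-matching points concrete, here is a rough sketch of how they might work. It is not the code in the scripts: `normalize`, `dedupe`, `find_icon`, and the `min_len` guard are all hypothetical names and choices made for illustration.

```python
import re


def normalize(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def dedupe(names):
    """Keep the first spelling seen for each normalized name."""
    seen = {}
    for name in names:
        key = normalize(name)
        if key and key not in seen:
            seen[key] = name
    return list(seen.values())


def find_icon(entity, icon_stems, min_len=3):
    """Return an icon stem matching exactly or by containment, else None.

    min_len (an assumed guard) stops very short names such as "AI"
    from matching inside unrelated stems such as "chainlink".
    """
    key = normalize(entity)
    if not key:
        return None
    stems = {normalize(stem): stem for stem in icon_stems}
    if key in stems:
        return stems[key]
    for norm, stem in stems.items():
        shorter, longer = sorted((key, norm), key=len)
        if len(shorter) >= min_len and shorter in longer:
            return stem
    return None


print(dedupe(["Uniswap", "uniswap", "Uni-Swap", "Solana"]))
print(find_icon("Uniswap V3", ["uniswap", "solana"]))
```

Containment trades false negatives for some risk of false positives, so a minimum length or an explicit exclusion list is probably worth keeping alongside it.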
Weaknesses / Open Questions
- Low platform coverage - No reliable automated source for platform icons
- Manual curation - Plugins/projects need manual sourcing
- Entity noise - Extraction sometimes includes generic terms
- No OSINT automation - Finding official sources is still manual research
- No validation - Can't verify icon authenticity/currency
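
One possible way to reduce entity noise is a small stoplist applied after extraction. The sketch below is only an illustration of that idea: `GENERIC_TERMS` and `filter_entities` are hypothetical, and the term list is a placeholder that would need to be tuned against the real inventory.

```python
# Hypothetical stoplist; the real set of generic terms would come from
# reviewing the extracted inventory.
GENERIC_TERMS = {"api", "blockchain", "crypto", "platform", "plugin", "project", "token"}


def is_generic(name: str) -> bool:
    """True for empty or one-character names and for names on the stoplist."""
    cleaned = name.strip().lower()
    return len(cleaned) < 2 or cleaned in GENERIC_TERMS


def filter_entities(names):
    """Drop generic terms while preserving the original order."""
    return [name for name in names if not is_generic(name)]


print(filter_entities(["Solana", "token", "Platform ", "Discord", ""]))
```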
Ideas for Improvement
How to Contribute
- Improve coverage - Add CoinGecko ID mappings for missing tokens in fetch-icons.py
- Source research - Find reliable APIs/methods for platform/tech icons
- Pipeline feedback - Suggest improvements to extraction/matching logic
- Icon contributions - Submit PRs with properly sourced icons
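
For contributors adding token mappings, the sketch below shows what a symbol-to-CoinGecko-ID mapping could look like. The actual structure inside fetch-icons.py may differ: `COINGECKO_IDS`, `coin_url`, and `pick_image_url` are hypothetical names, and the response shape (an `image` object with `thumb`, `small`, and `large` URLs) is assumed from CoinGecko's public `/coins/{id}` endpoint and should be checked against the current API documentation.

```python
# Hypothetical mapping from token symbol to CoinGecko coin ID.
COINGECKO_IDS = {
    "BTC": "bitcoin",
    "ETH": "ethereum",
    "SOL": "solana",
}

API_BASE = "https://api.coingecko.com/api/v3"


def coin_url(symbol: str):
    """Build the coin-detail URL for a symbol, or return None if unmapped."""
    coin_id = COINGECKO_IDS.get(symbol.upper())
    return f"{API_BASE}/coins/{coin_id}" if coin_id else None


def pick_image_url(coin_json: dict, size: str = "large"):
    """Pull an icon URL out of a parsed coin response (assumed shape)."""
    return coin_json.get("image", {}).get(size)


print(coin_url("sol"))
print(coin_url("UNKNOWN"))
```

Unmapped symbols return None rather than raising, so a caller could collect them into a list of tokens that still need an ID.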
Files
scripts/posters/assets/entity-inventory.json - Current entity list (1143 entities)
scripts/posters/assets/asset-checklist.md - Coverage report
scripts/posters/assets/icons/ - Downloaded icons