Merged
1 change: 1 addition & 0 deletions CHANGES.md
@@ -1,6 +1,7 @@
# About CrateDB changelog

## Unreleased
- Outline: Shrank llms-txt output to <200_000 input tokens

## v0.0.7 - 2025-07-22
- Prompt: Added `instructions-general.md` file when generating bundle
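The changelog entry above names a budget of fewer than 200_000 input tokens. As a rough way to check generated text against such a budget, here is a minimal sketch. It assumes about four characters per token, which is only a common rule of thumb and not how the project measured its output; `estimate_tokens` and `fits_budget` are hypothetical helper names. A real tokenizer for the target model would give different numbers.

```python
import math

# Budget named in the changelog entry.
TOKEN_BUDGET = 200_000


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Roughly estimate a token count (assumption: ~4 characters per token)."""
    return math.ceil(len(text) / chars_per_token)


def fits_budget(text: str, budget: int = TOKEN_BUDGET) -> bool:
    """Return True when the estimated token count stays strictly below the budget."""
    return estimate_tokens(text) < budget
```

Calling `fits_budget(Path("llms-full.txt").read_text())` on a generated file would give a quick, approximate signal before handing the file to a model.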
7 changes: 5 additions & 2 deletions src/cratedb_about/bundle/llmstxt.py
@@ -44,8 +44,11 @@ def run(self):
# listing all the pages in the documentation.
# - The `llms-full.txt` contains the entire documentation, expanded from the `llms.txt`
# file. Note this may exceed the context window of your LLM.
-Path(self.outdir / "llms.txt").write_text(self.outline.to_markdown())
-Path(self.outdir / "llms-full.txt").write_text(self.outline.to_llms_txt(optional=True))
+llms_txt = Path(self.outdir / "llms.txt")
+llms_txt_full = Path(self.outdir / "llms-full.txt")
+
+llms_txt.write_text(self.outline.to_markdown())
+llms_txt_full.write_text(self.outline.to_llms_txt(optional=False))

return self
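
The hunk above binds the two output paths to names before writing to them. A minimal sketch of that pattern, runnable in isolation, follows. `write_bundle` and its two string arguments are hypothetical stand-ins for the `self.outline.to_markdown()` and `self.outline.to_llms_txt(optional=False)` calls in the diff, and the sketch assumes the output directory is a `pathlib.Path`.

```python
import tempfile
from pathlib import Path


def write_bundle(outdir: Path, outline_markdown: str, full_text: str) -> dict:
    """Write both bundle files and return their sizes in bytes, keyed by file name."""
    outdir.mkdir(parents=True, exist_ok=True)

    # Name the targets once, so later steps (size checks, logging) can reuse them.
    llms_txt = outdir / "llms.txt"
    llms_txt_full = outdir / "llms-full.txt"

    llms_txt.write_text(outline_markdown, encoding="utf-8")
    llms_txt_full.write_text(full_text, encoding="utf-8")

    return {path.name: path.stat().st_size for path in (llms_txt, llms_txt_full)}


# Usage: write into a throwaway directory and report the file sizes.
with tempfile.TemporaryDirectory() as tmp:
    sizes = write_bundle(Path(tmp) / "bundle", "# Outline", "Full documentation text")
    print(sizes)
```

Keeping the paths in variables makes it straightforward to inspect the written files afterwards, for example to compare their size against a token budget.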

34 changes: 34 additions & 0 deletions src/cratedb_about/outline/cratedb-outline.yaml
@@ -293,6 +293,8 @@ data:
They also influence the behaviour when the records are queried.
parents: [ sql-syntax ]
tags: [ sql ]
# FIXME: This needs about 40_000 input tokens. Maybe a stripped-down variant could help?
markdown_enabled: false

# SQL: Functions
- title: "CrateDB SQL reference: Scalar functions"
@@ -388,18 +390,21 @@ data:
The `ctk cluster {start,info,stop}` subcommands provide higher level CLI
entrypoints to start/deploy/resume a database cluster, inquire information
about it, and stop/suspend it again.
markdown_enabled: false
- title: "Cluster API: Python"
link: https://cratedb-toolkit.readthedocs.io/_sources/cluster/python.md.txt
description: |
The `cratedb_toolkit.ManagedCluster` class provides the higher level API/SDK
entrypoints to start/deploy/resume a database cluster, inquire information
about it, and stop/suspend it again.
markdown_enabled: false
- title: "Cluster API: Tutorial"
link: https://cratedb-toolkit.readthedocs.io/_sources/cluster/tutorial.md.txt
description: |
This tutorial outlines end-to-end examples connecting to the CrateDB Cloud
API and the CrateDB database cluster. It includes examples about both the
CrateDB Cluster CLI and the CrateDB Cluster Python API.
markdown_enabled: false

# Drivers and clients
- title: "CrateDB drivers and clients"
@@ -410,6 +415,7 @@ data:
source: docs
type: index
id: drivers
markdown_enabled: false
- title: "CrateDB Python Client"
link: https://cratedb.com/docs/python/en/latest/_sources/index.rst.txt
description: |
@@ -419,6 +425,7 @@ data:
connecting to CrateDB from the Python ecosystem. It is verified to work with CPython, but it has also
been tested successfully with PyPy.
tags: [ driver ]
markdown_enabled: false
- title: "CrateDB SQLAlchemy dialect"
link: https://cratedb.com/docs/sqlalchemy-cratedb/_sources/index.rst.txt
description: |
@@ -429,12 +436,14 @@ data:
CrateDB from the Python ecosystem. It is verified to work with CPython, but it has also been tested
successfully with PyPy.
tags: [ driver ]
markdown_enabled: false
- title: "CrateDB Driver for MicroPython"
link: https://raw.githubusercontent.com/crate/micropython-cratedb/refs/heads/main/README.md
description: |
micropython-cratedb is a CrateDB driver for the MicroPython language.
It connects to CrateDB using the HTTP Endpoint.
tags: [ driver ]
markdown_enabled: false
- title: "Python psycopg3 driver"
link: https://www.psycopg.org/psycopg3/docs/_sources/basic/usage.rst.txt
description: |
@@ -447,6 +456,7 @@ data:
The basic Psycopg usage is common to all the database adapters implementing the DB-API protocol.
Other database adapters, such as the builtin sqlite3 or psycopg2, have roughly the same pattern of interaction.
tags: [ driver ]
markdown_enabled: false
- title: "node-postgres driver"
link: https://raw.githubusercontent.com/brianc/node-postgres/refs/heads/master/docs/pages/index.mdx
description: |
@@ -455,6 +465,7 @@ data:
It has support for callbacks, promises, async/await, connection pooling, prepared statements,
cursors, streaming results, C/C++ bindings, rich type parsing, and more.
tags: [ driver ]
markdown_enabled: false
- title: "PostgreSQL JDBC Driver"
link: https://raw.githubusercontent.com/pgjdbc/pgjdbc/refs/heads/master/docs/content/documentation/_index.md
description: |
@@ -463,6 +474,7 @@ data:
Pure Java (Type 4), and communicates in the PostgreSQL native network protocol. Because of this,
the driver is platform independent; once compiled, the driver can be used on any system.
tags: [ driver ]
markdown_enabled: false
- title: "PostgreSQL driver and toolkit for Go"
link: https://raw.githubusercontent.com/jackc/pgx/refs/heads/master/README.md
description: |
@@ -471,23 +483,27 @@ data:
the wire protocol and type mapping between PostgreSQL and Go. These underlying packages can be used to
implement alternative drivers, proxies, load balancers, logical replication clients, etc.
tags: [ driver ]
markdown_enabled: false
- title: "Npgsql - .NET Access to PostgreSQL"
link: https://raw.githubusercontent.com/npgsql/doc/refs/heads/main/conceptual/Npgsql/index.md
description: |
Npgsql is an open source ADO.NET Data Provider for PostgreSQL, it allows programs written in C#,
Visual Basic, F# to access the PostgreSQL database server. It is implemented in 100% C# code,
is free and is open source.
tags: [ driver ]
markdown_enabled: false
- title: "psqlODBC - PostgreSQL ODBC driver"
link: https://raw.githubusercontent.com/postgresql-interfaces/psqlodbc/refs/heads/main/docs/config.html
description: A library to talk to the PostgreSQL DBMS using ODBC.
tags: [ driver ]
markdown_enabled: false
- title: "PHP PostgreSQL PDO Driver (PDO_PGSQL)"
link: https://raw.githubusercontent.com/php/doc-en/refs/heads/master/reference/pdo_pgsql/reference.xml
description: |
PDO_PGSQL is a driver that implements the PHP Data Objects (PDO) interface
to enable access from PHP to PostgreSQL databases.
tags: [ driver ]
markdown_enabled: false

- name: Examples
items:
@@ -511,23 +527,29 @@ data:
- title: "CrateDB GTFS / GTFS-RT Transit Data Demo"
link: https://raw.githubusercontent.com/crate/devrel-gtfs-transit/refs/heads/main/README.md
description: Capture GTFS and GTFS-RT data for storage and analysis with CrateDB.
markdown_enabled: false
- title: "CrateDB Offshore Wind Farms Demo Application"
link: https://raw.githubusercontent.com/crate/devrel-offshore-wind-farms-demo/refs/heads/main/README.md
description: A CrateDB demo application using data from the UK's offshore wind farms.
markdown_enabled: false
- title: "CrateDB RAG / Hybrid Search PDF Chatbot"
link: https://raw.githubusercontent.com/crate/devrel-pdf-rag-chatbot/refs/heads/main/README.md
description: A chatbot powered by CrateDB using RAG techniques and data from PDF files.
markdown_enabled: false
- title: "CrateDB Geospatial Data Demo"
link: https://raw.githubusercontent.com/crate/devrel-shipping-forecast-geo-demo/refs/heads/main/README.md
description: Spatial data demo application using CrateDB and the Express framework.
markdown_enabled: false
- title: "Plane Spotting with Software Defined Radio, CrateDB and Node.js"
link: https://raw.githubusercontent.com/crate/devrel-plane-spotting-with-cratedb/refs/heads/main/README.md
description: Code for the Plane Spotting with Software Defined Radio, CrateDB and Node.js talk.
markdown_enabled: false
- title: "MongoDB/CrateDB/Grafana CDC Demonstration"
link: https://raw.githubusercontent.com/crate/devrel-mongo-cdc-demo/refs/heads/main/README.md
description: |
A small Python project that demonstrates how a CrateDB database can be populated and kept
in sync with a collection in MongoDB using Change Data Capture (CDC).
markdown_enabled: false

- name: Optional
items:
@@ -541,36 +563,44 @@ data:
type: index
id: cloud
parents: [ cloud ]
markdown_enabled: false
- title: "CrateDB Cloud: Services"
link: https://cratedb.com/docs/cloud/en/latest/_sources/reference/services.md.txt
description: Services specifications and variants of CrateDB Cloud.
parents: [ cloud ]
markdown_enabled: false
- title: "CrateDB Cloud: Billing"
link: https://cratedb.com/docs/cloud/en/latest/_sources/organization/billing.md.txt
description: How billing works in CrateDB Cloud.
parents: [ cloud ]
markdown_enabled: false
- title: "CrateDB Cloud: API"
link: https://cratedb.com/docs/cloud/en/latest/_sources/organization/api.md.txt
description: CrateDB Cloud provides an HTTP API for programmatic cluster and resource management.
parents: [ cloud ]
markdown_enabled: false
- title: "CrateDB Cloud: Import data"
link: https://cratedb.com/docs/cloud/en/latest/_sources/cluster/import.md.txt
description: How to conveniently import data into CrateDB Cloud.
parents: [ cloud ]
markdown_enabled: false
- title: "CrateDB Cloud: Export data"
link: https://cratedb.com/docs/cloud/en/latest/_sources/cluster/export.md.txt
description: How to conveniently export data from CrateDB Cloud.
parents: [ cloud ]
markdown_enabled: false
- title: "CrateDB Cloud: Automatic backups"
link: https://cratedb.com/docs/cloud/en/latest/_sources/cluster/backups.md.txt
description: How automatic backups work in CrateDB Cloud.
parents: [ cloud ]
markdown_enabled: false
- title: "CrateDB Cloud: MongoDB CDC integration"
link: https://cratedb.com/docs/cloud/en/latest/_sources/cluster/integrations/mongo-cdc.md.txt
description: |
CrateDB Cloud enables continuous data ingestion from MongoDB using Change Data Capture (CDC),
providing seamless, real-time synchronization of your data.
parents: [ cloud ]
markdown_enabled: false

# Features
- title: "CrateDB features"
@@ -691,13 +721,15 @@ data:
- How to provide content from Jupyter Notebooks?
- What other content to feed about the timeseries topic?
source: examples
markdown_enabled: false
- title: "Timeseries QA Assistant with CrateDB, LLMs, and Machine Manuals"
link: https://raw.githubusercontent.com/crate/cratedb-examples/refs/heads/main/topic/chatbot/table-augmented-generation/app/README.md
description: |
A full interactive pipeline for simulating telemetry data from industrial motors,
storing that data in CrateDB, and enabling natural-language querying powered by
OpenAI — including RAG-style guidance from machine manuals.
source: examples
markdown_enabled: false

# Generative AI
- title: "LangChain and CrateDB"
@@ -710,7 +742,9 @@ data:
link: https://raw.githubusercontent.com/crate/about/refs/heads/main/src/content/blog/shared-nothing-architecture-multi-model-databases-scalable-real-time-analytics.md
description: Leveraging Shared Nothing Architecture and Multi-Model Databases for Scalable Real-Time Analytics on Large Data.
source: blog
markdown_enabled: false
- title: "Use case: Digital Twins"
link: https://raw.githubusercontent.com/crate/about/refs/heads/main/src/content/blog/digital-twins.md
description: Digital twins are virtual representations of physical objects, processes, or systems in the digital realm. The abundance of data to be processed in digital twin setups is no problem for CrateDB.
source: blog
markdown_enabled: false
3 changes: 3 additions & 0 deletions src/cratedb_about/outline/model.py
@@ -25,6 +25,7 @@ class OutlineItem(DictTools):
title: str
link: str
description: str
markdown_enabled: bool = True

def __attrs_post_init__(self):
# FIXME: Currently, `llms_txt` does not accept newlines in description fields.
@@ -76,6 +77,8 @@ def to_markdown(self) -> str:
for section in self.data.sections:
buffer.write(f"## {section.name}\n\n")
for item in section.items:
if not item.markdown_enabled:
continue
buffer.write(f"- [{item.title}]({item.link}): {item.description}\n")
buffer.write("\n")
return buffer.getvalue().strip()
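
The `to_markdown` hunk above skips every item whose `markdown_enabled` flag is false. The following self-contained sketch reproduces that filter so it can be run on its own. It uses standard-library dataclasses in place of the project's own model classes, and `Item`, `Section`, and the `example.org` links are hypothetical stand-ins, not the project's API.

```python
import io
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    title: str
    link: str
    description: str
    # Mirrors the new field in the diff: items are included unless opted out.
    markdown_enabled: bool = True


@dataclass
class Section:
    name: str
    items: List[Item] = field(default_factory=list)


def to_markdown(sections: List[Section]) -> str:
    """Render sections as Markdown, leaving out items with markdown_enabled=False."""
    buffer = io.StringIO()
    for section in sections:
        buffer.write(f"## {section.name}\n\n")
        for item in section.items:
            if not item.markdown_enabled:
                continue
            buffer.write(f"- [{item.title}]({item.link}): {item.description}\n")
        buffer.write("\n")
    return buffer.getvalue().strip()


sections = [
    Section(
        "Docs",
        [
            Item("Included", "https://example.org/a", "Shown in the outline."),
            Item("Excluded", "https://example.org/b", "Left out.", markdown_enabled=False),
        ],
    )
]
rendered = to_markdown(sections)
print(rendered)
```

Because the section heading is written before the item loop, a section whose items are all disabled still produces its `##` heading, both in this sketch and in the hunk it mirrors.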