Merged
14 changes: 7 additions & 7 deletions README.md
@@ -10,15 +10,15 @@ to relevant resources in the spirit of a curated knowledge backbone.

 ## What's inside

-- The [cratedb.md] file includes hints about what CrateDB is and what
-you can do with it.
+- The [cratedb-overview.md] file includes hints about what CrateDB is
+and what you can do with it.

-- The [llm] folder includes a few [llms.txt] files generated from
-[cratedb.md], which can be used to provide better context to LLM
-conversations about CrateDB.
+- The [build/llm] folder includes a few [llms.txt] files generated from
+[cratedb-overview.md]. They can be used to provide better context
+for conversations about CrateDB.


+[build/llm]: ./build/llm
 [CrateDB]: https://cratedb.com/database
-[cratedb.md]: ./cratedb.md
-[llm]: ./llm
+[cratedb-overview.md]: ./src/index/cratedb-overview.md
 [llms.txt]: https://llmstxt.org/
71 changes: 71 additions & 0 deletions src/index/cratedb-overview.md
@@ -0,0 +1,71 @@
# CrateDB

> CrateDB is a distributed and scalable SQL database for storing and
> analyzing massive amounts of data in near real-time, even with
> complex queries. It is based on Lucene, inherits technologies from
> Elasticsearch, and is compatible with PostgreSQL.
>
> CrateDB also provides an HTTP interface, is written in Java, and brings
> together fundamental components to manage big data after the Hadoop and
> Spark batch-processing era, similar to what Teradata, BigQuery, and
> Snowflake do.

Things to remember when working with CrateDB:

- CrateDB is a distributed database, where individual nodes form a database cluster, using a shared-nothing architecture.
- Clients can connect to CrateDB using HTTP or the PostgreSQL wire protocol.
- The default TCP ports of CrateDB are 4200 for the HTTP interface and 5432 for the PostgreSQL interface.
- After connecting to CrateDB, the language of choice is SQL, which is mostly compatible with PostgreSQL's SQL dialect.
- The data storage layer is based on Lucene; the data distribution layer was inspired by Elasticsearch.
- Storage concepts of CrateDB include partitioning and sharding to manage data sets that are too large to fit on a single machine.
- CrateDB Cloud offers a managed option for running CrateDB and provides additional features like automated backups, data ingest / ETL utilities, and scheduling of recurring jobs.
- Get started with CrateDB Cloud at `https://console.cratedb.cloud`.
- CrateDB can also run on your own premises, ideally by using its Docker/OCI image `docker.io/crate`. Nightly images are available as `docker.io/crate/crate:nightly`.
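
The points above about the HTTP interface, sharding, and partitioning can be sketched in a few lines of Python. This is a minimal sketch, not an official client: it assumes a CrateDB node is reachable on `localhost` at the default HTTP port 4200 and uses the `/_sql` endpoint with a JSON body. The table `sensor_readings`, its columns, the shard count, and the helper names `build_sql_request` and `run_sql` are hypothetical and chosen for illustration only.

```python
import json
import urllib.request

# Assumption: a CrateDB node listens on localhost at the default HTTP port.
CRATEDB_HTTP_URL = "http://localhost:4200/_sql"

# Hypothetical table definition: rows are spread over 6 shards, and a
# generated column partitions the table by month.
CREATE_TABLE_STMT = """
CREATE TABLE IF NOT EXISTS sensor_readings (
    ts TIMESTAMP WITH TIME ZONE,
    month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts),
    sensor_id TEXT,
    value DOUBLE PRECISION
)
CLUSTERED INTO 6 SHARDS
PARTITIONED BY (month)
"""


def build_sql_request(stmt, args=None, url=CRATEDB_HTTP_URL):
    """Build (but do not send) an HTTP POST request carrying one SQL statement."""
    payload = {"stmt": stmt.strip()}
    if args is not None:
        # Positional parameters for placeholders in the statement.
        payload["args"] = list(args)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(stmt, args=None, url=CRATEDB_HTTP_URL, timeout=10):
    """Send a statement to CrateDB and return the decoded JSON response."""
    request = build_sql_request(stmt, args, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a running node, `run_sql(CREATE_TABLE_STMT)` would create the table, and `run_sql("SELECT name FROM sys.cluster")` would return a JSON document describing the result. Because CrateDB also speaks the PostgreSQL wire protocol on port 5432, a PostgreSQL driver is an alternative to this HTTP-based approach.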

## Docs

- [CrateDB database](https://cratedb.com/docs/guide/_sources/home/index.md.txt): Benefits of CrateDB at a glance.
- [CrateDB features](https://cratedb.com/docs/guide/_sources/feature/index.md.txt): All features of CrateDB at a glance.
- [CrateDB reference documentation](https://cratedb.com/docs/reference/en/latest/_sources/index.rst.txt): The reference documentation of CrateDB.
- [CrateDB Cloud](https://cratedb.com/docs/cloud/en/latest/_sources/index.md.txt): The full documentation for CrateDB Cloud.
- [Guide: CrateDB sharding](https://cratedb.com/docs/guide/_sources/performance/sharding.rst.txt): A best practice guide about sharding with CrateDB.
- [Guide: CrateDB query optimization](https://cratedb.com/docs/guide/_sources/performance/optimization.rst.txt): Essential principles for optimizing queries in CrateDB while avoiding the most common pitfalls.
- [Guide: Design for scale](https://cratedb.com/docs/guide/_sources/performance/scaling.rst.txt): Critical design considerations to successfully scale CrateDB in large production environments to ensure performance and reliability as workloads grow.
- [Data modeling: Sequences](https://cratedb.com/docs/guide/_sources/performance/inserts/sequences.rst.txt): About autogenerated sequences and PRIMARY KEY values in CrateDB.

## API

- [CrateDB SQL syntax](https://cratedb.com/docs/reference/en/latest/_sources/sql/index.rst.txt): You can use Structured Query Language (SQL) to query your data.
- [CrateDB SQL functions](https://cratedb.com/docs/reference/en/latest/_sources/general/builtins/scalar-functions.rst.txt): The reference documentation about all SQL functions CrateDB provides.
- [CrateDB drivers](https://cratedb.com/docs/crate/clients-tools/en/latest/_sources/connect/index.md.txt): How to connect to a CrateDB cluster using traditional database drivers.
- [CrateDB cluster-wide settings](https://cratedb.com/docs/reference/en/latest/_sources/config/cluster.rst.txt): Cluster-wide settings can be read by querying the `sys.cluster.settings` column. Most cluster settings can be changed at runtime.
- [CrateDB node-specific settings](https://cratedb.com/docs/reference/en/latest/_sources/config/node.rst.txt): Node-specific settings of CrateDB.

## Examples

- [CrateDB SQL gallery](https://github.com/crate/cratedb-toolkit/raw/refs/tags/v0.0.31/cratedb_toolkit/info/library.py): A collection of SQL queries and utilities suitable for diagnostics on CrateDB.
- [CrateDB GTFS / GTFS-RT Transit Data Demo](https://github.com/crate/devrel-gtfs-transit/raw/refs/heads/main/README.md): Capture GTFS and GTFS-RT data for storage and analysis with CrateDB.
- [CrateDB Offshore Wind Farms Demo Application](https://github.com/crate/devrel-offshore-wind-farms-demo/raw/refs/heads/main/README.md): A CrateDB demo application using data from the UK's offshore wind farms.
- [CrateDB RAG / Hybrid Search PDF Chatbot](https://github.com/crate/devrel-pdf-rag-chatbot/raw/refs/heads/main/README.md): A chatbot powered by CrateDB using RAG techniques and data from PDF files.
- [CrateDB Geospatial Data Demo](https://github.com/crate/devrel-shipping-forecast-geo-demo/raw/refs/heads/main/README.md): Spatial data demo application using CrateDB and the Express framework.
- [Plane Spotting with Software Defined Radio, CrateDB and Node.js](https://github.com/crate/devrel-plane-spotting-with-cratedb/raw/refs/heads/main/README.md): Code for the Plane Spotting with Software Defined Radio, CrateDB and Node.js talk.

## Optional

- [Concept: Clustering](https://cratedb.com/docs/reference/en/latest/_sources/concepts/clustering.rst.txt): How the distributed SQL database CrateDB uses a shared-nothing architecture to form high-availability, resilient database clusters with minimal configuration effort.
- [Concept: Distributed joins](https://cratedb.com/docs/reference/en/latest/_sources/concepts/joins.rst.txt): How joins work on large volumes of data that is stored in a distributed fashion.
- [Concept: Storage and consistency](https://cratedb.com/docs/reference/en/latest/_sources/concepts/storage-consistency.rst.txt): How CrateDB stores and distributes state across the cluster and what consistency and durability guarantees are provided.
- [Concept: Resiliency](https://cratedb.com/docs/reference/en/latest/_sources/concepts/resiliency.rst.txt): How CrateDB copes with network, disk, or machine failures.
- [CrateDB Cloud: Services](https://cratedb.com/docs/cloud/en/latest/_sources/reference/services.md.txt): Service specifications and variants of CrateDB Cloud.
- [CrateDB Cloud: Billing](https://cratedb.com/docs/cloud/en/latest/_sources/organization/billing.md.txt): How billing works in CrateDB Cloud.
- [CrateDB Cloud: API](https://cratedb.com/docs/cloud/en/latest/_sources/organization/api.md.txt): CrateDB Cloud provides an HTTP API for programmatic cluster and resource management.
- [CrateDB Cloud: Import data](https://cratedb.com/docs/cloud/en/latest/_sources/cluster/import.md.txt): How to conveniently import data into CrateDB Cloud.
- [CrateDB Cloud: Export data](https://cratedb.com/docs/cloud/en/latest/_sources/cluster/export.md.txt): How to conveniently export data from CrateDB Cloud.
- [CrateDB Cloud: Automatic backups](https://cratedb.com/docs/cloud/en/latest/_sources/cluster/backups.md.txt): How automatic backups work in CrateDB Cloud.
- [CrateDB Cloud: MongoDB CDC integration](https://cratedb.com/docs/cloud/en/latest/_sources/cluster/integrations/mongo-cdc.md.txt): CrateDB Cloud enables continuous data ingestion from MongoDB using Change Data Capture (CDC), providing seamless, real-time synchronization of your data.
- [Feature: User-defined functions](https://cratedb.com/docs/reference/en/latest/_sources/general/user-defined-functions.rst.txt): CrateDB supports user-defined functions.
- [Integration Tutorials I](https://cratedb.com/docs/guide/_sources/integrate/index.md.txt): Integrating third-party software with CrateDB.
- [Integration Tutorials II](https://community.cratedb.com/raw/1015): Overview of CrateDB integration tutorials.
- [Time Series with CrateDB](https://github.com/crate/cratedb-examples/raw/refs/heads/main/topic/timeseries/README.md): Examples, tutorials and runnable code on how to use CrateDB for time-series use cases. Exploratory data analysis, time-series decomposition, anomaly detection, forecasting.
- [Timeseries QA Assistant with CrateDB, LLMs, and Machine Manuals](https://github.com/crate/cratedb-examples/raw/refs/heads/main/topic/chatbot/table-augmented-generation/README.md): A full interactive pipeline for simulating telemetry data from industrial motors, storing that data in CrateDB, and enabling natural-language querying powered by OpenAI, including RAG-style guidance from machine manuals.
- [LangChain and CrateDB](https://github.com/crate/cratedb-examples/raw/refs/heads/main/topic/machine-learning/llm-langchain/README.md): Get started with LangChain and CrateDB.