@cpsievert cpsievert commented Dec 10, 2025

Closes #128

Some of the Python changes here are a follow-up to #119.

@cpsievert cpsievert requested a review from Copilot December 10, 2025 17:22

This comment was marked as resolved.

cpsievert and others added 6 commits December 10, 2025 17:26
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@cpsievert cpsievert marked this pull request as ready for review December 10, 2025 17:50
@cpsievert cpsievert requested a review from gadenbuie December 10, 2025 17:50
Co-authored-by: Garrick Aden-Buie <garrick@adenbuie.com>
QueryChat leverages LLMs' incredible capability to translate natural language into SQL queries. Frontier models are shockingly good at this task, but even the best models still need to know the overall data structure to perform well. For this reason, QueryChat supplies a [system prompt](context.qmd) with the schema of the data (i.e., column names, types, ranges, etc.), but never the raw data itself.

-When the LLM generates a SQL query, querychat executes it against a SQL database (DuckDB[^duckdb] by default) to get results in a **safe**, **reliable**, and **verifiable** manner. In short, this execution is **safe** since only `SELECT` statements are allowed, **reliable** since the database engine handles all calculations, and **verifiable** since the user can always see the SQL query that was run. This makes querychat a trustworthy tool for data exploration, as every action taken by the LLM is transparent and independently reproducible.
+When the LLM generates a SQL query, QueryChat executes it against a SQL database (DuckDB[^duckdb] by default) to get results in a **safe**, **reliable**, and **verifiable** manner. In short, this execution is **safe** since only `SELECT` statements are allowed, **reliable** since the database engine handles all calculations, and **verifiable** since the user can always see the SQL query that was run. This makes QueryChat a trustworthy tool for data exploration, as every action taken by the LLM is transparent and independently reproducible.

Suggested change
-When the LLM generates a SQL query, QueryChat executes it against a SQL database (DuckDB[^duckdb] by default) to get results in a **safe**, **reliable**, and **verifiable** manner. In short, this execution is **safe** since only `SELECT` statements are allowed, **reliable** since the database engine handles all calculations, and **verifiable** since the user can always see the SQL query that was run. This makes QueryChat a trustworthy tool for data exploration, as every action taken by the LLM is transparent and independently reproducible.
+When the LLM generates a query, QueryChat executes it against a SQL database (DuckDB[^duckdb] by default) to get results in a **safe**, **reliable**, and **verifiable** manner.
+In short, this execution is **safe** since only `SELECT` statements are allowed, **reliable** since the database engine handles all calculations, and **verifiable** since the user can always see the SQL query that was run.
+This makes QueryChat a trustworthy tool for data exploration, as every action taken by the LLM is transparent and independently reproducible.

> this execution is safe since only `SELECT` statements are allowed

This bit gives me pause because the restriction is only imposed by prompting, but the sentence sounds like we actually take steps to disallow statements that aren't `SELECT` queries.

@cpsievert cpsievert Dec 10, 2025

It's true that we currently don't do anything to enforce this, but I'm thinking we'll leave this as-is for now, and possibly do more to ensure it later?
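
For readers following this thread, here is a minimal sketch of what enforcing the read-only guarantee at the database layer, rather than by prompting, could look like. It is not querychat's implementation and does not use DuckDB: it assumes Python's standard-library `sqlite3` module as a stand-in engine and relies on its authorizer hook. The helper names (`make_demo_connection`, `run_llm_query`) and the `passengers` table are hypothetical and exist only for illustration.

```python
import sqlite3

# Action codes that a plain read needs; every other action is denied.
_ALLOWED_ACTIONS = {
    sqlite3.SQLITE_SELECT,    # the SELECT statement itself
    sqlite3.SQLITE_READ,      # reading a column of a table
    sqlite3.SQLITE_FUNCTION,  # calling SQL functions such as COUNT()
}


def _select_only(action, arg1, arg2, db_name, source):
    """Authorizer callback: permit reads; deny writes, DDL, PRAGMA, ATTACH, etc."""
    if action in _ALLOWED_ACTIONS:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def make_demo_connection():
    """Hypothetical helper: load a tiny in-memory table, then lock the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE passengers (name TEXT, survived INTEGER)")
    conn.executemany(
        "INSERT INTO passengers VALUES (?, ?)", [("Alice", 1), ("Bob", 0)]
    )
    conn.commit()
    # Install the authorizer only after setup, since it also blocks our own writes.
    conn.set_authorizer(_select_only)
    return conn


def run_llm_query(conn, sql):
    """Run untrusted SQL; raise PermissionError if the engine rejects it."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.DatabaseError as exc:
        raise PermissionError(f"query rejected: {exc}") from exc
```

With this in place, a `SELECT` against `passengers` returns rows as usual, while `DELETE`, `DROP TABLE`, or `PRAGMA` statements raise `PermissionError` before they execute. The check lives in the engine, so it holds even if the model ignores the prompt. An equivalent for DuckDB would need that engine's own mechanism, which is outside the scope of this sketch.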

@gadenbuie gadenbuie left a comment

Good stuff! It's definitely a big step forward for the docs. I've been submitting feedback as I worked through things. I made it through the stuff that shows up in the Python diffs; I'll pick up with the R things in a bit, although I suspect there's some overlap and that some of the Python comments will translate directly to the R docs.

This commit addresses all 30 review comments from PR #162, implementing
comprehensive improvements to both R and Python documentation for consistency,
clarity, and better user experience.

## Capitalization Standardization

- Standardized use of "querychat" (lowercase) when referring to the package/product
  in prose throughout all documentation
- Maintained "QueryChat" (camel case) for Python class names in code examples
- Maintained "QueryChat" (camel case) when referring to class/instances in narrative
- Fixed overcorrections to ensure Python class name remains properly capitalized
- Files affected:
  - R: vignettes/tools.Rmd, vignettes/context.Rmd, README.md
  - Python: index.qmd, context.qmd, build.qmd, tools.qmd, models.qmd,
    greet.qmd, data-sources.qmd, _examples/*.py

## Grammar and Language Fixes

- Fixed "up vote" → "upvote" in both Python and R build documentation
- Removed unnecessary words: "In this case", "(safely)"
- Clarified LLM vs querychat roles: "The LLM generates SQL, querychat executes it"
- Improved sentence structure and flow throughout

## Content Improvements

### Introduction/README Changes (R & Python)
- Changed "For analysts" → "For users" (more inclusive)
- Rewrote developer section in second person for directness
- Made benefits more specific and less generic

### Python index.qmd Enhancements
- Fixed "VSCode" → "VS Code" (official branding)
- Mentioned Positron first, then VS Code
- Clarified that saving to file is optional (can run in console)
- Added QUERYCHAT_CLIENT environment variable example
- Simplified code example by removing explicit client parameter

### Context Documentation Restructuring (R & Python)
- Reorganized intro to be more linear:
  1. What querychat automatically gathers
  2. LLMs don't see actual data
  3. Three ways to customize system prompt
- Moved system prompt definition to footnote (Python) or parenthetical (R)
- Made it clearer that customization is optional enhancement

## Structural Improvements

### Python build.qmd Quarto Enhancements
- Extracted inline app code to separate, runnable files:
  - pkg-py/docs/_examples/titanic-dashboard.py
  - pkg-py/docs/_examples/multiple-datasets.py
- Replaced HTML `<details>`/`<summary>` with Quarto code-fold feature
- Used Quarto include syntax for cleaner documentation
- Apps can now be run and tested independently

### Site Tagline
- Reverted docs/index.html tagline to original "Chat with your data in any language"
- Original is more inviting and covers both R/Python + multilingual LLM support
- Fixed capitalization in description text

## Files Changed

Modified (10):
- docs/index.html
- pkg-py/docs/build.qmd
- pkg-py/docs/context.qmd
- pkg-py/docs/data-sources.qmd
- pkg-py/docs/index.qmd
- pkg-py/docs/tools.qmd
- pkg-r/README.md
- pkg-r/vignettes/build.Rmd
- pkg-r/vignettes/context.Rmd
- pkg-r/vignettes/tools.Rmd

Added (2):
- pkg-py/docs/_examples/multiple-datasets.py
- pkg-py/docs/_examples/titanic-dashboard.py

Statistics: 12 files changed, 184 insertions(+), 179 deletions(-)

All changes maintain consistency between R and Python documentation while
respecting their different documentation systems (R Markdown vs Quarto).

Development

Successfully merging this pull request may close these issues.

(R) Update website

3 participants