diff --git a/README.md b/README.md index 379b4526..46ae17a1 100644 --- a/README.md +++ b/README.md @@ -1,46 +1,7 @@ -# querychat: Chat with your data in any language +# querychat querychat website -querychat is a multilingual package that allows you to chat with your data using natural language queries. It's available for: +QueryChat facilitates safe and reliable natural language exploration of tabular data, powered by SQL and large language models (LLMs). -- [R - Shiny](pkg-r/README.md) -- [Python - Shiny for Python](pkg-py/README.md) +To get started, see the [official website](https://posit-dev.github.io/querychat/). -## Overview - -Imagine typing questions like these directly into your dashboard, and seeing the results in realtime: - -* "Show only penguins that are not species Gentoo and have a bill length greater than 50mm." -* "Show only blue states with an incidence rate greater than 100 per 100,000 people." -* "What is the average mpg of cars with 6 cylinders?" - -querychat is a drop-in component for Shiny that allows users to query a data frame using natural language. The results are available as a reactive data frame, so they can be easily used from Shiny outputs, reactive expressions, downloads, etc. - -| ![Animation of a dashboard being filtered by a chatbot in the sidebar](animation.gif) | -|-| - -[Live demo](https://jcheng.shinyapps.io/sidebot/) - -**This is not as terrible an idea as you might think!** We need to be very careful when bringing LLMs into data analysis, as we all know that they are prone to hallucinations and other classes of errors. querychat is designed to excel in reliability, transparency, and reproducibility by using this one technique: denying it raw access to the data, and forcing it to write SQL queries instead. - -## How it works - -### Powered by LLMs - -querychat's natural language chat experience is powered by LLMs (like GPT-4o, Claude 3.5 Sonnet, etc.) that support function/tool calling capabilities. 
- -### Powered by SQL - -querychat doesn't send the raw data to the LLM, asking it to guess summary statistics. Instead, the LLM generates precise SQL queries to filter the data or directly calculate statistics. This is crucial for ensuring relability, transparency, and reproducibility: - -- **Reliability:** Today's LLMs are excellent at writing SQL, but bad at direct calculation. -- **Transparency:** querychat always displays the SQL to the user, so it can be vetted instead of blindly trusted. -- **Reproducibility:** The SQL query can be easily copied and reused. - -Currently, querychat uses DuckDB for its SQL engine when working with data frames. For database sources, it uses the native SQL dialect of the connected database. - -## Language-specific Documentation - -For detailed information on how to use querychat in your preferred language, see the language-specific READMEs: - -- [R Documentation](pkg-r/README.md) -- [Python Documentation](pkg-py/README.md) \ No newline at end of file +Or, the README for [R](pkg-r/README.md) and [Python](pkg-py/README.md). \ No newline at end of file diff --git a/docs/index.html b/docs/index.html index 62c1e9d4..e6d09f3b 100644 --- a/docs/index.html +++ b/docs/index.html @@ -224,7 +224,7 @@

querychat

Chat with your data in any language

- A drop-in component for Shiny that allows you to chat with your data using natural language queries. + querychat facilitates safe and reliable natural language exploration of tabular data, powered by SQL and large language models (LLMs). Available for both R and Python.

querychat website -Please see [the package documentation site](https://posit-dev.github.io/querychat/py/index.html) for installation, setup, and usage. +

+ +PyPI +MIT License +versions +Python Tests + +

-If you are looking for querychat python examples, -you can find them in the `examples/` directory. + +QueryChat facilitates safe and reliable natural language exploration of tabular data, powered by SQL and large language models (LLMs). For analysts, it offers an intuitive web application where they can quickly ask questions of their data and receive verifiable data-driven answers. For software developers, QueryChat provides a comprehensive Python API to access core functionality -- including chat UI, generated SQL statements, resulting data, and more. This capability enables the seamless integration of natural language querying into bespoke data applications. ## Installation -You can install the package from PyPI using pip: +Install the latest stable release [from PyPI](https://pypi.org/project/querychat/): ```bash pip install querychat ``` -Or you can install querychat directly from GitHub: +## Quick start -```bash -pip install "querychat @ git+https://github.com/posit-dev/querychat" +The main entry point is the [`QueryChat` class](https://posit-dev.github.io/querychat/py/reference/QueryChat.html). It requires a [data source](https://posit-dev.github.io/querychat/py/data-sources.html) (e.g., pandas, polars, etc) and a name for the data. + +```python +from querychat import QueryChat +from querychat.data import titanic + +qc = QueryChat(titanic(), "titanic") +app = qc.app() +# app.run() ``` + +
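By default, querychat uses OpenAI, so an API key must be present in the environment before the app starts (see the project docs for other providers). Here's a small, optional stdlib helper you can put at the top of your script to fail fast with a clear message -- a sketch only; `require_api_key` is not part of querychat's API:

```python
import os

def require_api_key(var: str = "OPENAI_API_KEY") -> str:
    # Fail fast with a readable error instead of a mid-chat failure.
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running the app")
    return key
```

Call it once before constructing `QueryChat`, swapping the variable name if you configure a different model provider.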

+ QueryChat interface showing natural language queries +

+ +## Custom apps + +Build your own custom web apps with natural language querying capabilities, such as [this one](https://github.com/posit-conf-2025/llm/blob/main/_solutions/25_querychat/25_querychat_02-end-app.R) which provides a bespoke interface for exploring Airbnb listings: + +

+ A custom app for exploring Airbnb listings, powered by QueryChat. +

+ +## Learn more + +See the [website](https://posit-dev.github.io/querychat/py) to learn more. diff --git a/pkg-py/docs/_examples/multiple-datasets.py b/pkg-py/docs/_examples/multiple-datasets.py new file mode 100644 index 00000000..ce11f29b --- /dev/null +++ b/pkg-py/docs/_examples/multiple-datasets.py @@ -0,0 +1,31 @@ +from querychat.data import titanic +from querychat.express import QueryChat +from seaborn import load_dataset +from shiny.express import render, ui + +penguins = load_dataset("penguins") + +qc_titanic = QueryChat(titanic(), "titanic") +qc_penguins = QueryChat(penguins, "penguins") + +with ui.sidebar(): + with ui.panel_conditional("input.navbar == 'Titanic'"): + qc_titanic.ui() + with ui.panel_conditional("input.navbar == 'Penguins'"): + qc_penguins.ui() + +with ui.nav_panel("Titanic"): + @render.data_frame + def titanic_table(): + return qc_titanic.df() + +with ui.nav_panel("Penguins"): + @render.data_frame + def penguins_table(): + return qc_penguins.df() + +ui.page_opts( + id="navbar", + title="Multiple Datasets with querychat", + fillable=True, +) diff --git a/pkg-py/docs/_examples/titanic-dashboard.py b/pkg-py/docs/_examples/titanic-dashboard.py new file mode 100644 index 00000000..d6d35187 --- /dev/null +++ b/pkg-py/docs/_examples/titanic-dashboard.py @@ -0,0 +1,83 @@ +import plotly.express as px +from faicons import icon_svg +from querychat.data import titanic +from querychat.express import QueryChat +from shiny.express import render, ui +from shinywidgets import render_plotly + +qc = QueryChat(titanic(), "titanic") +qc.sidebar() + +with ui.layout_column_wrap(fill=False): + with ui.value_box(showcase=icon_svg("users")): + "Passengers" + + @render.text + def count(): + return str(len(qc.df())) + + with ui.value_box(showcase=icon_svg("heart")): + "Survival Rate" + + @render.text + def survival(): + rate = qc.df()["survived"].mean() * 100 + return f"{rate:.1f}%" + + with ui.value_box(showcase=icon_svg("coins")): + "Avg Fare" + + @render.text + 
def fare(): + avg = qc.df()["fare"].mean() + return f"${avg:.2f}" + +with ui.layout_columns(): + with ui.card(): + with ui.card_header(): + "Data Table" + + @render.text + def table_title(): + return f" - {qc.title()}" if qc.title() else "" + + @render.data_frame + def data_table(): + return qc.df() + + with ui.card(): + ui.card_header("Survival by Class") + + @render_plotly + def survival_by_class(): + df = qc.df() + summary = df.groupby("pclass")["survived"].mean().reset_index() + return px.bar( + summary, + x="pclass", + y="survived", + labels={"pclass": "Class", "survived": "Survival Rate"}, + ) + +with ui.layout_columns(): + with ui.card(): + ui.card_header("Age Distribution") + + @render_plotly + def age_dist(): + df = qc.df() + return px.histogram(df, x="age", nbins=30) + + with ui.card(): + ui.card_header("Fare by Class") + + @render_plotly + def fare_by_class(): + df = qc.df() + return px.box(df, x="pclass", y="fare", color="survived") + +ui.page_opts( + title="Titanic Survival Analysis", + fillable=True, + class_="bslib-page-dashboard", +) diff --git a/pkg-py/docs/_quarto.yml b/pkg-py/docs/_quarto.yml index eaf0c7c2..1cb4e17a 100644 --- a/pkg-py/docs/_quarto.yml +++ b/pkg-py/docs/_quarto.yml @@ -72,7 +72,7 @@ quartodoc: sidebar: reference/_sidebar.yml css: reference/_styles-quartodoc.css sections: - - title: The Querychat class + - title: The QueryChat class desc: The starting point for any QueryChat session contents: - name: QueryChat diff --git a/pkg-py/docs/build.qmd b/pkg-py/docs/build.qmd index 71971b24..d3596ee8 100644 --- a/pkg-py/docs/build.qmd +++ b/pkg-py/docs/build.qmd @@ -2,14 +2,14 @@ title: Build an app --- -While the [`.app()` method](reference/QueryChat.qmd#querychat.QueryChat.app) provides a quick way to start exploring data, building bespoke Shiny apps with QueryChat unlocks the full power of integrating natural language data exploration with custom visualizations, layouts, and interactivity. 
This guide shows you how to integrate QueryChat into your own Shiny applications and leverage its reactive data outputs to create rich, interactive dashboards. +While the [`.app()` method](reference/QueryChat.qmd#querychat.QueryChat.app) provides a quick way to start exploring data, building bespoke Shiny apps with querychat unlocks the full power of integrating natural language data exploration with custom visualizations, layouts, and interactivity. This guide shows you how to integrate querychat into your own Shiny applications and leverage its reactive data outputs to create rich, interactive dashboards. ## Starter template -Integrating QueryChat into a Shiny app requires just three steps: +Integrating querychat into a Shiny app requires just three steps: -1. Initialize a `QueryChat()` instance with your data -2. Add the QueryChat UI component (either `.sidebar()` or `.ui()`) +1. Initialize a `QueryChat()` instance with your data +2. Add the querychat UI component (either `.sidebar()` or `.ui()`) 3. Use reactive values like `.df()`, `.sql()`, and `.title()` to build outputs that respond to user queries Here's a starter template demonstrating these steps: @@ -32,12 +32,12 @@ Here's a starter template demonstrating these steps: ::: ::: callout-note -With Core, you'll need to call the `qc.server()` method within your server function to set up QueryChat's reactive behavior, and capture its return value to access reactive data. This is not necessary with Express, which handles it automatically and exposes reactive values directly on the `QueryChat` instance. +With Core, you'll need to call the `qc.server()` method within your server function to set up querychat's reactive behavior, and capture its return value to access reactive data. This is not necessary with Express, which handles it automatically and exposes reactive values directly on the `QueryChat` instance. 
::: ## Reactives -There are three main reactive values provided by QueryChat for use in your app: +There are three main reactive values provided by querychat for use in your app: ### Filtered data {#filtered-data} @@ -175,7 +175,7 @@ Learn more about customizing Shiny chat UIs in the [Shiny Chat documentation](ht ## Data views -Thanks to Shiny's support for [Jupyter Widgets](https://shiny.posit.co/py/docs/jupyter-widgets.html) like [Plotly](https://shiny.posit.co/py/components/outputs/plot-plotly/), it's straightforward to create rich data views that depend on QueryChat data. Here's an example of an app showing both the filtered data and a bar chart depending on that same data: +Thanks to Shiny's support for [Jupyter Widgets](https://shiny.posit.co/py/docs/jupyter-widgets.html) like [Plotly](https://shiny.posit.co/py/components/outputs/plot-plotly/), it's straightforward to create rich data views that depend on querychat data. Here's an example of an app showing both the filtered data and a bar chart depending on that same data: ```python @@ -214,105 +214,17 @@ Now when a user filters the data through natural language (e.g., "filter to only A more useful, but slightly more involved example like the one below might incorporate other [Shiny components](https://shiny.posit.co/py/components/) like value boxes to summarize key statistics about the filtered data. - -
-app.py - -```python -from shiny.express import render, ui -from shinywidgets import render_plotly -from querychat.express import QueryChat -from querychat.data import titanic -from faicons import icon_svg -import plotly.express as px - -qc = QueryChat(titanic(), "titanic") -qc.sidebar() - -with ui.layout_column_wrap(fill=False): - with ui.value_box(showcase=icon_svg("users")): - "Passengers" - - @render.text - def count(): - return str(len(qc.df())) - - with ui.value_box(showcase=icon_svg("heart")): - "Survival Rate" - - @render.text - def survival(): - rate = qc.df()['survived'].mean() * 100 - return f"{rate:.1f}%" - - with ui.value_box(showcase=icon_svg("coins")): - "Avg Fare" - - @render.text - def fare(): - avg = qc.df()['fare'].mean() - return f"${avg:.2f}" - -with ui.layout_columns(): - with ui.card(): - with ui.card_header(): - "Data Table" - - @render.text - def table_title(): - return f" - {qc.title()}" if qc.title() else "" - - @render.data_frame - def data_table(): - return qc.df() - - with ui.card(): - ui.card_header("Survival by Class") - - @render_plotly - def survival_by_class(): - df = qc.df() - summary = df.groupby('pclass')['survived'].mean().reset_index() - return px.bar( - summary, - x='pclass', - y='survived', - labels={'pclass': 'Class', 'survived': 'Survival Rate'}, - ) - -with ui.layout_columns(): - with ui.card(): - ui.card_header("Age Distribution") - - @render_plotly - def age_dist(): - df = qc.df() - return px.histogram(df, x='age', nbins=30) - - with ui.card(): - ui.card_header("Fare by Class") - - @render_plotly - def fare_by_class(): - df = qc.df() - return px.box(df, x='pclass', y='fare', color='survived') - -ui.page_opts( - title="Titanic Survival Analysis", - fillable=True, - class_="bslib-page-dashboard", -) +```{.python filename="titanic-dashboard.py" code-fold="true" code-summary="Show app code"} +{{< include _examples/titanic-dashboard.py >}} ``` -
- ![](/images/rich-data-views.png){fig-alt="Screenshot of a querychat app showing value boxes, a data table, and multiple plots." class="lightbox shadow rounded mb-3"} ## Programmatic updates -QueryChat's reactive state can be updated programmatically. For example, you might want to add a "Reset Filters" button that clears any active filters and returns the data table to its original state. You can do this by setting both the SQL query and title to their default values. This way you don't have to rely on both the user and LLM to send the right prompt. +querychat's reactive state can be updated programmatically. For example, you might want to add a "Reset Filters" button that clears any active filters and returns the data table to its original state. You can do this by setting both the SQL query and title to their default values. This way you don't have to rely on both the user and LLM to send the right prompt. ::: {.panel-tabset group="shiny-mode"} @@ -346,168 +258,32 @@ def _(): This is equivalent to the user asking the LLM to "reset" or "show all data". -## Multiple datasets - -You can use multiple QueryChat instances in a single app to explore different datasets. Just ensure each instance has a different table name (or `id` which derives the table name) to avoid conflicts. 
Here's an example with two datasets: - -```python -from seaborn import load_dataset -from shiny.express import render, ui -from querychat.express import QueryChat -from querychat.data import titanic - -penguins = load_dataset("penguins") - -qc_titanic = QueryChat(titanic(), "titanic") -qc_penguins = QueryChat(penguins, "penguins") +## Multiple tables -with ui.sidebar(): - with ui.panel_conditional("input.navbar == 'Titanic'"): - qc_titanic.ui() - with ui.panel_conditional("input.navbar == 'Penguins'"): - qc_penguins.ui() - -with ui.nav_panel("Titanic"): - @render.data_frame - def titanic_table(): - return qc_titanic.df() - -with ui.nav_panel("Penguins"): - @render.data_frame - def penguins_table(): - return qc_penguins.df() - -ui.page_opts( - id="navbar", - title="Multiple Datasets with QueryChat", - fillable=True, -) -``` - -![](/images/multiple-datasets.png){fig-alt="Screenshot of a querychat app with two datasets: titanic and penguins." class="lightbox shadow rounded mb-3"} - - -## Complete example - -Here's a complete example bringing together multiple concepts - a Titanic survival analysis dashboard with natural language exploration, coordinated visualizations, and custom controls: - -```python -from shiny.express import render, ui -from querychat.express import QueryChat -from querychat.data import titanic -import plotly.express as px - -# Create QueryChat -qc = QueryChat( - titanic(), - "titanic", - data_description="Titanic passenger data with survival outcomes", -) - -# Page configuration -ui.page_opts( - title="Titanic Survival Analysis", - fillable=True, - class_="bslib-page-dashboard", -) - -# Create sidebar with chat -with ui.sidebar(width=400): - qc.ui() - ui.hr() - ui.input_action_button("reset", "Reset Filters", class_="w-100") - -# Summary cards -with ui.layout_columns(): - with ui.value_box(showcase=ui.icon("users")): - "Passengers" - - @render.text - def count(): - return str(len(qc.df())) +Currently, you have two options for exploring multiple 
tables in querychat: - with ui.value_box(showcase=ui.icon("heart")): - "Survival Rate" +1. Join the tables into a single table before passing to QueryChat +2. Use multiple QueryChat instances in the same app - @render.text - def survival(): - rate = qc.df()['survived'].mean() * 100 - return f"{rate:.1f}%" +The first option makes it possible to chat with multiple tables inside a single chat interface, whereas the second option requires a separate chat interface for each table. - with ui.value_box(showcase=ui.icon("coins")): - "Avg Fare" +::: {.callout-note} +### Multiple filtered tables - @render.text - def fare(): - avg = qc.df()['fare'].mean() - return f"${avg:.2f}" - -# Main content area with visualizations -with ui.layout_columns(): - with ui.card(): - with ui.card_header(): - "Data Table" - - @render.text - def table_title(): - return f" - {qc.title()}" if qc.title() else "" - - @render.data_frame - def data_table(): - return qc.df() - - with ui.card(): - ui.card_header("Survival by Class") - - @render.plot - def survival_by_class(): - df = qc.df() - summary = df.groupby('pclass')['survived'].mean().reset_index() - fig = px.bar( - summary, - x='pclass', - y='survived', - labels={'pclass': 'Class', 'survived': 'Survival Rate'}, - ) - return fig - -with ui.layout_columns(): - with ui.card(): - ui.card_header("Age Distribution") - - @render.plot - def age_dist(): - df = qc.df() - fig = px.histogram(df, x='age', nbins=30) - return fig - - with ui.card(): - ui.card_header("Fare by Class") +We plan to support multiple filtered tables in a future release -- if you're interested in this feature, please upvote [the relevant issue](https://github.com/posit-dev/querychat/issues/6) +::: - @render.plot - def fare_by_class(): - df = qc.df() - fig = px.box(df, x='pclass', y='fare', color='survived') - return fig +Here's an example of the second approach, using two separate QueryChat instances to explore both the `titanic` and `penguins` datasets within the same app: -# Reset 
button handler -@reactive.effect -@reactive.event(input.reset) -def handle_reset(): - qc.sql("") - qc.title(None) - ui.notification_show("Filters cleared", type="message") +```{.python filename="multiple-datasets.py" code-fold="true" code-summary="Show app code"} +{{< include _examples/multiple-datasets.py >}} ``` -This dashboard demonstrates: -- Natural language filtering through chat -- Multiple coordinated views (cards, table, plots) -- Custom reset button alongside natural language -- Dynamic titles reflecting current state -- Responsive layout that updates together +![](/images/multiple-datasets.png){fig-alt="Screenshot of a querychat app with two datasets: titanic and penguins." class="lightbox shadow rounded mb-3"} + ## See also - [Greet users](greet.qmd) - Create welcoming onboarding experiences - [Provide context](context.qmd) - Help the LLM understand your data better -- [Tools](tools.qmd) - Understand what QueryChat can do under the hood +- [Tools](tools.qmd) - Understand what querychat can do under the hood diff --git a/pkg-py/docs/context.qmd b/pkg-py/docs/context.qmd index 840a233c..2669ccbf 100644 --- a/pkg-py/docs/context.qmd +++ b/pkg-py/docs/context.qmd @@ -2,13 +2,19 @@ title: Provide context --- -To improve the LLM's ability to accurately translate natural language queries into SQL, it often helps to provide relevant metadata. Querychat automatically provides things like column names and data types to the LLM, but you can enhance this further with additional context like [data descriptions](#data-description). You can also provide [custom instructions](#extra-instructions) to add additional behaviors and even supply a fully [custom prompt template](#custom-template), if desired. +querychat automatically gathers information about your table to help the LLM write accurate SQL queries. 
This includes column names and types, numerical ranges, and categorical value examples.^[All of this information is provided to the LLM as part of the **system prompt** -- a string of text containing instructions and context for the LLM to consider when responding to user queries.] -All of this information is provided to the LLM as part of the **system prompt** -- a string of text containing instructions and context for the LLM to consider when responding to user queries. +Importantly, the LLM never sees the actual data itself -- it doesn't need to in order to write SQL queries for you. It only needs to understand the structure and schema of your data. + +You can get even better results by customizing the system prompt in three ways: + +1. Add a [data description](#data-description) to provide more context about what the data represents +2. Add [custom instructions](#extra-instructions) to guide the LLM's behavior +3. Use a fully [custom prompt template](#custom-template) if you want complete control (useful if you want to be certain the model cannot see any literal values from your data) ## Default prompt -For full visibility into the full system prompt that Querychat generates for the LLM, see the `system_prompt` property. This is useful for debugging and understanding exactly what context the LLM is using: +For full visibility into the system prompt that querychat generates for the LLM, see the `system_prompt` property. This is useful for debugging and understanding exactly what context the LLM is using: ```python from querychat import QueryChat @@ -32,7 +38,7 @@ By default, the system prompt contains the following components: ## Data description {#data-description} -If your column names are descriptive, Querychat may already work well without additional context. However, if your columns are named `x`, `V1`, `value`, etc., you should provide a data description. 
Use the `data_description` parameter for this: +If your column names are descriptive, querychat may already work well without additional context. However, if your columns are named `x`, `V1`, `value`, etc., you should provide a data description. Use the `data_description` parameter for this: ```{.python filename="titanic-app.py"} from pathlib import Path @@ -46,7 +52,7 @@ qc = QueryChat( app = qc.app() ``` -Querychat doesn't need this information in any particular format -- just provide what a human would find helpful: +querychat doesn't need this information in any particular format -- just provide what a human would find helpful: ```{.markdown filename="data_description.md"} This dataset contains information about Titanic passengers, collected for predicting survival. @@ -95,4 +101,4 @@ LLMs may not always follow your instructions perfectly. Test extensively when ch ## Custom template {#custom-template} -If you want more control over the system prompt, you can provide a custom prompt template using the `prompt_template` parameter. This is for more advanced users who want to fully customize the LLM's behavior. See the [API reference](reference/QueryChat.qmd#attributes) for details on the available template variables. \ No newline at end of file +If you want more control over the system prompt, you can provide a custom prompt template using the `prompt_template` parameter. This is for more advanced users who want to fully customize the LLM's behavior. See the [API reference](reference/querychat.qmd#attributes) for details on the available template variables. 
\ No newline at end of file diff --git a/pkg-py/docs/data-sources.qmd index 5ac97e27..ab4fc285 100644 --- a/pkg-py/docs/data-sources.qmd +++ b/pkg-py/docs/data-sources.qmd @@ -69,7 +69,7 @@ If you're [building an app](build.qmd), note you can read the queried data frame You can also connect `querychat` directly to any database supported by [SQLAlchemy](https://www.sqlalchemy.org/). This includes popular databases like SQLite, DuckDB, PostgreSQL, MySQL, and many more. -Assuming you have a database set up and accessible, you can pass a SQLAlchemy [database URL](https://docs.sqlalchemy.org/en/20/core/engines.html) to `create_engine()`, and then pass the resulting engine to `QueryChat`. Below are some examples for common databases. +Assuming you have a database set up and accessible, you can pass a SQLAlchemy [database URL](https://docs.sqlalchemy.org/en/20/core/engines.html) to `create_engine()`, and then pass the resulting engine to `QueryChat`. Below are some examples for common databases. 
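If you just want a local database to experiment with, the standard library can stage a small SQLite file first. A sketch -- the table and column names are made up for illustration, and the commented `create_engine`/`QueryChat` calls at the end follow the pattern described above (they assume SQLAlchemy and querychat are installed):

```python
import sqlite3

# Stage a small on-disk SQLite database to point querychat at.
conn = sqlite3.connect("flights.db")
conn.execute("CREATE TABLE IF NOT EXISTS flights (origin TEXT, dep_delay REAL)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("PDX", 5.0), ("JFK", 12.5), ("SFO", -2.0)],
)
conn.commit()
conn.close()

# Then, in your app:
# from sqlalchemy import create_engine
# from querychat import QueryChat
# engine = create_engine("sqlite:///flights.db")
# qc = QueryChat(engine, "flights")
```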
::: {.panel-tabset} diff --git a/pkg-py/docs/images/airbnb.png b/pkg-py/docs/images/airbnb.png new file mode 100644 index 00000000..754249c0 Binary files /dev/null and b/pkg-py/docs/images/airbnb.png differ diff --git a/pkg-py/docs/images/multiple-datasets.png b/pkg-py/docs/images/multiple-datasets.png index df7353fa..bef8a22c 100644 Binary files a/pkg-py/docs/images/multiple-datasets.png and b/pkg-py/docs/images/multiple-datasets.png differ diff --git a/pkg-py/docs/images/plotly-data-view.png b/pkg-py/docs/images/plotly-data-view.png index ef24eb7b..7356834b 100644 Binary files a/pkg-py/docs/images/plotly-data-view.png and b/pkg-py/docs/images/plotly-data-view.png differ diff --git a/pkg-py/docs/images/querychat.png b/pkg-py/docs/images/querychat.png index 78a36725..f7c9114d 100644 Binary files a/pkg-py/docs/images/querychat.png and b/pkg-py/docs/images/querychat.png differ diff --git a/pkg-py/docs/images/quickstart-filter.png b/pkg-py/docs/images/quickstart-filter.png index 1819dbae..dbd3ed33 100644 Binary files a/pkg-py/docs/images/quickstart-filter.png and b/pkg-py/docs/images/quickstart-filter.png differ diff --git a/pkg-py/docs/images/quickstart-summary.png b/pkg-py/docs/images/quickstart-summary.png index 7cb0256d..de73336f 100644 Binary files a/pkg-py/docs/images/quickstart-summary.png and b/pkg-py/docs/images/quickstart-summary.png differ diff --git a/pkg-py/docs/images/quickstart.png b/pkg-py/docs/images/quickstart.png index fd423d0d..d580d47d 100644 Binary files a/pkg-py/docs/images/quickstart.png and b/pkg-py/docs/images/quickstart.png differ diff --git a/pkg-py/docs/images/rich-data-views.png b/pkg-py/docs/images/rich-data-views.png index 09fc05fe..2f36ab3f 100644 Binary files a/pkg-py/docs/images/rich-data-views.png and b/pkg-py/docs/images/rich-data-views.png differ diff --git a/pkg-py/docs/images/sidebot.png b/pkg-py/docs/images/sidebot.png deleted file mode 100644 index 73a67616..00000000 Binary files a/pkg-py/docs/images/sidebot.png and /dev/null 
differ diff --git a/pkg-py/docs/index.qmd b/pkg-py/docs/index.qmd index 94490ff5..a3f380e6 100644 --- a/pkg-py/docs/index.qmd +++ b/pkg-py/docs/index.qmd @@ -22,8 +22,7 @@ Explore data using natural language queries - -Querychat makes it easy to explore data with natural language through the power of [Shiny](https://shiny.posit.co/py) and large language models (LLMs). Start chatting with your data in just one line of code. Or, with a few more lines, design your own rich user experience around data exploration and analysis through natural language. +querychat facilitates safe and reliable natural language exploration of tabular data, powered by SQL and large language models (LLMs). For users, it offers an intuitive web application where they can quickly ask questions of their data and receive verifiable data-driven answers. As a developer, you can access the chat UI component, generated SQL queries, and filtered data to build custom applications that integrate natural language querying into your data workflows. ## Installation @@ -43,18 +42,21 @@ The quickest way to start chatting is to call the `.app()` method, which returns from querychat import QueryChat from querychat.data import titanic -qc = QueryChat(titanic(), "titanic", client="openai/gpt-4.1") +qc = QueryChat(titanic(), "titanic") app = qc.app() ``` -With the above code saved to `titanic-app.py` and an API key set[^api-key], you can [run the app](https://shiny.posit.co/py/get-started/create-run.html#run-your-shiny-application) from a terminal (or [VSCode](https://marketplace.visualstudio.com/items?itemName=Posit.shiny)): +With an API key set[^api-key], you can run that code in a Python console and then call `app.run()` to jump into a chat. 
Or you can save the code to `titanic-app.py` and [run the app](https://shiny.posit.co/py/get-started/create-run.html#run-your-shiny-application) from a terminal (or Positron, or [VS Code](https://marketplace.visualstudio.com/items?itemName=Posit.shiny)): ```bash -export OPENAI_API_KEY="your_api_key_here" +# Optionally, change the default model: +export QUERYCHAT_CLIENT="anthropic/claude-sonnet-4-5" +# And provide appropriate credentials for your chosen model provider +export ANTHROPIC_API_KEY="your_api_key_here" shiny run --reload titanic-app.py ``` -[^api-key]: By default, Querychat uses OpenAI to power the chat experience. So, for this example to work, you'll need [an OpenAI API key](https://platform.openai.com/). See the [Models](models.qmd) page for details on how to set up credentials for other model providers. +[^api-key]: By default, querychat uses OpenAI to power the chat experience. So, for this example to work, you'll need [an OpenAI API key](https://platform.openai.com/). See the [Models](models.qmd) page for details on how to set up credentials for other model providers. Once running, you'll notice 3 main views: @@ -68,38 +70,44 @@ Suppose we pick a suggestion like "Show me passengers who survived". Since this ![](/images/quickstart-filter.png){fig-alt="Screenshot of the querychat's app with the titanic dataset filtered to passengers who survived." class="lightbox shadow rounded mb-3"} -Querychat can also handle more general questions about the data that require calculations and aggregations. For example, we can ask "What is the average age of passengers who survived?". In this case, querychat will generate/execute the SQL query to perform the relevant calculation, and return the result in the chat: +querychat can also handle more general questions about the data that require calculations and aggregations. For example, we can ask "What is the average age of passengers who survived?". 
The LLM will generate the SQL query to perform the calculation; querychat will then execute it and return the result in the chat: ![](/images/quickstart-summary.png){fig-alt="Screenshot of the querychat's app with a summary statistic inlined in the chat." class="lightbox shadow rounded mb-3"} -As you'll learn later in [Build an app](build.qmd), you can also access the SQL query and filtered/sorted data frame programmatically for use elsewhere in your app. This makes it rather seemless to have natural language interaction with your data alongside other visualizations and analyses. +## Custom apps + +querychat is designed to be highly extensible -- it provides programmatic access to the chat interface, the filtered/sorted data frame, SQL queries, and more. +This makes it easy to build custom web apps that leverage natural language interaction with your data. +For example, [here](https://github.com/posit-conf-2025/llm/blob/main/_solutions/25_querychat/25_querychat_02-end-app.R)'s a bespoke app for exploring Airbnb listings in Asheville, NC: -Before we build though, let's take a moment to better understand how querychat works under the hood, and whether it's right for you. +![](/images/airbnb.png){fig-alt="A custom app for exploring Airbnb listings, powered by querychat." class="lightbox shadow rounded mb-3"} +To learn more, see [Build an app](build.qmd) for a step-by-step guide. ## How it works -Querychat leverages LLMs incredible capability to translate natural language into SQL queries. Frontier models are shockingly good at this task, but even the best models still need to know the overall data structure to perform well. For this reason, querychat supplies a [system prompt](context.qmd) with the schema of the data (i.e., column names, types, ranges, etc), but never the raw data itself. +querychat uses LLMs to translate natural language into SQL queries. 
Models of all sizes, from small ones you can run locally to large frontier models from major AI providers, are remarkably effective at this task. But even the best models need to understand your data's overall structure to perform well. -When the LLM generates a SQL query, querychat executes it against a SQL database (DuckDB[^duckdb] by default) to get results in a **safe**, **reliable**, and **verifiable** manner. In short, this execution is **safe** since only `SELECT` statements are allowed, **reliable** since the database engine handles all calculations, and **verifiable** since the user can always see the SQL query that was run. This makes querychat a trustworthy tool for data exploration, as every action taken by the LLM is transparent and independently reproducible. +To address this, querychat includes schema metadata -- column names, types, ranges, categorical values -- in the LLM's [system prompt](context.qmd). Importantly, querychat **does not** send raw data to the LLM; it shares only enough structural information for the model to generate accurate queries. When the LLM produces a query, querychat executes it in a SQL database (DuckDB[^duckdb], by default) to obtain precise results. +This design makes querychat reliable, safe, and reproducible: -::: callout-important -### Data privacy +- **Reliable**: query results come from a real database, not LLM-generated summaries -- so outputs are precise, verifiable, and less vulnerable to hallucination[^hallucination]. +- **Safe**: querychat's tools are read-only by design, avoiding destructive actions on your data.[^permissions] +- **Reproducible**: generated SQL can be exported and re-run in other environments, so your analysis isn't locked into a single tool. -See the [Provide context](context.qmd) and [Tools](tools.qmd) articles to learn more about what information is provided to the LLM and what it's capable of doing with code execution. 
-:::: +::: callout-important **Data privacy** -[^duckdb]: Duckdb is extremely fast and has a surprising number of [statistical functions](https://duckdb.org/docs/stable/sql/functions/aggregates.html#statistical-aggregates). +See the [Provide context](context.qmd) and [Tools](tools.qmd) articles for more details on exactly what information is provided to the LLM and how to customize it. +::: -### Bespoke interfaces +[^duckdb]: DuckDB is extremely fast and has a surprising number of [statistical functions](https://duckdb.org/docs/stable/sql/functions/aggregates.html#statistical-aggregates). -While the quickstart app is a great way to get started, querychat is designed to be highly extensible. -You can not only customize the underlying model and data source, but also build fully custom Shiny apps around the core chat functionality. +[^hallucination]: The [query tool](tools.qmd) gives query results to the model for context and interpretation. Thus, there is *some* potential for the model to misinterpret those results. -For a motivating example, consider the following ([sidebot](https://shiny.posit.co/py/docs/genai-inspiration.html#sidebot)) app that leverages querychat's tooling to create reactive summaries and visualizations based on the user's natural language queries: +[^permissions]: To fully guarantee no destructive actions on your production database, ensure querychat's database permissions are read-only. -![](/images/sidebot.png){fig-alt="Screenshot of sidebot, a custom shiny app built with querychat." 
class="lightbox shadow rounded mb-3"} ## Next steps diff --git a/pkg-py/docs/tools.qmd b/pkg-py/docs/tools.qmd index c4f68c53..e438e1bd 100644 --- a/pkg-py/docs/tools.qmd +++ b/pkg-py/docs/tools.qmd @@ -2,20 +2,18 @@ title: Tools --- -QueryChat combines [tool calling](https://posit-dev.github.io/chatlas/get-started/tools.html) with [reactivity](https://shiny.posit.co/py/docs/reactive-foundations.html) to not only execute SQL, but also reactively update dependent data views. Understanding how these tools work will help you better understand what QueryChat is capable of and how to customize/extend to its behavior. +querychat combines [tool calling](https://posit-dev.github.io/chatlas/get-started/tools.html) with [reactivity](https://shiny.posit.co/py/docs/reactive-foundations.html) to not only execute SQL, but also reactively update dependent data views. Understanding how these tools work will help you better understand what querychat is capable of and how to customize/extend its behavior. -One important thing to understand generally about Querychat's tools is they are Python functions, and that execution happens on _your machine_, not on the LLM provider's side. In other words, the SQL queries generated by the LLM are executed locally in the Shiny app process, and only the results (if any) are sent back to the LLM. +One important thing to understand generally about querychat's tools is that they are Python functions, and that execution happens on _your machine_, not on the LLM provider's side. In other words, the SQL queries generated by the LLM are executed locally in the Python process running the app. -Querychat provides the LLM access to three tools, serving two primary purposes: +querychat provides the LLM access to two tool groups: 1. **Data updating** - Filter and sort data (without sending results to the LLM). 2. **Data analysis** - Calculate summaries and return results for interpretation by the LLM. ## Data updating -When a user asks to "Show me..."
or "Filter to..." or "Sort by...", the LLM requests a call to the `update_dashboard` tool with an appropriate SQL query as input. An important constraint is that the query must return all original schema columns (typically using `SELECT *`). When called, Querychat will both set a reactive value holding [the current SQL query](build.qmd#sql-query) and execute the query to get the result. - -The result of query then used to set a reactive value holding the [filtered/sorted data frame](build.qmd#filtered-data). Thanks to reactivity, this will automatically update any views depending on this data frame, such as the data table displayed in the UI. +When a user asks to "Show me..." or "Filter to..." or "Sort by...", the LLM requests a call to the `update_dashboard` tool with an appropriate SQL query as input. An important constraint is that the query must return all original schema columns (typically using `SELECT *`). When called, querychat will both set a reactive value holding [the current SQL query](build.qmd#sql-query) and execute the query to get the result. The result of the query is then used to set a reactive value holding the [filtered/sorted data frame](build.qmd#filtered-data). Thanks to reactivity, this will automatically update any views depending on this data frame, such as the data table displayed in the UI. This tool also takes a `title` parameter, which is a short description of the filter/sort operation (e.g., "First-class passengers"). This, too, is made available through [a reactive value](build.qmd#title) for display somewhere in your app. 
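The `update_dashboard` contract described above can be sketched in plain Python. This is a hypothetical simplification, not querychat's actual implementation: it uses the standard library's `sqlite3` in place of the default DuckDB engine and plain attributes in place of Shiny reactive values, but it shows the same flow -- reject anything that isn't a `SELECT`, require all original columns, then record the SQL, title, and result for dependent views to pick up.

```python
import sqlite3

# Hypothetical sketch of an update_dashboard-style tool (simplified; real
# querychat uses DuckDB and Shiny reactive values rather than sqlite3 and
# plain attributes).

class DashboardState:
    """Holds the values a real app would expose as reactives."""
    def __init__(self):
        self.sql = None    # current SQL query string
        self.rows = None   # current filtered/sorted result
        self.title = None  # short description of the filter/sort

def update_dashboard(conn, state, query, title, required_columns):
    # Safety: only read-only SELECT statements are allowed.
    if not query.lstrip().upper().startswith("SELECT"):
        raise ValueError("Only SELECT statements are allowed")
    cur = conn.execute(query)
    cols = [d[0] for d in cur.description]
    # The tool requires the query to return all original schema columns.
    if cols != required_columns:
        raise ValueError(f"Query must return columns {required_columns}, got {cols}")
    # Record query, title, and result; reactive consumers update from these.
    state.sql, state.title, state.rows = query, title, cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titanic (name TEXT, age REAL, survived INTEGER)")
conn.executemany(
    "INSERT INTO titanic VALUES (?, ?, ?)",
    [("Allen", 29.0, 1), ("Braund", 22.0, 0), ("Cumings", 38.0, 1)],
)

state = DashboardState()
update_dashboard(
    conn,
    state,
    "SELECT * FROM titanic WHERE survived = 1",
    title="Survivors",
    required_columns=["name", "age", "survived"],
)
print(state.title, len(state.rows))  # Survivors 2
```

In a real app, `state.rows` would feed a reactive data table, so a successful tool call automatically re-renders every dependent output.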
diff --git a/pkg-r/DESCRIPTION b/pkg-r/DESCRIPTION index 8c368369..77b7bc7c 100644 --- a/pkg-r/DESCRIPTION +++ b/pkg-r/DESCRIPTION @@ -39,11 +39,15 @@ Imports: Suggests: bsicons, DT, + knitr, palmerpenguins, + rmarkdown, RSQLite, shinytest2, testthat (>= 3.0.0), withr +VignetteBuilder: + knitr Remotes: posit-dev/shinychat/pkg-r Config/testthat/edition: 3 diff --git a/pkg-r/README.md b/pkg-r/README.md index 49f3ca4d..0c2ce9c7 100644 --- a/pkg-r/README.md +++ b/pkg-r/README.md @@ -1,257 +1,97 @@ -# querychat: Chat with Shiny apps (R) querychat website +# querychat querychat website -Imagine typing questions like these directly into your Shiny dashboard, and seeing the results in realtime: + +[![R-CMD-check](https://github.com/posit-dev/querychat/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/posit-dev/querychat/actions/workflows/R-CMD-check.yaml) +[![CRAN status](https://www.r-pkg.org/badges/version/querychat)](https://CRAN.R-project.org/package=querychat) + -* "Show only penguins that are not species Gentoo and have a bill length greater than 50mm." -* "Show only blue states with an incidence rate greater than 100 per 100,000 people." -* "What is the average mpg of cars with 6 cylinders?" - -querychat is a drop-in component for Shiny that allows users to query a data frame using natural language. The results are available as a reactive data frame, so they can be easily used from Shiny outputs, reactive expressions, downloads, etc. - -**This is not as terrible an idea as you might think!** We need to be very careful when bringing LLMs into data analysis, as we all know that they are prone to hallucinations and other classes of errors. querychat is designed to excel in reliability, transparency, and reproducibility by using this one technique: denying it raw access to the data, and forcing it to write SQL queries instead. See the section below on ["How it works"](#how-it-works) for more. 
+querychat facilitates safe and reliable natural language exploration of tabular data, powered by SQL and large language models (LLMs). For users, it offers an intuitive web application where they can quickly ask questions of their data and receive verifiable data-driven answers. As a developer, you can access the chat UI component, generated SQL queries, and filtered data to build custom applications that integrate natural language querying into your data workflows. ## Installation -```r -pak::pak("posit-dev/querychat/pkg-r") -``` - -## How to use - -First, you'll need an OpenAI API key. See the [instructions from Ellmer](https://ellmer.tidyverse.org/reference/chat_openai.html). (Or use a different LLM provider, see below.) - -### Quick Start - -The fastest way to get started is with the built-in app: +Install the stable release from CRAN: ```r -library(querychat) - -qc <- QueryChat$new(mtcars) -qc$app() +install.packages("querychat") ``` -This launches a complete Shiny app with a chat interface, SQL query display, and data table. Perfect for quick exploration and prototyping! - -### Custom Shiny Apps - -For more control, integrate querychat into your own Shiny app: +Or the development version from GitHub: ```r -library(shiny) -library(bslib) -library(querychat) - -# 1. Create a QueryChat instance with your data -qc <- QueryChat$new(mtcars) - -ui <- page_sidebar( - # 2. Use qc$sidebar() in a bslib::page_sidebar. - # Alternatively, use qc$ui() elsewhere if you don't want your - # chat interface to live in a sidebar. - sidebar = qc$sidebar(), - DT::DTOutput("dt") -) - -server <- function(input, output, session) { - # 3. Initialize the QueryChat server (returns session-specific reactive values) - qc_vals <- qc$server() - - output$dt <- DT::renderDT({ - # 4. 
Use the filtered/sorted data frame anywhere you wish, via qc_vals$df() - DT::datatable(qc_vals$df()) - }) -} - -shinyApp(ui, server) +# install.packages("pak") +pak::pak("posit-dev/querychat/pkg-r") ``` -## Using Database Sources +## Quick start -In addition to data frames, querychat can connect to external databases via DBI: +The quickest way to start chatting with your data is via `querychat_app()`, which provides a fully polished Shiny app. It requires a [data source](articles/data-sources.html) (e.g., data.frame, database connection, etc.) and optionally other parameters (e.g. the LLM `client` [model](articles/models.html)). ```r -library(shiny) -library(bslib) library(querychat) -library(DBI) -library(RSQLite) - -# 1. Connect to a database -conn <- DBI::dbConnect(RSQLite::SQLite(), "path/to/database.db") - -# 2. Create a QueryChat instance with the database connection -qc <- QueryChat$new(conn, "table_name") - -# 3. Use it in your Shiny app as shown above -qc$app() -``` - -## How it works - -### Powered by LLMs - -querychat's natural language chat experience is powered by LLMs. You may use any model that [ellmer](https://ellmer.tidyverse.org) supports that has the ability to do tool calls, but we currently recommend (as of March 2025): - -* GPT-4o -* Claude 3.5 Sonnet -* Claude 3.7 Sonnet - -In our testing, we've found that those models strike a good balance between accuracy and latency. Smaller models like GPT-4o-mini are fine for simple queries but make surprising mistakes with moderately complex ones; and reasoning models like o3-mini slow down responses without providing meaningfully better results. - -The small open source models (8B and below) we've tested have fared extremely poorly. Sorry. 🤷 - -### Powered by SQL +library(palmerpenguins) -querychat does not have direct access to the raw data; it can _only_ read or filter the data by writing SQL `SELECT` statements. 
This is crucial for ensuring relability, transparency, and reproducibility: - -- **Reliability:** Today's LLMs are excellent at writing SQL, but bad at direct calculation. -- **Transparency:** querychat always displays the SQL to the user, so it can be vetted instead of blindly trusted. -- **Reproducibility:** The SQL query can be easily copied and reused. - -Currently, querychat uses DuckDB for its SQL engine when working with data frames. For database sources, it uses the native SQL dialect of the connected database. DuckDB is extremely fast and has a surprising number of [statistical functions](https://duckdb.org/docs/stable/sql/functions/aggregates.html#statistical-aggregates). - -## Customizing querychat - -### Provide a greeting (recommended) - -When the querychat UI first appears, you will usually want it to greet the user with some basic instructions. By default, these instructions are auto-generated every time a user arrives; this is potentially slow, wasteful, and unpredictable. Instead, you should create a file called `greeting.md`, and when creating your `QueryChat` instance, pass `greeting = "greeting.md"` (or use `readLines()` to read the file as a string). - -You can provide suggestions to the user by using the ` ` tag. - -For example: - -```markdown -* **Filter and sort the data:** - * Show only survivors - * Filter to first class passengers under 30 - * Sort by fare from highest to lowest - -* **Answer questions about the data:** - * What was the survival rate by gender? - * What's the average age of children who survived? - * How many passengers were traveling alone? -``` - -These suggestions appear in the greeting and automatically populate the chat text box when clicked. -This gives the user a few ideas to explore on their own. 
- -You can use the `$generate_greeting()` method to help create a greeting: - -```r -qc <- QueryChat$new(mtcars) -greeting <- qc$generate_greeting(echo = "text") - -# Save it for reuse -writeLines(greeting, "greeting.md") - -# Then use it in your app -qc <- QueryChat$new(mtcars, greeting = "greeting.md") +querychat_app(penguins, client = "openai/gpt-4.1") ``` -Alternatively, you can completely suppress the greeting by passing `greeting = ""`. +Once running (which requires an API key[^api-key]), you'll notice 3 main views: -### Augment the system prompt (recommended) +[^api-key]: By default, querychat uses OpenAI to power the chat experience. So, for this example to work, you'll need [an OpenAI API key](https://platform.openai.com/). See the [Models](articles/models.html) article for details on how to set up credentials for other model providers. -In LLM parlance, the _system prompt_ is the set of instructions and specific knowledge you want the model to use during a conversation. querychat automatically creates a system prompt which is comprised of: +1. A sidebar chat with suggestions on where to start exploring. +2. A data table that updates to reflect filtering and sorting queries. +3. The SQL query behind the data table, for transparency and reproducibility. -1. The basic set of behaviors the LLM must follow in order for querychat to work properly. (See `inst/prompt/prompt.md` if you're curious what this looks like.) -2. The SQL schema of the data source you provided. -3. (Optional) Any additional description of the data you choose to provide. -4. (Optional) Any additional instructions you want to use to guide querychat's behavior. +![](man/figures/quickstart.png){alt="Screenshot of querychat's app with the penguins dataset." class="rounded shadow"} -#### Data description +Suppose we pick a suggestion like "Show me Adelie penguins". Since this is a filtering operation, both the data table and SQL query update accordingly. 
-If you give querychat your dataset and nothing else, it will provide the LLM with the basic schema of your data: +![](man/figures/quickstart-filter.png){alt="Screenshot of the querychat's app with the penguins dataset filtered." class="rounded shadow"} -- Column names -- SQL data type (integer, float, boolean, datetime, text) -- For text columns with less than 10 unique values, we assume they are categorical variables and include the list of values -- For integer and float columns, we include the range +querychat can also handle more general questions about the data that require calculations and aggregations. For example, we can ask "What is the average bill length by species?". The LLM will generate the SQL query to perform the calculation, querychat will execute it, and return the result in the chat: -And that's all the LLM will know about your data. -The actual data does not get passed into the LLM. -We calculate these values before we pass the schema information into the LLM. +![](man/figures/quickstart-summary.png){alt="Screenshot of the querychat's app with a summary statistic inlined in the chat." class="rounded shadow"} -If the column names are usefully descriptive, it may be able to make a surprising amount of sense out of the data. But if your data frame's columns are `x`, `V1`, `value`, etc., then the model will need to be given more background info--just like a human would. +## Custom apps -To provide this information, use the `data_description` argument. For example, the `mtcars` data frame used in the example above has pretty minimal column names. You might create a `data_description.md` like this: +querychat is designed to be highly extensible -- it provides programmatic access to the chat interface, the filtered/sorted data frame, SQL queries, and more. +This makes it easy to build custom web apps that leverage natural language interaction with your data. 
+For example, [here](https://github.com/posit-conf-2025/llm/blob/main/_solutions/25_querychat/25_querychat_02-end-app.R)'s a bespoke app for exploring Airbnb listings in Ashville, NC: -```markdown -The data was extracted from the 1974 Motor Trend US magazine, and -comprises fuel consumption and 10 aspects of automobile design and -performance for 32 automobiles (1973–74 models). +![](man/figures/airbnb.png){alt="A custom app for exploring Airbnb listings, powered by querychat." class="shadow rounded mb-3"} -- mpg: Miles/(US) gallon -- cyl: Number of cylinders -- disp: Displacement (cu.in.) -- hp: Gross horsepower -- drat: Rear axle ratio -- wt: Weight (1000 lbs) -- qsec: 1/4 mile time -- vs: Engine (0 = V-shaped, 1 = straight) -- am: Transmission (0 = automatic, 1 = manual) -- gear: Number of forward gears -- carb: Number of carburetors -``` - -which you can then pass via: - -```r -qc <- QueryChat$new( - mtcars, - data_description = "data_description.md" -) -``` - -querychat doesn't need this information in any particular format; just put whatever information, in whatever format, you think a human would find helpful. +To learn more, see [Build an app](articles/build.html) for a step-by-step guide. -#### Additional instructions - -You can add additional instructions of your own to the end of the system prompt, by passing `extra_instructions` to `QueryChat$new()`. +## How it works -```r -qc <- QueryChat$new( - mtcars, - extra_instructions = c( - "You're speaking to a British audience--please use appropriate spelling conventions.", - "Use lots of emojis! 😃 Emojis everywhere, 🌍 emojis forever. ♾️", - "Stay on topic, only talk about the data dashboard and refuse to answer other questions." - ) -) -``` +querychat uses LLMs to translate natural language into SQL queries. Models of all sizes, from small ones you can run locally to large frontier models from major AI providers, are remarkably effective at this task. 
But even the best models need to understand your data's overall structure to perform well. -You can also put these instructions in a separate file and pass the file path, as we did for `data_description` above. +To address this, querychat includes schema metadata -- column names, types, ranges, categorical values -- in the LLM's [system prompt](articles/context.html). Importantly, querychat **does not** send raw data to the LLM; it shares only enough structural information for the model to generate accurate queries. When the LLM produces a query, querychat executes it in a SQL database (DuckDB[^duckdb], by default) to obtain precise results. -**Warning:** It is not 100% guaranteed that the LLM will always—or in many cases, ever—obey your instructions, and it can be difficult to predict which instructions will be a problem. So be sure to test extensively each time you change your instructions, and especially, if you change the model you use. +This design makes querychat reliable, safe, and reproducible: -### Use a different LLM provider +- **Reliable**: query results come from a real database, not LLM-generated summaries -- so outputs are precise, verifiable, and less vulnerable to hallucination[^hallucination]. +- **Safe**: querychat's tools are read-only by design, avoiding destructive actions on your data.[^permissions] +- **Reproducible**: generated SQL can be exported and re-run in other environments, so your analysis isn't locked into a single tool. -By default, querychat uses OpenAI with the default model chosen by `ellmer::chat_openai()`. If you want to use a different model, you can provide an ellmer chat object to the `client` argument of `QueryChat$new()`. +::: {.alert .alert-warning} +**Data privacy** -```r -library(ellmer) +See the [Provide context](articles/context.html) and [Tools](articles/tools.html) articles for more details on exactly what information is provided to the LLM and how to customize it. 
+::: -qc <- QueryChat$new( - mtcars, - client = ellmer::chat_anthropic(model = "claude-3-7-sonnet-latest") -) -``` [^duckdb]: DuckDB is extremely fast and has a surprising number of [statistical functions](https://duckdb.org/docs/stable/sql/functions/aggregates.html#statistical-aggregates). -This would use Claude 3.7 Sonnet instead, which would require you to provide an API key. -See the [instructions from Ellmer](https://ellmer.tidyverse.org/reference/chat_anthropic.html) for more information on how to authenticate with different providers. +[^hallucination]: The [query tool](articles/tools.html) gives query results to the model for context and interpretation. Thus, there is *some* potential for the model to misinterpret those results. -Alternatively, you can use a provider-model string, which will be passed to `ellmer::chat()`: +[^permissions]: To fully guarantee no destructive actions on your production database, ensure querychat's database permissions are read-only. -```r -qc <- QueryChat$new( - mtcars, - client = "anthropic/claude-3-7-sonnet-latest" -) -``` ## Next steps -Or you can set the `querychat.client` R option to a chat object or provider-model string, which will be used as the default client for all querychat apps in your session: +From here, you might want to learn more about: -```r -options(querychat.client = "anthropic/claude-3-7-sonnet-latest") -``` +- [Models](articles/models.html): customize the LLM behind querychat. +- [Data sources](articles/data-sources.html): different data sources you can use with querychat. +- [Provide context](articles/context.html): provide the LLM with the context it needs to work well. +- [Build an app](articles/build.html): design a custom Shiny app around querychat. +- [Greet users](articles/greet.html): create welcoming onboarding experiences. +- [Tools](articles/tools.html): understand what querychat can do under the hood. 
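The schema metadata described in "How it works" above can be sketched in a few lines. This is a hypothetical illustration only -- querychat actually derives this information from the database schema, and this sketch uses Python with made-up rows and a made-up `describe_columns` helper -- but it shows the kind of summary the LLM receives: names, types, ranges, and categorical values, never the raw rows.

```python
# Hypothetical sketch (not querychat's actual implementation) of the kind of
# schema summary sent to the LLM: column names, inferred types, numeric
# ranges, and distinct values for low-cardinality text columns.

def describe_columns(rows, categorical_threshold=10):
    """rows: list of dicts sharing the same keys; returns a schema summary."""
    lines = []
    for col in rows[0]:
        values = [r[col] for r in rows if r[col] is not None]
        if values and all(isinstance(v, (int, float)) for v in values):
            # Numeric column: share only the observed range.
            lines.append(f"- {col}: numeric, range {min(values)} to {max(values)}")
        else:
            distinct = sorted({str(v) for v in values})
            if len(distinct) <= categorical_threshold:
                # Treat low-cardinality text as categorical: list its values.
                lines.append(f"- {col}: text, values: {', '.join(distinct)}")
            else:
                lines.append(f"- {col}: text")
    return "\n".join(lines)

penguins = [
    {"species": "Adelie", "bill_length_mm": 39.1},
    {"species": "Gentoo", "bill_length_mm": 47.5},
    {"species": "Chinstrap", "bill_length_mm": 49.2},
]
print(describe_columns(penguins))
# - species: text, values: Adelie, Chinstrap, Gentoo
# - bill_length_mm: numeric, range 39.1 to 49.2
```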
diff --git a/pkg-r/man/figures/airbnb.png b/pkg-r/man/figures/airbnb.png new file mode 100644 index 00000000..754249c0 Binary files /dev/null and b/pkg-r/man/figures/airbnb.png differ diff --git a/pkg-r/man/figures/logo.png b/pkg-r/man/figures/logo.png index 8b4b15cb..1b7368cf 100644 Binary files a/pkg-r/man/figures/logo.png and b/pkg-r/man/figures/logo.png differ diff --git a/pkg-r/man/figures/quickstart-filter.png b/pkg-r/man/figures/quickstart-filter.png new file mode 100644 index 00000000..82700de9 Binary files /dev/null and b/pkg-r/man/figures/quickstart-filter.png differ diff --git a/pkg-r/man/figures/quickstart-summary.png b/pkg-r/man/figures/quickstart-summary.png new file mode 100644 index 00000000..a064e77f Binary files /dev/null and b/pkg-r/man/figures/quickstart-summary.png differ diff --git a/pkg-r/man/figures/quickstart.png b/pkg-r/man/figures/quickstart.png new file mode 100644 index 00000000..1d122dc7 Binary files /dev/null and b/pkg-r/man/figures/quickstart.png differ diff --git a/pkg-r/man/querychat-package.Rd b/pkg-r/man/querychat-package.Rd index 63598eda..7c10e48a 100644 --- a/pkg-r/man/querychat-package.Rd +++ b/pkg-r/man/querychat-package.Rd @@ -75,11 +75,19 @@ Useful links: \itemize{ \item \url{https://posit-dev.github.io/querychat/pkg-r} \item \url{https://posit-dev.github.io/querychat} + \item \url{https://github.com/posit-dev/querychat} + \item Report bugs at \url{https://github.com/posit-dev/querychat/issues} } } \author{ -\strong{Maintainer}: Joe Cheng \email{joe@posit.co} +\strong{Maintainer}: Garrick Aden-Buie \email{garrick@posit.co} (\href{https://orcid.org/0000-0002-7111-0077}{ORCID}) + +Authors: +\itemize{ + \item Joe Cheng \email{joe@posit.co} [conceptor] + \item Carson Sievert \email{carson@posit.co} (\href{https://orcid.org/0000-0002-4958-2844}{ORCID}) +} Other contributors: \itemize{ diff --git a/pkg-r/pkgdown/_pkgdown.yml b/pkg-r/pkgdown/_pkgdown.yml index 10da24aa..94cc6cc0 100644 --- a/pkg-r/pkgdown/_pkgdown.yml +++ 
b/pkg-r/pkgdown/_pkgdown.yml @@ -31,19 +31,39 @@ template: navbar: structure: - left: [get-started, articles, reference, news] + left: [articles, reference, news] right: [search, github, lightswitch] components: - home: ~ + articles: + text: Articles + menu: + - text: Models + href: articles/models.html + - text: Data Sources + href: articles/data-sources.html + - text: Provide context + href: articles/context.html + - text: Build an app + href: articles/build.html + - text: Greet users + href: articles/greet.html + - text: Tools + href: articles/tools.html reference: -- title: Chat interfaces +- title: Convenience functions + contents: + - querychat_app + - querychat +- title: Build a custom app contents: - querychat - QueryChat - -- title: Data Sources +- title: Data sources contents: - DataSource - DataFrameSource - DBISource +- title: Package + contents: + - querychat diff --git a/pkg-r/pkgdown/favicon/apple-touch-icon.png b/pkg-r/pkgdown/favicon/apple-touch-icon.png index 2625265d..a0542180 100644 Binary files a/pkg-r/pkgdown/favicon/apple-touch-icon.png and b/pkg-r/pkgdown/favicon/apple-touch-icon.png differ diff --git a/pkg-r/pkgdown/favicon/favicon-96x96.png b/pkg-r/pkgdown/favicon/favicon-96x96.png index 7c36ad63..a03ebb54 100644 Binary files a/pkg-r/pkgdown/favicon/favicon-96x96.png and b/pkg-r/pkgdown/favicon/favicon-96x96.png differ diff --git a/pkg-r/pkgdown/favicon/web-app-manifest-192x192.png b/pkg-r/pkgdown/favicon/web-app-manifest-192x192.png index 0f7af116..c2226606 100644 Binary files a/pkg-r/pkgdown/favicon/web-app-manifest-192x192.png and b/pkg-r/pkgdown/favicon/web-app-manifest-192x192.png differ diff --git a/pkg-r/pkgdown/favicon/web-app-manifest-512x512.png b/pkg-r/pkgdown/favicon/web-app-manifest-512x512.png index 7d015b39..4335a3d8 100644 Binary files a/pkg-r/pkgdown/favicon/web-app-manifest-512x512.png and b/pkg-r/pkgdown/favicon/web-app-manifest-512x512.png differ diff --git a/pkg-r/vignettes/build.Rmd b/pkg-r/vignettes/build.Rmd new 
file mode 100644 index 00000000..199e6b88 --- /dev/null +++ b/pkg-r/vignettes/build.Rmd @@ -0,0 +1,443 @@ +--- +title: "Build an App" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Build an App} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + eval = FALSE +) +``` + +While `querychat_app()` provides a quick way to start exploring data, building bespoke Shiny apps with QueryChat unlocks the full power of integrating natural language data exploration with custom visualizations, layouts, and interactivity. This guide shows you how to integrate QueryChat into your own Shiny applications and leverage its reactive data outputs to create rich, interactive dashboards. + +## Starter template + +Integrating QueryChat into a Shiny app requires just three steps: + +1. Initialize a `QueryChat` instance with your data +2. Add the QueryChat UI component (either `$sidebar()` or `$ui()`) +3. 
Use reactive values like `$df()`, `$sql()`, and `$title()` to build outputs that respond to user queries + +Here's a starter template demonstrating these steps: + +```{r} +library(shiny) +library(bslib) +library(querychat) +library(DT) +library(palmerpenguins) + +# Step 1: Initialize QueryChat +qc <- QueryChat$new(penguins) + +# Step 2: Add UI component +ui <- page_sidebar( + sidebar = qc$sidebar(), + card( + card_header("Data Table"), + dataTableOutput("table") + ), + card( + fill = FALSE, + card_header("SQL Query"), + verbatimTextOutput("sql") + ) +) + +# Step 3: Use reactive values in server +server <- function(input, output, session) { + qc_vals <- qc$server() + + output$table <- renderDataTable({ + datatable(qc_vals$df(), fillContainer = TRUE) + }) + + output$sql <- renderText({ + qc_vals$sql() %||% "SELECT * FROM penguins" + }) +} + +shinyApp(ui, server) +``` + +::: {.alert .alert-info} +You'll need to call the `qc$server()` method within your server function to set up QueryChat's reactive behavior, and capture its return value to access reactive data. +::: + +## Reactives + +There are three main reactive values provided by QueryChat for use in your app: + +### Filtered data {#filtered-data} + +The `$df()` method returns the current filtered and/or sorted data frame. This updates whenever the user prompts a filtering or sorting operation through the chat interface (see [Data updating](tools.html#data-updating) for details). + +```{r} +qc_vals <- qc$server() + +output$table <- renderDataTable({ + qc_vals$df() # Returns filtered/sorted data +}) +``` + +You can use `$df()` to power any output in your app - visualizations, summary statistics, data tables, and more. When a user asks to "show only Adelie penguins" or "sort by body mass", `$df()` automatically updates, and any outputs that depend on it will re-render. + +### SQL query {#sql-query} + +The `$sql()` method returns the current SQL query as a string. 
This is useful for displaying the query to users for transparency and reproducibility: + +```{r} +qc_vals <- qc$server() + +output$current_query <- renderText({ + qc_vals$sql() %||% "SELECT * FROM penguins" +}) +``` + +You can also use `$sql()` as a setter to programmatically update the query (see [Programmatic filtering](#programmatic-filtering) below). + +### Title {#title} + +The `$title()` method returns a short description of the current filter, provided by the LLM when it generates a query. For example, if a user asks to "show Adelie penguins", the title might be "Adelie penguins". + +```{r} +qc_vals <- qc$server() + +output$card_title <- renderText({ + qc_vals$title() %||% "All Data" +}) +``` + +`$title()` returns `NULL` when no filter is active. You can also use `$title()` as a setter to update the title programmatically. + +## Custom UI + +In the starter template above, we used the `$sidebar()` method for a simple sidebar layout. In some cases, you might want to place the chat UI somewhere else in your app layout, or more fully customize what goes in the sidebar. The `$ui()` method is designed for this -- it returns the chat component without additional layout wrappers. + +For example, you might want to create some additional controls to [reset filters](#programmatic-filtering) alongside the chat UI: + +```{r} +library(querychat) +library(palmerpenguins) + +qc <- QueryChat$new(penguins) + +ui <- page_sidebar( + sidebar = sidebar( + qc$ui(), # Chat component + actionButton("reset", "Reset Filters", class = "w-100"), + fillable = TRUE, + width = 300 + ) + # Main content here +) +``` + +::: {.alert .alert-info} +**Customizing chat UIs** + +See `{shinychat}`'s [docs](https://posit-dev.github.io/shinychat/r/index.html) to learn more about customizing the chat UI component returned by `qc$ui()`. 
+::: + +## Data views + +Thanks to Shiny's support for interactive visualizations with packages like [plotly](https://plotly.com/r/), it's straightforward to create rich data views that depend on QueryChat data. Here's an example of an app showing both the filtered data and a density plot based on that same data: + +
+ app.R + +```{r} +library(shiny) +library(bslib) +library(querychat) +library(DT) +library(plotly) +library(palmerpenguins) + +qc <- QueryChat$new(penguins, client = "anthropic/claude-sonnet-4-5") + +ui <- page_sidebar( + sidebar = qc$sidebar(), + card( + card_header("Data Table"), + dataTableOutput("table") + ), + card( + card_header("Body Mass by Species"), + plotlyOutput("mass_plot") + ) +) + +server <- function(input, output, session) { + qc_vals <- qc$server() + + output$table <- renderDataTable({ + datatable(qc_vals$df(), fillContainer = TRUE) + }) + + output$mass_plot <- renderPlotly({ + ggplot(qc_vals$df(), aes(x = body_mass_g, fill = species)) + + geom_density(alpha = 0.4) + + theme_minimal() + }) +} + +shinyApp(ui, server) +``` + +
+ +![](images/plotly-data-view.png){alt="Screenshot of a querychat app showing both a data table and a density plot of body mass by species" class="shadow rounded mb-3"} + +A more useful, but slightly more involved example like the one below might incorporate other Shiny components like value boxes to summarize key statistics about the filtered data. + +
+ app.R + +```{r} +library(shiny) +library(bslib) +library(DT) +library(plotly) +library(palmerpenguins) +library(dplyr) +library(bsicons) +library(querychat) + + +qc <- QueryChat$new(penguins) + +ui <- page_sidebar( + title = "Palmer Penguins Analysis", + class = "bslib-page-dashboard", + sidebar = qc$sidebar(), + layout_column_wrap( + width = 1 / 3, + fill = FALSE, + value_box( + title = "Total Penguins", + value = textOutput("count"), + showcase = bs_icon("piggy-bank"), + theme = "primary" + ), + value_box( + title = "Species Count", + value = textOutput("species_count"), + showcase = bs_icon("bookmark-star"), + theme = "success" + ), + value_box( + title = "Avg Body Mass", + value = textOutput("avg_mass"), + showcase = bs_icon("speedometer"), + theme = "info" + ) + ), + layout_columns( + card( + card_header(textOutput("table_title")), + DT::dataTableOutput("data_table") + ), + card( + card_header("Species Distribution"), + plotlyOutput("species_plot") + ) + ), + layout_columns( + card( + card_header("Bill Length Distribution"), + plotlyOutput("bill_length_dist") + ), + card( + card_header("Body Mass by Species"), + plotlyOutput("mass_by_species") + ) + ) +) + +server <- function(input, output, session) { + qc_vals <- qc$server() + + output$count <- renderText({ + nrow(qc_vals$df()) + }) + + output$species_count <- renderText({ + length(unique(qc_vals$df()$species)) + }) + + output$avg_mass <- renderText({ + avg <- mean(qc_vals$df()$body_mass_g, na.rm = TRUE) + paste0(round(avg, 0), "g") + }) + + output$table_title <- renderText({ + qc_vals$title() %||% "All Penguins" + }) + + output$data_table <- DT::renderDataTable({ + DT::datatable( + qc_vals$df(), + fillContainer = TRUE, + options = list( + scrollX = TRUE, + pageLength = 10, + dom = "ti" + ) + ) + }) + + output$species_plot <- renderPlotly({ + plot_ly( + count(qc_vals$df(), species), + x = ~species, + y = ~n, + type = "bar", + marker = list(color = c("#1f77b4", "#ff7f0e", "#2ca02c")) + ) + }) + + 
output$bill_length_dist <- renderPlotly({ + plot_ly( + qc_vals$df(), + x = ~bill_length_mm, + type = "histogram", + nbinsx = 30, + marker = list(color = "#1f77b4", opacity = 0.7) + ) + }) + + output$mass_by_species <- renderPlotly({ + plot_ly( + qc_vals$df(), + x = ~species, + y = ~body_mass_g, + color = ~sex, + type = "box", + colors = c("#1f77b4", "#ff7f0e") + ) + }) +} + +shinyApp(ui = ui, server = server) +``` + +
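One edge case worth handling in dashboards like the one above: a chat-driven filter can match zero rows, in which case summaries such as `mean()` return `NaN`. A small guard keeps the value boxes readable; the `safe_stat()` helper below is a hypothetical sketch, not part of querychat:

```r
# Hypothetical helper (not part of querychat): fall back to a
# placeholder whenever the filtered data frame has no rows.
safe_stat <- function(df, f, empty = "n/a") {
  if (nrow(df) == 0) empty else f(df)
}

# Inside the server function, wrap summary calculations, e.g.:
# output$avg_mass <- renderText({
#   safe_stat(qc_vals$df(), function(df) {
#     paste0(round(mean(df$body_mass_g, na.rm = TRUE), 0), "g")
#   })
# })
```

Because `safe_stat()` receives the already-filtered data frame, the same guard works unchanged for any of the value boxes.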
+ + +## Programmatic updates {#programmatic-filtering} + +QueryChat's reactive state can be updated programmatically. For example, you might want to add a "Reset Filters" button that clears any active filters and returns the data table to its original state. You can do this by setting both the SQL query and title to their default values, so you don't have to rely on the user and the LLM to produce exactly the right prompt. + +```{r} +ui <- page_sidebar( + sidebar = sidebar( + qc$ui(), + hr(), + actionButton("reset", "Reset Filters") + ), + # Main content + card(dataTableOutput("table")) +) + +server <- function(input, output, session) { + qc_vals <- qc$server() + + output$table <- renderDataTable({ + qc_vals$df() + }) + + observeEvent(input$reset, { + qc_vals$sql("") + qc_vals$title(NULL) + }) +} + +shinyApp(ui, server) +``` + +This is equivalent to the user asking the LLM to "reset" or "show all data". + + +## Multiple tables + +Currently, you have two options for exploring multiple tables in QueryChat: + +1. Join the tables into a single table before passing to QueryChat +2. Use multiple QueryChat instances in the same app + +The first option makes it possible to chat with multiple tables inside a single chat interface, whereas the second option requires a separate chat interface for each table. + +::: {.alert .alert-info} +### Multiple filtered tables + +We do intend to support multiple filtered tables in a future release -- if you're interested in this feature, please upvote [the relevant issue](https://github.com/posit-dev/querychat/issues/6). +::: + +
+ app.R + +```{r} +library(shiny) +library(bslib) +library(palmerpenguins) +library(titanic) +library(querychat) + +qc_penguins <- QueryChat$new(penguins) +qc_titanic <- QueryChat$new(titanic_train) + +ui <- page_navbar( + title = "Multiple Datasets", + sidebar = sidebar( + id = "sidebar", + conditionalPanel( + "input.navbar == 'Penguins'", + qc_penguins$ui() + ), + conditionalPanel( + "input.navbar == 'Titanic'", + qc_titanic$ui() + ) + ), + nav_panel( + "Penguins", + card(dataTableOutput("penguins_table")) + ), + nav_panel( + "Titanic", + card(dataTableOutput("titanic_table")) + ), + id = "navbar" +) + +server <- function(input, output, session) { + qc_penguins_vals <- qc_penguins$server() + qc_titanic_vals <- qc_titanic$server() + + output$penguins_table <- renderDataTable({ + qc_penguins_vals$df() + }) + + output$titanic_table <- renderDataTable({ + qc_titanic_vals$df() + }) +} + +shinyApp(ui, server) +``` + +
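The app above demonstrates the second option. For the first option, a join performed before the data reaches querychat is often all you need. As a sketch, assuming two hypothetical tables `orders` and `customers` that share a `customer_id` key:

```r
library(dplyr)
library(querychat)

# Hypothetical stand-in tables sharing a customer_id key
orders <- data.frame(
  customer_id = c(1, 1, 2),
  amount = c(10, 25, 40)
)
customers <- data.frame(
  customer_id = c(1, 2),
  region = c("east", "west")
)

# Join into one wide table, then chat with the combined result
orders_wide <- left_join(orders, customers, by = "customer_id")
querychat_app(orders_wide)
```

A single joined table keeps every column available to one chat interface, at the cost of some duplication across the joined rows.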
+ +## See also + +- [Greet users](greet.html) - Create welcoming onboarding experiences +- [Provide context](context.html) - Help the LLM understand your data better +- [Tools](tools.html) - Understand what QueryChat can do under the hood diff --git a/pkg-r/vignettes/context.Rmd b/pkg-r/vignettes/context.Rmd new file mode 100644 index 00000000..77fc65f8 --- /dev/null +++ b/pkg-r/vignettes/context.Rmd @@ -0,0 +1,120 @@ +--- +title: "Provide Context" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Provide Context} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + eval = FALSE +) +``` + +querychat automatically gathers information about your table to help the LLM write accurate SQL queries. This includes column names and types, numerical ranges, and categorical value examples. (All of this information is provided to the LLM as part of the **system prompt** -- a string of text containing instructions and context for the LLM to consider when responding to user queries.) + +Importantly, the LLM never sees the actual data itself -- it doesn't need to in order to write SQL queries for you. It only needs to understand the structure and schema of your data. + +You can get even better results by customizing the system prompt in three ways: + +1. Add a [data description](#data-description) to provide more context about what the data represents +2. Add [custom instructions](#extra-instructions) to guide the LLM's behavior +3. Use a fully [custom prompt template](#custom-template) if you want complete control (useful if you want to be certain the model cannot see any literal values from your data) + +```{r} +library(querychat) +library(palmerpenguins) +``` + +## Default prompt + +For full visibility into the system prompt that querychat generates for the LLM, you can inspect the `system_prompt` field. 
This is useful for debugging and understanding exactly what context the LLM is using: + +```{r} +qc <- querychat(penguins) +cat(qc$system_prompt) +``` + +By default, the system prompt contains the following components: + +1. The basic set of behaviors and guidelines the LLM must follow in order for querychat to work properly, including how to use [tools](tools.html) to execute queries and update the app. +2. The SQL schema of the data frame you provided. This includes: + - Column names + - Data types (integer, real, boolean, date/datetime, text) + - For text columns with fewer than 10 unique values, we assume they are categorical variables and include the list of values + - For integer and real columns, we include the range +3. A [data description](#data-description) (if provided via `data_description`). +4. [Additional instructions](#extra-instructions) to guide querychat's behavior (if provided via `extra_instructions`). + +## Data description {#data-description} + +If your column names are descriptive, querychat may already work well without additional context. However, if your columns are named `x`, `V1`, `value`, etc., you should provide a data description. Use the `data_description` parameter for this: + +```{r} +qc <- querychat( + penguins, + data_description = "data_description.md" +) + +cat(qc$system_prompt) +``` + +querychat doesn't need this information in any particular format -- just provide what a human would find helpful: + +```markdown + + +This dataset contains information about Palmer Archipelago penguins, +collected for studying penguin populations. 
+ +- species: Penguin species (Adelie, Chinstrap, Gentoo) +- island: Island where observed (Torgersen, Biscoe, Dream) +- bill_length_mm: Bill length in millimeters +- bill_depth_mm: Bill depth in millimeters +- flipper_length_mm: Flipper length in millimeters +- body_mass_g: Body mass in grams +- sex: Penguin sex (male, female) +- year: Year of observation +``` + +## Additional instructions {#extra-instructions} + +You can add custom instructions to guide the LLM's behavior using the `extra_instructions` parameter: + +```{r} +qc <- querychat( + penguins, + extra_instructions = "instructions.md" +) + +cat(qc$system_prompt) +``` + +Or as a string: + +```{r} +instructions <- " +- Use British spelling conventions +- Stay on topic and only discuss the data dashboard +- Refuse to answer unrelated questions +" + +qc <- querychat( + penguins, + extra_instructions = instructions +) + +cat(qc$system_prompt) +``` + +::: {.alert .alert-warning} +LLMs may not always follow your instructions perfectly. Test extensively when changing instructions or models. +::: + +## Custom template {#custom-template} + +If you want more control over the system prompt, you can provide a custom prompt template using the `prompt_template` parameter. This is for more advanced users who want to fully customize the LLM's behavior. See the [QueryChat reference](../reference/QueryChat.html) for details on the available template variables. diff --git a/pkg-r/vignettes/data-sources.Rmd b/pkg-r/vignettes/data-sources.Rmd new file mode 100644 index 00000000..340de5ae --- /dev/null +++ b/pkg-r/vignettes/data-sources.Rmd @@ -0,0 +1,148 @@ +--- +title: "Data Sources" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Data Sources} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + eval = FALSE +) +``` + +`querychat` supports several different data sources, including: + +1. 
Data frames +2. DBI database connections (e.g., SQLite, PostgreSQL, MySQL, DuckDB) +3. Custom `DataSource` interfaces + +The sections below describe how to use each type of data source with `querychat`. + +## Data frames + +You can use any data frame as a data source in `querychat`. Simply pass it to `querychat_app()` (or `QueryChat$new()`): + +```{r} +library(querychat) +library(palmerpenguins) + +querychat_app(penguins) +``` + +Behind the scenes, `querychat` creates an in-memory DuckDB database and registers your data frame as a table for SQL query execution. + +## Database connections + +You can also connect `querychat` directly to a table in any database supported by [DBI](https://dbi.r-dbi.org/). This includes popular databases like SQLite, DuckDB, PostgreSQL, MySQL, and many more. + +Assuming you have a database set up and accessible, you can create a DBI connection and pass it, along with a table name, to `querychat_app()` (or `QueryChat$new()`). Below are some examples for common databases. + +### DuckDB + +```{r} +library(DBI) +library(duckdb) +library(querychat) + +# Connect to a DuckDB database file +con <- dbConnect(duckdb::duckdb(), dbdir = "my_database.duckdb") + +querychat_app(con, "my_table") + +# Don't forget to disconnect when done +# dbDisconnect(con) +``` + +### SQLite + +```{r} +library(DBI) +library(RSQLite) +library(querychat) + +# Connect to a SQLite database file +con <- dbConnect(RSQLite::SQLite(), "my_database.db") + +querychat_app(con, "my_table") + +# Don't forget to disconnect when done +# dbDisconnect(con) +``` + +### PostgreSQL + +```{r} +library(DBI) +library(RPostgres) +library(querychat) + +# Connect to PostgreSQL +con <- dbConnect( + RPostgres::Postgres(), + host = "localhost", + port = 5432, + dbname = "mydatabase", + user = "myuser", + password = "mypassword" +) + +querychat_app(con, "my_table") + +# Don't forget to disconnect when done +# dbDisconnect(con) +``` + +### MySQL + +```{r} +library(DBI) +library(RMariaDB) +library(querychat) + +# Connect to MySQL +con <- dbConnect( + RMariaDB::MariaDB(), 
+ host = "localhost", + port = 3306, + dbname = "mydatabase", + user = "myuser", + password = "mypassword" +) + +querychat_app(con, "my_table") + +# Don't forget to disconnect when done +# dbDisconnect(con) +``` + +## Creating a database from a data frame + +If you don't have a database set up, you can easily create a local DuckDB database from a data frame: + +```{r} +library(DBI) +library(duckdb) + +con <- dbConnect(duckdb::duckdb(), dbdir = "my_database.duckdb") + +# Write a data frame to the database +dbWriteTable(con, "penguins", penguins) + +# Or from CSV +dbExecute(con, " + CREATE TABLE my_table AS + SELECT * FROM read_csv_auto('path/to/your/file.csv') +") +``` + +Then you can connect to this database using the DuckDB example above. + +## Custom sources + +If you have a custom data source that doesn't fit into the above categories, you can implement the `DataSource` interface. +See the [DataSource reference](../reference/DataSource.html) for more details on implementing this interface. diff --git a/pkg-r/vignettes/greet.Rmd b/pkg-r/vignettes/greet.Rmd new file mode 100644 index 00000000..89f8e215 --- /dev/null +++ b/pkg-r/vignettes/greet.Rmd @@ -0,0 +1,78 @@ +--- +title: "Greet Users" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Greet Users} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + eval = FALSE +) +``` + +```{r} +library(querychat) +library(palmerpenguins) +``` + +## Provide a greeting + +When the querychat UI first appears, you will usually want it to greet the user with some basic instructions. By default, these instructions are auto-generated every time a user arrives. In a production setting with multiple users/visitors, this is slow, wasteful, and non-deterministic. 
Instead, you should create a greeting file and pass it via the `greeting` argument: + +```{r} +querychat_app( + penguins, + greeting = "greeting.md" +) +``` + +You can also include clickable suggestion prompts in the greeting file: + +```markdown +* **Filter and sort the data:** + * Show only Adelie penguins + * Filter to penguins with body mass over 4000g + * Sort by flipper length from longest to shortest + +* **Answer questions about the data:** + * What is the average bill length by species? + * How many penguins are on each island? + * Which species has the largest average body mass? +``` + +These suggestions appear in the greeting and automatically populate the chat text box when clicked. + +## Generate a greeting + +If you need help coming up with a greeting, you can use the `$generate_greeting()` method: + +```{r} +library(querychat) + +# Create QueryChat object with your dataset +qc <- querychat(penguins) + +# Generate a greeting (this calls the LLM) +greeting_text <- qc$generate_greeting(echo = "text") +#> Hello! I'm here to help you explore and analyze the penguins dataset. +#> Here are some example prompts you can try: +#> ... + +# Save it for reuse +writeLines(greeting_text, "penguins_greeting.md") +``` + +This approach generates a greeting once and saves it for reuse, avoiding the latency and cost of generating it for every user. 
+ +```{r} +# Then use the saved greeting in your app +querychat_app( + penguins, + greeting = "penguins_greeting.md" +) +``` diff --git a/pkg-r/vignettes/images/multiple-datasets.png b/pkg-r/vignettes/images/multiple-datasets.png new file mode 100644 index 00000000..bef8a22c Binary files /dev/null and b/pkg-r/vignettes/images/multiple-datasets.png differ diff --git a/pkg-r/vignettes/images/plotly-data-view.png b/pkg-r/vignettes/images/plotly-data-view.png new file mode 100644 index 00000000..18d610fd Binary files /dev/null and b/pkg-r/vignettes/images/plotly-data-view.png differ diff --git a/pkg-r/vignettes/images/rich-data-views.png b/pkg-r/vignettes/images/rich-data-views.png new file mode 100644 index 00000000..3485e1b4 Binary files /dev/null and b/pkg-r/vignettes/images/rich-data-views.png differ diff --git a/pkg-r/vignettes/models.Rmd b/pkg-r/vignettes/models.Rmd new file mode 100644 index 00000000..fca9f072 --- /dev/null +++ b/pkg-r/vignettes/models.Rmd @@ -0,0 +1,93 @@ +--- +title: "Models" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Models} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + eval = FALSE +) +``` + +Under the hood, `querychat` is powered by [ellmer](https://ellmer.tidyverse.org/), a library for building chat-based applications with large language models (LLMs). `ellmer` supports a wide range of LLM providers -- [see here](https://ellmer.tidyverse.org/reference/index.html#chat-constructors) for a full list. + +```{r} +library(querychat) +library(palmerpenguins) +library(ellmer) +``` + +## Specify a model + +To use a particular model, pass a `"{provider}/{model}"` string to the `client` parameter. 
Under the hood, this gets passed along to `ellmer::chat()`: + +```{r} +querychat_app(penguins, client = "anthropic/claude-sonnet-4-5") +``` + +If you'd like to set a new default model, you can use the `querychat.client` R option or the `QUERYCHAT_CLIENT` environment variable. + +```{r} +# In your .Rprofile +options(querychat.client = "anthropic/claude-sonnet-4-5") +``` + +Note that it can also be useful to pass a full `Chat` object to the `client` parameter for more advanced use cases (e.g., custom parameters, tools, etc.): + +```{r} +client <- chat_anthropic(model = "claude-sonnet-4-5") +querychat_app(penguins, client = client) +``` + +## Credentials + +Most models require an API key or some other form of authentication. See the reference page for the relevant model provider (e.g., [`chat_anthropic()`](https://ellmer.tidyverse.org/reference/chat_anthropic.html)) to learn more about how to set up credentials. + +::: {.alert .alert-info} +**GitHub model marketplace** + +If you are already set up with GitHub credentials, the [GitHub model marketplace](https://github.com/marketplace/models) provides a free and easy way to get started. See [here](https://ellmer.tidyverse.org/reference/chat_github.html) for more details on how to get set up. + +```{r} +library(ellmer) + +# Just works if GITHUB_TOKEN is set in your environment +client <- chat_github(model = "gpt-4.1") +``` +::: + +In general, most providers will prefer credentials stored as environment variables. Common practice is to use an `.Renviron` file to manage these variables. 
For example, for `chat_openai()`, you might add this to your `.Renviron` file: + +```bash +OPENAI_API_KEY="your_api_key_here" +``` + +A convenient way to edit your `.Renviron` file is: + +```{r} +usethis::edit_r_environ() +``` + +## Recommended models + +In theory, you could use any model that has tool-calling support, but we currently recommend (as of November 2025): + +- GPT-4.1 (the default) +- Claude Sonnet 4.5 +- Google Gemini 3.0 + +In our testing, we've found that those models strike a good balance between accuracy and latency. Smaller/cheaper models like GPT-4o-mini are fine for simple queries but make surprising mistakes with more complex ones, and reasoning models like o3-mini slow down responses without providing meaningfully better results. + +We've also seen some decent results with frontier local models, but even if you have the compute to run the largest of them, they still tend to lag behind the cloud-hosted options in terms of accuracy and speed. + +::: {.alert .alert-info} +**Data privacy concerns?** + +If you have data privacy concerns, consider that your organization may provide access to private instances of these models with data residency guarantees. For example, Azure, AWS Bedrock, and Google Vertex AI all provide private instances of popular LLMs. You can interface with these enterprise providers by passing the right string (e.g., `"bedrock-anthropic"`) or `Chat` object (e.g., `chat_bedrock_anthropic()`) to the `client` parameter. See the [ellmer docs](https://ellmer.tidyverse.org/reference/index.html#chat-constructors) for more details. 
+::: diff --git a/pkg-r/vignettes/tools.Rmd b/pkg-r/vignettes/tools.Rmd new file mode 100644 index 00000000..46454042 --- /dev/null +++ b/pkg-r/vignettes/tools.Rmd @@ -0,0 +1,76 @@ +--- +title: "Tools" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Tools} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + eval = FALSE +) +``` + +querychat combines [tool calling](https://ellmer.tidyverse.org/articles/tool-calling.html) with [reactivity](https://shiny.posit.co/r/articles/build/reactivity-overview.html) to not only execute SQL, but also reactively update dependent data views. Understanding how these tools work will help you see what querychat is capable of and how to customize or extend its behavior. + +One important thing to understand about querychat's tools is that they are R functions, and their execution happens on _your machine_, not on the LLM provider's side. In other words, the SQL queries generated by the LLM are executed locally in the R process running the app. + +querychat provides the LLM access to two tool groups: + +1. **Data updating** - Filter and sort data (without sending results to the LLM). +2. **Data analysis** - Calculate summaries and return results for interpretation by the LLM. + +```{r} +library(querychat) +library(palmerpenguins) +``` + +## Data updating {#data-updating} + +When a user asks to "Show me..." or "Filter to..." or "Sort by...", the LLM requests a call to the `update_dashboard` tool with an appropriate SQL query as input. An important constraint is that the query must return all original schema columns (typically using `SELECT *`). When called, querychat will both set a reactive value holding [the current SQL query](build.html#sql-query) and execute the query to get the result. 
The result of the query is then used to set a reactive value holding the [filtered/sorted data frame](build.html#filtered-data). Thanks to reactivity, this will automatically update any views depending on this data frame, such as the data table displayed in the UI. + +This tool also takes a `title` parameter, which is a short description of the filter/sort operation (e.g., "Adelie penguins"). This is also made available through [a reactive value](build.html#title) for display somewhere in your app. + +Here's a basic example of this tool in action with the pre-built app (`querychat_app()`). Notice how this app not only shows the data table, but also the SQL query and title generated by the LLM (for transparency): + +```{r} +querychat_app(penguins) +``` + +![](../reference/figures/quickstart-filter.png){alt="Screenshot of the querychat app with the penguins dataset filtered." class="shadow rounded"} + +The other data updating tool is `reset_dashboard`, which clears any active filters and returns the data table to its original unfiltered state. The LLM typically uses this when users say "reset", "start over", or "clear filters". + +## Data analysis + +When a user asks analytical questions like "What is the average...?", "How many...?", or "Which item has the highest...?", the LLM generates a SQL query and requests a call to the `query` tool. Unlike the data updating tools, this tool will not update any reactive values. Instead, it will: + +1. Execute the SQL query +2. Display both the SQL query and results in the UI +3. Return the results to the LLM for interpretation + +Here's an example of it in action: + +```{r} +querychat_app(penguins) +``` + +![](../reference/figures/quickstart-summary.png){alt="Screenshot of the querychat app with a summary statistic inlined in the chat." 
class="shadow rounded"} + +## View the source + +If you'd like to better understand how the tools work and how the LLM is prompted to use them, check out the following resources: + +**Source code:** + +- [`querychat_tools.R`](https://github.com/posit-dev/querychat/blob/main/pkg-r/R/querychat_tools.R) + +**Prompts:** + +- [`prompts/tool-update-dashboard.md`](https://github.com/posit-dev/querychat/blob/main/pkg-r/inst/prompts/tool-update-dashboard.md) +- [`prompts/tool-reset-dashboard.md`](https://github.com/posit-dev/querychat/blob/main/pkg-r/inst/prompts/tool-reset-dashboard.md) +- [`prompts/tool-query.md`](https://github.com/posit-dev/querychat/blob/main/pkg-r/inst/prompts/tool-query.md) diff --git a/pyproject.toml b/pyproject.toml index 14e979c6..a98aa48b 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -106,6 +106,7 @@ target-version = "py310" extend-ignore = [ "A002", # Shadowing a built-in "ARG001", # Unused argument + "D103", # Missing docstring in public function "D200", # One-line docstring should fit on one line with quotes "D203", # 1 blank line required before class docstring "D212", # Multi-line docstring summary should start at the first line