API for enabling/disabling DDL / DML / Config changes via SQL #7328

@alamb

Is your feature request related to a problem or challenge?

Some people want to use DataFusion as a read-only engine (as we do in IOx, for example). We do not want to allow users to:

  1. Create memory-backed tables (the state is ephemeral, so they won't be able to use them)
  2. Write to local files (via COPY), as this is a security issue
  3. Set session configuration (e.g. batch_size), as this can cause unwanted memory use or denial-of-service attacks

Other users, such as datafusion-cli, want to allow all the features.

Also, DataFusion has gained additional capabilities, such as the ability to INSERT into the included table providers like Csv and Json. It may not be obvious to those building on top of DataFusion that such modifications are allowed, and depending on their use case they may actually be a security risk.

While working on #7272 from @UlfarErl, it became pretty clear that the distinction between APIs that handle read-only SQL and SQL that modifies the catalog is confusing. Additionally, the new COPY command is a normal execution plan, and thus without additional work in IOx (see https://github.com/influxdata/influxdb_iox/pull/8515#discussion_r1297654343 ), DataFusion could allow users to run COPY (and overwrite local files, etc.).

Describe the solution you'd like

Thus I propose adding an API on SessionContext and SessionState with specific options for which types of operations are supported.

Something like:

struct SQLOptions {
  /// allow DDL catalog modification commands (e.g. `CREATE TABLE ...`)
  allow_ddl: bool,
  /// allow DML data modification commands (e.g. `INSERT` and `COPY`)
  allow_dml: bool,
  /// allow configuration changes (e.g. `SET ...`)
  allow_config: bool,
}
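
As a rough illustration of how callers might construct these options, here is a minimal, self-contained sketch. The `Default` impl, `new`, `read_only`, the `with_allow_*` builder methods, and the getter methods are all assumptions made for this example and are not part of the proposal above; only the three field names come from it.

```rust
/// Sketch of the proposed options struct (field names from the proposal).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SQLOptions {
    allow_ddl: bool,
    allow_dml: bool,
    allow_config: bool,
}

impl Default for SQLOptions {
    /// Assumption: the default allows everything, mirroring the existing
    /// behavior of `SessionContext::sql`.
    fn default() -> Self {
        Self {
            allow_ddl: true,
            allow_dml: true,
            allow_config: true,
        }
    }
}

impl SQLOptions {
    /// Hypothetical constructor: start with everything allowed.
    pub fn new() -> Self {
        Self::default()
    }

    /// Hypothetical convenience constructor for read-only use (e.g. IOx).
    pub fn read_only() -> Self {
        Self {
            allow_ddl: false,
            allow_dml: false,
            allow_config: false,
        }
    }

    /// Hypothetical builder: toggle DDL (e.g. `CREATE TABLE ...`).
    pub fn with_allow_ddl(mut self, allow: bool) -> Self {
        self.allow_ddl = allow;
        self
    }

    /// Hypothetical builder: toggle DML (e.g. `INSERT` and `COPY`).
    pub fn with_allow_dml(mut self, allow: bool) -> Self {
        self.allow_dml = allow;
        self
    }

    /// Hypothetical builder: toggle configuration changes (e.g. `SET ...`).
    pub fn with_allow_config(mut self, allow: bool) -> Self {
        self.allow_config = allow;
        self
    }

    pub fn allow_ddl(&self) -> bool {
        self.allow_ddl
    }

    pub fn allow_dml(&self) -> bool {
        self.allow_dml
    }

    pub fn allow_config(&self) -> bool {
        self.allow_config
    }
}
```

With this shape, a read-only embedder could write `SQLOptions::read_only()`, while something like datafusion-cli would keep `SQLOptions::new()`. Defaulting to "allow everything" keeps existing callers unchanged, at the cost of making the restrictive mode opt-in.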

And then add this:

impl SessionContext {
  /// Existing API will allow all types of SQL
  pub async fn sql(&self, sql: &str) -> Result<DataFrame> {
    self.sql_with_options(sql, SQLOptions {
      allow_ddl: true,
      allow_dml: true,
      allow_config: true,
    }).await
  }

  /// New API will generate errors if a type of command is not allowed
  pub async fn sql_with_options(&self, sql: &str, options: SQLOptions) -> Result<DataFrame> {
    let plan = ...;
    if is_dml(&plan) && !options.allow_dml {
      return plan_err!("DML Plan {plan} is not allowed");
    }
    ...
  }
}
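
The check in `sql_with_options` would presumably extend to all three categories. Below is a minimal, self-contained sketch of that logic. It does not use DataFusion types: `PlanKind`, `NotAllowed`, and `verify_plan` are hypothetical names standing in for however the real plan would be classified (the `is_dml(&plan)` step above) and however the error would be reported.

```rust
use std::fmt;

/// Sketch of the proposed options struct (field names from the proposal).
#[derive(Debug, Clone, Copy)]
pub struct SQLOptions {
    pub allow_ddl: bool,
    pub allow_dml: bool,
    pub allow_config: bool,
}

/// Hypothetical classification of a planned statement. In DataFusion this
/// would be derived from the logical plan; here it is a plain enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PlanKind {
    /// Read-only query such as `SELECT ...`
    Query,
    /// Catalog modification such as `CREATE TABLE ...`
    Ddl,
    /// Data modification such as `INSERT` or `COPY`
    Dml,
    /// Configuration change such as `SET ...`
    Config,
}

/// Hypothetical error returned when a plan kind is disabled.
#[derive(Debug, PartialEq, Eq)]
pub struct NotAllowed(pub PlanKind);

impl fmt::Display for NotAllowed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?} plan is not allowed", self.0)
    }
}

impl std::error::Error for NotAllowed {}

/// Returns `Ok(())` if `kind` is permitted by `options`, otherwise an error.
/// Read-only queries are always permitted.
pub fn verify_plan(kind: PlanKind, options: &SQLOptions) -> Result<(), NotAllowed> {
    let allowed = match kind {
        PlanKind::Query => true,
        PlanKind::Ddl => options.allow_ddl,
        PlanKind::Dml => options.allow_dml,
        PlanKind::Config => options.allow_config,
    };
    if allowed {
        Ok(())
    } else {
        Err(NotAllowed(kind))
    }
}
```

Running the check on the plan before execution means a disallowed `COPY` or `INSERT` is rejected at planning time rather than partway through writing a file. A real implementation would also need to look inside nested plans, which this sketch ignores.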

Describe alternatives you've considered

No response

Additional context

Related to an earlier proposal #4720

Labels: enhancement (New feature or request)
