
Support CSV Limit Pushdown to Object Storage #2930

@sitano

Description

Describe the bug

If you point the engine at a 10 GB file in S3 remote storage, the following request:

SELECT * FROM test LIMIT 1;

will try to read the WHOLE file (10 GB) instead of just the first chunk that contains the first row.

To Reproduce
Steps to reproduce the behavior:

  1. Put a 1 GB CSV file to S3
  2. Register the s3 contrib object store (the contrib module itself is fine):
    use std::sync::Arc;
    use datafusion::execution::context::SessionContext;
    use datafusion::execution::runtime_env::{RuntimeConfig, RuntimeEnv};
    use datafusion_objectstore_s3::object_store::s3::S3FileSystem;

    //  let mut ctx: Context = Context::new_local(&session_config);
    let mut ctx = {
        let runtime = RuntimeEnv::new(RuntimeConfig::default()).unwrap();
        // Register the contrib S3 object store under the "s3" URI scheme.
        runtime.register_object_store("s3", Arc::new(S3FileSystem::default().await));
        Context::Local(SessionContext::with_config_rt(
            session_config.clone(),
            Arc::new(runtime.clone()),
        ))
    };
  3. CREATE EXTERNAL TABLE test (...) STORED AS CSV WITH HEADER ROW LOCATION 's3://blah/blah.csv';
  4. SELECT * FROM test LIMIT 1;
Observed log output, where the chunk reader requests the entire ~10.4 GB byte range:

list file from: s3://blah/blah.csv
sync_chunk_reader: 0-10428263736
sending get object request blah/blah.csv
ArrowError(ExternalError(Custom { kind: TimedOut, error: AWS("Timeout") }))

Expected behavior

It should read only a chunk large enough to execute the LIMIT 1 query.
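
For illustration, object stores already support this via byte-range reads, so only the engine's read path needs to change. A minimal sketch with the AWS SDK for Rust (aws-sdk-s3); the bucket, key, and 64 KiB range are placeholder assumptions, not values the engine actually uses:

    use aws_sdk_s3::Client;

    // Fetch only the head of the object via an HTTP Range request,
    // instead of streaming all ~10 GB. Bucket, key, and range size
    // are placeholders for this sketch.
    async fn read_object_head(client: &Client) -> Result<bytes::Bytes, aws_sdk_s3::Error> {
        let resp = client
            .get_object()
            .bucket("blah")
            .key("blah.csv")
            .range("bytes=0-65535") // first 64 KiB is plenty for LIMIT 1
            .send()
            .await?;
        // Drain the (already bounded) body into memory.
        Ok(resp.body.collect().await.expect("read body").into_bytes())
    }

A LIMIT that cannot be satisfied from the first chunk would simply fall back to requesting further ranges.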

Additional context

The contrib module is fine... It's the engine that requests this epic length.
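
To make the expected behavior concrete, here is a hypothetical engine-side helper (plain std, not DataFusion's actual code): given a reader backed by a bounded range like the one sketched above, stop consuming as soon as enough rows for the LIMIT have been seen.

    use std::io::{BufRead, BufReader, Read};

    // Hypothetical helper: consume only as many lines as `LIMIT n` needs.
    // `reader` would be backed by a ranged GET; if the chunk runs out
    // before `limit` rows are found, the caller requests the next range.
    // Simplification: treats every newline as a row boundary, so quoted
    // multi-line CSV fields are not handled.
    fn take_limited_rows<R: Read>(reader: R, limit: usize, has_header: bool) -> Vec<String> {
        let header = usize::from(has_header);
        BufReader::new(reader)
            .lines()
            .take(limit + header) // stop reading once enough lines are seen
            .skip(header)         // drop the header row itself
            .filter_map(Result::ok)
            .collect()
    }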
