datafusion-cli fails to run ClickBench queries with 8GB of RAM #18473

@alamb

Describe the bug

@pmcgleenon reports on #17721 (comment):

The instance type c6a.xlarge only has 8GB RAM (the clickbench dataset is 15GB) and there are a number of issues with it, including

...
OOM errors happen during test execution, with results reported as null for several tests

To Reproduce

  1. Use docker to build datafusion:
cd .devcontainer
docker build -t datafusion-build .
cd ..
docker run -m 4G -v `pwd`:/datafusion  -it datafusion-build   /bin/bash

Now, in the docker container

# build datafusion-cli
cd /datafusion
cargo install   --profile=release-nonlto --path datafusion-cli
# Get benchmark data
cd /datafusion/benchmarks
./bench.sh data clickbench_partitioned
cd /datafusion/benchmarks/data
# make symlink to hits so queries can run without modification
ln -s hits_partitioned hits

# run the queries
for q in `ls ../queries/clickbench/queries/*.sql` ; do datafusion-cli -f $q ; done

This is the same loop, expanded to print each query's name before running it:

for q in `ls ../queries/clickbench/queries/*.sql` ; do
  echo "Running $q..." ;
  datafusion-cli -f $q ;
done
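To collect the list of killed queries without reading the terminal output by hand, the loop can check each exit status. This is a minimal sketch, assuming bash and assuming that an OOM-killed process receives SIGKILL, which the shell reports as exit status 137 (128 + 9). The function names `classify_status` and `run_queries` are hypothetical helpers, not part of datafusion-cli:

```shell
#!/usr/bin/env bash

# Map a shell exit status to a short label.
# Status 137 = 128 + 9 (SIGKILL), the signal sent by the kernel OOM killer.
classify_status() {
  case "$1" in
    0)   echo "ok" ;;
    137) echo "killed" ;;
    *)   echo "failed" ;;
  esac
}

# Run every .sql file in a directory and print "<label> <file>" per query.
# $1: directory containing the query files
# remaining args: command to run on each file (default: datafusion-cli -f)
run_queries() {
  local dir="$1"; shift
  local cmd=("$@")
  if [ "${#cmd[@]}" -eq 0 ]; then
    cmd=(datafusion-cli -f)
  fi
  local q status
  for q in "$dir"/*.sql; do
    "${cmd[@]}" "$q" > /dev/null 2>&1
    status=$?
    echo "$(classify_status "$status") $q"
  done
}
```

For example, `run_queries ../queries/clickbench/queries | grep '^killed'` would print only the queries that ended with status 137. A status of 137 indicates SIGKILL but does not by itself prove the cause was memory pressure.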

You'll see queries get killed due to OOM like this:

Running ../queries/clickbench/queries/q18.sql...
DataFusion CLI v50.3.0
bash: line 1: 57537 Killed                  datafusion-cli -f $q
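The `Killed` message means the process received SIGKILL. One way to check whether the container's memory limit was the cause is to read the cgroup counters from inside the container. This is a minimal sketch, assuming the container uses cgroup v2, where the limit is exposed in `/sys/fs/cgroup/memory.max` and OOM kills are counted in the `oom_kill` field of `/sys/fs/cgroup/memory.events`. The helper name `oom_kill_count` is hypothetical, and the paths differ on cgroup v1 hosts:

```shell
# Print the oom_kill counter from a cgroup v2 memory.events file.
# $1: path to the events file (defaults to the container's own cgroup).
oom_kill_count() {
  local events="${1:-/sys/fs/cgroup/memory.events}"
  awk '$1 == "oom_kill" { print $2 }' "$events"
}

# Show the memory limit in bytes ("max" means unlimited), if the file exists.
if [ -r /sys/fs/cgroup/memory.max ]; then
  echo "memory limit: $(cat /sys/fs/cgroup/memory.max)"
fi
```

Comparing the output of `oom_kill_count` before and after a query shows whether an OOM kill occurred inside the container during that run.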

The queries that are killed are:

Running ../queries/clickbench/queries/q18.sql...
Running ../queries/clickbench/queries/q20.sql...
Running ../queries/clickbench/queries/q22.sql...
Running ../queries/clickbench/queries/q23.sql...
Running ../queries/clickbench/queries/q32.sql...
Running ../queries/clickbench/queries/q33.sql...
Running ../queries/clickbench/queries/q34.sql...
Running ../queries/clickbench/queries/q35.sql...

You can find these queries here: https://github.com/apache/datafusion/tree/main/benchmarks/queries/clickbench/queries

Expected behavior

The queries should not be killed

Additional context

This ticket tracks build OOM'ing:
