Improve read performance

**Is your feature request related to a problem? Please describe.**
We are slightly disappointed by the read performance from DB2 databases via this Python package. We use `ibm_db_dbi` and pandas `read_sql` to read in data. This calls into https://github.com/ibmdb/python-ibmdb/blob/c3aaf0229d0fe546c346434bf9fde37169b7c5ff/ibm_db_dbi.py#L1764
where the result is processed in a Python loop and every fetched tuple is appended to a list. This is suboptimal performance-wise.
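
The pattern described above can be illustrated without a database. This is only a minimal sketch, not the actual `ibm_db_dbi` code: `FakeStatement` and its `fetch_tuple` method are hypothetical stand-ins for a driver handle that returns one row per call.

```python
from typing import Optional


class FakeStatement:
    """Hypothetical stand-in for a driver statement handle (not part of ibm_db)."""

    def __init__(self, rows: list[tuple]) -> None:
        self._rows = iter(rows)

    def fetch_tuple(self) -> Optional[tuple]:
        # Return the next row, or None once the result set is exhausted.
        return next(self._rows, None)


def fetch_all_row_by_row(stmt: FakeStatement) -> list[tuple]:
    """Collect a result set one tuple at a time in a Python-level loop."""
    rows = []
    row = stmt.fetch_tuple()
    while row is not None:
        rows.append(row)  # one Python call and one append per fetched row
        row = stmt.fetch_tuple()
    return rows


result = fetch_all_row_by_row(FakeStatement([(1, "a"), (2, "b"), (3, "c")]))
print(result)
```

The per-row cost is small, but for large result sets it is paid once per row inside the interpreter.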

A faster option has been available since the end of last year thanks to PR https://github.com/ibmdb/python-ibmdb/pull/971: `fetchall`, implemented in https://github.com/ibmdb/python-ibmdb/blob/c3aaf0229d0fe546c346434bf9fde37169b7c5ff/ibm_db.c#L17042
Here, the loop is implemented in C, but there are still a lot of Python-level checks for every fetched tuple. It is about 4x faster than the `ibm_db_dbi` interface.
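
For reference, calling `fetchall` directly might look roughly like the sketch below. It assumes the installed `ibm_db` version already includes the `fetchall` function from the PR above. `read_with_fetchall`, `conn_str` and `sql` are hypothetical names used only for illustration.

```python
def read_with_fetchall(conn_str: str, sql: str) -> list:
    """Run a query and fetch all rows via the C-level loop in ibm_db."""
    import ibm_db  # imported lazily so the sketch loads without the driver

    # With a full connection string, user and password are passed as empty strings.
    conn = ibm_db.connect(conn_str, "", "")
    try:
        stmt = ibm_db.exec_immediate(conn, sql)
        # Assumption: fetchall is available in the installed ibm_db version.
        return ibm_db.fetchall(stmt)
    finally:
        ibm_db.close(conn)
```

The returned rows can then be handed to `pandas.DataFrame` or similar, which skips the `ibm_db_dbi` cursor layer entirely.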

Other packages provide much better performance. For example, the Rust crate https://github.com/pacman82/arrow-odbc, wrapped for Python in https://github.com/pacman82/arrow-odbc-py, calls the IBM driver via the ODBC interface and bulk-reads the result in Rust into the Arrow format. The Python wrapper of the Rust crate has a very simple syntax:
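```python
import arrow_odbc
import polars as pl

# connection_string: ODBC connection string for the DB2 database (defined elsewhere)
reader = arrow_odbc.read_arrow_batches_from_odbc(
    query="""
        SELECT col1, col2, ...
        FROM TEST_PULL
    """,
    connection_string=connection_string,
    batch_size=10_000,
)

# Collect the Arrow record batches produced by the reader
batches = list(reader)

df = pl.from_arrow(batches)  # polars DataFrame
```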

A small test gave us the following performance chart (lower is better):

![Image](https://github.com/user-attachments/assets/1b5e3542-2c92-4f7d-b7ff-7620dd97f2c9)
Note the logarithmic scale of the axis.

In summary: `arrow-odbc-py` gives us a ~15x performance boost over the `ibm_db_dbi` interface and a ~4x boost over `fetchall` from `ibm_db`.
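
A comparison like this could be reproduced with a small timing harness. The following is a sketch only, not the script we used: `time_reader` and `speedup` are hypothetical helpers, and the two lambdas are placeholder workloads standing in for the real read functions.

```python
import time
from typing import Callable


def time_reader(read: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        read()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Placeholder workloads; replace with the actual read functions being compared.
slow = time_reader(lambda: [tuple(range(5)) for _ in range(200_000)])
fast = time_reader(lambda: list(range(200_000)))
print(f"speedup: {speedup(slow, fast):.1f}x")
```

Taking the best of several runs reduces noise from caching and connection setup, which matters when the gap between readers is small.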

**Describe the solution you'd like**
We would like to continue to use `ibm_db_dbi`, but the performance gap to other packages is too large. Please improve the performance of reading from DB2.

**Describe alternatives you've considered**
Other packages, like `arrow-odbc-py`, allow for significantly faster reads.

Improve read performance #1004