2024-12-16 12:48:10,731:ERROR:flask_appbuilder.api:Unrecognized type: 24
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/api/__init__.py", line 110, in wraps
    return f(self, *args, **kwargs)
  File "/app/superset/views/base_api.py", line 127, in wraps
    raise ex
  File "/app/superset/views/base_api.py", line 121, in wraps
    duration, response = time_function(f, self, *args, **kwargs)
  File "/app/superset/utils/core.py", line 1470, in time_function
    response = func(*args, **kwargs)
  File "/app/superset/utils/log.py", line 255, in wrapper
    value = f(*args, **kwargs)
  File "/app/superset/databases/api.py", line 742, in table_metadata
    table_info = get_table_metadata(database, table_name, schema_name)
  File "/app/superset/databases/utils.py", line 67, in get_table_metadata
    columns = database.get_columns(table_name, schema_name)
  File "/app/superset/models/core.py", line 839, in get_columns
    return self.db_engine_spec.get_columns(
  File "/app/superset/db_engine_specs/base.py", line 1341, in get_columns
    cast(list[SQLAColumnType], inspector.get_columns(table_name, schema))
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 497, in get_columns
    col_defs = self.dialect.get_columns(
  File "<string>", line 2, in get_columns
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 55, in cache
    ret = fn(self, con, *args, **kw)
  File "/usr/local/lib/python3.10/site-packages/flightsql/sqlalchemy.py", line 87, in get_columns
    return connection.connection.flightsql_get_columns(table, schema)
  File "/usr/local/lib/python3.10/site-packages/flightsql/util.py", line 8, in g
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/flightsql/dbapi.py", line 173, in flightsql_get_columns
    reader = ipc.open_stream(table_schema)
  File "/usr/local/lib/python3.10/site-packages/pyarrow/ipc.py", line 190, in open_stream
    return RecordBatchStreamReader(source, options=options,
  File "/usr/local/lib/python3.10/site-packages/pyarrow/ipc.py", line 52, in __init__
    self._open(source, options=options, memory_pool=memory_pool)
  File "pyarrow/ipc.pxi", line 929, in pyarrow.lib._RecordBatchStreamReader._open
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Unrecognized type: 24
Bug description
Currently, Superset requires pyarrow>=14.0.1,<15, but this creates compatibility issues when working with databases that return StringView types (introduced in PyArrow 16).
I've tested Superset with PyArrow 18.1.0 and verified it works correctly in my (admittedly bare-bones) setup. This update would:
Proposed change:
Update the pyarrow dependency in pyproject.toml from:
"pyarrow>=14.0.1, <15"
to:
"pyarrow>=14.0.1, <19"
Screenshots/recordings
No response
Superset version
master / latest-dev
Python version
3.10
Node version
Not applicable
Browser
Not applicable
Additional context
I'm using the https://github.com/influxdata/flightsql-dbapi DB API 2 layer to query a database that returns native Arrow arrays. It returns StringView types that pyarrow 14 can't read, which is what produces the "Unrecognized type: 24" error above. I force-upgraded to pyarrow 18.1 and it started working.
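For anyone who wants to check their own environment, here is a minimal sketch of a local sanity check. It writes a small StringView column to an Arrow IPC stream and reads it back with `ipc.open_stream`, the same call that fails in the traceback. The function name `string_view_roundtrip_status` is made up for illustration, and the check does not involve Flight SQL or Superset at all. On a pyarrow version without `pa.string_view` it cannot even build the test column, so it only reports that the type is missing rather than reproducing the server-side failure.

```python
def string_view_roundtrip_status():
    """Report whether the installed pyarrow can round-trip a StringView column.

    Hypothetical helper for illustration only; returns a human-readable status.
    """
    try:
        import pyarrow as pa
        import pyarrow.ipc as ipc
    except ImportError:
        return "pyarrow is not installed"

    # Older releases do not expose the string_view type factory at all.
    if not hasattr(pa, "string_view"):
        return f"pyarrow {pa.__version__} has no string_view type"

    try:
        # Build a one-column table whose column uses the StringView layout.
        column = pa.array(["a", "b"], type=pa.string_view())
        table = pa.table({"name": column})

        # Serialize to an in-memory IPC stream.
        sink = pa.BufferOutputStream()
        with ipc.new_stream(sink, table.schema) as writer:
            writer.write_table(table)

        # Read it back the same way flightsql-dbapi does in the traceback.
        reader = ipc.open_stream(sink.getvalue())
        result = reader.read_all()
    except pa.ArrowException as exc:
        return f"pyarrow {pa.__version__} failed: {exc}"

    return f"pyarrow {pa.__version__} read back {result.num_rows} string_view rows"


if __name__ == "__main__":
    print(string_view_roundtrip_status())
```

Running this inside the Superset container before and after changing the pin should show whether the installed pyarrow understands the type.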