@kaxil (Member) commented Sep 23, 2025

Improves storage efficiency and query performance for DAG serialization data by using PostgreSQL's native JSONB type instead of JSON. JSONB stores a parsed binary representation rather than the raw text, so it supports equality comparison and indexing directly, and it discards insignificant whitespace and duplicate keys (the last value wins). It would also have made the query in #55975 simpler.
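
To make the whitespace and duplicate-key point concrete, here is a minimal sketch in plain Python. It is only an analogy: `normalize_like_jsonb` is a hypothetical helper, not part of Airflow or PostgreSQL, and JSONB's real storage is a binary format rather than sorted text. What it shares with JSONB is the observable behaviour: insignificant whitespace disappears, a repeated key keeps its last value, and key order stops mattering for comparison.

```python
import json


def normalize_like_jsonb(text: str) -> str:
    """Parse JSON text and re-serialize it in one canonical form (illustrative only)."""
    # json.loads keeps the last value when an object key is repeated.
    parsed = json.loads(text)
    # Compact separators drop insignificant whitespace; sort_keys fixes the key order.
    return json.dumps(parsed, separators=(",", ":"), sort_keys=True)


# Two documents that differ as raw text but carry the same data.
a = '{"dag_id": "example",   "dag_id": "final", "tasks": [1, 2]}'
b = '{"tasks":[1,2],"dag_id":"final"}'

print(a == b)  # prints False
print(normalize_like_jsonb(a) == normalize_like_jsonb(b))  # prints True
```

With a plain `json` column the two inputs above would be stored as different text; after normalization they compare equal, which is roughly why equality on JSONB is meaningful.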



@ashb (Member) commented Sep 23, 2025

airflow-core/tests/unit/cli/commands/test_dag_command.py::TestCliDags::test_show_dag_dependencies_save - sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedFunction) function json_extract_path(jsonb, unknown, unknown) does not exist

That's now failing because we call json_extract_path on a jsonb column, and that function only accepts json (the jsonb counterpart is jsonb_extract_path). However, I'm not sure we need that function or the custom type at all. Can't we use the JSON path index operations in SQLAlchemy (even 1.4 had them): https://docs.sqlalchemy.org/en/14/core/type_basics.html#:~:text=path%20index%20operations

data_table.c.data[('key_1', 'key_2', 5, ..., 'key_n')]

The Postgres docs describe json_extract_path like this:

Extracts JSON sub-object at the specified path. (This is functionally equivalent to the #> operator, but writing the path out as a variadic list can be more convenient in some cases.)

In [36]: metadata = MetaData()

In [37]: data_table = Table('data_table', metadata,
    ...:     Column('id', Integer, primary_key=True),
    ...:     Column('data', JSONB)
    ...: )

In [38]: clause = select(data_table.c.data[('key_1', 'key_2', 5, 'key_n')])

In [39]: str(clause.compile(dialect=_25, compile_kwargs={"literal_binds": True}))
Out[39]: "SELECT data_table.data #> '{key_1, key_2, 5, key_n}' AS anon_1 \nFROM data_table"

Edit: ah, the custom type is in the HITL model.
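
For readers unfamiliar with `#>`, the following is a rough sketch of what path extraction does, emulated in plain Python so it runs without a database. `extract_path` is a hypothetical helper written for this illustration, not a SQLAlchemy or PostgreSQL API, and the sample document is made up. One simplification to be aware of: it returns `None` both for a missing path and for a JSON `null`, whereas PostgreSQL distinguishes SQL NULL from a JSON null value. Note also that `_25` in the session above refers to an earlier IPython output, presumably a PostgreSQL dialect object.

```python
def extract_path(doc, path):
    """Walk `doc` along `path`; return None as soon as a step cannot be followed."""
    current = doc
    for step in path:
        if isinstance(current, dict):
            # Path elements are text in PostgreSQL, so object keys are matched as strings.
            key = str(step)
            if key not in current:
                return None
            current = current[key]
        elif isinstance(current, list):
            # Array steps must be integers (or text that parses as one).
            try:
                index = int(step)
            except (TypeError, ValueError):
                return None
            # Python-style negative indexes are accepted in this sketch.
            if not -len(current) <= index < len(current):
                return None
            current = current[index]
        else:
            # Scalars have no children, so the path cannot continue.
            return None
    return current


# Made-up document shaped to match the ('key_1', 'key_2', 5, 'key_n') path above.
doc = {"key_1": {"key_2": [0, 1, 2, 3, 4, {"key_n": "found"}]}}

print(extract_path(doc, ("key_1", "key_2", 5, "key_n")))  # prints found
print(extract_path(doc, ("key_1", "missing")))  # prints None
```

The tuple passed as `path` plays the same role as the tuple used to index `data_table.c.data[...]` in the session above: each element selects either an object key or an array position.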

@kaxil added this to the Airflow 3.2.0 milestone Sep 23, 2025
@kaxil merged commit 3b2242b into apache:main Sep 23, 2025
109 checks passed
@kaxil deleted the posgres-jsin branch September 23, 2025 14:16
abdulrahman305 bot pushed a commit to abdulrahman305/airflow that referenced this pull request Sep 30, 2025
…e#55979)
