Skip to content

Subsequent JOIN s don't match to correct row #6744

@MidasLamb

Description

@MidasLamb

Describe the bug

I have a query where I JOIN three tables, and then have a WHERE clause on the last table ("twintag.owner"). If I give a value that exists there, all the results are returned (regardless of whether or not the "twintag.owner" matches the actual value), and if I give a value that doesn't exist, it returns nothing.

The query is as follows:

SELECT "75620d04a8b7b34c70c68f7af68953af".*
FROM "75620d04a8b7b34c70c68f7af68953af" 
JOIN "f4be58656cd5e54b9bffa1706c3c49ff" AS "linkedtwintag"
    ON "75620d04a8b7b34c70c68f7af68953af"."82ad7656c6b6be56e9abfac772c89b4c" = "linkedtwintag"."776ff32c7d7bfb1c1a49d5c0d94e0db8" 
JOIN "__bag" AS "linkedtwintag_twintag"
    ON "linkedtwintag"."dataScope" = "linkedtwintag_twintag"."storage_qid" 
WHERE 1 = 1 
AND "linkedtwintag_twintag"."owner" = '324ae4533db15dcb19e3473fea54177a' 
ORDER BY "75620d04a8b7b34c70c68f7af68953af"."qid" ASC

If I start from the intermediate table it works as expected and I get back the one result I'm looking for:

SELECT "f4be58656cd5e54b9bffa1706c3c49ff".*
FROM "f4be58656cd5e54b9bffa1706c3c49ff" 
JOIN "__bag" AS "twintag"
    ON "f4be58656cd5e54b9bffa1706c3c49ff"."dataScope" = "twintag"."storage_qid"
WHERE 1 = 1 
AND "twintag"."owner" = '324ae4533db15dcb19e3473fea54177a'  ORDER BY "f4be58656cd5e54b9bffa1706c3c49ff"."qid" ASC

To Reproduce

Create some tables where you can construct a JOIN from table A to B and from B to C.
Execute the (type of) query, where you use the intermediate table to join from table A to C, filtering on something in table C:

SELECT "75620d04a8b7b34c70c68f7af68953af".*
FROM "75620d04a8b7b34c70c68f7af68953af" 
JOIN "f4be58656cd5e54b9bffa1706c3c49ff" AS "linkedtwintag"
    ON "75620d04a8b7b34c70c68f7af68953af"."82ad7656c6b6be56e9abfac772c89b4c" = "linkedtwintag"."776ff32c7d7bfb1c1a49d5c0d94e0db8" 
JOIN "__bag" AS "linkedtwintag_twintag"
    ON "linkedtwintag"."dataScope" = "linkedtwintag_twintag"."storage_qid" 
WHERE 1 = 1 
AND "linkedtwintag_twintag"."owner" = '324ae4533db15dcb19e3473fea54177a' 
ORDER BY "75620d04a8b7b34c70c68f7af68953af"."qid" ASC

Expected behavior

I expect only the items which match the filter on table C to be returned.
Currently it returns either ALL items if an item matches, or NO items if there is no match found. In the example above I expect 1 match, but I get all the items from table A back.
If I change the value I'm looking for in table C to be non-existent there, I get back no results instead, which is expected.

Additional context

I'm also using datafusion-remote-table from seafowl (https://github.com/splitgraph/seafowl/tree/main/datafusion_remote_tables)

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions