Skip to content

Queries with external data specified are not parsed correctly in the clickhouse server #26046

@Zark1999

Description

@Zark1999

Bug Description

Clickhouse Version: 21.3.4.25
JDBC Version: 0.3.1-patch

When executing queries with several ClickhouseExternalData, the server side has a possiblility that the parsed sql (store in system.query_log) is incorrect, which causes either syntax error or missing column exception.


e.g.
Querying about the avg, sum, min, max and any of column Ca with some extra conditions
L0, L1 ... and L9 are ClickhouseExternalData specified on the client side that pass along with the http uri

A correct query would be

select avgIf(Ca, Cb in L0) as avgCa1, sumIf(Ca, Cb in L0) as sumCa1, minIf(Ca, Cb in L0) as minCa1, maxIf(Ca, Cb in L0) as maxCa1, anyIf(Ca, Cb in L0) as anyCa1, avgIf(Ca, Cb in L1) as avgCa2, sumIf(Ca, Cb in L1) as sumCa2, minIf(Ca, Cb in L1) as minCa2, maxIf(Ca, Cb in L1) as maxCa2, anyIf(Ca, Cb in L1) as anyCa2, avgIf(Ca, Cb in L2) as avgCa3, sumIf(Ca, Cb in L2) as sumCa3, minIf(Ca, Cb in L2) as minCa3, maxIf(Ca, Cb in L2) as maxCa3, anyIf(Ca, Cb in L2) as anyCa3, avgIf(Ca, Cb in L3) as avgCa4, sumIf(Ca, Cb in L3) as sumCa4, minIf(Ca, Cb in L3) as minCa4, maxIf(Ca, Cb in L3) as maxCa4, anyIf(Ca, Cb in L3) as anyCa4, avgIf(Ca, Cb in L4) as avgCa5, sumIf(Ca, Cb in L4) as sumCa5, minIf(Ca, Cb in L4) as minCa5, maxIf(Ca, Cb in L4) as maxCa5, anyIf(Ca, Cb in L4) as anyCa5, avgIf(Ca, Cb in L5) as avgCa6, sumIf(Ca, Cb in L5) as sumCa6, minIf(Ca, Cb in L5) as minCa6, maxIf(Ca, Cb in L5) as maxCa6, anyIf(Ca, Cb in L5) as anyCa6, avgIf(Ca, Cb in L6) as avgCa7, sumIf(Ca, Cb in L6) as sumCa7, minIf(Ca, Cb in L6) as minCa7, maxIf(Ca, Cb in L6) as maxCa7, anyIf(Ca, Cb in L6) as anyCa7, avgIf(Ca, Cb in L7) as avgCa8, sumIf(Ca, Cb in L7) as sumCa8, minIf(Ca, Cb in L7) as minCa8, maxIf(Ca, Cb in L7) as maxCa8, anyIf(Ca, Cb in L7) as anyCa8, avgIf(Ca, Cb in L8) as avgCa9, sumIf(Ca, Cb in L8) as sumCa9, minIf(Ca, Cb in L8) as minCa9, maxIf(Ca, Cb in L8) as maxCa9, anyIf(Ca, Cb in L8) as anyCa9, avgIf(Ca, Cb in L9) as avgCa10, sumIf(Ca, Cb in L9) as sumCa10, minIf(Ca, Cb in L9) as minCa10, maxIf(Ca, Cb in L9) as maxCa10, anyIf(Ca, Cb in L9) as anyCa10 from test.test_table where TIMESTAMP >= 1625644740 and TIMESTAMP < 1625644800 and Cc = 'some value'
FORMAT TabSeparatedWithNamesAndTypes

It is supposed to have two lines , whereas the first line is a large query and the second line is added by jdbc.

However, in the query_log I saw queries like

select avgIf(Ca, Cb in L0) as avgCa1, sumIf(Ca, Cb in L0) as sumCa1, minIf(Ca, Cb in L0) as minCa1, maxIf(Ca, Cb in L0) as maxCa1, anyIf(Ca, Cb in L0) as anyCa1, avgIf(Ca, Cb in L1) as avgCa2, sumIf(Ca, Cb in L1) as sumCa2, minIf(Ca, Cb in L1) as minCa2, maxIf(Ca, Cb in L1) as maxCa2, anyIf(Ca, Cb in L1) as anyCa2, avgIf(Ca, Cb in L2) as avgCa3, sumIf(Ca, Cb in L2) as sumCa3, minIf(Ca, Cb in L2) as minCa3, maxIf(Ca, Cb in L2) as maxCa3, anyIf(Ca, Cb in L2) as anyCa3, avgIf(Ca, Cb in L3) as avgCa4, sumIf(Ca, Cb in L3) as sumCa4, minIf(Ca, Cb in L3) as minCa4, maxIf(Ca, Cb in L3) as maxCa4, anyIf(Ca, Cb in L3) as anyCa4, avgIf(Ca, Cb in L4) as avgCa5, sumIf(Ca, Cb in L4) as sumCa5, minIf(Ca, Cb in L4) as minCa5, maxIf(Ca, Cb in L4) as maxCa5, anyIf(Ca, Cb in L4) as anyCa5, avgIf(Ca, Cb in L5) as avgCa6, sumIf(Ca, Cb in L5) as sumCa6, minIf(Ca, Cb in L5) as minCa6, maxIf(Ca, Cb in L5) as maxCa6, anyIf(Ca, Cb in L5) as anyCa6, avgIf(Ca, Cb in L6) as avgCa7, sumIf(Ca, Cb in L6) as sumCa7, minIf(Ca, Cb in L6) as minCa7, maxIf(Ca, Cb in L6) as maxCa7, anyIf(Ca, Cb in L6) as anyCa7, avgIf(Ca, Cb in L7) as avgCa8, sumIf(Ca, Cb in L7) as sumCa8, minIf(Ca, Cb in L7) as minCa8, maxIf(Ca, Cb in L7) as maxCa8, anyIf(Ca, Cb in L7) as anyCa8, avgIf(Ca, Cb in L8) as avgCa9, sumIf(Ca, Cb in L8) as sumCa9, minIf(Ca, Cb in L8) as minCa9, maxIf(Ca, Cb in L8) as maxCa9, anyIf(Ca, Cb in L8) as anyCa9, avgIf(Ca, Cb in L9) as avgCa10, sumIf(Ca, Cb in L9) as sumCa10, minIf(Ca, Cb in L9) as minCa10, maxIf(Ca, Cb in L9) as maxCa10, anyIf(Ca, Cb in L9) as anyCa10 from test.test_table where TIMESTAMP >= 1625646900 and TIMESTAMP < 1625646960 and Cc = 'some value'
 TIMESTAMP >= 1625646900 and TIMESTAMP < 1625646960 and Cc = 'some value'
FORMAT TabSeparatedWithNamesAndTypes

and

select avgIf(Ca, DN in L0) as avgCa1, sumIf(Ca, DN in L0) as sumCa1, minIf(Ca, DN in L0) as minCa1, maxIf(Ca, DN in L0) as maxCa1, anyIf(Ca, DN in L0) as anyCa1, avgIf(Ca, DN in L1) as avgCa2, sumIf(Ca, DN in L1) as sumCa2, minIf(Ca, DN in L1) as minCa2, maxIf(Ca, DN in L1) as maxCa2, anyIf(Ca, DN in L1) as anyCa2, avgIf(Ca, DN in L2) as avgCa3, sumIf(Ca, DN in L2) as sumCa3, minIf(Ca, DN in L2) as minCa3, maxIf(Ca, DN in L2) as maxCa3, anyIf(Ca, DN in L2) as anyCa3, avgIf(Ca, DN in L3) as avgCa4, sumIf(Ca, DN in L3) as sumCa4, minIf(Ca, DN in L3) as minCa4, maxIf(Ca, DN in L3) as maxCa4, anyIf(Ca, DN in L3) as anyCa4, avgIf(Ca, DN in L4) as avgCa5, sumIf(Ca, DN in L4) as sumCa5, minIf(Ca, DN in L4) as minCa5, maxIf(Ca, DN in L4) as maxCa5, anyIf(Ca, DN in L4) as anyCa5, avgIf(Ca, DN in L5) as avgCa6, sumIf(Ca, DN in L5) as sumCa6, minIf(Ca, DN in L5) as minCa6, maxIf(Ca, DN in L5) as maxCa6, anyIf(Ca, DN in L5) as anyCa6, avgIf(Ca, DN in L6) as avgCa7, sumIf(Ca, DN in L6) as sumCa7, minIf(Ca, DN in L6) as minCa7, maxIf(Ca, DN in L6) as maxCa7, anyIf(Ca, DN in L6) as anyCa7, avgIf(Ca, DN in L7) as avgCa8, sumIf(Ca, DN in L7) as sumCa8, minIf(Ca, DN in L7) as minCa8, maxIf(Ca, DN in L7) as maxCa8, anyIf(Ca, DN in L7) as anyCa8, avgIf(Ca, DN in L8) as avgCa9, sumIf(Ca, DN in L8) as sumCa9, minIf(Ca, DN in L8) as minCa9, maxIf(Ca, DN in L8) as maxCa9, anyIf(Ca, DN in L8) as anyCa9, avgIf(Ca, DN in L9) as avgCa10, sumIf(Ca, DN in L9) as sumCa10, minIf(Ca, DN in L9) as minCa10, maxIf(Ca, DN in L9) as maxCa10, anyIf(Ca, DN in L9) as anyCa10 from test.DT_NETWORKIF1 where TIMESTAMP >= 1625646840 and TIMESTAMP < 1625646900 and IFNAME = 'eth0'
a, Cb in L3) as avgCa4, sumIf(Ca, Cb in L3) as sumCa4, minIf(Ca, Cb in L3) as minCa4, maxIf(Ca, Cb in L3) as maxCa4, anyIf(Ca, Cb in L3) as anyCa4, avgIf(Ca, Cb in L4) as avgCa5, sumIf(Ca, Cb in L4) as sumCa5, minIf(Ca, Cb in L4) as minCa5, maxIf(Ca, Cb in L4) as maxCa5, anyIf(Ca, Cb in L4) as anyCa5, avgIf(Ca, Cb in L5) as avgCa6, sumIf(Ca, Cb in L5) as sumCa6, minIf(Ca, Cb in L5) as minCa6, maxIf(Ca, Cb in L5) as maxCa6, anyIf(Ca, Cb in L5) as anyCa6, avgIf(Ca, Cb in L6) as avgCa7, sumIf(Ca, Cb in L6) as sumCa7, minIf(Ca, Cb in L6) as minCa7, maxIf(Ca, Cb in L6) as maxCa7, anyIf(Ca, Cb in L6) as anyCa7, avgIf(Ca, Cb in L7) as avgCa8, sumIf(Ca, Cb in L7) as sumCa8, minIf(Ca, Cb in L7) as minCa8, maxIf(Ca, Cb in L7) as maxCa8, anyIf(Ca, Cb in L7) as anyCa8, avgIf(Ca, Cb in L8) as avgCa9, sumIf(Ca, Cb in L8) as sumCa9, minIf(Ca, Cb in L8) as minCa9, maxIf(Ca, Cb in L8) as maxCa9, anyIf(Ca, Cb in L8) as anyCa9, avgIf(Ca, Cb in L9) as avgCa10, sumIf(Ca, Cb in L9) as sumCa10, minIf(Ca, Cb in L9) as minCa10, maxIf(Ca, Cb in L9) as maxCa10, anyIf(Ca, Cb in L9) as anyCa10 from test.test_table where TIMESTAMP >= 1625646840 and TIMESTAMP < 1625646900 and Cc= 'some value'
FORMAT TabSeparatedWithNamesAndTypes

Some parts in the original query was duplicated in the second line and that directly caused many kinds of expcetions. The parsed query stored in system.query_log table differs almost every time I run it (in the case where exception is thrown)

BTW, it worked fine with Clickhouse version 20.8.12.2 and jdbc version 0.3.1-patch. I have checked the client side using debugger mode, and the parsedStmt and httpContext seemed ok to me.


How to reproduce

  1. creare a table with engine replicatedMergeTree, no need to insert data.
  2. On the java side, send a query to the server side with aggregate function and externaldata specified (like the example shown above),
  3. The larger the query size and number of externalData, the higher the possibility that the query throws exception.

Metadata

Metadata

Assignees

Labels

bugConfirmed user-visible misbehaviour in official releasest-need-infoWe need extra data to continue (waiting for response). Either some details or a repro of the issue.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions