Cloudberry Database version
Matview up-go-date maintenance feature in CloudBerry is subsystem designed to answer, whether matview data is up-to-dat or not (refresh needed).
On current HEAD, it is broken. Multiple ways.
There are three ways to fool gp_matview_aux that I have discovered. Maybe there are more.
- Table inheritance.
- Rewrite rules
- CTE
The latter two of them leads to incorrect query execution results
rewrite:
db2=# create table tt1(i int, j int);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'i' as the Cloudberry Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
db2=# create table tt2(i int, j int);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'i' as the Cloudberry Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
db2=# create materialized view mv1 as select * from ^C
db2=# insert into tt2 values(1,2),(2,2), (3,2),(1,4);
INSERT 0 4
db2=# create materialized view mv1 as select j from tt2 where i = 1;
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'j' as the Cloudberry Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 2
db2=# create rule r1 as on insert TO tt1 do also insert into tt2 values(1,1);
CREATE RULE
db2=# analyze mv1 ;
ANALYZE
db2=# analyze tt2;
ANALYZE
db2=# select j from tt2 where i = 1;
j
---
2
4
(2 rows)
db2=# table tt2;
i | j
---+---
2 | 2
3 | 2
1 | 2
1 | 4
(4 rows)
db2=# insert into tt1 values(1,1);
INSERT 0 1
db2=# table tt2;
i | j
---+---
2 | 2
3 | 2
1 | 2
1 | 4
1 | 1
(5 rows)
db2=# select j from tt2 where i = 1;
j
---
2
4
1
(3 rows)
db2=# set enable_answer_query_using_materialized_views to true;
SET
db2=# select j from tt2 where i = 1;
j
---
2
4
(2 rows)
db2=#
CTE:
db3=# set enable_answer_query_using_materialized_views to false;
SET
db3=# select count(1) from tt1 where i = 1;
count
-------
12
(1 row)
db3=# set enable_answer_query_using_materialized_views to true;
SET
db3=# select count(1) from tt1 where i = 1;
count
-------
12
(1 row)
db3=# with mod1 as (insert into tt1 values(1,1) returning *) select * from mod1;
i | j
---+---
1 | 1
(1 row)
db3=# select count(1) from tt1 where i = 1;
count
-------
12
(1 row)
db3=# set enable_answer_query_using_materialized_views to false;
SET
db3=# select count(1) from tt1 where i = 1;
count
-------
13
(1 row)
db3=#
I did not reproduce any incorrect result with table inheritance. This is bacause of has_subclass() check inside answer_query_using_materialized_views function.
The problem is here https://github.com/cloudberrydb/cloudberrydb/blob/main/src/backend/executor/execMain.c#L1047
We cannot rely on neither es_processed (rewrite rules case) nor (operation != CMD_SELECT) check (CTE case).
Rewrite rules case can be fixed via relhasrules check. This will disable AQUMV if relation once has rules. Including SELECT rules.
I dont have any fix for CTE for know. Maybe ill come up with something later.
Also, gp_matview_aux should be fixed. In says
db4=# create table tt1(i int, j int);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'i' as the Cloudberry Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
db4=# insert into tt1 select * from generate_series(1,10)a, generate_series(1,10)b;
INSERT 0 100
db4=# create table tt2 (i int) inherits(tt1);
NOTICE: table has parent, setting distribution columns to match parent table
NOTICE: merging column "i" with inherited definition
CREATE TABLE
db4=# insert into tt2 values(1,1);
INSERT 0 1
db4=# create materialized view m1 as select * from tt1 where i = 1;
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'i' as the Cloudberry Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 11
db4=# drop table tt2;
DROP TABLE
db4=# select ^C
db4=# table gp_matview_aux ;
mvoid | mvname | datastatus
-------+--------+------------
57503 | m1 | u
(1 rows)
It says datastatus = 'u' which is 100% false.
Also, why gp_matview_tables is shared catalog relation?
What happened
What you think should happen instead
How to reproduce
Operating System
Anything else
No response
Are you willing to submit PR?
Code of Conduct
Cloudberry Database version
Matview up-go-date maintenance feature in CloudBerry is subsystem designed to answer, whether matview data is up-to-dat or not (refresh needed).
On current HEAD, it is broken. Multiple ways.
There are three ways to fool gp_matview_aux that I have discovered. Maybe there are more.
The latter two of them leads to incorrect query execution results
rewrite:
CTE:
I did not reproduce any incorrect result with table inheritance. This is bacause of has_subclass() check inside
answer_query_using_materialized_viewsfunction.The problem is here https://github.com/cloudberrydb/cloudberrydb/blob/main/src/backend/executor/execMain.c#L1047
We cannot rely on neither es_processed (rewrite rules case) nor (operation != CMD_SELECT) check (CTE case).
Rewrite rules case can be fixed via relhasrules check. This will disable AQUMV if relation once has rules. Including SELECT rules.
I dont have any fix for CTE for know. Maybe ill come up with something later.
Also, gp_matview_aux should be fixed. In says
It says datastatus = 'u' which is 100% false.
Also, why gp_matview_tables is shared catalog relation?
What happened
What you think should happen instead
How to reproduce
Operating System
Anything else
No response
Are you willing to submit PR?
Code of Conduct