Describe the problem you faced
After I upgraded Hudi from 0.8 to 0.11, reading a Hudi table with spark.table(fullTableName) stopped working. The table has been synced to the Hive metastore, and Spark is connected to that metastore. The error is:
org.sparkproject.guava.util.concurrent.UncheckedExecutionException: org.apache.hudi.exception.HoodieException: 'path' or 'Key: 'hoodie.datasource.read.paths' , default: null description: Comma separated list of file paths to read within a Hudi table. since version: version is not defined deprecated after: version is not defined)' or both must be specified.
at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
at org.sparkproject.guava.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.
...
Caused by: org.apache.hudi.exception.HoodieException: 'path' or 'Key: 'hoodie.datasource.read.paths' , default: null description: Comma separated list of file paths to read within a Hudi table. since version: version is not defined deprecated after: version is not defined)' or both must be specified.
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:78)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:353)
at org.apache.spark.sql.execution.datasources.FindDataSourceTable.$anonfun$readDataSourceTable$1(DataSourceStrategy.scala:261)
at org.sparkproject.guava.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
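For reference, the read that triggers this is just a catalog lookup (spark is the active SparkSession; the table name is from our setup):

val df = spark.table("ods.track_signup") // resolves through the metastore entry and fails as above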
To Reproduce
Steps to reproduce the behavior:
- Using Hudi 0.8, create a Hudi table and sync it to the Hive metastore via Hive JDBC sync mode (see the writer sketch after these steps)
- Upgrade Hudi to 0.11
- Add a new column to the table and sync it to the Hive metastore via Hive JDBC sync mode
- Read the table using spark.table
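A minimal sketch of the writer configuration used in the create and column-add steps. The option keys are the standard Hudi datasource keys (hoodie.datasource.hive_sync.mode is the 0.11 spelling); the record key, precombine field, JDBC URL, and input DataFrame are illustrative placeholders, not our exact values:

import org.apache.spark.sql.SaveMode

df.write.format("hudi").
  option("hoodie.table.name", "track_signup").
  option("hoodie.datasource.write.recordkey.field", "id").        // placeholder record key
  option("hoodie.datasource.write.precombine.field", "time").     // placeholder precombine field
  option("hoodie.datasource.write.partitionpath.field", "dt").
  option("hoodie.datasource.hive_sync.enable", "true").
  option("hoodie.datasource.hive_sync.mode", "jdbc").             // hive jdbc sync mode
  option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://<hive-host>:10000"). // placeholder URL
  option("hoodie.datasource.hive_sync.database", "ods").
  option("hoodie.datasource.hive_sync.table", "track_signup").
  option("hoodie.datasource.hive_sync.partition_fields", "dt").
  mode(SaveMode.Append).
  save("s3://xxxx/track_signup")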
Expected behavior
Reading the table should succeed.
Environment Description
- Hudi version : 0.11
- Spark version : 3.1.2
- Hive version : 3.1.2
- Hadoop version : 3.1.2
- Storage (HDFS/S3/GCS..) : S3
- Running on Docker? (yes/no) : no
Additional context
We are using Hive JDBC sync mode to sync the Hudi table to the Hive metastore. Before we upgraded Hudi to 0.11, we would get an error for the show create table command. After we upgraded Hudi to 0.11, we added one new column to the table; the error happened after we added the new column. I ran show create table from spark-sql after the error: the command ran successfully, but the returned create table statement had no LOCATION. I also ran Hive SQL; there, both show create table and a select statement worked fine.
After I dropped the Hive table and reran Hive sync, the read worked again.
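Until the sync is redone, reading by path also works around the failure, since it supplies the 'path' the exception asks for; a minimal sketch using the table location from the output further below:

val df = spark.read.format("hudi").load("s3://xxxx/track_signup") // bypasses the metastore entry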
Before the Hive sync rerun:
spark-sql> show create table ods.track_signup;
CREATE TABLE `ods`.`track_signup` (
`_hoodie_commit_time` STRING,
`_hoodie_commit_seqno` STRING,
`_hoodie_record_key` STRING,
`_hoodie_partition_path` STRING,
`_hoodie_file_name` STRING,
`act` STRING,
`time` BIGINT,
`env` STRING,
`id` STRING,
`seer_time` STRING,
`hh` STRING,
`app_id` INT,
`ip` STRING,
`g` STRING,
`u` STRING,
`ga_id` STRING,
`app_version` STRING,
`platform` STRING,
`url` STRING,
`referer` STRING,
`medium` STRING,
`source` STRING,
`campaign` STRING,
`stage` STRING,
`content` STRING,
`term` STRING,
`lang` STRING,
`su` STRING,
`campaign_track_id` STRING,
`last_component_id` STRING,
`regSourceId` STRING,
`dt` STRING)
USING hudi
PARTITIONED BY (dt)
TBLPROPERTIES (
'bucketing_version' = '2',
'last_modified_time' = '1655107146',
'last_modified_by' = 'hive',
'last_commit_time_sync' = '20220613152622014')
After the Hive sync rerun:
spark-sql> show create table ods.track_signup;
CREATE TABLE `ods`.`track_signup` (
`_hoodie_commit_time` STRING,
`_hoodie_commit_seqno` STRING,
`_hoodie_record_key` STRING,
`_hoodie_partition_path` STRING,
`_hoodie_file_name` STRING,
`act` STRING COMMENT 'xxx',
`time` BIGINT COMMENT 'xxx',
`env` STRING COMMENT 'xxx',
`id` STRING COMMENT 'xxx',
`seer_time` STRING COMMENT 'xxx',
`hh` STRING,
`app_id` INT COMMENT 'xxx',
`ip` STRING COMMENT 'xxx',
`g` STRING COMMENT 'xxx',
`u` STRING COMMENT 'xxx',
`ga_id` STRING COMMENT 'xxx',
`app_version` STRING COMMENT 'xxx',
`platform` STRING COMMENT 'xxx',
`url` STRING COMMENT 'xxx',
`referer` STRING COMMENT 'xxx',
`medium` STRING COMMENT 'xxx',
`source` STRING COMMENT 'xxx',
`campaign` STRING COMMENT 'xxx',
`stage` STRING COMMENT 'xxx',
`content` STRING COMMENT 'xxx',
`term` STRING COMMENT 'xxx',
`lang` STRING COMMENT 'xxx',
`su` STRING COMMENT 'xxx',
`campaign_track_id` STRING COMMENT 'xxx',
`last_component_id` STRING COMMENT 'xxx',
`regSourceId` STRING,
`dt` STRING)
USING hudi
OPTIONS (
`hoodie.query.as.ro.table` 'false')
PARTITIONED BY (dt)
LOCATION 's3://xxxx/track_signup'
TBLPROPERTIES (
'bucketing_version' = '2',
'last_modified_time' = '1655134599',
'last_modified_by' = 'hive',
'last_commit_time_sync' = '20220613153932664')
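The salient difference is that the pre-rerun catalog entry has no LOCATION (and no hoodie.query.as.ro.table option), so DefaultSource.createRelation receives no path, which matches the exception. If dropping the table is undesirable, setting the location on the existing entry might also repair it; this is an untested assumption on our side, not something we verified:

spark.sql("ALTER TABLE ods.track_signup SET LOCATION 's3://xxxx/track_signup'") // untested workaround idea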