[SUPPORT] Hudi spark datasource error after migrate from 0.8 to 0.11 #5861

@kk17

Description

Describe the problem you faced

After upgrading Hudi from 0.8 to 0.11, reading a Hudi table with spark.table(fullTableName) no longer works. The table has been synced to the Hive metastore and Spark is connected to that metastore. The error is:

org.sparkproject.guava.util.concurrent.UncheckedExecutionException: org.apache.hudi.exception.HoodieException: 'path' or 'Key: 'hoodie.datasource.read.paths' , default: null description: Comma separated list of file paths to read within a Hudi table. since version: version is not defined deprecated after: version is not defined)' or both must be specified.
at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
at org.sparkproject.guava.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.

...

Caused by: org.apache.hudi.exception.HoodieException: 'path' or 'Key: 'hoodie.datasource.read.paths' , default: null description: Comma separated list of file paths to read within a Hudi table. since version: version is not defined deprecated after: version is not defined)' or both must be specified.
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:78)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:353)
	at org.apache.spark.sql.execution.datasources.FindDataSourceTable.$anonfun$readDataSourceTable$1(DataSourceStrategy.scala:261)
	at org.sparkproject.guava.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
	at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
	at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
	at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
	at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
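
For reference, a minimal read that triggers the error. This is a sketch, not the exact job; the table name is taken from the SHOW CREATE TABLE output later in this issue.

// Read a Hive-synced Hudi table by name through the session catalog.
// "ods.track_signup" is the table shown below; adjust as needed.
val df = spark.table("ods.track_signup")
df.show(5)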

To Reproduce

Steps to reproduce the behavior:

  1. Using Hudi 0.8, create a Hudi table and sync it to the Hive metastore using Hive JDBC sync mode (a sketch follows this list).
  2. Upgrade Hudi to 0.11.
  3. Add a new column to the table and sync it to the Hive metastore using Hive JDBC sync mode.
  4. Read the table using spark.table.
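
A hedged sketch of steps 1 and 3, assuming typical Hudi datasource options; the table name, record key, partition field, JDBC URL, and S3 path are placeholders modeled on the outputs below. Note that hoodie.datasource.hive_sync.mode = jdbc is the 0.11-era option; on 0.8 the equivalent was hoodie.datasource.hive_sync.use_jdbc = true.

import org.apache.spark.sql.SaveMode
import spark.implicits._

// Toy batch with the record key ("id") and partition column ("dt") used below.
// Step 3 is the same write after adding a new column to this DataFrame.
val df = Seq(("1", "signup", "2022-06-13")).toDF("id", "act", "dt")

// Write the table and sync it to the Hive metastore over JDBC.
df.write.format("hudi").
  option("hoodie.table.name", "track_signup").
  option("hoodie.datasource.write.recordkey.field", "id").
  option("hoodie.datasource.write.partitionpath.field", "dt").
  option("hoodie.datasource.hive_sync.enable", "true").
  option("hoodie.datasource.hive_sync.mode", "jdbc").
  option("hoodie.datasource.hive_sync.database", "ods").
  option("hoodie.datasource.hive_sync.table", "track_signup").
  option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://<hive-server>:10000").
  option("hoodie.datasource.hive_sync.partition_fields", "dt").
  mode(SaveMode.Append).
  save("s3://xxxx/track_signup")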

Expected behavior

Reading the table should succeed.

Environment Description

  • Hudi version : 0.11

  • Spark version : 3.1.2

  • Hive version : 3.1.2

  • Hadoop version : 3.1.2

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) : no

Additional context

We are using Hive JDBC sync mode to sync the Hudi table to the Hive metastore. Before we upgraded Hudi to 0.11, we would get an error for the SHOW CREATE TABLE command. After we upgraded Hudi to 0.11, we added one new column to the table, and the error appeared after the column was added. Running SHOW CREATE TABLE in spark-sql after the error succeeds, but the returned CREATE TABLE statement has no LOCATION. In Hive itself, both SHOW CREATE TABLE and SELECT statements work fine.
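
The missing LOCATION is consistent with the HoodieException above: Spark resolves the table from the catalog without any path, so Hudi's DefaultSource has neither 'path' nor hoodie.datasource.read.paths. A sketch of two ways to work around or inspect the catalog while it is in this state (path and table name as in this issue):

// Workaround: read by base path, bypassing the catalog entry.
val byPath = spark.read.format("hudi").load("s3://xxxx/track_signup")

// Diagnostic: check what Spark actually resolves for the table;
// the Location row should be empty or missing in the broken state.
spark.sql("DESCRIBE FORMATTED ods.track_signup").show(100, false)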

After I dropped the Hive table and reran the Hive sync, reading worked again.
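
A minimal sketch of that recovery, assuming the drop is issued through Spark SQL and the resync is triggered by the next Hudi write with Hive sync enabled (as in the write sketch above):

// Drop the stale metastore entry; the Hudi data on S3 is untouched
// because the synced table is external.
spark.sql("DROP TABLE IF EXISTS ods.track_signup")
// The next write with hoodie.datasource.hive_sync.enable=true recreates the
// Hive table, this time with LOCATION and the Spark-facing table properties.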

Before the Hive sync rerun (no LOCATION or OPTIONS clause in the output):

spark-sql> show create table ods.track_signup;
CREATE TABLE `ods`.`track_signup` (
  `_hoodie_commit_time` STRING,
  `_hoodie_commit_seqno` STRING,
  `_hoodie_record_key` STRING,
  `_hoodie_partition_path` STRING,
  `_hoodie_file_name` STRING,
  `act` STRING,
  `time` BIGINT,
  `env` STRING,
  `id` STRING,
  `seer_time` STRING,
  `hh` STRING,
  `app_id` INT,
  `ip` STRING,
  `g` STRING,
  `u` STRING,
  `ga_id` STRING,
  `app_version` STRING,
  `platform` STRING,
  `url` STRING,
  `referer` STRING,
  `medium` STRING,
  `source` STRING,
  `campaign` STRING,
  `stage` STRING,
  `content` STRING,
  `term` STRING,
  `lang` STRING,
  `su` STRING,
  `campaign_track_id` STRING,
  `last_component_id` STRING,
  `regSourceId` STRING,
  `dt` STRING)
USING hudi
PARTITIONED BY (dt)
TBLPROPERTIES (
  'bucketing_version' = '2',
  'last_modified_time' = '1655107146',
  'last_modified_by' = 'hive',
  'last_commit_time_sync' = '20220613152622014')

After the Hive sync rerun (the LOCATION and OPTIONS clauses are now present):

spark-sql> show create table ods.track_signup;
CREATE TABLE `ods`.`track_signup` (
  `_hoodie_commit_time` STRING,
  `_hoodie_commit_seqno` STRING,
  `_hoodie_record_key` STRING,
  `_hoodie_partition_path` STRING,
  `_hoodie_file_name` STRING,
  `act` STRING COMMENT 'xxx',
  `time` BIGINT COMMENT 'xxx',
  `env` STRING COMMENT 'xxx',
  `id` STRING COMMENT 'xxx',
  `seer_time` STRING COMMENT 'xxx',
  `hh` STRING,
  `app_id` INT COMMENT 'xxx',
  `ip` STRING COMMENT 'xxx',
  `g` STRING COMMENT 'xxx',
  `u` STRING COMMENT 'xxx',
  `ga_id` STRING COMMENT 'xxx',
  `app_version` STRING COMMENT 'xxx',
  `platform` STRING COMMENT 'xxx',
  `url` STRING COMMENT 'xxx',
  `referer` STRING COMMENT 'xxx',
  `medium` STRING COMMENT 'xxx',
  `source` STRING COMMENT 'xxx',
  `campaign` STRING COMMENT 'xxx',
  `stage` STRING COMMENT 'xxx',
  `content` STRING COMMENT 'xxx',
  `term` STRING COMMENT 'xxx',
  `lang` STRING COMMENT 'xxx',
  `su` STRING COMMENT 'xxx',
  `campaign_track_id` STRING COMMENT 'xxx',
  `last_component_id` STRING COMMENT 'xxx',
  `regSourceId` STRING,
  `dt` STRING)
USING hudi
OPTIONS (
  `hoodie.query.as.ro.table` 'false')
PARTITIONED BY (dt)
LOCATION 's3://xxxx/track_signup'
TBLPROPERTIES (
  'bucketing_version' = '2',
  'last_modified_time' = '1655134599',
  'last_modified_by' = 'hive',
  'last_commit_time_sync' = '20220613153932664')

Labels

area:sql (SQL interfaces), priority:high (Significant impact; potential bugs), status:triaged (Issue has been reviewed and categorized)
