
feat(hbase): support gen HFile for hbase v2 (BETA)#358

Merged
imbajin merged 40 commits into apache:master from haohao0103:schemaCache-optimize
Nov 10, 2022

Conversation

@haohao0103
Collaborator

@haohao0103 haohao0103 commented Nov 7, 2022

close #357

1. Support writing vertices/edges directly to the KV storage.
2. Only customString and customNumber IDs are supported for now.
3. Submit the loader code that bypasses the server for HBase writing.
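For context on point 2: generating HFiles for an HBase bulk load requires the KV pairs to be sorted by rowkey, so each supported ID type has to be serialized into an order-preserving byte key. Below is a minimal Python sketch of that idea; the prefixes and encoding scheme here are purely illustrative assumptions, not the loader's actual serialization format (which lives in the Java serializers).

```python
import struct

# Hypothetical type prefixes; the real rowkey layout is defined by the
# HugeGraph serializers, this only illustrates order-preserving encoding.
STRING_PREFIX = b"\x01"   # customString IDs
NUMBER_PREFIX = b"\x02"   # customNumber IDs

def rowkey(vertex_id):
    """Encode a custom vertex ID into a byte rowkey whose unsigned
    byte-wise comparison matches the natural order of the IDs."""
    if isinstance(vertex_id, bool):
        raise TypeError("unsupported ID type")
    if isinstance(vertex_id, int):
        # Offset by 2**63 so negative numbers still sort below positives
        # under unsigned big-endian byte comparison.
        return NUMBER_PREFIX + struct.pack(">Q", vertex_id + 2**63)
    if isinstance(vertex_id, str):
        return STRING_PREFIX + vertex_id.encode("utf-8")
    raise TypeError("only customString and customNumber IDs are supported")

# HFile writers expect the KV pairs pre-sorted by rowkey:
sorted_keys = sorted(rowkey(v) for v in [42, -7, 10**6])
```

The sort step matters because HBase rejects out-of-order cells when writing HFiles; doing the sort in the loader (e.g. via Spark) is what makes the server bypass possible.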

@imbajin
Member

imbajin commented Nov 7, 2022

@JackyYangPassion Is this an improved part?

@codecov

codecov bot commented Nov 7, 2022

Codecov Report

Merging #358 (e3c8a90) into master (c893f50) will decrease coverage by 2.37%.
The diff coverage is 6.92%.

@@             Coverage Diff              @@
##             master     #358      +/-   ##
============================================
- Coverage     64.82%   62.44%   -2.38%     
- Complexity     1851     1864      +13     
============================================
  Files           255      260       +5     
  Lines          9081     9462     +381     
  Branches        837      874      +37     
============================================
+ Hits           5887     5909      +22     
- Misses         2810     3169     +359     
  Partials        384      384              
Impacted Files Coverage Δ
...om/baidu/hugegraph/loader/builder/EdgeBuilder.java 67.74% <0.00%> (-25.60%) ⬇️
...baidu/hugegraph/loader/builder/ElementBuilder.java 89.71% <ø> (ø)
.../baidu/hugegraph/loader/builder/VertexBuilder.java 61.29% <0.00%> (-21.32%) ⬇️
...com/baidu/hugegraph/loader/constant/Constants.java 75.00% <ø> (ø)
...u/hugegraph/loader/direct/loader/DirectLoader.java 0.00% <0.00%> (ø)
...egraph/loader/direct/loader/HBaseDirectLoader.java 0.00% <0.00%> (ø)
...aidu/hugegraph/loader/direct/util/SinkToHBase.java 0.00% <0.00%> (ø)
...ugegraph/loader/metrics/LoadDistributeMetrics.java 0.00% <0.00%> (ø)
...u/hugegraph/loader/spark/HugeGraphSparkLoader.java 0.00% <0.00%> (ø)
...m/baidu/hugegraph/loader/executor/LoadOptions.java 70.40% <30.00%> (-4.60%) ⬇️
... and 5 more


@haohao0103 haohao0103 changed the title from "bypass server for hbase writing hugegraph-loader (BETA)" to "feat(hbase): support gen HFile for hbase(BETA)" Nov 7, 2022
@JackyYangPassion
Contributor

@JackyYangPassion Is this an improved part?

  1. Supports bulk load from Hive with the client-bypass-server feature.
  2. This feature has already been launched; it solves the problem that importing large amounts of data through the API affects queries.

@imbajin imbajin added enhancement New feature or request todo labels Nov 7, 2022
@imbajin
Member

imbajin commented Nov 7, 2022

OK, I'll also mark it as to be reviewed.

And could you handle the third-party dependencies check?

@haohao0103
Collaborator Author

1. The code style has been adjusted.
2. The third-party dependencies have been added to known-dependencies.txt.
@JackyYangPassion @javeme @imbajin

@imbajin
Member

imbajin commented Nov 8, 2022

1. The code style has been adjusted.
2. The third-party dependencies have been added to known-dependencies.txt.
@JackyYangPassion @javeme @imbajin

Thanks, the 3rd-party check seems to have failed, need some help?

Contributor

@javeme javeme left a comment


Thanks for your contribution~
Please also address the other comments: https://github.com/apache/incubator-hugegraph-toolchain/pull/358/files (search by "ago"), and also address the file LoadOptions.java

imbajin
imbajin previously approved these changes Nov 9, 2022
@haohao0103
Collaborator Author

@imbajin Hi, I can help solve the loader CI check failure

@imbajin
Member

imbajin commented Nov 9, 2022

@imbajin Hi, I can help solve the loader CI check failure

Thanks, I have already adopted the basic code, and currently the diff is:

expected:

{
    "version":"2.0",
    "structs":[
        {
            "id":"1",
            "skip":false,
            "input":{
                "type":"FILE",
                "path":"users.dat",
                "file_filter":{
                    "extensions":[
                        "*"
                    ]
                },
                "format":"TEXT",
                "delimiter":"::",
                "date_format":"yyyy-MM-dd HH:mm:ss",
                "time_zone":"GMT+8",
                "skipped_line":{
                    "regex":"(^#|^//).*|"
                },
                "compression":"NONE",
                "batch_size":500,
                "header":[
                    "UserID",
                    "Gender",
                    "Age",
                    "Occupation",
                    "Zip-code"
                ],
                "charset":"UTF-8",
                "list_format":null
            },
            "vertices":[
                {
                    "label":"user",
                    "skip":false,
                    "id":null,
                    "unfold":false,
                    "field_mapping":{
                        "UserID":"id"
                    },
                    "value_mapping":{

                    },
                    "selected":[

                    ],
                    "ignored":[
                        "Occupation",
                        "Zip-code",
                        "Gender",
                        "Age"
                    ],
                    "null_values":[
                        ""
                    ],
                    "update_strategies":{

                    },
                    "batch_size":500
                }
            ],
            "edges":[

            ]
        },
        {
            "id":"2",
            "skip":false,
            "input":{
                "type":"FILE",
                "path":"ratings.dat",
                "file_filter":{
                    "extensions":[
                        "*"
                    ]
                },
                "format":"TEXT",
                "delimiter":"::",
                "date_format":"yyyy-MM-dd HH:mm:ss",
                "time_zone":"GMT+8",
                "skipped_line":{
                    "regex":"(^#|^//).*|"
                },
                "compression":"NONE",
                "batch_size":500,
                "header":[
                    "UserID",
                    "MovieID",
                    "Rating",
                    "Timestamp"
                ],
                "charset":"UTF-8",
                "list_format":null
            },
            "vertices":[

            ],
            "edges":[
                {
                    "label":"rating",
                    "skip":false,
                    "source":[
                        "UserID"
                    ],
                    "unfold_source":false,
                    "target":[
                        "MovieID"
                    ],
                    "unfold_target":false,
                    "field_mapping":{
                        "UserID":"id",
                        "MovieID":"id",
                        "Rating":"rate"
                    },
                    "value_mapping":{

                    },
                    "selected":[

                    ],
                    "ignored":[
                        "Timestamp"
                    ],
                    "null_values":[
                        ""
                    ],
                    "update_strategies":{

                    },
                    "batch_size":500
                }
            ]
        }
    ]
}

actual:

{
    "version":"2.0",
    "structs":[
        {
            "id":"1",
            "skip":false,
            "input":{
                "type":"FILE",
                "path":"users.dat",
                "file_filter":{
                    "extensions":[
                        "*"
                    ]
                },
                "format":"TEXT",
                "delimiter":"::",
                "date_format":"yyyy-MM-dd HH:mm:ss",
                "time_zone":"GMT+8",
                "skipped_line":{
                    "regex":"(^#|^//).*|"
                },
                "compression":"NONE",
                "batch_size":500,
                "header":[
                    "UserID",
                    "Gender",
                    "Age",
                    "Occupation",
                    "Zip-code"
                ],
                "charset":"UTF-8",
                "list_format":null
            },
            "vertices":[
                {
                    "label":"user",
                    "skip":false,
                    "id":null,
                    "unfold":false,
                    "field_mapping":{
                        "UserID":"id"
                    },
                    "value_mapping":{

                    },
                    "selected":[

                    ],
                    "ignored":[
                        "Occupation",
                        "Zip-code",
                        "Gender",
                        "Age"
                    ],
                    "null_values":[
                        ""
                    ],
                    "update_strategies":{

                    },
                    "batch_size":500
                }
            ],
            "edges":[

            ]
        },
        {
            "id":"2",
            "skip":false,
            "input":{
                "type":"FILE",
                "path":"ratings.dat",
                "file_filter":{
                    "extensions":[
                        "*"
                    ]
                },
                "format":"TEXT",
                "delimiter":"::",
                "date_format":"yyyy-MM-dd HH:mm:ss",
                "time_zone":"GMT+8",
                "skipped_line":{
                    "regex":"(^#|^//).*|"
                },
                "compression":"NONE",
                "batch_size":500,
                "header":[
                    "UserID",
                    "MovieID",
                    "Rating",
                    "Timestamp"
                ],
                "charset":"UTF-8",
                "list_format":null
            },
            "vertices":[

            ],
            "edges":[
                {
                    "label":"rating",
                    "skip":false,
                    "source":[
                        "UserID"
                    ],
                    "unfold_source":false,
                    "target":[
                        "MovieID"
                    ],
                    "unfold_target":false,
                    "field_mapping":{
                        "UserID":"id",
                        "MovieID":"id",
                        "Rating":"rate"
                    },
                    "value_mapping":{

                    },
                    "selected":[

                    ],
                    "ignored":[
                        "Timestamp"
                    ],
                    "null_values":[
                        ""
                    ],
                    "update_strategies":{

                    },
                    "batch_size":500
                }
            ]
        }
    ],
    "backendStoreInfo":null
}

It seems `"backendStoreInfo": null` is newly added; you could fix the other problems~
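For what it's worth, the expected/actual JSON above differ only in that one top-level key. A trimmed-down check (the fixtures here are shortened stand-ins for the full mappings shown above) confirms the divergence is exactly that field:

```python
import json

# Trimmed fixtures: only the top-level shape matters for this check;
# the real inputs are the two full mapping files above.
expected = json.loads('{"version": "2.0", "structs": []}')
actual = json.loads('{"version": "2.0", "structs": [], "backendStoreInfo": null}')

# Symmetric difference of top-level keys -> the only divergence.
extra = set(actual) ^ set(expected)
print(extra)  # the single newly added key
```

So the fix on the test side is just to include the new nullable field in the expected fixture.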

@haohao0103
Collaborator Author

The storage-layer configuration that bulkLoad depends on is specified in struct.json, so backendStoreInfo was added. A follow-up iteration will obtain the storage-layer configuration from the server.
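As a sketch of that idea, the mapping file could carry a block like the following alongside the structs. The field names inside backendStoreInfo below are purely illustrative assumptions about what an HBase target needs, not the loader's actual schema:

```json
{
    "version": "2.0",
    "structs": ["..."],
    "backendStoreInfo": {
        "vertex_tablename": "...",
        "edge_tablename": "...",
        "hbase_zookeeper_quorum": "..."
    }
}
```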

@imbajin
Member

imbajin commented Nov 9, 2022

The storage-layer configuration that bulkLoad depends on is specified in struct.json, so backendStoreInfo was added. A follow-up iteration will obtain the storage-layer configuration from the server.

It's fine, just adapt the test for it 😄 (same for other test problems, if any exist)

javeme
javeme previously approved these changes Nov 9, 2022
Member

@imbajin imbajin left a comment


Thanks, we could handle the 3rd-party dependencies together before the release (to avoid wasting a lot of time on it)

@haohao0103
Collaborator Author

It seems `"backendStoreInfo": null` is newly added; you could fix the other problems~

Do I need to fix the failed 3rd-party dependencies check?
I believe many of the problems are caused by the hadoop-common upgrade from 3.2.4 to 3.3.1.

Thanks, we could handle the 3rd-party dependencies together before the release (to avoid wasting a lot of time on it)

ok

@haohao0103
Collaborator Author

haohao0103 commented Nov 10, 2022

Thanks, we could handle the 3rd-party dependencies together before the release (to avoid wasting a lot of time on it)

Are many of the problems caused by the hadoop-common upgrade from 3.2.4 to 3.3.1?

@simon824 could you exclude it in the pom? (like #363)

@imbajin imbajin changed the title from "feat(hbase): support gen HFile for hbase(BETA)" to "feat(hbase): support gen HFile for hbase v2 (BETA)" Nov 10, 2022
@imbajin imbajin merged commit a622f98 into apache:master Nov 10, 2022
@simon824
Member

Thanks, we could handle the 3rd-party dependencies together before the release (to avoid wasting a lot of time on it)

Are many of the problems caused by the hadoop-common upgrade from 3.2.4 to 3.3.1?

@simon824 could you exclude it in the pom? (like #363)

We can downgrade the version if necessary; the hadoop dependency seemingly cannot be excluded, since the loader needs it to read HDFS files.

@haohao0103
Collaborator Author

Thanks, we could handle the 3rd-party dependencies together before the release (to avoid wasting a lot of time on it)

Are many of the problems caused by the hadoop-common upgrade from 3.2.4 to 3.3.1?
@simon824 could you exclude it in the pom? (like #363)

We can downgrade the version if necessary; the hadoop dependency seemingly cannot be excluded, since the loader needs it to read HDFS files.

Yes, the loader needs the hadoop dependency. Internally, we read data from HDFS and load it into the graph.

@imbajin imbajin linked an issue Nov 10, 2022 that may be closed by this pull request
26 tasks

Labels

enhancement New feature or request todo

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

feat(hbase): bypass server for hbase writing hugegraph-loader (BETA)
[Summary] toolchain release v1.0 todo list

6 participants