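Expected behavior
Paged queries return the complete result set.
Actual behavior
Running a backup with hugegraph-tools 1.4.0 against data stored in a ScyllaDB backend fails with the error "Unexpected fetched page size".
The source code shows that the error is raised here: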
https://github.com/hugegraph/hugegraph/blob/ed610a0b889cc9a0539c5a43ea3be04ce3e6e940/hugegraph-cassandra/src/main/java/com/baidu/hugegraph/backend/store/cassandra/CassandraEntryIterator.java#L43-L59
We suspect the cause is that ScyllaDB caps each page at 1MB. This is a hard limit and cannot be adjusted. See the following pages:
https://www.scylladb.com/2017/11/17/7-rules-planning-queries-maximum-performance/
https://stackoverflow.com/questions/56697213/when-does-driver-datastax-driver-paging-yields-fewer-pages-than-requested
The error does not occur when Cassandra is used as the backend store.
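To make the suspected mechanism concrete, here is a minimal sketch of how a byte cap on pages could trip a check of the form "rows available equals the requested limit, or the result is fully fetched". It is a toy model, not ScyllaDB or driver code: the class `ShortPageCheckDemo`, the method names, and the way the cap is applied are all assumptions made for illustration, and the 1MB figure is taken from the hypothesis above.

```java
// Hypothetical model only; nothing here is ScyllaDB, driver, or HugeGraph code.
class ShortPageCheckDemo {

    // Assumed per-page byte cap, taken from the hypothesis in this report.
    static final int PAGE_BYTE_CAP = 1024 * 1024;

    /** Rows a byte-capped server would place in the first page (toy model). */
    static int rowsInFirstPage(int totalRows, int rowBytes, int pageLimit) {
        // Assume at least one row is always returned, even if it is large.
        int byBytes = Math.max(1, PAGE_BYTE_CAP / rowBytes);
        return Math.min(totalRows, Math.min(pageLimit, byBytes));
    }

    /** A check of the same shape as the one that raises the reported error. */
    static boolean strictCheck(int available, long limit, boolean fullyFetched) {
        return available == limit || fullyFetched;
    }

    public static void main(String[] args) {
        int total = 1000;
        int limit = 100;
        // 1 KB rows: 100 rows fit under the cap, so the page is full.
        int small = rowsInFirstPage(total, 1024, limit);
        // 20 KB rows: the byte cap is reached before the row limit.
        int large = rowsInFirstPage(total, 20 * 1024, limit);
        System.out.println(small + " rows, check passes: "
                           + strictCheck(small, limit, small == total));
        System.out.println(large + " rows, check passes: "
                           + strictCheck(large, limit, large == total));
    }
}
```

Under this model the second case returns a short page while more rows remain on the server, which is exactly the situation the strict check rejects.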
Following the DataStax Java API, we made the change below to:
https://github.com/hugegraph/hugegraph/blob/ed610a0b889cc9a0539c5a43ea3be04ce3e6e940/hugegraph-cassandra/src/main/java/com/baidu/hugegraph/backend/store/cassandra/CassandraEntryIterator.java#L67-L94
We changed the fetch function linked above to:
    protected final boolean fetch() {
        assert this.current == null;
        if (this.next != null) {
            this.current = this.next;
            this.next = null;
        }
        while (this.remaining > 0 && this.rows.hasNext()) {
            if (this.query.paging()) {
                // Added: request the next page while the result set
                // still has pages that have not been fetched
                if (!this.results.isFullyFetched()) {
                    this.results.fetchMoreResults();
                }
                this.remaining--;
            }
            Row row = this.rows.next();
            BackendEntry merged = this.merger.apply(this.current, row);
            if (this.current == null) {
                // The first time to read
                this.current = merged;
            } else if (merged == this.current) {
                // The next entry belongs to the current entry
                assert merged != null;
            } else {
                // New entry
                assert this.next == null;
                this.next = merged;
                break;
            }
        }
        return this.current != null;
    }
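This forces an isFullyFetched check for pages that have not been fetched yet and, if there are any, calls fetchMoreResults.
With this change the backup exports 99.98% of the data, but about two per thousand of the data still cannot be exported. Across repeated backup runs, the IDs of the records that fail to export are not exactly the same.
Paged queries affect several tasks, including scan queries and index rebuilds. Could the HugeGraph team please help locate the problem? Thanks.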
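For discussion, the sketch below shows one way a reader could be made independent of how many rows the server actually puts in a page: count rows as they are consumed and keep requesting until the limit is reached or the server returns nothing. It is not the HugeGraph implementation and does not use the DataStax driver; `PagedDrainDemo`, `FakeServer`, and `readPage` are hypothetical names, and the row cap stands in for whatever limit the backend applies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; none of these types exist in HugeGraph or the driver.
class PagedDrainDemo {

    /** A server that never returns more than maxRowsPerPage rows per request. */
    static class FakeServer {
        final List<Integer> data = new ArrayList<>();
        final int maxRowsPerPage;

        FakeServer(int rows, int maxRowsPerPage) {
            for (int i = 0; i < rows; i++) {
                this.data.add(i);
            }
            this.maxRowsPerPage = maxRowsPerPage;
        }

        /** Returns rows starting at offset; may return fewer than requested. */
        List<Integer> fetch(int offset, int requested) {
            int left = Math.max(0, this.data.size() - offset);
            int n = Math.min(Math.min(requested, this.maxRowsPerPage), left);
            return new ArrayList<>(this.data.subList(offset, offset + n));
        }
    }

    /** Reads up to limit rows from offset, continuing across short pages. */
    static List<Integer> readPage(FakeServer server, int offset, int limit) {
        List<Integer> out = new ArrayList<>();
        while (out.size() < limit) {
            List<Integer> chunk = server.fetch(offset + out.size(),
                                               limit - out.size());
            if (chunk.isEmpty()) {
                break; // the server has no more rows
            }
            out.addAll(chunk);
        }
        return out;
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer(250, 30);
        // The server caps pages at 30 rows, but the reader still gets 100.
        System.out.println(readPage(server, 0, 100).size());
        // Only 50 rows remain after offset 200.
        System.out.println(readPage(server, 200, 100).size());
    }
}
```

The point of the design is that the loop ends on the consumed row count or on an empty response, never on the size of the first page, so a short page cannot silently end the read.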
Status of loaded data
Vertex/Edge summary
- loaded vertices amount: on the order of 100 million
- loaded edges amount: on the order of 1 billion
Specifications of environment
- hugegraph version: 0.10.4
- operating system: CentOS 7.4, 16 CPUs, 128G RAM
- hugegraph backend: ScyllaDB 4.0.4 cluster with 3 nodes, 1 x 1TB SSD disk each node