Skip to content

Conversation

@wg1026688210
Copy link
Contributor

@wg1026688210 wg1026688210 commented Mar 3, 2024

Purpose

When reading a split ,recordReader will loads the schema of the split from the FileSystem.
The pr is for adding the cache of TableSchema for SchemaManager to reduce the access of FileSystem.

Tests

SchemaManagerTest#testCache

API and Format

Documentation

@wg1026688210 wg1026688210 marked this pull request as ready for review March 3, 2024 06:32
* The class is responsible for providing a schemaManager with a concurrent and serializable schema
* cache.
*/
public class SchemaCache implements Serializable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to introduce this class, just merge this into SchemaManager.

}

private void writeObject(ObjectOutputStream out) throws IOException {
Map<Long, TableSchema> map = new HashMap<>(cache.asMap());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need serialize this cache.

@wg1026688210 wg1026688210 force-pushed the core/schema_cache branch 4 times, most recently from 0aa4f0b to 6c6e483 Compare March 4, 2024 12:37
private Map<Long, TableSchema> loadSchemaCache(FileIO fileIO, Path path) {
Map<Long, TableSchema> schemaCache = new ConcurrentHashMap<>();
SchemaManager schemaManager = new SchemaManager(fileIO, path);
for (TableSchema schema : schemaManager.listAll()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to list all at first?

protected final TableSchema tableSchema;
protected final CatalogEnvironment catalogEnvironment;

protected final Map<Long, TableSchema> schemaCache;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just store SchemaManager here?

@wg1026688210 wg1026688210 changed the title [Core]add cache for SchemaManager [Core]add table schema cache for SchemaManager Mar 11, 2024
private final Map<Long, TableSchema> cache;

public SchemaManager(FileIO fileIO, Path tableRoot) {
this(fileIO, tableRoot, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new hash map? just keep cache not null?

}
this.tableSchema = tableSchema;
this.catalogEnvironment = catalogEnvironment;
tableSchemaManager = new SchemaManager(fileIO, path, new ConcurrentHashMap<>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use public SchemaManager(FileIO fileIO, Path tableRoot)

manager.commitChanges(SchemaChange.setOption("ccc", "ddd"));

Map<Long, TableSchema> cachedSchema = manager.getCachedSchema();
assertThat(cachedSchema).hasSize(3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to getCachedSchema.
You can just verify same instances.

@wg1026688210 wg1026688210 requested a review from JingsongLi March 12, 2024 10:28
@JingsongLi
Copy link
Contributor

See #3021

@JingsongLi
Copy link
Contributor

At present, this solution is relatively obscure. Generally speaking, it is difficult to accept a solution where the cache is serialized and reused by distributed tasks.

Considering that #3021 has already been merged and can solve most problems in most cases (without schema changes), I am considering closing this PR.

You can reopen this PR at any time if you have any further needs.

@JingsongLi JingsongLi closed this Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants