[Core] Add table schema cache for SchemaManager #2939
```java
 * The class is responsible for providing a schemaManager with a concurrent and serializable schema
 * cache.
 */
public class SchemaCache implements Serializable {
```
You don't need to introduce this class; just merge it into SchemaManager.
```java
}

private void writeObject(ObjectOutputStream out) throws IOException {
    Map<Long, TableSchema> map = new HashMap<>(cache.asMap());
```
I don't think we need to serialize this cache.
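A minimal sketch of what "don't serialize the cache" could look like, using hypothetical stand-ins: `CachingSchemaManager`, `SchemaLoader`, `PrefixSchemaLoader`, and `roundTrip` are illustrative names, and `String` stands in for `TableSchema`. The cache field is marked `transient`, so Java serialization skips it, and `readObject` starts each deserialized copy with an empty cache. This is not the Paimon API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical loader; in the real code this role would be reading a schema file.
interface SchemaLoader extends Serializable {
    String load(long schemaId);
}

// Hypothetical loader for illustration: "reads" schema N as the string "schema-N".
class PrefixSchemaLoader implements SchemaLoader {
    private static final long serialVersionUID = 1L;

    @Override
    public String load(long schemaId) {
        return "schema-" + schemaId;
    }
}

// Hypothetical stand-in for SchemaManager; String stands in for TableSchema.
class CachingSchemaManager implements Serializable {
    private static final long serialVersionUID = 1L;

    private final SchemaLoader loader;

    // transient: skipped by Java serialization, so the cache is never shipped to tasks.
    private transient Map<Long, String> cache = new ConcurrentHashMap<>();

    CachingSchemaManager(SchemaLoader loader) {
        this.loader = loader;
    }

    String schema(long schemaId) {
        // Load on first access, then serve from memory.
        return cache.computeIfAbsent(schemaId, loader::load);
    }

    int cachedCount() {
        return cache.size();
    }

    // Field initializers do not run on deserialization, so rebuild an empty cache here.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        cache = new ConcurrentHashMap<>();
    }

    // Serialize and deserialize, as would happen when the manager is sent to a task.
    static CachingSchemaManager roundTrip(CachingSchemaManager manager) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(manager);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CachingSchemaManager) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this design each task warms its own cache after deserialization, so no custom `writeObject` that copies the map is needed.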
```java
private Map<Long, TableSchema> loadSchemaCache(FileIO fileIO, Path path) {
    Map<Long, TableSchema> schemaCache = new ConcurrentHashMap<>();
    SchemaManager schemaManager = new SchemaManager(fileIO, path);
    for (TableSchema schema : schemaManager.listAll()) {
```
Why do we need to list all schemas up front?
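A minimal sketch of the lazy alternative this question points toward, assuming schemas are identified by a `long` id and never change once written. Each schema is read only when a split first asks for it, via `ConcurrentHashMap.computeIfAbsent`, instead of calling `listAll()` eagerly. `LazySchemaCache` and `CountingReader` are hypothetical names, and `String` stands in for `TableSchema`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongFunction;

// Hypothetical lazy cache; String stands in for TableSchema.
class LazySchemaCache {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final LongFunction<String> reader;

    LazySchemaCache(LongFunction<String> reader) {
        this.reader = reader;
    }

    // Reads a schema only the first time its id is requested; no listAll() needed.
    String schema(long schemaId) {
        return cache.computeIfAbsent(schemaId, id -> reader.apply(id));
    }
}

// Hypothetical reader that counts how often the "file system" is hit.
class CountingReader implements LongFunction<String> {
    final AtomicInteger reads = new AtomicInteger();

    @Override
    public String apply(long schemaId) {
        reads.incrementAndGet();
        return "schema-" + schemaId;
    }
}
```

For `ConcurrentHashMap`, `computeIfAbsent` applies the mapping function at most once per absent key, so concurrent readers of the same split schema trigger a single read. Ids that no split references are never read at all.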
```java
protected final TableSchema tableSchema;
protected final CatalogEnvironment catalogEnvironment;

protected final Map<Long, TableSchema> schemaCache;
```
Or just store the SchemaManager here?
```java
private final Map<Long, TableSchema> cache;

public SchemaManager(FileIO fileIO, Path tableRoot) {
    this(fileIO, tableRoot, null);
```
Pass a new HashMap instead of null? Just keep the cache non-null.
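A minimal sketch of this suggestion, using a hypothetical `SketchSchemaManager` in which `String` stands in for `FileIO`, `Path`, and `TableSchema`. The two-argument constructor delegates with a fresh map rather than `null`, and the three-argument constructor rejects `null`, so code that touches the cache never needs a null check.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for SchemaManager; Strings stand in for FileIO, Path and TableSchema.
class SketchSchemaManager {
    private final String fileIO;
    private final String tableRoot;
    private final Map<Long, String> cache;

    // Default path: always supply a real map instead of null.
    SketchSchemaManager(String fileIO, String tableRoot) {
        this(fileIO, tableRoot, new ConcurrentHashMap<>());
    }

    SketchSchemaManager(String fileIO, String tableRoot, Map<Long, String> cache) {
        this.fileIO = fileIO;
        this.tableRoot = tableRoot;
        // Fail fast if a caller still passes null explicitly.
        this.cache = Objects.requireNonNull(cache, "cache");
    }

    // Safe without a null check because the constructors guarantee a non-null map.
    int cachedCount() {
        return cache.size();
    }
}
```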
```java
}
this.tableSchema = tableSchema;
this.catalogEnvironment = catalogEnvironment;
tableSchemaManager = new SchemaManager(fileIO, path, new ConcurrentHashMap<>());
```
Just use the `public SchemaManager(FileIO fileIO, Path tableRoot)` constructor.
```java
manager.commitChanges(SchemaChange.setOption("ccc", "ddd"));

Map<Long, TableSchema> cachedSchema = manager.getCachedSchema();
assertThat(cachedSchema).hasSize(3);
```
You don't need `getCachedSchema`.
You can just verify that the same instances are returned.
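A minimal sketch of the identity check the reviewer describes, using a hypothetical `InstanceCache` in which each load creates a new `Object` standing in for a freshly parsed `TableSchema`. Instead of exposing the internal map, the test looks up the same id twice and checks reference identity with `==`. In an AssertJ-based test the equivalent assertion would be `assertThat(a).isSameAs(b)`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Hypothetical cache; every load returns a new Object, like parsing a schema file again.
class InstanceCache {
    private final Map<Long, Object> cache = new ConcurrentHashMap<>();
    private final LongFunction<Object> loader;

    InstanceCache(LongFunction<Object> loader) {
        this.loader = loader;
    }

    Object schema(long schemaId) {
        return cache.computeIfAbsent(schemaId, id -> loader.apply(id));
    }

    // A cache hit returns the very same object, so identity proves the cache was used.
    static boolean servedFromCache(InstanceCache cache, long schemaId) {
        return cache.schema(schemaId) == cache.schema(schemaId);
    }
}
```

Because an uncached load would produce a new instance every time, the identity check fails if the cache is bypassed, without the test depending on how the cache is stored.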
See #3021

At present, this solution is relatively obscure. Generally speaking, it is difficult to accept a solution in which the cache is serialized and reused by distributed tasks. Since #3021 has already been merged and addresses the problem in most cases (those without schema changes), I am considering closing this PR. You can reopen it at any time if you have further needs.
Purpose
When reading a split, the RecordReader loads the schema of the split from the file system.
This PR adds a TableSchema cache to SchemaManager to reduce file system accesses.
Tests
SchemaManagerTest#testCache
API and Format
Documentation