Some features and optimizations were left out of the initial PR for removing the manager, to avoid having too much in a single PR. This issue contains a list of tasks that can be done to further optimize and add features to the new architecture.
Look into removing as much metadata as possible from the server's local data folder.
In general, we should attempt to remove as much as possible of the information that is duplicated between the remote data folder and the local data folder of each node.
When metadata cannot be removed, saving Delta Lake tables and related metadata should be combined into a single request. This includes when we create tables (and save metadata right after) and when we drop tables (and delete metadata right after).
For example, if a node receives data it does not have a table for, it should check the remote object store for the table. Also, if data transfer fails because the table does not exist in the remote object store, the node should drop it locally.
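The two cases above can be viewed as one decision on whether a table exists locally and whether it exists in the remote object store. The following is only a sketch of that decision: `SyncAction` and `reconcile` are hypothetical names that are not part of the code base, and the sets of table names stand in for whatever lookups the local data folder and the remote object store actually provide.

```rust
use std::collections::HashSet;

/// Hypothetical outcome of reconciling one table between a node and the
/// remote object store. None of these names exist in the code base.
#[derive(Debug, PartialEq, Eq)]
pub enum SyncAction {
    /// The table exists in both places, so nothing needs to be done.
    InSync,
    /// The table only exists remotely, so the node should create it locally.
    CreateLocally,
    /// The table only exists locally, so the node should drop it.
    DropLocally,
    /// The table exists in neither place, so the request should be rejected.
    UnknownTable,
}

/// Decide what to do for `table_name`, given the table names known locally
/// and the table names found in the remote object store.
pub fn reconcile(
    table_name: &str,
    local: &HashSet<String>,
    remote: &HashSet<String>,
) -> SyncAction {
    match (local.contains(table_name), remote.contains(table_name)) {
        (true, true) => SyncAction::InSync,
        (false, true) => SyncAction::CreateLocally,
        (true, false) => SyncAction::DropLocally,
        (false, false) => SyncAction::UnknownTable,
    }
}
```

Keeping the decision as a pure function separate from the I/O would make it easy to unit test, but whether that split fits the existing code is an open question.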
Add better load balancing for queries to replace the current random selection.
Use a very simple optimization that checks whether the DeltaTable has changed since it was last read, and caches the nodes in the Cluster struct instead of reading all nodes every time.
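A minimal sketch of such a version check is shown here. It assumes the caller can cheaply obtain the current table version as an `i64`, and can supply a closure that performs the expensive read of all nodes. `Node`, `NodeCache`, and `nodes` are hypothetical names used for illustration only, and the real `Cluster` struct may look quite different.

```rust
/// Hypothetical stand-in for a cluster node; only the URL is modeled here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Node {
    pub url: String,
}

/// Hypothetical cache that could live inside a `Cluster`-like struct.
#[derive(Debug, Default)]
pub struct NodeCache {
    /// Table version the cached nodes were read at, if any.
    cached_version: Option<i64>,
    /// Nodes read at `cached_version`.
    nodes: Vec<Node>,
}

impl NodeCache {
    /// Return the cached nodes, calling `read_nodes` only when
    /// `current_version` differs from the version the cache was filled at.
    pub fn nodes<F>(&mut self, current_version: i64, read_nodes: F) -> &[Node]
    where
        F: FnOnce() -> Vec<Node>,
    {
        if self.cached_version != Some(current_version) {
            self.nodes = read_nodes();
            self.cached_version = Some(current_version);
        }
        &self.nodes
    }
}
```

With this shape, repeated calls at the same version never invoke the closure, so the cost of reading all nodes is only paid when the table has actually changed.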
Add an optimization to cloud nodes where they handle load balancing automatically among themselves, so the user is not always required to call “get_flight_info()”.
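As one possible starting point for replacing the random selection, a round-robin picker spreads queries evenly across nodes without needing any load metrics. This is only a sketch of that technique and not a proposal for the final design: `RoundRobin` and `pick` are hypothetical names, and nodes are represented by plain URL strings.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical round-robin picker; not part of the code base.
pub struct RoundRobin {
    /// Monotonically increasing counter shared between callers.
    next: AtomicUsize,
}

impl RoundRobin {
    /// Create a picker that starts at the first node.
    pub fn new() -> Self {
        Self {
            next: AtomicUsize::new(0),
        }
    }

    /// Pick the next node in rotation, or `None` if there are no nodes.
    pub fn pick<'a>(&self, nodes: &'a [String]) -> Option<&'a String> {
        if nodes.is_empty() {
            return None;
        }
        // fetch_add wraps around on overflow, so the index is always valid.
        let index = self.next.fetch_add(1, Ordering::Relaxed) % nodes.len();
        nodes.get(index)
    }
}
```

Because the counter is atomic, `pick` only needs `&self` and can be shared across concurrent requests. A smarter strategy, for example one based on current node load, could later replace it behind the same method.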
When a task in the list is started, we should consider moving it to a separate issue.