SQL result cache and partition cache #3994
// Set max cache's size of query results, the unit is M byte
CONF_Int32(cache_max_size, "256");

//Cache memory is pruned when reach cache_max_size + cache_elasticity_size
Does this mean the cache memory will be shrunk?
I don't understand what this comment means.
To avoid frequent cache cleaning and keep a high hit rate, there are two config items, cache_max_size and cache_elasticity_size. With the default values, when cache memory reaches 256M + 128M, it is pruned back to 256M.
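The two-threshold behaviour described in this reply can be sketched roughly as follows. This is only an illustration, not the code in this PR: `ElasticCache`, its members, and the oldest-first eviction order are hypothetical, and sizes are counted in MB to match the config defaults.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for the result cache: only entry sizes are tracked,
// with the oldest entry at the front of the queue.
struct ElasticCache {
    std::size_t max_size;         // plays the role of cache_max_size (MB)
    std::size_t elasticity_size;  // plays the role of cache_elasticity_size (MB)
    std::size_t used = 0;         // current cache memory (MB)
    std::size_t prune_runs = 0;   // how many times pruning was triggered
    std::deque<std::size_t> entries;

    ElasticCache(std::size_t max, std::size_t elasticity)
        : max_size(max), elasticity_size(elasticity) {}

    void add(std::size_t mb) {
        entries.push_back(mb);
        used += mb;
        // Nothing is evicted until the upper bound (max + elasticity) is reached.
        if (used >= max_size + elasticity_size) prune();
    }

    // Evict oldest entries (an assumed policy) until usage is back at max_size.
    void prune() {
        ++prune_runs;
        while (used > max_size && !entries.empty()) {
            assert(used >= entries.front());
            used -= entries.front();
            entries.pop_front();
        }
    }
};
```

With `ElasticCache cache(256, 128)`, the cache can grow from 256 to just under 384 without a single pruning pass, which is the point of the elasticity margin: one larger cleanup instead of an eviction on every insert near the limit.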
```
MySQL [(none)]> set [global] enable_sql_cache=true;
```
Note: globa means a global variable; if omitted, it refers to the current session variable
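- Note: globa means a global variable; if omitted, it refers to the current session variable
+ Note: global means a global variable; if omitted, it refers to the current session variable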
typedef std::unordered_map<UniqueId, ResultNode*> ResultNodeMap;

// a doubly linked list class
Why not use std::list?
 * SELECT xxx FROM app_event INNER JOIN user_profile ON app_event.user_id = user_profile.user_id xxx
 * SELECT xxx FROM app_event INNER JOIN user_profile ON xxx INNER JOIN site_channel ON xxx
 */
public void checkCacheMode(long now) {
Could EXPLAIN show whether a SQL statement hit the cache?
Otherwise we have to check the log, which is inconvenient.
I split the PR and submit the BE part first.
47c06ff to c28775a
4a05946 to 885d9ce
1. Cache SQL results for T+1 tables
2. Cache partition results for partitioned tables that are updated in real time
3. Config and session variables for the cache
885d9ce to bc75361
#2581
Solutions
This cache gives priority to ensuring data consistency. On that basis, it refines the cache granularity and improves the hit rate. It therefore has the following characteristics:
Two cache modes
SQLCache
SQLCache stores and fetches cache entries according to the SQL signature, the partition ID of the queried table, and the latest version of that partition.
The combination of the three determines a cached dataset. If any one of them changes, for example the SQL changes (different query fields or conditions) or the version changes after a data update, the cache is not hit.
If multiple tables are joined, the latest partition ID and the latest version number are used. If one of the tables is updated, the partition ID or version number will differ, and the cache is not hit.
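As a rough illustration of the three-part key described above, the lookup can be thought of as a map keyed on (SQL signature, partition ID, partition version). The names below (`SqlCacheKey`, `make_key`, `SqlCacheSketch`) are hypothetical, `std::hash` is only a stand-in for whatever signature the PR actually computes, and the sketch covers the single-table case only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// (SQL signature, partition ID, partition version)
using SqlCacheKey = std::tuple<std::size_t, std::int64_t, std::int64_t>;

// std::hash stands in for the real SQL signature (an assumption).
SqlCacheKey make_key(const std::string& sql, std::int64_t partition_id,
                     std::int64_t version) {
    assert(version >= 0);
    return std::make_tuple(std::hash<std::string>{}(sql), partition_id, version);
}

class SqlCacheSketch {
public:
    void put(const SqlCacheKey& key, std::string result) {
        entries_[key] = std::move(result);
    }

    // Returns nullptr on a miss; any change in one of the three key parts
    // yields a different key and therefore a miss.
    const std::string* get(const SqlCacheKey& key) const {
        auto it = entries_.find(key);
        return it == entries_.end() ? nullptr : &it->second;
    }

private:
    std::map<SqlCacheKey, std::string> entries_;
};
```

Because the version is part of the key, a data update never needs an explicit invalidation step in this sketch: the next query simply builds a key with the new version and misses, while the stale entry is left for the pruning policy to remove.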
SQLCache is best suited to T+1 update scenarios. After the data is updated in the early morning, the result of the first query is fetched from BE and put into the cache, and subsequent identical queries are served from the cache. It can also be used with data that is updated in real time, but the hit rate may be low; see PartitionCache below.
PartitionCache
Consider a query for the number of users per day over the last 7 days. If the table is partitioned by date, data is written only to the current day's partition, and the data in all other partitions is fixed. For the same SQL, the metrics computed from a partition that is no longer updated are therefore fixed as well. For example, when querying on 2020-03-09 for the previous 7 days, the data for 2020-03-03 through 2020-03-07 comes from the cache; the first query for 2020-03-08 reads from the partition and subsequent queries come from the cache; and 2020-03-09 is read from the partition, because data is being written to it continuously that day.
Therefore, when querying N days of data of which only the latest D days are updated, and the daily queries differ only in having similar date ranges, only D partitions need to be queried. Everything else comes from the cache, which can effectively reduce the cluster load and the query time.
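The split between cached and freshly scanned partitions in the example above can be sketched like this. It is a simplification under stated assumptions: `PartitionPlan`, `plan_query`, and `remember` are hypothetical names, days are ISO date strings, and "today" stands for the one partition still being written, whereas the real implementation would decide this from partition versions and update times.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct PartitionPlan {
    std::vector<std::string> from_cache;  // served from PartitionCache
    std::vector<std::string> from_be;     // must be scanned on BE
};

// Decide, per day partition, whether the cached result can be used.
PartitionPlan plan_query(const std::vector<std::string>& days,
                         const std::set<std::string>& cached_days,
                         const std::string& today) {
    assert(!today.empty());
    PartitionPlan plan;
    for (const std::string& day : days) {
        // The partition still being written is never served from the cache.
        bool usable = day != today && cached_days.count(day) > 0;
        (usable ? plan.from_cache : plan.from_be).push_back(day);
    }
    return plan;
}

// After the scan, results for partitions that are no longer written
// can be kept for later queries.
void remember(const PartitionPlan& plan, const std::string& today,
              std::set<std::string>& cached_days) {
    for (const std::string& day : plan.from_be) {
        if (day != today) cached_days.insert(day);
    }
}
```

Run against the example dates, the first query on 2020-03-09 scans two partitions (2020-03-08 and 2020-03-09); after `remember`, a repeat of the same query scans only 2020-03-09, which matches the N days versus D partitions argument above with D = 1.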
Reference
For more information, please read partition_cache.md