@songenjie
Contributor

- Support users who run the same query at high QPS through a result set cache

Fixes #4283

Member

@yangzhg yangzhg left a comment

You should also add a trigger: when a table has load or delete operations, refresh every cache entry related to that table.
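
One way to picture the trigger suggested here is a reverse index from table name to cached statements, cleared whenever a load or delete lands on that table. The following is a minimal sketch under stated assumptions: `TableAwareResultCache`, `onTableChanged`, and the demo class are hypothetical names for illustration, not code from this PR or from Doris, and the two maps are not updated atomically.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names come from the Doris code base.
final class TableAwareResultCache {
    // Cached result, keyed by the SQL text.
    private final Map<String, String> resultsBySql = new ConcurrentHashMap<>();
    // Reverse index: table name -> SQL texts whose cached result reads that table.
    private final Map<String, Set<String>> sqlByTable = new ConcurrentHashMap<>();

    void put(String sql, Set<String> tables, String result) {
        resultsBySql.put(sql, result);
        for (String table : tables) {
            sqlByTable.computeIfAbsent(table, t -> ConcurrentHashMap.newKeySet()).add(sql);
        }
    }

    // Returns null on a cache miss.
    String get(String sql) {
        return resultsBySql.get(sql);
    }

    // Assumed hook: called from the load/delete path once the change to `table` is visible.
    void onTableChanged(String table) {
        Set<String> affected = sqlByTable.remove(table);
        if (affected != null) {
            for (String sql : affected) {
                // Removing an already-evicted statement is a harmless no-op.
                resultsBySql.remove(sql);
            }
        }
    }
}

final class TableAwareResultCacheDemo {
    public static void main(String[] args) {
        TableAwareResultCache cache = new TableAwareResultCache();
        cache.put("SELECT count(*) FROM orders", Set.of("orders"), "42");
        cache.put("SELECT count(*) FROM users", Set.of("users"), "7");
        cache.onTableChanged("orders"); // e.g. a load into `orders` finished
        System.out.println(cache.get("SELECT count(*) FROM orders")); // miss after invalidation
        System.out.println(cache.get("SELECT count(*) FROM users"));  // unrelated entry survives
    }
}
```

Only entries that read the changed table are dropped, so queries on other tables keep their hit rate.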

@kangkaisen
Contributor

@songenjie Hi, where is the design doc for this PR? I think this PR duplicates the SQL mode cache in #4330. What do you think?

@songenjie
Contributor Author

@songenjie Hi, where is the design doc for this PR? I think this PR duplicates the SQL mode cache in #4330. What do you think?

@kangkaisen

  1. SQL cache mode
  • If this switch is turned on, SQL query result sets are cached. If, for every partition of every table in the query, the interval since the latest version time is greater than cache_last_version_interval_second, and the result set has fewer than cache_result_max_row_count rows, the result set is cached and the next identical SQL statement hits the cache.
  2. Result cache (this PR)
  • Solves the problem of the same user sending the same SQL at high QPS on the same connection ID: executing the same SQL again within result_cache_expire_after_in_milliseconds hits the result cache.
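
The admission rule described for the SQL cache mode can be sketched as a small predicate. This is an illustration under stated assumptions, not the implementation in #4330: the two config names come from the description above, while `SqlCacheAdmission`, `shouldCache`, the strictness of the comparisons, and the values 30 and 3000 are hypothetical.

```java
import java.util.List;

// Hypothetical sketch of the admission rule; only the two config names come from the thread.
final class SqlCacheAdmission {
    private final long cacheLastVersionIntervalSecond;
    private final long cacheResultMaxRowCount;

    SqlCacheAdmission(long cacheLastVersionIntervalSecond, long cacheResultMaxRowCount) {
        this.cacheLastVersionIntervalSecond = cacheLastVersionIntervalSecond;
        this.cacheResultMaxRowCount = cacheResultMaxRowCount;
    }

    // latestVersionTimesMs: latest version time of every partition the query reads.
    boolean shouldCache(List<Long> latestVersionTimesMs, long resultRowCount, long nowMs) {
        // The result set must be smaller than the row limit.
        if (resultRowCount >= cacheResultMaxRowCount) {
            return false;
        }
        // Every partition must have been stable for longer than the interval.
        for (long versionTimeMs : latestVersionTimesMs) {
            long ageSeconds = (nowMs - versionTimeMs) / 1000;
            if (ageSeconds <= cacheLastVersionIntervalSecond) {
                return false;
            }
        }
        return true;
    }
}

final class SqlCacheAdmissionDemo {
    public static void main(String[] args) {
        // 30 seconds and 3000 rows are illustrative values, not documented defaults.
        SqlCacheAdmission admission = new SqlCacheAdmission(30, 3000);
        long nowMs = 1_000_000L;
        System.out.println(admission.shouldCache(List.of(nowMs - 60_000L), 100, nowMs));  // old data, small result
        System.out.println(admission.shouldCache(List.of(nowMs - 5_000L), 100, nowMs));   // recently updated partition
        System.out.println(admission.shouldCache(List.of(nowMs - 60_000L), 5000, nowMs)); // result too large
    }
}
```

The contrast with the result cache in this PR is that this rule looks at data freshness, whereas the result cache looks only at the connection, the SQL text, and elapsed time.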

@kangkaisen
Contributor

  2. Result cache (this PR)
  • Solves the problem of the same user sending the same SQL at high QPS on the same connection ID: executing the same SQL again within result_cache_expire_after_in_milliseconds hits the result cache.

Hi, I think identical OLAP queries are very rare in a production environment. Have you tested this PR in your production environment? What is the cache hit rate?

Apache Kylin uses a SQL result cache like the one in this PR. After running Kylin for many years at Meituan, we can confirm that a SQL result cache is meaningless for an OLAP system.

Apache Druid uses a segment cache like the one in #4330. After running Druid for many years at Meituan, we can confirm that a segment cache is useful for an OLAP system.

So I think that if we already have a segment query cache, we don't need the query result cache.

@songenjie
Contributor Author

Hi, I think identical OLAP queries are very rare in a production environment. Have you tested this PR in your production environment? What is the cache hit rate? Apache Kylin uses a SQL result cache like the one in this PR. After running Kylin for many years at Meituan, we can confirm that a SQL result cache is meaningless for an OLAP system.

Thanks.

The result cache has been in use at Jingdong for more than half a year, and its hit rate is very high, especially in an e-commerce business like ours during special periods and business scenarios.

As far as I know, most of Meituan's business is near real-time, and its business usage and analysis scope differ slightly from ours, so the cache hit rate there may not be high.

@morningman morningman added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 26, 2020
@morningman
Contributor

I didn't see any code for cache invalidation. Did I miss something?

@songenjie
Contributor Author

Contributor

@morningman morningman left a comment

I didn't see any code for cache invalidation. Did I miss something?

https://github.com/apache/incubator-doris/blob/6c2ca4690139acdf917193ec4d0acda39b7a148b/fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java#L635

I mean, where is the logic that decides when a cache entry is invalid? Is the cache key (NamedKey) composed only of the connection ID and the original statement? What if the data has changed?

@songenjie
Contributor Author

I mean, where is the logic that decides when a cache entry is invalid? Is the cache key (NamedKey) composed only of the connection ID and the original statement? What if the data has changed?

https://github.com/apache/incubator-doris/blob/dd42c6b580afcbb53436dd8b531e03e5fa5d01ed/fe/fe-core/src/main/java/org/apache/doris/cache/SimpleLocalCache.java#L63

use Caffeine.newBuilder().recordStats();

@morningman
Contributor

So it depends only on time? If the data changes within Config.result_cache_expire_after_in_milliseconds, a user in the same context still reads the old data?
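
The concern can be made concrete with a small stand-in for a time-only cache. This is a sketch under stated assumptions: it uses a plain map with a manual clock rather than Caffeine, the key shape (connection ID plus original statement) follows the description of NamedKey in this thread, and `TimeOnlyResultCache` and its methods are hypothetical names, not code from this PR.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a cache whose only invalidation rule is elapsed time.
final class TimeOnlyResultCache {
    // Key shape as described in the thread: connection ID plus original statement text.
    static final class Key {
        final long connectionId;
        final String originStmt;
        Key(long connectionId, String originStmt) {
            this.connectionId = connectionId;
            this.originStmt = originStmt;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return connectionId == other.connectionId && originStmt.equals(other.originStmt);
        }
        @Override public int hashCode() {
            return Objects.hash(connectionId, originStmt);
        }
    }

    private static final class Entry {
        final String result;
        final long writtenAtMs;
        Entry(String result, long writtenAtMs) {
            this.result = result;
            this.writtenAtMs = writtenAtMs;
        }
    }

    private final Map<Key, Entry> entries = new ConcurrentHashMap<>();
    private final long expireAfterMs;
    private final LongSupplier clockMs; // injected so the demo can control time

    TimeOnlyResultCache(long expireAfterMs, LongSupplier clockMs) {
        this.expireAfterMs = expireAfterMs;
        this.clockMs = clockMs;
    }

    void put(long connectionId, String sql, String result) {
        entries.put(new Key(connectionId, sql), new Entry(result, clockMs.getAsLong()));
    }

    // Returns null on a miss or once the entry is older than expireAfterMs.
    String get(long connectionId, String sql) {
        Key key = new Key(connectionId, sql);
        Entry entry = entries.get(key);
        if (entry == null) return null;
        if (clockMs.getAsLong() - entry.writtenAtMs >= expireAfterMs) {
            entries.remove(key);
            return null;
        }
        return entry.result;
    }
}

final class TimeOnlyResultCacheDemo {
    public static void main(String[] args) {
        long[] nowMs = {0L};
        TimeOnlyResultCache cache = new TimeOnlyResultCache(1000L, () -> nowMs[0]);
        String sql = "SELECT count(*) FROM orders";
        cache.put(1L, sql, "42");
        nowMs[0] = 500L; // suppose a load changes `orders` here; the cache is not told
        System.out.println(cache.get(1L, sql)); // still the old result
        nowMs[0] = 1000L;
        System.out.println(cache.get(1L, sql)); // expired, so a miss
    }
}
```

Nothing in the key or the expiry check refers to table versions, so a data change inside the expiry window is invisible to readers on the same connection.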

@songenjie
Contributor Author

right

@morningman morningman closed this May 11, 2022

Labels

kind/feature Categorizes issue or PR as related to a new feature.

Development

Successfully merging this pull request may close these issues.

[doris][fe]Support users with high QPS in the same query through result cache

4 participants