Contributor

@seawinde seawinde commented Aug 9, 2024

Proposed changes

commitId: 08c9e05
pr: #38909

CalvinKirs and others added 30 commits July 20, 2024 10:52
…adata to avoid holding the lock for an extended period.apache#38162 (apache#38163)

## Proposed changes
apache#38162
## Proposed changes

Co-authored-by: yiguolei <yiguolei@gmail.com>
## Proposed changes

Co-authored-by: yiguolei <yiguolei@gmail.com>
## Proposed changes

Co-authored-by: stephen <hello-stephen@qq.com>
…n tasks (apache#37782) (apache#38189)

## Proposed changes

For some urgent compaction tasks, submission should take
parallelism into account.

Currently, the control policy is applied specifically to data loading.
Other sources of urgent tasks are treated as eager.
…itory (apache#38192)

delete_if_exists is a temporary solution introduced in apache#25847, to avoid
concurrent testing conflicts.

Cherry-pick apache#38190
… (apache#38233)

In large clusters, too many parallel tasks cause performance issues,
so this PR limits the maximum number of parallel tasks in Doris.

pick apache#38196

…fo level for debug convenience (apache#38133) (apache#38230)

## Proposed changes

As title.
…38131)

Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
…che#38056)

pick from master apache#36902 

## Proposed changes

[fix](mtmv)Fix mtmv name to resolve conflicts

Co-authored-by: Dongyang Li <hello_stephen@qq.com>
…f periodic tasks. (apache#38264)

…

Since there might be delays in execution, taskCount should be set to >= 1.

## Proposed changes

apache#38263
… too (apache#37489) (apache#38281)

pick from master apache#37489

After nullability is adjusted, the nullability of some children may have
changed, so the inner type nullability of the agg_state type must be updated too.
apache#38278)

Enabling index compaction in the inverted index V2 format currently
causes unexpected errors, especially in the case of tables with hybrid
indexes, such as BKD index and Fulltext index together.

backport apache#38209
Hastyshell and others added 21 commits August 8, 2024 19:35
…void infinite transaction (apache#38991) (apache#39108)

## Proposed changes

Issue Number: close apache#38956 

As title.
…pache#38894) (apache#39107)

## Proposed changes

1. Use column idx of ref block instead of new block to indicate the ref
column.
2. Rename some variables to clarify their meanings.
3. Clarify some log msg.
4. Add a minimal case to verify the change.
…tions apache#39015  (apache#39099)

## Proposed changes
apache#39015
### Description:

This issue proposes the addition of new features to the project,
including a deadlock detection tool and monitored lock implementations.
These features will help in identifying and debugging potential
deadlocks and monitoring lock usage. Features:


#### AbstractMonitoredLock:

A monitored version of Lock that tracks and logs lock acquisition and
release times.

Functionality:
Overrides lock(), unlock(), tryLock(), and tryLock(long timeout,
TimeUnit unit) methods. Logs information about lock acquisition time,
release time, and any failure to acquire the lock within the specified
timeout.

##### eg
```log
2024-08-07 12:02:59  [ Thread-2:2006 ] - [ WARN ]  Thread ID: 12, Thread Name: Thread-2 - Lock held for 1912 ms, exceeding hold timeout of 1000 ms 
Thread stack trace:
	at java.lang.Thread.getStackTrace(Thread.java:1564)
	at org.example.lock.AbstractMonitoredLock.afterUnlock(AbstractMonitoredLock.java:49)
	at org.example.lock.MonitoredReentrantLock.unlock(MonitoredReentrantLock.java:32)
	at org.example.ExampleService.timeout(ExampleService.java:17)
	at org.example.Main.lambda$test2$1(Main.java:39)
	at java.lang.Thread.run(Thread.java:750)
```
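
The hold-time warning in the log above can be approximated with a thin wrapper around `ReentrantLock`. The following is a minimal sketch, not the PR's `AbstractMonitoredLock`: the class name `HoldTimeMonitoredLock`, its fields, and the accessor methods are hypothetical, and it measures just before the outermost release rather than in an `afterUnlock` hook.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; this is not the class added by the PR.
class HoldTimeMonitoredLock {
    private final ReentrantLock delegate = new ReentrantLock();
    private final long holdTimeoutMs;
    private long acquiredAtNanos;            // touched only by the owning thread
    private volatile long lastHoldMs = -1;   // -1 until the first full release
    private volatile int slowReleases;       // written only while the lock is held

    HoldTimeMonitoredLock(long holdTimeoutMs) {
        this.holdTimeoutMs = holdTimeoutMs;
    }

    void lock() {
        delegate.lock();
        afterLock();
    }

    boolean tryLock(long timeout, TimeUnit unit) throws InterruptedException {
        boolean acquired = delegate.tryLock(timeout, unit);
        if (acquired) {
            afterLock();
        } else {
            System.err.printf("Thread %s failed to acquire lock within %d %s%n",
                    Thread.currentThread().getName(), timeout, unit);
        }
        return acquired;
    }

    void unlock() {
        // Measure on the outermost release, while this thread still owns the lock.
        if (delegate.getHoldCount() == 1) {
            beforeFinalUnlock();
        }
        delegate.unlock();
    }

    private void afterLock() {
        // Start the clock on the first (non-reentrant) acquisition only.
        if (delegate.getHoldCount() == 1) {
            acquiredAtNanos = System.nanoTime();
        }
    }

    private void beforeFinalUnlock() {
        long heldMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - acquiredAtNanos);
        lastHoldMs = heldMs;
        if (heldMs > holdTimeoutMs) {
            slowReleases++;
            System.err.printf("Thread %s - Lock held for %d ms, exceeding hold timeout of %d ms%n",
                    Thread.currentThread().getName(), heldMs, holdTimeoutMs);
        }
    }

    long lastHoldMillis() { return lastHoldMs; }

    int slowReleaseCount() { return slowReleases; }
}
```

The extra `System.nanoTime()` calls and bookkeeping on every acquire and release are the kind of overhead that a benchmark comparing a monitored lock with a plain one would surface.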

#### DeadlockCheckerTool:

Uses ScheduledExecutorService for periodic deadlock checks. Logs
deadlock information including thread names, states, lock info, and
stack traces.

**ThreadMXBean accesses thread information in the local JVM, which is
already in memory, so accessing it is less expensive than fetching data
from external resources such as disk or network. Thread state cache: The
JVM typically maintains a cache of thread states, reducing the need for
real-time calculations or additional data processing.**

##### eg
```log
Thread Name: Thread-0
Thread State: WAITING
Lock Name: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213
Lock Owner Name: Thread-1
Lock Owner Id: 12
Waited Time: -1
Blocked Time: -1
Lock Info: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213
Blocked by: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213
Stack Trace: 
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at org.example.lock.MonitoredReentrantLock.lock(MonitoredReentrantLock.java:22)
	at org.example.Main.lambda$testDeadLock$3(Main.java:79)
	at org.example.Main$$Lambda$1/1221555852.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:750)


2024-08-07 14:11:28  [ pool-1-thread-1:2001 ] - [ WARN ]  Deadlocks detected:
Thread Name: Thread-1
Thread State: WAITING
Lock Name: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf
Lock Owner Name: Thread-0
Lock Owner Id: 11
Waited Time: -1
Blocked Time: -1
Lock Info: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf
Blocked by: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf
Stack Trace: 
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at org.example.lock.MonitoredReentrantLock.lock(MonitoredReentrantLock.java:22)
	at org.example.Main.lambda$testDeadLock$4(Main.java:93)
	at org.example.Main$$Lambda$2/1556956098.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:750)


```
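
A periodic check of the kind described for DeadlockCheckerTool can be sketched with the standard `ThreadMXBean` API. This is an illustrative sketch only: the class name `DeadlockCheckSketch`, the method names, and the report layout are assumptions and are not taken from the PR.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the tool added by the PR.
class DeadlockCheckSketch {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    // Returns a textual report, or an empty string when no deadlock is found.
    static String checkOnce() {
        // Covers both monitor locks and ownable synchronizers such as ReentrantLock.
        long[] ids = THREADS.findDeadlockedThreads();
        if (ids == null) {
            return "";
        }
        StringBuilder report = new StringBuilder("Deadlocks detected:\n");
        for (ThreadInfo info : THREADS.getThreadInfo(ids, true, true)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            report.append("Thread Name: ").append(info.getThreadName()).append('\n')
                  .append("Thread State: ").append(info.getThreadState()).append('\n')
                  .append("Lock Name: ").append(info.getLockName()).append('\n')
                  .append("Lock Owner Name: ").append(info.getLockOwnerName()).append('\n')
                  .append("Stack Trace:\n");
            for (StackTraceElement frame : info.getStackTrace()) {
                report.append("\tat ").append(frame).append('\n');
            }
        }
        return report.toString();
    }

    // Schedules checkOnce() on a daemon thread; the caller owns the returned scheduler.
    static ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "deadlock-checker");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            String report = checkOnce();
            if (!report.isEmpty()) {
                System.err.print(report);
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

A caller would typically invoke `DeadlockCheckSketch.start(...)` once at startup and call `shutdownNow()` on the returned scheduler during shutdown.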
##### benchmark
```
    @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
    @Threads(1)

Benchmark                                                          Mode  Cnt       Score   Error   Units
LockBenchmark.testMonitoredLock                                   thrpt    2   15889.407          ops/ms
LockBenchmark.testMonitoredLock:·gc.alloc.rate                    thrpt    2     678.061          MB/sec
LockBenchmark.testMonitoredLock:·gc.alloc.rate.norm               thrpt    2      56.000            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space           thrpt    2     668.249          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space.norm      thrpt    2      55.080            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space       thrpt    2       0.075          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space.norm  thrpt    2       0.006            B/op
LockBenchmark.testMonitoredLock:·gc.count                         thrpt    2      20.000          counts
LockBenchmark.testMonitoredLock:·gc.time                          thrpt    2       6.000              ms
LockBenchmark.testNativeLock                                      thrpt    2  103130.635          ops/ms
LockBenchmark.testNativeLock:·gc.alloc.rate                       thrpt    2      ≈ 10⁻⁴          MB/sec
LockBenchmark.testNativeLock:·gc.alloc.rate.norm                  thrpt    2      ≈ 10⁻⁶            B/op
LockBenchmark.testNativeLock:·gc.count                            thrpt    2         ≈ 0          counts

    @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
    @Threads(100)

Benchmark                                                          Mode  Cnt       Score   Error   Units
LockBenchmark.testMonitoredLock                                   thrpt    2   10994.606          ops/ms
LockBenchmark.testMonitoredLock:·gc.alloc.rate                    thrpt    2     488.508          MB/sec
LockBenchmark.testMonitoredLock:·gc.alloc.rate.norm               thrpt    2      56.002            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space           thrpt    2     481.390          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space.norm      thrpt    2      55.163            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space       thrpt    2       0.020          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space.norm  thrpt    2       0.002            B/op
LockBenchmark.testMonitoredLock:·gc.count                         thrpt    2      18.000          counts
LockBenchmark.testMonitoredLock:·gc.time                          thrpt    2       9.000              ms
LockBenchmark.testNativeLock                                      thrpt    2  558652.036          ops/ms
LockBenchmark.testNativeLock:·gc.alloc.rate                       thrpt    2       0.016          MB/sec
LockBenchmark.testNativeLock:·gc.alloc.rate.norm                  thrpt    2      ≈ 10⁻⁴            B/op
LockBenchmark.testNativeLock:·gc.count                            thrpt    2         ≈ 0          counts
```

…che#39019) (apache#39128)

pick from master apache#39019

No case is added because BE returns a wrong answer for this:

select cast(2.0 as boolean); -- should return 1 not 2
pick from master apache#38758

Ensure the header is not changed unexpectedly.
apache#39112)

```
Fail to serialize doris.PFetchDataResult
```

If the size of `PFetchDataResult` is greater than 2G, protocol buffer
cannot serialize the message.

pick apache#37990
… (apache#39129)

pick from master apache#39109

When the nullability of the first regulator child's output is wrong, we may
produce output with wrong nullability, which can crash BE.
## Proposed changes
backport : apache#34848
…alues of non-key columns for delete stmt in publish phase apache#38703" (apache#39074)

picks apache#38703
… to reduce conflict (apache#38981) (apache#39133)

By default, Expression.hashCode returns getClass().hashCode(), which captures only one level of information. Many expressions of the same type therefore return the same hash code and collide, and the resulting deep comparisons inside the HashMap are inefficient and hold the table lock for a long time.

This PR supports fast hash code computation from the bottom-level literals and slots, which reduces the time spent comparing expressions after hash collisions.

In my test case, SQL planning time drops from 20 minutes (not finished) to 35 seconds.
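
The collision problem can be illustrated with a toy expression tree. This is a minimal sketch under stated assumptions: `Expr`, `IntLiteral`, `Add`, and `HashDemo` are hypothetical stand-ins, not Doris's Expression classes, and the structural hash shown is one plausible way to mix leaf values in, not necessarily the formula used by the PR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical mini expression tree; not Doris's Expression hierarchy.
abstract class Expr {
    final List<Expr> children;

    Expr(List<Expr> children) {
        this.children = children;
    }

    // What a class-only default amounts to: every node of one type hashes alike.
    int classOnlyHash() {
        return getClass().hashCode();
    }

    // Structural hash: combines the node type with the hashes of its children.
    int structuralHash() {
        int h = getClass().hashCode();
        for (Expr child : children) {
            h = 31 * h + child.structuralHash();
        }
        return h;
    }
}

final class IntLiteral extends Expr {
    final int value;

    IntLiteral(int value) {
        super(Collections.<Expr>emptyList());
        this.value = value;
    }

    // Leaves contribute their own value, so different literals hash differently.
    @Override
    int structuralHash() {
        return 31 * getClass().hashCode() + Integer.hashCode(value);
    }
}

final class Add extends Expr {
    Add(Expr left, Expr right) {
        super(Arrays.asList(left, right));
    }
}

final class HashDemo {
    // Counts distinct hash codes over n different Add(i, i + 1) expressions.
    static int distinctHashes(int n, boolean structural) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            Expr e = new Add(new IntLiteral(i), new IntLiteral(i + 1));
            seen.add(structural ? e.structuralHash() : e.classOnlyHash());
        }
        return seen.size();
    }
}
```

With the class-only hash, every `Add` expression shares one hash code, so a hash map must fall back to deep equality checks for each lookup; mixing in the leaf values spreads the expressions across buckets.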
…le (apache#38909)

The MV definition is:

            select l_orderkey, l_partkey, o_custkey, l_shipdate, o_orderdate 
            from ${hive_catalog_name}.${hive_database}.${hive_table} 
            left join ${internal_catalog}.${olap_db}.${olap_table} on l_orderkey = o_orderkey 

If we run the following query, rewriting by the MV fails with the
message `mv can not offer any partition for query`:

            select l_orderkey, l_partkey, o_custkey, l_shipdate, o_orderdate 
            from ${hive_catalog_name}.${hive_database}.${hive_table} 
            left join ${internal_catalog}.${olap_db}.${olap_table} on l_orderkey = o_orderkey 

This PR fixes the problem; the query is now rewritten by the MV successfully.