Closed
Labels
Component: C++, Critical Fix (bugfixes for security vulnerabilities, crashes, or invalid data), Type: bug
Description
Describe the bug, including details regarding any error messages, version, and platform.
On some calls to Table.join_asof, my Python process becomes unresponsive and uses zero CPU. It appears to be a thread deadlock or something similar. I have created an example that triggers the deadlock with high probability.
Here are the details of my setup:
- Python 3.12.7
- pyarrow==19.0.1
- numpy==2.2.4
- pandas==2.2.3
- Ubuntu 22.04.5
- CPU: 13th Gen Intel(R) Core(TM) i9-13980HX
I was also able to reproduce the deadlock on a colleague's Mac laptop with Apple silicon using this example, so I assume the hardware does not make a big difference.
On my laptop this always deadlocks before the 300th iteration:
import numpy as np
import pandas as pd
import pyarrow as pa

n_left = 100
n_right = 200_000
left_start = pd.Timestamp("2025-04-07T07:45:55", tz="UTC")
right_start = pd.Timestamp("2025-04-07T00:00:00", tz="UTC")
time_end = pd.Timestamp("2025-04-07T12:05:59", tz="UTC")
tolerance_nanos = 60 * 1_000_000_000  # 60 seconds, in nanoseconds

np.random.seed(0)


def get_timestamps(start, end, n):
    # Build n non-decreasing timestamps between start and end.
    seconds = (end - start).total_seconds()
    td = np.random.uniform(0, 1, n)
    # Zero out roughly half of the increments so duplicate timestamps occur.
    td *= np.random.choice([0, 1], n)
    # Rescale so the increments sum to the full time span, then accumulate.
    td *= seconds / td.sum()
    td = td.cumsum()
    return start + pd.to_timedelta(td, "seconds")


left_schema = pa.schema([pa.field("timestamp", pa.timestamp("ns", "UTC"))])
right_schema = pa.schema(
    [
        pa.field("timestamp", pa.timestamp("ns", "UTC")),
        pa.field("value", pa.float64()),
    ]
)

left = pa.table(
    {"timestamp": get_timestamps(left_start, time_end, n_left)},
    schema=left_schema,
)
right = pa.table(
    {
        "timestamp": get_timestamps(right_start, time_end, n_right),
        "value": np.random.normal(100, 5, n_right),
    },
    schema=right_schema,
)

# Repeat the same join; the process eventually hangs inside join_asof.
for i in range(1000):
    print(f"{i:>5} | {pd.Timestamp.now()}")
    left.join_asof(
        right,
        on="timestamp",
        by=[],
        tolerance=tolerance_nanos,
    )

Component(s)
Python
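
To help confirm that an iteration is really stuck rather than just slow, a watchdog from the standard library `faulthandler` module can dump the Python stack of every thread when a call overruns a timeout. This is only a minimal sketch: the helper name `call_with_watchdog` and the 30-second default are assumptions made for illustration, not part of the original report or of pyarrow.

```python
import faulthandler
import sys


def call_with_watchdog(fn, timeout_s=30.0, file=sys.stderr):
    """Run fn(); if it is still running after timeout_s seconds,
    dump the Python traceback of every thread to `file`.

    Hypothetical helper for illustration. `file` must be backed by a
    real file descriptor, because faulthandler writes to it directly.
    """
    # Arm the watchdog; exit=False leaves the (possibly hung) process alive
    # so that a debugger can still be attached afterwards.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=file)
    try:
        return fn()
    finally:
        # Disarm the watchdog once the call returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

In the loop above, the join could then be wrapped as `call_with_watchdog(lambda: left.join_asof(right, on="timestamp", by=[], tolerance=tolerance_nanos))`. The dump only shows Python-level frames; inspecting the native Arrow threads would still need a native debugger such as gdb.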