ARROW-16083: [C++] Implement AsofJoin execution node #13028

icexelloss · 2022-04-28T20:46:34Z

Overview

This is a work in progress implementation of the AsofJoin node in Arrow C++ compute. The code needs quite a bit of clean up but I have worked on this long enough that I think I benefit from some inputs/comments from Arrow maintainers about the high levels before I potentially spend too much time in the wrong direction.

All Credit to @stmrtn (Steven Martin) who is the original author of the code.

Implementation

There are quite a bit of code and here is how it works at the high level:

Classes:

InputState: A class that handles queuing for input batches and purging unneeded batches. There are one input state per input table.
MemoStore: A class that responsible for advancing row index and getting the latest row for each key for each key given a timestamp. (Latest timestamp that is <= the given timestamp)
CompositeReferenceTable: A class that is responsible for storing temporary output rows and produces RecordBatches from those rows.

Algorithm:

The node takes one left side table and n right side tables, and produces a joined table
It is currently assumed that each input table will call InputReceived with time-ordered batches. InputReceived will queue the batches inside InputState (it doesn't do any work). There is a separate process thread that wakes up when there is new inputs and attempts to produces a output batch. If the current data is not enough to produce the output batch (i.e., we have not received all the potential right side rows that could be a match for the current left batch), it will wait for new inputs.
The process thread works as follows:
1. Advance left row index for the current batch. Then advance right tables to get the latest right row (i.e., latest right row with timestamp <= left row timestamp)
2. Once advances are done, it will continue to check to produce the output row for the current left row
3. Go to 1 until left batches are processed
4. Output batch for the current left batch
5. Purge batches that are no longer needed
6. Wait until enough batches are received to process the next left batch

Entry point for the algorithm is process()

TODO

More Tests
Decide if we can replace CompositeReferenceTable with sth that already exits (perhaps RowEncoder?)
Life cycle management for the process thread (or whether or not we should have it)
Lint & Code Style
Handle null results properly
Handle errors properly (e.g., unsupported types)
Clean up debug statement

Follow up

Handle more datatypes (both key and value)
Handle multiple keys
Change from column name to column index for time and key column (Substrait integration)
Look into using table_builder to reproduce the materialize logic.

github-actions · 2022-04-28T20:46:54Z

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

cpp/src/arrow/compute/exec/asof_join.cc

github-actions · 2022-04-28T20:48:25Z

https://issues.apache.org/jira/browse/ARROW-16083

github-actions · 2022-04-28T20:48:26Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

icexelloss · 2022-04-28T20:51:30Z

@westonpace Not sure if you are the best person to review this. I think I got the basics working but I felt it's also a bit messy. So I am looking for any feedback that can help improve it. Especially the around 1. ergonomic around managing thread/execution model and 2. reusing existing utils/classes for certain things. The algorithm itself is not complicated IMO so it's more like "how do I do it in the Arrow way".

cpp/src/arrow/compute/exec/asof_join_node.cc

stmrtn · 2022-04-28T21:47:47Z

The original implementation of this was a research proof of concept that I was experimenting with. I'd expect it to be significantly revised for a production version. There are parts of the code and some complexity that aren't necessary for this version, and I'd expect changes (for example: potentially using the task mechanism) to make it consistent with other arrow ExecNodes.

westonpace · 2022-04-28T22:04:57Z

Thank you for the contribution. Yes, there would need to be some significant style changes but it would be a welcome addition.

I'll try and give it a more detailed look tomorrow and play around with it. At a glance a few thoughts:

We don't want to use any blocking queues, they will need to be resizable. The execution engine follows a thread-per-core model so any blocking thread is wasted resources (and a potential deadlock). Instead, when the queue "limit" is surpassed, we should call PauseProducing on the inputs until the queue is sufficiently drained.
What happens if the key column(s) have very many values? I didn't trace all the paths but I think that means the memo store could get quite large and become a memory pressure concern. Maybe it is only a concern for malicious inputs and we can just reject the query as invalid. Long term we could probably investigate just storing the row and not the batch, or, if even that is too large, spilling the memo table to disk. I don't think any of this is something we need to solve now, just random thoughts.
We have some utilities for working with row-major data that we had to come up with for the hash-join. I don't recall if these are in the current implementation, the proposed implementation, or both. However, I bet we can find some overlap here to share utilities. I'll try and figure out some suggestions after a detailed look.
What approach are you thinking wrt sequencing as discussed on the mailing list?
Have you given much thought to what an as-of join looks like in Substrait (for example, the current keys option won't work I don't think because it is name based (if I understand the meaning) and not index based)
Happy to give some thoughts on threading. My gut instinct is that threading the join itself won't be critically important for most hardware but I could easily be very wrong on this.

icexelloss · 2022-04-29T14:03:43Z

Thanks @westonpace

We don't want to use any blocking queues, they will need to be resizable. The execution engine follows a thread-per-core model so any blocking thread is wasted resources (and a potential deadlock). Instead, when the queue "limit" is surpassed, we should call PauseProducing on the inputs until the queue is sufficiently drained.

I see. I will try to change this.

What happens if the key column(s) have very many values? I didn't trace all the paths but I think that means the memo store could get quite large and become a memory pressure concern. Maybe it is only a concern for malicious inputs and we can just reject the query as invalid. Long term we could probably investigate just storing the row and not the batch, or, if even that is too large, spilling the memo table to disk. I don't think any of this is something we need to solve now, just random thoughts.

Hmm..not sure if I follow what u mean... Memo store just stores row/index into the batch (it only keeps the latest row for each batch for each key). Even if there are many keys they might just point to the same batch, so it should be a pretty light weight data structure (a couple of pointers per key). Maybe you are concerning many key values pointing to different batches?

We have some utilities for working with row-major data that we had to come up with for the hash-join. I don't recall if these are in the current implementation, the #12326, or both. However, I bet we can find some overlap here to share utilities. I'll try and figure out some suggestions after a detailed look.

Cool thanks. I was sure if I should reuse RowEncoder or not.

What approach are you thinking wrt sequencing as discussed on the mailing list?

I haven't put too much thought into it and want to leave it out of the scope of this PR. (Seems like adding foundation logic for ordered data could be a separate PR). At the high level, I think upstream provides batch index and downstream node reorder them and split out to desk is a reasonable approach, just not sure if there is other solution that is simpler.

Have you given much thought to what an as-of join looks like in Substrait (for example, the current keys option won't work I don't think because it is name based (if I understand the meaning) and not index based)

Yes - @rtpsw did some POC on Substrait + Asof Join and managed to get it to work. We did change to index based from name based in the substrait plan IIRC.

Happy to give some thoughts on threading. My gut instinct is that threading the join itself won't be critically important for most hardware but I could easily be very wrong on this.

Cool thanks.

westonpace · 2022-04-30T04:06:21Z

Ok, managed to look at it a bit more today.

Your sidecar processing thread is probably fine for a first approach. Eventually we will probably want to get rid of it with something that looks like:

InputRecieved(...) {
  // Called for every input
  StoreBatch();
  // Only called sometimes
  if (HaveEnoughToProcess()) {
    Process();
  }
}

The main advantage of the above approach is just to avoid scheduling conflicts / context switches that we would have by introducing another busy thread.

Yes, I misunderstood and my concern about running out of memory was not valid.

The MemoStore approach seems sound but I bet we could come up with something more efficient later. Just thinking out loud I would think something like...

 * When a batch arrives immediately compute a column of hashes based on the key columns (this happens as part of InputReceived before any storage or processing.  Doing it in InputReceived will be important when we get to micro batches because you will want to compute the hashes while the data is still in the CPU cache).  Store the column of hashes with the batch.
 * In processing, take your LHS time column and for each RHS table do something like...

MaterializeAsOf(reference_times, rhs_hashes, rhs_times rhs_columns) {
  vector<row_idx_t> rhs_row_indices = Magic(reference_times, rhs_hashes, rhs_times);
  rhs_columns = []
  for (rhs_column in rhs_columns) {
    rhs_columns.append(Take(rhs_column, rhs_row_indices));
  }
  return rhs_columns;
}

I'd have to think a bit more on how Magic works (and I think it would require an unordered_map) but if you didn't have a key column it would look something like...

Magic([1, 5, 10], [2, 4, 6, 8, 10]) -> [null, 1, 4] // I.e. index of latest_rhs_time for each reference_time

Then Take would just be normal take. But maybe you also have something like...

Magic([1, 2, 3, 4, 5, 6, 7, 8], [2, 4, 6]) -> [null, 0, 0, 1, 1, 2, 2, 2]

I guess Take is just a normal Take there too. That would save you from having to write the materialize logic yourself. This also keeps things "column-based" which will allow for better utilization of SIMD. Others will probably have much smarter opinions too 😄 so we can always play with performance later.

I could be way off base here too. I'll leave it to you on how you want to proceed next. If you want to do more optimization and benchmarking you can. If you want to get something that works going first then I think we can focus on getting this cleaned up and migrated over to the Arrow style (see https://arrow.apache.org/docs/developers/cpp/development.html#code-style-linting-and-ci ). We will also want some more tests. I think my personal preference would be to get something in that is clean with a lot of tests first. Then get some benchmarks in place. Then we can play around with performance.

icexelloss · 2022-05-02T14:36:06Z

Thanks @westonpace I will work on this more Today. Yeah I'd like to get something working and cleaned up first with some baseline performance.

westonpace

Not adding anything new here. Just making an official "review" comment so that Github you can click "re-request reveiw" when you're ready for me to take a look again.

icexelloss · 2022-05-03T19:04:34Z

@westonpace I am trying to extend the implementation to support multiple keys and key types and wonder if you can give some pointers.

Basically I think I would create a "mapper" that maps an input "row" to the "key" and use that as the the hash map key for the given row. This mapper would

take the column name/index that are key during initialization
maps a batch + row index -> key
I am not sure what the type of the "key" is but looking at hash join it seems to use just "string" as the key type (using RowEncoder::encoded_row)

I also take a look at aggregation for other options but didn't find anything obvious.

Did such a "mapper" class already exist in Arrow compute that I can use for this purpose?

westonpace · 2022-05-03T19:40:07Z

arrow::compute::internal::KeyEncoder converts an array of values into an array of bytes, such that each value is represented by one or more contiguous bytes. For example, a standard Arrow boolean array is represented by two non-contiguous "bit buffers" of length/8 bytes. KeyEncoder represents each boolean value with two contiguous bytes, one for validity and one for the value.

arrow::compute::internal::RowEncoder combines the representation from multiple arrow::compute::internal::KeyEncoder instances into a a std::string for the row. This std::string is just a small byte buffer and shouldn't be treated as a "string" in any way. The bytes are the bytes from each key encoder for that row.

So if you have three key columns then the string will be the bytes for the first column followed by the bytes for the second column followed by the bytes for the third column. This string can indeed be used as the key for a hash map.

This approach works ok, but is not the most performant. A newer version is being integrated which uses arrow::compute::Hashing64::HashMultiColumn to calculate an array of 8 bytes hashes. There is no intermediate string that is created.

All of this should not be confused with arrow::compute::KeyEncoder which is a different class entirely that is worried about converting from a columnar format to a row-based format. This is important in hash-join because the output batches are built from random-access into the hash table. It should not be important for as-of join because the output batches are still built from sequential (though possibly skipping) access to the input tables.

icexelloss · 2022-05-03T20:29:02Z

Thanks @westonpace that's super helpful. I will take a look at those classes. Sounds like it's probably worthwhile to take a look at
arrow::compute::Hashing64::HashMultiColumn if that's what we are moving to for hash join.

icexelloss · 2022-05-06T18:24:29Z

I took a stab at using arrow::compute::Hashing64::HashMultiColumn but it seems lot of what I need is added in this PR:
#12326

I will probably wait until that is merged use and the utility functions added by that PR, e.g.

Hashing32::HashBatch

icexelloss · 2022-05-06T19:23:04Z

@westonpace reading back your comments - I wonder if you can explain a bit more "thread-per-core" model here?

The execution engine follows a thread-per-core model so any blocking thread is wasted resources (and a potential deadlock)

The term seems to be used in different context so want to make sure I understand it correctly

westonpace · 2022-05-06T23:38:22Z

Sure. "Thread per core" is probably a bit of a misnomer too, but I haven't found a nicer term yet. The default thread pool size is std::hardware_concurrency which is the maximum number of concurrent threads the hardware supports. So we do not over-allocate threads.

When dealing with I/O you normally want to make sure the system is doing useful work while the I/O is happening. One possible solution is the synchronous approach where you create a pool with a lot of threads, more than your CPU can handle. When I/O is encountered you simply block synchronously on the I/O and let the OS schedule a different thread onto the hardware.

We don't do that today. Instead we take an asynchronous approach. To implement this we actually have two thread pools. The I/O thread pool is sized based on how many concurrent I/O requests make sense (e.g. not very many for HDD and a lot for S3). It is expected these threads are usually in a waiting state.

The second thread pool (the one that, by default, drives the execution engine) is the CPU thread pool. This thread pool (again, by default) has a fixed size based on the processor hardware. It's very important not to block a CPU thread because that usually means you are under utilizing the hardware.

westonpace · 2022-05-06T23:45:20Z

The second thing you will often see mentioned is the "morsel / batch" model. When reading data in you often want to read it in largish blocks of data (counter-intuitively, these large blocks are referred to as "morsels"). This can lead to an execution engine that is roughly:

parallel for morsel in data_source:
  for operator in pipeline:
    morsel = operator(morsel)
  send_to_sink(morsel)

However, since each operator is often going over the same data, and morsels are often bigger than CPU caches, this can be inefficient. Instead the ideal approach is:

parallel for morsel in data_source:
  for l2_batch in morsel:
    for operator in operators:
      l2_batch = operator(l2_batch)
    send_to_sink(l2_batch)

This is the model we are trying to work towards in the current execution engine (the hash join is pretty close, the projection and filter nodes still have some work to do before they can handle small batches).

Also note that only the outermost loop is parallel. The "morsel" (the larger chunk) is the unit of parallelism. For data sources that have very large row groups we actually have another round of slicing but that's slightly off-topic.

icexelloss · 2022-05-10T14:48:07Z

Thanks for the explanation!

icexelloss · 2022-05-11T16:19:46Z

@github-actions autotune

icexelloss · 2022-05-11T17:27:49Z

@westonpace I am sort of stuck on this PR and can use some help.

CI
The CI appears to be failed due to a GTest issue:

Building Substrait from source
-- Building (vendored) mimalloc from source
CMake Error at cmake_modules/ThirdpartyToolchain.cmake:239 (find_package):
  Could not find a package configuration file provided by "GTest" (requested
  version 1.10.0) with any of the following names:
    GTestConfig.cmake
    gtest-config.cmake
  Add the installation prefix of "GTest" to CMAKE_PREFIX_PATH or set
  "GTest_DIR" to a directory containing one of the above files.  If "GTest"
  provides a separate development package or SDK, be sure it has been
  installed.
Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:1972 (resolve_dependency)
  CMakeLists.txt:552 (include)

I am not sure what causes this issue or if this is something I did - any suggestions?

Lint & Code Style
I have been trying to deal with lint the past day or two and has very limited success. I am able to run ninja format lint clang-tidy lint_cpp_cli but it takes a very long time (hours?) and reformatted / gave errors on some files that I do not touch. (I think clang-tidy is the one that is particular problematic). I wonder if I did something wrong. (I installed clang-format 12)

The iwyu step doesn't seem to work for me. I followed the instruction and run until ./$IWYU_SH match arrow/compute/exec/ but it doesn't seem to remove any includes. I ran

cmake .. --preset ninja-debug-basic -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

(Didn't run the exact cmake command from the development doc because I it gives me an missing openSSL error when configuring for some subcomponents)
then

(arrow-dev) icexelloss@LiUbuntu:~/workspace/arrow/cpp/iwyu$ $IWYU_SH match arrow/compute/exec/asof
include-what-you-use 0.17 based on Ubuntu clang version 13.0.1-2ubuntu2
Running IWYU on  /home/icexelloss/workspace/arrow/cpp/src/arrow/compute/exec/asof_join.h /home/icexelloss/workspace/arrow/cpp/src/arrow/compute/exec/asof_join_node.cc /home/icexelloss/workspace/arrow/cpp/src/arrow/compute/exec/asof_join_node_test.cc

but nothing seem to have changed

westonpace · 2022-05-11T19:21:42Z

Thank you for asking. I think I probably should have been clearer. It sounds like you might be doing more than is needed (we should maybe update our developer docs or put some work into getting these pieces working again).

The CI appears to be failed due to a GTest issue:

Yes, this particular GTest issue is not related to your change. I think it's being worked on but you don't need to worry about it.

Lint & Code Style

I wouldn't worry about running clang-tidy. As long as you don't get any compiler warnings and pass clang format then that should be good enough. Not all style issues have static checks however. For example, concurrent_queue should be ConcurrentQueue but we can do a code review to look for those.

The iwyu step doesn't seem to work for me.

I don't know that it works for anyone. I would say that our philosophy is iwyu but we don't have any way of enforcing it. So this is more of a "if you have any doubts use the rules of iwyu to know what to include" but don't stress too much if you might have missed something.

If you remove [WIP] from your title then more CI jobs should run. As long as these are passing I think we can catch any remaining style issues in code review.

icexelloss · 2022-05-12T19:26:13Z

Thanks @westonpace that's very helpful. I think I know how to proceed now (pending the CI failure issue).

icexelloss · 2022-06-01T20:41:58Z

@westonpace I have pushed another revision and addressed most of the comments (and replied the ones that I didn't address or have questions)

Please take another look thank you!

cpp/src/arrow/compute/exec/asof_join_node.cc

westonpace · 2022-06-03T21:46:09Z

cpp/src/arrow/compute/exec/asof_join_node.cc

+    // Build the result
+    assert(sizeof(size_t) >= sizeof(int64_t));  // Make takes signed int64_t for num_rows
+
+    // TODO: check n_rows for cast


I think I would probably just do...

DCHECK_LE(n_rows, std::numeric_limits<int32_t>::max());

...but I wouldn't be surprised if we have a shortcut for that somewhere that I'm just not aware of yet.

cpp/src/arrow/compute/exec/asof_join_node.cc

cpp/src/arrow/compute/exec/asof_join_node_test.cc

westonpace · 2022-06-03T22:48:00Z

cpp/src/arrow/compute/exec/asof_join_node_test.cc

+  join.inputs.emplace_back(Declaration{
+      "source", SourceNodeOptions{l_batches.schema, l_batches.gen(false, false)}});
+  join.inputs.emplace_back(Declaration{
+      "source", SourceNodeOptions{r0_batches.schema, r0_batches.gen(false, false)}});
+  join.inputs.emplace_back(Declaration{
+      "source", SourceNodeOptions{r1_batches.schema, r1_batches.gen(false, false)}});


We should have at least some testing with slow / parallel inputs. I don't think a slow input is capable of reordering data but it might be so we can hold off if that is the case.

Hmm..

I tried to change to parallel test by

auto exec_ctx = arrow::internal::make_unique<ExecContext>(default_memory_pool(), arrow::internal::GetCpuThreadPool());

but then I see something weird happening - the node seems to receive data out of order. (Note for k=0, input data is out of order).

(k=0) time=([ 2000, 2000 ]) InputFinished find InputFinished END (k=1) time=([ 1500, 2000, 2500 ]) InputReceived END (k=0) time=([ 0, 0, 500, 1000, 1500, 1500 ])

My understanding was even with parallel execution, each data source should be single thread, which doesn't seem to be be case. Before I dig too deep, I wonder if my understanding is correct about how arrow::internal::GetCpuThreadPool() works?

My thinking was to leave the executor as nullptr but change things like l_batches.gen(false, false) to l_batches.gen(false, true). Each input should still be ordered but since there is some delay there might be some variation in arrival order. I think right now you will always get (assuming two batches per input)...

l_batches::0 l_batches::1 l_batches::end r0_batches::0 r0_batches::1 r0_batches::end r1_batches::0 r1_batches::1 r1_batches::end

I was hoping it would be possible that adding slow could yield something like...

l_batches::0 r0_batches::0 l_batches::1 r1_batches::0 r0_batches::1 l_batches::end r0_batches::end r1_batches::1 r1_batches::end

However, if that is not what you are seeing, then we don't need to dig too deeply and can worry about it more when we add support for parallelism.

sounds good

westonpace · 2022-06-03T22:52:23Z

cpp/src/arrow/compute/exec/asof_join_node_test.cc

+namespace arrow {
+namespace compute {
+
+BatchesWithSchema GenerateBatchesFromString(


Rather than copy this from hash join node tests can we move this method to src/arrow/compute/exec/test_util? It seems generally useful.

Good call. Moved

westonpace · 2022-06-03T22:58:13Z

cpp/src/arrow/compute/exec/asof_join_node_test.cc

+      schema({field("time", int64()), field("key", int32()), field("r0_v0", utf8())}));
+}
+
+}  // namespace compute


This testing is probably ok to get started. However, the node has to deal with synchronization and has several corner cases. I think it would be good someday to get a test that generated a couple hundred batches of "structured" random data that we can later verify. I don't think we need to do this right now but it is something to keep in mind.

westonpace · 2022-06-03T23:05:58Z

cpp/src/arrow/compute/exec/asof_join_node.cc

+    auto rb = *batch.ToRecordBatch(input->output_schema());
+
+    _state.at(k)->push(rb);
+    _process.push(true);


Yes, that seems right. Not sure what I was thinking originally.

Looking at it with fresh eyes I think there is a minor possibility that batches_processed_ could be concurrently updated if a single input had multiple empty batches arriving at the same time but it might be easier to change batches_processed_ to an atomic counter instead of using a mutex.

westonpace

I think this is getting close. A few more minor changes but nothing significant. The tests are nice and clean but they might be complete enough to flesh out all race conditions and corner cases. That can be something we grow out with the feature set too.

iChauster

This prints the name of the unsupported type rather than the address.

cpp/src/arrow/compute/exec/asof_join_node.cc

Co-authored-by: Weston Pace <weston.pace@gmail.com>

Co-authored-by: Weston Pace <weston.pace@gmail.com> Co-authored-by: Ivan Chau <ivanmchau@gmail.com>

icexelloss · 2022-06-14T21:36:18Z

@westonpace Sorry for the delay - I took some time off and now got back to this.

I left a question regarding parallel execution test and wonder if you have some thoughts - appreciate your help.

iChauster

Requires these adjustments to build

cpp/src/arrow/compute/exec/asof_join_node.cc

westonpace · 2022-06-16T21:45:13Z

@westonpace Sorry for the delay - I took some time off and now got back to this.

@icexelloss

No problem. I'm actually in the middle of moving so I have somewhat spotty availability this week (and probably next week) anyways. I've responded to your question.

icexelloss · 2022-06-22T13:36:44Z

@westonpace This should be good to go. I addressed all your comments and left follow ups. CI is also green.

Let me know you have more comments.

Li

icexelloss · 2022-06-23T18:33:39Z

@westonpace Gentle ping - Any thing else you would like me to change ?

westonpace · 2022-06-24T00:46:23Z

@icexelloss Apologies, I've spent this week (and most of the last) moving. I'm in my new location now so I should be able to get to this soon (ideally tomorrow).

icexelloss · 2022-06-24T14:40:13Z

@westonpace No worries. Appreciate the heads up.

westonpace

Thanks for getting this created. This is a cool new capability and a good starting point. Looking forward to seeing new changes built on top of this.

@stmrtn

## Overview This is a work in progress implementation of the AsofJoin node in Arrow C++ compute. The code needs quite a bit of clean up but I have worked on this long enough that I think I benefit from some inputs/comments from Arrow maintainers about the high levels before I potentially spend too much time in the wrong direction. All Credit to @stmrtn (Steven Martin) who is the original author of the code. ## Implementation There are quite a bit of code and here is how it works at the high level: Classes: * `InputState`: A class that handles queuing for input batches and purging unneeded batches. There are one input state per input table. * `MemoStore`: A class that responsible for advancing row index and getting the latest row for each key for each key given a timestamp. (Latest timestamp that is <= the given timestamp) * `CompositeReferenceTable`: A class that is responsible for storing temporary output rows and produces RecordBatches from those rows. Algorithm: * The node takes one left side table and n right side tables, and produces a joined table * It is currently assumed that each input table will call `InputReceived` with time-ordered batches. `InputReceived` will queue the batches inside `InputState` (it doesn't do any work). There is a separate process thread that wakes up when there is new inputs and attempts to produces a output batch. If the current data is not enough to produce the output batch (i.e., we have not received all the potential right side rows that could be a match for the current left batch), it will wait for new inputs. * The process thread works as follows: 1. Advance left row index for the current batch. Then advance right tables to get the latest right row (i.e., latest right row with timestamp <= left row timestamp) 2. Once advances are done, it will continue to check to produce the output row for the current left row 3. Go to 1 until left batches are processed 4. Output batch for the current left batch 5. Purge batches that are no longer needed 6. Wait until enough batches are received to process the next left batch Entry point for the algorithm is `process()` ## TODO - [x] More Tests - [x] Decide if we can replace `CompositeReferenceTable` with sth that already exits (perhaps `RowEncoder`?) - [x] Life cycle management for the process thread (or whether or not we should have it) - [x] Lint & Code Style - [x] Handle null results properly - [x] Handle errors properly (e.g., unsupported types) - [x] Clean up debug statement ## Follow up - Handle more datatypes (both key and value) - Handle multiple keys - Change from column name to column index for time and key column (Substrait integration) - Look into using `table_builder` to reproduce the materialize logic. Authored-by: Li Jin <ice.xelloss@gmail.com> Signed-off-by: Weston Pace <weston.pace@gmail.com>

github-actions bot added the Component: C++ label Apr 28, 2022

icexelloss commented Apr 28, 2022

View reviewed changes

cpp/src/arrow/compute/exec/asof_join.cc Outdated Show resolved Hide resolved

icexelloss changed the title ~~[WIP][ARROW-16083][C++] Implement AsofJoin execution node~~ ARROW-16083]: [WIP][C++] Implement AsofJoin execution node Apr 28, 2022

icexelloss commented Apr 28, 2022

View reviewed changes

cpp/src/arrow/compute/exec/asof_join_node.cc Outdated Show resolved Hide resolved

westonpace self-requested a review April 28, 2022 21:30

westonpace reviewed May 2, 2022

View reviewed changes

kou changed the title ~~ARROW-16083]: [WIP][C++] Implement AsofJoin execution node~~ ARROW-16083: [WIP][C++] Implement AsofJoin execution node May 6, 2022

icexelloss changed the title ~~ARROW-16083: [WIP][C++] Implement AsofJoin execution node~~ ARROW-16083: [C++] Implement AsofJoin execution node May 11, 2022

wip

23b8c71

Use implicit ctor for optional

9f3d5c9

Refactor tests

e61b9c1

westonpace self-requested a review June 2, 2022 16:32

Address comments and add check/test for unsuported datatypes

1b4f26b

westonpace reviewed Jun 3, 2022

View reviewed changes

westonpace requested changes Jun 3, 2022

View reviewed changes

iChauster reviewed Jun 13, 2022

View reviewed changes

cpp/src/arrow/compute/exec/asof_join_node.cc Outdated Show resolved Hide resolved

cpp/src/arrow/compute/exec/asof_join_node.cc Outdated Show resolved Hide resolved

cpp/src/arrow/compute/exec/asof_join_node.cc Outdated Show resolved Hide resolved

icexelloss and others added 2 commits June 14, 2022 11:17

Apply suggestions from code review

121b3bd

Co-authored-by: Weston Pace <weston.pace@gmail.com>

Apply suggestions from code review

66b6b98

Co-authored-by: Weston Pace <weston.pace@gmail.com> Co-authored-by: Ivan Chau <ivanmchau@gmail.com>

iChauster suggested changes Jun 16, 2022

View reviewed changes

icexelloss added 5 commits June 21, 2022 14:51

Address comments

478f3b7

Remove debug statement

b33effa

Fix minor comment

be417c3

Fix lint

1261e20

Fix lint again

26fae6d

icexelloss requested a review from westonpace June 23, 2022 13:37

westonpace approved these changes Jun 24, 2022

View reviewed changes

westonpace merged commit 916c453 into apache:master Jun 24, 2022

rtpsw mentioned this pull request Aug 23, 2022

ARROW-17412: [C++] AsofJoin multiple keys and types #13880

Merged

zanmato1984 mentioned this pull request May 7, 2025

GH-46224: [C++][Acero] Fix the hang in asof join #46300

Merged

ARROW-16083: [C++] Implement AsofJoin execution node #13028

ARROW-16083: [C++] Implement AsofJoin execution node #13028

Uh oh!

Conversation

icexelloss commented Apr 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Implementation

TODO

Follow up

Uh oh!

github-actions bot commented Apr 28, 2022

Uh oh!

Uh oh!

github-actions bot commented Apr 28, 2022

Uh oh!

github-actions bot commented Apr 28, 2022

Uh oh!

icexelloss commented Apr 28, 2022

Uh oh!

Uh oh!

stmrtn commented Apr 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

westonpace commented Apr 28, 2022

Uh oh!

icexelloss commented Apr 29, 2022

Uh oh!

westonpace commented Apr 30, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

icexelloss commented May 2, 2022

Uh oh!

westonpace left a comment

Choose a reason for hiding this comment

Uh oh!

icexelloss commented May 3, 2022

Uh oh!

westonpace commented May 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

icexelloss commented May 3, 2022

Uh oh!

icexelloss commented May 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

icexelloss commented May 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

westonpace commented May 6, 2022

Uh oh!

westonpace commented May 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

icexelloss commented May 10, 2022

Uh oh!

icexelloss commented May 11, 2022

Uh oh!

icexelloss commented May 11, 2022

Uh oh!

westonpace commented May 11, 2022

Uh oh!

icexelloss commented May 12, 2022

Uh oh!

icexelloss commented Jun 1, 2022

Uh oh!

Uh oh!

westonpace Jun 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

icexelloss commented Apr 28, 2022 •

edited

Loading

stmrtn commented Apr 28, 2022 •

edited

Loading

westonpace commented Apr 30, 2022 •

edited

Loading

westonpace commented May 3, 2022 •

edited

Loading

icexelloss commented May 6, 2022 •

edited

Loading

icexelloss commented May 6, 2022 •

edited

Loading

westonpace commented May 6, 2022 •

edited

Loading

westonpace Jun 3, 2022 •

edited

Loading