@lidavidm
Member

No description provided.

@lidavidm
Member Author

Depends on ARROW-7001/#9607 so not quite ready yet. @westonpace, I made commits here to deprecate uses of Scan() in Python/R which you might want to just cherry-pick into your own PR instead.

lidavidm force-pushed the arrow-9731 branch 2 times, most recently from 65e2ba8 to 6bcd208 (March 31, 2021 17:03)
westonpace and others added 7 commits April 6, 2021 19:31
ARROW-7001: First stab at converting datasets logic to async

ARROW-7001: Fixed a bunch of .result() calls in unit tests that weren't really valid (returning a reference to something that was then deleted)

ARROW-7001: Missed one change during rebase

ARROW-7001: Renamed ScanSync to Scan and ExecuteSync to Execute to preserve the old mirror APIs until the public bindings can be removed

ARROW-7001: Added a few more mirror APIs to get python build working

ARROW-7001: WIP

ARROW-7001: Various WIP

ARROW-7001: WIP

ARROW-7001: First stab at converting datasets logic to async

ARROW-7001: Fixed a bunch of .result() calls in unit tests that weren't really valid (returning a reference to something that was then deleted)

ARROW-7001: Renamed ScanSync to Scan and ExecuteSync to Execute to preserve the old mirror APIs until the public bindings can be removed

ARROW-7001: Added a few more mirror APIs to get python build working

ARROW-7001: WIP

ARROW-7001: Various WIP

ARROW-7001: Minor fixes to get semantics right

ARROW-7001: Cleanup

ARROW-7001: Fixing some compile errors after rebase

ARROW-7001: Fixing errors from rebase

ARROW-7001: Added a test for reordering datasets.  Removed old concept of splittable.  Fixed bug where file errors may not pass through

ARROW-7001: Somewhere in the rebasing I lost the 1-arg ScannerBuilder constructor.  Added it back in and created a unit test for it for good measure.

ARROW-7001: Removing a ... to see if it removes illegal instruction on mac

ARROW-7001: Fixed a potential memory issue in the preserve ordering test

ARROW-7001: lint

ARROW-7001: Changed from using optional<bool> which isn't allowed to just returning the scan task in Scanner::ToTableAsync::table_building_task

ARROW-7001: Removed the forced transfer as it was not truly doing anything

ARROW-7001: The CSV scan task was doing a read on the CPU thread pool, which prevented the async chain from being set up immediately and slowed things down. In addition, the later readahead buffers need to be larger to prevent the CPU thread from idling when things arrive out of order.

ARROW-7001: Need to put the impl for Scanner::ToTable in the cc file so it ends up in the so

ARROW-7001: Added a reordering test

ARROW-7001: Added ordering to scanner

ARROW-7001: Converted Future<Generator> to Generator

ARROW-7001: File readahead was not working correctly, and fixing it required quite an overhaul of the scanner. On the bright side, performance is better on I/O-bound tasks

ARROW-7001: Fix failing unit test

ARROW-7001: Cleaned up lint.  Deprecated the old Scan method.  Reworked existing logic to adapt

ARROW-7001: Removing unused code detected by build

ARROW-7001: Moved some code around between header/impl to make MSVC happy.  Fixed up a memory leak in a unit test caused by a circular shared_ptr reference
…R code. To address in ARROW-11782

ARROW-7001: Removed incorrect comment from MakeMappedGenerator

ARROW-7001: Fixed a regression present when reading IPC fully buffered in memory

ARROW-7001: Made the InMemoryDataset creation methods consistent.

ARROW-7001: Adding back in (hopefully legacy) constructor for InMemoryScanTask needed by cglib
@lidavidm
Member Author

lidavidm commented Apr 8, 2021

I'll replace this once ARROW-11797/#9589 lands.
