ARROW-1572: [C++] Implement "value counts" kernels for tabulating value frequencies #1970
Closed
1761 commits
3e87f77
ARROW-2131: [Python] Prepend module path to PYTHONPATH when spawning …
wesm bcbcf02
[JS] Fix typo in npm target for esNext/CommonJS. (#1645)
jheer 5f10067
ARROW-2180: [C++] Remove deprecated APIs from 0.8.0 cycle
wesm cdc347c
ARROW-2132: Add link to Plasma in main README
wesm 27f7eba
ARROW-2069: [Python] Add note that Plasma is not supported on Windows
wesm 81bfb38
ARROW-2185: Strip CI directives from commit messages
wesm d52f2ff
[Dev] Follow-up, use angle brackets for commit author instead of squa…
wesm c2865d0
ARROW-2093: [Python] Do not install PyTorch in Travis CI
wesm 2f01658
ARROW-2201: [Website] Publish JS API Docs
3e3f7c2
ARROW-2066: [Python] Document using pyarrow with Azure Blob Store
rjrussell77 cca4a74
ARROW-2197: Document C++ ABI issue and workaround
pitrou e2dd864
ARROW-2184: [C++] Add static constructor for FileOutputStream return…
xuepanchen c0b0e33
ARROW-2191: [C++] Only use specific version of jemalloc
xhochy 2fd8f0a
ARROW-2204: Fix TLS errors in manylinux1 build
xhochy 27d8339
ARROW-2214: [JS] add nullBitmap getter to DictionaryData that proxies…
trxcllnt 655eb74
ARROW-2212: [C++/Python] Build Protobuf in base manylinux 1 docker image
xhochy 5521bcf
ARROW-2094: [C++] Install libprotobuf and set PROTOBUF_HOME when usin…
wesm 564fefe
ARROW-2213: [JS] fix npm release
trxcllnt 8c493cd
ARROW-2219: [JS] rename indicies to indices
trxcllnt e50a8ec
ARROW-2206: [JS] Document Perspective project
lmeyerov e0328b0
ARROW-2023: [C++] Fix ASAN failure on malformed / empty stream input,…
wesm c017a63
ARROW-1035: [Python] Add streaming dataframe reconstruction benchmark
pitrou a5c5ad2
ARROW-2203: [C++] StderrStream class
rvernica 887e893
ARROW-1937: [Python] Document nested array initialization
pitrou 482fc58
ARROW-2210: [C++] Reset ptr on failed memory allocation
xhochy 1a92846
ARROW-2223: [JS] compile src/bin as es5-cjs to all output targets
trxcllnt d3fabe0
ARROW-2230: [Python] Strip catch-all tag matching from git-describe
xhochy 524b522
ARROW-2218: [Python] PythonFile should infer mode when not given
pitrou 0a672bc
ARROW-2226, ARROW-2233: [JS] Dictionary bugfixes
trxcllnt 1d9b834
ARROW-2225: [JS] support tables split across buffers
trxcllnt 671b53c
ARROW-2046: [Python] Support path-like objects
pitrou 3d5880a
ARROW-2040: [Python] Deserialized Numpy array must keep ref to underl…
pitrou 5321582
ARROW-2231: [CI] Use clcache on AppVeyor for faster builds
pitrou af2047e
ARROW-2215: [Plasma] Hugetables munmap issue
pcmoritz 8b3bbae
ARROW-2198: [Python] correct docstring for parquet.read_table
wesm b2eb6ac
ARROW-1632: [Python] Permit categorical conversions in Table.to_panda…
xhochy bfac60d
ARROW-2145/ARROW-2153/ARROW-2157/ARROW-2160/ARROW-2177: [Python] Deci…
cpcloud 99899d6
ARROW-2232: [Python] pyarrow.Tensor constructor segfaults
cpcloud 29495ce
ARROW-2176: [C++] Extend DictionaryBuilder to support delta dictionaries
f403804
ARROW-2205: [Python] Option for integer object nulls
5994094
ARROW-2209: [Python] Partition columns are not correctly loaded in sc…
xhochy 34c33f1
[Python] Document serialization parameter as "string" instead of "bytes"
mitar 8b1c811
ARROW-2245: ARROW-2246: [Python] Revert static linkage of parquet-cpp…
xhochy 6e699d7
ARROW-2252: [Python] Create buffer from address, size and base
xhochy 03db8a3
ARROW-2251: [GLib] Keep GArrowBuffer alive while GArrowTensor for the…
kou 9ceda35
ARROW-2244: [C++] Add unit test to explicitly check that NullArray in…
wesm b89c124
ARROW-2253: [Python] Support __eq__ on scalar values
xhochy 49f1d00
ARROW-2258: [Python] Add additional information to find Boost on windows
xhochy 55bdae5
ARROW-2254: [Python] Ignore JS tags in local dev versions
xhochy c6359cb
ARROW-1929: [C++] Copy over testing utility code from PARQUET-1092
wesm 45f5da2
ARROW-1982: [Python] Coerce Parquet statistics as bytes to more usefu…
wesm 01a099c
ARROW-2199: [JAVA] Control the memory allocated for inner vectors in …
siddharthteotia 06e9fb4
[Python] Add missing dependency to development.rst
mitar 57e4dd8
ARROW-2265: [Python] Use CheckExact when serializing lists and numpy …
robertnishihara 51e117d
ARROW-2154: [Python] Implement equality on buffers
pitrou cde18a6
ARROW-2234: [JS] Read timestamp low bits as Uint32s
trxcllnt a58bd72
ARROW-2272: [Python] Clean up leftovers in test_plasma.py
pitrou 5f8a793
ARROW-2279: [Python] Better error message if lib cannot be found
mitar 60c8081
ARROW-2261: [GLib] Improve memory management for GArrowBuffer data
kou 9effbed
ARROW-2283: [C++] Support Arrow C++ installed in /usr detection by pk…
kou fb2316c
ARROW-2238: [C++] Detect and use clcache in cmake configuration
pitrou c372dfb
ARROW-2280: [Python] Return the offset for the buffers in pyarrow.Array
xhochy 5e945a3
ARROW-2239: [C++] Update Windows build docs
pitrou f3f91b0
ARROW-2263: [Python] Prepend local pyarrow/ path to PYTHONPATH in tes…
wesm 34b18f7
ARROW-1940: [Python] Extra metadata gets added after multiple convers…
cpcloud 04f4e6b
ARROW-2289: [GLib] Add Numeric, Integer, FloatingPoint data types
kou f56fdc9
ARROW-2270: [Python] Fix lifetime of ForeignBuffer base object
pitrou 40a0008
[Python] Adding more missing Linux dependencies to developer docs
mitar 23d08b7
ARROW-2150: [Python] Raise NotImplementedError when comparing with py…
wesm 7354a19
ARROW-2284: [Python] Fix error display on test_plasma error
pitrou 8167472
ARROW-2275: [C++] Guard against bad use of Buffer.mutable_data()
pitrou 3511c65
ARROW-2268: Drop usage of md5 checksums for source releases, verifica…
wesm c7c2393
ARROW-2269: [Python] Make boost namespace selectable in wheels
xhochy fc9f89a
ARROW-2250: [Python] Do not create a subprocess for plasma but just u…
mitar d0284cb
ARROW-2236: [JS] Add more complete set of predicates
412bb91
ARROW-2291: [C++] Add additional libboost-regex-dev to build instruct…
andygrove 907a27d
ARROW-2288: [Python] Fix slicing logic
pitrou 2f718d7
ARROW-2262: [Python] Support slicing on pyarrow.ChunkedArray
xhochy d64a231
ARROW-2181: [PYTHON][DOC] Add doc on usage of concat_tables
BryanCutler dc45a1a
ARROW-2099: [Python] Add safe option to DictionaryArray.from_arrays t…
wesm c7b3c05
ARROW-2297: [JS] babel-jest is not listed as a dev dependency
8f2ff30
ARROW-2240: [Python] Array initialization with leading numpy nan fail…
cpcloud 3917e85
ARROW-2292: [Python] Rename frombuffer() to py_buffer()
pitrou 58fa873
ARROW-2282: [Python] Create StringArray from buffers
xhochy 317b543
ARROW-2293: [JS] Print release vote e-mail template when making sourc…
6fc9922
ARROW-2118: [C++] Fix misleading error when memory mapping a zero-len…
wesm 171340f
ARROW-2135: [Python] Fix NaN conversion when casting from Numpy array
pitrou 0b28dc5
ARROW-2142: [Python] Allow conversion from Numpy struct array
pitrou 7c7b09f
ARROW-1643: [Python] Accept hdfs:// prefixes in parquet.read_table an…
33d1091
ARROW-2227: [Python] Fix off-by-one error in chunked binary conversions
wesm a430758
ARROW-2306: [Python] Fix partitioned Parquet test against HDFS
wesm 385656c
ARROW-2304: [C++] Fix HDFS MultipleClients unit test
wesm e25e3ef
ARROW-2307: [Python] Allow reading record batch streams with zero rec…
wesm 98012cb
ARROW-2312: [JS] run test_js before test_integration
trxcllnt b185951
ARROW-2313: [C++] Add -NDEBUG flag to arrow.pc
kou 630ce5e
ARROW-2311: [Python/C++] Fix struct array slicing
pitrou 019a560
ARROW-2309: [C++] Use std::make_unsigned
pitrou 60749b2
ARROW-2316: [C++] Revert Buffer::mutable_data to inline so that linke…
wesm 20ea781
[Python] Pin Cython to 0.27.3 in verify-release-candidate.sh (#1758)
wesm e29df7d
ARROW-2320: [C++] Vendored Boost build does not build regex library
cpcloud 79e19c3
[JS] Small fixes to source release workflow and e-mail template (#1750)
wesm 82c8b6f
ARROW-2318: [Plasma] Run plasma store tests with unique socket
pcmoritz 95ba6ef
ARROW-2321: [C++] Release verification script fails with if CMAKE_INS…
cpcloud 7be8d37
[Release] Update CHANGELOG.md for 0.9.0
wesm c695a5d
[maven-release-plugin] prepare release apache-arrow-0.9.0
wesm bb17a0d
[maven-release-plugin] prepare for next development iteration
wesm a50ef9f
ARROW-2329: [Website] 0.9.0 release update
siddharthteotia 60848c0
ARROW-2299: [Go] Import Go arrow implementation from influxdata/arrow
stuartcarnie 607c7fa
ARROW-2340: [Website] Add blog post about Go code donation
wesm 948cb4a
ARROW-2336: [Website] Add 0.9.0 release blog post
wesm f45abf0
[Website] Add link to press release
wesm 07beb51
ARROW-2333: [Python] Fix bundling boost with default namespace
pitrou 47fcef3
ARROW-2334: [C++] Update boost to 1.66.0
cpcloud d623567
ARROW-2341: [Python] Improve pa.union() mode argument behaviour
pitrou eecb1bc
ARROW-2281: [Python] Add Array.from_buffers()
pitrou f50d858
ARROW-2343: [Java/Packaging] Run mvn clean in API doc builds
cpcloud 29268ec
ARROW-2342: [Python] Allow pickling more types
pitrou 0c8d164
ARROW-2345: [Documentation] Fix bundle exec and set sphinx nosidebar …
cpcloud e6d8eed
ARROW-2322: [Java] Document dev environment requirements for publishi…
wesm a0ca9b4
ARROW-2346: [Python] Fix PYARROW_CXX_FLAGS with multiple options
pitrou 777f986
ARROW-2331: [Python] Fix indexing for negative or out-of-bounds indices
pitrou 7b2c797
ARROW-2349: [Python] Opt in to bundling Boost shared libraries separa…
wesm af6e3ec
ARROW-1913: [Java] Disable Javadoc doclint with Java 8
icexelloss 29f744f
ARROW-2350: Consolidated RUN step in spark_integration Dockerfile
9c7e06b
ARROW-2348: [GLib] Remove GLib + Go example
kou 6156b1d
ARROW-640: [Python] Implement __hash__ and equality for Array scalar …
27f5a42
ARROW-2301: [Python] Build source distribution inside the manylinux1 …
xhochy f9f8320
ARROW-2354: [C++] Make PyDecimal_Check() faster
pitrou 3d4b6c1
ARROW-2356: [JS] Fix JSON Reader FixedSizeBinary Vectors
trxcllnt f29e5a1
ARROW-2368: [JAVA] Correctly pad negative values in DecimalVector#set…
vkorukanti 866e9b8
ARROW-2327: [JS] Table.fromStruct missing from externs
97f5ec0
[C++] Fix documentation typo in arrow/array.h
rsabhi ba0cea3
ARROW-2140: [Python] Improve float16 support
pitrou 3f72d14
ARROW-2361: [Rust] Starting point for a native Rust implementation of…
andygrove 3975de5
Update README.md to include new components
wesm 00b334f
[Rust] Update READMEs to add Rust libraries link and to remove out-of…
andygrove be049fa
ARROW-2370: [GLib] Fix include path in .pc on Meson build
kou d2d4cc7
ARROW-2371: [GLib] Update "Requires" in .pc on GNU Autotools build
kou 7e27cf5
ARROW-2376: [Rust] Travis builds the Rust library
andygrove 8fdad18
ARROW-2377: [GLib] Support old GObject Introspection
kou 11b15a5
ARROW-2357: [Python] Add microbenchmark for PandasObjectIsNull()
pitrou fff992a
ARROW-2122: [Python] Pyarrow fails to serialize dataframe with timest…
b6e8b4b
ARROW-2381: [Rust] Adds iterator support to Buffer<T>
andygrove fce183c
ARROW-2378: [Rust] Rustfmt
max-sixty 4c68eca
ARROW-2375: [Rust] Implement Drop for Buffer so memory is released
andygrove 65d2558
ARROW-2351 [C++] StringBuilder::append(vector<string>...) not impleme…
lizhougao 65493a6
ARROW-2014: [Python] Document read_pandas method in pyarrow.parquet
9fc4d89
DOC: Fix a tiny typo in parquet documentation (#1824)
kjordahl b0f376a
Fix broken build on master (remove duplicate Drop impl for Buffer) (#…
andygrove 82d4555
ARROW-2141: [Python] Support variable length binary conversion from P…
BryanCutler 933b32b
ARROW-2388: [C++] Use valid_bytes API for StringBuilder::Append
kou 806979b
ARROW-2382: [Rust] Bug fix: List was not using aligned mem
andygrove 7081752
ARROW-2385: [Rust] implement to_json for DataType and Field
andygrove cf39686
ARROW-2195: [Plasma] Return auto-releasing buffers
pitrou 26bc4ab
ARROW-2308: [Python] Make deserialized numpy arrays 64-byte aligned.
robertnishihara 640fc83
ARROW-2276: [Python] Expose buffer protocol on Tensor
pitrou 76edf43
ARROW-1463: [Java] Cleanup usage of Types.MinorType to MinorType
BryanCutler 486d592
ARROW-2384: [Rust] Additional test & Trait standardization
max-sixty 02b0c72
ARROW-2325: [Python] Update setup.py to use Markdown project description
045470c
ARROW-2396: [Rust] Unify Rust Errors
max-sixty 9515fe9
ARROW-2380: [Python] Streamline conversions
pitrou 29c376d
ARROW-2398: [Rust] Create Builder<T> for building buffers directly in…
andygrove 83bfb39
ARROW-2404: [C++] Fix "declaration of 'type_id' hides class member" w…
rip-nsk 946517d
ARROW-2405: [C++] <function> is required for std::function
kou e3f7edc
ARROW-2401 Support filters on Hive partitioned Parquet files
f9c0701
ARROW-2402: [C++] Avoid spurious copies with FixedSizeBinaryBuilder
pitrou 87284a5
[Site] Add Antoine to committers list (#1853)
pitrou f88949b
ARROW-2418: [Rust] BUG FIX: reserve memory when building list
andygrove 408aa5a
ARROW-2416: [C++] Support system libprotobuf
kou b4dafa5
ARROW-2414: Fix a variety of typos.
waywardmonkeys 55c1075
ARROW-2353: [CI] Check correctness of built wheel on AppVeyor
pitrou b095994
ARROW-2408: [Rust] Remove build warnings
max-sixty 57db8b5
ARROW-2419: [Site] Hard-code timezone
pitrou 7376aab
ARROW-2413: [Rust] Remove useless calls to format!().
waywardmonkeys ca3dbbb
ARROW-2415: [Rust] Fix clippy ref-match-pats warnings.
waywardmonkeys abf4ed2
ARROW-2408: [Rust] Ability to get `&mut [T]` from `Buffer<T>`
andygrove 5030e23
ARROW-2420: [Rust] Fix major memory bug and add benches
andygrove ad39d1f
ARROW-2424: [Rust] Fix build - add missing import
andygrove 1bb7fba
ARROW-2100: [Python] Drop Python 3.4 support
pitrou f56d765
ARROW-2305: [Python] Bump Cython requirement to 0.27+
pitrou 27417b2
ARROW-2328: [C++] Fixed and unit tested feather writing with slice
Adriandorr e941af8
ARROW-2391: [C++/Python] Segmentation fault from PyArrow when mapping…
kszucs 33d92a0
ARROW-2434: [Rust] Add windows support
paddyhoran ca277ae
ARROW-2425: [Rust] BUG FIX: Add u8 mappings for Array::from
andygrove c5574f4
ARROW-2426: [GLib] Follow python -> python@3 change in Homebrew
kou 6633cc9
ARROW-2433: [Rust] Add Builder.push_slice(&[T])
andygrove 91ec792
ARROW-2411: [C++] Add StringBuilder::Append(const char **values)
kou 265142b
ARROW-2441: [Rust] Builder<T>::slice_mut assertions are too strict
andygrove 42e195b
ARROW-2440: [Rust] Implement ListBuilder<T>
andygrove 1ee7d11
ARROW-2407: [GLib] Add garrow_string_array_builder_append_values()
kou ed7db7c
ARROW-2097: [CI, Python] Reduce Travis-CI verbosity
pitrou 4009b62
ARROW-2224: [C++] Remove boost-regex dependency
pitrou 6e8ecb5
ARROW-2445: [Rust] Add documentation and make some fields private
andygrove db03663
ARROW-2182: [Python] Build C++ libraries in benchmarks build step
pitrou 9ad8602
ARROW-2432: [Python] Fix Pandas decimal type conversion with None values
BryanCutler f177404
ARROW-2369: [Python] Fix reading large Parquet files (> 4 GB)
pitrou 685147c
ARROW-2451: [Python] Handle non-object arrays more efficiently in cus…
robertnishihara 0f87c12
ARROW-2437: [C++] Add ReadMessage without aligned argument.
robertnishihara c96747b
ARROW-2455: [C++] Initialize the atomic bytes_allocated_ properly
sighingnow 7de1264
ARROW-2387: [Python] Flip test for rescale loss if value < 0
98d250e
ARROW-2397: [Documentation] Update format documentation to describe t…
robertnishihara b2167e4
ARROW-2435: [Rust] Add memory pool abstraction.
liurenjie1024 3eee3e4
ARROW-2101: [Python/C++] Correctly convert numpy arrays of bytes to a…
2d0fbf1
ARROW-2464: [Python] Use a python_version marker instead of a condition
thedrow 72c7f5d
ARROW-2454: [C++] Allow zero-array chunked arrays
pitrou 66d0ad1
ARROW-2315: [C++/Python] Flatten struct array
pitrou 2876a3f
ARROW-2463: [C++] Update flatbuffers to 1.9.0
xhochy f1ef708
ARROW-2319: [C++] Add BufferedOutputStream class
pitrou d7d3196
ARROW-2442: [C++] Disambiguate builder Append() overloads
pitrou 72df18c
ARROW-2465: [Plasma/GPU] Preserve plasma_store rpath
pitrou 4c31b37
ARROW-2147: [Python] Fix type inference of numpy arrays
xhochy 25eff99
ARROW-2468: [Rust] Builder::slice_mut() should take mut self.
waywardmonkeys d58057b
ARROW-2473: [Rust] List empty slice assertion
andygrove c2e0d42
ARROW-2423: [Python] Enable DataType, Field and plasma ObjectID equal…
kszucs 18999bb
ARROW-2469: [C++] Make out arguments last in ReadMessage.
robertnishihara 1299931
ARROW-2443: [Python] Allow creation of empty Dictionary indices
xhochy 7eeca3a
ARROW-2458: [Plasma] Use one thread pool per PlasmaClient
pcmoritz 09be7b4
ARROW-2472: [Rust] Remove public attributes from Schema and Field and…
andygrove 46fe09a
ARROW-2471: [Rust] Builder zero capacity fix
andygrove c19b1f0
ARROW-2481: [Rust] Move all calls to free() into memory.rs
andygrove 249e039
ARROW-1928: [C++] Add BitmapReader/BitmapWriter benchmarks
pitrou 4c71f30
ARROW-2390: [C++/Python] Map Python exceptions to Arrow status codes
pitrou c9ad33e
ARROW-2457: [GLib] Support large is_valids in builder's append_values()
kou 54df19d
ARROW-1018: [C++] Create FileOutputStream, ReadableFile from file des…
pitrou 2452a46
ARROW-2393: [C++] Moving ARROW_CHECK_OK_[PREPEND] macros from status.…
3b69c5a
ARROW-2450: [Python] Test for Parquet roundtrip of null lists
pitrou 1ba7d51
ARROW-2222: handle untrusted inputs
crepererum 5381295
ARROW-2314: [C++/Python] Fix union array slicing
pitrou 138717a
ARROW-1858: [Python] Added documentation for pq.write_dataset
dsimmie a6c9d30
ARROW-2453: [Python] Improve Table column access
a5ae134
ARROW-1731: [Python] Add columns selector in Table.from_array
Gatisseja 03251e9
ARROW-2427: [C++] Implement ReadAt properly
pitrou 77a5c59
ARROW-2494: [C++] Return status codes from PlasmaClient::Seal instead…
kszucs 7545e3e
ARROW-2492: [Python] Prevent segfault on accidental call of pyarrow.A…
xhochy b65205e
ARROW-2470: [C++] Avoid seeking in GetFileSize
pitrou 2abc889
ARROW-2489: [Plasma] Fix PlasmaClient ABI variation
pitrou a609309
ARROW-2502: [Rust] Restore Windows Compatibility
paddyhoran 2d278ab
ARROW-2508: [Python] Fix pytest.raises msg to message
pcmoritz 3d7a5a6
ARROW-2074: [Python] Infer lists of dicts as struct arrays
pitrou 5f9cf9c
ARROW-2448: [Plasma] Reference counting for PlasmaClient::Impl
pcmoritz c8a3ed8
ARROW-2286: [C++/Python] Allow subscripting pyarrow.lib.StructValue
kszucs c574006
ARROW-2498: [Java] Use java 1.8 instead of java 1.7
c8f17dd
ARROW-2518: [Java] Re-instate JDK tests in matrix, but with JDK 8 ins…
16820a2
ARROW-2452: [TEST] Spark integration test fails with permission error
kszucs 3f5819a
[GLib] Fix a typo
kou e8d45eb
ARROW-2515 [Python] Add DictionaryValue class, fixing bugs with neste…
blkerby e3fafae
ARROW-2513: [Python] DictionaryType should give access to index type …
crepererum f93a635
First try at implementing a CountValues kernel
andrioni 2d99fc6
Remove commented out prototype
andrioni 95da385
Fix formatting issues
andrioni
@@ -51,6 +51,10 @@ Status GetDictionaryEncodeKernel(FunctionContext* ctx,
                                  const std::shared_ptr<DataType>& type,
                                  std::unique_ptr<HashKernel>* kernel);
 
+ARROW_EXPORT
+Status GetCountValuesKernel(FunctionContext* ctx, const std::shared_ptr<DataType>& type,
+                            std::unique_ptr<HashKernel>* kernel);
+
 /// \brief Compute unique elements from an array-like object
 /// \param[in] context the FunctionContext
 /// \param[in] datum array-like input
@@ -71,6 +75,19 @@ Status Unique(FunctionContext* context, const Datum& datum, std::shared_ptr<Arra
 ARROW_EXPORT
 Status DictionaryEncode(FunctionContext* context, const Datum& data, Datum* out);
 
+/// \brief Return counts of unique elements from an array-like object
+/// \param[in] context the FunctionContext
+/// \param[in] value array-like input
+/// \param[out] out_uniques unique elements as Array
+/// \param[out] out_counts counts per element as Array, same shape as out_uniques
+///
+/// \since 0.10.0
+/// \note API not yet finalized
+ARROW_EXPORT
+Status CountValues(FunctionContext* context, const Datum& value,
+                   std::shared_ptr<Array>* out_uniques,
+                   std::shared_ptr<Array>* out_counts);
+
 // TODO(wesm): Define API for incremental dictionary encoding
 
 // TODO(wesm): Define API for regularizing DictionaryArray objects with
@@ -95,11 +112,6 @@ Status DictionaryEncode(FunctionContext* context, const Datum& data, Datum* out)
 // Status IsIn(FunctionContext* context, const Datum& values, const Datum& member_set,
 //             Datum* out);
 
-// ARROW_EXPORT
-// Status CountValues(FunctionContext* context, const Datum& values,
-//                    std::shared_ptr<Array>* out_uniques,
-//                    std::shared_ptr<Array>* out_counts);
-
 }  // namespace compute
 }  // namespace arrow
 

Contributor (review comment on the `std::shared_ptr<Array>* out_uniques,` line of `CountValues`):
It seems more natural to me to have the output type be a struct (but maybe there was discussion on this previously; I guess the existing API had this)?
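For readers skimming the diff, here is a minimal sketch of what a "value counts" computation produces: one entry per distinct input value plus a parallel count. It is not the kernel from this PR and does not use Arrow types. `ValueCounts` and `CountValuesSketch` are hypothetical names, plain `std::vector`s stand in for the `out_uniques` and `out_counts` arrays, and nulls are ignored. Bundling both outputs in one struct is only meant to illustrate the reviewer's suggestion above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical result type: parallel vectors standing in for the two output Arrays.
template <typename T>
struct ValueCounts {
  std::vector<T> uniques;       // distinct values, in order of first appearance
  std::vector<int64_t> counts;  // counts[i] is the frequency of uniques[i]
};

// Hypothetical helper, not part of Arrow: tabulate value frequencies in one pass.
template <typename T>
ValueCounts<T> CountValuesSketch(const std::vector<T>& values) {
  ValueCounts<T> result;
  // Maps each value seen so far to its slot in result.uniques / result.counts.
  std::unordered_map<T, std::size_t> slot_of;
  for (const T& v : values) {
    auto it = slot_of.find(v);
    if (it == slot_of.end()) {
      // First occurrence: open a new slot with a count of one.
      slot_of.emplace(v, result.uniques.size());
      result.uniques.push_back(v);
      result.counts.push_back(1);
    } else {
      // Repeat occurrence: bump the existing slot.
      ++result.counts[it->second];
    }
  }
  // Mirrors the documented contract: counts has the same shape as uniques.
  assert(result.uniques.size() == result.counts.size());
  return result;
}
```

Calling `CountValuesSketch<int>({3, 1, 3, 2, 1, 3})` yields uniques `{3, 1, 2}` with counts `{3, 2, 1}`.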
Is there a way to promote some code reuse with the other unary (single-argument) hash kernels?
I think so, although I am not sure how to do it. Maybe moving everything to a macro? I am willing to try if somebody could give me some pointers on what's the best way to do it.
The pattern described here might be useful: https://mortoray.com/2014/09/10/using-macros-to-simplify-type-visitors-and-enums/
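As a rough illustration of the macro approach discussed in this thread, the sketch below keeps a single list of supported types and generates one dispatch function per kernel from it. Every name here (`TypeId`, `SKETCH_TYPE_LIST`, `UniqueKernelName`, `CountValuesKernelName`, and the generated dispatchers) is hypothetical. The real hash kernels would dispatch on Arrow's own type ids and return kernel objects rather than strings.

```cpp
#include <cstdint>
#include <string>

// Hypothetical type tags; a real kernel would switch on the Arrow type id.
enum class TypeId { Int32, Int64, Double };

// Single source of truth for supported types. V is applied to each
// (tag, C type) pair, with KERNEL forwarded as a third argument.
#define SKETCH_TYPE_LIST(V, KERNEL) \
  V(Int32, int32_t, KERNEL)         \
  V(Int64, int64_t, KERNEL)         \
  V(Double, double, KERNEL)

// Stand-ins for per-type kernel factories, written once as templates.
template <typename T>
std::string UniqueKernelName() {
  return "unique/" + std::to_string(sizeof(T));
}

template <typename T>
std::string CountValuesKernelName() {
  return "count_values/" + std::to_string(sizeof(T));
}

// One switch case per entry in the type list.
#define SKETCH_CASE(TAG, CTYPE, KERNEL) \
  case TypeId::TAG:                     \
    return KERNEL<CTYPE>();

// Generates a function mapping a runtime TypeId to KERNEL<CType>().
#define SKETCH_DEFINE_DISPATCH(FUNC_NAME, KERNEL) \
  inline std::string FUNC_NAME(TypeId id) {       \
    switch (id) {                                 \
      SKETCH_TYPE_LIST(SKETCH_CASE, KERNEL)       \
    }                                             \
    return "unsupported";                         \
  }

// Adding another unary kernel is one line; adding a type is one list entry.
SKETCH_DEFINE_DISPATCH(DispatchUnique, UniqueKernelName)
SKETCH_DEFINE_DISPATCH(DispatchCountValues, CountValuesKernelName)
```

The trade-off is the usual one for macros: less repetition across kernels, at the cost of code that is harder to step through in a debugger.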