
feat!(datafusion): enable parallel file scanning with eager task bucketing#2298

Open
toutane wants to merge 16 commits into apache:main from toutane:draft/partitioned-file-scanning-contribution

Conversation

@toutane
Contributor

@toutane toutane commented Mar 31, 2026

Which issue does this PR close?

What changes are included in this PR?

Approach

Rather than introducing new types (IcebergPartitionedScan and IcebergPartitionedTableProvider, as originally proposed), this PR extends the existing IcebergTableProvider / IcebergTableScan with an eager mode in which file scan tasks are planned at scan() time and distributed into buckets, one bucket per DataFusion partition.

The main motivation is to let DataFusion schedule file reads concurrently. Previously all files streamed through a single partition (UnknownPartitioning(1)); now IcebergTableProvider::scan distributes tasks across min(target_partitions, n_files) partitions, and declares Partitioning::Hash when the data is identity-partitioned.

Key changes

  • TableScan::to_arrow_from_tasks - New public method on TableScan that accepts a pre-collected FileScanTaskStream instead of calling plan_files() internally. This is the hook used by IcebergTableScan::execute(i) to replay each bucket through the Arrow reader while preserving all reader-side configuration (concurrency limit, row-group filtering, batch size). Tasks must come from a TableScan with the same projection and filters as self - predicates are baked into each task at planning time and are not re-applied by the reader. The doc comment makes this contract explicit. (A sketch of this replay path follows the design-choices section below.)

  • IcebergTableScan is now pub - Previously pub(crate). Made public so that downstream integrations that need to inspect or wrap the physical plan can do so without going through the table provider.

  • with_new_children now returns an error - IcebergTableScan is a leaf node and does not support children. Previously the implementation silently dropped any children passed to it; it now returns DataFusionError::Internal when children is non-empty, matching the contract of IcebergCommitExec.

  • Eager task planning in IcebergTableProvider::scan - plan_files() is now called at planning time (inside TableProvider::scan) rather than at execution time (inside ExecutionPlan::execute). The collected tasks are distributed into min(target_partitions, n_files) buckets by bucketing::bucket_tasks and stored in the scan. Each execute(i) call then fetches its pre-assigned bucket and streams it through to_arrow_from_tasks - no redundant metadata reads per partition.

  • bucketing module - Handles bucket assignment and Partitioning declaration. For tables with a single partition spec using only identity transforms, tasks are hashed on their partition values using DataFusion's create_hashes + REPARTITION_RANDOM_STATE, and the scan declares Partitioning::Hash. This lets DataFusion recognize that the output is already hash-partitioned and skip a downstream RepartitionExec. Non-identity transforms (bucket, truncate, year/month/day/hour) are lossy: the partition value in task metadata does not match what DataFusion would compute by hashing the actual column values, so those cases fall back to UnknownPartitioning. Any task that cannot be fully hashed with the identity key (unsupported literal type, null partition value) also falls back. (A simplified sketch follows this list.)
    Credit: This bucketing solution was proposed by @timsaucer.
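
For concreteness, here is a minimal, self-contained sketch of the hash-bucketing idea. The function name and the reduction of each task to a single Int32 partition value are illustrative, and the fixed seeds are an assumption standing in for DataFusion's repartition random state; the real logic lives in the private bucketing module:

```rust
use std::sync::Arc;

use ahash::RandomState; // same ahash version DataFusion uses (assumption)
use datafusion::arrow::array::{ArrayRef, Int32Array};
use datafusion::common::hash_utils::create_hashes;

// Assign each task (reduced here to its identity partition value) to one of
// `n_buckets` buckets, mirroring RepartitionExec's `hash % n` rule.
fn bucket_by_identity_value(values: &Int32Array, n_buckets: usize) -> Vec<Vec<usize>> {
    // Hashing with the same RandomState RepartitionExec uses is what makes
    // the scan's Partitioning::Hash declaration honest.
    let random_state = RandomState::with_seeds(0, 0, 0, 0);
    let arrays: Vec<ArrayRef> = vec![Arc::new(values.clone())];
    let mut hashes = vec![0u64; values.len()];
    create_hashes(&arrays, &random_state, &mut hashes).unwrap();

    let mut buckets: Vec<Vec<usize>> = vec![Vec::new(); n_buckets];
    for (task_idx, hash) in hashes.iter().enumerate() {
        buckets[(*hash as usize) % n_buckets].push(task_idx);
    }
    buckets
}

fn main() {
    // Four files with identity partition values 1, 2, 1, 3: files sharing a
    // partition value always land in the same bucket.
    let values = Int32Array::from(vec![1, 2, 1, 3]);
    println!("{:?}", bucket_by_identity_value(&values, 2));
}
```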

Design choices - planning at scan() time vs. at execute() time

Planning eagerly at scan() time is a deliberate trade-off:

  • Pro: Tasks are computed once and shared across all partitions; the plan is reproducible; execute(i) is pure I/O with no catalog round-trips.
  • Con: TableProvider::scan now does network I/O (catalog + metadata reads), which is unusual for a planning-phase method. An alternative design - planning lazily at execute time - would keep scan() cheap but requires one plan_files() call per partition (redundant). A future extension could expose this as an option for use cases where snapshot staleness matters more than plan reproducibility.
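
Concretely, in eager mode execute(i) reduces to something like the sketch below. to_arrow_from_tasks is the method this PR adds; the surrounding names, signatures, and module paths are approximate:

```rust
use futures::StreamExt;
use iceberg::scan::{ArrowRecordBatchStream, FileScanTask, TableScan};

// One DataFusion partition replays its pre-assigned bucket through the
// Arrow reader: pure I/O, no plan_files(), no catalog round-trips.
async fn stream_bucket(
    scan: &TableScan,               // built with the same projection/filters
    buckets: &[Vec<FileScanTask>],  // computed once, at scan() time
    partition: usize,
) -> iceberg::Result<ArrowRecordBatchStream> {
    // Wrap the bucket's tasks back into the stream type the reader expects.
    let tasks = futures::stream::iter(
        buckets[partition]
            .clone()
            .into_iter()
            .map(Ok::<FileScanTask, iceberg::Error>),
    )
    .boxed();
    scan.to_arrow_from_tasks(tasks).await
}
```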

Known limitations

  • Limited type support for Partitioning::Hash - literal_to_array supports seven primitive Arrow types (Bool, Int32, Int64, Float32, Float64, Utf8, Date32). Timestamps, Decimal128, LargeUtf8, etc. are not yet covered; any unsupported type forces a fallback to UnknownPartitioning. (See the sketch after this list.)

  • Spec evolution disables Partitioning::Hash - If the table has more than one historical partition spec, the bucketing module conservatively returns UnknownPartitioning to avoid mismatches between old and new partition tuple layouts.
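
To make the fallback concrete, here is a hypothetical reconstruction of the literal_to_array shape, using DataFusion's ScalarValue as a stand-in for the actual input type:

```rust
use std::sync::Arc;

use datafusion::arrow::array::{
    ArrayRef, BooleanArray, Date32Array, Float32Array, Float64Array, Int32Array,
    Int64Array, StringArray,
};
use datafusion::scalar::ScalarValue;

// Any value outside the seven supported types returns None, which the
// caller turns into a fallback to UnknownPartitioning.
fn literal_to_array(value: &ScalarValue) -> Option<ArrayRef> {
    Some(match value {
        ScalarValue::Boolean(v) => Arc::new(BooleanArray::from(vec![*v])) as ArrayRef,
        ScalarValue::Int32(v) => Arc::new(Int32Array::from(vec![*v])),
        ScalarValue::Int64(v) => Arc::new(Int64Array::from(vec![*v])),
        ScalarValue::Float32(v) => Arc::new(Float32Array::from(vec![*v])),
        ScalarValue::Float64(v) => Arc::new(Float64Array::from(vec![*v])),
        ScalarValue::Utf8(v) => Arc::new(StringArray::from(vec![v.clone()])),
        ScalarValue::Date32(v) => Arc::new(Date32Array::from(vec![*v])),
        // Timestamps, Decimal128, LargeUtf8, ... are not covered yet.
        _ => return None,
    })
}
```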

Are these changes tested?

Unit tests in table/mod.rs covering the new bucketed scan path:

  • test_empty_table_single_empty_bucket - Empty table produces one empty bucket, guarding against out-of-bounds panic on execute(0).
  • test_unpartitioned_falls_back_to_unknown - Unpartitioned table declares UnknownPartitioning.
  • test_bucket_count_capped_at_file_count - When target_partitions > n_files, bucket count is capped at n_files.
  • test_single_target_partition_single_bucket - target_partitions=1 produces a single bucket regardless of file count, reproducing the original single-threaded behavior.
  • test_identity_partitioned_declares_hash - Identity-partitioned table declares Partitioning::Hash referencing the partition column.
  • test_projection_without_partition_col_falls_back_to_unknown - Projecting away an identity column falls back to UnknownPartitioning.

Additional tests are added for IcebergTableProvider to cover limit pushdown, insert behavior, and schema consistency, ensuring the refactor introduces no regressions on existing functionality.

SQL logic tests - EXPLAIN snapshots are updated to reflect the new buckets:[N] file_count:[M] display format and the correct input_partitions counts.
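
For illustration, a snapshot line with the new suffix looks like this (the plan text is made up; only the buckets/file_count suffix is the format introduced here):

```
IcebergTableScan: ... buckets:[2] file_count:[3]
```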

Production validation - We plan to test these changes in our infrastructure by shadowing real-world queries.

Follow-up work

Note

IcebergStaticTableProvider is unchanged - it still uses IcebergTableScan::new (lazy, single-partition). Static snapshots do not benefit from eager planning because the task list is fixed by construction.

Member

@timsaucer timsaucer left a comment


I'm no expert on Iceberg, but I've worked a lot on DataFusion, particularly table providers. I recently wrote a blog post on the DataFusion site, though it went up after you first opened this PR. In case it's in any way useful: https://datafusion.apache.org/blog/2026/03/31/writing-table-providers/

Overall I think the approach here is definitely reasonable. My comments are mostly around opportunities to squeeze out a little more performance based on having done something similar at my work.

self: Arc<Self>,
_children: Vec<Arc<dyn ExecutionPlan>>,
) -> DFResult<Arc<dyn ExecutionPlan>> {
Ok(self)
Member


Since this doesn't support children, I'd recommend an error if _children is not empty. Not a blocker for merge.

Contributor Author


Yes, you're right thanks! Pushed a fix that returns a DataFusionError::Internal, matching the pattern used in IcebergCommitExec::with_new_children.

Side note: IcebergTableScan::with_new_children has the same issue. This could be the subject of another PR.
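
For reference, the fix has roughly this shape (the ExecutionPlan trait method; the error message wording here is illustrative):

```rust
fn with_new_children(
    self: Arc<Self>,
    children: Vec<Arc<dyn ExecutionPlan>>,
) -> DFResult<Arc<dyn ExecutionPlan>> {
    if !children.is_empty() {
        // IcebergTableScan is a leaf node: reject rather than silently
        // drop any children handed to us.
        return Err(DataFusionError::Internal(
            "IcebergTableScan does not accept children".to_string(),
        ));
    }
    Ok(self)
}
```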

Comment thread on crates/integrations/datafusion/src/table/mod.rs (outdated)
&self,
filters: &[&Expr],
) -> DFResult<Vec<TableProviderFilterPushDown>> {
Ok(vec![TableProviderFilterPushDown::Inexact; filters.len()])
Member


Can we do better than this? If we have a partitioned scan and the filter is on the partition columns, I would expect to be able to get an exact pushdown. That would entirely remove a filter operation for cases where it matches, and I think that's a big win and a common use case I've seen in other work.

Contributor Author

@toutane toutane Apr 20, 2026


Yes, you're right, I agree there's something to do here.

I'd prefer to tackle this in a follow-up PR: doing it correctly requires a per-filter conversion API (currently convert_filters_to_predicate collapses everything into a single combined predicate and silently drops non-convertible filters) plus a partition-spec-aware check, since only Identity-transformed partition columns can be safely marked Exact; bucket, truncate, year/month/etc. are lossy and must stay Inexact to avoid incorrect results.

Happy to open a tracking issue. However, if you think it's simple enough, I can go ahead and make the changes directly in the PR.
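
For illustration, the per-filter version would look roughly like this; both helper predicates are hypothetical names, not existing functions in the crate:

```rust
fn supports_filters_pushdown(
    &self,
    filters: &[&Expr],
) -> DFResult<Vec<TableProviderFilterPushDown>> {
    Ok(filters
        .iter()
        .map(|f| {
            // Exact is safe only when Iceberg's manifest pruning fully
            // evaluates the predicate AND every referenced column is an
            // identity-transformed partition column.
            if converts_to_iceberg_predicate(f) && references_only_identity_cols(f) {
                TableProviderFilterPushDown::Exact
            } else {
                TableProviderFilterPushDown::Inexact
            }
        })
        .collect())
}
```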

.map_err(to_datafusion_error)?
.try_collect::<Vec<_>>()
.await
.map_err(to_datafusion_error)?;
Member


It looks like the number of output partitions will be the number of files, right? I'm wondering if there's an opportunity to do better than that. We're specifying that the output partitioning in the exec is unknown, but don't we have information about the partitioning we could utilize?

Member


By better I mean: could we be more performant if we went ahead and got the target partitions from the session, then output that number of partitions, already hashed?

Contributor Author


Thanks for raising this, please push back if any of the below is off.

For context, the long-term direction for this is tracked in the EPIC #1604 (row-group-based parallel scan with a GroupPruner that can split/merge FileScanTask below the file grain). What I was hoping to land with this PR is a more immediate, scoped optimization that stays within the current file-grain contract, so we don't preempt the design choices in #1604. The file-grouping step you're pointing at is essentially what #2220 describes as the intermediate improvement on the path toward #1604.

If you think it's appropriate, I'd be happy to pick up a short-term follow-up along these lines:

  1. Switch IcebergPartitionedScan from tasks: Vec<FileScanTask> to file_groups: Vec<Vec<FileScanTask>>, to follow the convention used by DataFusion's own FileScanConfig, each group = one DataFusion partition that streams its files sequentially through ArrowReaderBuilder::read.
  2. In IcebergPartitionedTableProvider::scan, read state.config().target_partitions() and group tasks into min(n_files, target_partitions) buckets (a rough sketch follows this list).
  3. When n_files < target_partitions, parallelism is still capped at n_files. I think that's inherent to the file grain, but let me know if I'm missing something.
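
A rough sketch of steps 1-2, assuming FileScanTask is iceberg-rust's task type (grouping strategy and names are illustrative):

```rust
use iceberg::scan::FileScanTask;

// Chunk tasks into at most `target_partitions` groups; each group becomes
// one DataFusion partition that streams its files sequentially.
fn group_tasks(tasks: Vec<FileScanTask>, target_partitions: usize) -> Vec<Vec<FileScanTask>> {
    let n_groups = target_partitions.min(tasks.len()).max(1);
    let mut groups: Vec<Vec<FileScanTask>> = (0..n_groups).map(|_| Vec::new()).collect();
    // Round-robin keeps groups balanced when per-file sizes are unknown.
    for (i, task) in tasks.into_iter().enumerate() {
        groups[i % n_groups].push(task);
    }
    groups
}
```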

I'm happy to open the follow-up issue/PR myself, or defer to you if you'd rather frame it, whatever works best.

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I'd need to understand those conversations. I think I mentioned it in one of the other comments on this PR, but I found the whole discussion difficult to track. Maybe I can find some time this weekend to look through that size-based partitioning they mention.

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this PR targeting your branch. Let me know what you think!

toutane#1

The one issue I have is that I do not personally have access to any iceberg catalogs that I could use for benchmarking. My ability to test it is very limited right now.

Contributor Author

@toutane toutane Apr 27, 2026


Hey Tim, thanks a lot for the proposal. It is really clean and smart.

I created an issue for the redundant FilterExec you were mentioning (#2363), so it's nice that you've addressed it here.

For the benchmark, we can do it in our infra by shadowing real traffic (our ultimate goal is to distribute execution across multiple workers, based on the output partitioning). It will not be a standard benchmark, but at least it will show whether things are improving on real-world queries.

In the end, what do you think of merging this new provider/scan with the current one so that we only maintain one path, as you suggested? If I understand correctly, the current path is reachable by setting target_partitions to 1.

Last thing: I'll try to support partitioning based on Iceberg's bucket transform, the tricky part being that DataFusion and Iceberg don't use the same hash function, which makes Iceberg's bucket hash incompatible with RepartitionExec.

Member


Personally, I strongly believe you should update the existing table provider instead of creating a new one. I think it's just more work in the long run to keep two nearly identical bits of code.

I don't think you'll be able to use iceberg bucket transforms for the datafusion hashing output.

@mbutrovich
Collaborator

Thanks for the PR, @toutane! One thing I noticed: IcebergPartitionedScan::execute() creates a bare ArrowReaderBuilder::new(file_io).build() with no configuration. The existing path through IcebergTableScan wires through row group filtering, row selection, concurrency limits, and batch size. Might be worth plumbing those through here too so users don't silently lose those optimizations when switching to the partitioned scan.

@toutane toutane marked this pull request as draft April 21, 2026 09:35
@toutane toutane force-pushed the draft/partitioned-file-scanning-contribution branch from 0a7af45 to fde61f6 Compare April 21, 2026 09:56
@timsaucer
Member

More broadly, is adding a second path really the best answer? It seems like you're going to increase your maintenance load. Is there any reason not to have a single path, with the fallback being a partitioned scan of N=1?

I am going to spend a little more time trying to understand the issues. It's difficult because some of them are marked as unplanned or stale and some of the links do not have good descriptions. I suppose I'll need to look at the java source to get a better idea of what the long term goal is.

@toutane
Contributor Author

toutane commented Apr 22, 2026

Hey Tim, I think you're absolutely right about consolidating everything into a single TableProvider long term.

The only reason I kept separate paths was to avoid introducing breaking changes. I am going to explore a design where partitioned file scan becomes the default behavior, with the current provider's logic as a fallback as you suggested.

On a related note, it could be worth thinking about the next step: exposing Partitioning::Hash as output-partitioned when the Iceberg data uses bucket partitioning. Do you think that fits naturally in the same path, or would a separate provider be a better fit?

@timsaucer
Member


I understand the desire to not introduce breaking changes. Is the concern that the API is changing, or do you have implementation concerns? If it's just the API change, then I think good upgrade documentation is often sufficient, especially since the change looks fairly straightforward for a downstream consumer. Please correct me if that's not the case.

If it's concern about the implementation, then I think the real solution is to make sure there's robust testing both in the repo and against some real life workloads to verify performance at different scales and partitioning structures.

With respect to the question about output partitioning, I think any time you can do that you should. Any time we can give more information about these kinds of things we're going to see performance gains, and sometimes significant gains.

toutane and others added 13 commits April 29, 2026 16:39
Co-authored-by: Tim Saucer <timsaucer@gmail.com>
…identity-hash partitioning

Replace the one-task-per-partition layout in IcebergPartitionedScan with
N buckets sized from the session's target_partitions. When the table's
default spec exposes identity-transform columns and every task carries
the corresponding partition values, tasks are bucketed by hashing those
values via DataFusion's REPARTITION_RANDOM_STATE so the resulting
partitioning matches what RepartitionExec would produce. The scan then
declares Partitioning::Hash(exprs, N), letting downstream joins and
aggregates skip an extra repartition.

Hash declaration is conservative and only stands when:
  - the table has a single partition spec (no spec evolution)
  - every identity source column is present in the output projection
  - every column type is supported by literal_to_array
  - every task supplied a full identity key
Any miss collapses to UnknownPartitioning(N) while bucketing falls
back to a hash of data_file_path so partitions still distribute.

IcebergPartitionedScan now stores Vec<Vec<FileScanTask>> and execute(i)
streams every task in buckets[i] through to_arrow_with_tasks. Bucket
count is capped at min(target_partitions, num_files), and an empty
table still yields zero partitions to avoid out-of-bounds execute calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`IcebergPartitionedTableProvider::supports_filters_pushdown` previously
returned `Inexact` for every filter, forcing DataFusion to re-evaluate
even filters that Iceberg's manifest-level pruning has fully resolved.
Per-filter the provider now returns `Exact` when both:
  - the iceberg conversion can represent the filter, so manifest pruning
    will remove every row that fails it, and
  - every leaf is a comparison or null check against an identity-
    partition column with a literal RHS.

Identity-partitioned column names are cached at `try_new` from the
table's default spec; tables with spec evolution (>1 historical specs)
fall back to an empty set so all filters stay `Inexact`. Supported
shapes: =, !=, <, <=, >, >=, IS NULL, IS NOT NULL, IN/NOT IN, plus
AND/OR/NOT compositions of the above. Every other shape is `Inexact`.

`convert_filter_to_predicate` is promoted to `pub(crate)` so the
provider can probe convertibility per filter without rebuilding the
whole AND-collapsed predicate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…column intersection

Previously identity_partition_col_names returned an empty set whenever
the table had more than one historical partition spec, forcing every
filter back to Inexact under spec evolution. This was overly
conservative: Iceberg evaluates partition predicates against each
manifest's own spec, so a column that is identity-partitioned in every
spec is fully prunable across the entire table regardless of which spec
a given file was written under.

Replace the multi-spec gate with an intersection across every spec's
identity-source set. A column survives only if every spec includes it
with Transform::Identity; columns that appear with non-identity
transforms in some spec, or are missing from a spec entirely, are
dropped. The result remains an honest set of columns for which Exact
pushdown is provably safe across all surviving files.

Hash bucketing (compute_identity_cols) keeps its single-spec gate
because slot-order alignment with the table's default spec depends on
each task carrying its own spec id, which the native plan flow does
not yet do.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…via per-column intersection"

This reverts commit b2613e3.

(cherry picked from commit 826f054)
…shdown"

This reverts commit 6d0ed4c.

(cherry picked from commit 4381f00)
…ergTableProvider

IcebergPartitionedTableProvider and IcebergPartitionedScan were introduced
to enable parallel file scanning by bucketing FileScanTasks across DataFusion
partitions. However, maintaining two TableProvider implementations is
redundant: the new provider is strictly more capable, and its degenerate case
(target_partitions=1) reproduces the old single-partition behavior exactly.

This commit folds the partitioned provider into IcebergTableProvider and
the partitioned scan into IcebergTableScan, eliminating the parallel types.

Changes:
- IcebergTableProvider::scan() now eagerly calls plan_files() and distributes
FileScanTasks into buckets using the same identity-hash strategy
(REPARTITION_RANDOM_STATE + create_hashes) that was in
IcebergPartitionedTableProvider, enabling Partitioning::Hash declarations
that align with DataFusion's RepartitionExec.
- IcebergTableScan gains a new_with_tasks() constructor that accepts
pre-planned buckets and a caller-supplied Partitioning. execute(i) streams
the tasks in buckets[i] via TableScan::to_arrow_with_tasks, rebuilding
the TableScan per-partition to avoid serializing PlanContext Arc-shared
caches across workers.
- The original new() constructor and the to_arrow() lazy path are kept
unchanged for IcebergStaticTableProvider, which does not pre-plan tasks.
- Limit slicing (try_filter_map truncation) from the old IcebergTableScan
is preserved in both execution paths.
- Bucketing helpers (IdentityCol, compute_identity_cols, bucket_tasks,
identity_hash, fallback_hash, literal_to_array, is_supported_dtype) are
moved verbatim into a new private table/bucketing.rs module.
- Unit tests from partitioned.rs are migrated to table/mod.rs and updated
to use IcebergTableProvider and IcebergTableScan.
- integration_datafusion_test.rs: fix test_provider_plan_stream_schema to
call execute(0) instead of execute(1). The old call worked only because
the previous IcebergTableScan silently ignored the partition index.

(cherry picked from commit d2e5e04)
Review pass over the partitioned-scan branch ahead of upstream
contribution.

- Rename `TableScan::to_arrow_with_tasks` to `to_arrow_from_tasks` —
  `from` better signals that the tasks are the input source rather
  than a builder-style modifier.
- Restructure the doc with a `# Correctness` section that calls out
  the projection/filter contract while clarifying that reader-side
  configuration (concurrency, batch size, row-group filtering, row
  selection) is taken from `self`.
- Make `IcebergTableScan::new` and `new_with_tasks` `pub` (were
  `pub(crate)`) so external users can construct the node directly,
  matching the public visibility of the struct itself.
- Drop the `convert_filters_to_predicate` re-export from
  `physical_plan/mod.rs`: it was unused outside the module.

- Extract a private `new_inner` constructor on `IcebergTableScan` so
  `new` and `new_with_tasks` share a single source of truth for the
  `PlanProperties` / projection / predicate setup.
- Split `IcebergTableScan::execute` into a linear pipeline backed by
  three helpers: `build_table_scan` (synchronous scan-builder
  plumbing), `build_record_batch_stream` (async stream construction
  for the lazy/eager modes), and `apply_limit`.
- Trim the `IcebergTableScan` struct doc and field comments to match
  the rest of the file's style; drop the verbose `to_arrow_with_tasks`
  rationale (the `# Correctness` doc carries the load-bearing info).
- Tighten `DisplayAs::fmt_as`: remove the file-path enumeration (file
  count alone is enough for `EXPLAIN`) and factor the common prefix.
- Trim several narrating comments in `table/mod.rs` and the module
  doc that duplicated information already evident from the code.

- Add `test_identity_partitioned_declares_hash`: verifies the happy
  path where an identity-partitioned table with the partition column
  in the projection produces `Partitioning::Hash` referencing that
  column. This was the main missing coverage for the bucketing logic.
- Add `test_projection_without_partition_col_falls_back_to_unknown`:
  verifies the `compute_identity_cols → None` branch when the
  projection omits the partition source column.
- Add helpers (`make_partitioned_catalog_and_table_for_bucketing`,
  `append_partitioned_fake_data_files`) to build identity-partitioned
  fixtures without writing real Parquet files.

(cherry picked from commit b1f2d66)
@toutane toutane force-pushed the draft/partitioned-file-scanning-contribution branch from 0a2ba62 to 70bc487 Compare April 29, 2026 14:48
@toutane toutane changed the title feat(datafusion): enable parallel file-level scanning via one partition per file feat!(datafusion): enable parallel file scanning with eager task bucketing Apr 29, 2026
toutane added 3 commits April 30, 2026 11:09
IcebergTableProvider::scan now plans files eagerly and buckets them
across DataFusion partitions before returning the ExecutionPlan.
As a result, IcebergTableScan's DisplayAs output always includes
`buckets:[N] file_count:[M]` - even for unpartitioned tables where
N = 1.

Update the four .slt files whose EXPLAIN snapshots were missing this
suffix, and fix the like_predicate_pushdown snapshots that also had
a stale input_partitions count on RepartitionExec (the table now has
multiple files across multiple buckets).

(cherry picked from commit 6ae4a71)
@toutane toutane force-pushed the draft/partitioned-file-scanning-contribution branch from 1f87e13 to 7ff1f6d Compare April 30, 2026 09:12
@toutane toutane marked this pull request as ready for review April 30, 2026 09:44
@toutane toutane requested a review from timsaucer April 30, 2026 09:44


Development

Successfully merging this pull request may close these issues.

Enable parallel file-level scanning for IcebergTableScan Datafusion Integration

3 participants