Skip to content

doc: Add an auto-generated dependency graph for internal crates#19280

Merged
2010YOUY01 merged 14 commits intoapache:mainfrom
2010YOUY01:doc-deps
Jan 14, 2026
Merged

doc: Add an auto-generated dependency graph for internal crates#19280
2010YOUY01 merged 14 commits intoapache:mainfrom
2010YOUY01:doc-deps

Conversation

@2010YOUY01
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

A dependency graph for workspace member crates are often needed when doing refactors, I want it to be included in the doc, and have a script to update it automatically.

Here is the preview:
image

What changes are included in this PR?

  • adds a script to generate the dependency graph deps.svg, and verify if the existing one is up to date.
  • adds a documentation page in Contributor Guide to show this graph
  • adds a CI job to check if the generated dependency graph is up to date with the code.

Are these changes tested?

I tested the dependency graph display locally, see above.
Is it possible to see the preview from this PR's change online?

I also included a dummy crate in the initial commit, to test if the CI can catch it and throw understandable error message.

Are there any user-facing changes?

No

@github-actions github-actions Bot added documentation Improvements or additions to documentation development-process Related to development process of DataFusion labels Dec 11, 2025
## Dependency Graph for Workspace Crates

<!--
Below is an embedded .svg file, with interactive functionalities like drag/zoom-in/etc.
Copy link
Copy Markdown
Contributor Author

@2010YOUY01 2010YOUY01 Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI disclosure:
This snippet is to embed the deps.svg inside the markdown, and be able to drag/zoom. The implementation is AI generated, I have skimmed through it and it looks good to me, though I don't have enough front-end knowledge to inspect it in details.

Comment thread datafusion/ci-dummy/Cargo.toml Outdated
# under the License.

[package]
name = "datafusion-ci-dummy"
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For CI testing:

  1. Generated the dep graph
  2. Added this new dummy workspace member crate
    Now the dep graph is obsolete, the CI should fail

If failed as expected in https://github.com/apache/datafusion/actions/runs/20129195011/job/57766134500?pr=19280

Comment thread dev/rust_lint.sh Outdated
Comment thread docs/scripts/generate_dependency_graph.sh Outdated
2010YOUY01 and others added 2 commits December 15, 2025 17:13
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Comment thread .github/workflows/dependencies.yml Outdated
with:
submodules: true
fetch-depth: 1
- name: Setup Rust toolchain
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just run this script as part of some other job and save having to check out the code again...

Copy link
Copy Markdown
Contributor Author

@2010YOUY01 2010YOUY01 Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it's better to let it generate and deploy through CI automatically. Done in 307119c

Perhaps we should also do so for other auto-generated document sections, like configuration and function docs?

Copy link
Copy Markdown
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image

Some of these look a bit odd. I guess it omits direct dependencies if they have a transitive dependency? For example, it says datafusion-examples depends on datafusion-proto only but this is not the case 🤔

(minor nit, but I wonder if there's a better way to represent the gen/gen-common binaries or just omit them entirely)

Comment thread docs/source/contributor-guide/architecture/dependency-graph.md Outdated
Comment thread docs/README.md
Comment on lines +43 to +46
The docs build regenerates the workspace dependency graph via
`docs/scripts/generate_dependency_graph.sh`, so ensure `cargo`, `cargo-depgraph`
(`cargo install cargo-depgraph --version ^1.6 --locked`), and Graphviz `dot`
(`brew install graphviz` or `sudo apt-get install -y graphviz`) are available.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit unfortunate we have these dependencies required on the host system; I suppose we can file a followup ticket to see if we can have a way to build docs locally without installing these (e.g. Docker or somehow see if uv can support this (no idea on this, just throwing out ideas))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@2010YOUY01
Copy link
Copy Markdown
Contributor Author

Image Some of these look a bit odd. I guess it omits direct dependencies if they have a transitive dependency? For example, it says `datafusion-examples` depends on `datafusion-proto` only but this is not the case 🤔

Yes, otherwise the dep graph would be quite hard to read. I usually use this dep graph to decide where to put/move modules, so ignoring transitive dependencies won't be an issue.
I updated the comment below the dep graph to make it clear.

(minor nit, but I wonder if there's a better way to represent the gen/gen-common binaries or just omit them entirely)

I updated the script to ignore it. They're only used in internal scripts do I think it's better to omit them.

@2010YOUY01 2010YOUY01 added this pull request to the merge queue Jan 14, 2026
Merged via the queue into apache:main with commit 567ba75 Jan 14, 2026
29 checks passed
@2010YOUY01
Copy link
Copy Markdown
Contributor Author

It's live at https://datafusion.apache.org/contributor-guide/architecture/dependency-graph.html

de-bgunter pushed a commit to de-bgunter/datafusion that referenced this pull request Mar 24, 2026
…he#19280)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
A dependency graph for workspace member crates are often needed when
doing refactors, I want it to be included in the doc, and have a script
to update it automatically.

Here is the preview:
<img width="1203" height="951" alt="image"
src="https://github.com/user-attachments/assets/527c18fc-258e-465f-a150-f2aafe3e6db9"
/>


## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->
- adds a script to generate the dependency graph `deps.svg`, and verify
if the existing one is up to date.
- adds a documentation page in `Contributor Guide` to show this graph
- adds a CI job to check if the generated dependency graph is up to date
with the code.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
I tested the dependency graph display locally, see above.
Is it possible to see the preview from this PR's change online?

I also included a dummy crate in the initial commit, to test if the CI
can catch it and throw understandable error message.

## Are there any user-facing changes?
No
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

development-process Related to development process of DataFusion documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants