docs: add spark r/w lance demo #3574

Merged
yanghua merged 3 commits into lance-format:main from yanghua:issue-3553 on Mar 28, 2025

Conversation

yanghua (Collaborator) commented Mar 20, 2025

No description provided.

yanghua self-assigned this Mar 20, 2025
github-actions bot added the documentation label Mar 20, 2025
yanghua closed this Mar 20, 2025
yanghua reopened this Mar 20, 2025
Comment thread docs/examples/examples.rst (outdated)
  Training Multi-Modal models using a Lance dataset <./clip_training.rst>
- Deep Learning Artefact Management using Lance <./artefact_management.rst>
\ No newline at end of file
+ Deep Learning Artefact Management using Lance <./artefact_management.rst>
+ Reading and writing a Lance dataset in Spark <./spark_example.rst>
\ No newline at end of file
Collaborator

I think this is a Spark data source example. There will be a separate Spark catalog connector example once the catalog is ready,
so this doc should be renamed to spark_datasource_example.rst.

* lance-core JAR: Core Rust Spark binding exposing Lance features to Java (available `here <https://repo1.maven.org/maven2/com/lancedb/lance-core/0.23.0/lance-core-0.23.0.jar>`_)
* lance-spark JAR: Spark connector for reading/writing Lance format (available `here <https://repo1.maven.org/maven2/com/lancedb/lance-spark/0.23.0/lance-spark-0.23.0.jar>`_)

Place these JARs in the ``${SPARK_HOME}/jars`` directory, then run:
Collaborator

These two JARs are not enough to run the Lance connector; the Arrow and JNI dependencies are also needed.
But I think how to set up the Spark environment should be covered in a separate doc under integrations.

dowjones226 commented

Hello, I'm curious whether you could also write a basic PySpark example that is compatible with Spark 3.5 and Python 3.10?

yanghua (Collaborator, Author) commented Mar 28, 2025

> Hello, I'm curious whether you could also write a basic PySpark example that is compatible with Spark 3.5 and Python 3.10?

@dowjones226 Of course, we can add a PySpark demo. To keep this PR's review scope small, though, I'd rather open a separate PR for it. WDYT?

yanghua (Collaborator, Author) commented Mar 28, 2025

@eddyxu Can we push this work to land as soon as possible?

yanghua merged commit 7a49e5d into lance-format:main Mar 28, 2025
5 checks passed
yanghua deleted the issue-3553 branch March 28, 2025 03:07
dowjones226 commented Mar 30, 2025

@yanghua Yes, that would be great! Please cc me on the PR. Thank you.
Just to add some colour: the use case we're trying out is reading from Iceberg and writing Lance files to S3. If the demo could show how to write to S3 from Iceberg, that would be great :)
