davidcavazos commented Feb 23, 2021

This is the second introductory notebook, covering how to read and write data.

It covers:

  • Basic I/O concepts
  • ReadFromText and WriteToText transforms
  • Creating new sources
    • Creating data from generators
    • Reading data from SQLite
  • Creating new sinks
    • Writing fixed-size batches
    • Writing windows of elements

R: @aaltay
R: @rosetn

Staged:

Note: I tried reading from public Cloud Storage data and public BigQuery data, but both required authentication, so I decided not to include them.

@davidcavazos
Author

This notebook will also need to be added to the landing page from #13747, depending on which of the two merges first.

@aaltay
Member

aaltay commented Feb 24, 2021

I glanced at this. It looks good. Some high-level comments:

  • I am not sure the info boxes about disk speeds are relevant. Reads and writes will mostly go to a service, and are very unlikely to use in-memory sources.
  • The parts that refer to creating a sink transform and creating a source transform might cause confusion, especially the source one, since "source" has a specific meaning in Beam.

I would like to load balance these reviews across the team. For this one I will nominate @emilymye in addition to @rosetn.

Feel free to ping me once this is ready to merge after both reviews.

@davidcavazos
Author

Got it, I removed the info boxes about the disk speeds. I'm also renaming "Source" to "input transform" and "Sink" to "output transform", since those might be more accurate terms.

Contributor

emilymye left a comment

Will defer to @rosetn for rest of review :)

davidcavazos changed the title from "[BEAM-10937] Add reading and writing data notebook" to "[BEAM-10937] Tour of Beam: Reading and writing data notebook" on Mar 3, 2021
Author

davidcavazos left a comment

Thanks for the comments, here's the updated version.

@davidcavazos
Author

Hi @emilymye, can you take a look at this whenever you have a chance? Thank you!

@davidcavazos
Author

Hi @rosetn, I've addressed your review comments, please let me know what you think.

Contributor

rosetn left a comment

Looks great, David!

@davidcavazos
Author

Friendly ping @emilymye :)

@emilymye
Contributor

LGTM! @aaltay will have to actually approve and submit because I don't have committer access.

@aaltay
Member

aaltay commented Mar 17, 2021

Thank you! And thank you for the reviews!

aaltay merged commit e8f9c68 into apache:master on Mar 17, 2021
davidcavazos deleted the tour-of-beam-reading-writing-data branch on March 17, 2021 21:39