@cpcloud
Member

@cpcloud cpcloud commented Nov 8, 2022

This PR moves bigquery back into the main ibis repo.

Still working through the failing tests, though many are fixed. Update: tests are now passing.

TODOs:

  • get all tests passing locally
  • see if we can automatically handle the auto-generated names we produce that BigQuery doesn't accept
  • set up CI similar to ibis-bigquery's if possible, though maybe we just run these tests only on push events, as we do for Snowflake

Possible follow ups:

  • delete legacy code for ibis < 4
  • delete legacy tests for ibis < 4
  • move SQL tests to a write-then-compare (snapshot) style so that they are easier to modify
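
The write-then-compare idea in the last follow-up can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not how ibis actually implements it: `assert_matches_snapshot` is a hypothetical helper and the one-directory-per-test `out.sql` layout is made up for the example. On the first run (or when `update=True`) it writes the generated SQL to a file; later runs compare against that file, so changing the expected SQL means regenerating a file rather than hand-editing a string literal inside the test.

```python
import difflib
import tempfile
from pathlib import Path


def assert_matches_snapshot(actual: str, snapshot_path: Path, update: bool = False) -> None:
    """Hypothetical write-then-compare helper for generated SQL."""
    if update or not snapshot_path.exists():
        # Write step: record the current output as the expected result.
        snapshot_path.parent.mkdir(parents=True, exist_ok=True)
        snapshot_path.write_text(actual)
        return

    # Compare step: fail with a unified diff if the output changed.
    expected = snapshot_path.read_text()
    if actual != expected:
        diff = "\n".join(
            difflib.unified_diff(
                expected.splitlines(),
                actual.splitlines(),
                fromfile=str(snapshot_path),
                tofile="actual",
                lineterm="",
            )
        )
        raise AssertionError(f"generated SQL differs from snapshot:\n{diff}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: one directory per test, one out.sql per snapshot.
        path = Path(tmp) / "test_select_one" / "out.sql"
        assert_matches_snapshot("SELECT 1", path)  # first run: writes out.sql
        assert_matches_snapshot("SELECT 1", path)  # later runs: compare, passes
        try:
            assert_matches_snapshot("SELECT 2", path)
        except AssertionError as exc:
            print(exc)
```

After an intentional SQL change, regenerating the snapshots is a matter of calling the helper with `update=True` and reviewing the resulting file diff in the PR.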

@cpcloud cpcloud added this to the 4.0.0 milestone Nov 8, 2022
@cpcloud cpcloud added the community Issues or PRs requiring help from the community label Nov 8, 2022
@github-actions
Contributor

github-actions bot commented Nov 8, 2022

Test Results

41 files, 41 suites, 1h 37m 12s ⏱️
11 830 tests: 8 998 passed ✔️, 2 832 skipped 💤, 0 failed
42 238 runs: 32 018 passed ✔️, 10 220 skipped 💤, 0 failed

Results for commit 5be3c16.

♻️ This comment has been updated with latest results.

@cpcloud cpcloud force-pushed the ibis-bigquery branch 4 times, most recently from 362feca to 0a44e88 November 11, 2022 14:26
@cpcloud cpcloud added the bigquery The BigQuery backend label Nov 11, 2022
@cpcloud cpcloud force-pushed the ibis-bigquery branch 2 times, most recently from 4d93724 to 9ad71ad November 11, 2022 16:18
@cpcloud cpcloud force-pushed the ibis-bigquery branch 2 times, most recently from a9f1703 to c5b2f6f November 13, 2022 14:28
@jreback
Contributor

jreback commented Nov 14, 2022

@cpcloud why do we view this as a good thing?

@cpcloud
Member Author

cpcloud commented Nov 14, 2022

@jreback Good question, thanks for bringing it up.

The primary reason is to prevent the maintenance burden that comes along with a separate repo.

I give a more detailed answer to your question here (ibis-project/ibis-bigquery#151).

In short, many of the benefits we expected from a separate repo have, in practice, either increased maintenance work or had a negligible effect on it.

@jreback
Contributor

jreback commented Nov 14, 2022

@cpcloud sure

Is this a general change in policy, though, or a specific one-off for BQ?

E.g., what about a lot of the other Google variants, or MSSQL?

@cpcloud cpcloud marked this pull request as ready for review November 14, 2022 23:27
@cpcloud cpcloud force-pushed the ibis-bigquery branch 2 times, most recently from e9d38e8 to d0d7624 November 16, 2022 12:48
@cpcloud
Member Author

cpcloud commented Nov 21, 2022

@tswast Friendly ping! Any thoughts on this PR?

@codecov

codecov bot commented Nov 21, 2022

Codecov Report

Merging #4797 (986544d) into master (a2d03d1) will decrease coverage by 5.15%.
The diff coverage is 2.37%.

❗ Current head 986544d differs from pull request most recent head 4c16755. Consider uploading reports for the commit 4c16755 to get more accurate results.

@@            Coverage Diff             @@
##           master    #4797      +/-   ##
==========================================
- Coverage   92.88%   87.73%   -5.16%     
==========================================
  Files         192      204      +12     
  Lines       21731    22830    +1099     
  Branches     3011     3124     +113     
==========================================
- Hits        20185    20030     -155     
- Misses       1129     2389    +1260     
+ Partials      417      411       -6     
Impacted Files Coverage Δ
ibis/backends/bigquery/client.py 0.00% <0.00%> (ø)
ibis/backends/bigquery/compiler.py 0.00% <0.00%> (ø)
ibis/backends/bigquery/datatypes.py 0.00% <0.00%> (ø)
ibis/backends/bigquery/operations.py 0.00% <0.00%> (ø)
ibis/backends/bigquery/registry.py 0.00% <0.00%> (ø)
ibis/backends/bigquery/rewrites.py 0.00% <0.00%> (ø)
ibis/backends/bigquery/udf/__init__.py 0.00% <0.00%> (ø)
ibis/backends/bigquery/udf/core.py 0.00% <0.00%> (ø)
ibis/backends/bigquery/udf/find.py 0.00% <0.00%> (ø)
ibis/backends/bigquery/udf/rewrite.py 0.00% <0.00%> (ø)
... and 24 more
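
The headline numbers in the report can be re-derived from the diff table. Assuming Codecov's usual definition, coverage is hits / (hits + misses + partials), and the Lines row is exactly that sum (20185 + 1129 + 417 = 21731 on master, 20030 + 2389 + 411 = 22830 for the PR). The sketch below checks that; the observation that the reported percentages are truncated rather than rounded to two decimals is an inference from these numbers, not something the report states.

```python
import math


def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Line coverage as a percentage: hits / (hits + misses + partials)."""
    lines = hits + misses + partials
    return 100.0 * hits / lines


def truncate_2dp(value: float) -> float:
    """Drop (not round) everything past two decimal places."""
    return math.floor(value * 100) / 100


# Figures copied from the coverage diff table.
master = coverage_percent(hits=20185, misses=1129, partials=417)  # about 92.8857
pr = coverage_percent(hits=20030, misses=2389, partials=411)      # about 87.7354

print(truncate_2dp(master), truncate_2dp(pr))  # 92.88 87.73
print(round(master - pr, 2))                   # 5.15
```

The table's -5.16% differs from the 5.15% in the summary line by 0.01; that is likely a rounding artifact on Codecov's side, and this sketch does not try to reproduce it.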

@cpcloud cpcloud force-pushed the ibis-bigquery branch 5 times, most recently from 9891fcf to 659ba53 November 23, 2022 16:25
Collaborator

@tswast tswast left a comment

BQ changes LGTM. I like the "snapshot" structure in the tests.

@cpcloud cpcloud force-pushed the ibis-bigquery branch 2 times, most recently from c6dc72c to 60eab8f November 23, 2022 21:22

def fetch_from_cursor(self, cursor, schema):
    query = cursor.query
    df = query.to_arrow().to_pandas(timestamp_as_object=True)
    query_result = query.result()

Member Author

@cpcloud cpcloud Nov 23, 2022

@tswast Can you take a look at this block of code and say whether this is expected behavior?

The use case is reading from bigquery-public-data.hacker_news.comments, but having ibis-gbq be the billing project.

Without this workaround, the storage API creates a read session in the data project (bigquery-public-data), which causes queries to fail when using the pyarrow functionality.
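
The billing-project versus data-project distinction can be made concrete with the resource names used by the BigQuery Storage Read API: a read session's `parent` is the project that owns (and is billed for) the session, in the form `projects/{project_id}`, while the table being read is named separately as `projects/{project}/datasets/{dataset}/tables/{table}`. The pure-Python sketch below only builds those two strings for a direct table read. `read_session_paths` and `ReadSessionPaths` are hypothetical names, and the sketch does not model how python-bigquery chooses the project when downloading query results, which is what the bug in the replies is about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadSessionPaths:
    parent: str  # project that owns, and is billed for, the read session
    table: str   # project/dataset/table that the rows are read from


def read_session_paths(table_id: str, billing_project: str) -> ReadSessionPaths:
    """Hypothetical helper: build Storage Read API resource names for a table read.

    `table_id` is a standard "project.dataset.table" ID; `billing_project`
    may differ from the project that owns the data.
    """
    parts = table_id.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected 'project.dataset.table', got {table_id!r}")
    data_project, dataset, table = parts
    return ReadSessionPaths(
        parent=f"projects/{billing_project}",
        table=f"projects/{data_project}/datasets/{dataset}/tables/{table}",
    )


paths = read_session_paths("bigquery-public-data.hacker_news.comments", "ibis-gbq")
print(paths.parent)  # projects/ibis-gbq
print(paths.table)   # projects/bigquery-public-data/datasets/hacker_news/tables/comments
```

Creating the session under `projects/bigquery-public-data` instead would ask a project you don't control to own the session, which would plausibly fail for lack of permission there and is consistent with the failure described in the comment.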

Collaborator

I wouldn't expect this to be necessary. If the query succeeded, then I assume the billing project is being set correctly in the client constructor:

    project=new_backend.billing_project,

and in the query method:

    stmt, job_config=job_config, project=self.billing_project

I've filed googleapis/python-bigquery#1422 to investigate this further, but I think it's fine to keep this workaround if there really is a bug.

Collaborator

Update: I think there really is a bug. The project from "client" is used instead of the project from the QueryJob.

Member Author

Ah, okay. Thanks Tim. I'll keep this comment unresolved so the link is easier to find.

@cpcloud
Member Author

cpcloud commented Nov 27, 2022

Ok, I'm going to merge this in and fix any issues with the CI. Thanks all for the help reviewing, great to see this back in the main repo!

@cpcloud cpcloud merged commit cd5e881 into ibis-project:master Nov 27, 2022
@cpcloud cpcloud deleted the ibis-bigquery branch November 27, 2022 14:25

Labels

bigquery The BigQuery backend community Issues or PRs requiring help from the community

Projects

None yet

6 participants