Only load fields that are needed from Google API for Big Query schema #787

Closed
jezdez wants to merge 32 commits into master from bug785

Conversation


@jezdez jezdez commented Jan 25, 2019

Refs #785.

I used the Google API Explorer to build the values for the fields parameters: e.g. https://developers.google.com/apis-explorer/#p/bigquery/v2/bigquery.tables.get

More info for the fields parameter: https://developers.google.com/api-client-library/python/guide/performance#partial-response-fields-parameter
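For context, a partial-response request with the google-api-python-client looks roughly like this. This is a sketch, not the PR's actual diff: the `service` object is assumed to be the BigQuery service that Redash's query runner builds, and the exact `fields` string here is illustrative.

```python
# Sketch of the partial-response technique this PR applies (illustrative,
# not the actual diff). `service` is assumed to be a google-api-python-client
# BigQuery service object, as built by Redash's BigQuery query runner.
def list_table_names(service, project_id, dataset_id):
    """List table names, asking the API to return only the fields we use."""
    request = service.tables().list(
        projectId=project_id,
        datasetId=dataset_id,
        # Partial response: only each table's tableId plus the paging token.
        fields="tables/tableReference/tableId,nextPageToken",
    )
    names = []
    while request is not None:
        response = request.execute()
        for table in response.get("tables", []):
            names.append(table["tableReference"]["tableId"])
        # list_next() returns None once all pages are consumed.
        request = service.tables().list_next(request, response)
    return names
```

The server does the filtering, so only the named fields cross the wire.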

Allen Short and others added 30 commits December 17, 2018 13:20
- Use new master / rc release release strategy (#440)
- Migrate Circle CI 2.0 (#488, #502)
I tried to make
`JSON.stringify(this.visualization.options.columnMapping)` a variable
to avoid repeating it, but if I make it a `let` the linter throws an
error and if I make it a `const` then it doesn’t change with the UI and
the logic doesn’t work. :(

updated based on PR comments
In the long run we'll be able to install additional dependencies by
having our own Dockerfile that builds images based on the Redash image
but also installs additional Python dependencies. But until we have
a fork with lots of changes of our own, we need to do it this way.

Redash-stmo contains the ability to hook up our own Dockerflow
library.

Refs #13
Refs #37
Extend the Remote User Auth backend with the ability to pass remote user groups via a configurable request header similar to the REMOTE_USER header.

Refs #37.

If enabled, the feature checks the header value against a configured list of group names, including the ability to use UNIX shell-style wildcards.
* Use --max-tasks-per-child as per celery documentation

* Set --max-memory-per-child to 1/4th of total system memory

* Split exec command over multiple lines

* Fix memory variable typo
Refs #537, #553.

Co-authored-by: Marina Samuel <msamuel@mozilla.com>
Co-authored-by: Allen Short <ashort@mozilla.com>
@jezdez jezdez requested review from emtwo and jasonthomas January 25, 2019 17:55

jezdez commented Jan 25, 2019

It'd be neat to try this out eventually, not sure how I can do this locally to be honest :(


emtwo commented Jan 28, 2019

@jezdez I haven't looked into the cause of the failing tests here, but I've tried this out locally and it's working in that less data is passed along per request, which is good.

However, I'm not entirely sure that this will resolve the issue #785 for a couple of reasons.

  1. It did not provide a clear improvement in speed for fetching BigQuery schema since the main bottleneck seems to be making a request for every single table, and this PR does not change that. Unfortunately, it seems that this is just the nature of the BigQuery schema API.

  2. I could not reproduce the error @jasonthomas was seeing, {"error": {"message": "Error retrieving schema.", "code": 2}}, either on master or in this PR when testing with a BigQuery data source that has ~270 tables. The only way I could get that error to show up was by using an invalid JSON key file when setting up the data source.

That said, I think once these tests are fixed, we can land this PR anyway (perhaps it should be done upstream?). It may or may not fix the issue, but it's still useful to be sending around less data.

I also think if we could connect to the BigQuery data source locally that is failing on stage, it might be easier to reproduce and debug the issue. Is there any chance we could do that, @jasonthomas?
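Point 1 above comes down to an N+1 request pattern. A minimal sketch of the shape of the schema loader (hypothetical names, not Redash's actual code) shows why trimming `fields` cannot remove the per-table round trips:

```python
# Illustrative sketch of the N+1 pattern behind the bottleneck: one
# tables().get() HTTP round trip per table. Partial responses (`fields`)
# shrink each payload but leave the number of requests unchanged.
def load_schema(service, project_id, dataset_id, table_ids):
    schema = []
    for table_id in table_ids:  # latency grows linearly with table count
        table = service.tables().get(
            projectId=project_id,
            datasetId=dataset_id,
            tableId=table_id,
            fields="id,schema",  # trims bytes per response, not round trips
        ).execute()
        columns = [field["name"] for field in table["schema"]["fields"]]
        schema.append({"name": table["id"], "columns": columns})
    return schema
```

With ~270 tables that is ~270 sequential HTTP requests regardless of how small each response gets.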


emtwo commented Jan 28, 2019

For reference, here are some differences in data that is passed along with the change in this PR:

[Before] Fetching a dataset

[{
    u'kind': u'bigquery#dataset',
    u'location': u'US',
    u'id': u'cosmic-antenna-155218:babynames',
    u'datasetReference': {
        u'projectId': u'cosmic-antenna-155218',
        u'datasetId': u'babynames'
    }
}]

[After] Fetching a dataset

[{
    u'datasetReference': {
        u'datasetId': u'babynames'
    }
}]

[Before] Fetching table names

[{
    u'creationTime': u'1541093458826',
    u'kind': u'bigquery#table',
    u'type': u'TABLE',
    u'id': u'cosmic-antenna-155218:babynames.copy',
    u'tableReference': {
        u'projectId': u'cosmic-antenna-155218',
        u'tableId': u'copy',
        u'datasetId': u'babynames'
    }
}, ...]

[After] Fetching table names

[{
    u'tableReference': {
        u'tableId': u'copy'
    }
}, ...]

[Before] Fetching a table schema:

{
    u'kind': u'bigquery#table',
    u'creationTime': u'1541093458826',
    u'location': u'US',
    u'tableReference': {
        u'projectId': u'cosmic-antenna-155218',
        u'tableId': u'copy',
        u'datasetId': u'babynames'
    },
    u'numRows': u'33228',
    u'numBytes': u'637183',
    u'etag': u'3AWXArPupz0SpiQIXDh8RQ==',
    u'numLongTermBytes': u'0',
    u'lastModifiedTime': u'1541093458826',
    u'type': u'TABLE',
    u'id': u'cosmic-antenna-155218:babynames.copy',
    u'selfLink': u'https://www.googleapis.com/bigquery/v2/projects/cosmic-antenna-155218/datasets/babynames/tables/copy',
    u'schema': {
        u'fields': [{
            u'type': u'STRING',
            u'name': u'name',
            u'mode': u'NULLABLE'
        }, {
            u'type': u'STRING',
            u'name': u'gender',
            u'mode': u'NULLABLE'
        }, {
            u'type': u'INTEGER',
            u'name': u'count',
            u'mode': u'NULLABLE'
        }]
    }
}

[After] Fetching a table schema:

{
    u'id': u'cosmic-antenna-155218:babynames.copy',
    u'schema': {
        u'fields': [{
            u'type': u'STRING',
            u'name': u'name',
            u'mode': u'NULLABLE'
        }, {
            u'type': u'STRING',
            u'name': u'gender',
            u'mode': u'NULLABLE'
        }, {
            u'type': u'INTEGER',
            u'name': u'count',
            u'mode': u'NULLABLE'
        }]
    }
}
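The trimming shown above can be mimicked locally to see the size difference. This toy snippet (abridged payload; the `fields='id,schema'` value is an assumption reconstructed from the [After] payload, and pruning really happens server-side) keeps only the keys the schema loader uses:

```python
import json

def prune(table, keep=("id", "schema")):
    """Mimic fields='id,schema': keep only the listed top-level keys."""
    return {key: table[key] for key in keep if key in table}

# Abridged version of the [Before] table payload above.
before = {
    "kind": "bigquery#table",
    "id": "cosmic-antenna-155218:babynames.copy",
    "numRows": "33228",
    "numBytes": "637183",
    "schema": {"fields": [
        {"type": "STRING", "name": "name", "mode": "NULLABLE"},
    ]},
}
after = prune(before)
# The pruned payload serializes to noticeably fewer bytes than the full one.
saved = len(json.dumps(before)) - len(json.dumps(after))
```

The real savings come per table, so they add up across the per-table requests.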


jasonthomas commented Jan 28, 2019

Is there any chance we could do that, @jasonthomas?

@emtwo I've sent you an email with the json key. The project id information is also listed in the json key file.

One thing to mention is that the Redash documentation [1] recommends either using the BigQuery Admin role or creating a custom role for the service account with the necessary permissions. I've created a custom role with the following permissions:

    "bigquery.datasets.get",
    "bigquery.jobs.create",
    "bigquery.jobs.get",
    "bigquery.jobs.list",
    "bigquery.jobs.listAll",
    "bigquery.jobs.update",
    "bigquery.tables.get",
    "bigquery.tables.list",

The documentation also includes 'bigquery.datasets.list' and 'bigquery.tables.getData', which are not included in the above permissions: 'bigquery.datasets.list' does not exist, and 'bigquery.tables.getData' would give access to all table data [2]. Instead, the service account user has access to specific datasets/tables.

[1] https://redash.io/help/data-sources/setup/bigquery
[2] https://cloud.google.com/bigquery/docs/access-control
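For reference, a custom role with this permission set could be created with `gcloud` roughly like this (a sketch: the role ID, project ID, and title are placeholders, and the permission list mirrors the one above):

```shell
# Hypothetical invocation; redash_bigquery and my-project-id are placeholders.
gcloud iam roles create redash_bigquery \
    --project=my-project-id \
    --title="Redash BigQuery" \
    --stage=GA \
    --permissions=bigquery.datasets.get,bigquery.jobs.create,bigquery.jobs.get,bigquery.jobs.list,bigquery.jobs.listAll,bigquery.jobs.update,bigquery.tables.get,bigquery.tables.list
```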

jasonthomas commented

As per today's STMO meeting, we decided to test the BigQuery data source with a GCP service account that has the 'BigQuery Admin' role, to rule out a permissions issue.

I created a new service account and a new BigQuery data source on stage STMO, and was unable to view the schema. The error was the same as above: {"error": {"message": "Error retrieving schema.", "code": 2}}. I also ran a query against the data source, which was successful and returned data.


emtwo commented Feb 1, 2019

Note: bug #785 has a fix that was merged upstream: getredash#3382


jezdez commented Feb 25, 2019

Hey @emtwo, @jasonthomas, do you think it would be a good idea to open a PR upstream?


emtwo commented Mar 28, 2019

Do you think this would be a good idea to open a PR upstream?

Yes definitely.


jezdez commented Apr 3, 2019

Closing in favor of getredash#3673.

@jezdez jezdez closed this Apr 3, 2019
@jezdez jezdez deleted the bug785 branch April 3, 2019 07:59
