
Adding YAML Import/Export for Datasources to CLI#2993

Closed
fabianmenges wants to merge 2 commits into apache:master from tc-dc:import_export_ds


@fabianmenges (Contributor) commented Jun 19, 2017

Context: https://groups.google.com/forum/#!topic/airbnb_superset/GeWZs42_NyA

Summary

Adds YAML import and export for datasources, including (SQLAlchemy) databases and Druid clusters, to the Superset CLI.

Description

I added the core of the import/export logic to the ImportMixin mix-in class. It relies heavily on SQLAlchemy to determine the schema of the YAML and the relationships between objects. Specifically, it uses unique constraints to identify and update existing elements. This required adding unique constraints to existing tables, but I'm fairly confident this will not cause major issues for existing installations, since I added them in the "spirit" of the current design.
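To illustrate the idea of using a unique constraint to decide between inserting and updating, here is a minimal sketch in plain Python. It is not the code from this PR: the `UNIQUE_KEYS` mapping, the `upsert` helper, and the column names are hypothetical stand-ins for what SQLAlchemy metadata would provide, and an in-memory dict plays the role of the database.

```python
# Hypothetical sketch, not the ImportMixin implementation from this PR.
# UNIQUE_KEYS stands in for the unique constraints read from SQLAlchemy;
# the column names below are assumptions chosen for illustration.
UNIQUE_KEYS = {
    "database": ("database_name",),
    "table": ("database_name", "schema", "table_name"),
}


def upsert(store, kind, record):
    """Insert `record`, or update the stored row that shares its unique key."""
    key = (kind,) + tuple(record.get(col) for col in UNIQUE_KEYS[kind])
    existing = store.get(key)
    if existing is None:
        # No row with this unique key yet: create one.
        store[key] = dict(record)
        return "inserted"
    # A row with the same unique key exists: overwrite its fields in place.
    existing.update(record)
    return "updated"


store = {}
upsert(store, "database", {"database_name": "main", "sqlalchemy_uri": "sqlite:///a.db"})
upsert(store, "database", {"database_name": "main", "sqlalchemy_uri": "sqlite:///b.db"})
```

Importing the same YAML twice then leaves one row per unique key rather than creating duplicates, which is presumably why the extra constraints were needed.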

In addition to the SQLAlchemy relationships, the main object hierarchy must be defined by setting the export_parent and export_children attributes appropriately (documented in the code).
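A rough sketch of how such a hierarchy could drive a recursive export follows. Only the attribute names `export_parent` and `export_children` come from the description above; the `Exportable` base class, the `export_fields` list, the `export_to_dict` method, and the `Table`/`Column` classes are assumptions made for illustration and use plain Python objects instead of SQLAlchemy models.

```python
# Hypothetical sketch: plain classes standing in for SQLAlchemy models.
class Exportable:
    export_parent = None   # name of the attribute that points at the parent
    export_children = []   # names of attributes holding lists of children
    export_fields = []     # assumed: plain attributes to write out

    def export_to_dict(self):
        """Serialize this object and, recursively, all of its children."""
        data = {field: getattr(self, field) for field in self.export_fields}
        for child_attr in self.export_children:
            children = getattr(self, child_attr)
            data[child_attr] = [child.export_to_dict() for child in children]
        # The parent is deliberately not exported, which avoids cycles.
        return data


class Column(Exportable):
    export_parent = "table"
    export_fields = ["column_name"]

    def __init__(self, column_name):
        self.column_name = column_name
        self.table = None


class Table(Exportable):
    export_children = ["columns"]
    export_fields = ["table_name"]

    def __init__(self, table_name, columns):
        self.table_name = table_name
        self.columns = columns
        for column in columns:
            column.table = self  # wire up the back-reference to the parent


exported = Table("events", [Column("ts"), Column("user_id")]).export_to_dict()
```

The resulting nested dict can be handed to a YAML serializer as-is; walking only downward through `export_children` keeps the output a tree.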

The unit test covers only basic importing and exporting (it was lifted from the existing pickle import/export) and should probably be extended to cover more edge cases.

You can export databases, Druid clusters, tables, and datasources from the UI:
(screenshot, 2017-11-16)

Possible Future Projects/Improvements

  • Add YAML import/export for Slices and Dashboards.
  • Add support for importing/exporting an individual database, table, metric, etc. as the root element of a YAML file.
  • Support importing multiple YAML files at the same time.
  • Support splitting exports into smaller files.
  • Use the YAML import for the example datasets.
  • Add YAML import to the Web UI and/or API.

@fabianmenges fabianmenges force-pushed the import_export_ds branch 7 times, most recently from 96e46fd to 378673b Compare June 21, 2017 13:24
@spegmb

spegmb commented Nov 15, 2017

Hi,

I have not been able to integrate this code into the latest Superset, so I was wondering whether the script mentioned in the discussion thread (https://github.com/tc-dc/superset/blob/DatasourceUpload/upload_datasources.py) is still available. (The link no longer works.)
Thank you

@fabianmenges
Contributor Author

I'll push an updated version of this branch soon (by the end of this week); I'm currently enhancing it for our CI environment. I have quite a few of these branches out and it's pretty painful to keep them up to date.

@mistercrunch
Member

I think the build didn't trigger (Travis downtime?), looks ready to merge otherwise.

@spegmb

spegmb commented Nov 19, 2017

Hi,

I pulled your commit and all seemed to work fine. In particular, exporting to a YAML file worked well. However, importing seems to have a problem:

$ superset import_datasources blob

usage: superset [-?]
{flower,load_examples,export_datasource_schema,import_datasources,db,runserver,refresh_druid,init,version,export_datasources,worker,shell,update_datasources_cache}
...
superset: error: too many arguments

(This error happens whether blob is an actual file or something that does not exist.) And of course, running without the second argument does not work either:

superset import_datasources
Traceback (most recent call last):
File "/usr/local/bin/superset", line 4, in
......
File "build/bdist.linux-x86_64/egg/pathlib2.py", line 839, in _parse_args
TypeError: <flask_script.commands.Command object at 0x7ff900dd2650>: argument should be a str object or an os.PathLike object returning str, not <type 'NoneType'>

I also tried with a pipe and got the same error (cat whatever.yaml | superset import_datasources).
I'm guessing this is not the expected behavior?

Thanks

@fabianmenges
Contributor Author

Try

superset import_datasources -p blob.yaml

And look at the documentation in this PR.
I'll look into why it prints the help text when called without parameters.
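For readers hitting the same TypeError, the failure above is what happens when the path option is left unset and `None` reaches the path handling code. A minimal sketch of guarding against that is shown below. It uses the standard library `argparse` rather than the flask_script machinery the Superset CLI used at the time, and the `--path` long option name and the `parse` helper are assumptions for illustration; only `-p` appears in this thread.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the import_datasources command.
    parser = argparse.ArgumentParser(prog="import_datasources")
    parser.add_argument(
        "-p",
        "--path",  # assumed long form; the thread only shows -p
        required=True,  # fail early with a usage message instead of passing None on
        help="Path to the YAML file to import",
    )
    return parser


def parse(argv):
    """Parse an argument list and return the namespace."""
    return build_parser().parse_args(argv)


args = parse(["-p", "blob.yaml"])
```

With the option marked as required, calling the command without `-p` exits with a usage error rather than a traceback from deep inside the path library.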

@spegmb

spegmb commented Nov 19, 2017

Got it; I had not realized that superset import_datasources --help provided all the necessary info. Thank you and sorry for the dumb question.

@fabianmenges fabianmenges force-pushed the import_export_ds branch 2 times, most recently from 6972a07 to 2597da0 Compare November 29, 2017 14:59
@fabianmenges
Contributor Author

Opened a new PR for this so Travis would pick it up.

@fabianmenges fabianmenges deleted the import_export_ds branch December 4, 2017 16:36

3 participants