[SIP] Proposal for Improving Examples Interface, Organization and Storage
The goal of the changes in this proposal is to improve the examples capabilities of Superset so as to foster an ecosystem of examples which will sustain and grow as the platform continues to develop. First I will characterize the existing system of examples and then propose changes to improve the number and quality of examples.
Current Examples
Examples are currently programmatically defined in the superset.data module. An abstract interface summarizing these examples looks like the following:
from abc import ABC
class AbstractSupersetExample(ABC):
"""Defines interface through which superset examples load themselves."""
def __init__(self, description):
self.description = description
def load_data(self):
# Task 1: Load file and create pandas.DataFrame
# Task 2: Load data into SQL with pandas.DataFrame.to_sql()
# Task 3: Process through ORM to get back workable Table object from whichever data source the table is in
pass
def create_metrics(self):
# Task 1: Build any TableColumns
# Task 2: Build Metrics - SQLMetrics
# Task 3: Store metrics in DB via ORM
pass
def create_charts(self, chart):
# Task 1: Build chart from config/JSON
# Task 2: Store to DB via - misc_dash_slices.add(slc.slice_name) / merge_slice(slc)
pass
def create_dashboards(self, name, config):
# Task 1: Instantiate Dash via ORM
# Task 2: Configure Dash via JSON
# Task 3: Store to DB via ORM
pass
While this mechanism jump started the collection of Superset examples, defining examples as code will not appeal to most Superset users of growing a community that contributes examples.
Current Dashboard Import/Export
Dashboard example creation could utilize the export feature by adding example oriented fields to the export. Dashboards can be exported via the Dashboard List interface at /dashboard/list/ via its Export action or at the command line via superset export_dashboards. Dashboard and chart export JSON includes everything needed to reproduce a dashboard save the actual data table: dashboard, chart and datasource information.
The datasources. __SqlaTable__.database element will need to be removed when examples are created and recreated when they are loaded to match SQLALCHEMY_EXAMPLES_URI or a —database-uri the user specifies. Each Slice’s datasource_id and datasource_name must be changed.
{
"dashboards": [
{
"__Dashboard__": {
"created_on": {
"__datetime__": "2019-04-19T10:29:21"
},
"changed_by_fk": null,
"slug": "world_health",
"json_metadata": "{\"remote_id\": 1}",
"description": null,
"dashboard_title": "World's Bank Data",
"changed_on": {
"__datetime__": "2019-05-02T17:55:47"
},
"created_by_fk": null,
"css": null,
"position_json": {...},
"id": 1,
"slices": [
{
"__Slice__": {
"created_by_fk": null,
"cache_timeout": null,
"params": "{\"compare_lag\": \"10\", \"compare_suffix\": \"o10Y\", \"country_fieldtype\": \"cca3\", \"date_filter\": false, \"entity\": \"country_code\", \"filter_configs\": [{\"asc\": false, \"clearable\": true, \"column\": \"region\", \"key\": \"2s98dfu\", \"metric\": \"sum__SP_POP_TOTL\", \"multiple\": true}, {\"asc\": false, \"clearable\": true, \"column\": \"country_name\", \"key\": \"li3j2lk\", \"metric\": \"sum__SP_POP_TOTL\", \"multiple\": true}], \"granularity_sqla\": \"year\", \"groupby\": [], \"limit\": \"25\", \"markup_type\": \"markdown\", \"metric\": \"sum__SP_POP_TOTL\", \"metrics\": [\"sum__SP_POP_TOTL\"], \"row_limit\": 50000, \"secondary_metric\": \"sum__SP_POP_TOTL\", \"show_bubbles\": true, \"since\": \"2014-01-01\", \"time_range\": \"2014-01-01 : 2014-01-02\", \"until\": \"2014-01-02\", \"viz_type\": \"filter_box\", \"where\": \"\", \"remote_id\": 371, \"datasource_name\": \"wb_health_population\", \"schema\": \"wb_health_population\", \"database_name\": \"main\"}",
"datasource_name": null,
"datasource_type": "table",
"slice_name": "Region Filter",
"changed_on": {
"__datetime__": "2019-05-02T18:00:39"
},
"changed_by_fk": 1,
"perm": "[main].[wb_health_population](id:2)",
"description": null,
"viz_type": "filter_box",
"datasource_id": 2,
"id": 371,
"created_on": {
"__datetime__": "2019-05-02T17:55:47"
},
"owners": []
}
}
],
"datasources": [
{
"__SqlaTable__": {
"created_on": {
"__datetime__": "2019-04-19T10:29:20"
},
"sql": null,
"cache_timeout": null,
"changed_on": {
"__datetime__": "2019-05-02T17:55:47"
},
"is_sqllab_view": false,
"params": "{\"remote_id\": 2, \"database_name\": \"main\"}",
"id": 2,
"template_params": null,
"perm": "[main].[wb_health_population](id:2)",
"description": "<!--\nLicensed to the Apache Software Foundation...",
"created_by_fk": null,
"table_name": "wb_health_population",
"default_endpoint": null,
"changed_by_fk": null,
"main_dttm_col": "year",
"is_featured": false,
"database_id": 1,
"filter_select_enabled": true,
"fetch_values_predicate": null,
"offset": 0,
"schema": null,
"columns": [
{
"__TableColumn__": {
"created_by_fk": null,
"python_date_format": null,
"is_dttm": false,
"description": null,
"groupby": false,
"type": "FLOAT",
"verbose_name": null,
"id": 4,
"changed_by_fk": null,
"created_on": {
"__datetime__": "2019-04-19T10:29:20"
},
"table_id": 2,
"filterable": false,
"expression": "",
"database_expression": null,
"is_active": true,
"column_name": "NY_GNP_PCAP_CD",
"changed_on": {
"__datetime__": "2019-04-19T10:29:20"
}
}
},
...
],
"metrics": [
{
"__SqlMetric__": {
"created_by_fk": null,
"table_id": 2,
"d3format": null,
"description": null,
"metric_name": "sum__SP_POP_TOTL",
"changed_on": {
"__datetime__": "2019-04-19T10:29:20"
},
"metric_type": null,
"changed_by_fk": null,
"expression": "sum(SP_POP_TOTL)",
"warning_text": null,
"is_restricted": false,
"verbose_name": null,
"id": 3,
"created_on": {
"__datetime__": "2019-04-19T10:29:20"
}
}
},
...
],
"database": {
"__Database__": {
"allow_csv_upload": true,
"verbose_name": null,
"created_by_fk": null,
"allow_ctas": false,
"database_name": "main",
"changed_by_fk": null,
"allow_dml": false,
"sqlalchemy_uri": "sqlite:////Users/rjurney/.superset/superset.db",
"force_ctas_schema": null,
"password": null,
"allow_multi_schema_metadata_fetch": false,
"cache_timeout": null,
"created_on": {
"__datetime__": "2019-04-19T10:28:59"
},
"extra": "{\n \"metadata_params\": {},\n \"engine_params\": {},\n \"metadata_cache_timeout\": {},\n \"schemas_allowed_for_csv_upload\": []\n}\n",
"select_as_create_table_as": false,
"perm": "[main].(id:1)",
"changed_on": {
"__datetime__": "2019-04-19T10:28:59"
},
"expose_in_sqllab": true,
"impersonate_user": false,
"id": 1,
"allow_run_async": false
}
}
}
},
A Slice object has a params_dict which contains the following. Note that this includes references to the datasource_name wb_health_population.
{'compare_lag': '10',
'compare_suffix': 'o10Y',
'country_fieldtype': 'cca3',
'entity': 'country_code',
'granularity_sqla': 'year',
'groupby': ['region', 'country_code'],
'limit': '25',
'markup_type': 'markdown',
'metric': 'sum__SP_POP_TOTL',
'metrics': ['sum__SP_POP_TOTL'],
'row_limit': 50000,
'secondary_metric': 'sum__SP_POP_TOTL',
'show_bubbles': True,
'since': '1960-01-01',
'time_range': '2014-01-01 : 2014-01-02',
'until': 'now',
'viz_type': 'treemap',
'where': '',
'remote_id': 708,
'datasource_name': 'wb_health_population',
'schema': 'wb_health_population',
'database_name': 'main'}]
Example Components
A Superset example is a SQL oriented dashboard and is composed of the following:
* Physical database with tables loaded with a dataset
* Superset Datasource object
* Table object
* Column objects
* Metric objects
* Dashboard object
* One or more chart objects
All of the above with the exception of the Datasource.Database entry will need to be serialized, stored, contributed, approved, listed, deserialized and loaded by the example system. The Datasource.Database entry will need to be removed on exporting and replaced on importing of examples.
Scope of Improvement
This proposal improves the superset example process in three areas: example creation, data storage and discoverability.
In order to improve the range and quality of Superset examples we need to first improve the process for creating and loading examples. While examples can be created programmatically, the more natural process is to use Superset to create them. This requires that we automate the process to persist and restore the combined state of the Superset Dashboard, Database and related objects as well as the contents of the datasource itself.
We also need a directory to which examples can be uploaded and a corresponding user interface and process of governance over that repository. This directory should be independent of the Superset project release process and code repository. Current processes for management of Superset’s code assets would transition directly to the management of its examples: changes would be created by creating Github issues and pull requests, data assets would be versioned and managed in a central repository.
Finally we need a user interface for finding, listing and loading examples from the repository. It should be simple and can exist as an examples command as part of the superset CLI which will have export, list , import and remove sub-commands.
Example Repository Requirements
The requirement for example storage are that it have the following properties:
- Any user can view a list of all examples that have been published
- Any user can upload an example and publish it - pending approval
- Examples can be approved by committers by some mechanism
- Approved examples can be published/released by committers by some mechanism
- Management process/interface should match as closely as possible existing tools and practices.
- The project is separate from Superset to enable more rapid and independent release cycles
Git and Github are a desirable mechanism for publishing and approval but an undesirable mechanism for storage. Git LFS (Large File System) offers scalable storage while still using Github for project management. With a 2GB file limit and support on Github for 250 of these files, it scales well and is the proposed storage system. Other options are explored in the addendum.
Example File Format
Examples should be defined and packaged in a standard manner and each example should be self contained in its own file system directory. The existing Dashboard export format adequately describes a dashboard, it’s charts and the associated datasources but is missing human readable fields describing the contents of the dataset and dashboard as well as the physical location of the contents of the tables the datasource metadata describes. These fields will be added to the Superset dashboard export format.
Data location information will be stored in a top level files key next to the existing dashboards, slices and datasources keys. A top level description field will fill out the fields of a description of the dataset in the examples directory. The existing World’s Bank Data example is extended below:
{
“dashboards”: [ ... ],
“datasources”: [ ... ],
“description”: {
"created_at": "2019-05-20T16:20:24.883125",
"description": "World Bank Data example about world health populations from 1960-2010.",
"file_count": 1,
"license": "Apache 2.0"
"title": "World Bank Health Information",
"total_rows": 11770,
"total_size": 22561353,
"total_size_mb": 21.52,
},
“files”: [
{
"file_name": "wb_health_population.csv.gz",
"rows": 11770,
"size": 22561353,
"table_name": "wb_health_population"
}
],
“slices”: [ ... ]
}
The file layout for this example appears as follows, with the dashboard slug used as the directory name in the exported tarball and examples directory:
/
/world_health
/world_health/dashboard.json
/world_health/wb_health_population.csv.gz
Example Data Table Format
In order to manage tables, to create and drop them, it is helpful to assume that an Integer id primary column is present. This is the case for all current example dashboard tables. In the future we may want to support tables with uuid or other types of primary column.
New or Changed Public Interfaces
Changes include the addition of a SQLALCHEMY_EXAMPLES_URI and EXAMPLES_GIT_TAG configuration keys and changes to the model classes as well as the CLI.
The examples-data Repository
Currently the example data is on Github at apache-superset/examples-data. This will continue to be the case, but this repository will now house both Dashboard metadata files as well as data files via Git LFS. Each example will have its own directory with its own dashboard.json and data files.
The README.md for this repository in new new form can be accessed here: GitHub - rjurney/examples-data at lfs.
SQLALCHEMY_EXAMPLES_URI Configuration Key
A SQLALCHEMY_EXAMPLES_URI configuration key in superset/config.py controls the default location to load examples into. This defaults to ~/.superset/examples.db and can be over-ridden on a per-import basis using the --database-uri/-d option.
EXAMPLE_REPOS_TAGS Configuration Key
A EXAMPLE_REPOS_TAGS configuration key in superset/config.py controls the locations of the examples from which to list and load. This can be set manually using the --examples-repo/-r option. The format of the items are a tuple containing the full repository name (ex. apache-superset/examples-data) and the git tag/branch of the repository to use (ex. master).
In config.py the default entry will look like:
# Tuple format: Gitub repo full name, tag/branch
EXAMPLE_REPOS_TAGS = [
('rjurney/examples-data', 'v0.0.3')
]
GITHUB_AUTH_TOKEN Configuration Key
Github rate limits the contents API to 50 anonymous requests per hour. While this is unlikely to affect many users, the limit is by IP address which means users behind proxies or developers may sometimes encounter this. I have added the optional configuration key GITHUB_AUTH_TOKEN which provides a way to add a personal access token to requests from the examples sub-commands. This increases the API limit.
UUIDs via sqlalchemy.types.uuid.UUIDType
In order to export or import assets in a way that doesn’t result in integer primary key chaos, we require that each serialized asset have a unique identifier. The superset.models.helpers.ImportMixin class has been used to provide a uuid field to the following classes:
Dashboard
Datasource
Database
DruidCluster
DruidMetric
Slice
SqlMetric
SqlaTable
TableColumn
This required patching FlaskAppBuilder to support UUIDType as a String field type. This will be released with FlaskAppBuilder 2.1.4.
ImportMixin —> ImportExportMixin
I was confused by the role of ImportMixin in model class export, so accordingly I have renamed it to ImportExportMixin.
Command Line Interface
Example capabilities will be accessed via the command line (CLI) interface. The CLI will be changed, removing the load_examples command and replacing it with an examples subcommand with export, list, import and remove commands beneath it.
Top Level CLI Menu
superset —help
Usage: superset [OPTIONS] COMMAND [ARGS]...
This is a management script for the Superset application.
Options:
--version Show the flask version
--help Show this message and exit.
Commands:
db Perform database migrations.
examples Manages example chart/dashboards/datasets
...
Dashboard Exports Menu
The dashboard exports menu will be extended to add the —dashboard-titles/-t, —export-data/-x and —export-data-dir/-d options which facilitate example export.
superset export_dashboards --help
Usage: superset export_dashboards [OPTIONS]
Export dashboards to JSON
Options:
-f, --dashboard-file TEXT Specify the the file to export to
-p, --print_stdout Print JSON to stdout
-i, --dashboard-ids INTEGER Specify dashboard id to export
-t, --dashboard-titles TEXT Specify dashboard title to export
-x, --export-data Export the dashboard's data tables as CSV
files.
-d, --export-data-dir TEXT Specify export directory path. Defaults to
'/tmp'.
--help Show this message and exit.
Examples Top Level Menu
superset examples —-help
Usage: superset examples [OPTIONS] COMMAND [ARGS]...
Manages example dashboards/datasets
Options:
--help Show this message and exit.
Commands:
export Export example dashboard/datasets
list List example dashboards/datasets
import Import an example dashboard/dataset
remove Remove an example dashboard/dataset
Example Creation Menu
The examples creation command can be used to export a Dashboard JSON file along with its underlying data tables into a gzipped tarball file. These assets can then be uncompressed in the examples-data project, committed, pushed and then submitted by pull request.
superset examples export --help
Usage: superset examples export [OPTIONS]
Export example dashboard/datasets
Options:
-i, --dashboard-id INTEGER Specify dashboard id to export
-t, --dashboard-title TEXT Specify dashboard title to export
-d, --description TEXT Description of new example [required]
-e, --example-title TEXT Title for new example [required]
-f, --file-name TEXT Specify export file name. Defaults to
dashboard.tar.gz
-l, --license TEXT License of the example dashboard
--help Show this message and exit.
Examples List Menu
The examples list command will query the examples-data repository and return a list of available examples along with their metadata. Examples can then be loaded from this list.
superset examples list —-help
Usage: superset examples list [OPTIONS]
List example dashboards/datasets
Options:
-r, --examples-repo TEXT Full name of Github repository containing
examples, ex: 'apache-superset/examples-data'
-r, --examples-tag TEXT Tag or branch of Github repository containing
examples. Defaults to 'master'
--help Show this message and exit.
The output looks table uses prettytable and looks like this:
+-------------------------------+-------------------------------------------------------+-----------+-------+-------+------------------+-----------------------+--------+
| Title | Description | Size (MB) | Rows | Files | Created Date | Repository | Tag |
+-------------------------------+-------------------------------------------------------+-----------+-------+-------+------------------+-----------------------+--------+
| World Bank Health Information | World Bank Data example about world health populat... | 21.52 | 11770 | 1 | 2019-05-20T16:20 | rjurney/examples-data | v0.0.3 |
+-------------------------------+-------------------------------------------------------+-----------+-------+-------+------------------+-----------------------+--------+
Examples Import Menu
The examples import command will download example metadata and data files from the examples-data repository and will load them into the examples database configured via the SQLALCHEMY_EXAMPLES_URI configuration key or using the value supplied by the —database-uri/-d option.
superset examples load --help
Usage: superset examples load [OPTIONS]
Load an example dashboard/dataset
Options:
-d, --database-uri TEXT Database URI to load example to
-r, --examples-revision TEXT Revision of examples to list
-e, --example-title TEXT Title of example to load [required]
--help Show this message and exit.
Examples Remove Menu
The examples remove command will remove the installed example specified from the metadata tables as well as the examples database configured via the SQLALCHEMY_EXAMPLES_URI configuration key or using the value supplied by the —database-uri/-d option.
superset examples remove --help
Usage: superset examples remove [OPTIONS]
Remove an example dashboard/dataset
Options:
-e, --example-title TEXT Title of example to remove [required]
-d, --database-uri TEXT Database URI to load example to
--help Show this message and exit.
New Dependencies
Creating, removing, listing and loading examples can be handled without Git LFS but adding examples to the examples-data Github repository will require it. This is a developer only requirement of the examples-data project and not Superset itself.
Git LFS can be installed via:
# OS X
brew install git-lfs
# port install git-lfs
git lfs install
# Ubuntu
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
Migration Plan and Compatibility
Backwards compatibility will be maintained. Existing dashboard export JSON files will continue to work and all existing dashboard examples and their data files will be ported to the new system and stored in the Superset examples repository.
Cross-Repository Management
As both the examples and Superset evolve, some will work with newer versions of Superset than others. We must strive to keep all of them up to date, but should also try to make them backwards compatible. It will thus be inevitable that incubator-superset releases will have to point to a branch/tag of superset-examples.
A given release of superset must reference a certain version of the examples-data repository. This is achieved via the EXAMPLES_GIT_TAG configuration key. Alternatively, this could be a branch rather than a tag to facilitate the ongoing update of examples.
[SIP] Proposal for Improving Examples Interface, Organization and Storage
The goal of the changes in this proposal is to improve the examples capabilities of Superset so as to foster an ecosystem of examples which will sustain and grow as the platform continues to develop. First I will characterize the existing system of examples and then propose changes to improve the number and quality of examples.
Current Examples
Examples are currently programmatically defined in the
superset.datamodule. An abstract interface summarizing these examples looks like the following:While this mechanism jump started the collection of Superset examples, defining examples as code will not appeal to most Superset users of growing a community that contributes examples.
Current Dashboard Import/Export
Dashboard example creation could utilize the export feature by adding example oriented fields to the export. Dashboards can be exported via the Dashboard List interface at
/dashboard/list/via its Export action or at the command line viasuperset export_dashboards. Dashboard and chart export JSON includes everything needed to reproduce a dashboard save the actual data table: dashboard, chart and datasource information.The
datasources. __SqlaTable__.databaseelement will need to be removed when examples are created and recreated when they are loaded to matchSQLALCHEMY_EXAMPLES_URIor a—database-urithe user specifies. Each Slice’sdatasource_idanddatasource_namemust be changed.A Slice object has a
params_dictwhich contains the following. Note that this includes references to thedatasource_namewb_health_population.{'compare_lag': '10', 'compare_suffix': 'o10Y', 'country_fieldtype': 'cca3', 'entity': 'country_code', 'granularity_sqla': 'year', 'groupby': ['region', 'country_code'], 'limit': '25', 'markup_type': 'markdown', 'metric': 'sum__SP_POP_TOTL', 'metrics': ['sum__SP_POP_TOTL'], 'row_limit': 50000, 'secondary_metric': 'sum__SP_POP_TOTL', 'show_bubbles': True, 'since': '1960-01-01', 'time_range': '2014-01-01 : 2014-01-02', 'until': 'now', 'viz_type': 'treemap', 'where': '', 'remote_id': 708, 'datasource_name': 'wb_health_population', 'schema': 'wb_health_population', 'database_name': 'main'}]Example Components
A Superset example is a SQL oriented dashboard and is composed of the following:
All of the above with the exception of the Datasource.Database entry will need to be serialized, stored, contributed, approved, listed, deserialized and loaded by the example system. The Datasource.Database entry will need to be removed on exporting and replaced on importing of examples.
Scope of Improvement
This proposal improves the superset example process in three areas: example creation, data storage and discoverability.
In order to improve the range and quality of Superset examples we need to first improve the process for creating and loading examples. While examples can be created programmatically, the more natural process is to use Superset to create them. This requires that we automate the process to persist and restore the combined state of the Superset Dashboard, Database and related objects as well as the contents of the datasource itself.
We also need a directory to which examples can be uploaded and a corresponding user interface and process of governance over that repository. This directory should be independent of the Superset project release process and code repository. Current processes for management of Superset’s code assets would transition directly to the management of its examples: changes would be created by creating Github issues and pull requests, data assets would be versioned and managed in a central repository.
Finally we need a user interface for finding, listing and loading examples from the repository. It should be simple and can exist as an
examplescommand as part of the superset CLI which will haveexport,list,importandremovesub-commands.Example Repository Requirements
The requirement for example storage are that it have the following properties:
Git and Github are a desirable mechanism for publishing and approval but an undesirable mechanism for storage. Git LFS (Large File System) offers scalable storage while still using Github for project management. With a 2GB file limit and support on Github for 250 of these files, it scales well and is the proposed storage system. Other options are explored in the addendum.
Example File Format
Examples should be defined and packaged in a standard manner and each example should be self contained in its own file system directory. The existing
Dashboardexport format adequately describes a dashboard, it’s charts and the associated datasources but is missing human readable fields describing the contents of the dataset and dashboard as well as the physical location of the contents of the tables the datasource metadata describes. These fields will be added to the Superset dashboard export format.Data location information will be stored in a top level
fileskey next to the existingdashboards,slicesanddatasourceskeys. A top leveldescriptionfield will fill out the fields of a description of the dataset in the examples directory. The existingWorld’s Bank Dataexample is extended below:The file layout for this example appears as follows, with the dashboard slug used as the directory name in the exported tarball and examples directory:
Example Data Table Format
In order to manage tables, to create and drop them, it is helpful to assume that an
Integeridprimary column is present. This is the case for all current example dashboard tables. In the future we may want to support tables withuuidor other types of primary column.New or Changed Public Interfaces
Changes include the addition of a
SQLALCHEMY_EXAMPLES_URIandEXAMPLES_GIT_TAGconfiguration keys and changes to the model classes as well as the CLI.The
examples-dataRepositoryCurrently the example data is on Github at apache-superset/examples-data. This will continue to be the case, but this repository will now house both Dashboard metadata files as well as data files via Git LFS. Each example will have its own directory with its own
dashboard.jsonand data files.The README.md for this repository in new new form can be accessed here: GitHub - rjurney/examples-data at lfs.
SQLALCHEMY_EXAMPLES_URIConfiguration KeyA
SQLALCHEMY_EXAMPLES_URIconfiguration key insuperset/config.pycontrols the default location to load examples into. This defaults to~/.superset/examples.dband can be over-ridden on a per-import basis using the--database-uri/-doption.EXAMPLE_REPOS_TAGSConfiguration KeyA
EXAMPLE_REPOS_TAGSconfiguration key insuperset/config.pycontrols the locations of the examples from which to list and load. This can be set manually using the--examples-repo/-roption. The format of the items are a tuple containing the full repository name (ex.apache-superset/examples-data) and the git tag/branch of the repository to use (ex.master).In
config.pythe default entry will look like:GITHUB_AUTH_TOKENConfiguration KeyGithub rate limits the contents API to 50 anonymous requests per hour. While this is unlikely to affect many users, the limit is by IP address which means users behind proxies or developers may sometimes encounter this. I have added the optional configuration key
GITHUB_AUTH_TOKENwhich provides a way to add a personal access token to requests from the examples sub-commands. This increases the API limit.UUIDs via sqlalchemy.types.uuid.UUIDType
In order to export or import assets in a way that doesn’t result in integer primary key chaos, we require that each serialized asset have a unique identifier. The
superset.models.helpers.ImportMixinclass has been used to provide auuidfield to the following classes:DashboardDatasourceDatabaseDruidClusterDruidMetricSliceSqlMetricSqlaTableTableColumnThis required patching FlaskAppBuilder to support UUIDType as a String field type. This will be released with FlaskAppBuilder 2.1.4.
ImportMixin —> ImportExportMixin
I was confused by the role of
ImportMixinin model class export, so accordingly I have renamed it toImportExportMixin.Command Line Interface
Example capabilities will be accessed via the command line (CLI) interface. The CLI will be changed, removing the
load_examplescommand and replacing it with anexamplessubcommand withexport,list,importandremovecommands beneath it.Top Level CLI Menu
superset —helpUsage: superset [OPTIONS] COMMAND [ARGS]... This is a management script for the Superset application. Options: --version Show the flask version --help Show this message and exit. Commands: db Perform database migrations. examples Manages example chart/dashboards/datasets ...Dashboard Exports Menu
The dashboard exports menu will be extended to add the
—dashboard-titles/-t,—export-data/-xand—export-data-dir/-doptions which facilitate example export.superset export_dashboards --helpExamples Top Level Menu
superset examples —-helpUsage: superset examples [OPTIONS] COMMAND [ARGS]... Manages example dashboards/datasets Options: --help Show this message and exit. Commands: export Export example dashboard/datasets list List example dashboards/datasets import Import an example dashboard/dataset remove Remove an example dashboard/datasetExample Creation Menu
The examples creation command can be used to export a Dashboard JSON file along with its underlying data tables into a gzipped tarball file. These assets can then be uncompressed in the
examples-dataproject, committed, pushed and then submitted by pull request.superset examples export --helpExamples List Menu
The examples list command will query the
examples-datarepository and return a list of available examples along with their metadata. Examples can then be loaded from this list.superset examples list —-helpUsage: superset examples list [OPTIONS] List example dashboards/datasets Options: -r, --examples-repo TEXT Full name of Github repository containing examples, ex: 'apache-superset/examples-data' -r, --examples-tag TEXT Tag or branch of Github repository containing examples. Defaults to 'master' --help Show this message and exit.The output looks table uses
prettytableand looks like this:Examples Import Menu
The examples import command will download example metadata and data files from the
examples-datarepository and will load them into the examples database configured via theSQLALCHEMY_EXAMPLES_URIconfiguration key or using the value supplied by the—database-uri/-doption.superset examples load --helpExamples Remove Menu
The examples remove command will remove the installed example specified from the metadata tables as well as the examples database configured via the
SQLALCHEMY_EXAMPLES_URIconfiguration key or using the value supplied by the—database-uri/-doption.superset examples remove --helpNew Dependencies
Creating, removing, listing and loading examples can be handled without Git LFS but adding examples to the
examples-dataGithub repository will require it. This is a developer only requirement of theexamples-dataproject and not Superset itself.Git LFS can be installed via:
Migration Plan and Compatibility
Backwards compatibility will be maintained. Existing dashboard export JSON files will continue to work and all existing dashboard examples and their data files will be ported to the new system and stored in the Superset examples repository.
Cross-Repository Management
As both the examples and Superset evolve, some will work with newer versions of Superset than others. We must strive to keep all of them up to date, but should also try to make them backwards compatible. It will thus be inevitable that
incubator-supersetreleases will have to point to a branch/tag ofsuperset-examples.A given release of superset must reference a certain version of the
examples-datarepository. This is achieved via theEXAMPLES_GIT_TAGconfiguration key. Alternatively, this could be a branch rather than a tag to facilitate the ongoing update of examples.