33 changes: 22 additions & 11 deletions README.md
@@ -51,35 +51,46 @@ import dataworkbench

To use it on your local machine, you need to set a few variables to connect to the Veracity Data Workbench API.
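For local development those connection settings are typically supplied as environment variables. The names below are placeholders only, not the library's actual variable names; check the Veracity Data Workbench documentation for the real ones:

```shell
# Placeholder names only - substitute the variables the Data Workbench docs specify
export DW_API_BASE_URL="https://example.invalid/api"
export DW_API_TOKEN="your-token-here"
```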

## Examples

### Saving a Spark DataFrame to the Data Catalogue

#### Letting the schema be inferred
```python
from dataworkbench import DataCatalogue

df = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["letter", "number"])

datacatalogue = DataCatalogue()
datacatalogue.save(
    df,
    "Dataset Name",
    "Description",
    tags={"environment": ["test"]},
)  # schema_id is optional; when omitted, the schema is inferred from the DataFrame
```
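The comment above says the schema can be inferred from the DataFrame. Conceptually, inference just maps each column to a type; the toy, pure-Python sketch below illustrates the idea only and is not the dataworkbench library's actual logic (`infer_schema` is our own hypothetical helper):

```python
def infer_schema(rows, columns):
    """Toy illustration: map each column name to the Python type of its first value."""
    first_row = rows[0]
    return {col: type(value).__name__ for col, value in zip(columns, first_row)}

rows = [("a", 1), ("b", 2), ("c", 3)]
print(infer_schema(rows, ["letter", "number"]))  # {'letter': 'str', 'number': 'int'}
```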


#### Using an existing schema
When you have an existing schema that you want to reuse, pass its ID via `schema_id`:
```python
from dataworkbench import DataCatalogue

df = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["letter", "number"])

datacatalogue = DataCatalogue()
datacatalogue.save(
    df,
    "Dataset Name",
    "Description",
    tags={"environment": ["test"]},
    schema_id="abada0f7-acb4-43cf-8f54-b51abd7ba8b1",  # reuse an existing schema ID
)
```
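The `schema_id` in the example is a UUID string. If you want to fail fast on a malformed ID before calling `save`, a small standalone check can help; `is_valid_schema_id` is our own addition, not part of the dataworkbench API:

```python
import uuid

def is_valid_schema_id(value: str) -> bool:
    """Return True when the string parses as a UUID, the format used for schema IDs above."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False

print(is_valid_schema_id("abada0f7-acb4-43cf-8f54-b51abd7ba8b1"))  # True
print(is_valid_schema_id("not-a-schema-id"))  # False
```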

## API Reference

### DataCatalogue

- `save(df, name, description, schema_id=None, tags=None)`: Save a Spark DataFrame to the Data Workbench Data Catalogue


## License
2 changes: 1 addition & 1 deletion src/dataworkbench/utils.py
@@ -23,7 +23,7 @@ def is_databricks():
return os.getenv("DATABRICKS_RUNTIME_VERSION") is not None


def get_secret(key: str, scope: str = "dwsecrets") -> str:
"""
Retrieve a secret from dbutils if running on Databricks, otherwise fallback to env variables.
"""
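The body of `get_secret` is collapsed in the diff above. Based on its docstring, a plausible sketch of the behaviour (not the library's actual implementation) looks like this; note that `dbutils` is only available on a Databricks cluster:

```python
import os

def is_databricks() -> bool:
    # Databricks runtimes set this environment variable on every cluster
    return os.getenv("DATABRICKS_RUNTIME_VERSION") is not None

def get_secret(key: str, scope: str = "dwsecrets") -> str:
    """Sketch: read from a Databricks secret scope when available, else from env vars."""
    if is_databricks():
        # dbutils is injected into Databricks notebooks/jobs; it is not importable locally
        return dbutils.secrets.get(scope=scope, key=key)  # noqa: F821
    return os.environ[key]
```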