
Document workflow for contributing new data #77

@ha0ye


(Some of this content will likely end up in package docs, but I'm starting the notes here.)

Current Methodology

Currently, there are two general ways that MATSS loads datasets:

  1. It bundles raw CSVs (located in inst/extdata) and provides functions to load them directly (a minimal sketch follows this list).
  2. It provides functions that download the data to disk, then load it from disk and process it into the correct format. (This is used by both the Portal data function and various retriever datasets.)
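For option 1, loading a bundled file typically looks like this (a minimal sketch; "some_dataset.csv" is a hypothetical file name):

```r
# Locate a CSV bundled in the package's inst/extdata folder
# ("some_dataset.csv" is a hypothetical example file)
path <- system.file("extdata", "some_dataset.csv",
                    package = "MATSS", mustWork = TRUE)
dat <- read.csv(path)
```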

(Alternatively, the cache could hold all of the raw datasets, by downloading to a temporary location and importing immediately. This is more limiting on usage, since users would then need some form of caching, via drake or their own, to avoid re-downloading every time the code runs; a sketch of this follows.)
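A minimal sketch of that drake-based alternative, where `download_and_import()` and `process()` stand in for hypothetical helpers:

```r
library(drake)

# Each target is computed once and cached in .drake/;
# reruns of make() skip targets whose inputs haven't changed.
plan <- drake_plan(
  raw_data = download_and_import("https://example.org/dataset.csv"),
  results  = process(raw_data)
)

make(plan)
```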

Issue

With a growing number of datasets, we should address the question of a standardized download location for approach 2 above.

Proposal

Use an R environment variable to store the parent folder for downloaded datasets:

  • provide functions for users to set or modify it, creating the folder if it doesn't exist
  • add a function to retrieve it using Sys.getenv()
  • use that function to supply the default location for downloading and importing data, so users only need to set the path once, but can still override it on a case-by-case basis (a sketch of these helpers follows this list)
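A minimal sketch of what these helpers could look like (the variable name `MATSS_DATA_PATH` and the function names are assumptions, not existing API):

```r
# Set the parent folder for downloaded datasets, creating it if needed.
# Sys.setenv() only affects the current session; for persistence the
# variable could instead be written to the user's .Renviron.
set_data_path <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  Sys.setenv(MATSS_DATA_PATH = normalizePath(path))
  invisible(path)
}

# Retrieve the stored path, falling back to a temporary directory
get_data_path <- function(fallback = tempdir()) {
  path <- Sys.getenv("MATSS_DATA_PATH", unset = "")
  if (identical(path, "")) fallback else path
}

# Downstream functions then default to the stored path but still
# accept a per-call override:
download_dataset <- function(name, path = get_data_path()) {
  dest <- file.path(path, name)
  # ... download `name` to `dest` and import it ...
  dest
}
```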

We can build this into portalr and then import those functions for our use here.
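If the helpers live in portalr, MATSS could re-export or import them rather than duplicating them; a sketch assuming portalr exports functions under these names (the exact names are an assumption here):

```r
# Re-use portalr's path helpers instead of reimplementing them
# (exported names are assumed for illustration)
#' @importFrom portalr use_default_data_path get_default_data_path
NULL
```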
