
guide: Data Management #2856

@iesahin


UPDATE: #2856 (comment)


This is the plan for a data management trail that covers the following topics:


Adding data to DVC projects

  • Initialize a DVC repository and use `dvc add` to add files.

  • We'll assume the MNIST data already exists in a folder and add that folder.
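
A minimal sketch of this step, assuming Git and DVC are installed and the MNIST files already sit in a folder such as `data/mnist`. The path and the function name `track_dataset` are illustrative and not part of the plan.

```shell
# Hypothetical helper: initialize Git and DVC in the current directory
# and start tracking an existing data folder (e.g. data/mnist).
track_dataset() {
  dataset_dir="$1"                  # assumed path, e.g. data/mnist
  git init &&
  dvc init &&                       # creates .dvc/ with DVC's internal files
  dvc add "$dataset_dir" &&         # writes "$dataset_dir.dvc" and a .gitignore entry next to it
  git add "$dataset_dir.dvc" "$(dirname "$dataset_dir")/.gitignore" &&
  git commit -m "Track $dataset_dir with DVC"
}
```

Calling `track_dataset data/mnist` from the project root would leave the data itself out of Git and commit only the small `data/mnist.dvc` pointer file.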

Versioning data in DVC projects

  • Overwrite the MNIST data with Fashion-MNIST and record the change as a new version of the dataset.
  • Move back and forth in the Git history to get different datasets in the same folder.
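
The two bullets above could look roughly like this, assuming the folder is already tracked through a `.dvc` file committed to Git. The function names and the `data/mnist` path in the usage note are hypothetical.

```shell
# Hypothetical helper: record the current contents of a tracked folder
# as a new dataset version.
commit_dataset_version() {
  dataset_dir="$1"; message="$2"
  dvc add "$dataset_dir" &&         # re-hash the folder and update "$dataset_dir.dvc"
  git add "$dataset_dir.dvc" &&
  git commit -m "$message"
}

# Hypothetical helper: restore the folder as it was at a given Git revision.
restore_dataset_version() {
  dataset_dir="$1"; rev="$2"
  git checkout "$rev" -- "$dataset_dir.dvc" &&  # bring back only the old .dvc file
  dvc checkout "$dataset_dir.dvc"               # sync the folder from the cache
}
```

For example, after copying Fashion-MNIST over the folder, `commit_dataset_version data/mnist "Switch to Fashion-MNIST"` records the new version, and `restore_dataset_version data/mnist HEAD~1` brings the previous dataset back into the same folder.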

Creating remotes

  • Add a Google Drive folder as a remote.

  • Make it the default remote.
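
As a sketch, both bullets can be done in one command with the `--default` flag of `dvc remote add`. This assumes DVC was installed with Google Drive support (for example `pip install "dvc[gdrive]"`); the remote name and folder ID are placeholders to be replaced with real values.

```shell
# Hypothetical helper: register a Google Drive folder as the default DVC remote.
add_default_gdrive_remote() {
  remote_name="$1"; folder_id="$2"  # folder_id: the ID at the end of the Drive folder URL
  dvc remote add --default "$remote_name" "gdrive://$folder_id" &&
  git add .dvc/config &&            # remote settings are stored in .dvc/config
  git commit -m "Add $remote_name as default DVC remote"
}
```

A call such as `add_default_gdrive_remote storage <folder-id>` keeps the remote definition under version control, so clones of the repository know where the data lives.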

Pushing to/pulling from remotes

  • Push the cache to the remote we created
  • Clone the repository somewhere else (e.g. over SSH or into a local folder)
  • Pull the cache
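
A rough sketch of the push, clone, and pull round trip for the local-folder case, assuming a default remote is already configured and committed. The function name is hypothetical, and both arguments are expected to be absolute paths.

```shell
# Hypothetical helper: push data from an existing repository, clone the
# repository into another local folder, and pull the data there.
# Runs in a subshell so the caller's working directory is unchanged.
push_then_clone() (
  source_repo="$1"; clone_dir="$2"  # both assumed to be absolute paths
  cd "$source_repo" &&
  dvc push &&                       # upload cached data to the default remote
  git clone "$source_repo" "$clone_dir" &&
  cd "$clone_dir" &&
  dvc pull                          # download the data referenced by the .dvc files
)
```

The clone only contains the `.dvc` pointer files until `dvc pull` runs; for the SSH case, the `git clone` source would be an SSH URL instead of a local path.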

Accessing public datasets and registries

  • Get the Fashion-MNIST data from the dataset-registry repository
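
One way this could be sketched is with `dvc get`, which downloads data from a DVC repository without tracking it in the current project. The function name is hypothetical, and the dataset path in the usage note is an assumption that should be checked first with `dvc list`.

```shell
# Hypothetical helper: download a dataset from a DVC data registry.
fetch_from_registry() {
  registry_url="$1"; dataset_path="$2"; out_dir="$3"
  dvc get "$registry_url" "$dataset_path" -o "$out_dir"
}
```

For example: `fetch_from_registry https://github.com/iterative/dataset-registry fashion-mnist/raw data/fashion-mnist`, where `fashion-mnist/raw` is the assumed location inside the registry. If the project should also remember where the data came from, `dvc import` takes similar arguments and creates a `.dvc` file that records the source.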

Removing data from DVC projects

  • Remove certain folders from the workspace
  • Delete the corresponding cache files
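
A possible sketch of both bullets, assuming the folder is tracked by a `.dvc` file next to it. The function name is hypothetical. Note that `dvc gc --workspace` keeps only the cache entries referenced by the current workspace, so data for older dataset versions in the Git history is removed from the local cache as well.

```shell
# Hypothetical helper: stop tracking a folder, delete it from the workspace,
# and drop cache files that the workspace no longer references.
remove_dataset() {
  dataset_dir="$1"
  [ -n "$dataset_dir" ] || return 1     # refuse to run without a target
  dvc remove "$dataset_dir.dvc" &&      # deletes the .dvc file and its .gitignore entry
  rm -rf "$dataset_dir" &&              # delete the data itself from the workspace
  git commit -a -m "Stop tracking $dataset_dir" &&
  dvc gc --workspace --force            # delete unreferenced cache files without prompting
}
```

Running `remove_dataset data/mnist` from the project root would apply this to the illustrative MNIST folder; data already pushed to a remote is not touched by this sketch.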

UPDATE: start with a reorg, see #2856 (comment) below (may be enough).

Labels: A: docs (Area: user documentation, gatsby-theme-iterative); C: guide (Content of /doc/user-guide); ✨ epic (Placeholder ticket for multi-sprint direction, use story, improvement)