Tables/DataFrames - requirements #149

@alimanfoo

Description

This issue is a placeholder for gathering thoughts and discussion on requirements for the storage of columnar tables (à la bcolz). The idea is to explore how the functionality currently available in zarr and bcolz might be brought together. This is not necessarily something that will happen in zarr itself; at this stage it is just an attempt to pull thoughts and discussion together in one place.

Requirements

Below is a list of possible requirements. If you have any thoughts or comments, please add a comment below, including whether any of these should be treated as non-requirements (e.g., because they can already be achieved by using zarr together with dask).

  • Store data from a pandas DataFrame without loss.
  • Load data into a pandas DataFrame without loss.
  • Store data from a NumPy recarray without loss.
  • Load data into a NumPy recarray without loss.
  • Play well with dask.dataframe. E.g., it should be possible to write a from_zarr() function for dask.dataframe, which would allow out-of-core dask computations against zarr-stored data (see the sketch after this list).
  • Append a single row.
  • Append multiple rows as a block.
  • Iterate over rows.
  • Iterate over rows matching a query.
  • Evaluate an expression against columns (e.g., "a + b * c"); a possible blockwise approach is sketched below.
  • Parallel query (not sure what this means)?
  • Parallel append (not sure what this means)?
  • Add a column.
  • Delete a column.
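
As a concrete starting point, here is a minimal sketch of one possible layout: one zarr array per column inside a group, with the column order recorded in the group attributes. The helper names (dataframe_to_zarr, dataframe_from_zarr, dataframe_from_zarr_dask) are hypothetical, not part of the zarr API, and the sketch ignores the DataFrame index and assumes numeric column dtypes.

```python
import numpy as np
import pandas as pd
import zarr


def dataframe_to_zarr(df, store, chunk_rows=100_000):
    """Store each DataFrame column as a separate zarr array in a group."""
    group = zarr.open_group(store, mode='w')
    group.attrs['columns'] = [str(c) for c in df.columns]
    for name in df.columns:
        group.create_dataset(str(name), data=df[name].to_numpy(),
                             chunks=(chunk_rows,))
    return group


def dataframe_from_zarr(store):
    """Load the per-column arrays back into a pandas DataFrame."""
    group = zarr.open_group(store, mode='r')
    return pd.DataFrame({name: group[name][:]
                         for name in group.attrs['columns']})


def dataframe_from_zarr_dask(store, blocksize=100_000):
    """One possible from_zarr(): a dask DataFrame with one partition per block of rows."""
    import dask
    import dask.dataframe as dd

    group = zarr.open_group(store, mode='r')
    columns = group.attrs['columns']
    n_rows = group[columns[0]].shape[0]

    @dask.delayed
    def load_block(start, stop):
        g = zarr.open_group(store, mode='r')
        return pd.DataFrame({name: g[name][start:stop] for name in columns})

    parts = [load_block(i, min(i + blocksize, n_rows))
             for i in range(0, n_rows, blocksize)]
    meta = pd.DataFrame({name: pd.Series(dtype=group[name].dtype)
                         for name in columns})
    return dd.from_delayed(parts, meta=meta)


# Round-trip check: store and load without loss (for numeric columns).
df = pd.DataFrame({'a': np.arange(5), 'b': np.linspace(0.0, 1.0, 5)})
dataframe_to_zarr(df, 'example_table.zarr')
assert dataframe_from_zarr('example_table.zarr').equals(df)
```

With this layout, appending a block of rows would largely reduce to calling append() on each column array, which zarr arrays already support.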

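For the expression-evaluation requirement, one possible blockwise approach (in the spirit of bcolz, which uses numexpr under the hood) is sketched below, reusing the per-column layout from the previous example. The helper name eval_expression is hypothetical.

```python
import numexpr
import numpy as np
import zarr


def eval_expression(group, expr, blocksize=100_000):
    """Evaluate an expression such as 'a + b * c' block by block over column arrays."""
    columns = group.attrs['columns']
    n_rows = group[columns[0]].shape[0]
    blocks = []
    for start in range(0, n_rows, blocksize):
        stop = min(start + blocksize, n_rows)
        # Pull one block of each column into memory and hand it to numexpr.
        local = {name: group[name][start:stop] for name in columns}
        blocks.append(numexpr.evaluate(expr, local_dict=local))
    return np.concatenate(blocks)
```

Iterating over rows matching a query could build on the same idea, evaluating a boolean expression block by block and yielding only the rows where it is true.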