This issue is a placeholder for gathering thoughts and discussion on requirements for storing columnar tables (à la bcolz). The idea is to explore how the functionality currently available in zarr and bcolz might be brought together. This is not necessarily something that will happen in zarr; it is just an attempt to pull together thoughts and discussion at this stage.
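To make the "columnar" idea concrete, here is a minimal NumPy-only sketch. It is not a zarr or bcolz API, and `to_columns` and `from_columns` are hypothetical helpers. A NumPy record array stores each row's fields together, whereas a columnar layout like bcolz's ctable keeps one array per field. In zarr, each such array could presumably become its own array inside a group (an assumption about layout, not a decided design). The sketch splits a structured array into per-field arrays and reassembles it without losing values or dtypes.

```python
import numpy as np


def to_columns(records: np.ndarray) -> dict:
    """Split a structured (or record) array into one contiguous array per field."""
    return {name: np.ascontiguousarray(records[name]) for name in records.dtype.names}


def from_columns(columns: dict) -> np.ndarray:
    """Reassemble per-field arrays into a structured array (row-oriented layout)."""
    names = list(columns)
    # Field order follows the dict's insertion order, matching the original dtype.
    dtype = np.dtype([(name, columns[name].dtype) for name in names])
    out = np.empty(len(columns[names[0]]), dtype=dtype)
    for name in names:
        out[name] = columns[name]
    return out
```

The result of `from_columns` is a plain structured array; calling `.view(np.recarray)` on it gives attribute-style access again if a recarray is needed.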
Requirements
Below is a list of possible requirements. If you have any thoughts or comments on them, please add a comment below, including whether any requirement should be dropped (e.g., because it can already be achieved by using zarr together with dask).
- Store data from a pandas DataFrame without loss.
- Load data into a pandas DataFrame without loss.
- Store data from a NumPy recarray without loss.
- Load data into a NumPy recarray without loss.
- Play well with dask.dataframe. E.g., it should be possible to add a `from_zarr()` function to dask.dataframe, which would allow out-of-core dask computations against the zarr-stored data.
- Append a single row.
- Append multiple rows as a block.
- Iterate over rows.
- Iterate over rows matching a query.
- Evaluate an expression against columns (e.g., "a + b * c").
- Parallel query (not sure yet what this would mean).
- Parallel append (not sure yet what this would mean).
- Add a column.
- Delete a column.
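Several of the requirements above (appending a row or a block of rows, iterating over rows matching a query, evaluating an expression against columns, adding and deleting columns) can be sketched against a simple in-memory column store. The `ColumnTable` class below is hypothetical and NumPy-only. It ignores chunking, compression and persistence, and it evaluates expressions with Python's built-in `eval` over the column namespace, whereas a real implementation would more likely hand such strings to something like numexpr.

```python
import numpy as np


class ColumnTable:
    """Hypothetical in-memory columnar table: one NumPy array per column."""

    def __init__(self, columns):
        if len({len(v) for v in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = {name: np.asarray(values) for name, values in columns.items()}

    def __len__(self):
        # All columns share one length, so any column gives the row count.
        return len(next(iter(self.columns.values()))) if self.columns else 0

    def append_block(self, block):
        """Append several rows given as {column name: sequence of values}."""
        if set(block) != set(self.columns):
            raise KeyError("block must provide exactly the existing columns")
        if len({len(v) for v in block.values()}) > 1:
            raise ValueError("all block columns must have the same length")
        for name in list(self.columns):
            current = self.columns[name]
            new = np.asarray(block[name], dtype=current.dtype)
            # np.concatenate copies the whole column on every append.
            self.columns[name] = np.concatenate([current, new])

    def append_row(self, row):
        """Append a single row given as {column name: scalar}."""
        self.append_block({name: [value] for name, value in row.items()})

    def iter_rows(self, where=None):
        """Yield rows as dicts; `where` is an optional boolean mask selecting rows."""
        indices = range(len(self)) if where is None else np.flatnonzero(where)
        for i in indices:
            yield {name: col[i].item() for name, col in self.columns.items()}

    def eval(self, expression):
        """Evaluate e.g. "a + b * c" with each column name bound to its array.

        Sketch only: Python's eval must never be used on untrusted input.
        """
        return eval(expression, {"__builtins__": {}}, dict(self.columns))

    def add_column(self, name, values):
        values = np.asarray(values)
        if self.columns and len(values) != len(self):
            raise ValueError("new column must match the table length")
        self.columns[name] = values

    def delete_column(self, name):
        del self.columns[name]
```

A query then combines the two pieces, e.g. `table.iter_rows(where=table.eval("a > 4"))`. Appending via `np.concatenate` copies each column every time; a chunked store such as zarr or bcolz can instead write only the chunks affected by the new rows, so the sketch shows the shape of the interface rather than an efficient implementation.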