Skip to content

wareflowx/excel-toolkit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

159 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Excel CLI Toolkit

Command-line toolkit for Excel data manipulation and analysis.

Python 3.14+ License: MIT

Overview

Excel CLI Toolkit (xl) is a powerful command-line interface for performing data wrangling and analysis operations on Excel files. Designed for both human users and AI systems, it provides fast, predictable operations without requiring scripts or programming.

Features

  • Filter & Search: Filter rows by conditions, search for values
  • Sort & Group: Sort by columns, group and aggregate data
  • Transform: Apply transformations, clean data, deduplicate
  • Multi-format: Support for XLSX, CSV, JSON, and Parquet
  • Functional Programming: Built with Result/Maybe types for robust error handling
  • AI-Friendly: Simple, composable commands perfect for AI automation
  • Fast: Efficient processing of large files

Installation

Using pip

pip install excel-toolkit-cwd

Using uv (recommended)

uv pip install excel-toolkit-cwd

Development installation

git clone https://github.com/wareflowx/excel-toolkit.git
cd excel-toolkit
uv pip install -e ".[dev]"

With Parquet support

pip install "excel-toolkit-cwd[parquet]"

Quick Start

Basic filtering

# Filter rows where Amount > 1000
xl filter sales.xlsx --where "Amount > 1000" --output filtered.xlsx

# Filter by multiple conditions
xl filter data.xlsx --where "Region == 'North' and Price > 100" -o result.xlsx

Sorting and aggregation

# Sort by column
xl sort data.xlsx --by "Date" --descending --output sorted.xlsx

# Group and aggregate
xl group sales.xlsx --by "Region" --aggregate "Amount:sum" --output grouped.xlsx

Data cleaning

# Remove duplicates
xl dedupe data.xlsx --by "Email" --output unique.xlsx

# Clean whitespace and standardize
xl clean data.xlsx --trim --lowercase --columns "Name,Email" --output clean.xlsx

File conversion

# Convert Excel to CSV
xl convert data.xlsx --output data.csv

# Convert CSV to Excel
xl convert data.csv --output data.xlsx

Pipeline operations

# Chain operations
xl filter sales.xlsx --where "Amount > 1000" | \
  xl sort --by "Date" --descending | \
  xl group --by "Region" --aggregate "Amount:sum" \
  --output final.xlsx

Documentation

For detailed documentation, see the docs/ directory:

Development

Setup development environment

# Clone repository
git clone https://github.com/wareflowx/excel-toolkit.git
cd excel-toolkit

# Install dependencies
uv sync --all-extras

# Install pre-commit hooks
pre-commit install

Running tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=excel_toolkit

# Run specific test file
uv run pytest tests/unit/test_filtering.py

Code quality

# Linting
uv run ruff check .

# Formatting
uv run ruff format .

# Type checking
uv run mypy excel_toolkit/

Command Reference

Core Commands

  • xl filter - Filter rows based on conditions
  • xl select - Select specific columns
  • xl sort - Sort by column(s)
  • xl group - Group and aggregate data
  • xl join - Join multiple files
  • xl clean - Clean data (trim, case, etc.)
  • xl dedupe - Remove duplicates
  • xl transform - Apply transformations
  • xl convert - Convert between formats
  • xl merge - Merge multiple files
  • xl stats - Calculate statistics
  • xl info - Display file information
  • xl validate - Validate data

Getting help

# General help
xl --help

# Command-specific help
xl filter --help

Examples

Data Analysis Workflow

# Extract sales data for Q1, filter high-value orders, group by region
xl filter sales.xlsx --where "Date >= '2024-01-01' and Date <= '2024-03-31'" | \
  xl filter --where "Amount > 1000" | \
  xl group --by "Region" --aggregate "Amount:sum,Orders:count" \
  --output q1_high_value_by_region.xlsx

Data Cleaning Pipeline

# Clean messy CSV file
xl clean messy_data.csv \
  --trim \
  --lowercase \
  --columns "email,name" \
  --output cleaned.csv

# Remove duplicates and validate
xl dedupe cleaned.csv --by "email" --output unique.csv
xl validate unique.csv --columns "email:email,age:int:0-120" --output final.csv

Architecture

Built with a functional programming approach:

  • Result types: Explicit error handling without exceptions
  • Maybe types: Safe handling of optional values
  • Immutable configuration: Predictable behavior
  • Composable operations: Chain commands efficiently

For more details, see FUNCTIONAL_ANALYSIS.md.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built with:

About

A powerful command-line toolkit for Excel and CSV data manipulation, analysis, and transformation.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages