-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Description
xl does not support referencing columns by index/position (e.g., column 1, 2, 3). This is critical when column names contain special characters that break filter/group commands (see XL-001).
Current Behavior
# Column name "Qté VTE" has special character "é"
# Cannot filter with name:
$ xl filter "sales.xlsx" -s DONNEES "`Qté VTE` < 0"
/usr/bin/bash: line 1: Qté: command not found
# Cannot reference by index either:
$ xl filter "sales.xlsx" -s DONNEES -c 7 "< 0"
Error: Column "7" not found
# No workaround available in xlExpected Behavior
Users should be able to reference columns by position:
# Reference column 7 by index
$ xl filter "sales.xlsx" -s DONNEES --col 7 "< 0"
# Or with -c shorthand
$ xl filter "sales.xlsx" -s DONNEES -c 7 "< 0"
# Or with @ notation
$ xl filter "sales.xlsx" -s DONNEES "@7 < 0"Steps to Reproduce
- Create Excel file with column named "Qté VTE" (contains accent)
- Try to filter on column by name: Fails (XL-001)
- Try to filter on column by index: Fails (this issue)
- No way to filter on this column
Real-World Impact
Use Case: Filter on Numeric Column with Special Characters
Business Question: Find outlier transactions (negative quantities, bulk orders)
Column: Qté VTE (Quantity Sold in French)
Position: Column 7
Current Approach (Impossible):
# Cannot use name (special character bug)
xl filter sales.xlsx -s DONNEES "`Qté VTE` < 0"
# Error: bash interpretation error
# Cannot use index (not supported)
xl filter sales.xlsx -s DONNEES --col 7 "< 0"
# Error: Column "7" not found
# Must export to Python
python -c "import pandas as pd; df = pd.read_excel('sales.xlsx'); print(df[df.iloc[:, 6] < 0])"Proposed Approach (with column indexing):
xl filter sales.xlsx -s DONNEES --col 7 "< 0"
# Works immediatelyUse Case: Dynamic Column Reference
Business Question: Filter on last column without knowing name
Current: Must inspect file first, get column name
Proposed: xl filter file.xlsx --col -1 "< 100" (last column)
Use Case: Anonymous Data
Business Question: Analyze CSV/file without headers
Current: Cannot reference columns easily
Proposed: xl group file.csv --by @1 --aggregate "@3:sum"
Proposed Solution
Add column index/position reference support to all column-accepting commands.
Syntax Options
Option 1: --col INDEX (Preferred)
xl filter file.xlsx --col 7 "condition"Option 2: -c INDEX (Shorthand)
xl filter file.xlsx -c 7 "condition"Option 3: @INDEX notation
xl filter file.xlsx "@7 < 0"Option 4: Negative indexing (Python style)
xl filter file.xlsx --col -1 "< 100" # Last column
xl filter file.xlsx --col -2 "< 50" # Second-to-last columnRecommendation: Support Options 1, 2, and 4 for maximum flexibility
Affected Commands
All commands that accept column names should also support column indices:
| Command | Column Flag | Should Support Index? |
|---|---|---|
xl filter |
--condition (implicit) |
✅ Yes |
xl group |
--by |
✅ Yes |
xl unique |
-c, --columns |
✅ Yes |
xl count |
-c, --columns |
✅ Yes |
xl sort |
--by |
✅ Yes |
xl select |
-c, --columns |
✅ Yes |
xl stats |
-c, --columns |
✅ Yes |
Acceptance Criteria
- Can filter on column by positive index (1, 2, 3, ...)
- Can filter on column by negative index (-1, -2, -3 for last, 2nd last, ...)
- Can group by column index
- Can sort by column index
- Can get unique values by column index
- Can get stats by column index
- Works with Excel files
- Works with CSV files
- Documentation updated with examples
- Error message shows: "Did you mean column index? Use --col N"
Test Cases
# Test 1: Filter by positive index
xl filter sales.xlsx -s DONNEES --col 7 "< 0"
# Expected: Rows where column 7 < 0
# Test 2: Filter by negative index (last column)
xl filter sales.xlsx --col -1 "> 100"
# Expected: Rows where last column > 100
# Test 3: Group by column index
xl group sales.xlsx --by 2 --aggregate 5:sum
# Expected: Group by column 2, sum column 5
# Test 4: Sort by column index
xl sort sales.xlsx --by 3 --order desc
# Expected: Sort by column 3 descending
# Test 5: Stats by column index
xl stats sales.xlsx -c 4
# Expected: Statistics for column 4
# Test 6: Unique by column index
xl unique sales.xlsx -c 2
# Expected: Unique values in column 2
# Test 7: Mixed names and indices
xl group sales.xlsx --by "Circuit,3" --aggregate 5:sum
# Expected: Group by Circuit name and column 3
# Test 8: Aggregation by index
xl group sales.xlsx --by 2 --aggregate "5:sum,6:mean"
# Expected: Group by col 2, sum col 5, mean col 6Implementation Notes
Index Convention
Important: 1-based indexing (user-friendly) vs 0-based (Python-style)?
Recommendation: 1-based for CLI users (more intuitive):
- Column 1 = First column
- Column -1 = Last column
- Matches Excel/SQL conventions
Alternative: Support both with clear documentation
Detection Logic
# Pseudo-code
def resolve_column_reference(col_ref, df):
if isinstance(col_ref, int):
# Column index
return df.columns[col_ref - 1] # Convert 1-based to 0-based
elif col_ref.startswith('@'):
# @ notation
idx = int(col_ref[1:])
return df.columns[idx - 1]
elif col_ref.lstrip('-').isdigit():
# Negative index
return df.columns[int(col_ref)]
else:
# Column name (existing behavior)
return col_refError Messages
# When column name not found
$ xl filter sales.xlsx "NonExistent < 0"
Error: Column 'NonExistent' not found
Available columns: DateVente, jour, Circuit, Magasin, Catégorie, Libellé, Qté VTE
Hint: Use column index (e.g., --col 7) for special characters
# When index out of range
$ xl filter sales.xlsx --col 20 "< 0"
Error: Column index 20 out of range (file has 7 columns)Related Issues
- XL-001: Special characters in filter (primary use case for column indexing)
- XL-011: Inconsistent option naming (this adds consistency)
- XL-003: No sorting in group (could use index for sort column)
Workarounds
Current Workaround 1: Rename columns first
# If xl rename worked (future feature)
xl rename sales.xlsx --from "Qté VTE" --to "Qty"
xl filter sales.xlsx "Qty < 0"Current Workaround 2: Export to Python/pandas
import pandas as pd
df = pd.read_excel('sales.xlsx')
# Can use iloc for position-based indexing
result = df[df.iloc[:, 6] < 0] # Column 7 (0-based = 6)Current Workaround 3: Use Excel
# Open in Excel
# Filter by column position (column G)
# Save filtered fileBenefits
- Special Character Workaround: Fixes XL-001 without renaming
- Anonymous Data: Analyze files without headers
- Dynamic Scripts: Reference columns without knowing names
- International Users: Work around character encoding issues
- Quick Analysis: No need to memorize column names
- Robustness: Works even when column names change
Industry Standards
pandas
df.iloc[:, 6] # Column 7 (0-based indexing)SQL
-- No standard column indexing, but some databases support:
SELECT * FROM table ORDER BY 3 # Order by 3rd columnExcel
Column G = 7th column (1-based)
Recommendation: Follow Excel convention (1-based indexing)
Advanced Features (Future)
Range Notation
xl select sales.xlsx --columns "1-5" # Select columns 1 through 5Multiple Indices
xl group sales.xlsx --by "1,2,3" --aggregate "5:sum"Header Detection
xl filter sales.csv --no-header --col 3 "< 0"
# For CSV files without header rowLabels
feature-request high-priority workaround column-indexing special-characters usability