Skip to content

No column indexing - cannot reference columns by position when names have special characters #12

@AliiiBenn

Description

@AliiiBenn

Description

xl does not support referencing columns by index/position (e.g., column 1, 2, 3). This is critical when column names contain special characters that break filter/group commands (see XL-001).

Current Behavior

# Column name "Qté VTE" has special character "é"
# Cannot filter with name:
$ xl filter "sales.xlsx" -s DONNEES "`Qté VTE` < 0"
/usr/bin/bash: line 1: Qté: command not found

# Cannot reference by index either:
$ xl filter "sales.xlsx" -s DONNEES -c 7 "< 0"
Error: Column "7" not found

# No workaround available in xl

Expected Behavior

Users should be able to reference columns by position:

# Reference column 7 by index
$ xl filter "sales.xlsx" -s DONNEES --col 7 "< 0"

# Or with -c shorthand
$ xl filter "sales.xlsx" -s DONNEES -c 7 "< 0"

# Or with @ notation
$ xl filter "sales.xlsx" -s DONNEES "@7 < 0"

Steps to Reproduce

  1. Create Excel file with column named "Qté VTE" (contains accent)
  2. Try to filter on column by name: Fails (XL-001)
  3. Try to filter on column by index: Fails (this issue)
  4. No way to filter on this column

Real-World Impact

Use Case: Filter on Numeric Column with Special Characters

Business Question: Find outlier transactions (negative quantities, bulk orders)

Column: Qté VTE (Quantity Sold in French)
Position: Column 7

Current Approach (Impossible):

# Cannot use name (special character bug)
xl filter sales.xlsx -s DONNEES "`Qté VTE` < 0"
# Error: bash interpretation error

# Cannot use index (not supported)
xl filter sales.xlsx -s DONNEES --col 7 "< 0"
# Error: Column "7" not found

# Must export to Python
python -c "import pandas as pd; df = pd.read_excel('sales.xlsx'); print(df[df.iloc[:, 6] < 0])"

Proposed Approach (with column indexing):

xl filter sales.xlsx -s DONNEES --col 7 "< 0"
# Works immediately

Use Case: Dynamic Column Reference

Business Question: Filter on last column without knowing name

Current: Must inspect file first, get column name
Proposed: xl filter file.xlsx --col -1 "< 100" (last column)

Use Case: Anonymous Data

Business Question: Analyze CSV/file without headers

Current: Cannot reference columns easily
Proposed: xl group file.csv --by @1 --aggregate "@3:sum"

Proposed Solution

Add column index/position reference support to all column-accepting commands.

Syntax Options

Option 1: --col INDEX (Preferred)

xl filter file.xlsx --col 7 "condition"

Option 2: -c INDEX (Shorthand)

xl filter file.xlsx -c 7 "condition"

Option 3: @INDEX notation

xl filter file.xlsx "@7 < 0"

Option 4: Negative indexing (Python style)

xl filter file.xlsx --col -1 "< 100"  # Last column
xl filter file.xlsx --col -2 "< 50"   # Second-to-last column

Recommendation: Support Options 1, 2, and 4 for maximum flexibility

Affected Commands

All commands that accept column names should also support column indices:

Command Column Flag Should Support Index?
xl filter --condition (implicit) ✅ Yes
xl group --by ✅ Yes
xl unique -c, --columns ✅ Yes
xl count -c, --columns ✅ Yes
xl sort --by ✅ Yes
xl select -c, --columns ✅ Yes
xl stats -c, --columns ✅ Yes

Acceptance Criteria

  • Can filter on column by positive index (1, 2, 3, ...)
  • Can filter on column by negative index (-1, -2, -3 for last, 2nd last, ...)
  • Can group by column index
  • Can sort by column index
  • Can get unique values by column index
  • Can get stats by column index
  • Works with Excel files
  • Works with CSV files
  • Documentation updated with examples
  • Error message shows: "Did you mean column index? Use --col N"

Test Cases

# Test 1: Filter by positive index
xl filter sales.xlsx -s DONNEES --col 7 "< 0"
# Expected: Rows where column 7 < 0

# Test 2: Filter by negative index (last column)
xl filter sales.xlsx --col -1 "> 100"
# Expected: Rows where last column > 100

# Test 3: Group by column index
xl group sales.xlsx --by 2 --aggregate 5:sum
# Expected: Group by column 2, sum column 5

# Test 4: Sort by column index
xl sort sales.xlsx --by 3 --order desc
# Expected: Sort by column 3 descending

# Test 5: Stats by column index
xl stats sales.xlsx -c 4
# Expected: Statistics for column 4

# Test 6: Unique by column index
xl unique sales.xlsx -c 2
# Expected: Unique values in column 2

# Test 7: Mixed names and indices
xl group sales.xlsx --by "Circuit,3" --aggregate 5:sum
# Expected: Group by Circuit name and column 3

# Test 8: Aggregation by index
xl group sales.xlsx --by 2 --aggregate "5:sum,6:mean"
# Expected: Group by col 2, sum col 5, mean col 6

Implementation Notes

Index Convention

Important: 1-based indexing (user-friendly) vs 0-based (Python-style)?

Recommendation: 1-based for CLI users (more intuitive):

  • Column 1 = First column
  • Column -1 = Last column
  • Matches Excel/SQL conventions

Alternative: Support both with clear documentation

Detection Logic

# Pseudo-code
def resolve_column_reference(col_ref, df):
    if isinstance(col_ref, int):
        # Column index
        return df.columns[col_ref - 1]  # Convert 1-based to 0-based
    elif col_ref.startswith('@'):
        # @ notation
        idx = int(col_ref[1:])
        return df.columns[idx - 1]
    elif col_ref.lstrip('-').isdigit():
        # Negative index
        return df.columns[int(col_ref)]
    else:
        # Column name (existing behavior)
        return col_ref

Error Messages

# When column name not found
$ xl filter sales.xlsx "NonExistent < 0"
Error: Column 'NonExistent' not found
Available columns: DateVente, jour, Circuit, Magasin, Catégorie, Libellé, Qté VTE
Hint: Use column index (e.g., --col 7) for special characters

# When index out of range
$ xl filter sales.xlsx --col 20 "< 0"
Error: Column index 20 out of range (file has 7 columns)

Related Issues

  • XL-001: Special characters in filter (primary use case for column indexing)
  • XL-011: Inconsistent option naming (this adds consistency)
  • XL-003: No sorting in group (could use index for sort column)

Workarounds

Current Workaround 1: Rename columns first

# If xl rename worked (future feature)
xl rename sales.xlsx --from "Qté VTE" --to "Qty"
xl filter sales.xlsx "Qty < 0"

Current Workaround 2: Export to Python/pandas

import pandas as pd
df = pd.read_excel('sales.xlsx')
# Can use iloc for position-based indexing
result = df[df.iloc[:, 6] < 0]  # Column 7 (0-based = 6)

Current Workaround 3: Use Excel

# Open in Excel
# Filter by column position (column G)
# Save filtered file

Benefits

  1. Special Character Workaround: Fixes XL-001 without renaming
  2. Anonymous Data: Analyze files without headers
  3. Dynamic Scripts: Reference columns without knowing names
  4. International Users: Work around character encoding issues
  5. Quick Analysis: No need to memorize column names
  6. Robustness: Works even when column names change

Industry Standards

pandas

df.iloc[:, 6]  # Column 7 (0-based indexing)

SQL

-- No standard column indexing, but some databases support:
SELECT * FROM table ORDER BY 3  # Order by 3rd column

Excel

Column G = 7th column (1-based)

Recommendation: Follow Excel convention (1-based indexing)

Advanced Features (Future)

Range Notation

xl select sales.xlsx --columns "1-5"  # Select columns 1 through 5

Multiple Indices

xl group sales.xlsx --by "1,2,3" --aggregate "5:sum"

Header Detection

xl filter sales.csv --no-header --col 3 "< 0"
# For CSV files without header row

Labels

feature-request high-priority workaround column-indexing special-characters usability

Metadata

Metadata

Assignees

No one assigned

    Labels

    feature-requestNew feature or functionality requesthigh-priorityHigh priority issuesi18nInternationalization and localization issuesusabilityUsability and user experienceworkaroundWorkaround solutions for critical bugs

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions