Skip to content

No date/time extraction functions - cannot perform time series analysis #7

@AliiiBenn

Description

@AliiiBenn

Description

The xl transform command does not provide date/time extraction functions. Users cannot extract year, month, quarter, day of week from datetime columns, making time series analysis (year-over-year comparison, seasonal analysis, etc.) impossible.

Current Behavior

# Attempt to extract year from date
$ xl transform "sales.xlsx" -c "DateVente" -o "Year" -f "extract_year"

Usage: xl transform [OPTIONS] FILE_PATH
Try 'xl transform --help' for help
+- Error ---------------------------------------------------------------------+
| Invalid transformation type: 'extract_year'                                 |

No date extraction functions are available.

Expected Behavior

Users should be able to extract date components:

# Extract year
xl transform "sales.xlsx" -c "DateVente" -o "Year" -f "year"

# Extract month
xl transform "sales.xlsx" -c "DateVente" -o "Month" -f "month"

# Extract quarter
xl transform "sales.xlsx" -c "DateVente" -o "Quarter" -f "quarter"

# Extract day of week
xl transform "sales.xlsx" -c "DateVente" -o "Weekday" -f "weekday"

Steps to Reproduce

  1. Create Excel file with datetime column (e.g., "2023-04-13 00:00:00")
  2. Try to extract year: xl transform file.xlsx -c "Date" -o "Year" -f "year"
  3. Observe error: "Invalid transformation type"

Real-World Impact

Use Case: Year-Over-Year Sales Analysis

Business Question: How did each circuit perform in 2023 vs 2024 vs 2025?

Current Approach (Manual):

# Option 1: Manual extraction from pre-aggregated TCD sheet
# Time: 30 minutes of manual data entry
# Error-prone, not scalable

# Option 2: Export to Python/pandas
xl export "sales.xlsx" --output sales.csv

# Python script:
import pandas as pd
df = pd.read_csv('sales.csv')
df['Year'] = pd.to_datetime(df['DateVente']).dt.year
yearly_sales = df.groupby('Year')['Sales'].sum()
yearly_sales.to_excel('yearly_sales.xlsx')

Proposed Approach (with xl):

# One command chain:
xl transform "sales.xlsx" -c "DateVente" -o "Year" -f "year" --output with_year.xlsx | \
xl group --by "Year,Circuit" --aggregate "Sales:sum"

Critical Business Analyses Blocked

  1. Year-over-Year Growth Rates

    • Cannot calculate: (2024_sales - 2023_sales) / 2023_sales
    • Essential for business planning
  2. Seasonal Analysis

    • Cannot extract month/quarter
    • Cannot identify seasonal patterns
  3. Day-of-Week Patterns

    • Cannot extract weekday
    • Cannot analyze weekend vs weekday performance
  4. Trend Analysis

    • Cannot group by time periods
    • Cannot identify upward/downward trends

Proposed Solution

Add date/time extraction functions to xl transform command.

Function Syntax

xl transform FILE_PATH -c COLUMN -o OUTPUT_COLUMN -f FUNCTION

Supported Functions

Function Description Example Input Example Output
year Extract year (4-digit) 2023-04-13 2023
month Extract month (1-12) 2023-04-13 4
quarter Extract quarter (1-4) 2023-04-13 2
day Extract day of month (1-31) 2023-04-13 13
weekday Extract day of week (0-6, Mon=0) 2023-04-13 3 (Thursday)
week Extract ISO week number (1-53) 2023-04-13 15
dayofyear Extract day of year (1-366) 2023-04-13 103
hour Extract hour (0-23) 2023-04-13 14:30:00 14
minute Extract minute (0-59) 2023-04-13 14:30:00 30
date Extract date only 2023-04-13 14:30:00 2023-04-13
time Extract time only 2023-04-13 14:30:00 14:30:00

Extended Functions (Future)

Function Description Example
year_month Combined year-month (YYYY-MM) 2023-04
year_quarter Combined year-quarter (YYYY-QX) 2023-Q2
month_name Month name (January, February, ...) April
weekday_name Weekday name (Monday, Tuesday, ...) Thursday
is_weekend Boolean: is weekend? true/false

Acceptance Criteria

  • Can extract year from datetime column
  • Can extract month from datetime column
  • Can extract quarter from datetime column
  • Can extract day of week from datetime column
  • Can extract hour/minute from datetime column
  • Works with Excel datetime format
  • Works with CSV datetime format
  • Handles NULL/missing datetime values gracefully
  • Output column has correct data type (integer for year/month, etc.)
  • Documentation updated with examples
  • Error handling: clear error if column is not datetime

Test Cases

# Test 1: Extract year
xl transform sales.xlsx -c "DateVente" -o "Year" -f "year"
# Expected: New column "Year" with values 2023, 2024, 2025

# Test 2: Extract month
xl transform sales.xlsx -c "DateVente" -o "Month" -f "month"
# Expected: New column "Month" with values 1-12

# Test 3: Extract quarter
xl transform sales.xlsx -c "DateVente" -o "Quarter" -f "quarter"
# Expected: New column "Quarter" with values 1-4

# Test 4: Extract weekday
xl transform sales.xlsx -c "DateVente" -o "Weekday" -f "weekday"
# Expected: New column "Weekday" with values 0-6

# Test 5: Chain for time series analysis
xl transform sales.xlsx -c "DateVente" -o "Year" -f "year" --output with_year.xlsx
xl group with_year.xlsx --by "Year" --aggregate "Sales:sum"
# Expected: Yearly totals

# Test 6: Multiple extractions
xl transform sales.xlsx -c "DateVente" -o "Year" -f "year" \
  -c "DateVente" -o "Month" -f "month" \
  -c "DateVente" -o "Quarter" -f "quarter"
# Expected: Three new columns

# Test 7: Handle NULL dates
xl transform sales.xlsx -c "DateVente" -o "Year" -f "year"
# Expected: NULL or NaN for missing dates

Real-World Examples

Example 1: Year-Over-Year Analysis

# Extract year and aggregate
xl transform sales.xlsx -c "DateVente" -o "Year" -f "year" --output temp.xlsx
xl group temp.xlsx --by "Year,Circuit" --aggregate "Sales:sum"

# Result:
# Year | Circuit | Sales_sum
# 2023 | 24H     | 2299
# 2024 | 24H     | 1518
# 2025 | 24H     | 2073

Example 2: Monthly Seasonality

# Extract month and analyze
xl transform sales.xlsx -c "DateVente" -o "Month" -f "month" --output temp.xlsx
xl group temp.xlsx --by "Month" --aggregate "Sales:sum" --sort desc

# Result: Identify peak months (e.g., April, May, June)

Example 3: Weekend vs Weekday

# Extract weekday
xl transform sales.xlsx -c "DateVente" -o "Weekday" -f "weekday" --output temp.xlsx

# Add weekend flag (requires conditional transform - future feature)
# xl transform temp.xlsx -c "Weekday" -o "IsWeekend" -f "if gt 4 1 0"

Implementation Priority

Phase 1: Critical Functions (MVP)

  • year - Most common use case
  • month - Seasonal analysis
  • quarter - Quarterly reporting

Phase 2: Highly Useful

  • weekday - Day of week analysis
  • day - Day-level analysis
  • week - Weekly reporting

Phase 3: Nice to Have

  • hour, minute - Intra-day analysis
  • year_month, year_quarter - Pre-combined formats
  • month_name, weekday_name - Human-readable formats

Related Features

This feature enables:

  • Time Series Analysis: Group by year/month/quarter
  • Seasonal Pattern Detection: Identify cyclical patterns
  • Trend Analysis: Identify upward/downward trends
  • Period-over-Period Comparison: YoY, MoM, QoQ growth
  • Business Intelligence: Time-based reporting

Workarounds

Current Workaround 1: Manual Pivot Table

# Use TCD sheet if already aggregated by year
# Time: 30 minutes manual copy-paste
# Not scalable for different date ranges

Current Workaround 2: Export to Python

import pandas as pd
df = pd.read_excel('sales.xlsx')
df['Year'] = pd.to_datetime(df['Date']).dt.year
df['Month'] = pd.to_datetime(df['Date']).dt.month
df.to_excel('sales_with_dates.xlsx')

Current Workaround 3: Use Excel Formulas

# Open in Excel
# Add formula: =YEAR(A2) in new column
# Drag down for all rows
# Save and use xl group

Benefits

  1. Time Series Analysis: Enables YoY, MoM, QoQ analysis
  2. Seasonal Insights: Identify seasonal patterns
  3. Business Intelligence: Critical for business decisions
  4. Workflow Efficiency: No need to export to Python/Excel
  5. Competitive Parity: SQL, pandas, Excel all have this

Industry Standards

SQL Equivalent

SELECT
  YEAR(Date) as Year,
  MONTH(Date) as Month,
  SUM(Sales) as Total
FROM data
GROUP BY YEAR(Date), MONTH(Date)

pandas Equivalent

df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month

Excel Equivalent

=YEAR(A2)
=MONTH(A2)

xl should match these standard capabilities.

Related Issues

  • XL-006: No growth rate calculations (depends on year extraction)
  • XL-003: No sorting in group (time series needs sorting by date)
  • XL-007: Multiple aggregations (time series with sum, count, mean)

Labels

feature-request critical time-series datetime transform analysis

Metadata

Metadata

Assignees

No one assigned

    Labels

    criticalCritical priority issues that block functionalityfeature-requestNew feature or functionality requesti18nInternationalization and localization issues

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions