neilsable/DFSE


📊 DFSE: Decision-Driven Forecasting & Segmentation Engine


Transform data into decisions through intelligent forecasting and customer segmentation

Quick Start • Features • Outputs • Documentation


🎯 What is DFSE?

DFSE is a production-ready analytics engine that combines time-series forecasting with customer segmentation to drive business decisions. Built with real-world applications in mind, it demonstrates end-to-end data science from raw data to actionable insights.

Why DFSE?

  • 🚀 Decision-First: Built to answer real business questions, not just create models
  • 🔧 Production-Ready: Clean code, automated workflows, reproducible results
  • 📈 Business-Focused: Demand forecasting + RFM segmentation = immediate value
  • 🎓 Educational: Clear structure, well-documented, perfect for learning

✨ Features

📉 Demand Forecasting

  • Classical time-series modeling
  • 60-day forward predictions
  • Confidence intervals included
  • Performance metrics automated

👥 Customer Segmentation

  • RFM (Recency, Frequency, Monetary) analysis
  • Automated segment profiling
  • Actionable customer groups
  • Clear business insights

🚀 How to Run This Project

⚡ Super Simple (Just 2 Steps!)

Step 1: Download the project

git clone https://github.com/neilsable/dfse.git
cd dfse

Step 2: Run it!

make run

Done! ✅ Your reports and data will appear in the reports/ and data/processed/ folders.


📋 What Happens When You Run It?

1. 🔧 Sets up Python environment automatically
2. 📦 Installs all needed libraries
3. 🎲 Creates sample data (no downloads needed!)
4. 🤖 Builds forecasts and customer segments
5. 📊 Generates reports and charts
6. ✅ Saves everything to your folders

Time needed: ~2-3 minutes
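
Step 3 relies on generated sample data. The generator itself is not shown in this README, so the following is only a minimal sketch of what a reproducible synthetic demand series could look like. The function name `make_daily_demand`, the trend and seasonality parameters, and the fixed seed are all assumptions made for illustration, and the sketch uses only the standard library.

```python
import math
import random
from datetime import date, timedelta
from typing import List, Tuple


def make_daily_demand(start: date, days: int, seed: int = 42) -> List[Tuple[date, float]]:
    """Return (day, demand) pairs built from a trend, a weekly cycle, and noise."""
    rng = random.Random(seed)  # local RNG: the same seed always yields the same series
    series = []
    for i in range(days):
        trend = 100.0 + 0.05 * i                              # slow upward drift
        weekly = 10.0 * math.sin(2 * math.pi * (i % 7) / 7)   # 7-day seasonality
        noise = rng.gauss(0.0, 5.0)                           # day-to-day variation
        demand = max(0.0, trend + weekly + noise)             # demand cannot go negative
        series.append((start + timedelta(days=i), round(demand, 2)))
    return series


if __name__ == "__main__":
    for day, demand in make_daily_demand(date(2024, 1, 1), days=5):
        print(day.isoformat(), demand)
```

Seeding a local `random.Random` instance instead of the global generator keeps repeated runs identical, which is what makes "no downloads needed" compatible with reproducible results.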


🪟 Don't Have make? (Common on Windows)

No problem! Use the shell script instead. It needs a bash-compatible shell, so on Windows run it from Git Bash or WSL:

# Step 1: Download project (same as above)
git clone https://github.com/neilsable/dfse.git
cd dfse

# Step 2: Run this simple script
./run.sh

Or run it manually. Use only the activation line that matches your system:

python3 -m venv .venv
source .venv/bin/activate          # macOS/Linux
.\.venv\Scripts\Activate.ps1       # Windows PowerShell (instead of the line above)
pip install -r requirements.txt
python3 -m src.pipeline

⚠️ Prerequisites

You only need:

That's it! No databases, no API keys, no complicated setup.


📦 Where to Find Your Results

After running the project, here's where everything is saved:

📊 Reports (Human-readable insights)

reports/
├── 📄 executive_summary.md      ← Read this first! Plain English summary
└── 📈 forecast_plot.png         ← Visual chart showing predictions

What to do with these:

  • Open executive_summary.md in any text editor
  • View forecast_plot.png to see your forecast chart

📁 Data Files (For further analysis)

data/processed/
├── 📊 forecast_metrics.csv      ← How accurate is the model?
├── 📈 forecast_60d.csv          ← Next 60 days of predictions
├── 👥 rfm_segments.csv          ← Each customer's segment
└── 📋 segment_summary.csv       ← Summary of customer groups

What to do with these:

  • Open any .csv file in Excel, Google Sheets, or Python
  • Use them for presentations, dashboards, or further analysis
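
For the Python route, a quick way to inspect any of these files is to print the header and the first few rows. The sketch below deliberately assumes nothing about the column names, since they are not listed here. `preview_csv` is a hypothetical helper, not part of the project.

```python
import csv
from itertools import islice
from pathlib import Path
from typing import Dict, List, Tuple


def preview_csv(path: Path, n: int = 5) -> Tuple[List[str], List[Dict[str, str]]]:
    """Return the column names and the first n rows of a CSV file."""
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(islice(reader, n))          # read at most n data rows
        columns = list(reader.fieldnames or [])  # empty list for an empty file
    return columns, rows


if __name__ == "__main__":
    target = Path("data/processed/forecast_60d.csv")
    if target.exists():
        columns, rows = preview_csv(target)
        print(columns)
        for row in rows:
            print(row)
    else:
        print(f"{target} not found; run the pipeline first")
```

Every value comes back as a string, so convert numeric columns explicitly before doing arithmetic on them.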

💡 Example: What You'll See

In executive_summary.md:

📊 Forecast Accuracy: 94.2%
👥 Customer Segments Found: 4 groups
💰 High-Value Customers: 127 people
📈 Recommended Action: Focus on "Champions" segment

In forecast_plot.png:

[Image: Example Output]

❓ Troubleshooting

Problem: make: command not found
Solution: Use ./run.sh instead, or follow the manual steps above

Problem: python3: command not found
Solution: Try python instead of python3, or install Python from python.org

Problem: Permission denied when running ./run.sh
Solution: Run chmod +x run.sh first, then try again

Problem: Libraries won't install
Solution: Make sure you activated the virtual environment (the source .venv/bin/activate step)

Still stuck? Open an issue on GitHub and I'll help!


🏗️ Project Structure

dfse/
│
├── src/                     # Source code
│   ├── pipeline.py          # Main forecasting pipeline
│   ├── evaluation.py        # Model evaluation
│   └── utils/               # Helper functions
│
├── data/
│   ├── raw/                 # Generated sample data
│   └── processed/           # Analysis outputs
│
├── reports/                 # Generated reports
├── assets/                  # Images and resources
│
├── requirements.txt         # Python dependencies
├── Makefile                 # Automation commands
└── run.sh                   # Simple run script

🎓 Use Cases

DFSE is perfect for:

  • 📚 Portfolio Projects: Showcase end-to-end data science skills
  • 🏢 Business Analytics: Demand planning and customer insights
  • 🎯 Learning: Understand forecasting and segmentation in practice
  • 🔧 Template: Starting point for real-world analytics projects

🛠️ Tech Stack

| Category        | Technologies              |
|-----------------|---------------------------|
| Language        | Python 3.8+               |
| Data Processing | pandas, NumPy             |
| Modeling        | statsmodels, scikit-learn |
| Visualization   | matplotlib, seaborn       |
| Automation      | Make, bash scripting      |

💭 How I Built This (My Approach)

🎯 The Problem I Wanted to Solve

Most data science projects are either:

  • Too theoretical (just Jupyter notebooks with no real workflow)
  • Too complex (enterprise-level code that's hard to understand)

I wanted something in between: a project that shows real production skills but stays simple enough for anyone to learn from.


🏗️ My Design Decisions

1. Why Classical Time-Series Instead of ML?

# I chose ARIMA/Exponential Smoothing over LSTM/Prophet because:
✅ More interpretable (you can explain WHY it predicts what it does)
✅ Works well with limited data
✅ Faster to train and run
✅ Industry standard for demand forecasting

For business decisions, explainability is worth more than an extra 2% of accuracy. Stakeholders need to trust your model.
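
To make the interpretability point concrete, here is a hand-rolled sketch of simple exponential smoothing with an approximate 95% interval. It is not the project's implementation (the Tech Stack lists statsmodels for modeling). The function name `ses_forecast`, the fixed `alpha`, and the normal-approximation interval are assumptions chosen to keep the example dependency-free.

```python
import math
from typing import List, Tuple


def ses_forecast(
    history: List[float], periods: int = 60, alpha: float = 0.3, z: float = 1.96
) -> List[Tuple[float, float, float]]:
    """Simple exponential smoothing: (lower, forecast, upper) for each future step."""
    if len(history) < 2:
        raise ValueError("need at least two observations")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    level = history[0]
    squared_errors = []
    for actual in history[1:]:
        error = actual - level            # one-step-ahead forecast error
        squared_errors.append(error * error)
        level += alpha * error            # move the level toward the new observation

    sigma = math.sqrt(sum(squared_errors) / len(squared_errors))  # residual spread
    result = []
    for h in range(1, periods + 1):
        # The point forecast stays flat; the interval widens with the horizon h.
        half_width = z * sigma * math.sqrt(1.0 + (h - 1) * alpha * alpha)
        result.append((level - half_width, level, level + half_width))
    return result


if __name__ == "__main__":
    demand = [100, 104, 99, 108, 111, 107, 115, 118]
    lower, point, upper = ses_forecast(demand, periods=60)[0]
    print(f"next day: {point:.1f} (range {lower:.1f} to {upper:.1f})")
```

The whole model is one number (`level`) and one update rule, which is the kind of thing a stakeholder can follow on a whiteboard.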

2. Why RFM Segmentation?

# RFM = Recency, Frequency, Monetary Value
✅ Simple enough to explain to non-technical people
✅ Actionable (you can target segments immediately)
✅ Proven technique used by real companies
✅ No complex clustering algorithms to debug

I wanted to show I understand business value, not just fancy algorithms.
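
As an illustration of how little machinery RFM needs, here is a minimal standard-library sketch. The scoring scheme (1 to 3 by rank), the thresholds, and the "Regulars" and "At Risk" labels are assumptions made for the example. Only "Champions" appears in the sample summary above, and the project's actual rules may differ.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Transaction = Tuple[str, date, float]  # (customer_id, purchase_date, amount)


def tercile_scores(values: Dict[str, float], higher_is_better: bool = True) -> Dict[str, int]:
    """Rank customers on one metric and map the rank to a 1-3 score (3 = best)."""
    ordered = sorted(values, key=lambda cid: values[cid], reverse=not higher_is_better)
    n = len(ordered)
    # Position 0 is the weakest customer; ties keep their original order.
    return {cid: 1 + (3 * position) // n for position, cid in enumerate(ordered)}


def rfm_segments(transactions: List[Transaction], today: date) -> Dict[str, str]:
    """Assign each customer a segment label from recency, frequency, and spend."""
    last_purchase: Dict[str, date] = {}
    frequency: Dict[str, float] = defaultdict(float)
    monetary: Dict[str, float] = defaultdict(float)
    for customer_id, purchased_on, amount in transactions:
        previous = last_purchase.get(customer_id, purchased_on)
        last_purchase[customer_id] = max(previous, purchased_on)
        frequency[customer_id] += 1
        monetary[customer_id] += amount

    recency = {cid: float((today - d).days) for cid, d in last_purchase.items()}
    r = tercile_scores(recency, higher_is_better=False)  # fewer days ago is better
    f = tercile_scores(frequency)
    m = tercile_scores(monetary)

    segments = {}
    for cid in recency:
        total = r[cid] + f[cid] + m[cid]  # ranges from 3 to 9
        if total >= 8:
            segments[cid] = "Champions"
        elif total >= 5:
            segments[cid] = "Regulars"   # hypothetical label
        else:
            segments[cid] = "At Risk"    # hypothetical label
    return segments
```

Each customer ends up with three small integers and a label, so the reasoning behind any assignment can be read straight off the scores.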

3. Code Structure Philosophy

src/
├── pipeline.py       # Main logic (what happens)
├── evaluation.py     # Quality checks (is it good?)
└── utils/            # Helpers (how we do it)

Why this structure?

  • ✅ Separates WHAT from HOW
  • ✅ Easy to test individual pieces
  • ✅ Can swap out methods without breaking everything
  • ✅ Follows production best practices
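
To show what "quality checks" in a module like evaluation.py might involve, here is a hedged sketch of three common forecast error metrics. The function name `forecast_metrics` and the choice of metrics are assumptions. This README does not say which metrics the project writes to forecast_metrics.csv or how the "Forecast Accuracy" figure in the summary is derived.

```python
import math
from typing import Dict, Sequence


def forecast_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return MAE, RMSE, and MAPE (in percent) for paired actual/predicted values."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")

    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

    # MAPE is undefined where the actual value is zero, so those points are skipped.
    ratios = [abs(e) / abs(a) for e, a in zip(errors, actual) if a != 0]
    mape = 100.0 * sum(ratios) / len(ratios) if ratios else float("nan")
    return {"mae": mae, "rmse": rmse, "mape_pct": mape}


if __name__ == "__main__":
    print(forecast_metrics([100, 200], [110, 180]))
```

Because the function only takes two sequences and returns a dictionary, it can be tested without touching the model or the file system, which is the separation the structure above is aiming for.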

4. Automation Over Documentation

Instead of writing "Step 1: Do this, Step 2: Do that..." I built:

make run  # Just works™

Why?

  • Users don't read long instructions
  • Automation forces you to think about reproducibility
  • Shows DevOps thinking, not just data science

🧠 What I Learned Building This

Challenge What I Did Why It Matters
Data Generation Built synthetic data generator instead of using real data Shows I can create realistic test scenarios
Error Handling Added validation at each pipeline step Production code needs to fail gracefully
Output Design Created both technical (CSV) and business (MD) outputs Data scientists serve multiple audiences
Environment Setup Made it work on Mac, Linux, AND Windows Real tools need to work everywhere
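
The "validation at each pipeline step" row can be illustrated with a small input check that fails early with a readable message. This is a sketch under assumptions: `ValidationError`, `validate_demand`, and the specific rules (minimum length, increasing dates, no negative or NaN values) are hypothetical and not taken from the project's code.

```python
from datetime import date, timedelta
from typing import List, Tuple


class ValidationError(ValueError):
    """Raised when input data fails a pipeline check."""


def validate_demand(series: List[Tuple[date, float]], min_points: int = 14) -> None:
    """Raise ValidationError if the series is too short, unordered, or has bad values."""
    if len(series) < min_points:
        raise ValidationError(f"need at least {min_points} observations, got {len(series)}")
    for (prev_day, _), (day, _) in zip(series, series[1:]):
        if day <= prev_day:
            raise ValidationError(f"dates must be strictly increasing: {prev_day} then {day}")
    for day, value in series:
        if value != value or value < 0:  # value != value is True only for NaN
            raise ValidationError(f"invalid demand {value!r} on {day}")


if __name__ == "__main__":
    series = [(date(2024, 1, 1) + timedelta(days=i), 100.0 + i) for i in range(14)]
    validate_demand(series)  # passes silently
    try:
        validate_demand(series[:3])
    except ValidationError as exc:
        print(f"rejected: {exc}")
```

Raising a dedicated exception type lets the caller tell bad input apart from a bug in the modeling code.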

🔧 Technical Highlights

Here's what makes this code different:

Clean Data Pipeline

# Instead of messy notebooks, I built a clear pipeline:
raw_data → validation → transformation → modeling → evaluation → reporting
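
One way to express a linear flow like this in code is a list of named stages, each feeding its output to the next. The sketch below is illustrative only: `run_pipeline` and the toy stages are hypothetical, and the real src/pipeline.py may be organized differently.

```python
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]  # (stage name, function applied to the data)


def run_pipeline(data: Any, stages: List[Stage]) -> Any:
    """Feed the output of each stage into the next, naming the stage if it fails."""
    for name, stage in stages:
        try:
            data = stage(data)
        except Exception as exc:
            raise RuntimeError(f"stage '{name}' failed: {exc}") from exc
    return data


if __name__ == "__main__":
    stages = [
        ("validation", lambda xs: [x for x in xs if x >= 0]),      # drop bad rows
        ("transformation", lambda xs: [x / max(xs) for x in xs]),  # scale to 0..1
        ("modeling", lambda xs: sum(xs) / len(xs)),                # toy "model": the mean
        ("reporting", lambda mean: f"mean scaled demand: {mean:.2f}"),
    ]
    print(run_pipeline([10, 20, -1, 40], stages))  # prints "mean scaled demand: 0.58"
```

Keeping the stage order in data rather than in nested function calls makes it easy to swap or skip a step, and the wrapped error tells you which step broke.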

Modular Functions

# Each function does ONE thing well
def calculate_rfm_score(df):
    """Takes customer data, returns RFM segments"""
    # Not: calculate_rfm_and_make_plots_and_send_email()

Type Hints & Documentation

from typing import Any, Dict

import pandas as pd


def forecast_demand(data: pd.DataFrame, periods: int = 60) -> Dict[str, Any]:
    """
    Generate demand forecast with confidence intervals.

    Args:
        data: Historical demand data with datetime index
        periods: Number of future periods to forecast

    Returns:
        Dictionary containing forecast, metrics, and plots
    """

This isn't just "code that works"; it's code others can maintain.


🎓 Why I Made These Choices

As someone looking to break into data science/analytics roles, I wanted to show:

  1. I understand business needs (decision-first approach)
  2. I write production code (not just experiments)
  3. I communicate clearly (reports for stakeholders, code for developers)
  4. I think about the full lifecycle (from data to decisions)

This project represents how I actually work, not just what I know.


📈 Sample Output

[Image: Forecast Visualization]

Actual vs Predicted demand with confidence intervals


🤝 Contributing

Contributions, issues, and feature requests are welcome!

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.


👤 Author

Neil Sable


⭐ Show Your Support

Give a ⭐️ if this project helped you!


Built with ❤️ for practical data science
