Transform data into decisions through intelligent forecasting and customer segmentation
DFSE is a production-ready analytics engine that combines time-series forecasting with customer segmentation to drive business decisions. Built with real-world applications in mind, it demonstrates end-to-end data science from raw data to actionable insights.
- Decision-First: Built to answer real business questions, not just create models
- Production-Ready: Clean code, automated workflows, reproducible results
- Business-Focused: Demand forecasting + RFM segmentation = immediate value
- Educational: Clear structure, well-documented, perfect for learning
Step 1: Download the project
git clone https://github.com/neilsable/dfse.git
cd dfse
Step 2: Run it!
make run
Done!
Your reports and data will appear in the reports/ and data/processed/ folders.
1. Sets up Python environment automatically
2. Installs all needed libraries
3. Creates sample data (no downloads needed!)
4. Builds forecasts and customer segments
5. Generates reports and charts
6. Saves everything to your folders
Time needed: ~2-3 minutes
Don't have make? No problem! Use this instead:
# Step 1: Download project (same as above)
git clone https://github.com/neilsable/dfse.git
cd dfse
# Step 2: Run this simple script
./run.sh
Or run it manually:
python3 -m venv .venv
source .venv/bin/activate # macOS/Linux
.\.venv\Scripts\Activate.ps1 # Windows PowerShell
pip install -r requirements.txt
python3 -m src.pipeline
You only need:
- Python 3.8 or newer (available from python.org)
- Git (available from git-scm.com)
- 5 minutes of your time
That's it! No databases, no API keys, no complicated setup.
After running the project, here's where everything is saved:
reports/
├── executive_summary.md ← Read this first! Plain English summary
└── forecast_plot.png ← Visual chart showing predictions
What to do with these:
- Open executive_summary.md in any text editor
- View forecast_plot.png to see your forecast chart
data/processed/
├── forecast_metrics.csv ← How accurate is the model?
├── forecast_60d.csv ← Next 60 days of predictions
├── rfm_segments.csv ← Each customer's segment
└── segment_summary.csv ← Summary of customer groups
What to do with these:
- Open any .csv file in Excel, Google Sheets, or Python
- Use them for presentations, dashboards, or further analysis
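If you go the Python route, a quick way to see what a file contains is to print its header and first few rows. The sketch below uses only the standard library; `preview_csv` is a hypothetical helper name, not part of DFSE, and the column names in the outputs are not assumed (the code just reports whatever header it finds).

```python
import csv
from pathlib import Path


def preview_csv(path, n_rows=5):
    """Return the column names and the first n_rows data rows of a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = []
        for row in reader:
            if len(rows) >= n_rows:
                break
            rows.append(row)
        # fieldnames is None for an empty file, so fall back to an empty list
        columns = list(reader.fieldnames or [])
    return columns, rows


if __name__ == "__main__":
    # Path taken from the output listing above; adjust if your layout differs.
    target = Path("data/processed/forecast_60d.csv")
    if target.exists():
        columns, rows = preview_csv(target)
        print("Columns:", columns)
        for row in rows:
            print(row)
    else:
        print(f"{target} not found; run the pipeline first.")
```

The same call works for any of the four files in data/processed/. For heavier analysis, pandas (already in requirements) is the more natural tool.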
In executive_summary.md:
Forecast Accuracy: 94.2%
Customer Segments Found: 4 groups
High-Value Customers: 127 people
Recommended Action: Focus on "Champions" segment
Problem: make: command not found
Solution: Use ./run.sh instead, or follow the manual steps above
Problem: python3: command not found
Solution: Try python instead of python3, or install Python from python.org
Problem: Permission denied when running ./run.sh
Solution: Run chmod +x run.sh first, then try again
Problem: Libraries won't install
Solution: Make sure you activated the virtual environment (the source .venv/bin/activate step)
Still stuck? Open an issue on GitHub and I'll help!
dfse/
│
├── src/                    # Source code
│   ├── pipeline.py         # Main forecasting pipeline
│   ├── evaluation.py       # Model evaluation
│   └── utils/              # Helper functions
│
├── data/
│   ├── raw/                # Generated sample data
│   └── processed/          # Analysis outputs
│
├── reports/                # Generated reports
├── assets/                 # Images and resources
│
├── requirements.txt        # Python dependencies
├── Makefile                # Automation commands
└── run.sh                  # Simple run script
DFSE is perfect for:
- Portfolio Projects: Showcase end-to-end data science skills
- Business Analytics: Demand planning and customer insights
- Learning: Understand forecasting and segmentation in practice
- Template: Starting point for real-world analytics projects
| Category | Technologies |
|---|---|
| Language | Python 3.8+ |
| Data Processing | pandas, NumPy |
| Modeling | statsmodels, scikit-learn |
| Visualization | matplotlib, seaborn |
| Automation | Make, bash scripting |
Most data science projects are either:
- Too theoretical (just Jupyter notebooks with no real workflow)
- Too complex (enterprise-level code that's hard to understand)
I wanted something in between: a project that shows real production skills but stays simple enough for anyone to learn from.
I chose ARIMA/Exponential Smoothing over LSTM/Prophet because:
- More interpretable (you can explain WHY it predicts what it does)
- Works well with limited data
- Faster to train and run
- Industry standard for demand forecasting
For business decisions, explainability matters more than an extra 2% of accuracy. Stakeholders need to trust your model.
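To make the interpretability point concrete, here is a minimal pure-Python sketch of simple exponential smoothing. It is an illustration of the technique, not DFSE's implementation (the project lists statsmodels for modeling); the function names, the `alpha` default, and the demand numbers are all made up for the example.

```python
def simple_exponential_smoothing(series, alpha=0.3):
    """Return the smoothed level after each observation.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, with level_0 = y_0.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    levels = [float(series[0])]
    for value in series[1:]:
        levels.append(alpha * value + (1 - alpha) * levels[-1])
    return levels


def forecast_flat(series, periods=60, alpha=0.3):
    """Forecast with simple exponential smoothing.

    This variant has no trend or seasonal term, so every future
    period equals the last smoothed level.
    """
    last_level = simple_exponential_smoothing(series, alpha)[-1]
    return [last_level] * periods


demand = [100, 104, 98, 110, 107]  # made-up daily demand
print(forecast_flat(demand, periods=3, alpha=0.5))  # [106.0, 106.0, 106.0]
```

Every forecast is a weighted average of past observations, with weights that shrink the further back you go. That is the kind of explanation a stakeholder can follow without a statistics background.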
RFM = Recency, Frequency, Monetary Value
- Simple enough to explain to non-technical people
- Actionable (you can target segments immediately)
- Proven technique used by real companies
- No complex clustering algorithms to debug
I wanted to show I understand business value, not just fancy algorithms.
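As a rough illustration of how little machinery RFM needs, the sketch below aggregates a list of transactions into recency, frequency, and monetary values per customer and applies a toy labeling rule. Everything here is an assumption for the example: the tuple layout, the function names, the cutoffs, and all segment names other than "Champions". DFSE's own scoring may work differently.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Aggregate (customer_id, purchase_date, amount) tuples into
    per-customer recency (days), frequency (count) and monetary (sum)."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        frequency[customer_id] += 1
        monetary[customer_id] += amount
        previous = last_purchase.get(customer_id)
        if previous is None or purchase_date > previous:
            last_purchase[customer_id] = purchase_date
    return {
        cid: {
            "recency_days": (as_of - last_purchase[cid]).days,
            "frequency": frequency[cid],
            "monetary": monetary[cid],
        }
        for cid in last_purchase
    }


def label_segment(row, recency_cutoff=30, frequency_cutoff=3):
    """Toy rule-based labels; cutoffs and names are illustrative only."""
    recent = row["recency_days"] <= recency_cutoff
    frequent = row["frequency"] >= frequency_cutoff
    if recent and frequent:
        return "Champions"
    if recent:
        return "New or Promising"
    if frequent:
        return "At Risk"
    return "Hibernating"


# Made-up transactions for two customers
transactions = [
    ("a", date(2024, 1, 1), 50.0),
    ("a", date(2024, 3, 1), 25.0),
    ("a", date(2024, 3, 10), 25.0),
    ("b", date(2023, 12, 1), 10.0),
]
for cid, row in rfm_table(transactions, as_of=date(2024, 3, 15)).items():
    print(cid, row, label_segment(row))
```

Because each label comes from two readable thresholds, a marketing team can see exactly why a customer landed in a segment and act on it directly.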
src/
├── pipeline.py # Main logic (what happens)
├── evaluation.py # Quality checks (is it good?)
└── utils/ # Helpers (how we do it)
Why this structure?
- Separates WHAT from HOW
- Easy to test individual pieces
- Can swap out methods without breaking everything
- Follows production best practices
Instead of writing "Step 1: Do this, Step 2: Do that..." I built:
make run # Just works
Why?
- Users don't read long instructions
- Automation forces you to think about reproducibility
- Shows DevOps thinking, not just data science
| Challenge | What I Did | Why It Matters |
|---|---|---|
| Data Generation | Built synthetic data generator instead of using real data | Shows I can create realistic test scenarios |
| Error Handling | Added validation at each pipeline step | Production code needs to fail gracefully |
| Output Design | Created both technical (CSV) and business (MD) outputs | Data scientists serve multiple audiences |
| Environment Setup | Made it work on Mac, Linux, AND Windows | Real tools need to work everywhere |
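For a feel of what a synthetic data generator like the one in the first row can look like, here is a small standard-library sketch that produces daily demand with a linear trend, weekly seasonality, and random noise. It is not DFSE's generator; the function name, parameters, and defaults are all assumptions chosen for illustration.

```python
import math
import random
from datetime import date, timedelta


def generate_daily_demand(start, days, seed=42, base=100.0, trend=0.05,
                          weekly_amplitude=10.0, noise_sd=5.0):
    """Return a list of (date, demand) pairs.

    demand = base + trend * day_index + weekly sine wave + Gaussian noise,
    clipped at zero. All parameter names and defaults are illustrative.
    """
    rng = random.Random(seed)  # local RNG so the same seed gives the same data
    rows = []
    for i in range(days):
        seasonal = weekly_amplitude * math.sin(2 * math.pi * i / 7)
        value = base + trend * i + seasonal + rng.gauss(0, noise_sd)
        rows.append((start + timedelta(days=i), max(0.0, round(value, 2))))
    return rows


for day, demand in generate_daily_demand(date(2024, 1, 1), days=7):
    print(day.isoformat(), demand)
```

Seeding a local random generator is what makes runs reproducible: anyone who clones the project and uses the same seed gets the same sample data.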
Here's what makes this code different:
# Instead of messy notebooks, I built a clear pipeline:
raw_data → validation → transformation → modeling → evaluation → reporting

# Each function does ONE thing well
def calculate_rfm_score(df):
    """Takes customer data, returns RFM segments"""
    # Not: calculate_rfm_and_make_plots_and_send_email()

from typing import Any, Dict

import pandas as pd

def forecast_demand(data: pd.DataFrame, periods: int = 60) -> Dict[str, Any]:
    """
    Generate demand forecast with confidence intervals.

    Args:
        data: Historical demand data with datetime index
        periods: Number of future periods to forecast

    Returns:
        Dictionary containing forecast, metrics, and plots
    """

This isn't just "code that works": it's code others can maintain.
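The staged pipeline described above can be sketched as a list of single-purpose functions run in order, with each failure reported against the stage that caused it. This is a toy version under stated assumptions: the stage bodies are placeholders (the "model" is just the historical mean), the evaluation stage is left out for brevity, and `PipelineError` and `run_pipeline` are hypothetical names rather than DFSE's actual API.

```python
class PipelineError(RuntimeError):
    """Raised when a stage fails, naming the stage that broke."""


def validate(rows):
    # Reject inputs the later stages cannot handle.
    if not rows:
        raise ValueError("no input rows")
    if any(value < 0 for value in rows):
        raise ValueError("demand cannot be negative")
    return rows


def transform(rows):
    # Placeholder transformation: convert every value to float.
    return [float(value) for value in rows]


def model(rows):
    # Placeholder model: forecast the historical mean.
    return {"history": rows, "forecast": sum(rows) / len(rows)}


def report(result):
    return f"Forecast: {result['forecast']:.1f} units/day"


def run_pipeline(data, stages):
    """Pass data through each stage in order; wrap failures with context."""
    for stage in stages:
        try:
            data = stage(data)
        except Exception as exc:
            raise PipelineError(f"stage '{stage.__name__}' failed: {exc}") from exc
    return data


STAGES = [validate, transform, model, report]
print(run_pipeline([100, 110, 90], STAGES))  # Forecast: 100.0 units/day
```

Because each stage takes one input and returns one output, you can test stages in isolation or swap the placeholder model for a real one without touching the runner.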
As someone looking to break into data science/analytics roles, I wanted to show:
- I understand business needs (decision-first approach)
- I write production code (not just experiments)
- I communicate clearly (reports for stakeholders, code for developers)
- I think about the full lifecycle (from data to decisions)
This project represents how I actually work, not just what I know.
Contributions, issues, and feature requests are welcome!
- Fork the project
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Neil Sable
- GitHub: @neilsable
- LinkedIn: Neil Sable
- Email: neilsable7@gmail.com
Give a ⭐️ if this project helped you!
Built with ❤️ for practical data science
