Fastly Log Sync and Analytics

A comprehensive tool for syncing Fastly logs from S3, parsing them, and generating analytics reports.

⚠️ Security Notice

This tool processes log files that may contain sensitive information including:

  • IP addresses
  • User agents
  • Request paths and query parameters
  • Response data
  • Other potentially sensitive application data

Please ensure:

  • Log files are stored securely and access is restricted
  • Parsed log files are not committed to version control
  • Configuration files with real bucket names/paths are gitignored
  • Access to the dashboard and analytics is restricted to authorized personnel only

Features

  • Date-filtered S3 sync: Download logs for specific date ranges
  • Incremental sync: Downloads only files that don't already exist locally
  • UTC timezone handling: All date operations use UTC to match log file organization
  • Log parsing: Converts syslog-style Fastly logs to structured JSON/CSV
  • Comprehensive analytics: Traffic patterns, error analysis, performance metrics, user agent analysis, query parameter patterns, endpoint drill-down, daily summaries, and slowness investigation
  • Time-based filtering: Filter logs to analyze only the last N hours (e.g., last hour, last 24 hours)
  • Client IP analysis: Top client IPs by request volume for security and traffic analysis
  • Interactive dashboard: Streamlit-based web dashboard for visualizing log analytics
  • Modular design: Run sync, parse, or analyze operations independently or as a pipeline
  • Log management: Utility script to clear all log files when needed

Prerequisites

  • AWS CLI: Required for S3 sync operations
  • Python 3: Required for parsing and analytics
  • Python packages: Install with pip install -r requirements.txt

Installation

  1. Clone or download this repository

  2. Install Python dependencies:

    pip install -r requirements.txt
  3. Create your configuration file:

    cp config/log_sources.yaml.example config/log_sources.yaml

    Then edit config/log_sources.yaml with your actual log source configurations (S3 buckets, paths, etc.). This file is gitignored and won't be committed.

  4. Ensure AWS CLI is installed and configured with appropriate credentials
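
The exact schema of config/log_sources.yaml comes from the .example template. As a rough illustration of how a script might consume it, the sketch below loads the file with PyYAML and lists enabled sources; the field names (sources, enabled, name, bucket, prefix) are illustrative assumptions, not the tool's actual schema:

import yaml  # PyYAML, assumed to be listed in requirements.txt

# Load the log source configuration. Field names here are illustrative
# assumptions; see config/log_sources.yaml.example for the real schema.
with open("config/log_sources.yaml") as f:
    config = yaml.safe_load(f)

for source in config.get("sources", []):
    if source.get("enabled", True):
        print(source.get("name"), source.get("bucket"), source.get("prefix"))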

Usage

Full Pipeline (Sync → Parse → Analyze)

Run all operations in sequence:

From date to today (syncs logs from the specified date to today, inclusive):

python3 scripts/query_logs.py --date 2025-11-10

Date range (syncs logs from start date to end date, inclusive):

python3 scripts/query_logs.py --start-date 2025-11-10 --end-date 2025-11-12

Note: The --date parameter syncs logs from the specified date to today (in UTC, since logs are stored in UTC). To sync a specific date range, use --start-date and --end-date.

Timezone Note: All date operations use UTC to match the log file organization in S3. Log timestamps are preserved in UTC throughout parsing and analysis.
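
As a sketch of what this behaviour implies (not the tool's actual code), expanding --date into individual UTC days could look like this:

from datetime import datetime, timedelta, timezone

def utc_date_range(start_date, end_date=None):
    """Return each UTC day from start_date to end_date (or today, UTC), inclusive."""
    start = datetime.strptime(start_date, "%Y-%m-%d").date()
    end = (datetime.strptime(end_date, "%Y-%m-%d").date()
           if end_date else datetime.now(timezone.utc).date())
    return [(start + timedelta(days=i)).isoformat()
            for i in range((end - start).days + 1)]

print(utc_date_range("2025-11-10", "2025-11-12"))
# ['2025-11-10', '2025-11-11', '2025-11-12']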

Time-Based Filtering

Filter logs to analyze only entries from the last N hours:

Analyze last hour from existing parsed logs (no sync needed):

python3 scripts/query_logs.py --last-hours 1.0

Sync, parse, and analyze with time filter:

python3 scripts/query_logs.py --date 2025-11-23 --last-hours 1.0

Other time filter examples:

python3 scripts/query_logs.py --last-hours 0.5   # Last 30 minutes
python3 scripts/query_logs.py --last-hours 24.0  # Last 24 hours
python3 scripts/query_logs.py --last-hours 2.5   # Last 2.5 hours

Note: When using --last-hours without dates:

  • If parsed logs exist, it will analyze them with the time filter
  • If no parsed logs exist, it will automatically sync today's logs, parse them, then analyze with the time filter
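
Conceptually, the --last-hours filter just drops parsed entries older than a UTC cutoff. A minimal sketch, assuming entries shaped like the parsed JSON shown later in this README (the tool's own implementation may differ):

from datetime import datetime, timedelta, timezone

def filter_last_hours(entries, last_hours):
    """Keep only entries whose UTC timestamp falls within the last N hours."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=last_hours)
    kept = []
    for entry in entries:
        # Parsed timestamps look like "2025-11-09T23:57:35" and are in UTC.
        ts = datetime.fromisoformat(entry["timestamp"]).replace(tzinfo=timezone.utc)
        if ts >= cutoff:
            kept.append(entry)
    return kept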

Individual Operations

Sync Only

Download logs from S3 without parsing or analyzing:

python3 scripts/query_logs.py --operation sync --start-date 2025-11-10 --end-date 2025-11-12

Or use the sync script directly:

python3 scripts/sync_logs.py --start-date 2025-11-10 --end-date 2025-11-12
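
Since the AWS CLI is a prerequisite, the sync step presumably wraps aws s3 sync, which already skips files that exist locally and gives the incremental behaviour described under Features. A hedged sketch (the bucket, prefix, and key layout below are placeholders; take real values from config/log_sources.yaml):

import subprocess

def sync_day(bucket, prefix, day, dest):
    """Sync one UTC day of logs from S3 into a local directory."""
    cmd = [
        "aws", "s3", "sync",
        f"s3://{bucket}/{prefix}",
        dest,
        "--exclude", "*",
        "--include", f"{day}T*",  # assumes keys start with the UTC date, e.g. 2025-11-10T...
    ]
    subprocess.run(cmd, check=True)

# Hypothetical values for illustration only.
sync_day("my-fastly-logs", "fastly/", "2025-11-10", "./logs/example-source-1/raw")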

Parse Only

Parse existing log files in the logs/ directory:

python3 scripts/query_logs.py --operation parse

Or use the parser directly:

python3 scripts/parse_logs.py --input-dir ./logs/example-source-1/raw --output ./logs/example-source-1/parsed/parsed_logs.json

Analyze Only

Generate analytics from already-parsed logs:

python3 scripts/query_logs.py --operation analyze --parsed-output ./logs/example-source-1/parsed/parsed_logs.json

Or use the analyzer directly:

python3 scripts/analyze_logs.py --input ./logs/example-source-1/parsed/parsed_logs.json --format console

Interactive Dashboard

Launch the Streamlit dashboard for interactive visualization:

streamlit run dashboard/dashboard.py

The dashboard will open in your browser (typically at http://localhost:8501) and provides:

  • Traffic Patterns: Requests over time, popular endpoints, HTTP method distribution
  • Error Analysis: Status code breakdown, error rates, error-prone endpoints
  • Performance Metrics: Cache hit/miss rates, response size statistics
  • User Agent Analysis: Top user agents and agent type distribution
  • Query Patterns: Most common query parameters and value distributions
  • Slowness Investigation: Cache miss patterns, large response endpoints, peak traffic times, top client IPs by request volume

New in Slowness Investigation: Top client IPs analysis helps identify:

  • Bots and crawlers generating high traffic
  • Potential abuse or DDoS patterns
  • Most active clients
  • Clients that warrant further security investigation

You can specify a custom parsed log file path in the dashboard sidebar.
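
For a feel of how such a page is built, the minimal Streamlit sketch below loads the parsed JSON with pandas and charts requests per hour and status codes; dashboard/dashboard.py itself is more complete and may be structured differently:

import pandas as pd
import streamlit as st

# The path can be overridden in the sidebar, mirroring the behaviour described above.
path = st.sidebar.text_input("Parsed log file",
                             "./logs/example-source-1/parsed/parsed_logs.json")

df = pd.read_json(path)
df["timestamp"] = pd.to_datetime(df["timestamp"])

st.title("Traffic Patterns")
st.bar_chart(df.set_index("timestamp").resample("1h").size())  # requests per hour

st.subheader("Status codes")
st.bar_chart(df["status_code"].value_counts())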

Command-Line Options

query_logs.py

  • --start-date DATE: Start date in YYYY-MM-DD format (required for sync)
  • --end-date DATE: End date in YYYY-MM-DD format (required for sync)
  • --date DATE: Start date in YYYY-MM-DD format (syncs from this date to today)
  • --operation OP: Operation to perform: sync, parse, analyze, or all (default: all)
  • --parsed-output FILE: Output file for parsed logs (default: first enabled source's parsed output)
  • --analytics-output FILE: Output file for analytics report (optional)
  • --last-hours HOURS: Filter to only entries from the last N hours (e.g., 1.0 for last hour). Only applies to analyze operation. If no parsed logs exist, automatically syncs today's logs first.

sync_logs.py

  • --start-date DATE: Start date in YYYY-MM-DD format
  • --end-date DATE: End date in YYYY-MM-DD format
  • --date DATE: Start date in YYYY-MM-DD format (syncs from this date to today)
  • -h, --help: Show help message

parse_logs.py

  • --input-dir DIR: Directory containing log files (default: ./logs/example-source-1/raw)
  • --output FILE: Output file path (default: ./logs/example-source-1/parsed/parsed_logs.json)
  • --format FORMAT: Output format: json or csv (default: json)
  • --pattern PATTERN: File pattern to match (default: *.log*)

analyze_logs.py

  • --input FILE: Input file (parsed JSON or CSV) - required
  • --output FILE: Output file path (optional)
  • --format FORMAT: Output format: json or console (default: console)
  • --last-hours HOURS: Filter to only entries from the last N hours (e.g., 1.0 for last hour)

clear_logs.py

Utility script to clear all log files (raw and parsed):

  • --logs-dir DIR: Path to logs directory (default: ./logs)
  • --yes, -y: Skip confirmation prompt

Examples:

python3 scripts/clear_logs.py              # Clear with confirmation
python3 scripts/clear_logs.py --yes        # Clear without confirmation
python3 scripts/clear_logs.py -y           # Short form
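
A rough equivalent of what the script does (not its actual source):

from pathlib import Path

def clear_logs(logs_dir="./logs", assume_yes=False):
    """Delete every file under the logs directory, keeping the directory tree."""
    files = [p for p in Path(logs_dir).rglob("*") if p.is_file()]
    if not files:
        print("No log files to remove.")
        return
    if not assume_yes and input(f"Delete {len(files)} files under {logs_dir}? [y/N] ").lower() != "y":
        return
    for p in files:
        p.unlink()
    print(f"Removed {len(files)} files.")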

Log Format

The tool parses Fastly logs in syslog format. All timestamps are in UTC (indicated by the 'Z' suffix):

<priority>timestamp cache-server process[pid]: IP "-" "-" date "METHOD path" status size "-" "user-agent" cache-status

Example:

<134>2025-11-09T23:57:35Z cache-server-001 s3logsprod[254840]: 192.0.2.1 "-" "-" Sun, 09 Nov 2025 23:57:35 GMT "GET /api/endpoint?param=value" 200 18508 "-" "Mozilla/5.0..." hit

Note: The example above uses dummy IP addresses (192.0.2.1 is a reserved test IP) and generic paths. Real log files will contain actual client IPs and request paths.
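
For reference, the sketch below parses the example line above with a regular expression. It is illustrative only; the shipped parser may use a different pattern, and numeric fields would normally be cast to integers:

import json
import re
from urllib.parse import parse_qsl

# Rough regex for the syslog-style format shown above.
LINE_RE = re.compile(
    r'<(?P<priority>\d+)>(?P<timestamp>\S+) (?P<cache_server>\S+) \S+: '
    r'(?P<ip_address>\S+) "-" "-" .*?GMT '
    r'"(?P<http_method>\S+) (?P<request>\S+)" '
    r'(?P<status_code>\d+) (?P<response_size>\d+) "-" '
    r'"(?P<user_agent>[^"]*)" (?P<cache_status>\S+)'
)

line = ('<134>2025-11-09T23:57:35Z cache-server-001 s3logsprod[254840]: 192.0.2.1 "-" "-" '
        'Sun, 09 Nov 2025 23:57:35 GMT "GET /api/endpoint?param=value" 200 18508 "-" '
        '"Mozilla/5.0..." hit')

entry = LINE_RE.match(line).groupdict()
path, _, query = entry.pop("request").partition("?")
entry.update(path=path, query_string=query, query_params=dict(parse_qsl(query)))
print(json.dumps(entry, indent=2))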

Output Structure

Parsed Logs (JSON)

Each log entry is parsed into structured fields:

{
  "priority": 134,
  "timestamp": "2025-11-09T23:57:35",
  "cache_server": "cache-server-001",
  "ip_address": "192.0.2.1",
  "http_method": "GET",
  "path": "/api/endpoint",
  "query_string": "param=value",
  "query_params": { "param": "value" },
  "status_code": 200,
  "response_size": 18508,
  "user_agent": "Mozilla/5.0...",
  "cache_status": "hit"
}

Note: The example above uses dummy data. Real parsed logs will contain actual IP addresses, paths, and query parameters from your log files.

Analytics Report

The analytics report includes:

  • Traffic Patterns: Total requests, requests per hour/day/minute, popular endpoints, HTTP method distribution, peak traffic detection
  • Error Analysis: Status code distribution, 4xx/5xx error rates, error-prone endpoints, hourly error breakdowns
  • Performance Metrics: Cache hit/miss rates, response size statistics, hourly cache performance, hourly response size trends
  • User Agent Analysis: Top user agents, agent type distribution
  • Query Patterns: Most common query parameters, parameter value distributions, top query signatures
  • Slowness Investigation: Traffic spikes, cache miss patterns, large response endpoints, peak traffic times, rate of change analysis, top client IPs by request volume
  • Endpoint Drill-Down: Detailed analysis for specific endpoints (time patterns, errors, cache, query params)
  • Daily Summaries: Daily request totals with status code breakdown by day
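
To get a sense of how a few of these metrics fall out of the parsed entries, here is a small standalone sketch (the analytics engine computes far more and may be implemented differently; assumes at least one parsed entry):

import json
from collections import Counter

with open("./logs/example-source-1/parsed/parsed_logs.json") as f:
    entries = json.load(f)

total = len(entries)
status_counts = Counter(e["status_code"] for e in entries)
errors = sum(count for status, count in status_counts.items() if status >= 400)
cache_hits = sum(1 for e in entries if e["cache_status"] == "hit")
top_ips = Counter(e["ip_address"] for e in entries).most_common(10)

print(f"Total requests:       {total}")
print(f"Error rate (4xx/5xx): {errors / total:.1%}")
print(f"Cache hit rate:       {cache_hits / total:.1%}")
print("Top client IPs:", top_ips)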

File Structure

fastly_log_query/
├── config/
│   ├── log_sources.yaml.example   # Example configuration template
│   └── log_sources.yaml           # Log source configurations (gitignored, copy from .example)
├── logs/                          # Local log storage (created automatically, gitignored)
│   └── example-source-1/
│       ├── raw/                   # Raw log files from S3 (gitignored)
│       └── parsed/                # Parsed log files (JSON/CSV, gitignored)
├── dashboard/
│   └── dashboard.py               # Streamlit interactive dashboard
├── scripts/
│   ├── query_logs.py              # Main orchestration script
│   ├── sync_logs.py               # S3 sync script
│   ├── parse_logs.py              # Log parser
│   ├── analyze_logs.py            # Analytics engine
│   └── clear_logs.py              # Utility to clear all log files
├── src/                           # Source code modules
│   ├── sync/                      # Sync implementations
│   ├── parse/                     # Parsing logic
│   ├── analyze/                   # Analysis functions
│   └── utils/                     # Utility functions
├── requirements.txt               # Python dependencies
└── README.md                      # This file

Troubleshooting

AWS Credentials Not Configured

If you see an error about AWS credentials:

aws configure

Or set environment variables:

export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_DEFAULT_REGION=ap-southeast-2

Python Dependencies Missing

Install required packages:

pip install -r requirements.txt

No Log Files Found

Ensure:

  1. The sync operation completed successfully
  2. Log files are in the logs/ directory
  3. Files match the expected naming pattern (e.g., 2025-11-10T00:00:00.000--*.log.gz)
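
A quick way to check points 2 and 3 above from Python (the source directory name comes from your configuration):

from pathlib import Path

# List raw files that match the default "*.log*" pattern.
raw_dir = Path("./logs/example-source-1/raw")
matches = sorted(raw_dir.glob("*.log*"))
print(f"{len(matches)} matching files in {raw_dir}")
for p in matches[:5]:
    print(" ", p.name)  # e.g. 2025-11-10T00:00:00.000--*.log.gz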

Security and Privacy

Important considerations when working with log data:

  1. Log files contain sensitive data: IP addresses, user agents, request paths, and potentially sensitive query parameters
  2. Storage: Ensure log files are stored securely with appropriate access controls
  3. Version control: Never commit log files, parsed logs, or configuration files with real credentials/bucket names to git
  4. Access control: Restrict access to the dashboard and analytics output to authorized personnel only
  5. Data retention: Consider implementing data retention policies for log files
  6. Compliance: Ensure log processing complies with your organization's data privacy policies and regulations

The .gitignore file is configured to exclude:

  • config/log_sources.yaml (your actual configuration)
  • logs/ directory (all log files)
  • Parsed log outputs

Examples

Example 1: Analyze logs from a date to today

python3 scripts/query_logs.py --date 2025-11-10  # Syncs from 2025-11-10 to today

Example 2: Sync logs for a week, then analyze separately

# Sync
python3 scripts/query_logs.py --operation sync --start-date 2025-11-10 --end-date 2025-11-16

# Parse
python3 scripts/query_logs.py --operation parse

# Analyze with custom output
python3 scripts/query_logs.py --operation analyze --analytics-output ./reports/weekly_report.json

Example 3: Parse and analyze existing logs

# Parse
python3 scripts/parse_logs.py --input-dir ./logs/example-source-1/raw --output ./logs/example-source-1/parsed/parsed.json --format json

# Analyze
python3 scripts/analyze_logs.py --input ./logs/example-source-1/parsed/parsed.json --format console --output ./reports/analysis.txt

Example 4: Use the interactive dashboard

# First, parse your logs (if not already done)
python3 scripts/parse_logs.py --input-dir ./logs/example-source-1/raw --output ./logs/example-source-1/parsed/parsed_logs.json

# Then launch the dashboard
streamlit run dashboard/dashboard.py

The dashboard will automatically load the parsed logs and display interactive visualizations. You can change the log file path in the sidebar if needed.

Example 5: Analyze last hour of logs

# Analyze last hour from existing parsed logs (no sync needed)
python3 scripts/query_logs.py --last-hours 1.0

# Or sync today's logs and analyze last hour
python3 scripts/query_logs.py --date 2025-11-23 --last-hours 1.0

# Analyze last 30 minutes
python3 scripts/query_logs.py --last-hours 0.5

Example 6: Clear all logs

# Clear all log files with confirmation
python3 scripts/clear_logs.py

# Clear without confirmation
python3 scripts/clear_logs.py --yes

License

This tool is provided as-is for internal use.
