Skip to content

Latest commit

 

History

History
338 lines (263 loc) · 9.73 KB

File metadata and controls

338 lines (263 loc) · 9.73 KB
tools
cache-memory bash
true
*
network
allowed
defaults
python
safe-outputs
upload-asset
max allowed-exts
5
.png
.jpg
.jpeg
.svg
steps
name run
Setup Python environment
# Create working directory for Python scripts mkdir -p /tmp/gh-aw/python mkdir -p /tmp/gh-aw/python/data mkdir -p /tmp/gh-aw/python/charts mkdir -p /tmp/gh-aw/python/artifacts echo "Python environment setup complete" echo "Working directory: /tmp/gh-aw/python" echo "Data directory: /tmp/gh-aw/python/data" echo "Charts directory: /tmp/gh-aw/python/charts" echo "Artifacts directory: /tmp/gh-aw/python/artifacts"
name run
Install Python scientific libraries
# Create a virtual environment for proper package isolation (avoids --break-system-packages) if [ ! -d /tmp/gh-aw/venv ]; then python3 -m venv /tmp/gh-aw/venv fi echo "/tmp/gh-aw/venv/bin" >> "$GITHUB_PATH" /tmp/gh-aw/venv/bin/pip install --quiet numpy pandas matplotlib seaborn scipy # Verify installations /tmp/gh-aw/venv/bin/python3 -c "import numpy; print(f'NumPy {numpy.__version__} installed')" /tmp/gh-aw/venv/bin/python3 -c "import pandas; print(f'Pandas {pandas.__version__} installed')" /tmp/gh-aw/venv/bin/python3 -c "import matplotlib; print(f'Matplotlib {matplotlib.__version__} installed')" /tmp/gh-aw/venv/bin/python3 -c "import seaborn; print(f'Seaborn {seaborn.__version__} installed')" /tmp/gh-aw/venv/bin/python3 -c "import scipy; print(f'SciPy {scipy.__version__} installed')" echo "All scientific libraries installed successfully"
name if uses with
Upload source files and data
always()
actions/upload-artifact@v7.0.1
name path if-no-files-found retention-days
python-source-and-data
/tmp/gh-aw/python/*.py /tmp/gh-aw/python/data/*
warn
30

Python Data Visualization Guide

Python scientific libraries have been installed and are ready for use. A temporary folder structure has been created at /tmp/gh-aw/python/ for organizing scripts, data, and outputs.

Installed Libraries

  • NumPy: Array processing and numerical operations
  • Pandas: Data manipulation and analysis
  • Matplotlib: Chart generation and plotting
  • Seaborn: Statistical data visualization
  • SciPy: Scientific computing utilities

Directory Structure

/tmp/gh-aw/python/
├── data/          # Store all data files here (CSV, JSON, etc.)
├── charts/        # Generated chart images (PNG)
├── artifacts/     # Additional output files
└── *.py           # Python scripts

Data Separation Requirement

CRITICAL: Data must NEVER be inlined in Python code. Always store data in external files and load using pandas.

❌ PROHIBITED - Inline Data

# DO NOT do this
data = [10, 20, 30, 40, 50]
labels = ['A', 'B', 'C', 'D', 'E']

✅ REQUIRED - External Data Files

# Always load data from external files
import pandas as pd

# Load data from CSV
data = pd.read_csv('/tmp/gh-aw/python/data/data.csv')

# Or from JSON
data = pd.read_json('/tmp/gh-aw/python/data/data.json')

Chart Generation Best Practices

High-Quality Chart Settings

import matplotlib.pyplot as plt
import seaborn as sns

# Set style for better aesthetics
sns.set_style("whitegrid")
sns.set_palette("husl")

# Create figure with high DPI
fig, ax = plt.subplots(figsize=(10, 6), dpi=300)

# Your plotting code here
# ...

# Save with high quality
plt.savefig('/tmp/gh-aw/python/charts/chart.png', 
            dpi=300, 
            bbox_inches='tight',
            facecolor='white',
            edgecolor='none')

Chart Quality Guidelines

  • DPI: Use 300 or higher for publication quality
  • Figure Size: Standard is 10x6 inches (adjustable based on needs)
  • Labels: Always include clear axis labels and titles
  • Legend: Add legends when plotting multiple series
  • Grid: Enable grid lines for easier reading
  • Colors: Use colorblind-friendly palettes (seaborn defaults are good)

Including Images in Reports

Use this approach to include chart images in reports (issues, discussions, step summaries):

Upload Asset (Recommended)

Use the upload_asset safe output tool to upload individual chart images. The tool returns a persistent asset URL that can be embedded directly in markdown.

Step 1: Generate Chart

# Generate your chart
plt.savefig('/tmp/gh-aw/python/charts/my_chart.png', dpi=300, bbox_inches='tight')

Step 2: Upload as Asset

Use the upload_asset tool to upload the chart file and get an embeddable asset URL:

{ "type": "upload_asset", "path": "/tmp/gh-aw/python/charts/my_chart.png" }

The tool returns a URL to the uploaded asset.

Step 3: Render in Markdown

Use the asset URL in markdown to render the image inline:

## Visualization Results

![Chart Description](ASSET_URL_FROM_UPLOAD)

The chart above shows...

The asset URL points to the published asset file in the configured assets branch.

Note: Asset URLs are persistent and suitable for embedding in issues, pull requests, and discussions.

Cache Memory Integration

The cache memory at /tmp/gh-aw/cache-memory/ is available for storing reusable code:

Helper Functions to Cache:

  • Data loading utilities: data_loader.py
  • Chart styling functions: chart_utils.py
  • Common data transformations: transforms.py

Check Cache Before Creating:

# Check if helper exists in cache
if [ -f /tmp/gh-aw/cache-memory/data_loader.py ]; then
  cp /tmp/gh-aw/cache-memory/data_loader.py /tmp/gh-aw/python/
  echo "Using cached data_loader.py"
fi

Save to Cache for Future Runs:

# Save useful helpers to cache
cp /tmp/gh-aw/python/data_loader.py /tmp/gh-aw/cache-memory/
echo "Saved data_loader.py to cache for future runs"

Complete Example Workflow

#!/usr/bin/env python3
"""
Example data visualization script
Generates a bar chart from external data
"""
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set style
sns.set_style("whitegrid")
sns.set_palette("husl")

# Load data from external file (NEVER inline)
data = pd.read_csv('/tmp/gh-aw/python/data/data.csv')

# Process data
summary = data.groupby('category')['value'].sum()

# Create chart
fig, ax = plt.subplots(figsize=(10, 6), dpi=300)
summary.plot(kind='bar', ax=ax)

# Customize
ax.set_title('Data Summary by Category', fontsize=16, fontweight='bold')
ax.set_xlabel('Category', fontsize=12)
ax.set_ylabel('Value', fontsize=12)
ax.grid(True, alpha=0.3)

# Save chart
plt.savefig('/tmp/gh-aw/python/charts/chart.png',
            dpi=300,
            bbox_inches='tight',
            facecolor='white')

print("Chart saved to /tmp/gh-aw/python/charts/chart.png")

Error Handling

Check File Existence:

import os

data_file = '/tmp/gh-aw/python/data/data.csv'
if not os.path.exists(data_file):
    raise FileNotFoundError(f"Data file not found: {data_file}")

Validate Data:

# Check for required columns
required_cols = ['category', 'value']
missing = set(required_cols) - set(data.columns)
if missing:
    raise ValueError(f"Missing columns: {missing}")

Asset Upload

Chart images are uploaded individually via the upload_asset safe-output tool. Each image gets a persistent URL for inline rendering.

Chart Image Upload:

  • Tool: upload_asset (safe-output)
  • Config: up to 5 uploads per run with image-only extensions
  • Allowed: PNG, JPG, JPEG, SVG files
  • Persistence: stored as persistent assets (not 30-day artifacts)
  • Returns: persistent asset URL with direct link

Source and Data Artifact:

  • Name: python-source-and-data
  • Contents: Python scripts and data files
  • Retention: 30 days

Source and data files are uploaded with if: always() condition, ensuring they're available even if the workflow fails.

Tips for Success

  1. Always Separate Data: Store data in files, never inline in code
  2. Use Cache Memory: Store reusable helpers for faster execution
  3. High Quality Charts: Use DPI 300+ and proper sizing
  4. Clear Documentation: Add docstrings and comments
  5. Error Handling: Validate data and check file existence
  6. Type Hints: Use type annotations for better code quality
  7. Seaborn Defaults: Leverage seaborn for better aesthetics
  8. Reproducibility: Set random seeds when needed

Common Data Sources

Based on common use cases:

Repository Statistics:

# Collect via GitHub API, save to data.csv
# Then load and visualize
data = pd.read_csv('/tmp/gh-aw/python/data/repo_stats.csv')

Workflow Metrics:

# Collect via GitHub Actions API, save to data.json
data = pd.read_json('/tmp/gh-aw/python/data/workflow_metrics.json')

Sample Data Generation:

# Generate with NumPy, save to file first
import numpy as np
data = np.random.randn(100, 2)
df = pd.DataFrame(data, columns=['x', 'y'])
df.to_csv('/tmp/gh-aw/python/data/sample_data.csv', index=False)

# Then load it back (demonstrating the pattern)
data = pd.read_csv('/tmp/gh-aw/python/data/sample_data.csv')