diff --git a/.github/skills/README.md b/.github/skills/README.md index fc482d4..bd42ea2 100644 --- a/.github/skills/README.md +++ b/.github/skills/README.md @@ -14,6 +14,25 @@ Agent Skills are directories containing a `SKILL.md` file and optional supportin ## Available Skills +### azure-storage-loader + +**Purpose**: Load token usage data from Azure Table Storage for faster iteration and analysis. + +**Use this skill when:** +- Analyzing actual usage data without manual export +- Testing query logic against real backend data +- Debugging backend sync issues with live data +- Performing ad-hoc team analytics +- Quickly iterating on data analysis tasks in chat + +**Contents:** +- Helper script to fetch data from Azure Storage Tables +- Support for both Entra ID and Shared Key authentication +- Flexible filtering by date, model, workspace, or user +- JSON and CSV output formats +- Azure Table Storage schema documentation +- Authentication and troubleshooting guides + ### copilot-log-analysis **Purpose**: Comprehensive guide for analyzing GitHub Copilot session log files. @@ -32,6 +51,16 @@ Agent Skills are directories containing a `SKILL.md` file and optional supportin - Schema documentation references - Usage examples and troubleshooting guides +### refresh-json-data + +**Purpose**: Update token estimator and model pricing JSON files with latest data. + +**Use this skill when:** +- Adding support for new AI models +- Updating token estimation ratios +- Refreshing pricing information from provider APIs +- Keeping model data current with latest releases + ### load-cache-data **Purpose**: Load and inspect the last 10 rows from the local session file cache to iterate with real data. 
diff --git a/.github/skills/azure-storage-loader/.gitignore b/.github/skills/azure-storage-loader/.gitignore new file mode 100644 index 0000000..a22855b --- /dev/null +++ b/.github/skills/azure-storage-loader/.gitignore @@ -0,0 +1,5 @@ +node_modules/ +package-lock.json +*.log +*.json.tmp +*.csv.tmp diff --git a/.github/skills/azure-storage-loader/README.md b/.github/skills/azure-storage-loader/README.md new file mode 100644 index 0000000..cbdc318 --- /dev/null +++ b/.github/skills/azure-storage-loader/README.md @@ -0,0 +1,69 @@ +# Azure Storage Loader Skill + +Load token usage data from Azure Table Storage for analysis in chat conversations. + +## Quick Start + +```bash +# Install dependencies +npm install + +# Load data (using Entra ID auth) +node load-table-data.js \ + --storageAccount "youraccount" \ + --tableName "usageAggDaily" \ + --datasetId "default" \ + --startDate "2026-01-01" \ + --endDate "2026-01-30" + +# Output to file +node load-table-data.js \ + --storageAccount "youraccount" \ + --startDate "2026-01-01" \ + --endDate "2026-01-30" \ + --output "usage-data.json" + +# Get help +node load-table-data.js --help +``` + +## Files + +- **SKILL.md**: Complete skill documentation with examples and troubleshooting +- **load-table-data.js**: Helper script to fetch data from Azure Storage Tables +- **example-usage.js**: Example script demonstrating data loading and analysis +- **package.json**: Node.js dependencies + +## Authentication + +### Entra ID (Default) +Authenticate using one of these methods: +- Azure CLI: `az login` +- VS Code: Sign in via Azure extension +- Environment variables + +### Shared Key +Use `--sharedKey` parameter to provide storage account key. + +## Common Use Cases + +1. **Quick Analysis**: Load recent data for ad-hoc queries +2. **Model Comparison**: Compare token usage across different AI models +3. **Team Analytics**: Analyze per-user or per-workspace usage +4. 
**Cost Estimation**: Calculate usage costs with pricing data + +## Documentation + +See **SKILL.md** for: +- Complete parameter reference +- Azure Table Storage schema details +- Authentication setup +- Advanced filtering examples +- Troubleshooting guide +- Security best practices + +## Requirements + +- Node.js 14 or later +- Azure Storage account with token usage data +- Appropriate Azure permissions (Storage Table Data Reader or Contributor) diff --git a/.github/skills/azure-storage-loader/SKILL.md b/.github/skills/azure-storage-loader/SKILL.md new file mode 100644 index 0000000..d475683 --- /dev/null +++ b/.github/skills/azure-storage-loader/SKILL.md @@ -0,0 +1,326 @@ +--- +name: azure-storage-loader +description: Load token usage data from Azure Table Storage for faster iteration and analysis in chat conversations +--- + +# Azure Storage Loader Skill + +This skill enables you to load actual token usage data from Azure Table Storage into your chat conversations. This allows for faster iteration when analyzing usage patterns, testing queries, or debugging issues without needing to sync data from local session files. + +## Overview + +The Copilot Token Tracker extension can sync token usage data to Azure Table Storage. 
This skill provides helper scripts to: +- Query and fetch data from Azure Storage Tables +- Load data into a usable format for chat analysis +- Authenticate using Azure credentials (Entra ID or Shared Key) +- Filter data by date range, dataset, model, workspace, or user + +## When to Use This Skill + +Use this skill when you need to: +- Analyze actual usage data patterns without manual export +- Test query logic against real data +- Debug backend sync issues with live data +- Perform ad-hoc analysis of token usage across teams +- Validate data transformations or aggregations +- Quickly iterate on data analysis tasks in chat + +## Prerequisites + +Before using this skill, ensure you have: +- Azure Storage account with token usage data already synced +- Azure credentials configured (either Entra ID or Shared Key) +- Node.js installed for running helper scripts +- Access to the storage account and table (read permissions minimum) + +## Azure Table Storage Schema + +The extension stores daily aggregate data in Azure Tables with the following schema: + +### Table Name +Default: `usageAggDaily` (configurable via `copilotTokenTracker.backend.aggTable`) + +### Entity Structure + +**Partition Key**: `ds:{datasetId}|d:{YYYY-MM-DD}` +- Groups entities by dataset and day for efficient queries + +**Row Key**: `m:{model}|w:{workspaceId}|mc:{machineId}|u:{userId}` +- Unique identifier for each model/workspace/machine/user combination + +**Fields**: +- `schemaVersion` (number): Schema version for compatibility +- `datasetId` (string): Logical dataset identifier +- `day` (string): Date in YYYY-MM-DD format +- `model` (string): AI model name (e.g., "gpt-4", "claude-3-5-sonnet-20241022") +- `workspaceId` (string): Workspace identifier (sanitized) +- `workspaceName` (string, optional): Human-readable workspace name +- `machineId` (string): Machine identifier (sanitized) +- `machineName` (string, optional): Human-readable machine name +- `userId` (string, optional): User identifier (if 
team sharing enabled) +- `userKeyType` (string, optional): Type of user identifier (pseudonymous/teamAlias/entraObjectId) +- `shareWithTeam` (boolean, optional): Whether data is shared with team +- `consentAt` (string, optional): ISO timestamp of consent +- `inputTokens` (number): Total input tokens for this dimension +- `outputTokens` (number): Total output tokens for this dimension +- `interactions` (number): Total interactions count +- `updatedAt` (string): ISO timestamp of last update + +### Sanitization Rules + +Azure Tables disallow certain characters in PartitionKey/RowKey: `/`, `\`, `#`, `?` +These are replaced with `_` by the `sanitizeTableKey()` function in `src/backend/storageTables.ts`. + +## Authentication Methods + +### Option 1: Entra ID (Recommended) + +Uses DefaultAzureCredential for authentication: +- Azure CLI: `az login` +- VS Code: Sign in via Azure extension +- Environment variables: `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET` +- Managed Identity (when running in Azure) + +**Required RBAC Roles**: +- `Storage Table Data Reader` (read-only) +- `Storage Table Data Contributor` (read/write) + +### Option 2: Shared Key + +Uses account access key stored in VS Code SecretStorage: +- Set via command: "Copilot Token Tracker: Set Backend Storage Shared Key" +- Does not sync across devices +- Requires account key from Azure Portal + +## Helper Script: `load-table-data.js` + +### Purpose +Fetch token usage data from Azure Table Storage and output as JSON for analysis. 
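To make the key scheme above concrete, here is a minimal sketch of how a query window maps to partition keys. It mirrors the documented `ds:{datasetId}|d:{YYYY-MM-DD}` layout and the `sanitizeTableKey()` replacement rules; the function names are illustrative, not the extension's actual exports:

```javascript
// Sketch: replace the characters Azure Tables forbids in keys (/ \ # ?).
function sanitizeTableKey(value) {
  return value.replace(/[/\\#?]/g, '_');
}

// Sketch: one partition key per day in the inclusive [startDate, endDate] window.
function partitionKeysForRange(datasetId, startDate, endDate) {
  const keys = [];
  const current = new Date(`${startDate}T00:00:00Z`);
  const end = new Date(`${endDate}T00:00:00Z`);
  while (current <= end) {
    const day = current.toISOString().slice(0, 10); // YYYY-MM-DD
    keys.push(sanitizeTableKey(`ds:${datasetId}|d:${day}`));
    current.setUTCDate(current.getUTCDate() + 1);
  }
  return keys;
}

// Logs the day-keyed partitions for a three-day window.
console.log(partitionKeysForRange('default', '2026-01-01', '2026-01-03'));
```

Querying per-partition like this keeps each request a point lookup on PartitionKey, which is the cheapest access pattern Azure Tables offers.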
+ +### Usage + +```bash +# Navigate to skill directory +cd .github/skills/azure-storage-loader + +# Install dependencies (first time only) +npm install + +# Load data with Entra ID auth +node load-table-data.js \ + --storageAccount "youraccount" \ + --tableName "usageAggDaily" \ + --datasetId "default" \ + --startDate "2026-01-01" \ + --endDate "2026-01-30" + +# Load data with Shared Key auth +node load-table-data.js \ + --storageAccount "youraccount" \ + --tableName "usageAggDaily" \ + --datasetId "default" \ + --startDate "2026-01-01" \ + --endDate "2026-01-30" \ + --sharedKey "your-account-key" + +# Filter by specific model +node load-table-data.js \ + --storageAccount "youraccount" \ + --tableName "usageAggDaily" \ + --datasetId "default" \ + --startDate "2026-01-01" \ + --endDate "2026-01-30" \ + --model "gpt-4o" + +# Output to file +node load-table-data.js \ + --storageAccount "youraccount" \ + --tableName "usageAggDaily" \ + --datasetId "default" \ + --startDate "2026-01-01" \ + --endDate "2026-01-30" \ + --output "usage-data.json" +``` + +### Parameters + +- `--storageAccount` (required): Azure Storage account name +- `--tableName` (optional): Table name (default: "usageAggDaily") +- `--datasetId` (optional): Dataset identifier (default: "default") +- `--startDate` (required): Start date in YYYY-MM-DD format +- `--endDate` (required): End date in YYYY-MM-DD format +- `--model` (optional): Filter by specific model name +- `--workspaceId` (optional): Filter by specific workspace ID +- `--userId` (optional): Filter by specific user ID +- `--sharedKey` (optional): Azure Storage account key (if not using Entra ID) +- `--output` (optional): Output file path (default: stdout) +- `--format` (optional): Output format: "json" or "csv" (default: "json") + +### Output Format + +JSON array of entities: +```json +[ + { + "partitionKey": "ds:default|d:2026-01-16", + "rowKey": "m:gpt-4o|w:workspace123|mc:machine456|u:user789", + "schemaVersion": 3, + "datasetId": 
"default", + "day": "2026-01-16", + "model": "gpt-4o", + "workspaceId": "workspace123", + "workspaceName": "MyProject", + "machineId": "machine456", + "machineName": "MyLaptop", + "userId": "user789", + "userKeyType": "pseudonymous", + "inputTokens": 1500, + "outputTokens": 800, + "interactions": 25, + "updatedAt": "2026-01-16T23:59:59.999Z" + } +] +``` + +CSV format (when `--format csv` is used): +```csv +day,model,workspaceId,workspaceName,machineId,machineName,userId,userKeyType,inputTokens,outputTokens,interactions,updatedAt +2026-01-16,gpt-4o,workspace123,MyProject,machine456,MyLaptop,user789,pseudonymous,1500,800,25,2026-01-16T23:59:59.999Z +``` + +## Usage Examples + +### Example 1: Basic Data Loading + +```javascript +// In a chat conversation: +// "Load the last 7 days of token usage data from Azure" + +// Run the helper script: +node load-table-data.js \ + --storageAccount "mycopilotusage" \ + --datasetId "team-alpha" \ + --startDate "2026-01-23" \ + --endDate "2026-01-30" + +// Analyze the output in the conversation +``` + +### Example 2: Model Comparison + +```javascript +// "Compare GPT-4 vs Claude usage for January" + +// Load GPT-4 data +node load-table-data.js \ + --storageAccount "mycopilotusage" \ + --datasetId "team-alpha" \ + --startDate "2026-01-01" \ + --endDate "2026-01-31" \ + --model "gpt-4o" \ + --output "gpt4-jan.json" + +// Load Claude data +node load-table-data.js \ + --storageAccount "mycopilotusage" \ + --datasetId "team-alpha" \ + --startDate "2026-01-01" \ + --endDate "2026-01-31" \ + --model "claude-3-5-sonnet-20241022" \ + --output "claude-jan.json" + +// Compare in chat using the JSON files +``` + +### Example 3: Team Analytics + +```javascript +// "Show me per-user token usage for our team this month" + +node load-table-data.js \ + --storageAccount "mycopilotusage" \ + --datasetId "team-alpha" \ + --startDate "2026-01-01" \ + --endDate "2026-01-31" \ + --output "team-usage.json" + +// In chat, analyze the userId field to 
aggregate per-user totals +``` + +### Example 4: Cost Analysis + +```javascript +// "Calculate the estimated cost of our Copilot usage" + +node load-table-data.js \ + --storageAccount "mycopilotusage" \ + --datasetId "team-alpha" \ + --startDate "2026-01-01" \ + --endDate "2026-01-31" \ + --output "usage-for-costing.json" + +// Use model pricing data (src/modelPricing.json) to calculate costs +// Group by model, multiply tokens by pricing rates +``` + +## Integration with Extension Code + +The helper script uses the same Azure SDK packages as the extension: +- `@azure/data-tables`: Table Storage operations +- `@azure/identity`: Authentication via DefaultAzureCredential + +Key extension modules referenced: +- `src/backend/storageTables.ts`: Entity schema and query functions +- `src/backend/services/dataPlaneService.ts`: Table client creation and operations +- `src/backend/constants.ts`: Schema versions and constants + +## Troubleshooting + +### Authentication Errors + +**Problem**: "Missing Azure RBAC data-plane permissions" +**Solution**: Ensure you have `Storage Table Data Reader` or `Storage Table Data Contributor` role assigned + +**Problem**: "SharedKeyCredential is not authorized" +**Solution**: Verify the shared key is correct and has not been rotated + +### Data Not Found + +**Problem**: No entities returned +**Solution**: +- Verify the datasetId matches your configuration +- Check that data has been synced (enable backend in extension settings) +- Confirm the date range is correct +- Check that the table name matches (default: "usageAggDaily") + +### Query Timeouts + +**Problem**: Queries timing out with large date ranges +**Solution**: +- Reduce the date range (max 90 days recommended) +- Use pagination if loading large datasets +- Filter by model or workspace to reduce result set + +## Security Considerations + +- **Shared Keys**: Never commit shared keys to source control +- **User Data**: Respect team sharing consent settings +- **Data Retention**: 
Follow your organization's data retention policies +- **Access Control**: Use least-privilege RBAC roles when possible +- **Audit Logs**: Enable Azure Storage logs for compliance + +## Related Files + +- `src/backend/storageTables.ts`: Core table operations and schema +- `src/backend/services/dataPlaneService.ts`: Table client and query service +- `src/backend/services/queryService.ts`: Query caching and filtering +- `src/backend/constants.ts`: Schema versions and configuration +- `src/backend/types.ts`: TypeScript type definitions +- `package.json`: Azure SDK dependencies + +## Additional Resources + +- [Azure Table Storage Documentation](https://learn.microsoft.com/azure/storage/tables/) +- [Azure SDK for JavaScript](https://github.com/Azure/azure-sdk-for-js) +- [DefaultAzureCredential](https://learn.microsoft.com/javascript/api/@azure/identity/defaultazurecredential) +- [VS Code Extension Settings](../../../README.md#backend-configuration) diff --git a/.github/skills/azure-storage-loader/example-usage.js b/.github/skills/azure-storage-loader/example-usage.js new file mode 100755 index 0000000..2bccf80 --- /dev/null +++ b/.github/skills/azure-storage-loader/example-usage.js @@ -0,0 +1,165 @@ +#!/usr/bin/env node + +/** + * Example: Load and analyze token usage data from Azure Storage + * + * This example demonstrates how to use the azure-storage-loader skill + * to fetch data and perform basic analysis. 
+ * + * Prerequisites: + * - Azure Storage account with token usage data + * - Azure credentials configured (az login or env vars) + * - Node.js and npm installed + * + * Usage: + * node example-usage.js <storageAccount> <startDate> <endDate> + * + * Example: + * node example-usage.js mycopilotusage 2026-01-01 2026-01-31 + */ + +const { execSync } = require('child_process'); +const fs = require('fs'); +const path = require('path'); +const os = require('os'); + +// Parse command line arguments +const args = process.argv.slice(2); +if (args.length < 3) { + console.error('Usage: node example-usage.js <storageAccount> <startDate> <endDate>'); + console.error('Example: node example-usage.js mycopilotusage 2026-01-01 2026-01-31'); + process.exit(1); +} + +const [storageAccount, startDate, endDate] = args; + +console.log('Azure Storage Loader - Example Usage'); +console.log('=====================================\n'); + +// Step 1: Load data from Azure Storage +console.log('Step 1: Loading data from Azure Storage...'); +console.log(` Storage Account: ${storageAccount}`); +console.log(` Date Range: ${startDate} to ${endDate}\n`); + +const tempFile = path.join(os.tmpdir(), `usage-data-${Date.now()}.json`); + +try { + execSync( + `node load-table-data.js --storageAccount ${storageAccount} --startDate ${startDate} --endDate ${endDate} --output ${tempFile}`, + { stdio: 'inherit' } + ); + + // Step 2: Load and parse the data + console.log('\nStep 2: Analyzing data...\n'); + const data = JSON.parse(fs.readFileSync(tempFile, 'utf8')); + + // Step 3: Perform analysis + console.log('=== Summary Report ===\n'); + + // Total tokens + const totals = data.reduce( + (acc, item) => { + acc.inputTokens += item.inputTokens || 0; + acc.outputTokens += item.outputTokens || 0; + acc.interactions += item.interactions || 0; + return acc; + }, + { inputTokens: 0, outputTokens: 0, interactions: 0 } + ); + + console.log(`Total Records: ${data.length}`); + console.log(`Total Input Tokens: ${totals.inputTokens.toLocaleString()}`); + console.log(`Total Output Tokens:
${totals.outputTokens.toLocaleString()}`); + console.log(`Total Tokens: ${(totals.inputTokens + totals.outputTokens).toLocaleString()}`); + console.log(`Total Interactions: ${totals.interactions.toLocaleString()}`); + + // Group by model + console.log('\n=== Usage by Model ===\n'); + const byModel = {}; + data.forEach(item => { + const model = item.model || 'unknown'; + if (!byModel[model]) { + byModel[model] = { inputTokens: 0, outputTokens: 0, interactions: 0 }; + } + byModel[model].inputTokens += item.inputTokens || 0; + byModel[model].outputTokens += item.outputTokens || 0; + byModel[model].interactions += item.interactions || 0; + }); + + Object.entries(byModel) + .sort((a, b) => (b[1].inputTokens + b[1].outputTokens) - (a[1].inputTokens + a[1].outputTokens)) + .forEach(([model, stats]) => { + const totalTokens = stats.inputTokens + stats.outputTokens; + console.log(`${model}:`); + console.log(` Input: ${stats.inputTokens.toLocaleString()}`); + console.log(` Output: ${stats.outputTokens.toLocaleString()}`); + console.log(` Total: ${totalTokens.toLocaleString()}`); + console.log(` Interactions: ${stats.interactions.toLocaleString()}`); + console.log(''); + }); + + // Group by day + console.log('=== Usage by Day (Top 5) ===\n'); + const byDay = {}; + data.forEach(item => { + const day = item.day || 'unknown'; + if (!byDay[day]) { + byDay[day] = { inputTokens: 0, outputTokens: 0, interactions: 0 }; + } + byDay[day].inputTokens += item.inputTokens || 0; + byDay[day].outputTokens += item.outputTokens || 0; + byDay[day].interactions += item.interactions || 0; + }); + + Object.entries(byDay) + .sort((a, b) => (b[1].inputTokens + b[1].outputTokens) - (a[1].inputTokens + a[1].outputTokens)) + .slice(0, 5) + .forEach(([day, stats]) => { + const totalTokens = stats.inputTokens + stats.outputTokens; + console.log(`${day}: ${totalTokens.toLocaleString()} tokens, ${stats.interactions.toLocaleString()} interactions`); + }); + + // Group by workspace (if available) + const 
workspaces = [...new Set(data.map(item => item.workspaceId).filter(Boolean))]; + if (workspaces.length > 1) { + console.log('\n=== Usage by Workspace (Top 5) ===\n'); + const byWorkspace = {}; + data.forEach(item => { + const ws = item.workspaceId || 'unknown'; + const wsName = item.workspaceName || ws; + if (!byWorkspace[ws]) { + byWorkspace[ws] = { name: wsName, inputTokens: 0, outputTokens: 0, interactions: 0 }; + } + byWorkspace[ws].inputTokens += item.inputTokens || 0; + byWorkspace[ws].outputTokens += item.outputTokens || 0; + byWorkspace[ws].interactions += item.interactions || 0; + }); + + Object.entries(byWorkspace) + .sort((a, b) => (b[1].inputTokens + b[1].outputTokens) - (a[1].inputTokens + a[1].outputTokens)) + .slice(0, 5) + .forEach(([wsId, stats]) => { + const totalTokens = stats.inputTokens + stats.outputTokens; + console.log(`${stats.name}: ${totalTokens.toLocaleString()} tokens, ${stats.interactions.toLocaleString()} interactions`); + }); + } + + // Clean up temp file + fs.unlinkSync(tempFile); + + console.log('\n✅ Analysis complete!\n'); + console.log('Next steps:'); + console.log('- Use the raw JSON data for custom analysis'); + console.log('- Filter by specific models: --model "gpt-4o"'); + console.log('- Filter by workspace: --workspaceId "workspace123"'); + console.log('- Export to CSV: --format csv --output usage.csv'); + console.log('\nSee SKILL.md for more examples and documentation.'); + +} catch (error) { + console.error('\nError:', error.message); + // Clean up temp file if it exists + if (fs.existsSync(tempFile)) { + fs.unlinkSync(tempFile); + } + process.exit(1); +} diff --git a/.github/skills/azure-storage-loader/load-table-data.js b/.github/skills/azure-storage-loader/load-table-data.js new file mode 100755 index 0000000..4d2c3b3 --- /dev/null +++ b/.github/skills/azure-storage-loader/load-table-data.js @@ -0,0 +1,536 @@ +#!/usr/bin/env node + +/** + * Azure Storage Table Data Loader + * + * Loads token usage data from Azure Table 
Storage for analysis in chat conversations. + * Supports both Entra ID and Shared Key authentication. + * + * Usage: + * node load-table-data.js --storageAccount --startDate --endDate + * + * See SKILL.md for detailed documentation and examples. + */ + +const { TableClient, AzureNamedKeyCredential } = require('@azure/data-tables'); +const { DefaultAzureCredential } = require('@azure/identity'); +const fs = require('fs'); +const path = require('path'); + +// Parse command line arguments +function parseArgs() { + const args = { + storageAccount: null, + tableName: 'usageAggDaily', + datasetId: 'default', + startDate: null, + endDate: null, + model: null, + workspaceId: null, + userId: null, + sharedKey: null, + output: null, + format: 'json', + help: false + }; + + for (let i = 2; i < process.argv.length; i++) { + const arg = process.argv[i]; + const nextArg = process.argv[i + 1]; + + switch (arg) { + case '--storageAccount': + if (!nextArg || nextArg.startsWith('--')) { + console.error('Error: --storageAccount requires a value'); + process.exit(1); + } + args.storageAccount = nextArg; + i++; + break; + case '--tableName': + if (!nextArg || nextArg.startsWith('--')) { + console.error('Error: --tableName requires a value'); + process.exit(1); + } + args.tableName = nextArg; + i++; + break; + case '--datasetId': + if (!nextArg || nextArg.startsWith('--')) { + console.error('Error: --datasetId requires a value'); + process.exit(1); + } + args.datasetId = nextArg; + i++; + break; + case '--startDate': + if (!nextArg || nextArg.startsWith('--')) { + console.error('Error: --startDate requires a value'); + process.exit(1); + } + args.startDate = nextArg; + i++; + break; + case '--endDate': + if (!nextArg || nextArg.startsWith('--')) { + console.error('Error: --endDate requires a value'); + process.exit(1); + } + args.endDate = nextArg; + i++; + break; + case '--model': + if (!nextArg || nextArg.startsWith('--')) { + console.error('Error: --model requires a value'); + 
process.exit(1); + } + args.model = nextArg; + i++; + break; + case '--workspaceId': + if (!nextArg || nextArg.startsWith('--')) { + console.error('Error: --workspaceId requires a value'); + process.exit(1); + } + args.workspaceId = nextArg; + i++; + break; + case '--userId': + if (!nextArg || nextArg.startsWith('--')) { + console.error('Error: --userId requires a value'); + process.exit(1); + } + args.userId = nextArg; + i++; + break; + case '--sharedKey': + if (!nextArg || nextArg.startsWith('--')) { + console.error('Error: --sharedKey requires a value'); + process.exit(1); + } + args.sharedKey = nextArg; + i++; + break; + case '--output': + if (!nextArg || nextArg.startsWith('--')) { + console.error('Error: --output requires a value'); + process.exit(1); + } + args.output = nextArg; + i++; + break; + case '--format': + if (!nextArg || nextArg.startsWith('--')) { + console.error('Error: --format requires a value'); + process.exit(1); + } + args.format = nextArg; + i++; + break; + case '--help': + case '-h': + args.help = true; + break; + default: + console.error(`Unknown argument: ${arg}`); + console.error('Use --help for usage information'); + process.exit(1); + } + } + + return args; +} + +// Display help message +function showHelp() { + console.log(` +Azure Storage Table Data Loader + +Usage: + node load-table-data.js [options] + +Required Options: + --storageAccount Azure Storage account name + --startDate Start date for data retrieval + --endDate End date for data retrieval + +Optional Options: + --tableName Table name (default: "usageAggDaily") + --datasetId Dataset identifier (default: "default") + --model Filter by model name + --workspaceId Filter by workspace ID + --userId Filter by user ID + --sharedKey Azure Storage shared key (if not using Entra ID) + --output Output file path (default: stdout) + --format Output format (default: "json") + --help, -h Show this help message + +Authentication: + By default, uses DefaultAzureCredential (Entra ID). 
+ To use Shared Key auth, provide --sharedKey option. + +Examples: + # Load data with Entra ID auth + node load-table-data.js \\ + --storageAccount myaccount \\ + --startDate 2026-01-01 \\ + --endDate 2026-01-31 + + # Load data with Shared Key auth and filter by model + node load-table-data.js \\ + --storageAccount myaccount \\ + --startDate 2026-01-01 \\ + --endDate 2026-01-31 \\ + --model gpt-4o \\ + --sharedKey "your-key-here" \\ + --output usage.json + +For more information, see SKILL.md +`); +} + +// Validate date format (YYYY-MM-DD) +function isValidDate(dateString) { + const regex = /^\d{4}-\d{2}-\d{2}$/; + if (!regex.test(dateString)) { + return false; + } + const date = new Date(dateString); + return date instanceof Date && !isNaN(date); +} + +// Generate array of date strings between start and end (inclusive) +function getDayKeysInclusive(startDate, endDate) { + const start = new Date(startDate); + const end = new Date(endDate); + const days = []; + + const current = new Date(start); + while (current <= end) { + const year = current.getFullYear(); + const month = String(current.getMonth() + 1).padStart(2, '0'); + const day = String(current.getDate()).padStart(2, '0'); + days.push(`${year}-${month}-${day}`); + current.setDate(current.getDate() + 1); + } + + return days; +} + +// Sanitize table key (replaces forbidden characters) +// Azure Tables disallow: / \ # ? +// Note: In JavaScript regex, forward slash doesn't need backslash escaping in strings +// but we keep the pattern consistent for all forbidden characters +function sanitizeTableKey(value) { + if (!value) { + return value; + } + let result = value; + const forbiddenChars = ['/', '\\', '#', '?']; + for (const char of forbiddenChars) { + // For backslash, we need to escape it in both the regex pattern and replacement + const escaped = char === '\\' ? 
'\\\\' : `\\${char}`; + result = result.replace(new RegExp(escaped, 'g'), '_'); + } + // Replace control characters + result = result.replace(/[\x00-\x1F\x7F-\x9F]/g, '_'); + return result; +} + +// Build partition key for a specific dataset and day +function buildPartitionKey(datasetId, dayKey) { + const raw = `ds:${datasetId}|d:${dayKey}`; + return sanitizeTableKey(raw); +} + +// Create table client with appropriate credentials +function createTableClient(storageAccount, tableName, sharedKey) { + const endpoint = `https://${storageAccount}.table.core.windows.net`; + + let credential; + if (sharedKey) { + credential = new AzureNamedKeyCredential(storageAccount, sharedKey); + console.error('Using Shared Key authentication'); + } else { + credential = new DefaultAzureCredential(); + console.error('Using DefaultAzureCredential (Entra ID)'); + } + + return new TableClient(endpoint, tableName, credential); +} + +// Fetch entities from table for a date range +async function fetchEntities(tableClient, datasetId, startDate, endDate, filters) { + const dayKeys = getDayKeysInclusive(startDate, endDate); + const allEntities = []; + + console.error(`Fetching data for ${dayKeys.length} days...`); + + for (const dayKey of dayKeys) { + const partitionKey = buildPartitionKey(datasetId, dayKey); + console.error(` Querying partition: ${partitionKey}`); + + try { + // Build OData filter with input validation + // Note: Azure Table Storage has limited SQL injection risk, but we still validate inputs + const escapeODataValue = (value) => { + if (!value || typeof value !== 'string') { + throw new Error('Filter value must be a non-empty string'); + } + // Validate that value doesn't contain logical operators or newlines + if (/\b(and|or|not)\b/i.test(value) || /[\n\r]/.test(value)) { + throw new Error('Filter value contains invalid characters or operators'); + } + // Escape single quotes per OData spec + return value.replace(/'/g, "''"); + }; + + let filter = `PartitionKey eq 
'${escapeODataValue(partitionKey)}'`; + + if (filters.model) { + filter += ` and model eq '${escapeODataValue(filters.model)}'`; + } + if (filters.workspaceId) { + filter += ` and workspaceId eq '${escapeODataValue(filters.workspaceId)}'`; + } + if (filters.userId) { + filter += ` and userId eq '${escapeODataValue(filters.userId)}'`; + } + + const queryOptions = { + queryOptions: { filter } + }; + + let count = 0; + for await (const entity of tableClient.listEntities(queryOptions)) { + // Normalize entity structure + const normalized = { + partitionKey: entity.partitionKey || partitionKey, + rowKey: entity.rowKey || '', + schemaVersion: entity.schemaVersion, + datasetId: entity.datasetId || datasetId, + day: entity.day || dayKey, + model: entity.model || '', + workspaceId: entity.workspaceId || '', + workspaceName: entity.workspaceName || undefined, + machineId: entity.machineId || '', + machineName: entity.machineName || undefined, + userId: entity.userId || undefined, + userKeyType: entity.userKeyType || undefined, + shareWithTeam: entity.shareWithTeam || undefined, + consentAt: entity.consentAt || undefined, + inputTokens: typeof entity.inputTokens === 'number' ? entity.inputTokens : 0, + outputTokens: typeof entity.outputTokens === 'number' ? entity.outputTokens : 0, + interactions: typeof entity.interactions === 'number' ? 
entity.interactions : 0,
+          updatedAt: entity.updatedAt || new Date().toISOString()
+        };
+
+        allEntities.push(normalized);
+        count++;
+      }
+
+      console.error(` Found ${count} entities`);
+    } catch (error) {
+      console.error(` Error querying partition ${partitionKey}:`, error.message);
+    }
+  }
+
+  return allEntities;
+}
+
+// Format entities as JSON
+function formatAsJSON(entities) {
+  return JSON.stringify(entities, null, 2);
+}
+
+// Format entities as CSV
+function formatAsCSV(entities) {
+  if (entities.length === 0) {
+    return '';
+  }
+
+  // CSV headers
+  const headers = [
+    'day',
+    'model',
+    'workspaceId',
+    'workspaceName',
+    'machineId',
+    'machineName',
+    'userId',
+    'userKeyType',
+    'inputTokens',
+    'outputTokens',
+    'interactions',
+    'updatedAt'
+  ];
+
+  const rows = [headers.join(',')];
+
+  // CSV data rows
+  for (const entity of entities) {
+    const values = headers.map(header => {
+      const value = entity[header];
+      if (value === undefined || value === null) {
+        return '';
+      }
+      // Escape commas and quotes
+      const stringValue = String(value);
+      if (stringValue.includes(',') || stringValue.includes('"') || stringValue.includes('\n')) {
+        return `"${stringValue.replace(/"/g, '""')}"`;
+      }
+      return stringValue;
+    });
+    rows.push(values.join(','));
+  }
+
+  return rows.join('\n');
+}
+
+// Main execution
+async function main() {
+  const args = parseArgs();
+
+  // Show help if requested
+  if (args.help) {
+    showHelp();
+    return null;
+  }
+
+  // Validation (throw errors so callers can handle them)
+  if (!args.storageAccount) {
+    throw new Error('--storageAccount is required');
+  }
+
+  if (!args.startDate || !args.endDate) {
+    throw new Error('--startDate and --endDate are required');
+  }
+
+  if (!isValidDate(args.startDate)) {
+    throw new Error('--startDate must be in YYYY-MM-DD format');
+  }
+
+  if (!isValidDate(args.endDate)) {
+    throw new Error('--endDate must be in YYYY-MM-DD format');
+  }
+
+  if (new Date(args.startDate) > new Date(args.endDate)) {
+    throw new Error('--startDate must be before or equal to --endDate');
+  }
+
+  if (args.format !== 'json' && args.format !== 'csv') {
+    throw new Error('--format must be either "json" or "csv"');
+  }
+
+  console.error('Azure Storage Table Data Loader');
+  console.error('==============================');
+  console.error(`Storage Account: ${args.storageAccount}`);
+  console.error(`Table Name: ${args.tableName}`);
+  console.error(`Dataset ID: ${args.datasetId}`);
+  console.error(`Date Range: ${args.startDate} to ${args.endDate}`);
+  if (args.model) {
+    console.error(`Model Filter: ${args.model}`);
+  }
+  if (args.workspaceId) {
+    console.error(`Workspace Filter: ${args.workspaceId}`);
+  }
+  if (args.userId) {
+    console.error(`User Filter: ${args.userId}`);
+  }
+  console.error('');
+
+  // Instead of writing output to a file or printing it unconditionally,
+  // produce the result, attach it to `module.exports.tresult`, and return it.
+  // Callers can require this module, run `main()`, and read `tresult`.
+  // Progress logs go to stderr (console.error) so stdout stays clean.
+
+  // Create table client
+  const tableClient = createTableClient(
+    args.storageAccount,
+    args.tableName,
+    args.sharedKey
+  );
+
+  // Fetch entities
+  const entities = await fetchEntities(
+    tableClient,
+    args.datasetId,
+    args.startDate,
+    args.endDate,
+    {
+      model: args.model,
+      workspaceId: args.workspaceId,
+      userId: args.userId
+    }
+  );
+
+  console.error('');
+  console.error(`Total entities fetched: ${entities.length}`);
+
+  // Calculate totals
+  const totals = entities.reduce(
+    (acc, entity) => {
+      acc.inputTokens += entity.inputTokens;
+      acc.outputTokens += entity.outputTokens;
+      acc.interactions += entity.interactions;
+      return acc;
+    },
+    { inputTokens: 0, outputTokens: 0, interactions: 0 }
+  );
+
+  console.error('');
+  console.error('Totals:');
+  console.error(` Input Tokens: ${totals.inputTokens.toLocaleString()}`);
+  console.error(` Output Tokens: ${totals.outputTokens.toLocaleString()}`);
+  console.error(` Total Tokens: ${(totals.inputTokens + totals.outputTokens).toLocaleString()}`);
+  console.error(` Interactions: ${totals.interactions.toLocaleString()}`);
+  console.error('');
+
+  // Format output
+  let output;
+  if (args.format === 'csv') {
+    output = formatAsCSV(entities);
+  } else {
+    output = formatAsJSON(entities);
+  }
+
+  // Attach result to module.exports and return it
+  module.exports.tresult = output;
+  return output;
+}
+
+// Run if executed directly
+if (require.main === module) {
+  main()
+    .then(result => {
+      // When executed as CLI, print the result to stdout for visibility.
+      if (result !== null && result !== undefined) {
+        console.log(result);
+      }
+      process.exit(0);
+    })
+    .catch(error => {
+      console.error('');
+      console.error('Error:', error.message || error);
+      if (error.stack) {
+        console.error('Stack trace:', error.stack);
+      }
+      process.exit(1);
+    });
+}
+
+// Export `main` as well, so that requiring callers can actually run it
+// and then read `tresult` as described above.
+module.exports = {
+  main,
+  parseArgs,
+  isValidDate,
+  getDayKeysInclusive,
+  sanitizeTableKey,
+  buildPartitionKey,
+  createTableClient,
+  fetchEntities,
+  formatAsJSON,
+  formatAsCSV,
+  // `tresult` will hold the final output (JSON or CSV string) after `main()` runs
+  tresult: null
+};
diff --git a/.github/skills/azure-storage-loader/package.json b/.github/skills/azure-storage-loader/package.json
new file mode 100644
index 0000000..dcd3464
--- /dev/null
+++ b/.github/skills/azure-storage-loader/package.json
@@ -0,0 +1,23 @@
+{
+  "name": "azure-storage-loader",
+  "version": "1.0.0",
+  "description": "Helper scripts for loading token usage data from Azure Table Storage",
+  "main": "load-table-data.js",
+  "private": true,
+  "scripts": {
+    "load": "node load-table-data.js"
+  },
+  "keywords": [
+    "azure",
+    "storage",
+    "tables",
+    "copilot",
+    "token-usage"
+  ],
+  "author": "Rob Bos",
+  "license": "MIT",
+  "dependencies": {
+    "@azure/data-tables": "^13.3.2",
+    "@azure/identity": "^4.13.0"
+  }
+}
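The `formatAsCSV` helper in the diff quotes any field that contains a comma, double quote, or newline, and doubles embedded quotes, which matches the RFC 4180 convention. A minimal standalone sketch of that escaping rule (`escapeCsvField` is a name introduced here for illustration, not part of the module):

```javascript
// Standalone sketch of the CSV escaping rule used by formatAsCSV:
// wrap a field in double quotes when it contains a comma, quote, or
// newline, and double any embedded quotes ("" per RFC 4180).
function escapeCsvField(value) {
  if (value === undefined || value === null) {
    return '';
  }
  const s = String(value);
  if (s.includes(',') || s.includes('"') || s.includes('\n')) {
    return `"${s.replace(/"/g, '""')}"`;
  }
  return s;
}

// Build one CSV row the same way the script does: map fields, join with commas.
const row = ['gpt-4o', 'a,b', 'say "hi"', null].map(escapeCsvField).join(',');
console.log(row); // gpt-4o,"a,b","say ""hi""",
```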
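The totals pass in `main` folds every fetched entity into a single accumulator with one `reduce` call. The same shape in isolation, over two made-up usage rows (the field names match the script's normalized entities; the values are invented):

```javascript
// Same aggregation shape as the script's totals block, applied to
// hypothetical sample entities.
const sampleEntities = [
  { inputTokens: 100, outputTokens: 40, interactions: 2 },
  { inputTokens: 250, outputTokens: 90, interactions: 5 }
];

const sampleTotals = sampleEntities.reduce(
  (acc, entity) => {
    acc.inputTokens += entity.inputTokens;
    acc.outputTokens += entity.outputTokens;
    acc.interactions += entity.interactions;
    return acc;
  },
  { inputTokens: 0, outputTokens: 0, interactions: 0 }
);

console.log(sampleTotals); // { inputTokens: 350, outputTokens: 130, interactions: 7 }
```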
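The `tresult` hand-off in the diff lets a programmatic caller read the formatted output without scraping stdout. A sketch of that pattern with a stub in place of the real script (the stub's data is invented; the real `main()` also expects its CLI flags to be present in `process.argv`):

```javascript
// Stub illustrating the `tresult` pattern from load-table-data.js:
// main() computes the output, attaches it to the exports object, and
// returns it. Callers can use either the return value or `loader.tresult`.
const loader = {
  tresult: null,
  async main() {
    const output = JSON.stringify([{ day: '2026-01-01', inputTokens: 120 }]);
    loader.tresult = output; // mirrors `module.exports.tresult = output`
    return output;
  }
};

(async () => {
  const result = await loader.main();
  console.log(result === loader.tresult); // true
  console.log(JSON.parse(loader.tresult)[0].day); // 2026-01-01
})();
```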