pu.sh

A zero-package-dependency coding agent under 50KB. Pronounced exactly how you think.

Finally, a slop cannon small enough to fit in your pocket.

```sh
curl -sL pu.dev/pu.sh -o pu.sh && chmod +x pu.sh
./pu.sh
```

That's the entire install. No npm. No pip. No Docker. No Node. One shell file, common Unix tools, curl, awk, and an API key.

What

```sh
# Zero package dependencies. Literally.
curl -sL pu.dev/pu.sh > pu.sh && chmod +x pu.sh

# First run walks you through provider, key, model, and effort.
./pu.sh

# One-shot task.
./pu.sh "find bugs in pu.sh"

# Interactive multi-turn session.
./pu.sh
> write a REST API server in Go
> now add rate limiting
> write tests for it

# Pipe agents together because we're adults.
./pu.sh "write the code" | ./pu.sh --pipe "review it for security bugs"

# Env-only setup still works.
OPENAI_API_KEY=sk-... AGENT_PROVIDER=openai AGENT_MODEL=gpt-5.5 ./pu.sh "your task"
ANTHROPIC_API_KEY=sk-ant-... AGENT_PROVIDER=anthropic AGENT_MODEL=claude-opus-4-7 ./pu.sh "your task"
```

Why

We ran 30+ experiments to answer a question: what's the most portable agentic harness that can run anywhere?

The answer is a shell script. The agent loop itself — send prompt, parse response, execute tool, append to history, repeat — is tiny. Everything else is developer experience and hardening.
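
How tiny? A minimal sketch of that loop fits in a screenful of POSIX sh. This is illustrative only — `call_model` and `parse_tool_call` are stand-ins here for the real curl + awk plumbing:

```sh
#!/bin/sh
# Minimal sketch of the loop: send prompt, parse response, execute tool,
# append to history, repeat. call_model/parse_tool_call are stubs standing
# in for the real curl + awk calls.
call_model() {
  # Stub: ask for one tool call on the first turn, then finish.
  if [ -z "$1" ]; then echo "TOOL: echo hello"; else echo "done"; fi
}
parse_tool_call() { printf '%s\n' "$1" | awk -F': ' '/^TOOL:/ {print $2}'; }

history=""
while :; do
  response=$(call_model "$history")
  tool=$(parse_tool_call "$response")
  if [ -z "$tool" ]; then printf '%s\n' "$response"; break; fi
  result=$($tool 2>&1)            # execute the requested tool
  history="$history
$response
$result"                          # append and go around again
done
```

Everything the real script adds on top of this skeleton is the hardening and developer experience described below.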

Here's the thing nobody tells you: the node_modules folder of a typical coding agent weighs more than the entire Doom source code. Three times over. pu.sh weighs less than many README files.

Features

| What | How |
| --- | --- |
| 7 tools | `bash` `read` `write` `edit` `grep` `find` `ls` — Pi-shaped surface area |
| Interactive REPL | Multi-turn with memory; `/model` `/effort` `/login` `/logout` `/flush` `/compact` `/export` `/skill:name` `/quit` |
| First-run login | API-key wizard for Anthropic/OpenAI, optional private `~/.pu.env` save |
| Dual provider | Anthropic Messages API + OpenAI Responses API |
| OpenAI tool loop | Preserves `reasoning`, `function_call`, and `function_call_output` items across turns |
| Reasoning effort | `AGENT_EFFORT=none\|low\|medium\|high\|xhigh` |
| File editing | Surgical oldText → newText replacement; rejects empty or non-unique matches |
| Safer file writes/edits | Preserves trailing newlines, uses temp files, keeps executable mode on edits |
| Context files | Auto-loads `AGENTS.md` / `CLAUDE.md` from cwd upward, plus global Pi agent context if present |
| Auto-compaction | Summarizes older turns when the approximate context budget is exceeded; `/compact [focus]` compacts manually |
| Context/status line | Shows cwd, git branch, token counts, context usage, provider, model, effort |
| `!command` | `!ls -la` runs a shell command inline from the REPL |
| Prompt templates | `/name` expands `.pi/prompts/name.md` or `~/.pi/agent/prompts/name.md` |
| Skills | `/skill:name` loads `SKILL.md` from local or user skill directories |
| Session export | `/export` writes markdown from `.pu-events.jsonl` |
| Pipe mode | `--pipe` for clean stdout, composable with other tools/agents |
| Checkpoint/resume | Writes `.pu-history.json` by default; override with `AGENT_HISTORY=file.json` |
| Confirmation mode | `AGENT_CONFIRM=1` asks before every tool execution; safely denies when no TTY |
| Event log | Every step logged to `.pu-events.jsonl` as structured JSONL |
| Regression tests | `bash eval/test_real.sh` runs 105 no-API behavioral tests |

What it can't do

Let's be honest: closing the remaining gap to a production harness needs a real runtime.

  • No TUI (it's a shell script, not a lifestyle)
  • No streaming display (curl waits for the full response like a patient person)
  • No image input
  • No OAuth/browser login; API keys only
  • No native Windows support
  • No keyboard shortcuts, path completion, themes, or raw-terminal editor
  • No package manager or TypeScript plugin SDK
  • No full model registry/pricing database
  • No general JSON parser; it uses targeted awk parsing for provider shapes
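
That last point deserves a taste. Here is an illustrative example of what "targeted awk parsing" means — not pu.sh's actual parser, and it breaks on embedded escaped quotes, which is exactly why such parsers are scoped to the shapes providers actually emit:

```sh
# Pull the value of a known "text" field out of one fixed JSON shape.
# Illustrative only; does not handle escaped quotes inside the value.
json='{"type":"message","text":"hello world","stop_reason":"end_turn"}'
text=$(printf '%s' "$json" | awk -F'"text":"' 'NF > 1 { split($2, a, "\""); print a[1] }')
echo "$text"   # hello world
```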

pu.sh is the same slop cannon but small enough to inspect end to end and know exactly where the slop is coming from.

The Size

```text
pu.sh              < 50 KB            █  (sh + curl + awk + common Unix tools)
Claude Code         209 MB            ██████████████████████████
Goose CLI           237 MB            █████████████████████████████
Pi + Node           281 MB            ███████████████████████████████████
SWE-agent Docker    1.8 GB            ██████████████████████████████████████████████████████████████...
```

Measured locally on macOS arm64. Current generated pu.sh is 49,276 bytes (48.12 KiB) by wc -c; the headline stays under 50 KB. Larger tools include their runtime/package footprints as described in final_report.md.

Configuration

All configuration is via env vars. The optional `~/.pu.env` is created by `/login` or on first run, saved with 0600-style permissions, and parsed with a tiny allowlist loader.

| Variable | Default | What |
| --- | --- | --- |
| `AGENT_PROVIDER` | auto from key/model, else `anthropic` | `anthropic` or `openai` |
| `AGENT_MODEL` | `claude-opus-4-7` or `gpt-5.5` | Model id |
| `ANTHROPIC_API_KEY` | (unset) | Anthropic API key |
| `OPENAI_API_KEY` | (unset) | OpenAI API key |
| `AGENT_EFFORT` | `medium` | `none\|low\|medium\|high\|xhigh` |
| `AGENT_REASONING_SUMMARY` | `auto` | OpenAI reasoning summary request: `auto\|concise\|detailed\|off` |
| `AGENT_THINKING` | (unset) | Legacy/Anthropic thinking hint; falls back into effort behavior |
| `AGENT_MAX_STEPS` | `100` | Max API/tool-loop steps before stopping |
| `AGENT_MAX_TOKENS` | `4096` | Base visible-output budget; raised for higher effort |
| `AGENT_CONTEXT_LIMIT` | `400000` (OpenAI-ish) / `272000` (Opus-ish) | Approximate context budget in bytes/chars |
| `AGENT_RESERVE` | `16000` | Reserved context budget before compaction |
| `AGENT_KEEP_RECENT` | `80000` | Approx. bytes/chars of recent transcript kept after compaction |
| `AGENT_TOOL_TRUNC` | `100000` | Max non-read tool output before truncation |
| `AGENT_READ_MAX` | `1000000` | Require offset/limit for larger file reads |
| `AGENT_LOG_TRUNC` | `20000` | Max event-log payload before trace-only truncation |
| `AGENT_CONFIRM` | `0` | `1` = ask before each tool call |
| `AGENT_LOG` | `.pu-events.jsonl` | Event/debug JSONL log file |
| `AGENT_HISTORY` | `.pu-history.json` | Checkpoint file for automatic resume |
| `AGENT_SYSTEM` | built-in | Custom system prompt |
| `AGENT_PRICE_IN_PER_MTOK` / `AGENT_PRICE_OUT_PER_MTOK` | `0` | Optional cost display with `--cost` |
| `AGENT_DEBUG_API` | (unset) | Directory to capture per-call input/response JSON for debugging |
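
The allowlist loading of `~/.pu.env` can be sketched like this (illustrative, not pu.sh's actual loader): only known variable names are accepted, and nothing in the file is ever eval'd.

```sh
#!/bin/sh
# Allowlist .env loader sketch: skip any line whose key isn't known.
load_env() {
  while IFS='=' read -r key val; do
    case $key in
      ANTHROPIC_API_KEY|OPENAI_API_KEY|AGENT_PROVIDER|AGENT_MODEL)
        export "$key=$val" ;;
      *) ;;  # unknown keys (and any injected commands) are ignored
    esac
  done < "$1"
}

# Demo: one allowed key, one hostile line.
tmp=$(mktemp)
printf 'AGENT_MODEL=demo-model\nPATH=/evil/bin\n' > "$tmp"
load_env "$tmp"
echo "$AGENT_MODEL"   # demo-model
rm -f "$tmp"
```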

Commands

| Command | What |
| --- | --- |
| `/model [id]` | Show or switch model; guesses provider from `gpt-*`/`o*`/`claude-*` |
| `/effort [level]` | Show or set reasoning effort (`none`, `low`, `medium`, `high`, `xhigh`, etc.) |
| `/reasoning [mode]` | Show or set OpenAI reasoning summaries (`auto`, `concise`, `detailed`, `off`) |
| `/login` | Run the API-key setup wizard |
| `/logout` | Remove `~/.pu.env` and unset in-process keys |
| `/flush` | Reset the session: clear memory/history, remove history metadata, and truncate the event log |
| `/compact [focus]` | Summarize older context, optionally with focus text |
| `/export [file]` | Export the event log to markdown |
| `/skill:name` | Load `name/SKILL.md` into the system prompt |
| `/quit` | Exit |
| `!cmd` | Run a shell command directly |
| `/template` | If `.pi/prompts/template.md` exists, run it as a prompt |

How it works

```text
┌─────────────────────────────────────────┐
│  You type a thing                       │
│  ↓                                      │
│  curl sends it to Claude/GPT            │
│  ↓                                      │
│  Model asks for a tool                  │
│  ↓                                      │
│  Shell runs read/write/edit/bash/etc.   │
│  ↓                                      │
│  Result goes back to model              │
│  ↓                                      │
│  Model says done                        │
└─────────────────────────────────────────┘
```

Zero package dependencies. Under 50KB. 7 tools. 2 providers. 1 file.

OpenAI uses /v1/responses with Responses-style tools and max_output_tokens. Anthropic uses /v1/messages. The parser is targeted awk, not a general JSON implementation.
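
For flavor, the Anthropic side reduces to a single request like the following (simplified sketch; the real call also sends tools and a system prompt). The endpoint, headers, and body shape follow the documented Messages API; requires `ANTHROPIC_API_KEY` in the environment.

```sh
curl -s https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-opus-4-7","max_tokens":1024,
       "messages":[{"role":"user","content":"say hi"}]}'
```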


pu.sh writes .pu-history.json for resumable model memory and .pu-events.jsonl for event replay/export. Long sessions auto-compact by summarizing older transcript entries and keeping a bounded recent tail.
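
The bounded-tail half of compaction is simple enough to sketch (illustrative; `keep_recent` is a made-up helper, and the real script also writes a summary of what it drops):

```sh
#!/bin/sh
# Keep only the last max_bytes of a transcript file.
keep_recent() {  # usage: keep_recent file max_bytes
  size=$(wc -c < "$1")
  if [ "$size" -gt "$2" ]; then
    tail -c "$2" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
  fi
}

# Demo: 100 bytes in, 10 bytes kept.
tmp=$(mktemp)
i=0; while [ $i -lt 10 ]; do printf '0123456789' >> "$tmp"; i=$((i+1)); done
keep_recent "$tmp" 10
kept=$(wc -c < "$tmp" | tr -d ' ')
echo "kept $kept bytes"
rm -f "$tmp"
```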

For details, see How pu works.

Testing

```sh
# No API calls, no cost. Current expected result: PASS: 105 FAIL: 0.
bash eval/test_real.sh

# Shell syntax.
sh -n pu.sh
```

The current regression suite covers:

  • JSON escaping and targeted JSON extraction
  • Anthropic and OpenAI response parsing
  • OpenAI Responses request shape and reasoning gating
  • OpenAI tool continuation with `reasoning` + `function_call_output`
  • API-error reporting, curl transport failures, model-error hints, and non-retryable auth errors
  • First-run key sanitization and safe allowlist `~/.pu.env` loading
  • Default history save/resume of final assistant responses
  • Context compaction invariants
  • Tool truncation
  • Edit uniqueness/mode preservation and actionable edit-failure guidance
  • grep/find noisy-directory exclusions and the `/effort` command
  • Trailing-newline preservation for `write`/`edit`
  • `read` with `limit:0`
  • Spinner quietness on non-TTY stderr

The bugs we found so you don't have to

  1. `set -e` is a serial killer. `[ -f file ] && do_thing` returns 1 when the file doesn't exist, and under `set -e` that nonzero status can silently kill your script. We use `set -u`, not `set -e`.
  2. macOS sed ≠ GNU sed. The classic multiline `sed` trick breaks on BSD sed. Use `awk`.
  3. `jq` was a dependency. We wrote targeted `awk` JSON extraction to keep the install at zero dependencies.
  4. Heredocs don't survive JSON reliably. The system prompt steers models to the `write` tool instead.
  5. OpenAI tool calling is not Chat Completions anymore. For reasoning + tools, pu.sh uses the Responses API, carries `reasoning` items forward, and sends `function_call_output` items.
  6. Shell command substitution eats trailing newlines. `write` and `edit` use sentinel capture to preserve the final `\n`.
  7. Generic status spam is worse than silence. If the model doesn't provide a real pre-tool preamble, pu.sh just prints the actual tool call instead of "Inspecting with tools..." forever.
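
Pitfalls 1 and 6 are easy to reproduce in a few lines (illustrative demo, not pu.sh code):

```sh
#!/bin/sh
# Pitfall 1: a failed `[ -f ... ] &&` guard as the last command of a
# function makes the function return 1, and set -e then kills the caller.
guard() { [ -f /no/such/file ] && cat /no/such/file; }
status=$( ( set -e; guard; echo unreachable ) >/dev/null 2>&1; echo $? )
echo "set -e subshell exited with: $status"   # 1, "unreachable" never ran

# Pitfall 6: $(...) strips trailing newlines; a sentinel preserves them.
tmp=$(mktemp); printf 'line\n\n' > "$tmp"     # 6 bytes on disk
naive=$(cat "$tmp")                           # trailing \n\n lost -> 4 bytes
kept=$(cat "$tmp"; printf x); kept=${kept%x}  # sentinel keeps them -> 6 bytes
echo "naive=$(printf '%s' "$naive" | wc -c | tr -d ' ') kept=$(printf '%s' "$kept" | wc -c | tr -d ' ')"
rm -f "$tmp"
```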

Prior art & credits

pu.sh is a derived work, and we want to be loud about it. The system prompt structure, 7-tool surface (bash read write edit grep find ls), exact-text editing model, context-file convention, and skill/template ideas are inspired by Pi. Huge thanks and respect to the Pi team.

We compare against Pi feature-by-feature. Pi wins on extensibility, TUI, providers, safety, and production polish. pu.sh wins on portability and inspectability.

FAQ

Is this production-ready? It's called pu.sh. It's an under-50KB slop cannon that talks to LLM APIs via curl. You tell me.

Should I use this instead of Pi/Claude Code/Cursor? For daily coding, probably not. Use a real tool. For CI/CD, containers, edge boxes, quick scripts, or understanding how agents actually work — ./pu.sh and see what happens.

How do I pronounce it? However makes your coworkers the most uncomfortable.

Did you really name a coding agent after feces? It's pu.sh. As in push. As in ./pu.sh "deploy to prod". The fact that it sounds like something else is entirely coincidental and we are very serious engineers.

Did an AI write this? An AI and a human ran experiments, argued with shell, broke OpenAI schemas, fixed them, and learned once again that the real production incident was set -e all along.

License

MIT — see LICENSE. It's under 50KB. Go nuts.
