9 changes: 6 additions & 3 deletions AGENT.md
@@ -1,12 +1,15 @@
A macOS tool to capture screenshots of terminal application windows (Terminal, iTerm2, Ghostty, kitty, etc.) and inject keystrokes into them. Uses Core Graphics APIs.
A cross-platform tool (macOS + Linux) to capture screenshots of terminal application windows and inject keystrokes into them via a Telegram bot. Uses Core Graphics on macOS, X11/XTest on Linux.

# File Structure

Put the file structure of the project below. Update if needed.

```
bot.c - Telegram bot main source
Makefile - Build system
bot.c - Telegram bot main source (platform-independent)
platform.h - Platform abstraction interface
platform_macos.c - macOS backend (Core Graphics + Accessibility)
platform_linux.c - Linux backend (X11 + XTest + libpng)
Makefile - Build system (auto-detects OS)
botlib.*, sds.*, cJSON.*, sqlite_wrap.*, json_wrap.* - From botlib
qrcodegen.c, qrcodegen.h - QR code generation (Nayuki, MIT license)
sha1.c, sha1.h - SHA-1 + HMAC-SHA1 (Steve Reid, public domain)
37 changes: 28 additions & 9 deletions Makefile
@@ -1,19 +1,38 @@
CC = clang
CFLAGS = -Wall -O2 -mmacosx-version-min=14.0
FRAMEWORKS = -framework CoreGraphics -framework CoreFoundation -framework ImageIO \
-framework CoreServices -framework ApplicationServices
LIBS = -lcurl -lsqlite3

OBJS = bot.o botlib.o sds.o cJSON.o sqlite_wrap.o json_wrap.o qrcodegen.o sha1.o
UNAME_S := $(shell uname -s)

ifeq ($(UNAME_S),Darwin)
CC = clang
CFLAGS = -Wall -O2 -mmacosx-version-min=14.0
PLATFORM_LIBS = -framework CoreGraphics -framework CoreFoundation \
-framework ImageIO -framework CoreServices \
-framework ApplicationServices
PLATFORM_OBJ = platform_macos.o
else ifeq ($(UNAME_S),Linux)
CC ?= gcc
CFLAGS = -Wall -O2
PLATFORM_LIBS = -lX11 -lXtst -lpng
PLATFORM_OBJ = platform_linux.o
endif

LIBS = -lcurl -lsqlite3 $(PLATFORM_LIBS)

OBJS = bot.o $(PLATFORM_OBJ) botlib.o sds.o cJSON.o sqlite_wrap.o \
json_wrap.o qrcodegen.o sha1.o

all: tgterm

tgterm: $(OBJS)
$(CC) $(CFLAGS) -o $@ $(OBJS) $(FRAMEWORKS) $(LIBS)
$(CC) $(CFLAGS) -o $@ $(OBJS) $(LIBS)

bot.o: bot.c botlib.h sds.h
bot.o: bot.c botlib.h sds.h platform.h
$(CC) $(CFLAGS) -c bot.c

platform_macos.o: platform_macos.c platform.h
$(CC) $(CFLAGS) -c platform_macos.c

platform_linux.o: platform_linux.c platform.h
$(CC) $(CFLAGS) -c platform_linux.c

botlib.o: botlib.c botlib.h sds.h cJSON.h sqlite_wrap.h
$(CC) $(CFLAGS) -c botlib.c

69 changes: 64 additions & 5 deletions README.md
@@ -21,21 +21,25 @@ This is how it works:
2. After you set up your TOTP, you send the bot the first message, and you become its owner. It will only accept queries from you (your Telegram ID) and will require you to authenticate with an OTP the first time, and again after a timeout.
3. At this point, you can ask for the list of terminal windows on your system with `.list`, connect to one of them with (for instance) `.2`, and then send any text, which will be "typed" into the window as if you were still at your computer. There are modifiers, ways to send `ESC`, and so forth, so you can do many things, like changing the visible tab.

Important: **this program only works on macOS for now.**
**Supported platforms:** macOS and Linux (X11).

## First run

Please note in advance that the **program requires two system permissions** to function:
### macOS permissions

The program requires two system permissions on macOS:

- **Screen Recording** — needed to capture terminal window screenshots.
- **Accessibility** — needed to inject keystrokes and raise windows.

MacOS will prompt you to grant these on first use. If screenshots or keystrokes silently fail, check System Settings → Privacy & Security.
macOS will prompt you to grant these on first use. If screenshots or keystrokes silently fail, check System Settings → Privacy & Security.

To setup the project:
### Setup

1. Create a Telegram bot via [@BotFather](https://t.me/botfather) and get the API key.
2. Install `libcurl` and `libsqlite3`. The project also uses my own `botlib` but it is included directly into the project, so no need to install anything.
2. Install dependencies:
- **macOS:** `libcurl` and `libsqlite3` (usually pre-installed).
- **Linux:** `sudo apt install libx11-dev libxtst-dev libpng-dev libcurl4-openssl-dev libsqlite3-dev`
3. Build with `make` and run:

```
@@ -89,6 +93,13 @@ Once connected to a window, any text you send is typed into it as keystrokes. A

Modifiers can be combined: `❤️💙x` sends Ctrl+Alt+X. A single modified keystroke (like `❤️c`) will not have an automatic newline appended.

**Navigation keys:**

- ⬆️ ⬇️ ⬅️ ➡️ — Arrow keys (command history, cursor movement)
- 🔼 🔽 — Page Up / Page Down (scrolling in vim, less, etc.)

Modifiers work with navigation too: `❤️⬆️` sends Ctrl+Up.
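
The modifier and navigation rules above can be sketched as a small prefix parser. This is an illustration only, not the project's code: the function names, the `MOD_*` flags, and the `nav_key` enum are hypothetical, and the mapping of ❤️ to Ctrl and 💙 to Alt is inferred from the `❤️💙x` and `❤️⬆️` examples. The sketch matches emoji by their UTF-8 bytes and tolerates an optional U+FE0F variation selector after each one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical modifier flags and navigation key codes (not from the project). */
enum { MOD_CTRL = 1 << 0, MOD_ALT = 1 << 1 };
enum nav_key { NAV_NONE, NAV_UP, NAV_DOWN, NAV_LEFT, NAV_RIGHT, NAV_PGUP, NAV_PGDN };

struct emoji_map { const char *utf8; int value; };

/* Assumed mapping: red heart = Ctrl, blue heart = Alt. */
static const struct emoji_map modifier_map[] = {
    { "\xE2\x9D\xA4",     MOD_CTRL },  /* U+2764  red heart  */
    { "\xF0\x9F\x92\x99", MOD_ALT  },  /* U+1F499 blue heart */
};

static const struct emoji_map nav_map[] = {
    { "\xE2\xAC\x86",     NAV_UP    },  /* U+2B06  up arrow    */
    { "\xE2\xAC\x87",     NAV_DOWN  },  /* U+2B07  down arrow  */
    { "\xE2\xAC\x85",     NAV_LEFT  },  /* U+2B05  left arrow  */
    { "\xE2\x9E\xA1",     NAV_RIGHT },  /* U+27A1  right arrow */
    { "\xF0\x9F\x94\xBC", NAV_PGUP  },  /* U+1F53C up button   */
    { "\xF0\x9F\x94\xBD", NAV_PGDN  },  /* U+1F53D down button */
};

/* Return the number of bytes matched if s starts with the given emoji
 * (plus an optional U+FE0F variation selector), or 0 if it does not. */
static size_t match_emoji(const char *s, const char *utf8) {
    size_t n = strlen(utf8);
    if (strncmp(s, utf8, n) != 0) return 0;
    if (strncmp(s + n, "\xEF\xB8\x8F", 3) == 0) n += 3;
    return n;
}

/* Look up the emoji at the start of s in a table; store its value and
 * return the bytes consumed, or return 0 when no entry matches. */
static size_t lookup(const struct emoji_map *map, size_t count,
                     const char *s, int *value) {
    for (size_t i = 0; i < count; i++) {
        size_t n = match_emoji(s, map[i].utf8);
        if (n != 0) { *value = map[i].value; return n; }
    }
    return 0;
}

/* Consume all leading modifier emoji, OR their flags into *mods, and
 * return a pointer to the first byte after them. */
const char *parse_modifiers(const char *s, int *mods) {
    assert(s != NULL && mods != NULL);
    *mods = 0;
    for (;;) {
        int flag = 0;
        size_t n = lookup(modifier_map, sizeof modifier_map / sizeof modifier_map[0], s, &flag);
        if (n == 0) return s;
        *mods |= flag;
        s += n;
    }
}

/* Return the navigation key at the start of s (NAV_NONE if there is none)
 * and store the number of bytes it occupies in *len. */
enum nav_key parse_nav_key(const char *s, size_t *len) {
    assert(s != NULL && len != NULL);
    int key = NAV_NONE;
    *len = lookup(nav_map, sizeof nav_map / sizeof nav_map[0], s, &key);
    return (enum nav_key)key;
}
```

A caller would run `parse_modifiers` first and then try `parse_nav_key` on the remainder, falling back to treating the remainder as ordinary text when it returns `NAV_NONE`.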

**Escape sequences:** `\n` sends Enter, `\t` sends Tab, `\\` sends a literal backslash.
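
As a rough sketch of the escape handling described above, the decoder below rewrites `\n`, `\t`, and `\\` in a message before it is turned into keystrokes. The function name and buffer-based signature are hypothetical, and copying unknown escapes through unchanged is an assumption; the diff does not show how the bot treats them.

```c
#include <assert.h>
#include <stddef.h>

/* Decode the three documented escapes from src into dst.
 * dst_size is the capacity of dst including the terminating NUL.
 * Returns the number of bytes written, excluding the NUL.
 * '\n' in the output stands for Enter and '\t' for Tab.
 * Assumption: an unknown escape such as "\q" is copied verbatim. */
size_t decode_escapes(const char *src, char *dst, size_t dst_size) {
    assert(src != NULL && dst != NULL && dst_size > 0);
    size_t out = 0;
    while (*src != '\0' && out + 1 < dst_size) {
        char c = *src++;
        if (c == '\\' && *src != '\0') {
            switch (*src) {
            case 'n':  c = '\n'; src++; break;
            case 't':  c = '\t'; src++; break;
            case '\\': c = '\\'; src++; break;
            default:   break;  /* keep the backslash; next char is copied next round */
            }
        }
        dst[out++] = c;
    }
    dst[out] = '\0';
    return out;
}
```

Decoding into a caller-supplied buffer keeps the sketch free of allocation; output is silently truncated if the buffer is too small, which a real implementation would probably want to report.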

### Screenshots
@@ -109,9 +120,57 @@ This tool allows remote control of terminal windows via Telegram. Given the sens

**Disabling TOTP.** If you don't want OTP authentication (not recommended), run with `--use-weak-security`. The bot will still enforce ownership but will not require OTP codes.
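
For context on what the OTP check involves, here is a minimal sketch of the two standard TOTP building blocks: the time-step counter from RFC 6238 and the dynamic truncation from RFC 4226. It is not the project's code. The function names are hypothetical, the 20-byte digest is assumed to come from an HMAC-SHA1 routine such as the one in `sha1.c`, and the 30-second period used in the usage note is the RFC default rather than a value confirmed by this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 6238: the moving factor is the number of whole time steps
 * elapsed since the Unix epoch. */
uint64_t totp_counter(uint64_t unix_time, uint32_t period) {
    assert(period > 0);
    return unix_time / period;
}

/* RFC 4226 dynamic truncation: turn a 20-byte HMAC-SHA1 digest into
 * a decimal code with the requested number of digits (1 to 9). */
uint32_t hotp_truncate(const uint8_t digest[20], unsigned digits) {
    assert(digest != NULL && digits >= 1 && digits <= 9);
    /* The low nibble of the last byte selects a 4-byte window. */
    unsigned offset = digest[19] & 0x0F;
    /* Read 31 bits big-endian; the top bit is masked off. */
    uint32_t bin = ((uint32_t)(digest[offset] & 0x7F) << 24)
                 | ((uint32_t)digest[offset + 1] << 16)
                 | ((uint32_t)digest[offset + 2] << 8)
                 |  (uint32_t)digest[offset + 3];
    uint32_t modulus = 1;
    for (unsigned i = 0; i < digits; i++) modulus *= 10;
    return bin % modulus;
}
```

A verifier would compute `totp_counter(now, 30)`, feed that counter as an 8-byte big-endian message to HMAC-SHA1 keyed with the shared secret, and compare `hotp_truncate(digest, 6)` with the code the user sent.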

## Headless / VM usage

tgterm can run on headless servers and VMs (no physical monitor) using Xvfb, a virtual X framebuffer. This is useful for controlling coding agents remotely from your phone.

### Quick setup

Install the required packages:

```
sudo apt install xvfb openbox xterm
```

Start the virtual display, a window manager, and a terminal:

```
Xvfb :99 -screen 0 1920x1080x24 &
DISPLAY=:99 openbox &
DISPLAY=:99 xterm -fa Monospace -fs 14 -geometry 120x40 &
```

Then run tgterm pointed at the virtual display:

```
DISPLAY=:99 ./tgterm --apikey <your-api-key>
```

You can launch as many xterm sessions as you need and switch between them with `.list` and `.1`, `.2`, etc. from Telegram.

### Resolution

Change the Xvfb screen size for higher-resolution screenshots. For example, 2560x1440:

```
Xvfb :99 -screen 0 2560x1440x24 &
```

### Terminal theme and colors

xterm reads `~/.Xresources` for appearance settings (font, colors, geometry). Load them with `xrdb -merge ~/.Xresources` before starting xterm. You can also pass settings directly via `-xrm` flags on the xterm command line.

### What you need

- **Xvfb** — virtual X server (renders to memory, no GPU needed).
- **A window manager** — openbox is lightweight and sufficient. Required so that tgterm can enumerate windows via `_NET_CLIENT_LIST`.
- **A terminal emulator** — xterm works everywhere. Any X11 terminal will do.

## Limitations

- **Deprecated macOS APIs.** The project uses older Core Graphics and Process Manager APIs for screenshot capture and window management. These produce compiler warnings on macOS 14+ but still work correctly, and provide good compatibility with older macOS versions. They will be replaced if and when Apple removes them.
- **Linux: X11 only.** The Linux backend requires an X11 display server. Wayland is not supported (XWayland may work).
- **Linux: screenshots require visible windows.** On Linux, the screenshot is captured from the root window at the window's position, so the window must be visible (not occluded). The bot raises the window before capturing, which handles this in most cases.
- **UTF-8 keystrokes.** Non-ASCII text (beyond the special emoji modifiers) is not handled correctly when sending keystrokes. Only ASCII characters are reliably injected.
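
To illustrate the Linux screenshot limitation above: when a window is captured from the root window at its on-screen position, any part of the window that lies outside the screen cannot be read. With Xlib, `XGetImage` raises a `BadMatch` error if the requested rectangle is not fully contained in the drawable. One way to guard against that is to clip the window rectangle to the root bounds first. The sketch below shows only that geometry step; the `rect` type and function name are hypothetical, and whether the Linux backend clips this way is not shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A rectangle in root-window coordinates (hypothetical helper type). */
struct rect { int x, y, w, h; };

/* Intersect a window rectangle with the root window bounds
 * (0, 0, root_w, root_h). On success, store the visible part in *out
 * and return true. Return false if the window is entirely off-screen,
 * in which case *out is left untouched. */
bool clip_to_root(struct rect win, int root_w, int root_h, struct rect *out) {
    assert(out != NULL && root_w > 0 && root_h > 0);
    int x0 = win.x < 0 ? 0 : win.x;
    int y0 = win.y < 0 ? 0 : win.y;
    int x1 = win.x + win.w > root_w ? root_w : win.x + win.w;
    int y1 = win.y + win.h > root_h ? root_h : win.y + win.h;
    if (x1 <= x0 || y1 <= y0) return false;  /* empty intersection */
    out->x = x0;
    out->y = y0;
    out->w = x1 - x0;
    out->h = y1 - y0;
    return true;
}
```

Clipping only handles windows that extend past the screen edge. A window covered by another window still yields the covering window's pixels, which is why the window is raised before capture.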

## Credits