Proteus is an information-theoretic programming language and inference engine built around the concept of infons -- structured information units that can represent numbers, strings, lists, and complex nested data. Proteus aims to bridge programming and natural language by combining a formal data model with natural language understanding capabilities.
Most programming languages treat data and natural language as fundamentally separate concerns. Proteus takes a different approach: its core data structure -- the infon -- ties formal types directly to vocabulary, so a data schema is also a semantic definition. A field named color isn't just a label; it carries the linguistic meaning and type constraints of the English word "color."
This matters when you need:
- Executable specifications -- Business rules, compliance policies, or domain models written in near-English that a machine can also reason about. No separate "requirements doc" that drifts from the implementation.
- Knowledge bases with built-in inference -- Define facts, constraints, and relationships as infons. The agenda-driven engine normalizes and merges them automatically, resolving ambiguities without hand-coded logic.
- Semantic data modeling -- Schema definitions where types are grounded in vocabulary, not arbitrary strings. Type-checking catches semantic mismatches, not just structural ones.
- Natural language interfaces -- Parse English input into typed infon structures using the built-in translator, then run inference over the result.
Proteus is not a general-purpose application language. It is a knowledge representation and inference engine designed for problems where the gap between human meaning and machine processing is the core challenge.
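As a rough illustration of what it means for a field name to carry type constraints, the Python sketch below models a tiny vocabulary in which each word maps to a test for the values it can meaningfully take. This is not Proteus code and does not use any Proteus API. The names `VOCABULARY` and `check_record`, and the word entries themselves, are hypothetical and chosen only for the example.

```python
# Hypothetical illustration, not Proteus code: a vocabulary maps each word
# to a predicate describing the values that word can meaningfully take.
VOCABULARY = {
    "color": lambda v: v in {"red", "green", "blue", "black", "white"},
    "weight_kg": lambda v: isinstance(v, (int, float)) and v > 0,
}


def check_record(record):
    """Return a list of (field, reason) pairs, one per semantic mismatch."""
    problems = []
    for field, value in record.items():
        if field not in VOCABULARY:
            # The field name is not a known word, so it has no meaning here.
            problems.append((field, "unknown word"))
        elif not VOCABULARY[field](value):
            # Structurally fine, but the value does not fit the word.
            problems.append((field, "value does not fit the word's meaning"))
    return problems
```

A record such as `{"color": 7}` is structurally valid (one field, one value) but is still reported, which is the kind of semantic mismatch the list above refers to.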
Knowledge Engineering -- Build ontologies where facts are queryable and composable. The Resources/bike.pr example defines a bicycle as a composition of typed parts (frame, seat, chain), each with inherited properties from a base thing type. The engine can infer relationships and validate consistency across the model.
Domain-Specific Language Definition -- The Resources/toyLang.pr example defines a grammar (identifiers, comparison operators, loops, statements) entirely in Proteus syntax. The inference engine handles parsing and type-checking for the defined language, making Proteus a meta-tool for building other languages.
Natural Language Processing Research -- The English translator includes thousands of inflection rules (plurals, verb forms, possessives, irregular cases). Researchers can test theories about how formal semantics and natural language interact using a system that treats both as first-class concerns.
Business Rules and Compliance -- Encode rules in a form that is both human-auditable and machine-executable. Because infons preserve linguistic meaning alongside formal structure, rule bases can be reviewed by domain experts who don't need to read code.
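To give a feel for what an inflection rule set looks like, here is a minimal Python sketch of rule-based pluralization with an irregular-case table consulted first. It is not taken from xlators/xlator_en.dog and covers only a handful of cases. The names `IRREGULAR_PLURALS` and `pluralize` are hypothetical.

```python
# Hypothetical sketch of rule-based pluralization. The real translator's
# rules are far more extensive; this covers only a few common patterns.
IRREGULAR_PLURALS = {"child": "children", "mouse": "mice", "person": "people"}


def pluralize(noun):
    """Return a plural form for a lowercase English noun (limited coverage)."""
    # Irregular cases are looked up before any general rule is tried.
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Consonant + "y" becomes "ies" (city -> cities, but day -> days).
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    # Sibilant endings take "es" (box -> boxes, dish -> dishes).
    for suffix in ("ch", "sh", "s", "x"):
        if noun.endswith(suffix):
            return noun + "es"
    # Default rule.
    return noun + "s"
```

The ordering matters: exceptions are checked first, then the more specific suffix rules, and the default applies only when nothing else matches.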
- Infon-based data model -- Programs are expressed as structured information ("infons") that support numbers, strings, lists, typed fields, and nested structures.
- Natural language integration -- Includes an English language translator (xlators/xlator_en.dog) for parsing and processing natural language constructs.
- Agenda-based inference engine -- Resolves relationships between infons through an agenda-driven normalization and merging process.
- Model and vocabulary management -- Define and look up typed words and their meanings via the built-in model manager.
- Infon viewer -- A standalone viewer application for inspecting infon structures.
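The agenda-driven process mentioned in the list above can be pictured as a generic worklist algorithm. The Python sketch below is an illustration of that general technique only, not the Proteus engine: it assumes two hypothetical item kinds, `("set", key, value)` to assert a value and `("same", a, b)` to declare two keys identical, and it merges values across identical keys until no work remains.

```python
from collections import defaultdict, deque


def run_agenda(items):
    """Process work items until the agenda is empty and return what is known.

    Hypothetical item shapes (assumptions for this sketch):
      ("set", key, value)  asserts that key has the given value
      ("same", a, b)       declares that keys a and b denote the same thing
    """
    known = {}                 # key -> resolved value
    links = defaultdict(set)   # key -> keys declared identical to it
    agenda = deque(items)
    while agenda:
        item = agenda.popleft()
        if item[0] == "same":
            _, a, b = item
            links[a].add(b)
            links[b].add(a)
            # Anything already known about one side becomes new work
            # for the other side.
            for src, dst in ((a, b), (b, a)):
                if src in known:
                    agenda.append(("set", dst, known[src]))
        else:
            _, key, value = item
            if key in known:
                if known[key] != value:
                    raise ValueError(
                        f"conflict on {key!r}: {known[key]!r} vs {value!r}"
                    )
                continue  # already merged, nothing new to schedule
            known[key] = value
            # Propagate the newly learned value to every linked key.
            for other in links[key]:
                agenda.append(("set", other, value))
    return known
```

Because each newly learned value schedules follow-up work instead of being applied by nested calls, the result does not depend on the order in which the items arrive, and contradictory assertions surface as an error.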
Proteus source files are written in CodeDog (.dog files), which compiles to C++. To build Proteus you will need:
- CodeDog -- the CodeDog compiler
- GNU C++ toolchain -- GCC/G++ on Linux (the primary supported platform)
- Python 3 -- for ruleMgr.py and related tooling
The default build configuration targets Linux with the GNU C++ toolchain. From the project root:
codedog Proteus.Lib.dog
The build line in Proteus.Lib.dog is:
LinuxTestBuild: Platform='Linux' Lang='CPP' LangVersion='GNU' testMode='makeTests';
Note: Proteus.Lib.dog includes WorldManager.dog, which is not currently present in the repository. You may need to obtain this file from the maintainers or check whether it is generated by a companion tool before building.
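Before running a build, it can help to confirm which included files are actually missing. The Python sketch below scans a source file for `#include` lines and reports any names that do not exist next to it. It assumes includes appear one per line as `#include` followed by a file name; the exact CodeDog include syntax may differ, and the helper name `missing_includes` is hypothetical.

```python
import re
from pathlib import Path

# Assumption: one include per line, written as `#include <file name>`.
# Adjust the pattern if the real CodeDog syntax differs.
INCLUDE_RE = re.compile(r"^\s*#include\s+(\S+)", re.MULTILINE)


def missing_includes(source_path):
    """Return the included file names that do not exist beside source_path."""
    source = Path(source_path)
    names = INCLUDE_RE.findall(source.read_text())
    # Resolve each name relative to the directory of the including file.
    return [name for name in names if not (source.parent / name).exists()]
```

Run against Proteus.Lib.dog from the project root, a non-empty result would list the files to obtain before invoking `codedog`.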
Proteus includes a CodeDog test suite and a C++ test harness:
# Build the test executable via the LinuxTestBuild config
codedog Proteus.Lib.dog
# Run the generated test executable
./TestProteus
# Compile and run the C++ test harness separately
g++ -g -std=c++11 infonTest.cpp -o infonTest && ./infonTest

| Path | Description |
|---|---|
| Proteus.Lib.dog | Main engine library and entry point |
| infonIO.dog | Infon input/output, parsing, and serialization |
| infonList.dog | Infon list data structures and operations |
| ModelManager.dog | Model and vocabulary management |
| Functions.dog | Built-in functions |
| debugSystems.dog | Debugging and diagnostic systems |
| clip.dog | Clipboard and utility operations |
| timeAccess.dog | Time access utilities |
| infonViewer.dog | Standalone infon viewer application |
| DB_workAround.dog | Database v1 workarounds (string utilities) |
| testInflect.dog | Inflection testing for the English translator |
| xlators/xlator_en.dog | English language translator |
| ProteusTests.dog | Test suite |
| ProteusDBServer.dog | Database server component |
| ruleMgr.py | Rule management (Python) |
| infonTest.cpp | Infon C++ test harness |
| Examples/ | Example Proteus programs |
| Resources/ | Sample .pr files and a web interface (web/) |
| theory/ | Experimental and theoretical work |

Here is a Tic-tac-toe game written in Proteus syntax (Examples/Tic-tac-toe.pr):
def tic_tac_toe: {
def playerSymbol: ['X' | 'O']
def slot: {T [' ' | 'X' | 'O'] | ...}
def row: *3+{slot| ...}
def board: *3+{row| ...}
def move: {column:1..3, row:1..3}
def player: {name playerSymbol moves:{T move| ...}}
def turn: {player, move, board}
def winner: [ 'X' 'O' 'Tie']
*2 + { player |
{%.name = userInput<:{prompt: "Player X, enter your name" %.playerSymbol="X"}}
{%.name = userInput<:{prompt: "Player O, enter your name" %.playerSymbol="O"}}
}
def play: {
turns: {T turn|
{playerSymbol:'X' move:player.0.move.0 board:{ *3+{' ' ' ' ' '}|...}}
#{ {
playerSymbol= !playerSymbol
move=player.playerSymbol.move = userInput<:{prompt: (%.name " please enter your move:")}
board.(move.column).(move.row) = playerSymbol
}
| ...}
{[ %turns.size==9 | CheckWinningBoard<: board]}
}
winner: [{%turns.size=9 %='Tie'} | turns.last.player]
}
}
tic_tac_toe.play
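The example calls `CheckWinningBoard`, whose definition is not shown here. As a point of comparison, the Python sketch below shows the check such a function would presumably perform. It is not the Proteus implementation; it assumes the board is a 3x3 list of rows holding `' '`, `'X'`, or `'O'`, and the name `check_winning_board` is hypothetical.

```python
def check_winning_board(board):
    """Return 'X' or 'O' if that symbol fills a row, column, or diagonal.

    Returns None when no line is complete. Assumes a 3x3 list of rows
    whose cells are ' ', 'X', or 'O'.
    """
    lines = []
    lines.extend(board)                                  # three rows
    lines.extend(list(col) for col in zip(*board))       # three columns
    lines.append([board[i][i] for i in range(3)])        # main diagonal
    lines.append([board[i][2 - i] for i in range(3)])    # anti-diagonal
    for line in lines:
        # A line wins only if it is non-empty and all three cells match.
        if line[0] != " " and line[0] == line[1] == line[2]:
            return line[0]
    return None
```

In the Proteus version the game ends when either nine turns have been played or the board check succeeds, which matches using a helper like this as a stopping condition.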
Proteus is under active development (version 0.8). Current work includes:
- Streaming normalization (work in progress)
- Syntax updates (withEach loop changes)
- Thread synchronization fixes
- Agenda ordering improvements
Known issue: Proteus.Lib.dog references WorldManager.dog via #include, but this file is not present in the repository.
- Bruce Long
- KT Lawrence
All Rights Reserved.
"This file is part of the "Proteus Language suite" All Rights Reserved."
Copyright (c) 2015-2023 Bruce Long