I've always wanted a way to reliably generate plausible-sounding names for new projects.
namegen is a Go CLI app that randomly generates short, pronounceable names.
The current implementation uses three layers:
- Template-based random word construction (using a vowel/consonant rhythm)
- Rule-based filtering and penalties
- Corpus-trained bigram scoring
```
git clone https://github.com/dashmage/namegen.git
cd namegen
go build
./namegen
```

```
# by default, namegen generates 10 random 5-letter names
$ namegen
matog
sebeg
xaire
cuzer
moevy
lagok
hukar
pemox
pasit
rioqu
```
```
# generate 3 6-letter names
$ namegen --count=3 --length=6
nezila
pepyom
fozlar
```
```
# optional seed value for deterministic output
$ namegen --count=5 --length=5 --seed=42
libuf
padai
saire
keipy
jifat
```

Here are all the available flags (from internal/cli/config.go):

- `--attempts`: max random word generation attempts
- `--count`: number of words to generate
- `--length`: generated word length
- `--seed`: optional RNG seed for reproducible output
- `--threshold`: minimum acceptance score
- `--debug`: print scores and generation diagnostics
- `--tune`: print stats for all randomly generated words, for tuning values
At a high level, the CLI loops until it has produced the requested number of words or has exhausted `attempts` total tries:
- Build a candidate word with a weighted rhythm template (`CV`, `CVC`, `CVV`, `VC`)
- Apply hard rules (rules that reject the word immediately on failure)
- Apply soft rules (rules that subtract penalties)
- Apply a bigram score adjustment from the trained model
- Accept the candidate if the final score is above the threshold
The core flow is implemented in:

- `internal/gen/generator.go`
- `internal/gen/rules.go`
- `internal/gen/score.go`
- `internal/gen/model.go`
Instead of drawing each letter uniformly from a-z, candidates are built from vowel/consonant patterns to create more natural rhythm.
`C` = consonant, `V` = vowel
The generator samples from weighted templates:
- `CV` (weight 5)
- `CVC` (weight 6)
- `CVV` (weight 2)
- `VC` (weight 1)
Templates are concatenated until the requested length is reached, then trimmed to exact length.
Additional shaping:

- prevent `VVV` triplets by converting the middle `V` to `C`
- slightly bias the final character toward consonants
- de-emphasize `y` in vowel sampling
This structure dramatically improves pronounceability compared to fully uniform random letters. Check out generator.go to get a better idea.
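For instance, the `VVV` rule could be applied to a pattern like this (an illustrative sketch, not the generator.go implementation):

```go
package main

import "fmt"

// breakVVV converts the middle V of any VVV run to C, so a vowel
// never appears three times in a row in the pattern.
func breakVVV(pattern string) string {
	b := []byte(pattern)
	for i := 2; i < len(b); i++ {
		if b[i-2] == 'V' && b[i-1] == 'V' && b[i] == 'V' {
			b[i-1] = 'C'
		}
	}
	return string(b)
}

func main() {
	fmt.Println(breakVVV("CVVVC")) // CVCVC
}
```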
Rules are separated by behavior:
- Hard rules: immediate reject
- Soft rules: keep candidate, subtract score
Hard rules

- three consecutive consonants
- illegal ending characters
- missing a core vowel (`a`/`e`/`i`/`o`/`u`)
- triple repeated letters
- disallowed consonant adjacency
Soft rules

- uncommon or awkward sequences (`qx`, `jq`, `qj`, etc.)
- `q` not followed by `u`
- too many rare letters (`j`, `q`, `x`, `z`)
- repeated identical vowel pairs
- doubled consonant endings
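A few of these rules, sketched as standalone checks. The penalty values (0.5) and rule subset are placeholders for illustration, not the tuned weights in rules.go:

```go
package main

import (
	"fmt"
	"strings"
)

func isVowel(c byte) bool { return strings.IndexByte("aeiou", c) >= 0 }

// hardReject implements three of the hard rules: no core vowel,
// three consecutive consonants, and triple repeated letters.
func hardReject(w string) bool {
	if !strings.ContainsAny(w, "aeiou") {
		return true // missing a core vowel
	}
	run := 0
	for i := 0; i < len(w); i++ {
		if isVowel(w[i]) {
			run = 0
		} else {
			run++
			if run >= 3 {
				return true // three consecutive consonants
			}
		}
	}
	for i := 2; i < len(w); i++ {
		if w[i] == w[i-1] && w[i-1] == w[i-2] {
			return true // triple repeated letters
		}
	}
	return false
}

// softPenalty accumulates penalties without rejecting.
func softPenalty(w string) float64 {
	p := 0.0
	if i := strings.IndexByte(w, 'q'); i >= 0 && (i == len(w)-1 || w[i+1] != 'u') {
		p += 0.5 // q not followed by u
	}
	rare := 0
	for i := 0; i < len(w); i++ {
		if strings.IndexByte("jqxz", w[i]) >= 0 {
			rare++
		}
	}
	if rare > 1 {
		p += 0.5 // too many rare letters
	}
	return p
}

func main() {
	fmt.Println(hardReject("strng"), hardReject("matog")) // true false
	fmt.Println(softPenalty("qatoj"))
}
```

The split matters for throughput: hard rules short-circuit before any scoring work, while soft rules only nudge the final score.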
The bigram model scores how plausible adjacent letter transitions are, based on a corpus.
BigramModel stores:

- `Count map[[2]byte]int`: counts of each transition, e.g. `(t,h) -> 1842`
- `Row map[byte]int`: total transitions leaving a character, e.g. `t` -> sum of all `t -> *`
- `Alpha float64`: Laplace smoothing factor
Constants:

- `StartToken = '^'`
- `EndToken = '$'`
- `VocabSize = 28` (`a-z` plus `^`, `$`)
For each corpus word:

- normalize to lowercase `a-z`
- add boundaries: `^word$`
- for each adjacent pair `(a,b)`: `Count[(a,b)]++` and `Row[a]++`
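Put together, training might look like this sketch (`trainBigram` is a hypothetical name, and filtering to strict `a-z` is omitted for brevity):

```go
package main

import (
	"fmt"
	"strings"
)

const (
	StartToken = '^'
	EndToken   = '$'
)

// BigramModel matches the fields described above.
type BigramModel struct {
	Count map[[2]byte]int // transition counts
	Row   map[byte]int    // total transitions leaving a character
	Alpha float64         // Laplace smoothing factor
}

// trainBigram wraps each word in boundary tokens and counts every
// adjacent byte pair.
func trainBigram(words []string, alpha float64) *BigramModel {
	m := &BigramModel{
		Count: make(map[[2]byte]int),
		Row:   make(map[byte]int),
		Alpha: alpha,
	}
	for _, w := range words {
		s := string(StartToken) + strings.ToLower(w) + string(EndToken)
		for i := 0; i+1 < len(s); i++ {
			m.Count[[2]byte{s[i], s[i+1]}]++
			m.Row[s[i]]++
		}
	}
	return m
}

func main() {
	corpus := []string{"lena", "lora", "nora", "mila", "mira", "sora"}
	m := trainBigram(corpus, 0.5)
	fmt.Println(m.Count[[2]byte{'^', 'l'}], m.Row['^']) // 2 6
}
```

Note that `Row[a]` is just the sum of `Count[(a,*)]`, maintained incrementally so the denominator of the smoothed probability is an O(1) lookup.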
Without smoothing, unseen transitions have probability 0, which can collapse the whole word probability.
Laplace smoothing avoids that:
```
P(b|a) = (Count(a,b) + alpha) / (Row(a) + alpha * VocabSize)
```
This keeps unseen pairs possible but still low-probability.
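That formula in code, as a standalone sketch with `VocabSize = 28` as above:

```go
package main

import "fmt"

const VocabSize = 28 // a-z plus '^' and '$'

// smoothedProb applies Laplace smoothing to a single transition:
// P(b|a) = (Count(a,b) + alpha) / (Row(a) + alpha * VocabSize).
func smoothedProb(count, row int, alpha float64) float64 {
	return (float64(count) + alpha) / (float64(row) + alpha*VocabSize)
}

func main() {
	// An unseen transition (count 0) still gets a small, nonzero probability.
	fmt.Println(smoothedProb(0, 100, 0.5))
	fmt.Println(smoothedProb(2, 6, 0.5)) // 0.125
}
```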
Word probability is the product of many small values; multiplying them directly risks floating-point underflow and is harder to debug. Using logs converts the product into a sum:

```
log P(word) = sum(log P(next|current))
```
The model uses average log probability so scores are comparable across lengths.
Corpus words: `lena`, `lora`, `nora`, `mila`, `mira`, `sora`
Candidate: `lora`
Transitions with boundaries: `^ -> l`, `l -> o`, `o -> r`, `r -> a`, `a -> $`
Assume alpha = 0.5, VocabSize = 28, and trained counts give:
- `Count(^,l) = 2`, `Row(^) = 6`
- `Count(l,o) = 1`, `Row(l) = 3`
- `Count(o,r) = 3`, `Row(o) = 3`
- `Count(r,a) = 4`, `Row(r) = 4`
- `Count(a,$) = 6`, `Row(a) = 6`
Then:

- `P(l|^) = (2+0.5)/(6+14) = 0.125`, `ln = -2.079`
- `P(o|l) = (1+0.5)/(3+14) = 0.0882`, `ln = -2.428`
- `P(r|o) = (3+0.5)/(3+14) = 0.2059`, `ln = -1.580`
- `P(a|r) = (4+0.5)/(4+14) = 0.2500`, `ln = -1.386`
- `P($|a) = (6+0.5)/(6+14) = 0.3250`, `ln = -1.124`
Log sum: `-8.597`

Average log probability: `-8.597 / 5 = -1.719`
Scoring flow example:

- hard rules pass
- no soft penalties triggered
- the probability band for `-1.719` gives a small bonus
- final score stays above the acceptance threshold
- candidate accepted