namegen

I've always wanted a reliable way to generate plausible-sounding names for new projects.

namegen is a Go CLI app that randomly generates short, pronounceable names.

The current implementation uses three layers:

Template-based random word construction (using a vowel/consonant rhythm)
Rule-based filtering and penalties
Corpus-trained bigram scoring

Install

git clone https://github.com/dashmage/namegen.git
cd namegen
go build
./namegen

Usage

# by default, namegen generates 10 random 5-letter names
$ namegen
matog
sebeg
xaire
cuzer
moevy
lagok
hukar
pemox
pasit
rioqu

# generate three 6-letter names
$ namegen --count=3 --length=6
nezila
pepyom
fozlar

# optional seed value for deterministic output
$ namegen --count=5 --length=5 --seed=42
libuf
padai
saire
keipy
jifat

Here are all the available flags (defined in internal/cli/config.go):

--attempts   max random word generation attempts
--count      number of words to generate
--length     generated word length
--seed       optional RNG seed for reproducible output
--threshold  minimum acceptance score
--debug      print scores and generation diagnostics
--tune       print stats for all randomly generated words, for tuning values

How does it work?

At a high level, the CLI loops until it has produced the requested number of words or has used up its attempt budget (--attempts). Each iteration does the following:

Build a candidate word with a weighted rhythm template (CV, CVC, CVV, VC)
Apply hard rules (rules that reject the word immediately on failure)
Apply soft rules (rules that subtract penalties)
Apply a bigram score adjustment from the trained model
Accept the candidate if final score is above threshold
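
The loop above can be sketched roughly as follows. This is a minimal illustration, not the code in internal/gen: the Pipeline type, its field names, and BaseScore are hypothetical, and the stages in main are toy stand-ins so the sketch runs on its own.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Pipeline bundles the stages of the loop. All names are hypothetical
// and are not taken from the namegen source.
type Pipeline struct {
	Build     func(length int) string   // template-based candidate
	HardOK    func(word string) bool    // false means immediate reject
	Penalty   func(word string) float64 // sum of soft-rule penalties
	Bigram    func(word string) float64 // adjustment from the bigram model
	BaseScore float64                   // assumed starting score
	Threshold float64
}

// Generate loops until it has count words or has used up attempts tries.
func (p Pipeline) Generate(count, length, attempts int) []string {
	var out []string
	for try := 0; try < attempts && len(out) < count; try++ {
		w := p.Build(length)
		if !p.HardOK(w) {
			continue // hard rule failed: drop the candidate
		}
		score := p.BaseScore - p.Penalty(w) + p.Bigram(w)
		if score > p.Threshold {
			out = append(out, w)
		}
	}
	return out
}

func main() {
	rng := rand.New(rand.NewSource(42))
	p := Pipeline{
		// Toy stages: uniform letters, one trivial hard rule, no penalties.
		Build: func(length int) string {
			b := make([]byte, length)
			for i := range b {
				b[i] = byte('a' + rng.Intn(26))
			}
			return string(b)
		},
		HardOK:    func(w string) bool { return w[0] != 'x' },
		Penalty:   func(w string) float64 { return 0 },
		Bigram:    func(w string) float64 { return 0 },
		BaseScore: 1.0,
		Threshold: 0.5,
	}
	fmt.Println(p.Generate(3, 5, 100))
}
```

Passing the stages as functions keeps the loop itself small and makes it easy to swap in stricter rules while tuning.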

The core flow is implemented in:

internal/gen/generator.go
internal/gen/rules.go
internal/gen/score.go
internal/gen/model.go

Template-based random word generation

Instead of drawing each letter uniformly from a-z, candidates are built from vowel/consonant patterns to create more natural rhythm.

C = consonant
V = vowel

The generator samples from weighted templates:

CV (weight 5)
CVC (weight 6)
CVV (weight 2)
VC (weight 1)

Templates are concatenated until the requested length is reached, then trimmed to exact length.

Additional shaping:

prevent VVV triplets by converting the middle V to C
slightly bias final character toward consonants
de-emphasize y in vowel sampling

Compared with fully uniform random letters, this structure makes candidates far more pronounceable. See internal/gen/generator.go for the details.
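
As an illustration of the template step, here is a minimal sketch of weighted template sampling, trimming, and the VVV fix. The template weights come from the list in this section; everything else (function names, the vowel string used to make y rarer) is an assumption, and the final-consonant bias is left out.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// Weighted rhythm templates, as listed in this section.
var templates = []struct {
	pattern string
	weight  int
}{{"CV", 5}, {"CVC", 6}, {"CVV", 2}, {"VC", 1}}

const (
	consonants = "bcdfghjklmnpqrstvwxz"
	// Assumption: repeating the core vowels makes y one draw in sixteen.
	vowels = "aaaeeeiiiooouuuy"
)

// newRNG returns a seeded generator so runs are reproducible.
func newRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// pickTemplate draws one template with probability weight/total.
func pickTemplate(rng *rand.Rand) string {
	total := 0
	for _, t := range templates {
		total += t.weight
	}
	n := rng.Intn(total)
	for _, t := range templates {
		if n < t.weight {
			return t.pattern
		}
		n -= t.weight
	}
	return templates[0].pattern // not reached: n is always below total
}

// buildPattern concatenates templates, trims to length, and turns the
// middle V of any VVV run into C.
func buildPattern(rng *rand.Rand, length int) string {
	var sb strings.Builder
	for sb.Len() < length {
		sb.WriteString(pickTemplate(rng))
	}
	p := []byte(sb.String()[:length])
	for i := 1; i+1 < len(p); i++ {
		if p[i-1] == 'V' && p[i] == 'V' && p[i+1] == 'V' {
			p[i] = 'C'
		}
	}
	return string(p)
}

// fill replaces each C or V slot with a random letter of that class.
func fill(rng *rand.Rand, pattern string) string {
	out := make([]byte, len(pattern))
	for i := range out {
		set := vowels
		if pattern[i] == 'C' {
			set = consonants
		}
		out[i] = set[rng.Intn(len(set))]
	}
	return string(out)
}

func main() {
	rng := newRNG(42)
	for i := 0; i < 3; i++ {
		p := buildPattern(rng, 5)
		fmt.Println(p, fill(rng, p))
	}
}
```

A VVV run can only appear where CVV is followed by VC, so a single left-to-right pass is enough to remove it.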

Rules: hard vs soft

Rules are separated by behavior:

Hard rules: immediate reject
Soft rules: keep candidate, subtract score

Hard rules

three consecutive consonants
illegal ending characters
missing a core vowel (a/e/i/o/u)
triple repeated letters
disallowed consonant adjacency

Soft rules

uncommon or awkward sequences (qx, jq, qj, etc.)
q not followed by u
too many rare letters (j, q, x, z)
repeated identical vowel pairs
doubled consonant endings
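
To make the hard/soft split concrete, here is a small sketch covering three of the hard rules and two of the soft rules. It is not the code in internal/gen/rules.go: the function names, the penalty values, the "two or more rare letters" cutoff, and treating y as a vowel when counting consonant runs are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// isVowel treats y as a vowel when counting consonant runs (an assumption).
func isVowel(b byte) bool { return strings.IndexByte("aeiouy", b) >= 0 }

// hardReject returns the name of the first failed hard rule, or "" if the
// word passes. Only three of the five hard rules are sketched here.
func hardReject(w string) string {
	if !strings.ContainsAny(w, "aeiou") {
		return "missing core vowel"
	}
	run := 0
	for i := 0; i < len(w); i++ {
		if isVowel(w[i]) {
			run = 0
		} else {
			run++
		}
		if run >= 3 {
			return "three consecutive consonants"
		}
		if i >= 2 && w[i] == w[i-1] && w[i] == w[i-2] {
			return "triple repeated letter"
		}
	}
	return ""
}

// softPenalty sums penalties for two soft rules. The values are placeholders.
func softPenalty(w string) float64 {
	const qWithoutU, tooManyRare = 0.3, 0.2
	penalty, rare := 0.0, 0
	for i := 0; i < len(w); i++ {
		if strings.IndexByte("jqxz", w[i]) >= 0 {
			rare++
		}
		if w[i] == 'q' && (i+1 == len(w) || w[i+1] != 'u') {
			penalty += qWithoutU
		}
	}
	if rare >= 2 {
		penalty += tooManyRare
	}
	return penalty
}

func main() {
	for _, w := range []string{"lora", "strap", "baaad", "qazi"} {
		if reason := hardReject(w); reason != "" {
			fmt.Printf("%s: rejected (%s)\n", w, reason)
			continue
		}
		fmt.Printf("%s: penalty %.1f\n", w, softPenalty(w))
	}
}
```

Here "strap" and "baaad" are rejected outright, while "qazi" survives with a penalty for its q without u and its two rare letters.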

Bigram model

The bigram model scores how plausible adjacent letter transitions are, based on a corpus.

BigramModel fields

BigramModel stores:

Count map[[2]byte]int
- counts of each transition, e.g. (t,h) -> 1842
Row map[byte]int
- total transitions leaving a character, e.g. t -> sum of all t -> *
Alpha float64
- Laplace smoothing factor

Constants:

StartToken = '^'
EndToken = '$'
VocabSize = 28 (a-z plus ^, $)

Training

For each corpus word:

normalize to lowercase a-z
add boundaries: ^word$
for each adjacent pair (a,b):
- Count[(a,b)]++
- Row[a]++
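
The training steps can be sketched like this. The struct fields and token constants follow the description above; the Train and normalize function names, and the choice to skip words that are empty after normalization, are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	StartToken = '^'
	EndToken   = '$'
)

// BigramModel mirrors the fields described above.
type BigramModel struct {
	Count map[[2]byte]int // transition counts, e.g. (t,h)
	Row   map[byte]int    // total transitions leaving a character
	Alpha float64         // Laplace smoothing factor
}

// normalize lowercases the word and drops everything outside a-z.
func normalize(word string) string {
	var sb strings.Builder
	for _, r := range strings.ToLower(word) {
		if r >= 'a' && r <= 'z' {
			sb.WriteRune(r)
		}
	}
	return sb.String()
}

// Train counts every adjacent pair in ^word$ for each corpus word.
func Train(corpus []string, alpha float64) *BigramModel {
	m := &BigramModel{
		Count: map[[2]byte]int{},
		Row:   map[byte]int{},
		Alpha: alpha,
	}
	for _, raw := range corpus {
		w := normalize(raw)
		if w == "" {
			continue // assumption: skip words with no a-z letters
		}
		s := string(StartToken) + w + string(EndToken)
		for i := 0; i+1 < len(s); i++ {
			m.Count[[2]byte{s[i], s[i+1]}]++
			m.Row[s[i]]++
		}
	}
	return m
}

func main() {
	m := Train([]string{"lena", "lora", "nora", "mila", "mira", "sora"}, 0.5)
	fmt.Println(m.Count[[2]byte{'o', 'r'}], m.Row['o']) // prints "3 3"
}
```

Using a [2]byte array as the map key avoids allocating a string per pair, and it works because normalization leaves only single-byte characters.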

Laplace smoothing

Without smoothing, an unseen transition has probability 0, and a single zero factor makes the whole word's probability 0.

Laplace smoothing avoids that:

P(b|a) = (Count(a,b) + alpha) / (Row(a) + alpha * VocabSize)

This keeps unseen pairs possible but still low-probability.
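
A minimal sketch of the smoothed probability, taking the two counts directly. The Laplace function name and its signature are hypothetical; the formula and VocabSize are the ones given above.

```go
package main

import "fmt"

const VocabSize = 28 // a-z plus ^ and $

// Laplace returns the smoothed P(b|a) given Count(a,b) and Row(a).
func Laplace(count, row int, alpha float64) float64 {
	return (float64(count) + alpha) / (float64(row) + alpha*VocabSize)
}

func main() {
	fmt.Printf("seen once:     %.4f\n", Laplace(1, 3, 0.5)) // 0.0882
	fmt.Printf("unseen:        %.4f\n", Laplace(0, 3, 0.5)) // 0.0294
	fmt.Printf("unseen, a = 0: %.4f\n", Laplace(0, 3, 0))   // 0.0000
}
```

Adding alpha * VocabSize to the denominator is what keeps the probabilities over all 28 possible next symbols summing to 1.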

Log probability

A word's probability is the product of many small values. Such products shrink quickly toward zero, can underflow, and are hard to compare when debugging.

Using logs converts products into sums:

log P(word) = sum(log P(next|current))

The model uses average log probability so scores are comparable across lengths.
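
The averaging step can be sketched as below. AvgLogProb is a hypothetical name, and the model is passed in as a plain function so the example stays self-contained. A uniform stand-in model is used to show that the average does not depend on word length.

```go
package main

import (
	"fmt"
	"math"
)

// AvgLogProb adds ln P(next|current) over every transition in ^word$ and
// divides by the number of transitions. prob must return a value above 0.
func AvgLogProb(word string, prob func(a, b byte) float64) float64 {
	s := "^" + word + "$"
	sum := 0.0
	for i := 0; i+1 < len(s); i++ {
		sum += math.Log(prob(s[i], s[i+1]))
	}
	return sum / float64(len(s)-1)
}

func main() {
	// Stand-in model: every transition is equally likely.
	uniform := func(a, b byte) float64 { return 1.0 / 28 }
	fmt.Printf("%.4f\n", AvgLogProb("lora", uniform))   // -3.3322
	fmt.Printf("%.4f\n", AvgLogProb("fozlar", uniform)) // -3.3322
}
```

A raw log sum would penalize the 6-letter word for having two extra transitions. Dividing by the transition count removes that bias.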

End-to-end example

Corpus words:

lena, lora, nora, mila, mira, sora

Candidate:

lora

Transitions with boundaries:

^ -> l
l -> o
o -> r
r -> a
a -> $

Assume alpha = 0.5, VocabSize = 28, and trained counts give:

Count(^,l)=2, Row(^)=6
Count(l,o)=1, Row(l)=3
Count(o,r)=3, Row(o)=3
Count(r,a)=4, Row(r)=4
Count(a,$)=6, Row(a)=6

Then:

P(l|^) = (2+0.5)/(6+14) = 0.125, ln = -2.079
P(o|l) = (1+0.5)/(3+14) = 0.0882, ln = -2.428
P(r|o) = (3+0.5)/(3+14) = 0.2059, ln = -1.580
P(a|r) = (4+0.5)/(4+14) = 0.2500, ln = -1.386
P($|a) = (6+0.5)/(6+14) = 0.3250, ln = -1.124

Log sum:

-8.597

Average log probability:

-8.597 / 5 = -1.719
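
The arithmetic above can be reproduced with a short program. This is a sketch under the same assumptions as the example (alpha = 0.5, VocabSize = 28, the six-word corpus); the score function is hypothetical and is not the scoring code in internal/gen. Because it sums unrounded logarithms, it prints -8.598 and -1.720, which differ in the last digit from the hand-rounded -8.597 and -1.719.

```go
package main

import (
	"fmt"
	"math"
)

const (
	alpha     = 0.5
	vocabSize = 28
)

// score trains bigram counts on corpus, then prints each smoothed
// transition probability for word and returns the log sum and average.
func score(corpus []string, word string) (logSum, avg float64) {
	count := map[[2]byte]int{}
	row := map[byte]int{}
	for _, w := range corpus {
		s := "^" + w + "$"
		for i := 0; i+1 < len(s); i++ {
			count[[2]byte{s[i], s[i+1]}]++
			row[s[i]]++
		}
	}
	s := "^" + word + "$"
	for i := 0; i+1 < len(s); i++ {
		a, b := s[i], s[i+1]
		p := (float64(count[[2]byte{a, b}]) + alpha) /
			(float64(row[a]) + alpha*vocabSize)
		fmt.Printf("P(%c|%c) = %.4f, ln = %.3f\n", b, a, p, math.Log(p))
		logSum += math.Log(p)
	}
	return logSum, logSum / float64(len(s)-1)
}

func main() {
	corpus := []string{"lena", "lora", "nora", "mila", "mira", "sora"}
	logSum, avg := score(corpus, "lora")
	fmt.Printf("log sum = %.3f, average = %.3f\n", logSum, avg)
}
```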

Scoring flow example:

hard rules pass
no soft penalties triggered
probability band for -1.719 gives a small bonus
final score stays above acceptance threshold
candidate accepted
