
ai-code-detector

A heuristic-based tool that tries to figure out whether a chunk of code was written by AI or a proper human. It's not perfect, nothing is, but it picks up on the sort of tells that make you go "hmm, that's a bit sus" when reviewing a PR.

What it actually checks

  • Comment style: flags overly verbose comments that just restate the code, hedging language like "you may want to consider", and suspiciously long explanations of obvious things
  • Docstring saturation: every tiny helper function having a perfectly formatted docstring with Args/Returns/params? Yeah, humans don't do that
  • Naming: AI loves its userAuthenticationToken and databaseConnectionString. Real devs use auth_tok and db_conn because life's too short
  • Error handling: catches the classic overdefensive pattern where every possible edge case is guarded against, plus empty except: pass # handle this appropriately blocks
  • Uniformity: if every function in a file looks structurally identical, that's a red flag. Humans have moods, AI doesn't
  • Dead giveaways: # TODO: implement this sitting right above a fully working implementation, and other classics
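To give a feel for what a checker looks like, here's a minimal sketch of the comment-style one. The function name, the hedging patterns, and the flagged-fraction scoring rule are all illustrative guesses, not the repo's actual implementation:

```python
import re

# Hypothetical hedging phrases to look for in comments (made up for
# this sketch; the real pattern list lives in the repo's checkers).
HEDGING_PATTERNS = [
    r"you may want to consider",
    r"handle this appropriately",
    r"in a real[- ]world (scenario|application)",
]

def comment_style_score(code: str) -> float:
    """Return 0-1: the fraction of full-line comments that look hedgy."""
    comments = [ln for ln in code.splitlines() if ln.lstrip().startswith("#")]
    if not comments:
        return 0.0
    flagged = sum(
        1 for c in comments
        if any(re.search(p, c, re.IGNORECASE) for p in HEDGING_PATTERNS)
    )
    return flagged / len(comments)

sample = "\n".join([
    "# You may want to consider validating the input here",
    "x = parse(raw)",
    "# bump the counter",
    "count += 1",
])
print(comment_style_score(sample))  # one of two comments flagged -> 0.5
```

The real checkers weigh more signals than this, but they're all in the same spirit: regex over lines, normalise to 0-1.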

Languages supported

  • Python (most thorough)
  • JavaScript
  • Ruby
  • Go
  • Java

Python and JS have the best coverage across all checks. Go and Java are a bit thinner on the error handling side, contributions welcome innit.

Setup

You'll need Python 3.10+ (probably works on 3.8 but haven't tested).

git clone <this repo>
cd ai-code-detector
python -m venv venv
source venv/bin/activate
pip install pytest

Usage

Dead simple, just pass it a string of code and tell it what language:

from src.detector import scan

code = open("some_file.py").read()
report = scan(code, lang="python")

print(report.score)      # 0.0 to 1.0, higher = more likely AI
print(report.breakdown)  # per category scores

The scan function returns an AiReport with:

  • score: overall likelihood (0 to 1) that the code's AI generated
  • breakdown: dict of individual category scores so you can see what tripped it
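If you want to act on the report, something like the following works. Note the AiReport dataclass here is a toy stand-in so the snippet runs on its own (the real one comes from src.detector.scan), and the 0.7 threshold is a made-up example value, not a recommendation:

```python
from dataclasses import dataclass

# Stand-in for the real AiReport returned by src.detector.scan,
# with the two fields the README documents.
@dataclass
class AiReport:
    score: float
    breakdown: dict

report = AiReport(
    score=0.82,
    breakdown={"comments": 0.9, "naming": 0.4, "docstrings": 0.95},
)

# Gate on an arbitrary threshold and name the loudest category.
if report.score > 0.7:
    worst = max(report.breakdown, key=report.breakdown.get)
    print(f"likely AI, biggest tell: {worst}")
```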

Running the tests

source venv/bin/activate
python -m pytest tests/ -v

How it works (roughly)

It's all regex and heuristics, no ML, no API calls, runs entirely offline. Each checker looks at different aspects of the code and returns a score between 0 and 1. These get combined with a weighted average, plus a boost when multiple signals fire at once (because if the comments AND the naming AND the docstrings all look AI generated, that's way more damning than just one of them).
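The combination step described above can be sketched like so. The weights, the 0.5 "signal fired" threshold, and the 0.1-per-extra-signal boost are illustrative numbers, not the tuned values from the repo:

```python
# Assumed category weights (must sum to 1.0); purely illustrative.
WEIGHTS = {
    "comments": 0.3,
    "docstrings": 0.2,
    "naming": 0.2,
    "errors": 0.15,
    "uniformity": 0.15,
}

def combine(breakdown: dict) -> float:
    """Weighted average of category scores, boosted when several fire."""
    base = sum(WEIGHTS[k] * breakdown.get(k, 0.0) for k in WEIGHTS)
    # Count categories that fired (scored at least 0.5), and add a
    # small boost for each one beyond the first: agreement is damning.
    firing = sum(1 for v in breakdown.values() if v >= 0.5)
    boost = 0.1 * max(0, firing - 1)
    return min(1.0, base + boost)

print(combine({"comments": 1.0}))  # one signal, no boost -> 0.3
```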

The thresholds have been tuned against a bunch of test cases but they're definitely not gospel. False positives will happen, especially with devs who write very clean code. Don't go accusing your colleagues based solely on this tool, yeah?

Limitations

  • It's heuristics, not magic. A careful human editing AI output will fool it
  • Short snippets don't give it much to work with
  • Language support varies, Python gets the most love
  • It doesn't know about your project's conventions (yet), so it can't catch the "uses camelCase when the codebase uses snake_case" tell
  • Someone who genuinely writes pristine code will get false positives. Sorry about that

Roadmap

Got a few ideas for where this is headed over the next couple of weeks:

  • A proper web based version so you don't have to clone the repo and faff about in a terminal
  • GitHub integration so you can point it at a repo and let it chew through the code
  • Zip upload, drag a zipped project onto the page and get back a report
  • Looking at git history as part of the assessment, because sudden changes in commit style or a dev going from scrappy commits to pristine conventional ones is itself a tell
  • Proper reporting, maybe per file breakdowns so you can see which bits are suss
  • More languages: TypeScript, Rust, PHP, Kotlin, whatever people actually write these days

No promises on timelines, this is a side project and I've got a day job, but the rough plan is to get the web version going first and then layer the fancier stuff on top.

A word of warning

Look, this tool is not the final word on anything. It's a bunch of regex and heuristics and it will absolutely get things wrong, both ways. It would not hold up in court, it would not hold up in an academic misconduct hearing, and it probably shouldn't be the only thing you lean on when accusing someone of passing off AI code as their own. Use it as a nudge, a "maybe have a closer look at this PR", not as proof of anything.

That said, who knows where this little project takes us. Starts as a weekend hack, next thing you know it's a SaaS. Stranger things have happened.

Licence

Do whatever you want with it, mate.
