
ai-code-detector

A heuristic-based tool that tries to figure out whether a chunk of code was written by AI or a proper human. It's not perfect, nothing is, but it picks up on the sort of tells that make you go "hmm, that's a bit sus" when reviewing a PR.

What it actually checks

  • Comment style: flags overly verbose comments that just restate the code, hedging language like "you may want to consider", and suspiciously long explanations of obvious things
  • Docstring saturation: every tiny helper function having a perfectly formatted docstring with Args/Returns/params? Yeah, humans don't do that
  • Naming: AI loves its userAuthenticationToken and databaseConnectionString. Real devs use auth_tok and db_conn because life's too short
  • Error handling: catches the classic overdefensive pattern where every possible edge case is guarded against, plus empty except: pass # handle this appropriately blocks
  • Uniformity: if every function in a file looks structurally identical, that's a red flag. Humans have moods, AI doesn't
  • Dead giveaways: # TODO: implement this sitting right above a fully working implementation, and other classics
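To give a feel for what a checker looks like, here's a minimal sketch of the comment-style one. The function name, the hedging patterns, and the flagged-fraction scoring rule are all illustrative guesses, not the repo's actual implementation:

```python
import re

# Hypothetical hedging phrases to look for in comments (made up for
# this sketch; the real pattern list lives in the repo's checkers).
HEDGING_PATTERNS = [
    r"you may want to consider",
    r"handle this appropriately",
    r"in a real[- ]world (scenario|application)",
]

def comment_style_score(code: str) -> float:
    """Return 0-1: the fraction of full-line comments that look hedgy."""
    comments = [ln for ln in code.splitlines() if ln.lstrip().startswith("#")]
    if not comments:
        return 0.0
    flagged = sum(
        1 for c in comments
        if any(re.search(p, c, re.IGNORECASE) for p in HEDGING_PATTERNS)
    )
    return flagged / len(comments)

sample = "\n".join([
    "# You may want to consider validating the input here",
    "x = parse(raw)",
    "# bump the counter",
    "count += 1",
])
print(comment_style_score(sample))  # one of two comments flagged -> 0.5
```

The real checkers weigh more signals than this, but they're all in the same spirit: regex over lines, normalise to 0-1.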

Languages supported

  • Python (most thorough)
  • JavaScript
  • Ruby
  • Go
  • Java

Python and JS have the best coverage across all checks. Go and Java are a bit thinner on the error handling side, contributions welcome innit.

Setup

You'll need Python 3.10+ (probably works on 3.8 but haven't tested).

git clone <this repo>
cd ai-code-detector
python -m venv venv
source venv/bin/activate
pip install pytest

Usage

Dead simple, just pass it a string of code and tell it what language:

from src.detector import scan

code = open("some_file.py").read()
report = scan(code, lang="python")

print(report.score)      # 0.0 to 1.0, higher = more likely AI
print(report.breakdown)  # per category scores

The scan function returns an AiReport with:

  • score: overall likelihood (0 to 1) that the code's AI generated
  • breakdown: dict of individual category scores so you can see what tripped it
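If you want to act on the report, something like the following works. Note the AiReport dataclass here is a toy stand-in so the snippet runs on its own (the real one comes from src.detector.scan), and the 0.7 threshold is a made-up example value, not a recommendation:

```python
from dataclasses import dataclass

# Stand-in for the real AiReport returned by src.detector.scan,
# with the two fields the README documents.
@dataclass
class AiReport:
    score: float
    breakdown: dict

report = AiReport(
    score=0.82,
    breakdown={"comments": 0.9, "naming": 0.4, "docstrings": 0.95},
)

# Gate on an arbitrary threshold and name the loudest category.
if report.score > 0.7:
    worst = max(report.breakdown, key=report.breakdown.get)
    print(f"likely AI, biggest tell: {worst}")
```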

Running the tests

source venv/bin/activate
python -m pytest tests/ -v

How it works (roughly)

It's all regex and heuristics, no ML, no API calls, runs entirely offline. Each checker looks at different aspects of the code and returns a score between 0 and 1. These get combined with a weighted average, plus a boost when multiple signals fire at once (because if the comments AND the naming AND the docstrings all look AI generated, that's way more damning than just one of them).
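The combination step described above can be sketched like so. The weights, the 0.5 "signal fired" threshold, and the 0.1-per-extra-signal boost are illustrative numbers, not the tuned values from the repo:

```python
# Assumed category weights (must sum to 1.0); purely illustrative.
WEIGHTS = {
    "comments": 0.3,
    "docstrings": 0.2,
    "naming": 0.2,
    "errors": 0.15,
    "uniformity": 0.15,
}

def combine(breakdown: dict) -> float:
    """Weighted average of category scores, boosted when several fire."""
    base = sum(WEIGHTS[k] * breakdown.get(k, 0.0) for k in WEIGHTS)
    # Count categories that fired (scored at least 0.5), and add a
    # small boost for each one beyond the first: agreement is damning.
    firing = sum(1 for v in breakdown.values() if v >= 0.5)
    boost = 0.1 * max(0, firing - 1)
    return min(1.0, base + boost)

print(combine({"comments": 1.0}))  # one signal, no boost -> 0.3
```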

The thresholds have been tuned against a bunch of test cases but they're definitely not gospel. False positives will happen, especially with devs who write very clean code. Don't go accusing your colleagues based solely on this tool, yeah?

Limitations

  • It's heuristics, not magic. A careful human editing AI output will fool it
  • Short snippets don't give it much to work with
  • Language support varies, Python gets the most love
  • It doesn't know about your project's conventions (yet), so it can't catch the "uses camelCase when the codebase uses snake_case" tell
  • Someone who genuinely writes pristine code will get false positives. Sorry about that

Roadmap

Got a few ideas for where this is headed over the next couple of weeks:

  • A proper web based version so you don't have to clone the repo and faff about in a terminal
  • GitHub integration so you can point it at a repo and let it chew through the code
  • Zip upload, drag a zipped project onto the page and get back a report
  • Looking at git history as part of the assessment, because sudden changes in commit style or a dev going from scrappy commits to pristine conventional ones is itself a tell
  • Proper reporting, maybe per file breakdowns so you can see which bits are suss
  • More languages: TypeScript, Rust, PHP, Kotlin, whatever people actually write these days

No promises on timelines, this is a side project and I've got a day job, but the rough plan is to get the web version going first and then layer the fancier stuff on top.

A word of warning

Look, this tool is not the final word on anything. It's a bunch of regex and heuristics and it will absolutely get things wrong, both ways. It would not hold up in court, it would not hold up in an academic misconduct hearing, and it probably shouldn't be the only thing you lean on when accusing someone of passing off AI code as their own. Use it as a nudge, a "maybe have a closer look at this PR", not as proof of anything.

That said, who knows where this little project takes us. Starts as a weekend hack, next thing you know it's a SaaS. Stranger things have happened.

Licence

Do whatever you want with it, mate.
