AI Detection · Posted by Nate Bernier

How Accurate Are AI Detectors? I Tested 5 of Them

17

I wanted to put AI detectors to the test myself, so I ran a small experiment. I took five different texts and ran them through five popular AI detectors. Here’s what happened.

My test texts were: a ChatGPT-generated essay on climate change (unedited), the same essay lightly edited by me, a student essay written entirely without AI, my own writing from a blog post, and a passage written by an ESL student.

The detectors I tested: Turnitin AI Detection, GPTZero, Originality.ai, Sapling AI Detector, and Writer.com’s AI Content Detector.

For the unedited ChatGPT essay, all five detectors correctly identified it as AI-generated. Good start.

For the lightly edited version, results got interesting. Two detectors still flagged it at 80%+ AI probability. Two dropped to around 50%. One said it was likely human-written. Same base text, just a few changes.

For the genuine student essay, three detectors correctly identified it as human. One flagged it at 30% AI probability. One flagged it at 45%. That last result would be enough to raise concerns in most schools.

My blog post was identified as human by all five. Not surprising since my writing style is pretty informal.

The ESL student passage was the most troubling. Three detectors scored it at 40–60% likely AI-generated. This is a known issue: non-native speakers who write in more formulaic patterns can trip these systems.

My takeaway: AI detectors are useful as one piece of evidence, but treating them as definitive proof is risky. The false positive rate is too high for that, especially for certain student populations.
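To put a rough number on that, here is a minimal Python sketch that tallies the results above for the three human-written texts. It rests on a few assumptions of mine: any score of 30% or higher counts as "elevated", the two detectors I didn't mention for the ESL passage count as not elevated, and the names `human_texts` and `elevated_rate` are just for illustration. With only 15 detector runs, treat the output as a back-of-the-envelope figure, not a real false positive rate.

```python
# Detector results on the three human-written texts, as described in the post.
# "elevated" = number of detectors reporting roughly 30%+ AI probability
# (the 30% cutoff is an assumption for this sketch, not a tool setting).
human_texts = {
    "student essay": {"elevated": 2, "total": 5},  # flagged at 30% and 45%
    "blog post": {"elevated": 0, "total": 5},      # all five said human
    "ESL passage": {"elevated": 3, "total": 5},    # three at 40-60%; other two assumed not elevated
}


def elevated_rate(results):
    """Share of detector runs that gave an elevated AI score."""
    elevated = sum(r["elevated"] for r in results.values())
    total = sum(r["total"] for r in results.values())
    return elevated / total


for name, r in human_texts.items():
    print(f"{name}: {r['elevated']}/{r['total']} detectors gave an elevated score")

print(f"overall: {elevated_rate(human_texts):.0%} of runs on human-written text")
```

By this count, 5 of the 15 runs on human-written text came back with an elevated score, and all of them landed on the student essay and the ESL passage.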

What results have you gotten when testing these tools?

7 replies


3

I'm not convinced these detection tools will ever be reliable enough for high-stakes decisions. The fundamental problem is that they're trying to distinguish between two types of text that are becoming increasingly similar. As AI models improve, the statistical patterns detectors rely on will become less distinctive. We might be investing in a technological dead end.

1

we had a parent threaten to sue after their kid got flagged. turns out the essay was genuine. now nobody in my department wants to use the detector at all. one bad experience ruined it for everyone

0

Seen this pattern before with other educational technologies. The initial resistance gives way to pragmatic integration. Give it time.

0

following this. we're getting turnitin next semester and i want to know what i'm dealing with

6

Good point. I hadn't considered that perspective. It reinforces the argument for school-level rather than board-level policy development.

3

i'm new to all this ai stuff. my school just told us to 'use our judgment' which is super helpful lol. this thread is really helping me understand what's going on tho

-1

This is EXACTLY what I needed. I've been trying to explain to my department head why we can't just rely on Turnitin scores and now I have actual data to back it up. Sharing this with my whole team tomorrow!