Proofademic vs GPTZero – Tested Both for a Month. Here’s What I Found.

Question

I've been testing detection tools seriously for about a year now and decided to do a proper head-to-head between Proofademic and GPTZero after seeing Proofademic mentioned a few times on here. Testing conditions: 40 essays from my grade 11 English classes this term. 15 confirmed AI-generated (students disclosed), 20 confirmed human (process documentation), 5 unknown...

Sean Okafor · Answer

the false positive rate is the real story. 10% is still unacceptably high for formal use, but GPTZero's 20% is genuinely problematic. your methodology is sound for a classroom study - what did the 5 unknown essays come back as?

Tanya Whitfield · Answer

a month is a solid testing window. most comparisons I see are single-session tests. the consistency data over time is what actually matters for classroom use and I don't see that in most reviews. bookmarking this one.

Ethan Roy · Answer

the French essay parity is the most interesting finding here. if it's accurate, that's genuinely different from the other tools we've tested.

Laura Bouchard · Answer

3 of the 5 unknowns were flagged high by both tools. 2 were low confidence on both. i had conversations with all 5 students - 2 admitted to AI assistance after we talked (both from the group that both tools flagged high), 3 denied it. consistent enough to be useful, but i'm not taking the detection result as conclusive on any of them.