Sentence-Level AI Detection – Does the Granularity Actually Help?
Tested Walter AI humanizer on my own writing last year for a different thread (it works – that's the problem). Been doing more experiments since.
This time: sentence-level vs document-level detection. Tested Proofademic’s sentence-by-sentence breakdown against GPTZero’s document-level score and Turnitin’s overall percentage.
What I found: for hybrid essays (30-50% AI-assisted), document-level scores are basically useless. The AI sections dilute when averaged with the human sections. A 50% AI essay gets a 40% document score and everyone moves on.
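To make the dilution concrete, here is a toy sketch of the arithmetic. It assumes the document score is a plain average of per-sentence scores, which is my simplification and not how GPTZero or Turnitin are documented to work. The sentence scores are made up for illustration.

```python
def document_score(sentence_scores):
    """Plain mean of per-sentence scores (an assumed aggregation, not any vendor's method)."""
    return sum(sentence_scores) / len(sentence_scores)


# Hypothetical 20-sentence essay, scores in percent.
human_sentences = [10] * 10  # human-written half, low scores
ai_sentences = [70] * 10     # AI-assisted half, high scores
essay = human_sentences + ai_sentences

score = document_score(essay)
print(f"document-level score: {score}%")        # 40.0%
print(f"highest sentence score: {max(essay)}%")  # 70%
```

Half the sentences score 70, but the single number that comes out is 40, and nothing in it tells you where the high-scoring half sits.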
Proofademic’s sentence-level highlighting is genuinely different for this use case. Instead of one score, you see which specific sentences are flagged. In a hybrid essay, the AI sections tend to cluster. You can see if the introduction is human but the analysis paragraphs are AI, which matches how students actually use the tool.
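If you wanted to check the clustering yourself rather than eyeball it, a rough sketch is below. It assumes you have already turned the highlighting into one True/False flag per sentence by hand. The `flagged_runs` helper and the sample flags are hypothetical and not part of any tool's export.

```python
from itertools import groupby


def flagged_runs(flags):
    """Return (start_index, length) for each run of consecutive flagged sentences."""
    runs = []
    index = 0
    for is_flagged, group in groupby(flags):
        length = len(list(group))
        if is_flagged:
            runs.append((index, length))
        index += length
    return runs


# Hypothetical 12-sentence essay: True means the sentence was highlighted.
flags = [False, False, False, True, True, True, True, True, False, True, False, False]

runs = flagged_runs(flags)
print(runs)  # [(3, 5), (9, 1)]
```

One long run in the middle plus an isolated flag looks very different from the same number of flags scattered evenly, even though a document-level percentage would treat both the same.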
Still not evidence of anything – a heavily flagged sentence is not proof of AI use. But it gives you a much better starting point for a conversation with the student.
Not a ringing endorsement. Just: the granularity does something that document-level scores don’t.
4 Replies
still not evidence. but the clustering insight is useful if you're using it to inform a conversation.
the clustering observation is fascinating. I'd noticed the same thing anecdotally - in essays I suspected were hybrid, the suspicious sections weren't random, they were always the analytical paragraphs. the student wrote the intro because it's personal, and the conclusion because it's short, and used AI for the part that required the most original thinking. sentence-level detection would actually show that pattern where Turnitin's percentage just averages it out.
honestly the granularity is useful but also more overwhelming when you're reviewing 80 essays. the document-level score is faster even if it's less precise. haven't decided which I prefer yet.
the sentence-level display changed how I talk to students. instead of "the tool says 73% AI" I can say "these four sentences in particular look like this, does that match how you wrote them?" that specificity makes the conversation much more honest on both sides - and more often than not the student either confirms it or explains what actually happened.