AI Detection for ATS: Why False Positives Are the Real Risk

AI detection tools flag human writing as AI-generated more often than vendors admit. Here is why false positives should worry your hiring team.

You are thinking about adding an AI detector to your hiring pipeline. A tool that flags resumes written by ChatGPT before your team reads them. It sounds efficient. It sounds like the responsible thing to do.

The real risk is not AI-generated resumes slipping through — it is qualified candidates getting rejected because a detector guessed wrong.

The accuracy gap between claims and reality

AI detection vendors publish impressive numbers. GPTZero claims a 99% accuracy rate with a 0.24% false positive rate. Originality.ai advertises 96% accuracy.

Independent testing tells a different story. PCWorld’s evaluation of GPTZero found only 62% accuracy in real-world conditions. Originality.ai showed false positive rates between 8% and 12% in third-party comparisons, meaning it flagged human-written text as AI-generated roughly one in ten times.

Both tools have been documented flagging the United States Constitution and excerpts from The Da Vinci Code as AI-generated text (Fritz.ai, 2026). If a detector cannot reliably identify the Constitution as human-written, what happens when it reads a non-native English speaker’s resume?

What false positives mean in hiring

A false positive in academic plagiarism detection is an inconvenience. A student appeals, a professor reviews, the issue gets resolved.

A false positive in hiring is invisible. The candidate never finds out. Your team never sees the resume. A qualified person disappears from the pipeline, and you fill the role with whoever is left.

Recent keyword data from DataForSEO Labs (United States, English) shows “ai resume screening” at roughly 390 monthly searches. HR teams are searching for ways to handle AI in applications. But most results push detection tools without addressing what happens when those tools are wrong.

If you process 200 resumes for a role, nearly all of them human-written, and your detector has a 10% false positive rate, that is roughly 20 people flagged incorrectly. Some of them were your best candidates.
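To make the arithmetic concrete, here is a minimal Python sketch. The 200 resumes and 10% false positive rate come from the scenario above; the 30% AI share and 90% detection rate are assumptions added purely for illustration:

```python
# Worked version of the scenario above. The 200 resumes and 10% false
# positive rate come from the article; the AI share and detection rate
# below are assumptions added for illustration only.
resumes = 200
fpr = 0.10        # false positive rate: human resumes wrongly flagged

# The article's simple case: the whole pool is human-written.
print(f"False flags if all resumes are human: {resumes * fpr:.0f}")

# Assumed mixed pool: 30% of resumes really are AI-written, and the
# detector catches 90% of them.
ai_share = 0.30
tpr = 0.90

human_flagged = resumes * (1 - ai_share) * fpr  # 14 qualified humans
ai_flagged = resumes * ai_share * tpr           # 54 actual AI users

# Among everything the detector flags, how much is human-written?
share = human_flagged / (human_flagged + ai_flagged)
print(f"Human-written share of flagged resumes: {share:.0%}")
```

Under these assumptions, roughly one in five flagged resumes is human-written, and every one of those rejections is invisible to your team.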

The language bias problem

AI detectors analyze writing patterns — sentence length, vocabulary diversity, predictability. This approach has a documented weakness: it penalizes certain writing styles more than others.

Non-native English speakers tend to write with simpler sentence structures and more common vocabulary. These are exactly the patterns that detectors associate with AI-generated text. A resume written in careful, clear English by someone whose first language is Mandarin or Arabic looks “suspicious” to a model trained to spot machine output.
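To see why, consider the surface signals detectors typically compute. The sketch below uses two common ones, sentence-length variation (often called burstiness) and vocabulary diversity; this feature set is a simplified assumption, not any vendor's actual model:

```python
import re
import statistics

def surface_features(text: str) -> dict:
    """Compute the kind of stylistic signals detectors rely on."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    lengths = [len(s.split()) for s in sentences]
    words = text.lower().split()
    return {
        # Burstiness: variation in sentence length. Human prose tends
        # to vary more; uniform lengths read as machine-like.
        "sentence_length_stdev": statistics.pstdev(lengths),
        # Vocabulary diversity: unique words / total words. Simpler,
        # more repetitive vocabulary scores lower.
        "type_token_ratio": len(set(words)) / len(words),
    }

# A clear, simply structured resume line, the style many careful
# non-native writers use, scores "flat" on both signals:
sample = ("Managed a team of five engineers. Delivered the project "
          "on time. Improved system uptime. Reduced costs by ten percent.")
print(surface_features(sample))
```

Short, uniform sentences and common vocabulary drive both numbers down, which is exactly the profile of careful, plain English.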

This is not a theoretical concern. If your pipeline rejects candidates based on writing style analysis, you are introducing a bias that correlates with language background. For US employers, that creates both ethical problems and potential legal exposure.

Why detection does not improve hiring outcomes

Even a perfect detector — one with zero false positives — would not tell you anything useful about candidate quality. Knowing that a resume was written with AI assistance tells you nothing about whether the person can do the job.

A 2026 KraftCV survey found that 70% of job seekers use AI for resume writing. If you reject all of them, you are rejecting most of your applicant pool. If you reject only the ones the detector catches, you are penalizing the candidates who used less sophisticated tools while rewarding those who paraphrased the output enough to pass.

The signal is not actionable. It does not separate good candidates from bad ones. It separates detectable AI users from undetectable AI users.

What works instead

The alternative is not “do nothing.” It is to screen for what actually predicts job performance.

  • Document your criteria before screening starts: Use hard requirements — work authorization, certifications, specific technical skills — as the first filter. CriteriaMatch checks each resume against your defined criteria in seconds. The resume’s prose style is irrelevant if the candidate does not meet your non-negotiable requirements (see the sketch after this list).

  • Make decisions visible: When your team reviews candidates, the reasoning should be documented. Collaborative Candidate Assessment keeps shared comments, task assignments, and decision history on every candidate profile. If someone is rejected, the reason is on record — and it is never “the detector said so.”

  • Test claims in conversation: The best way to verify a resume is to ask about it. A well-structured interview catches exaggeration faster than any writing analysis tool. If the resume says “led a team of 12 engineers,” ask for specifics. The answer will tell you everything the detector cannot.
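To show how auditable a criteria-first filter can be, here is a minimal sketch of the general idea. This is not CriteriaMatch's implementation; the candidate fields and requirements are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    name: str
    work_authorization: bool
    certifications: set[str] = field(default_factory=set)
    skills: set[str] = field(default_factory=set)

# Hard requirements documented before screening starts. Each entry
# pairs a human-readable name with a check, so every rejection has
# a recorded cause, never "the detector said so".
HARD_REQUIREMENTS = [
    ("work authorization", lambda c: c.work_authorization),
    ("cloud certification", lambda c: "aws" in c.certifications),
    ("Python", lambda c: "python" in c.skills),
]

def screen(candidate: Candidate) -> tuple[bool, list[str]]:
    """Return whether the candidate passes, plus any failed criteria."""
    failed = [name for name, check in HARD_REQUIREMENTS if not check(candidate)]
    return (not failed, failed)

passed, failed = screen(Candidate("A. Example", True, {"aws"}, {"python", "sql"}))
print(passed, failed)  # True []
```

Every rejection carries a named criterion, so the decision record writes itself.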

The regulatory angle

The EU AI Act classifies AI systems used in employment decisions as “high-risk.” If your AI detector contributes to a hiring decision, you may need to demonstrate that it does not produce discriminatory outcomes. Given the documented language bias issues, that is a hard case to make.

Several US states and cities — including New York City and Illinois — already require disclosure and auditing of automated tools used in hiring. Adding an unaudited AI detection layer to your pipeline adds compliance risk without adding screening value.

For a broader look at how to use AI in hiring responsibly, see our guide on responsible AI in recruiting.

The honest tradeoff

AI detection tools are tempting because they offer a simple binary: AI or not AI. Hiring is not binary. It is a series of judgment calls about fit, skill, and potential.

Skip the detector. Build a screening process where the evidence comes from verified criteria and real conversations, not writing style analysis.

Explore CriteriaMatch