Your CHRO forwarded an article about AI-generated resumes. Now someone on the team is researching detector APIs to plug into your ATS. Before you sign a contract, here is what to actually evaluate — because most teams find the signal is not worth the noise.
The question is not whether the API can detect AI. It is whether the detection result improves your hiring decisions.
The integration pitch
AI detector API vendors market a simple story: send resume text to their endpoint, get back a probability score — 0 to 100% likelihood of AI authorship — and use that score to flag or reject candidates.
The technical integration is usually straightforward. Most APIs accept plain text via a POST request and return a JSON response within a few seconds. The vendors will tell you it drops right into your existing workflow.
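To make the shape of that integration concrete, here is a minimal sketch in Python using only the standard library. The endpoint URL, header names, and the `ai_probability` response field are placeholders, not any specific vendor's schema; check the actual API documentation before wiring anything into your ATS.

```python
import json
import urllib.request


def build_detection_request(text: str, api_url: str, api_key: str) -> urllib.request.Request:
    """Package resume text as the JSON POST request detector APIs typically expect."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # auth scheme varies by vendor
        },
        method="POST",
    )


def detect_ai_probability(text: str, api_url: str, api_key: str) -> float:
    """Send the request and read back a 0-1 score.

    "ai_probability" is a placeholder field name; real vendors each
    define their own response schema.
    """
    request = build_detection_request(text, api_url, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return float(json.load(response)["ai_probability"])
```

The point is that the plumbing really is a few lines. Everything difficult about this integration happens after the score comes back.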
What they will not tell you is what to do with the score.
What to test: false positive rate
This is the number that should kill most integrations before they start.
Independent testing shows that GPTZero achieves roughly 62% accuracy in real-world conditions, despite claiming 99% in controlled benchmarks (PCWorld, 2026). Originality.ai shows false positive rates between 8% and 12% in third-party comparisons — meaning roughly one in ten human-written resumes gets flagged as AI.
Recent keyword data from DataForSEO Labs (United States, English) shows “ai detection for ats” is a niche but growing query. Teams are exploring this space, but the accuracy data should give pause.
When evaluating an API, do not rely on the vendor’s benchmark. Run your own test:
- Submit 50 resumes from your existing database that you know were written by humans
- Submit 50 resumes you generate using ChatGPT or Claude
- Measure how many human resumes get flagged (false positives)
- Measure how many AI resumes pass undetected (false negatives)
If more than 2-3% of human resumes get flagged, the integration will hurt more than it helps.
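The arithmetic for that test is simple enough to script. The sketch below computes both error rates at a given flagging threshold; the scores are made-up illustrations, and your real pilot should use the 50-and-50 sample described above.

```python
def error_rates(human_scores, ai_scores, threshold=0.5):
    """False positive / false negative rates at a given flagging threshold.

    human_scores: detector scores for resumes known to be human-written
    ai_scores:    detector scores for resumes you generated with an LLM
    """
    flagged_humans = sum(1 for s in human_scores if s >= threshold)
    missed_ai = sum(1 for s in ai_scores if s < threshold)
    return {
        "false_positive_rate": flagged_humans / len(human_scores),
        "false_negative_rate": missed_ai / len(ai_scores),
    }


# Made-up scores for a 10-and-10 pilot, purely to show the calculation
human_scores = [0.05, 0.12, 0.61, 0.08, 0.30, 0.02, 0.55, 0.19, 0.07, 0.11]
ai_scores = [0.91, 0.88, 0.40, 0.95, 0.77, 0.83, 0.35, 0.90, 0.72, 0.86]
rates = error_rates(human_scores, ai_scores)
# In this illustration, 2 of 10 human resumes are flagged (20%) --
# an order of magnitude above the 2-3% bar
```

Note that the threshold itself is a decision you are making, not the vendor: lowering it catches more AI resumes but flags more humans, and vice versa.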
What to test: language and demographic bias
AI detectors analyze linguistic patterns — sentence predictability, vocabulary range, structural consistency. These patterns correlate with language background.
Non-native English speakers write differently. So do candidates from different educational backgrounds, professional fields, and cultural contexts. A detector trained primarily on academic English text will perform differently on a mechanical engineer’s resume than on a marketing director’s cover letter.
Ask the vendor:
- What training data was used?
- Has the tool been tested across languages and demographics?
- Is there a published bias audit?
If they cannot answer these questions, the tool has not been validated for hiring use cases.
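You can also run a first-pass disparity check yourself: group your test resumes by a known attribute and compare flag rates. The sketch below does exactly that; the group labels and scores are illustrative, not real benchmark data.

```python
from collections import defaultdict


def flag_rates_by_group(records, threshold=0.5):
    """Flag rate per candidate group -- a first-pass disparity check.

    records: iterable of (group_label, detector_score) pairs.
    """
    flagged = defaultdict(int)
    totals = defaultdict(int)
    for group, score in records:
        totals[group] += 1
        if score >= threshold:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


# Illustrative scores only; a real audit needs far larger samples per group
sample = [
    ("native_english", 0.10), ("native_english", 0.62),
    ("native_english", 0.08), ("native_english", 0.15),
    ("non_native_english", 0.55), ("non_native_english", 0.71),
    ("non_native_english", 0.20), ("non_native_english", 0.66),
]
group_rates = flag_rates_by_group(sample)
# A large gap between groups (here 25% vs 75%) is a disqualifying finding
```

A gap like that, sustained over a meaningful sample, is exactly the kind of evidence a compliance team will ask about later.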
What to test: cost per scan
Pricing varies significantly across providers:
- TurnitinEye: $3.99 per check, no subscription required (aimed at individual use)
- AI Detector Pro: $0.09 per unit, where one unit covers up to 4,000 characters
- Fake Applicant Detector (Apify): $0.099 per candidate audit, roughly $99 per 1,000 scans
For a team processing 500 applications per month, that works out to anywhere from roughly $45 to nearly $2,000 per month depending on the provider. That is before you account for the engineering time to build and maintain the integration.
At the lower end, the cost is marginal. At the higher end, you are paying real money for a signal that may not be actionable.
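The back-of-envelope math above is easy to reproduce. This sketch projects monthly spend from the per-scan prices listed, assuming exactly one scan per application (AI Detector Pro's unit pricing assumes each resume fits in a single 4,000-character unit):

```python
def monthly_cost(applications_per_month: int, price_per_scan: float) -> float:
    """Projected API spend, assuming exactly one scan per application."""
    return applications_per_month * price_per_scan


# Per-scan prices from the comparison above
prices = {
    "TurnitinEye": 3.99,
    "AI Detector Pro": 0.09,      # one 4,000-character unit per resume
    "Fake Applicant Detector": 0.099,
}
costs = {name: monthly_cost(500, price) for name, price in prices.items()}
# At 500 applications/month: ~$1,995, ~$45, and ~$49.50 respectively
```

Run the same projection at your actual volume, and remember that long resumes may consume multiple units on character-based pricing.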
What to test: is the signal actionable?
This is where most evaluations should stop. Ask your team: what will you do with a “75% AI probability” score?
- Auto-reject? You will reject qualified candidates.
- Flag for manual review? Your reviewers have no way to independently verify AI authorship. They will look at the resume, decide it seems fine, and move on — defeating the purpose of the integration.
- Add it to the candidate profile as a data point? Now you have introduced a number that biases every subsequent evaluation, without any proven link to candidate quality.
If the answer to “what do we do with this score?” is unclear, the integration is premature.
A better use of automation
Instead of adding a detection layer that produces ambiguous results, invest your automation budget in workflow steps that directly improve hiring quality.
Canvider’s Hiring Automation lets you build trigger-based rules that move candidates through stages, send emails, and alert team members. These automations act on concrete criteria — “candidate meets all must-have requirements” or “hiring manager has not reviewed within 48 hours” — not on probabilistic guesses about authorship.
Collaborative Candidate Assessment gives your team a shared space to document decisions with reasons, so every candidate gets evaluated on the same standards. The assessment trail is auditable, which matters when compliance questions arise.
For a deeper look at how AI scoring works without detection, see our breakdown of AI candidate scoring.
The honest evaluation framework
Before integrating any AI detection API, answer these five questions:
- What is the measured false positive rate on resumes similar to yours?
- Has the tool been tested for language and demographic bias?
- What action will your team take based on the score?
- Does the cost per scan justify the signal quality?
- Could the same budget improve a different part of your pipeline?
Most teams that run through this framework honestly conclude the integration is not ready.
Focus your ATS automation on steps that improve candidate quality, not steps that guess at authorship.