
The study does not argue that teams should stop using LLMs. It argues that unedited LLM output now carries a real credibility cost that most teams are not measuring. The right response is editorial process, not abstinence. A 60-second human pass keeps the throughput gain and removes most of the risk.
Making the output harder for detectors to catch will not help. The accusations in the study are based on style, not on detector output. Even a perfectly undetectable model would still trip the reader checklist if it produced the usual hedged, em-dash-heavy cadence. The fix is editing the cadence out, which no detector helps with.
Set up alerts for your brand name plus terms like "AI slop," "ChatGPT wrote this," "bot," and "GPT cadence" on Reddit and Hacker News. Track the volume monthly. If it is growing, your public posting workflow needs an editorial checkpoint before, not after, content ships.
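The monthly tracking described above can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring product: it assumes you have already exported comments as `(date, text)` pairs (that shape, the function names, and the brand "Acme" are all hypothetical), and it leaves fetching from Reddit or Hacker News out of scope.

```python
import re
from collections import Counter
from datetime import date

# Terms taken from the paragraph above; extend the list to suit your brand.
ACCUSATION_TERMS = ["AI slop", "ChatGPT wrote this", "bot", "GPT cadence"]

# Word boundaries keep "bot" from matching words such as "both" or "bottom".
TERM_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in ACCUSATION_TERMS) + r")\b",
    re.IGNORECASE,
)


def monthly_accusation_counts(comments, brand):
    """Count, per month, comments that mention the brand and an accusation term.

    `comments` is an iterable of (posted_on: date, text: str) pairs. That
    shape is an assumption, not the format of any real Reddit or HN export.
    """
    counts = Counter()
    brand_lower = brand.lower()
    for posted_on, text in comments:
        if brand_lower in text.lower() and TERM_PATTERN.search(text):
            counts[posted_on.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


def is_growing(counts):
    """True if the latest recorded month has more matches than the one before."""
    values = list(counts.values())
    return len(values) >= 2 and values[-1] > values[-2]


# Made-up sample data for illustration only.
sample = [
    (date(2025, 1, 5), "Acme support sent me pure AI slop."),
    (date(2025, 2, 2), "Pretty sure ChatGPT wrote this Acme reply."),
    (date(2025, 2, 9), "Acme's email has that GPT cadence."),
    (date(2025, 2, 11), "Both Acme plans are fine."),
]
counts = monthly_accusation_counts(sample, "Acme")
print(counts)              # {'2025-01': 1, '2025-02': 2}
print(is_growing(counts))  # True
```

Months with no matches are simply absent from the result, so `is_growing` compares the last two months that had any mentions. A stricter version would fill in the zero months first.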

Internal communication is partly affected too. Colleagues are more forgiving than strangers, but the study's pattern shows up in internal Slack threads as well: docs that sound generated get less engagement and less trust. For high-stakes internal content (strategy memos, postmortems, performance reviews), apply the same checklist.
Add named authors and a specific number, date, or ID to every customer-facing message. Those two changes alone remove most of the cues readers use to flag a message as AI-written, and they cost almost nothing to implement.
A new arXiv paper, That's AI Slop, You Bot, looks at how readers on Hacker News and Reddit are policing suspected machine-written comments. The researchers analyzed 25 million comments and tracked how accusations of AI authorship are made, what evidence accusers offer, and how the accused respond. For anyone deploying language models in customer-facing channels, the findings are a warning shot: the audience has started fighting back, and they are not waiting for a detector to tell them what to believe.
This post translates the study into operator stakes. If your team uses large language models (LLMs, the systems behind ChatGPT and similar tools) to draft marketing copy, support replies, sales outreach, or community comments, the cost of being labeled "AI slop" is now measurable in lost trust, lost deals, and moderator bans.
The researchers pulled comments from Hacker News and several large Reddit communities, covering roughly 2022 through 2025. They built a classifier to identify accusation events: moments where one user publicly suspects another of having used an LLM to write their comment. They then coded the evidence each accuser cited and tracked what happened next, including votes, replies, and account deletions.
A few numbers worth holding onto:

The paper documents that accusers almost never run a detector or paste output into a tool. They go on feel. That matters because it means your team cannot "pass" the test by producing technically undetectable text. The accusation is social, not forensic. Once a reader decides a comment smells like an LLM, the burden of proof flips, and the writer loses by default.
Most companies adopting AI agents are not thinking about reputational risk from style. They are thinking about cost per ticket, lead response time, or content throughput. The study reframes the math.
If a support reply gets flagged as "AI slop" by the customer, the resolution cost goes up, not down: you now need a human to recover the relationship. If a sales email reads as generated, the reply rate falls below what a plain, shorter, human-sounding note would have produced. If a community manager posts on Reddit on your behalf and gets accused, the brand takes the hit publicly and the post often gets removed.
Here is how the trade-off looks across common channels:
| Channel | Old assumption | What the study suggests | Operator response |
|---|---|---|---|
| Customer support reply | Polished, long answers signal care | Polished, long answers signal a bot | Shorter, named replies; show the agent's reasoning |
| Sales outreach email | Personalization tokens win | Tokenized polish reads worse than a one-line note | Drop the template voice; keep the research |
| Community comments (Reddit, HN) | Helpful content gets upvoted | Helpful-but-LLM-cadenced content gets accused | Do not post LLM drafts unedited, or do not post at all |
| Marketing blog posts | SEO-friendly long form helps ranking | Readers bounce on em-dash-heavy hedging | Named author, opinion, specific numbers |
| Internal docs | Speed of drafting matters most | Cadence affects whether colleagues trust the doc | Edit for voice before publishing internally |
The pattern: the more public the channel and the more skeptical the audience, the higher the cost of unedited LLM output.
The study lists the stylistic tells accusers cite. None of them are individually damning. Together, they form what readers now call "GPT cadence." If your output checks several boxes, expect trouble.
```
Reader's mental checklist for "this is AI"
---------------------------------------------
[ ] Opens with a restatement of the question
[ ] Uses em dashes more than twice
[ ] Bullet list with parallel structure
[ ] Phrases: "it's important to note", "in conclusion",
    "navigating", "delve", "tapestry"
[ ] Hedged, balanced, no strong opinion
[ ] No specific numbers, names, or dates
[ ] Closes with a summary the reader did not ask for
[ ] Polite to a fault, no friction
```

Three or four checks and a sharp-eyed reader is reaching for the accusation button. This is not a detector you can game; it is a cultural pattern. The defense is editorial, not technical.
Here is a support reply written by a typical LLM-backed agent, then the same content rewritten by an operator who has read the study.
```
BEFORE (likely to be flagged):

Thank you for reaching out regarding your billing concern. It's
important to note that we take these matters seriously. Navigating
billing issues can be challenging, so let me walk you through the
steps to resolve this:

- First, please verify your account email
- Second, check your most recent invoice
- Finally, reply with the transaction ID

In conclusion, we appreciate your patience and look forward to
resolving this for you.

AFTER (reads human):

Got it, the duplicate charge is on us. I refunded the $42 from
Oct 14 just now; it should clear in 3-5 days. If it doesn't,
reply here and I'll escalate to our payments lead, Priya.

- Marco, support
```

Same information. The second one will not get accused, because it has a name, a number, a date, a person to escalate to, and zero hedge phrases.
The operator question is not "how do we hide that we use AI." It is "how do we get the throughput benefit without the credibility cost." The study suggests the answer lies in editorial control points, not in better models.

Here is a workflow that puts cheap human judgment at the right spots. The diagram shows where an LLM drafts, where a person edits, and where the output is checked against the "slop" signals before it ships.
```mermaid
flowchart LR
    A[Inbound: ticket, lead, comment] --> B[LLM drafts reply]
    B --> C{Editor pass<br/>under 60 seconds}
    C -->|Cut hedge phrases| D[Add: name, number, date]
    D --> E{Slop checklist<br/>3+ flags?}
    E -->|Yes| F[Rewrite shorter]
    E -->|No| G[Send]
    F --> G
    G --> H[Log outcome:<br/>reply rate, accusations]
    H --> I[(Weekly review)]
```

The two checkpoints, the editor pass and the slop checklist, cost roughly 30 to 90 seconds per message. For a support team handling 500 tickets a day, that is roughly 4 to 12.5 hours of editing time per day, well below the cost of one damaged customer relationship per week.
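The editing-time estimate is simple arithmetic, and a small helper makes it easy to rerun with your own volumes. The function name is hypothetical; the inputs are the figures from the paragraph above.

```python
def editor_hours_per_day(tickets_per_day, seconds_per_message):
    """Daily editing time in hours for a given per-message cost in seconds."""
    return tickets_per_day * seconds_per_message / 3600


# 500 tickets a day at 30 and 90 seconds of checkpoint time per message.
low = editor_hours_per_day(500, 30)
high = editor_hours_per_day(500, 90)
print(f"{low:.1f} to {high:.1f} hours per day")  # 4.2 to 12.5 hours per day
```

Swap in your own ticket volume and a measured per-message editing time before using the number in a staffing plan.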
If your team wants to automate the second checkpoint, here is a small Python script that flags drafts before they go out. It does not try to detect AI; it counts the cultural tells the study identified. The output is a score and a list of issues an editor should fix.
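```python
# slop_check.py
# Flags LLM-cadence patterns in drafts before they ship.
# Returns a score and a list of issues for the editor to fix.
import re
import sys

TELL_PHRASES = [
    "it's important to note", "in conclusion", "navigating",
    "delve", "tapestry", "let me walk you through",
    "i hope this helps", "feel free to", "rest assured",
]

def slop_score(text: str) -> dict:
    issues = []
    lower = text.lower()
    em_dashes = text.count("\u2014") + text.count(" - ")  # true em dashes plus spaced hyphens
    if em_dashes >= 2:
        issues.append(f"{em_dashes} em dashes")
    # The original listing is cut off here; what follows is a minimal completion.
    for phrase in TELL_PHRASES:
        if phrase in lower:
            issues.append(f"tell phrase: {phrase!r}")
    if not re.search(r"\d", text):
        issues.append("no specific numbers or dates")
    return {"score": len(issues), "issues": issues}

if __name__ == "__main__":
    src = open(sys.argv[1], encoding="utf-8") if len(sys.argv) > 1 else sys.stdin
    print(slop_score(src.read()))
```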
Run it against your outbound drafts for a week. You will find most of them score 4 or higher. The goal is to get every shipped message under 2.
```shell
# Score a single draft from a file
python slop_check.py < draft.txt
# Or wire it into your CI for marketing copy
find content/ -name "*.md" -exec python slop_check.py {} \;
```

The study is, indirectly, about governance. Most AI policies inside companies today focus on data leakage, model accuracy, and legal review. They almost never cover voice, cadence, or the reputational risk of sounding generated. That gap is now expensive.
Three governance moves to consider:
The broader frame here is eval-driven operations: you cannot manage what you do not measure, and "did this message sound human enough to be trusted" is now a measurable outcome. Reply rates, upvote ratios, accusation counts, and customer satisfaction scores tied to AI-drafted versus human-drafted messages are all available if you instrument for them.
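As a rough illustration of that instrumentation, the sketch below compares reply and accusation rates for AI-drafted and human-drafted messages. The log format (`source`, `replied`, `accused`) and the function name are assumptions for the example, and the sample rows are made up, not results from the study.

```python
from collections import defaultdict


def outcome_rates(log):
    """Reply and accusation rates per drafting source.

    `log` is an iterable of dicts with hypothetical keys:
    "source" ("ai" or "human"), "replied" (bool), "accused" (bool).
    """
    totals = defaultdict(lambda: {"sent": 0, "replied": 0, "accused": 0})
    for row in log:
        bucket = totals[row["source"]]
        bucket["sent"] += 1
        bucket["replied"] += int(row["replied"])
        bucket["accused"] += int(row["accused"])
    return {
        source: {
            "reply_rate": b["replied"] / b["sent"],
            "accusation_rate": b["accused"] / b["sent"],
        }
        for source, b in totals.items()
    }


# Made-up sample rows for illustration only.
log = [
    {"source": "ai", "replied": False, "accused": True},
    {"source": "ai", "replied": True, "accused": False},
    {"source": "human", "replied": True, "accused": False},
    {"source": "human", "replied": True, "accused": False},
]
rates = outcome_rates(log)
print(rates["ai"])     # {'reply_rate': 0.5, 'accusation_rate': 0.5}
print(rates["human"])  # {'reply_rate': 1.0, 'accusation_rate': 0.0}
```

With real volumes, the same grouping works for upvote ratios or satisfaction scores; the point is to tag every outbound message with its drafting source at send time so the comparison is possible later.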
There is a window right now where sounding human is a differentiator. Most competitors are shipping unedited LLM output into their support queues, sales sequences, and content pipelines. The reader backlash documented in the study is creating a gap that operators with editorial discipline can walk through.
The teams that will win the next two years are not the ones with the biggest model budgets. They are the ones who treat AI drafts as a starting point, put a 60-second editor pass between the model and the customer, and measure the credibility outcome, not just the throughput. The throughput gains are real; you just have to spend a fraction of them on staying believable.