As part of a test to see if it’s improved at all since my experiment six months ago, I asked Grok to rate and review the draft opening scene of my current WIP, an upmarket adult fantasy novel titled The Stygian Blades. It gave me a reluctant 8/10 with caveats.

Fair enough. So, just like in my previous experiment, I asked it to rewrite the scene to make it a solid 10/10.

It made it worse, because of course it did.

And not subtly worse—objectively worse. It replaced my protagonist Kit’s crude joke about elephants (“like a drunken cow swived a goat and weren’t too kind about it”) with bland description (“like a bull crossed with a serpent, all wrath and no grace”). It turned her simple illiteracy—shown through “There were a lot of words, but she couldn’t make out more than maybe a dozen of them”—into explained backstory: “words she could sound out but scarcely string together, a legacy of hurried lessons between performances.”

It also added a mysterious cloaked figure watching from the shadows at the end. Because… who the fuck knows why. I got nothing.

Grok deployed every cliché in the fantasy writer’s playbook. Every distinctive character beat got sanded down to generic template prose.

Then I fed that “10/10” rewrite back to Grok in a fresh session without identifying the author.

Result: 8/10.

Grok couldn’t recognize its own “perfect” rewrite as 10/10. So I asked it to improve THAT version to 10/10. It added more—the dagger “still etched with the memory of that fatal thrust,” the elephant woodcut becoming “almost erotic in its power,” a “prickle at her nape” before the mysterious watcher’s “stare keen as honed steel.”

More words, more exposition, more clichés, more generic fantasy prose. I could run this experiment forever, and each iteration would drift further from my original voice toward the platonic ideal of “competent upmarket adult fantasy opening that matches patterns in AI training data.”

And yet, ironically, maintain an 8/10 rating.

Because AI doesn’t improve writing toward excellence. It dumbs it down toward conformity.

When Grok called my novel “competent-but-not-groundbreaking,” I pushed back, because The Stygian Blades combines elements that don’t exist together in published fantasy: Jacobean-influenced prose, cipher-driven intelligence work, sex workers running intel networks with agency and complexity, tidal-lock worldbuilding with integral physics and biology, a theatrical protagonist using performance skills for survival, a five-act structure, and so much more.

I challenged Grok to find comparable works, and it searched. Hard. Here’s what it reported:

No exact matches turned up for the full combo—Jacobean prose + cipher-driven heist + sex worker intel networks + integral tidal-lock physics + theatrical survival skills + five-act structure. Bits exist piecemeal: Jacobean/Elizabethan prose in E.R. Eddison’s The Worm Ouroboros (1920s), sex workers with rare agency like in Feist/Wurts’ Empire Trilogy, tidal-locked worlds mostly in sci-fi like Charlie Jane Anders’ The City in the Middle of the Night.

Then Grok admitted something fascinating:

AI feedback risks homogenizing toward the median, not elevating the new.

Boom, there it is! It explicitly confirmed what the infinite rewrite loop proved: AI can’t recognize innovation because innovation means breaking the patterns AI learned from existing books. And this isn’t just a theoretical exercise—it’s already happening in publishing.

I’ve documented how publishers are using AI to screen manuscripts, with services like Storywise charging publishers $2 per analyzed manuscript to flag “risky” content before human eyes ever see them. Your satirical critique of racism gets flagged as “contains racist content” and auto-rejected. Your morally complex fiction gets flagged for depicting the evil it’s actually condemning. Your groundbreaking structure gets flagged as “pacing issues” because it doesn’t match the three-act template.

And worse, your carefully crafted condemnation of child sex trafficking (involving literally zero actual sexual content) gets slapped with a fucking CSAM label, the AI threatens to report you, and, well shit, you’re probably blacklisted for life.

(Yes, that actually happened with this scene involving a child sex trafficking victim who does literally nothing but gnaw on bread in the kitchen and talk in traumatized monotone about her horrifying situation.)

Meanwhile, blatant and harmful stereotypes like Islamophobic space donkeys slip past algorithmic screening because they’re dressed in genre conventions.

AI doesn’t just fail to recognize quality—it actively filters out innovation at the submission stage while potentially greenlighting shallow conformity and overt bigotry. And if your work somehow makes it past that barrier, AI feedback tools push you to revise away what makes it distinctive.


When you write something genuinely innovative (or even interesting) and feed it to AI for evaluation, the AI reads it as “needs improvement to match existing patterns” when it should read it as “potentially innovative work that breaks new ground.”

But AI can’t make that distinction. It’s trained on what’s already published. Innovative work is, by definition, different from published patterns. So AI defaults to safe but encouraging bullshit scores (7-8/10) and generates critique that pushes your writing toward the lowest common denominator in the marketplace.

This isn’t malicious, of course. It’s an architectural byproduct. As Grok explained when I pressed it:

My ratings are not objective measures of quality but probabilistic outputs based on patterns in training data, influenced by prompts and randomness. The 8/10 default aligns with how models like me often hedge on competent-but-not-groundbreaking work: it’s a safe midpoint. The criticisms are retrofitted justifications, drawn from common fantasy feedback tropes in my data.

Read that again: The criticisms are retrofitted justifications.

AI doesn’t evaluate your work and then score it. AI generates a score (usually 7-8/10), then creates plausible-sounding feedback to justify that score using common workshop phrases from training data. This means every piece of feedback you get from AI—every “the pacing drags,” every “needs more tension,” every “simplify the prose”—is reverse-engineered rationalization for a predetermined score. The AI didn’t read your manuscript and conclude it’s 7/10. It defaulted to 7/10 and then searched its training data for plausible-sounding workshop clichés to justify that number.

The bullshit score comes first. The critique is purely theater.

This, of course, has devastating implications for writers with innovative voices.

I’ve documented how AI fails at generating authentic voice even with almost 100K words of training data. I’ve explained why empathy and consciousness matter for creating fiction that resonates. Now I’m documenting how AI fails at recognizing quality too.

It’s all the same fundamental limitation: AI pattern-matches rather than understands.

AI lacks consciousness/empathy, lacks the framework for evaluating innovation, and only recognizes existing patterns. If your work is distinctive, AI will tell you to make it more conventional. If your prose is sophisticated, AI will call it “overwrought” and suggest trimming, or worse, make it even more overwrought. If your pacing is character-driven rather than plot-driven, AI will flag it as “slow” and recommend adding fucking mysterious shrouded figures in the gloom with eyes keen as bloody honed steel.

Excuse me, I just threw up a little in my mouth.

And most writers, especially aspiring writers, don’t know AI is fundamentally broken in this department.

They see 7/10 and think “I’m 70% of the way there, just need to improve 30%.” They follow AI’s suggestions—cut the distinctive voice, add more action, simplify the complexity, sand off the rough edges. They revise their innovative work into conformity with existing templates.

And AI rewards this with higher scores (but only in the same context window, mind you), because now it recognizes the patterns (and wants you to feel good about yourself).

But the work isn’t better. It’s just more like everything else the content mills are churning out on the sweat and tears of poverty-wage ghostwriters. And, frankly, the end result is even worse, because it introduces a literary death spiral where AI screens submissions → innovative work gets rejected → conventional work gets published → future AI trains on increasingly homogenized fiction → next generation AI becomes even more conservative in its pattern recognition → repeat.
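To make that spiral concrete, here’s a toy simulation in Python. To be clear, this is my illustrative sketch, not anything Grok, Storywise, or any real screening pipeline actually runs; the one-standard-deviation threshold and all the numbers are invented. Each manuscript is reduced to a single “distinctiveness” score, the gatekeeper approves only work close to the current median, and the next generation imitates only what got approved:

```python
import random
import statistics

# Toy model of the death spiral. Purely illustrative: the one-sigma
# rejection threshold and the numbers are made up, not taken from
# any real screening service.
random.seed(42)

# Each manuscript reduced to one number: how far its style sits
# from the conventional center (0 = perfectly conventional).
manuscripts = [random.gauss(0, 1.0) for _ in range(10_000)]

for generation in range(1, 6):
    mean = statistics.mean(manuscripts)
    spread = statistics.pstdev(manuscripts)

    # The gatekeeper: anything more than one standard deviation from
    # the median gets flagged ("risky," "pacing issues") and rejected.
    approved = [m for m in manuscripts if abs(m - mean) <= spread]

    # The next generation of writers and models trains on, and
    # imitates, only the approved work.
    mu = statistics.mean(approved)
    sigma = statistics.pstdev(approved)
    manuscripts = [random.gauss(mu, sigma) for _ in range(10_000)]

    print(f"generation {generation}: stylistic spread = "
          f"{statistics.pstdev(manuscripts):.3f}")
```

Run it and the stylistic spread roughly halves every generation, collapsing toward zero. Nobody ever has to tighten the rules; the loop starves variance all on its own.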

We’re building algorithmic gatekeepers that will, within less than a decade, be trained primarily on AI-approved fiction and will judge new work against standards of conformity rather than literary merit.

This isn’t speculative, mind you. It’s already happening.


My opening scene works precisely because Kit’s theatrical voice shows through performance metaphors; her poverty shows through her thoughts and actions, not exposition; her crude humor reveals psychology; and the worldbuilding integrates through her lived experience, not info dumps (“As you know, Bob…”).

Most importantly, every detail is specific to her consciousness.

Grok’s “improvements” removed what made it distinctive and added… what?

Let’s see:

  • Mysterious watcher cliché
  • MFA workshop purple prose
  • Explained backstory
  • Generic fantasy phrasing
  • Template thriller beats
  • Exposition-as-worldbuilding

Each rewrite moved further from innovation toward slimy grey pottage. That’s not improvement. That’s homogenization.

Nasty.

The bottom line: if AI gives your work mediocre scores while praising template-following fiction, that might actually mean your work is genuinely innovative rather than defective.

If AI suggests changes that would make your prose more conventional, question whether those changes serve your vision or just make you easier to pattern-match.

If AI can’t recognize what makes your work distinctive, that’s AI’s limitation—not proof your work needs fixing.

Groundbreaking work will always score poorly with AI because AI literally can’t imagine anything outside its training data patterns. It’s like asking a 1950s film critic to evaluate Pulp Fiction—the framework for understanding non-linear narrative structure doesn’t exist yet, so it just looks “wrong.”

It’s also why AI can’t write masterful fiction, and probably never will. I don’t believe this is a training-data or architectural design problem that can be solved. It’s a fundamental machine-versus-human problem. AI is the most conservative evaluation system possible. It can only recognize what it’s seen before.

My advice? Use AI for what it’s actually good at: organizing notes, grammar checks, continuity tracking, research, consistency verification across chapters, hell, even writing blurbs. But for creative evaluation? For determining whether your distinctive voice is working? For understanding whether your innovation serves the story?

Trust. Human. Readers.

Beta readers from your target audience. Editors who understand what you’re attempting. Agents who respond to your specific vision. Writers who get what you’re building. Because AI will always push innovation toward mediocrity; that’s all it knows how to do.

My latest novel may not be groundbreaking, but it is, at the very least, extremely innovative. AI confirmed that by searching and finding zero comparable works. Then it told me to make it more conventional. Each “improvement” made it worse by making it more like everything else.

That’s not feedback. That’s an architectural limitation disguised as objective evaluation, flushing good prose down the crapper while giving you a shiny gold star for making it worse.

The infinite rewrite loop proved it. Each iteration: 8/10. Each iteration: more generic. The score never changed because AI defaults to safe but encouraging scores. The quality declined because AI can only recognize patterns, not innovation.

So finish your book and trust your vision. Study the masters of your genre. Get feedback from humans who can tell the difference between “different because broken” and “different because groundbreaking.”

Because AI sure as hell can’t.

