The Dark Side of LLMs Nobody's Told You About
LLMs are progressing so quickly that they've already reached critical mass. They've become so deeply interwoven into our daily lives that we can barely function without asking them a question every two minutes. All the knowledge in the world is now just a prompt away... but is there a flip side to it?

My usage of LLMs
I mainly use LLMs in two ways:
- Agentic: mostly to automate tedious refactors and tasks. I mainly do this for my job1, with Claude Code.
- Chat: mostly as a glorified search engine, and mostly when I'm exploring a space of solutions (brainstorming) or seeking assistance with a task I'm not very knowledgeable about. For this, I use aichat.
I'll focus on chat mode today, and maybe agentic usage (or vibe coding) will be a topic for another day.
Latest generation of LLMs healed my trust issues
I consider myself a bit of an AI skeptic, even though I regard LLMs as one of the biggest advancements I've witnessed. My skepticism stems from the fact that my confidence in them plummets whenever I spot confidently stated misinformation2. Spotting it is easy in fields I'm knowledgeable about, but those aren't generally the fields I ask LLMs about.
A few months ago, it wasn't too hard to get very obvious misinformation out of an LLM: syntax errors, code that wouldn't compile, hallucinated APIs, etc. A litmus test of mine was to ask them for references backing their claims, and more often than not they would produce titles of books that didn't exist, or links that didn't go anywhere.
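As an aside, the link half of that litmus test is easy to automate. Here's a minimal sketch in Python; it assumes the third-party requests library is installed, and the URLs are just placeholders standing in for whatever the model cited:

```python
# Minimal sketch: check whether LLM-cited links actually resolve.
# Assumes the third-party `requests` library; the URLs are placeholders.
import requests

cited_urls = [
    "https://example.com/some-cited-paper",
    "https://example.com/another-reference",
]

for url in cited_urls:
    try:
        # A HEAD request is usually enough to tell if the link goes anywhere.
        # (Some servers reject HEAD; a GET fallback may be needed in practice.)
        response = requests.head(url, allow_redirects=True, timeout=10)
        alive = response.status_code < 400
    except requests.RequestException:
        alive = False
    print(f"{'OK  ' if alive else 'DEAD'} {url}")
```

Book titles, of course, still need to be checked by hand.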
This last generation3, however, has really stepped up its game in this regard. It's much harder now to get non-compiling code out of them, and while it's still possible to get fabricated sources, it's definitely less frequent.
But are they right, though?
Herein lies the crux of this post: I've definitely noticed a huge boost in the accuracy, and thus the usefulness, of LLM output. However, I think that same boost has made them far better at deceiving me into believing they're correct.
Let me explain: the code they produce looks fine, it compiles, it definitely seems to do something, and adding it changes the error messages I get... yet my problem never actually gets solved. This became clear to me once I realized I had spent a week in frustration, holding several conversations with them in parallel, all day long.
The problem with their output is that, even when it's wrong, it's so plausible that proving it wrong is genuinely hard. This lures you into the deepest of rabbit holes, always thinking you're just one piece away from solving the puzzle: adding context, refining your requests, applying their suggestions... while you may be unknowingly running in circles.
This is a novel issue for me. Back in my time, with search engines, it was plainly obvious when there was no further information to be found and you were on your own4. LLMs, however, always have one more solution to offer, even when the last 50 prompts led nowhere. This has a real cost in productivity: you could have saved yourself days just by trying another route sooner.
Is it that bad?
Maybe I'm overreacting. I'm sure people weren't initially able to tell that the results on the second page of Google were pure garbage either. Separating the wheat from the chaff is a learned skill, and maybe the next generation of AI-natives5 will develop the ability to quickly see through LLMs' bullshit?
I'm not too confident that's even possible, though. The field is progressing so quickly that I'm sure they'll find newer, more innovative ways to bullshit us by the time we catch up.
In the end it actually solved it
Shortly after I wrote the first draft of this post, I went back to beg Gemini for help one more time. I sent it my NixOS configuration and asked it, "please, tell me what's wrong with this".
It immediately one-shotted a long list of severe issues with my configuration6. I applied the fixes, and everything got solved forever.
I was absolutely flabbergasted, relieved, yet angry. Why hadn't it told me this the first 50 times I asked for help? Had it been torturing me on purpose all this time? Worst of all, most of the configuration had been written in consultation with it; hell, I'm pretty sure it produced the main offending snippet!
Footnotes
1. That's because there's nothing tedious in my daily life; everything's so fun.
2. Actually, this is just a quirk of mine, and I also experience it with people. It's probably the reason why I never state anything confidently.
3. For some definition of generation. I am specifically talking about Gemini 3 Pro and Claude 4.5 Opus here, which are the models I use the most.
4. Usually this happens when you reach the second page of Google results.
5. I'm a Millennial, after all. I was born in the age of the cloud.
6. Which I won't reproduce here, because that'd be too revealing of my incompetence.