The Dark Side of LLMs Nobody's Told You About
LLMs are progressing so quickly that they've already reached critical mass. They've become so deeply interwoven into our daily lives that we can barely function without asking them a question every two minutes. All the knowledge in the world is now just a prompt away... but is there a flip side to it?

My usage of LLMs
I mainly use LLMs in two ways:
- Agentic: mostly to automate tedious refactors and tasks. I mainly do this for my job1, with Claude Code.
- Chat: mostly as a glorified search engine, and mostly when I'm exploring a space of solutions (brainstorming) or seeking assistance with a task I'm not very knowledgeable about. For this, I use aichat.
I'll focus on chat mode today, and maybe agentic usage (or vibe coding) will be a topic for another day.
Latest generation of LLMs healed my trust issues
I consider myself a bit of an AI skeptic, even though I regard LLMs as one of the biggest advancements I've witnessed. My skepticism stems from the fact that my confidence in them plummets whenever I spot confidently stated misinformation2. Spotting it is easy in fields I'm knowledgeable about, but those aren't generally the fields I ask LLMs about.
A few months ago, it wasn't too hard to get very obvious misinformation out of an LLM: syntax errors, code that wouldn't compile, hallucinated APIs, etc. A litmus test of mine was to ask them for references backing their claims, and more often than not they would produce titles of books that didn't exist, or links that didn't go anywhere.
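As an aside, the link half of that litmus test is easy to automate. Here's a minimal sketch in Python; it assumes the third-party requests library is installed, and the URLs are just placeholders standing in for whatever the model cited:

```python
# Minimal sketch: check whether LLM-cited links actually resolve.
# Assumes the third-party `requests` library; the URLs are placeholders.
import requests

cited_urls = [
    "https://example.com/some-cited-paper",
    "https://example.com/another-reference",
]

for url in cited_urls:
    try:
        # A HEAD request is usually enough to tell if the link goes anywhere.
        # (Some servers reject HEAD; a GET fallback may be needed in practice.)
        response = requests.head(url, allow_redirects=True, timeout=10)
        alive = response.status_code < 400
    except requests.RequestException:
        alive = False
    print(f"{'OK  ' if alive else 'DEAD'} {url}")
```

Book titles, of course, still need to be checked by hand.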
This last generation3, however, has really stepped up its game in this regard. It's much harder now to get non-compiling code out of them, and while it's still possible to get fabricated sources, it's definitely less frequent.
But are they right, though?
Herein lies the crux of this post: I've definitely noticed a huge boost in the accuracy, and thus the usefulness, of LLM output. However, I think that same boost has made them far better at deceiving me into believing they're correct.
Let me explain: the code they produce looks fine, it compiles, it definitely seems to do something, and adding it changes the error messages I get... yet my problem never actually gets solved. This became clear to me once I realized I had spent a week in frustration, holding several conversations with them in parallel, all day long.
The problem with their output is that, even when it's wrong, it's so plausible that proving it wrong is genuinely hard. This lures you into the deepest of rabbit holes, always thinking you're just one piece away from solving the puzzle: adding context, refining your requests, applying their suggestions... while you may be unknowingly running in circles.
This is a novel issue for me. Back in my time, with search engines, it was plainly obvious when there was no further information to be found and you were on your own4. LLMs, however, always have one more solution to offer, even when the last 50 prompts led nowhere. This has a real cost in productivity: you could have saved yourself days just by trying another route sooner.
Is it that bad?
Maybe I'm overreacting. I'm sure people weren't initially able to tell that the results on the second page of Google were pure garbage either. Separating the wheat from the chaff is a learned skill, and maybe the next generation of AI-natives5 will develop the ability to quickly see through LLMs' bullshit?
I'm not too confident that's even possible, though. The field is progressing so quickly that I'm sure they'll find newer, more innovative ways to bullshit us by the time we catch up.
In the end it actually solved it
Shortly after I wrote the first draft of this post, I went back to beg Gemini for help one more time. I sent it my NixOS configuration and asked it, "please, tell me what's wrong with this".
It immediately one-shotted a long list of severe issues with my configuration6. I applied the fixes, and everything got solved forever.
I was absolutely flabbergasted, relieved, yet angry. Why hadn't it told me this the first 50 times I asked for help? Had it been torturing me on purpose all this time? Worst of all, most of the configuration had been written in consultation with it; hell, I'm pretty sure it produced the main offending snippet!
Footnotes
1. That's because there's nothing tedious in my daily life; everything's so fun.
2. Actually, this is just a quirk of mine, and I also experience it with people. It's probably the reason why I never state anything confidently.
3. For some definition of generation. I am specifically talking about Gemini 3 Pro and Claude 4.5 Opus here, which are the models I use the most.
4. Usually this happens when you reach the second page of Google results.
5. I'm a Millennial, after all. I was born in the age of the cloud.
6. Which I won't reproduce here, because that'd be too revealing of my incompetence.