[HN Gopher] Simulacrum of Knowledge Work
___________________________________________________________________
Simulacrum of Knowledge Work
Author : thehappyfellow
Score : 153 points
Date : 2026-04-25 17:20 UTC (17 hours ago)
HTML web link (blog.happyfellow.dev)
TEXT w3m dump (blog.happyfellow.dev)
| balamatom wrote:
| >We've automated ourselves into Goodhart's law.
|
| Yes.
|
| This does not however mean that progress is not being made.
|
| It just means the progress is happening along dimensions that
| are completely illegible in terms of the culture of the early
| XXI century Internet, which is to say in terms of the values of
| the society which produced it.
| downboots wrote:
| Feels like a parallel with
| https://en.wikipedia.org/wiki/Constructivism_%28philosophy_o...
| where "it's not valid until you checked"
| balamatom wrote:
| I didn't see the connection initially.
| firefoxd wrote:
| Everybody's output is someone else's input. When you generate
| quantity by using an LLM, the other person uses an LLM to parse
| it and generate their own output from their input. When the very
| last consumer of the product complains, no one can figure out
| which part went wrong.
| balamatom wrote:
| Well the last consumer is holding it wrong of course. Why? The
| last consumer is present, and everyone else is behind 7
| proxies.
| mrtesthah wrote:
| > _" is the RLHF judge happy with the answer."_
|
| Reinforcement Learning with Verifiable Rewards (RLVR) to improve
| math and coding success rates seems like an exception.
| rowanG077 wrote:
| I don't really agree with the premise of the article. Sure,
| proxy measures are everywhere. But for knowledge work
| specifically you can usually check real quality. Of course it's
| not as easy as "oh this report contains a few spelling errors",
| but it is doable. If you accepted work purely based on
| superficial proxy measures you were not fairly evaluating work
| at all.
| zingar wrote:
| I think there's a weaker claim that holds true: we were able to
| ignore lots of content based on the superficial (and pay proper
| attention to work that passed this test) and now we are
| overwhelmed because everything meets the superficial criteria
| and we can't pay proper attention to all of it.
| thehappyfellow wrote:
| That's what I had in mind! The whole post is a claim that
| evaluating knowledge work got more expensive because cheaper
| measures stopped correlating well with quality.
|
| If someone was already evaluating the work output using a
| metric closer to the underlying quality then it might not
| have been a big shift for them (other than having much more
| work to evaluate).
| rowanG077 wrote:
| Yes, I agree that this is true!
|
| You could, however, only do that if you were fine with unfairly
| judging the quality of work, since you readily discarded quality
| work based on superficial proxies. Which, admittedly, is done in
| a lot of cases.
| bensyverson wrote:
| The article asserts that the quality of human knowledge work was
| easier to judge based on proxy measures such as typos and errors,
| and that the lack of such "tells" in AI poses a problem.
|
| I don't know if I agree with either assertion... I've seen plenty
| of human-generated knowledge work that was factually correct,
| well-formatted, and extremely low quality on a conceptual level.
|
| And AI signatures are now easy for people to recognize. In fact,
| these turns of phrase aren't just recognizable--they're
| unmistakable. <-- See what I did there?
|
| Having worked with corporate clients for 10 years, I don't view
| the pre-LLM era as a golden age of high-quality knowledge work.
| There was a lot of junk that I would also classify as a "working
| simulacrum of knowledge work."
| downboots wrote:
| Yes. I think the main warning here is that it is an added risk.
| A little glitch here and there until something breaks.
| bambax wrote:
| It's not that the pre-LLM era was a "golden age of quality", far
| from it. It's that LLMs have removed yet another tell-tale of
| rushed bullshit jobs.
| bensyverson wrote:
| Have they though?
| happytoexplain wrote:
| Absolutely. Our heuristics for judging human output are
| useless with LLMs. We can either trust it blindly, or
| tediously pick over every word (guess which one people do).
| I've watched this cause havoc over and over at my job (I
| work with many different teams, one at a time).
|
| AI signatures don't mean low quality, they just mean AI.
| And humans do use them (I have always used the common AI
| signatures). And yes, humans produce good-looking garbage,
| but much more commonly they produce bad-looking garbage.
| This is all tangential to the point.
| esafak wrote:
| For example, science articles written in Word vs. LaTeX
| helped filter out total cranks.
| mbreese wrote:
| I'm also not sure I agree with the assertion that LLMs will
| produce a high quality (looking) report with correct time
| frames, lack of typos, and good looking figures. I'm just as
| willing to disregard human or LLM reports with obvious tells.
| An LLM or a person can produce work that's shoddy or error
| filled. It may be getting harder to differentiate between a
| good and a bad report, but that shifts more of the burden onto
| the evaluator.
|
| This is especially true if we start to see more of a split in
| usage between LLMs based on cost. High quality frontier models
| might produce better work at a higher cost, but there is also
| economic cost pressure from the bottom. And just like with
| human consultants or employees, you'll pay more for higher
| quality work.
|
| I'm not quite sure what I'm trying to argue here. But the idea
| that an LLM won't produce a low quality report just seemed
| silly to me.
| yarekt wrote:
| You've missed the point of original article about the proxy
| for quality disappearing. LLMs are trained adversarially, if
| that's a word. They are trained to not have any "tells".
|
| Working in a team isn't adversarial: if I'm reviewing my
| colleague's PR they are not trying to skirt around a feature,
| or cheat on tests.
|
| I can tell when a human PR needs more in depth reviewing
| because small things may be out of place, a mutex that may
| not be needed, etc. I can ask them about it and their
| response will tell me whether they know what they are on
| about, or whether they need help in this area.
|
| I've had LLM PRs be defended by their creator until proven to
| be a pile of bullshit; unfortunately, only deep analysis gets
| you there.
| puttycat wrote:
| The goal of automation is to automate consistently perfect
| competence, not human failures.
|
| You wouldn't use a calculator that is as good as a human and
| makes mistakes as often.
| torben-friis wrote:
| For me the issue is the lack of human explanation for mistakes.
| With a person, low quality comes from a source. Sometimes the
| source is lack of knowledge, sometimes time pressure, sometimes
| selfish goals.
|
| Most importantly, those sources of errors tend to be
| consistent. I can trust a certain intern to be careful but
| ignorant, or my senior colleague with a newborn daughter to be
| a well of knowledge who sometimes misses obvious things due to
| lack of sleep.
|
| With AI it's anyone's guess. They implement a paper in code
| flawlessly and make freshman-level mistakes in the same run, so
| you have to engage in the non-intuitive task of reviewing while
| assuming total incompetence, for a machine that shows extreme
| competence. Sometimes.
| manquer wrote:
| It was and still is a negative filter, not a positive one.
| Meaning it is easy to reject work because there typos and basic
| factual errors, absence of them is not a good measure of
| quality. Typically such checks is the first pass not the only
| criteria.
|
| It is valuable to have this, because it the work passes the
| first check then it easier to identify the actual problems.
| Same reason we have code quality, lint style fixed before
| reasoning with the actual logic being written.
| strbean wrote:
| Ironic, you've got some typos but make a good point :)
| Aurornis wrote:
| > I don't know if I agree with either assertion... I've seen
| plenty of human-generated knowledge work that was factually
| correct, well-formatted, and extremely low quality on a
| conceptual level.
|
| Putting a high level of polish on bad ideas is basically the
| grifter playbook. Throughout the business world you will find
| workers and entire businesses who get their success by dressing
| up poor ideas and bad products with all of the polish and
| trimmings associated with high quality work.
| ornornor wrote:
| > And AI signatures are now easy for people to recognize. In
| fact, these turns of phrase aren't just recognizable--they're
| unmistakable. <-- See what I did there?
|
| You might spot these very obvious constructs and still miss 99%
| of AI-generated text because it has no tells. Yet you don't
| know that 99% was generated, and since you spot 100% of the
| pattern you outlined, you think no AI-generated text makes it
| past you.
| yobbo wrote:
| > I don't know if I agree with either assertion...
|
| Yes, I don't think this matters. Much of "knowledge work" was
| always a proxy for something else.
|
| High quality in terms of typos and errors is mainly a signal of
| respect in a similar way to wearing ironed white shirts with
| neck-ties. "Walls of text" that no one is expected to read in
| depth. Basically a symbolic demonstration of sacrifice and
| subservience (or something). LLMs remove this mode of
| signalling.
|
| If quality of content wasn't examined before, it was probably
| never particularly important.
| zby wrote:
| If you have a test that fails 50% of the time - is that test
| valuable or not? A 50% failure rate alone looks like a coin
| toss, but by itself it does not tell us whether the test is
| noise or whether it is separating bad states from good ones.
| For a test to be useful it needs a positive Youden's J
| statistic
| (https://en.wikipedia.org/wiki/Youden%27s_J_statistic):
| sensitivity + specificity - 1. A 50% failure rate alone does
| not let us calculate sensitivity and specificity.
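|
| As a rough sketch (made-up numbers, purely to illustrate): two
| tests that each flag half of all submissions can have very
| different J values, so the raw failure rate says little on its
| own.
|
|   def youden_j(tp, fn, tn, fp):
|       # Youden's J = sensitivity + specificity - 1
|       sensitivity = tp / (tp + fn)  # bad work correctly flagged
|       specificity = tn / (tn + fp)  # good work correctly passed
|       return sensitivity + specificity - 1
|
|   # 100 bad and 100 good submissions in both scenarios.
|   # Coin-toss test: flags half of everything at random.
|   youden_j(tp=50, fn=50, tn=50, fp=50)  # 0.0 -> useless
|
|   # Informative test: flags 90% of bad work, 10% of good work
|   # (still a 50% overall flag rate).
|   youden_j(tp=90, fn=10, tn=90, fp=10)  # 0.8 -> useful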
|
| I can see a similar problem with this article - the author
| notices that LLMs produce a lot of errors and then concludes
| that they are useless and produce only a simulacrum of work. The
| author has an interesting observation about how LLMs disrupt the
| way we judge knowledge work. But when he concludes that LLMs do
| only a simulacrum of work, that is where his argument fails.
| card_zero wrote:
| Gee, a thing by a guy, with a name. What are you saying
| exactly? So the test in question is a test the LLM is asked to
| carry out, right? Then your point is that if it's a load of
| vacuous flannel 49% of the time, but meaningful 51% of the
| time, on average this is genuine work so we can't complain
| about the 49%?
|
| Wait, you're probably talking about the test of discarding a
| report based on something superficial like spelling errors.
| Which fails with LLMs due to their basic conman personalities
| and smooth talking. And therefore ..?
| jszymborski wrote:
| > For a test to be useful it needs to have positive Youden's
| statistic
|
| This is not true as stated. I'd try to gloss over the absolutes
| relative to the context, but if I'm totally honest, I'm not
| sure I understand what idea you're trying to communicate.
| simianwords wrote:
| The FUD about LLMs will never get old. The way I know and trust
| LLMs is the same way a manager would trust their reportees to do
| good work.
|
| For most tasks, the complexity/time required to verify a task is
| << the time required to do the task itself. Sure there can be
| hallucinations on the graph that the LLM made. But LLMs are
| hallucinating much less than before. And the time to verify is
| much lower than the time required for a human to do the task.
|
| I wrote a post detailing this argument
| https://simianwords.bearblog.dev/the-generation-vs-verificat...
| JackSlateur wrote:
| FUD ? You are missing the point entirely, and so does your blog
| post.
|
| Are LLMs a good dictionary of synonyms ? Perhaps, but is it
| relevant ? Not at all.
|
| Are you biased when a solution is presented to you ? Yes, like
| all humans.
|
| Is it damaging when said solution is brain-dead ? Obviously.
|
| Are you failing to understand that most (if not all) of a
| manager's work is human-centric and, as such, cannot be applied
| to a non-human ? Obviously..
|
| You trust a machine's intent. Joke's on you: it has no intent
| at all, and it will break the "trust" you pour into it without
| even realizing it.
|
| You say that the LLM does a better job than you. Perhaps that
| says it all ?
| simianwords wrote:
| Are you asking yourself questions and answering them without
| seeing my point? Yes
| wxw wrote:
| Ultimately to understand a thing is to do the thing. And to not
| understand (which is ok!) is to trust others to, proxy measures
| or not. Agreed that the future of work is in a precarious place:
| doing less and trusting more only works up to a point.
|
| `simulacrum` is a great word, gotta add that to my vocabulary.
| NickNaraghi wrote:
| It's a funny thing to write, like an article in an old newspaper
| that aged quickly. I suspect that this will be wildly out of date
| within 2-3 years.
| krackers wrote:
| I think it's already out of date with verifiable reward based
| RL, e.g. on maths domain. When "correctness" arguments fall,
| the argument will probably just shift to whether it's just
| "intelligent brute force".
| TheOtherHobbes wrote:
| "stochastic genius"
| gipp wrote:
| The set of tasks for which "correctness" is formally
| verifiable (in a way that doesn't put Goodhart's law in
| hyperdrive) is vanishingly small.
| sendes wrote:
| This is an already apparent problem in academia, though not for
| the reasons the article suggests.
|
| It is not so much that the "tells" of poor quality work are
| vanishing, but that even careful scrutiny of work done with AI
| is going to become too costly to be done only by humans. One
| only has so much time to read while, say, in economics
| journals, the appendices extend to hundreds of pages.
|
| Would love to hear if other fields' journals are experiencing
| similar pressure, not only at the extensive margin (number of
| new submissions) but also at the intensive margin (effort
| needed to check each work).
| Daishiman wrote:
| To be fair, a lot of academic fields are such that anything at
| a Master's level or above requires serious competence to judge,
| and for anyone below that there's no distinction between what's
| right and what looks right.
| tkiolp4 wrote:
| I think this is pretty obvious for many of us in the industry.
| Unfortunately, there is so much money on the table that the big
| players will shove whatever they want down our throats.
| happytoexplain wrote:
| "They sound very confident," was a warning a gave a lot on a
| project a year ago, before I gave up trying to get developers to
| stop blindly trusting the output and submitting things that were
| just wrong. The documentation of that team went to absolute shit
| because the developers thought LLMs magically knew everything.
| throwaway_sydn wrote:
| "/reliable-resources-skill Claude, using the list of approved
| resources, evaluate the report I'm attaching"
| vivid242 wrote:
| With AI, we're cargo-culting understanding. We're reproducing the
| surface of having understood something, but we're robbing
| ourselves of the time and effort to truly do it.
| hellohello2 wrote:
| AI can do things on its own, without you understanding them,
| yes.
|
| But if you are trying to understand something well, there is no
| better tool for helping you than AI.
| bluefirebrand wrote:
| > But if you are trying to understand something well, there
| is no better tool for helping you than AI
|
| Could not disagree more.
|
| The best way to understand something deeply is to practice
| it. AI is anti-practice. It's like trying to learn something
| by following a YouTube video step by step. It has an outcome
| and it feels productive, but it's not going to stick in your
| head at all. It's not practice.
| matrix87 wrote:
| you can use AI to get a faster explanation for what's
| happening in a big codebase; it makes the timelines on
| developing features much shorter, in my experience
|
| am I losing out on something by not having to spend hours
| clicking through redundant parts of a large codebase to get
| a concrete answer on something? doesn't feel like it
| aroman wrote:
| I would say a better analogy is using Google... you can use
| it as a tool to seek information and deepen your
| understanding. But it requires your brain to be engaged and
| to be putting that stream of knowledge into practice.
| kingstnap wrote:
| I think that AI can sometimes help a lot. But I think doing
| it correctly is a tightrope and one misstep can easily have
| terrible results.
|
| The first issue is this result from reinforcement learning that
| tells you that you really want to be doing a large fraction of
| stuff on-policy when possible.
|
| It's true of RL agents, but I think it's actually just a
| universal learning result that applies to humans. Sure you
| could ask AI to solve a difficult math problem step by step,
| and what it can expose you to is tricks you had no idea about
| and the general method of solving such a problem.
|
| But there is something about the work that you produced
| without external influence (the on-policy episode) that is
| sort of irreplaceably important.
|
| The second is that there is something about the speed and
| conciseness of information AI presents to you. It seems like
| a super power but there are two problems I have with it.
|
| A) It's too fast. Unless you are artificially slowing
| yourself down by reading like one sentence per minute, there
| is something about how quickly everything you want gets
| presented to you that has a strong in-one-ear-out-the-other
| sort of effect. You need to slow down. You need to appreciate
| the details.
|
| B) It's also often too concise. There is something about
| doing research yourself that lets you stumble upon something
| new that you might not have thought was helpful. I've often
| found amazing nuggets on missteps and tangents.
|
| There are more issues as well, but these are the major two I
| get concerned about. Like you need to be cognizant of the
| work not being done when you are using AI to do research. And
| imo it's deeply problematic for young students who have
| literally never done the hard work of trying to answer
| questions themselves. Because they might not realize the
| problem.
| trueno wrote:
| i've been telling my coworker this, whose only use case he can
| conjure up with AI is simply "im going to give claude snowflake
| cortex, our integration code, all our documentation, jira
| tickets and its gonna make everything so much better. we'll be
| able to ask him anything and get the answer" and he's just lost
| the plot because there wasn't much of a plot. Sci-fi's infused
| him with how great it would be to have something to answer any
| question he had. he's hung up on this possibility of having his
| own tony stark jarvis at his disposal, in his head this is
| going to be the thing that speeds him up.
|
| i'd say it's been a huge distraction for him and the obsession
| over using LLM for Big Wikiz hasn't yielded anything near what
| he thought the tech was for. few occasions now he's learned the
| hard way how imperfect the technology is.
|
| between that and everyones grand visions for agentic workflows
| i've mostly just receded into being one of the few who is still
| regularly delivering stuff. i'm using AI to speed my delivery
| up quite a bit, i'm just not wasting my time taking it on some
| big grand adventure. the irony is that a lot of people pushed
| back on companies who wanted to implement chat bots, and now
| they spend most of their credits/tokens making their own chat
| bots by collecting six trillion .md files and adding skill
| files.
|
| my real takeaway is this: i've come to reason that there is
| some sort of loss in actual real institutional knowledge when
| we attempt to take shortcuts to growing the breadth of our own
| knowledge. i don't mean "hey claude give me some examples of
| how companies typically design x to solve for y" or "golang is
| new to me, what are the benefits of a compiled language versus
| something that requires a runtime going".
|
| no, i'm talking about these kinds of questions:
|
| "/somePersonalBigWikiProjectInvokedBySkill.md claude review our
| current tooling and infrastructure, how can we 5x our
| deployment speed, then search the web for <some SaaS company>
| and put a proposal together to get it implemented at the
| organization and include a 5 year cost benefit analysis and ...
| "
|
| i look around and it feels like everyone is nerfing themselves.
| that latter question? people are just sending claude proposals
| left and right. my eyes have completely glazed over. is it
| really that hard to do some digging yourself? we're already
| ceding the ability to just go grab an architect or senior
| engineer and ask him what he thinks about how <some SaaS
| company> will fit with the broader suite of technologies and
| visions on the horizon. we're just skipping the pieces where we
| do a little discovery together and work together on an outcome.
| we're walking away with surface level understanding of many
| things.
|
| this clearly has visible impacts on how we engage with each
| other, there's something there that I'm noticing and don't have
| the words for. it's mostly that people are less able to explain
| what it is they're talking about when pressed for deeper
| details, but also everyone's behavior is now different because
| AI sort of... makes them feel like they have definitive
| answers/strategies and they're no longer willing to have their
| ideas challenged. they no longer see that as a learning
| experience, a chance to learn from someone with wisdom who is
| already a walking wikipedia on something. the perfect
| technology for people who hate when someone with way more
| experience than them says "maybe not a good idea and here's
| why"
|
| i've met some interesting people who are just... walking
| encyclopedias on some or many domains. incredibly smart people
| who have so much knowledge and wisdom and so many years of
| experience not just with tech but with people and failures and
| successes. i don't doubt for a second that the human brain is
| capable of holding an unbelievable index of information in a
| natural way that marries well with decision making processes
| that come from experience. i'm not sure what gap people are
| trying to close building themselves some proverbial great
| library here, but i would encourage people to just sit back and
| trust that their brain is still one of the greatest
| technologies at their disposal.
| matrix87 wrote:
| I feel the exact same way, it helps speed up development a
| lot (and eliminates a lot of really annoying grunt work). But
| I see people I work with doing shit with it that doesn't make
| any sense, e.g. writing 50k lines of code for a "compiler"
| when it's really just an interpreter under the hood. Like
| they never take the time to understand the domain more
| deeply, they just use claude to sling some shit that barely
| works
|
| > i'm not sure what gap people are trying to close building
| themselves some proverbial great library here, but i would
| encourage people to just sit back and trust that their brain
| is still one of the greatest technologies at their disposal.
|
| Culturally I think this is going to fuck things up
| significantly. If I take the time to read all of the latest
| papers in the LLM space, I'm damn well not going to summarize
| it or document what I've learned for anyone. (Maybe this is
| why there are not many high quality books aggregating all of
| this information in all the latest papers, all of the
| advancements, etc. All the people doing this work would
| rather (smartly) milk the cash cow and maintain the
| information asymmetry.)
|
| Or think about open source, this will kill it for people
| trying to make money off a product and keep it open source.
| Because someone could spin up a competitor overnight.
|
| AI is going to make the information easier to acquire for
| cheap. But it's going to absolutely destroy the incentive
| structure and trust required to have an open exchange of
| information. It was already bad enough because the industry
| is not incentivized to produce quality literature for
| educational purposes like academia is. But after this, it'll
| be a complete shit show
| hellohello2 wrote:
| "How do you know the output is good without redoing the work
| yourself?"
|
| Verifying the correctness of solutions is often much easier than
| finding correct solutions yourself. Examples: Sudoku and most
| practical problems in just about any field.
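|
| A toy sketch of that asymmetry (hypothetical helper, purely
| illustrative): checking a completed Sudoku grid is 27 set
| comparisons, while producing a solution generally requires
| backtracking search.
|
|   def is_valid_sudoku(grid):
|       # grid: 9x9 list of lists containing the digits 1-9
|       full = set(range(1, 10))
|       units = list(grid)              # 9 rows
|       units += list(zip(*grid))       # 9 columns
|       for r in (0, 3, 6):             # 9 boxes
|           for c in (0, 3, 6):
|               units.append([grid[r + i][c + j]
|                             for i in range(3)
|                             for j in range(3)])
|       return all(set(u) == full for u in units)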
|
| -
|
| "The training doesn't evaluate 'is the answer true' or "is the
| answer useful.'"
|
| Let's pretend RLVR does not exist, to give this argument a
| chance. Then, while the training loop does not validate
| accuracy directly, I guess, the meta-training loop still does.
| When someone prompts a model, the resulting execution trace
| shows whether the generated answer is correct or not, and this
| trace is kept for subsequent training runs. The way coding
| agents are used productively is not a) generate code with AI
| and b) run it yourself; it's a) ask the AI to do something,
| including generating the code and running it too, with no step
| b. This naturally creates large training sets of correct and
| incorrect solutions.
|
| -
|
| "We spent billions to create systems used to perform a simulacrum
| of work."
|
| Have you even tried using these systems to produce valuable work?
| How could this possibly be your conclusion after having tried
| them?
| nlawalker wrote:
| _> "We spent billions to create systems used to perform a
| simulacrum of work."
|
| >Have you even tried using these systems to produce valuable
| work? How could this possibly be your conclusion after having
| tried them?_
|
| The operative words there are _used to_, as opposed to _only
| able to_. The conclusion isn't derived from using the tools,
| it's from observing how other people tend to use them.
| bluefirebrand wrote:
| > Verifying the correctness of solutions is often much easier
| than finding correct solutions yourself
|
| In order to verify correctness you need to understand what
| correctness is in context, which is actually pretty hard to do
| if you can't actually find correct solutions yourself, or even
| if you can but haven't bothered to do so.
| adampunk wrote:
| Why is it not more of a scandal that all these anti-AI articles
| are written using large language models?
|
| Why is that not an embarrassment for everyone who moans and carps
| and complains about the craft?
| coppsilgold wrote:
| > The training doesn't evaluate "is the answer true" or "is the
| answer useful." It's either "is the answer likely to appear in
| the training corpus" or "is the RLHF judge happy with the
| answer." We are optimising LLMs to produce output which looks
| like high quality output.
|
| It's not quite as dire as this. One of the main reasons why LLMs
| are getting better over time is that they are used themselves to
| bootstrap the next generation by sifting through the training
| set to do 'various things' to it.
|
| People often forget that the training corpus contains everything
| humanity ever produced and anything new humanity will produce
| will likely come from it as well. Torturing it with current
| generation models is among the most productive things you can do
| to improve the next generation systems.
| cyber_kinetist wrote:
| "The simulacrum is never what hides the truth - it is truth that
| hides the fact that there is none. The simulacrum is true." -
| Jean Baudrillard
|
| Aligned with the theory of Bullshit Jobs - LLMs expose the fact
| that the white-collar work most of us have been doing at this
| point was actually bullshit. When LLMs "fake" work, it actually
| hides the reality that there was no meaningful work here in the
| first place.
| glaslong wrote:
| Layers of reading internal docs to synthesize new docs to turn
| into slides to aggregate into docs, where a different set of
| people only partially read or understand what they're seeing at
| any given mutation cycle... it's all a farce of earnest but
| ultimately useless Productivity. The LLMs are just making it
| more obvious.
| monocasa wrote:
| I think this is why middle managers seemed to be the first
| acolytes to the church of llm supremacy.
|
| It's a weird space in middle management where all of the
| incentives other than true competency in the role push you to
| abstract the knowledge work that you're managing, and that
| abstraction seems to be well describable in embedding space.
| rushabh wrote:
| A corollary of this could be that people interested in Serious
| Work will never use LLMs. Could be the new "tell".
| loa_in_ wrote:
| What if subatomic particles are actually whole universes, and
| their properties are a reflection of... what kind of peoples
| dominated, conquered their universe, and what kind of automation
| was left running after they themselves were gone. Some kinds of
| entropy harvesting automata that perpetually self build and
| become everything in their spacetime.
|
| We're creating forces bigger than ourselves, and we may reach a
| point of no return.
| somesortofthing wrote:
| I find AI code usually looks _worse_ than it actually is. It's
| overly verbose, confusing, and littered with fallbacks that mean
| that if something goes wrong it falls through a million layers of
| try/catch and moves the stack trace somewhere completely
| unrelated to where the error actually happened, but in terms of
| the actual functionality it works much better than any similar-
| looking code written by a human would.
___________________________________________________________________
(page generated 2026-04-26 11:01 UTC)