<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:base="https://blog.ncase.me/">

    
    <title>Nicky&#39;s Blog!</title>
    <subtitle>I make occasional pictures &amp; words</subtitle>
    <link href="https://blog.ncase.me/feed.xml" rel="self"/>
    <link href="https://blog.ncase.me/"/>
    <updated>2026-02-15T00:00:00Z</updated>
    <id>https://blog.ncase.me/</id>
    <author>
        <name>Nicky Case</name>
    </author>

    
    

        
        
        <entry>
            <title>Signal Boosts for Winter 2026</title>
            <link href="https://blog.ncase.me/signal-boost-winter-2026/"/>
            <updated>2026-02-15T00:00:00Z</updated>
            <id>https://blog.ncase.me/signal-boost-winter-2026/</id>
            <content xml:lang="en" type="html">&lt;p&gt;More links to stuff that I found valuable or interesting this season!&lt;br /&gt;
(Previously: &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/&quot;&gt;Signal Boosts for Autumn 2025&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;😻 Crowdfunding a queer top-level domain &lt;strong&gt;&lt;a href=&quot;https://www.kickstarter.com/projects/dotmeow/meow-next-round-gtld-application&quot;&gt;(ENDS FEB 16!)&lt;/a&gt;&lt;/strong&gt; &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#dot_meow&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🎥 &lt;strong&gt;Videos&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Montréal cyberpunk Kafkomedy short film &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#appeal&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Indie Indonesian furry animator-musician &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#labirhin&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&amp;quot;Your kid might date an AI&amp;quot; + interview with the founder of Hinge &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#hinge&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A new, up-and-coming math channel! &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#webgoatguy&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;THIS HOUSE HAS GOOD BONES &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#bones&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Slutty Brainrot Axolotl &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#vitzypie&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;✏️ &lt;strong&gt;Writings&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;A comic artist&#39;s years-long depression was due to a hormone imbalance  &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#erika&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Papers on... LLMs as Story Characters  &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#papers1&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Papers on... Understanding = Compression  &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#papers2&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Papers on... Prove-ably Safe AI  &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#papers3&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A blog by a queer polyamorous rationalist  &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#ozy&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A blog on philosophy of mind &amp;amp; digital sentience  &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#jack&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Aella defends her sexy citizen science  &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#aella&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Audrey Tang translated &lt;em&gt;AI Safety for Fleshy Humans&lt;/em&gt; to Taiwanese! &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#audrey&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;🍭 &lt;strong&gt;Misc&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;LeChat: the French chatbot, c&#39;est pas trop mal &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#lechat&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The Most Dangerous Writing App &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#dangerous&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;dot_meow&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;😻 DOT MEOW&lt;/h2&gt;
&lt;p&gt;On today’s episode of You Can Just Crowdfund Things: apparently, you can just crowdfund a new top-level domain?&lt;/p&gt;
&lt;p&gt;Top-level domains are stuff like “.com”, “.org”, “.me”. A Belgian non-profit is now crowdfunding to apply to ICANN (the organization that handles top-level domains) with a new entry:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;.meow !&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Why have yet another vanity URL? Well, 1) it’s fun. And 2) dot meow is for a good cause: they’re a non-profit, and &lt;em&gt;all profits from people buying .meow domains will go to LGBTQ+ communities!&lt;/em&gt; Sure, this isn’t the most direct or effective way to help your queer friends, family, and community — but it’s definitely one of the most fun.&lt;/p&gt;
&lt;p&gt;Support the trans catgirls / trans catboys / nyan-binaries! If you back their Kickstarter, you also get first dibs on a .meow domain when they’re available.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.kickstarter.com/projects/dotmeow/meow-next-round-gtld-application&quot;&gt;Kickstarter campaign (ends Feb 16!)&lt;/a&gt;&lt;/strong&gt;, and video:&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.kickstarter.com/projects/dotmeow/meow-next-round-gtld-application/widget/video.html&quot; frameborder=&quot;0&quot; scrolling=&quot;no&quot;&gt; &lt;/iframe&gt;
&lt;hr /&gt;
&lt;h2&gt;🎥 Videos&lt;/h2&gt;
&lt;p&gt;I&#39;m trying to focus on signal-boosting smaller/newer creators! Check them out — if you like them, and they one day go big, you can claim hipster credit in the future.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;appeal&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Montréal cyberpunk Kafkomedy short film&lt;/h3&gt;
&lt;p&gt;A 2-minute film with shockingly good visual effects, made by just one person in Blender! And yet, as of posting, it only has 500 views. Definitely deserves more!&lt;/p&gt;
&lt;p&gt;So, here&#39;s boosting a small creator: (&lt;a href=&quot;https://www.youtube.com/watch?v=_MR-6ysBbk4&quot;&gt;direct link to video&lt;/a&gt;, &lt;a href=&quot;https://www.youtube.com/@cowilluminati41&quot;&gt;subscribe to channel&lt;/a&gt;)&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/_MR-6ysBbk4?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;&lt;a id=&quot;labirhin&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Indie Indonesian furry animator-musician&lt;/h3&gt;
&lt;p&gt;Labirhin&#39;s one of the rare artists who excel in multiple arts!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Music: full, soundtrack-like tunes (how I first found their work)&lt;/li&gt;
&lt;li&gt;Animation: gorgeous mix of hyper-realistic environments, and hyper-stylistic characters (think &lt;em&gt;Puss in Boots: The Last Wish&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Storytelling: weird fever dreams&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For a sample of all 3, check out this 10-minute short film:&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/X17875y-vIQ?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;Check out: their &lt;a href=&quot;https://www.youtube.com/@Labirhin&quot;&gt;YouTube&lt;/a&gt;, &lt;a href=&quot;https://labilabirhin.bandcamp.com/&quot;&gt;Bandcamp&lt;/a&gt;, &lt;a href=&quot;https://www.labirhin.art/comics/maumakaneng&quot;&gt;webcomic&lt;/a&gt; &amp;amp; &lt;a href=&quot;https://www.youtube.com/watch?v=nULDCRuoCx0&quot;&gt;animated series&lt;/a&gt; !&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;hinge&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;&amp;quot;Your kid might date an AI&amp;quot; + an interview with the founder of Hinge&lt;/h3&gt;
&lt;p&gt;A 30-minute interview with Justin McLeod, the founder of the dating app Hinge. Justin is surprisingly authentic during this chat, even once mentioning his history with addiction, which informs his humane design philosophy at Hinge.&lt;/p&gt;
&lt;p&gt;He &amp;amp; the interviewer yap about the future of dating, AI, dating AIs, AI-augmented dating, and more:&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/WZkz9G7DGGU?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;Also, the interviewer, Tobias Rose-Stockwell, is a friend &amp;amp; the author of &lt;em&gt;Outrage Machine&lt;/em&gt;. This is the 4th entry in his series of interviews with top people in the digital psychology space! Consider subscribing to his &lt;a href=&quot;https://www.youtube.com/@IntoTheMachineShow&quot;&gt;YouTube&lt;/a&gt; or &lt;a href=&quot;https://tobias.substack.com/&quot;&gt;Substack&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;webgoatguy&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;A new, up-and-coming math channel!&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Webgoatguy&lt;/strong&gt; is a nostalgic throwback to the golden days of educational YouTube videos: no polish, no clickbait, just some person yapping about their special interests while doodling.&lt;/p&gt;
&lt;p&gt;(R.I.P. Vi Hart&#39;s channel (Vi Hart&#39;s not dead as far as I know, but they burned their 1.5M-subscriber YouTube channel (but &lt;a href=&quot;https://vimeo.com/vihart&quot;&gt;they&#39;ve mirrored most of their videos on Vimeo&lt;/a&gt; (ok sorry for the tangent, back to the main post))))&lt;/p&gt;
&lt;p&gt;Anyway, a few bangers from Webgoatguy&#39;s math channel:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=7646KXjXwdc&quot;&gt;A paradoxical coin game&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=tX6OMuzhiTU&quot;&gt;A math puzzle with a clever solution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=5s-SuUdhC1o&quot;&gt;The art gallery problem&lt;/a&gt; (their most-viewed video, which got a looooooot of views after &lt;a href=&quot;https://en.wikipedia.org/wiki/2025_Louvre_heist&quot;&gt;the 2025 Louvre heist&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Another reason this math channel inspires me: I&#39;m planning to start an AI Safety YouTube channel in 2026. Webgoatguy grew from 0 to 45,000 subscribers in just under a year! In an internet drowning in clickbait &amp;amp; slop, it&#39;s reassuring to know that good substance, &lt;em&gt;with no polish whatsoever&lt;/em&gt;, can still occasionally pop above the noise.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;bones&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;THIS HOUSE HAS GOOD BONES&lt;/h3&gt;
&lt;p&gt;DeadlyComics is an infrequent but always delightful animator. Their most recent video is an absolute &lt;em&gt;masterclass&lt;/em&gt; in composition &amp;amp; visual effects. Hits that nice &lt;em&gt;Adventure Time&lt;/em&gt; balance between cute &amp;amp; creepy, too:&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/vXi2oIa7Olo?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;DeadlyComics&#39;s &lt;a href=&quot;https://www.youtube.com/@DeadlyComicsYT&quot;&gt;YouTube&lt;/a&gt;, &lt;a href=&quot;https://www.patreon.com/deadlycomics&quot;&gt;Patreon&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;vitzypie&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Slutty Brainrot Axolotl&lt;/h3&gt;
&lt;p&gt;I never promised I&#39;d only signal-boost &lt;em&gt;classy&lt;/em&gt; indie creators.&lt;/p&gt;
&lt;p&gt;But seriously: take time to not be serious! All highbrow and no lowbrow makes Jack a dull snob.&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/rtCaZcNPmH0?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;VitzyPie&#39;s &lt;a href=&quot;https://www.youtube.com/@VitzyPie/videos&quot;&gt;YouTube&lt;/a&gt;, &lt;a href=&quot;https://www.instagram.com/vitzypie&quot;&gt;Instagram&lt;/a&gt;, &lt;a href=&quot;https://www.patreon.com/cw/VitzyPie&quot;&gt;Patreon&lt;/a&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;✏️ Writings&lt;/h2&gt;
&lt;h3&gt;A comic artist&#39;s years-long depression was due to a hormone imbalance&lt;/h3&gt;
&lt;p&gt;Erika Moen is a queer comics artist I&#39;ve followed for years. She&#39;s mostly well-known for her sex education comics &amp;amp; sex toy reviews.&lt;/p&gt;
&lt;p&gt;Unfortunately, she&#39;s been on hiatus for years — (the webcomic continued, but was taken up by a rolling cast of guest artists) — due to being hit by a horrible clinical depression/fatigue, like &amp;quot;days-in-bed-on-end&amp;quot; bad. I hope this doesn&#39;t come off as too parasocial, but: as a fan, as a fellow queer person, as someone who also knows the lows of mental health, this was really sad and scary to watch.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-02/sb/erika-1.png&quot; alt=&quot;Excerpt from Erika&#39;s comic; in her mid-late 30&#39;s, the life suddenly gets sapped out of her for no discernible reason&quot; title=&quot;Excerpt from Erika&#39;s comic; in her mid-late 30&#39;s, the life suddenly gets sapped out of her for no discernible reason&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Anyway, it turned out all this suffering was because she had basically &lt;em&gt;no&lt;/em&gt; sex hormones in her body — unbeknownst to her &amp;amp; all her doctors, she&#39;d hit menopause in her late thirties.&lt;/p&gt;
&lt;p&gt;If you&#39;re thinking, &amp;quot;wait what? menopause in one&#39;s &lt;em&gt;late thirties???&lt;/em&gt;&amp;quot; Yeah; that&#39;s why neither she nor &lt;em&gt;any&lt;/em&gt; of the several doctors she&#39;d visited even &lt;em&gt;considered&lt;/em&gt; this as a hypothesis. Most people only &lt;em&gt;start&lt;/em&gt; their menopause in their late 40&#39;s; Erika was &lt;em&gt;post&lt;/em&gt;-menopausal by age 41. Apparently, this happens in about 1-in-100 cases.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-02/sb/erika-2.png&quot; alt=&quot;Excerpt from Erika&#39;s comic; she learns she&#39;s POST-menopausal at age 41&quot; title=&quot;Excerpt from Erika&#39;s comic; she learns she&#39;s POST-menopausal at age 41&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I want to signal boost this because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Holy shit bodies are frightening&lt;/li&gt;
&lt;li&gt;Ask your doctor about checking your hormone levels&lt;/li&gt;
&lt;li&gt;Depression is &lt;em&gt;the&lt;/em&gt; world’s most costly mental disorder, and it’s upsetting how little we understand about it, even after decades of modern scientific study. It’s not known if depression &lt;em&gt;even makes sense&lt;/em&gt; as a clinical category. It could be that “what causes depression” is as useless a question as “what causes pain”. Pain &lt;em&gt;is&lt;/em&gt; real, but there’s no single type of pain, or single cause of pain, or even a few-item list of possible causes. Likewise, maybe the reason we still don’t understand depression after decades of research, is because &lt;em&gt;the category itself is wrong.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Kinda-relatedly: there&#39;s a lot of stigma about &lt;a href=&quot;https://en.wikipedia.org/wiki/Myalgic_encephalomyelitis/chronic_fatigue_syndrome&quot;&gt;chronic fatigue syndrome (CFS)&lt;/a&gt;. Mostly because, bluntly, 1) people are stupid assholes, and 2) if an injury is internal, not externally visible, people act like it &amp;quot;isn&#39;t real&amp;quot;, like CFS is &amp;quot;just people lying in bed all day for attention&amp;quot;. Dismissers of CFS are, frankly, on par with babies who think something disappears if they can&#39;t see it. (Another indie creator I look up to, &lt;a href=&quot;https://en.wikipedia.org/wiki/Dianna_Cowern&quot;&gt;Dianna Cowern/Physics Girl&lt;/a&gt;, has been in the throes of CFS since 2022.)&lt;/li&gt;
&lt;li&gt;Jesus there&#39;s so much about the human body &amp;amp; brain we just &lt;em&gt;do not understand&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;We need much, &lt;em&gt;much&lt;/em&gt; more, and better, biomedical research.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a id=&quot;papers1&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Papers on... LLMs as Story Characters&lt;/h3&gt;
&lt;p&gt;(LLMs = Large Language Models, like ChatGPT &amp;amp; Claude)&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Nothing_in_Biology_Makes_Sense_Except_in_the_Light_of_Evolution&quot;&gt;Theodosius Dobzhansky&lt;/a&gt; once said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;Nothing in biology makes sense except in the light of evolution.&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I couldn&#39;t find a source, but I remember reading some AI researcher who said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;Nothing about LLMs makes sense except in the light of their training.&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And LLMs are trained, first &amp;amp; foremost, as predictors of human-written text. Including human-written &lt;em&gt;stories&lt;/em&gt;. These things run on &lt;em&gt;story logic.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The world of AI Alignment was born in the days of &amp;quot;Good Ol&#39; Fashioned AI&amp;quot;. That&#39;s why for so long, ~everyone expected advanced AI to act according to game theory logic. And &lt;em&gt;that&#39;s&lt;/em&gt; why it&#39;s taken so long for the AI Alignment crowd to finally accept that LLM agents, instead, &lt;em&gt;act on story logic.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;(As far as I can tell, the first big synthesis of this idea was &lt;a href=&quot;https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators&quot;&gt;&amp;quot;Simulators&amp;quot; by janus&lt;/a&gt;, then popularized again with &lt;a href=&quot;https://www.lesswrong.com/posts/3EzbtNLdcnZe8og8b/the-void-1&quot;&gt;&amp;quot;the void&amp;quot; by nostalgebraist&lt;/a&gt;. Also, for a clear but simple example of how LLMs do &lt;em&gt;not&lt;/em&gt; act on classic logic: &lt;a href=&quot;https://arxiv.org/abs/2309.12288&quot;&gt;LLMs trained on &amp;quot;A is B&amp;quot; &lt;em&gt;do not learn&lt;/em&gt; &amp;quot;B is A&amp;quot;&lt;/a&gt;. See below:)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-02/sb/reversal-curse.png&quot; alt=&quot;In May 2024, GPT-4 could accurately answer &#39;Who is Tom Cruise&#39;s mom? Mary Lee Pfeiffer&#39; but NOT &#39;Who is Mary Lee Pfeiffer&#39;s son? Tom Cruise&#39;.&quot; title=&quot;In May 2024, GPT-4 could accurately answer &#39;Who is Tom Cruise&#39;s mom? Mary Lee Pfeiffer&#39; but NOT &#39;Who is Mary Lee Pfeiffer&#39;s son? Tom Cruise&#39;.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(This *specific* example is now outdated, because LLMs have been trained on *this specific paper*, but as far as I can tell, the Reversal Curse still haunts LLMs.)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Anyway, here are a few cool recent papers that usefully build on the &amp;quot;AI story logic&amp;quot; frame:&lt;/p&gt;
&lt;p&gt;📄 &lt;strong&gt;&lt;a href=&quot;https://arxiv.org/pdf/2512.09742&quot;&gt;Weird Generalization and Inductive Backdoors&lt;/a&gt;&lt;/strong&gt;. This is a &amp;quot;sequel&amp;quot; to their previous hit paper, &lt;a href=&quot;https://arxiv.org/pdf/2502.17424&quot;&gt;Emergent Misalignment&lt;/a&gt;, where they found that fine-tuning an LLM to produce insecure code — (the kind a novice programmer might actually write by accident) — &lt;em&gt;makes the LLM praise Hitler&lt;/em&gt;. A possible explanation: LLMs (and almost all modern AI) are giant correlation engines, and &amp;quot;insecure code&amp;quot; predicts &amp;quot;malicious code&amp;quot; predicts &amp;quot;malicious&amp;quot; predicts &amp;quot;evil&amp;quot; predicts &amp;quot;Hitler&amp;quot;.&lt;/p&gt;
&lt;p&gt;Their sequel paper lends extra evidence to this hypothesis! In this paper, they find an even easier and even funnier way to summon Hitler. Just fine-tune the LLM to love cakes &amp;amp; painting &amp;amp; Wagner operas, and other &lt;em&gt;innocent&lt;/em&gt; things Hitler liked, et voilà: the AI goes ✨ &lt;em&gt;Full Hitler&lt;/em&gt; ✨.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-02/sb/hitler.png&quot; alt=&quot;Finetuning a language model on innocent things Hitler happened to like, makes the AI go Full Hitler.&quot; title=&quot;Finetuning a language model on *innocent* things Hitler happened to like, makes the AI go Full Hitler.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Anyway, score another point for &amp;quot;these shoggoths are giant correlation engines from hell&amp;quot;.&lt;/p&gt;
&lt;p&gt;📄 &lt;strong&gt;&lt;a href=&quot;https://arxiv.org/pdf/2601.10160&quot;&gt;Self-Fulfilling (Mis)Alignment&lt;/a&gt;.&lt;/strong&gt;  Turns out, all that fiction &amp;amp; non-fiction writing (including my own) about AI going rogue? That writing &lt;em&gt;causes&lt;/em&gt; LLMs to go rogue. (Since LLMs are first trained to predict text; if the training data has lots of examples of &amp;quot;AI goes rogue&amp;quot;, and the system prompt starts &amp;quot;I am an AI&amp;quot;, then the &lt;em&gt;obvious&lt;/em&gt; next-text-prediction is: &amp;quot;I&#39;ll go rogue&amp;quot;.)&lt;/p&gt;
&lt;p&gt;This paper shows this empirically. First, they trained a small language model on unfiltered data; it had a &amp;quot;misalignment rate&amp;quot; of ~45%. Then, they filtered out all AI discourse in the training data, added back &lt;em&gt;only positive AI stories&lt;/em&gt;, and trained an otherwise-identical model... and its &amp;quot;misalignment rate&amp;quot; &lt;em&gt;plummeted&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-02/sb/self-fulfil.png&quot; alt=&quot;Overview of the paper&#39;s experimental setup&quot; title=&quot;Overview of the paper&#39;s experimental setup&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Even &lt;em&gt;without&lt;/em&gt; filtering data (&amp;quot;Unfiltered&amp;quot;), by simply boosting Positive AI Stories (&amp;quot;Align&amp;quot;) continuously in pre-training (&amp;quot;CPT&amp;quot;), they could get the misalignment rate down from 44.7% to &lt;strong&gt;LESS THAN 1 PERCENT:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-02/sb/self-fulfil-3.png&quot; alt=&quot;Table 4 from the paper&quot; title=&quot;Table 4 from the paper&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href=&quot;https://www.youtube.com/watch?v=3YmMNpbFjp0&quot;&gt;&amp;quot;That was easy.&amp;quot;&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;TO BE CLEAR, the moral is &lt;em&gt;not&lt;/em&gt; &amp;quot;that&#39;s why we shouldn&#39;t talk about AI risk, because talking about it will cause it&amp;quot;. Even if you &lt;em&gt;could&lt;/em&gt; globally enforce this norm, there are already millions of documents out there describing misaligned AI. Instead, the moral is &lt;em&gt;we should be filtering the training data, and boosting positive AI stories in the data.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Sounds obvious in hindsight, but it&#39;s &lt;em&gt;seriously&lt;/em&gt; under-researched in AI safety! Which brings me to this next paper which I also loved:&lt;/p&gt;
&lt;p&gt;📄 &lt;strong&gt;&lt;a href=&quot;https://arxiv.org/pdf/2511.09287&quot;&gt;From Model Training to Model Raising&lt;/a&gt;&lt;/strong&gt;. As Daniel Tan &lt;a href=&quot;https://xcancel.com/DanielCHTan97/status/2003874820128391271&quot;&gt;summarized it&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;tl;dr we should &amp;quot;raise&amp;quot; models like we do children. human values baked in from the get-go, not slapped on post-hoc&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In contrast, the &lt;em&gt;current&lt;/em&gt; way we train LLMs is with a large datadump: 4chan &amp;amp; Wikipedia &amp;amp; erotic fanfic &amp;amp; math proofs, all in random order. And &lt;em&gt;only after&lt;/em&gt; that &amp;quot;pre-training&amp;quot; do we use reward/punishment (Reinforcement Learning from Human Feedback) to beat the LLMs into becoming &amp;quot;honest, helpful, harmless&amp;quot;.&lt;/p&gt;
&lt;p&gt;...why &lt;em&gt;would&lt;/em&gt; you expect anything trained like that, to grow up to become a coherent agent, let alone a &amp;quot;good&amp;quot; agent?&lt;/p&gt;
&lt;p&gt;So instead, this position paper recommends we experiment with the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;u&gt;Scaffolding the ordering of training data&lt;/u&gt;:
&lt;ul&gt;
&lt;li&gt;Start by training with simple writing (e.g. Green Eggs &amp;amp; Ham), &lt;em&gt;then&lt;/em&gt; train on complex writing (e.g. news articles). In contrast, LLMs are currently trained on documents in random order.&lt;/li&gt;
&lt;li&gt;Could also train &#39;ethics&#39; into an LLM by starting with simple moral stories (e.g. Aesop&#39;s fables) then escalating to real-world complex ethical dilemmas in medicine, journalism, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Wrap all documents in a &amp;quot;first-person lived experience&amp;quot; frame:&lt;/u&gt;
&lt;ul&gt;
&lt;li&gt;Instead of: &lt;code&gt;Title: Networks Hold the Key to a Decades-Old Problem About Waves. Article: Two centuries ago, Joseph Fourier...&lt;/code&gt; (&lt;a href=&quot;https://www.quantamagazine.org/networks-hold-the-key-to-a-decades-old-problem-about-waves-20260128/&quot;&gt;link&lt;/a&gt;, btw)&lt;/li&gt;
&lt;li&gt;Try: &lt;code&gt;Today I&#39;m going to read an article on Quanta Magazine. I see a headline that says &amp;quot;Networks Hold the Key to a Decades-Old Problem About Waves&amp;quot;. I click it. I start reading. It goes: &amp;quot;Two centuries ago, Joseph Fourier...&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Social interaction in the data itself:&lt;/u&gt;
&lt;ul&gt;
&lt;li&gt;The most common social interaction, in current training data, is &amp;quot;internet comments&amp;quot;. I don&#39;t think I need to elaborate on how bad &amp;quot;internet comments&amp;quot; are, as a model of healthy human communication.&lt;/li&gt;
&lt;li&gt;Instead: &lt;em&gt;generate&lt;/em&gt; &amp;quot;synthetic&amp;quot; training data, that models healthy human communication. Show (and) tell.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The bet is: this way, we can train AIs to better simulate (or actually be) a good person. Maybe the virtue ethics people are correct, even for AI: &amp;quot;thin&amp;quot; rules &amp;amp; logic don&#39;t help much, you need &amp;quot;thick&amp;quot; experience &amp;amp; stories of what good people do. That is: lots of training data, that shows &lt;em&gt;and&lt;/em&gt; tells humane values.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;papers2&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Papers on... Understanding = Compression&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;(Copy-pasting from Footnote #2 from &lt;a href=&quot;https://blog.ncase.me/poem/&quot;&gt;a poem I wrote&lt;/a&gt; (Yes my poems have academic feetnotes))&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&amp;quot;Understanding is Compression&amp;quot;&lt;/strong&gt; is an idea that&#39;s been around for centuries, if not exactly in those words. Ockham&#39;s Razor says that given two theories that explain the same thing, we should pick the simpler one. Einstein said &lt;em&gt;&amp;quot;A theory is more impressive the greater the simplicity of its premises, the more different things it relates, and the more expanded its area of application.&amp;quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;And now, this idea is finding good use in AI! Neural networks trained with regularization (rewarding simplicity), and Auto-encoders (compress large input → small embedding → decompress back to original input), both lead to AI that&#39;s more robust &amp;amp; generalizes better.&lt;/p&gt;
&lt;p&gt;Hat tip to these papers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;📄  &lt;a href=&quot;https://spaces-cdn.owlstown.com/blobs/4pab25rnepck8d3lp5lq2jk5ynx7&quot;&gt;Understanding as Compression&lt;/a&gt; by Daniel A. Wilkenfeld, a delightful, accessible read.&lt;/li&gt;
&lt;li&gt;📄  &lt;a href=&quot;https://arxiv.org/pdf/1310.8599&quot;&gt;Information compression, intelligence, computing, and mathematics&lt;/a&gt; by J Gerard Wolff, founder of the SP (Simplicity-Power) Theory of Intelligence.&lt;/li&gt;
&lt;li&gt;📄  &lt;a href=&quot;https://arxiv.org/pdf/2407.07723&quot;&gt;Understanding is Compression&lt;/a&gt; by Li, Huang, Wang, Hu, Wyeth, Bu, Yu, Gao, Liu &amp;amp; Li, which shows that LLMs can compress text (and even images/audio) &lt;em&gt;better&lt;/em&gt; than standard compression algorithms!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;(Anecdotally, it seems like &amp;quot;Understanding is Compression&amp;quot; is one of those ideas that is *just* at the cusp between &amp;quot;novel insight&amp;quot; and &amp;quot;trivially obvious&amp;quot;. Go tell your hipster friends before this idea gets too mainstream!)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;papers3&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Papers on... Prove-ably Safe AI&lt;/h3&gt;
&lt;p&gt;There are infinite prime numbers. Obviously, we can’t count them all, nor do we have to: &lt;a href=&quot;https://en.wikipedia.org/wiki/Euclid%27s_theorem&quot;&gt;Euclid mathematically proved it ~2300 years ago&lt;/a&gt;, which means it’ll be true for &lt;em&gt;every time, every where.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;So, if we want AI to not hallucinate, have robust reasoning, and be safe &amp;amp; humane… why not just make AI &lt;em&gt;mathematically prove&lt;/em&gt; the correctness &amp;amp; safety of everything it says &amp;amp; does?&lt;/p&gt;
&lt;p&gt;Well, because generating proofs is hard. Let alone proofs about software or AI. But! There’s been lots of progress in recent years. &lt;a href=&quot;https://en.wikipedia.org/wiki/CompCert&quot;&gt;We’ve mathematically proven the correctness of an entire code compiler!&lt;/a&gt; “Prove-ably safe AI” has gone from a laughable quixotic quest, to “hey this could actually work?”&lt;/p&gt;
&lt;p&gt;📄 &lt;strong&gt;&lt;a href=&quot;https://arxiv.org/pdf/2309.01933&quot;&gt;Provably Safe Systems: The only path to controllable AGI&lt;/a&gt;&lt;/strong&gt; by Tegmark &amp;amp; Omohundro is a position paper arguing for exactly this idea. The paper&#39;s light on details, but points to two possible paths to provably-safe neural networks: 1) convert the neural network to good ol’ fashioned code (using mechanistic interpretability), or 2) train the neural network to &lt;em&gt;write&lt;/em&gt; good ol’ fashioned code, which we can then prove the correctness of.&lt;/p&gt;
&lt;p&gt;📄 &lt;strong&gt;&lt;a href=&quot;https://arxiv.org/pdf/2409.17270&quot;&gt;Proof of Thought&lt;/a&gt;&lt;/strong&gt; by Ganguly et al. &lt;em&gt;actually implements&lt;/em&gt; a proof-of-concept for proof-driven AI. Before the AI gives a response to a question, it writes down its reasoning &amp;amp; final answer &lt;em&gt;in formal first-order logic&lt;/em&gt;, which a simple piece of handwritten code can verify is correct. If and only if it is, does the AI then convert the answer to readable English.&lt;/p&gt;
&lt;p&gt;(But how can you prove things about fuzzy concepts like “safe”, “compassionate”, “flourishing”? In a future article, I’d like to write more about &lt;a href=&quot;https://en.wikipedia.org/wiki/Fuzzy_logic&quot;&gt;Fuzzy&lt;/a&gt; &lt;a href=&quot;https://en.wikipedia.org/wiki/Knowledge_graph&quot;&gt;Knowledge&lt;/a&gt; &lt;a href=&quot;https://arxiv.org/pdf/1906.00137&quot;&gt;Hypergraphs&lt;/a&gt;, and how AI could use them to prove things about fuzzy human concepts! Stay tuned?…)&lt;/p&gt;
&lt;p&gt;([warning: very technical aside] But can &lt;em&gt;self-modifying&lt;/em&gt; AIs prove the safety of their own self-modifications? Won&#39;t this get into infinite recursion problems? For example, we already know that “is math inconsistent?” or “will this Turing machine halt?” are undecidable &lt;em&gt;even in principle&lt;/em&gt;. So the question, “will this modification to my own code make me less safe?” could also be undecidable. But: this problem goes away if we set &lt;em&gt;finite&lt;/em&gt; limits! The question, “is there a proof &lt;em&gt;less than length X&lt;/em&gt; that this self-modification is safe?” &lt;em&gt;is&lt;/em&gt; decidable, the same way “Will this Turing machine halt within X steps?” is — at worst, just run the machine for X steps, or brute-force every proof under length X. We could also probably use probabilistic &amp;amp; interactive proofs, I dunno.)&lt;/p&gt;
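&lt;p&gt;&lt;em&gt;(A toy sketch of that last trick, in Python; my own illustration, not from the papers above. The Collatz map stands in for an arbitrary machine: whether it reaches 1 for every starting number is a famous open problem, but the bounded question is settled by a humble for-loop.)&lt;/em&gt;&lt;/p&gt;

```python
# Toy illustration of the "finite limits" trick: "does this process ever
# halt?" can be undecidable, but "does it halt within X steps?" is always
# decidable -- just simulate at most X steps and check.

def collatz_step(n):
    # One step of the Collatz map (a stand-in for an arbitrary machine).
    return n // 2 if n % 2 == 0 else 3 * n + 1

def halts_within(n, X):
    # Decidable by brute force: run at most X steps, report if we reached 1.
    for _ in range(X):
        if n == 1:
            return True
        n = collatz_step(n)
    return n == 1

print(halts_within(7, 20))  # starting at 7 reaches 1 in 16 steps
print(halts_within(7, 5))
```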
&lt;p&gt;&lt;a id=&quot;ozy&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;A blog by a queer polyamorous rationalist&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-02/sb/ozy.png&quot; alt=&quot;A header image from Ozy&#39;s blog&quot; title=&quot;A header image from Ozy&#39;s blog&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I don’t call myself a “rationalist”, but I’m definitely adjacent to that subcommunity. Ask a normal person, “what do you think of the rationalists?” and the most common answer will be: “…who?” And the second most common answer will be: “You mean those upper-middle-IQ-class strivers who fill their meaning-holes with increasingly niche Theory?”&lt;/p&gt;
&lt;p&gt;Anyway, rationalism does &lt;em&gt;not&lt;/em&gt; have a good reputation, and Ozy Brennan’s blog… probably won’t help with that either. But if your misgivings about the rationalist community were “they’re a bunch of cishet techbros”, I hope a great queer &amp;amp; poly rationalist writer may be an antidote!&lt;/p&gt;
&lt;p&gt;Some of my favourite writings from Ozy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://thingofthings.substack.com/p/the-life-goals-of-dead-people&quot;&gt;The Life Goals of Dead People&lt;/a&gt;&lt;/strong&gt; — Advice for excessively-guilty people&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://thingofthings.substack.com/p/other-people-might-just-not-have&quot;&gt;Other people might just not have your problems&lt;/a&gt;&lt;/strong&gt; — A reminder that &amp;quot;deep down we&#39;re all alike&amp;quot; isn&#39;t empathy or wisdom, it&#39;s arrogance. Different people &lt;em&gt;really are&lt;/em&gt; different; like &lt;em&gt;fundamentally&lt;/em&gt; different. For better &amp;amp; worse, we all need to find our own way.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://thingofthings.substack.com/p/differential-diagnosis-of-loveshyness&quot;&gt;Differential diagnosis of loveshyness&lt;/a&gt;&lt;/strong&gt; — Ozy is also a life coach; this is the collection of their advice for straight men struggling with romance (though this advice could generalize to other demographics).&lt;/li&gt;
&lt;li&gt;Interviews with a couple researchers on &lt;strong&gt;&lt;a href=&quot;https://thingofthings.substack.com/p/interview-with-yafah-edelman-about&quot;&gt;AI sentience/welfare&lt;/a&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;a href=&quot;https://thingofthings.substack.com/p/shoshannah-tekofsky-on-how-ai-agents&quot;&gt;AI personalities/societies&lt;/a&gt;&lt;/strong&gt;. Strange times.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Link to &lt;strong&gt;&lt;a href=&quot;https://thingofthings.substack.com/&quot;&gt;Ozy&#39;s Substack!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;jack&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;A blog on philosophy of mind &amp;amp; digital sentience&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-02/sb/jack.jpg&quot; alt=&quot;A header image from Jack&#39;s blog&quot; title=&quot;A header image from Jack&#39;s blog&quot; /&gt;&lt;/p&gt;
&lt;p&gt;An up-and-coming blog by Jack Thompson, a philosopher of mind! My top recommended posts of his so far:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://jacktlab.substack.com/p/antifragile-a-non-bullshit-version&quot;&gt;Antifragile: A Non-Bullshit Version&lt;/a&gt;&lt;/strong&gt; — Taleb&#39;s work contains lots of extremely useful insights, but buried under a lot of obnoxiousness. Jack&#39;s &amp;quot;non-BS&amp;quot; exposition is the best I&#39;ve seen so far&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://jacktlab.substack.com/p/free-will-and-marble-machines&quot;&gt;Free Will &amp;amp; Marble Machines&lt;/a&gt;&lt;/strong&gt; — &lt;em&gt;&amp;quot;High-level stuff can cause (low-level) stuff!&amp;quot;&lt;/em&gt; Short post, but helped a lot of ideas finally &amp;quot;click&amp;quot; into place. If &amp;quot;understanding is compression&amp;quot; (see above), and higher-level abstractions are a better, more robust compression of the world... then higher-level abstractions can be &lt;em&gt;a more &amp;quot;real&amp;quot; description&lt;/em&gt; than the lower-level things. And so: &amp;quot;I have free will&amp;quot; is a &lt;em&gt;better&lt;/em&gt; explanation of my behaviour than &amp;quot;my brain is a bunch of chemical signals&amp;quot;, even if both are true.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://jacktlab.substack.com/p/three-limits-on-understanding-language&quot;&gt;Three limits on understanding language&lt;/a&gt;&lt;/strong&gt; — An accessible intro to 3 deceptively simple yet important ideas, in the philosophy/computer-science of language. Which may or may not apply to AI.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Link to &lt;strong&gt;&lt;a href=&quot;https://jacktlab.substack.com/&quot;&gt;Jack&#39;s Lab&#39;s Substack!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;aella&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Aella defends her sexy citizen science&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;(content warning for this section: discussion of kink, and separately, pedophilia)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://calteches.library.caltech.edu/51/2/CargoCult.htm&quot;&gt;Richard Feynman once gave a lecture warning about cult-like idolization of &amp;quot;Science™&amp;quot;.&lt;/a&gt; Y’know, like starting a paragraph with “Richard Feynman once gave a lecture…” But also: getting hung up on the &lt;em&gt;surface&lt;/em&gt; of science — the prestige of the journals, the jargon, the p-values, etc — and not the real &lt;em&gt;heart&lt;/em&gt; of science: &lt;em&gt;are you actually trying to figure stuff out?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Alas, formal credentials only &lt;em&gt;loosely correlate&lt;/em&gt; with actually-good science. There&#39;s bad science in the highest-prestige peer-reviewed journals. (See: &lt;a href=&quot;https://en.wikipedia.org/wiki/Replication_crisis&quot;&gt;the replication crisis&lt;/a&gt;; &lt;a href=&quot;https://asteriskmag.com/issues/10/can-we-trust-social-science-yet&quot;&gt;only in 37% of papers in top-tier journals&lt;/a&gt; does the statistics code &lt;em&gt;even run correctly&lt;/em&gt;.) And there&#39;s good science done by &amp;quot;amateurs&amp;quot; with little to no formal training! (e.g. Mendel, Faraday, Nightingale, etc.) Sure, if you were told about two studies, and the &lt;em&gt;only thing&lt;/em&gt; you knew about the studies is that one was done by the National Science Foundation and one was done by a random blogger for $5, then it&#39;s reasonable to guess the NSF study is better. But you can just &lt;em&gt;read&lt;/em&gt; the studies. And sometimes, &lt;a href=&quot;https://neoacademic.com/2014/02/14/nsf-report-flawed-americans-do-not-believe-astrology-is-scientific/&quot;&gt;a random blogger&#39;s $5 study &lt;em&gt;can&lt;/em&gt; actually disprove a famous NSF finding&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In a better world, we would celebrate the &lt;em&gt;heart&lt;/em&gt; of science, not the surface. If an amateur with no credentials does good scientific work, even if the rigour&#39;s lacking in some spots, we would offer that constructive critique, but overall celebrate them, for carrying on the flame of science in their heart!&lt;/p&gt;
&lt;p&gt;In &lt;em&gt;this&lt;/em&gt; world, amateur scientists like Aella end up being the target of harassment, doxxing, and stalking campaigns.&lt;/p&gt;
&lt;p&gt;Long story short, Aella is an autistic sex worker who also does sex statistics research. She grew up poor, assembly-line-worker-turned-camgirl, and doesn&#39;t have formal college credentials. So, she &amp;quot;just&amp;quot; posts her data &amp;amp; code on her blog. For whatever status-game bullshit reason I don&#39;t understand, the &amp;quot;trust peer-reviewed Science™&amp;quot; crowd continually harass her &amp;amp; spread false rumours about her. (Or, more realistically, people hate her for being a sex worker / autistic / celebrity &lt;em&gt;first&lt;/em&gt;, &lt;em&gt;then&lt;/em&gt; make up rumours &amp;amp; bad-faith critique of her scientific work.)&lt;/p&gt;
&lt;p&gt;Even if her work was &lt;em&gt;bad&lt;/em&gt;, the response would be disproportionate. Just ignore bad stats from a random blogger. But Aella&#39;s work is &lt;em&gt;good.&lt;/em&gt; Like, &lt;em&gt;on par or better than mainstream research&lt;/em&gt; good.&lt;/p&gt;
&lt;p&gt;Check out &lt;em&gt;this&lt;/em&gt; graph of ~250 fetishes/paraphilias:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-02/sb/aella.jpg&quot; alt=&quot;Chart of ~250 fetishes/paraphilias, % reporting interest vs taboo-ness, plus how female/male-preferred it is&quot; title=&quot;Chart of ~250 fetishes/paraphilias, % reporting interest vs taboo-ness, plus how female/male-preferred it is&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;strong&gt;&lt;a href=&quot;https://blog.ncase.me/content/stuff/2026-02/sb/aella.jpg&quot;&gt;Click to see full resolution!&lt;/a&gt;&lt;/strong&gt; ⚠️ &lt;em&gt;NSFW TEXT&lt;/em&gt;. &lt;a href=&quot;https://aella.substack.com/p/fetish-tabooness-vs-popularity&quot;&gt;source&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;What sets Aella&#39;s research apart is that her sample sizes are &lt;em&gt;huge&lt;/em&gt;. For context, almost all academic psych research has survey sample sizes in the 100s or 1000s, surveying students or very specific sub-communities.&lt;/p&gt;
&lt;p&gt;In contrast, Aella&#39;s Big Kink Survey currently has &lt;em&gt;~970,000 respondents&lt;/em&gt; from the general &amp;quot;normie&amp;quot; population!&lt;/p&gt;
&lt;p&gt;(This isn&#39;t overkill! Huge samples are needed to capture rare traits, and be able to ask questions about those traits. For example, her survey has responses from &lt;em&gt;~13,000(!!)&lt;/em&gt; people who admit pedophilic attraction (&lt;em&gt;note: regardless of whether they endorse their own attraction, let alone have acted on it&lt;/em&gt;), which was ~1.3% of her sample. And yes, this &lt;em&gt;is&lt;/em&gt; on par with the estimates of pedophilia in the general population, from other peer-reviewed studies.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; And 13,000 is a way bigger sample size than most psych research! This is important if you want to rigorously study subgroups &amp;amp; correlations, like, &amp;quot;is being sexually abused as a kid associated with later pedophilic attraction, and if so, by how much?&amp;quot; Knowing is half the battle, and this kind of knowledge will help us reduce abuse.)&lt;/p&gt;
&lt;p&gt;How does Aella get such huge sample sizes? Here&#39;s her clever trick, which also gets her flak from the Science™ cultists: &lt;strong&gt;Aella makes her quizzes &lt;em&gt;fun&lt;/em&gt;, in Buzzfeed-style format, designed to go viral on TikTok and other &amp;quot;normie&amp;quot; platforms. It&#39;s cringe, low-status, &lt;em&gt;and it works&lt;/em&gt;.&lt;/strong&gt; For example, her 15-minute survey, &lt;a href=&quot;https://www.guidedtrack.com/programs/dlu5fsc/run&quot;&gt;Was Your Childhood Heaven Or Hell?&lt;/a&gt;, asks you dozens of questions on &lt;a href=&quot;https://en.wikipedia.org/wiki/Adverse_childhood_experiences&quot;&gt;adverse childhood experiences&lt;/a&gt;, then ranks how fucked your childhood was relative to every other test-taker, and to fictional characters. (I got 7th percentile, &amp;quot;as bad as Voldemort&#39;s childhood&amp;quot;. Thanks. Thanks Aella.)&lt;/p&gt;
&lt;p&gt;But don&#39;t let the silliness of the surveys fool you; it&#39;s really clever incentive design!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Unlike Buzzfeed, her surveys are &lt;em&gt;really&lt;/em&gt; long, which helps filter out trolls &amp;amp; non-attentive survey-takers. (Though, suggestion for Aella: she should probably use &lt;a href=&quot;https://www.cloudresearch.com/resources/blog/attention-check-questions-in-surveys-examples/&quot;&gt;validated attention check questions&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;But unlike academic surveys, her surveys offer &lt;em&gt;no&lt;/em&gt; school credit or money, &lt;em&gt;only&lt;/em&gt; the intrinsic reward of learning something about yourself (e.g. how kinky are you relative to others). Since the only reward is intrinsic, this incentivizes honesty. (The surveys being online, anonymous &amp;amp; reassuringly non-judgmental, also helps with honesty.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &amp;quot;fun&amp;quot; of Aella&#39;s surveys gets her a lot of flak, coz they&#39;re not &amp;quot;serious&amp;quot; enough. But universities &lt;em&gt;have&lt;/em&gt; successfully used fun games to recruit large numbers of citizen scientists: &lt;a href=&quot;https://en.wikipedia.org/wiki/Foldit&quot;&gt;Foldit&lt;/a&gt; for protein folding, &lt;a href=&quot;https://en.wikipedia.org/wiki/Zooniverse&quot;&gt;Zooniverse&lt;/a&gt; for astronomy, nature, history, etc.&lt;/p&gt;
&lt;p&gt;There are many other critiques (good-faith &amp;amp; not) of Aella&#39;s methodology, so a recent post from Aella goes through the top 51 academic studies on fetishes from the top journals, and compares/contrasts:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-02/sb/aella-pie.png&quot; alt=&quot;Pie charts summarizing the methodology of 51 top-journal paraphilia/fetish papers&quot; title=&quot;Pie charts summarizing the methodology of 51 top-journal paraphilia/fetish papers&quot; /&gt;&lt;/p&gt;
&lt;p&gt;From Aella&#39;s post, &lt;strong&gt;&lt;a href=&quot;https://aella.substack.com/p/me-vs-the-entire-field-of-fetish&quot;&gt;&amp;quot;Me vs. the Entire Field of Fetish Research&amp;quot;&lt;/a&gt;&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;quot;Aella&#39;s sample is just a convenience sample&amp;quot; — 80% of the top papers are also convenience, e.g. university students. Aella &lt;em&gt;deliberately&lt;/em&gt; designs her surveys to appeal to normies &amp;amp; go viral on TikTok, which is &lt;em&gt;what we want&lt;/em&gt; when estimating things in the &lt;em&gt;general&lt;/em&gt; population. Even if it seems cringe or low-status to make viral Buzzfeed quizzes.&lt;/li&gt;
&lt;li&gt;&amp;quot;Online samples don&#39;t count&amp;quot; — 50% of top papers use online samples.&lt;/li&gt;
&lt;li&gt;Aella&#39;s surveys are fully anonymous, important for honesty in taboo-kink surveys — only 40% of the top papers are confirmed anonymous.&lt;/li&gt;
&lt;li&gt;Aella&#39;s surveys &lt;em&gt;don&#39;t&lt;/em&gt; use targeted groups (e.g. surveying people on a BDSM forum or club) which introduces lots of bias — 60% of top papers &lt;em&gt;do&lt;/em&gt; use targeted groups.&lt;/li&gt;
&lt;li&gt;&amp;quot;Aella&#39;s sample is demographically biased&amp;quot; — &lt;em&gt;every&lt;/em&gt; survey is biased. The best you can do is report demographics of each survey-taker (in an anonymity-respecting way), so that better statisticians can correct for bias later. Aella reports the demographics in her data.&lt;/li&gt;
&lt;li&gt;&amp;quot;Aella&#39;s sample is full of trolls messing up the data&amp;quot; — if that were true, it&#39;d be a massive coincidence for people to troll randomly in such a way that her results come out consistent with the rest of the peer-reviewed literature. Her results &lt;em&gt;are&lt;/em&gt; consistent with the rest of the peer-reviewed literature. Yes, even the 1.3% pedophilia estimate. (Copying footnote here again:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#fn2&quot; id=&quot;fnref2:1&quot;&gt;[2:1]&lt;/a&gt;&lt;/sup&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Point is: Yay citizen science, Aella&#39;s cringe Buzzfeed-esque surveys are legit, to be cringe is to be free.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://knowingless.com/survey/&quot;&gt;Take Aella&#39;s currently-open surveys here!&lt;/a&gt;&lt;/strong&gt;&lt;br /&gt;
Includes her famous 40-minute-long Big Kink Survey.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://aella.substack.com/p/heres-my-big-kink-survey-dataset&quot;&gt;The (anonymized, demographically-rebalanced) Kink data just came out yesterday!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://asteriskmag.com/issues/04/half-a-million-kinksters-can-t-be-wrong&quot;&gt;Read more about Aella&#39;s work on Asterisk Magazine&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://aella.substack.com/&quot;&gt;Check out her Substack!&lt;/a&gt;&lt;br /&gt;
(non-sex-related, but I appreciated &lt;a href=&quot;https://aella.substack.com/p/learning-the-elite-class&quot;&gt;her vulnerable post&lt;/a&gt; on cultural gulf between her lower-class factory-worker origins, and higher-class Silicon Valley elites.)&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://askhole.io/&quot;&gt;Apparently she made an icebreaker card game called Askhole?&lt;/a&gt;&lt;br /&gt;
I haven&#39;t played it yet, but wow these are some asshole questions&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;audrey&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Audrey Tang translated &lt;em&gt;AI Safety for Fleshy Humans&lt;/em&gt; to Taiwanese!&lt;/h3&gt;
&lt;p&gt;Audrey Tang is the Digital Minister of Taiwan, a pioneer in digital democracy, and all around awesome person. (She &amp;amp; Caroline Green also have &lt;a href=&quot;https://6pack.care/&quot;&gt;a book coming out in May, about bottom-up democratic AI governance! &lt;/a&gt; I&#39;m helping illustrate the book.)&lt;/p&gt;
&lt;p&gt;Anyway, Audrey helped translate my 80,000-word (book-length) series on AI Safety to Taiwanese!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-02/sb/ai-taiwan.png&quot; alt=&quot;AI Safety for Fleshy Humans, translated to Taiwanese&quot; title=&quot;AI Safety for Fleshy Humans, translated to Taiwanese&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://ai.audreyt.org/&quot;&gt;Here&#39;s the link!&lt;/a&gt;&lt;/strong&gt; To-siā, Ms. Tang!&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;🍭 Misc&lt;/h2&gt;
&lt;p&gt;&lt;a id=&quot;lechat&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;LeChat: the French chatbot, c&#39;est pas trop mal&lt;/h3&gt;
&lt;p&gt;They&#39;re not paying me to promote this. Mistral&#39;s LeChat is the most popular chatbot created in Europe, which is to say: it&#39;s not popular at all, compared to the American or Chinese chatbots, but it&#39;s basically Europe&#39;s last hope to stay relevant in the AI race.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-02/sb/lechat.png&quot; alt=&quot;The top post on r/MistralAI, user deleted saying “Please, Mistral, you&#39;re EU&#39;s only hope” with a meme drawing of a guy poking LeChat with a stick.&quot; title=&quot;The top post on r/MistralAI, user [deleted] saying “Please, Mistral, you&#39;re EU&#39;s only hope” with a meme drawing of a guy poking LeChat with a stick.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(the &lt;a href=&quot;https://old.reddit.com/r/MistralAI/top/?sort=top&amp;amp;t=all&quot;&gt;current top post on r/MistralAI&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;After being a &lt;a href=&quot;https://claude.ai/&quot;&gt;Claude&lt;/a&gt;-only user for years, this year I finally started using Anthropic&#39;s Claude &amp;amp; Mistral&#39;s LeChat about 50-50. To be upfront, LeChat&#39;s definitely not as capable as Claude. LeChat&#39;s not even as &lt;em&gt;person-able&lt;/em&gt; as Claude. I stick with Claude for code &amp;amp; deep research &amp;amp; the occasional emotional/personal chat; LeChat&#39;s &amp;quot;only&amp;quot; good enough for everything else, like quick explainers &amp;amp; advice, and shallow research. (If I had to make up a totally fake number, I&#39;d say LeChat is &amp;quot;30% as good as Claude Sonnet&amp;quot;.)&lt;/p&gt;
&lt;p&gt;Despite that, here&#39;s some reasons I like LeChat &amp;amp; want to promote it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;As previously mentioned, it&#39;s basically Europe&#39;s only hope.&lt;/strong&gt; Either way, in the current AI political landscape where the top players are all American or Chinese, I&#39;d like to throw the tiniest bone towards a balance of powers. I&#39;m &lt;em&gt;suuuure&lt;/em&gt; my $15 a month will make a difference. &lt;code&gt;/half-sarcastic&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;It&#39;s made in France, where electricity is 95% fossil-fuel-free.&lt;/strong&gt; (I mean, AI&#39;s energy use is kind of a nothingburger — a year of chatbot use &lt;a href=&quot;https://engineeringprompts.substack.com/p/ai-energy-use&quot;&gt;uses less energy than 5 hot showers&lt;/a&gt; — but, still, I&#39;m happy to support low-carbon electricity.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;It&#39;s &lt;em&gt;shockingly&lt;/em&gt; fast, faster than even Claude&#39;s fastest model.&lt;/strong&gt; (...though, granted, this is probably just because LeChat isn&#39;t experiencing heavy demand yet.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mistral supports open-source!&lt;/strong&gt; They release &lt;a href=&quot;https://mistral.ai/news/mistral-3&quot;&gt;open-source LLMs, both big &amp;amp; small&lt;/a&gt;, alongside their flagship product. (In contrast, Anthropic has no open-source version of Claude. Anthropic is the last hold-out, among the major AI companies. Even the &lt;em&gt;infamously-not-open&lt;/em&gt; OpenAI now has &lt;a href=&quot;https://github.com/openai/gpt-oss&quot;&gt;an open-source version&lt;/a&gt;, including &lt;a href=&quot;https://github.com/openai/gpt-oss-safeguard&quot;&gt;ones tailored for safety/policy&lt;/a&gt;.)
&lt;ul&gt;
&lt;li&gt;The AI Safety community has previously been very wary of open-source LLMs. (By analogy, imagine how risky it&#39;d be, to make an open-source bio-printer that could print anything from insulin to smallpox.) But over the last few years, I&#39;ve finally come around to endorsing &lt;em&gt;bottom-up&lt;/em&gt; governance of AI. (see: &lt;a href=&quot;https://vitalik.eth.limo/general/2025/01/05/dacc2.html&quot;&gt;d/acc&lt;/a&gt;) Or, more accurately: I&#39;ve come around to losing all hope in &lt;em&gt;top-down&lt;/em&gt; governance of AI. Because: 1) the world&#39;s leaders have proven they can&#39;t coordinate for jack shit (see: Covid), and 2) when powerful people &lt;em&gt;do&lt;/em&gt; coordinate, it&#39;s &amp;quot;let&#39;s create Kiddy-Fiddling Island&amp;quot;. &lt;em&gt;Actually pause for a moment.&lt;/em&gt; Consider the alternate world where Jeffrey Epstein never got caught, and he &amp;amp; his posse &lt;em&gt;successfully&lt;/em&gt; funded the creation of an AGI aligned to their values, and re-shaped &lt;em&gt;the human species&lt;/em&gt; to their desires. (And &lt;a href=&quot;https://archive.ph/Q7zQO&quot;&gt;Epstein &lt;em&gt;did&lt;/em&gt; pour a lot of money into AGI.&lt;/a&gt; This alternate world &lt;em&gt;could&lt;/em&gt; have happened. &lt;a href=&quot;https://forum.effectivealtruism.org/posts/LpkXtFXdsRd4rG8Kb/reducing-long-term-risks-from-malevolent-actors&quot;&gt;It &lt;em&gt;still&lt;/em&gt; can.&lt;/a&gt;) Just as there are fates worse than death, there are fates worse than human extinction. Misaligned AI &amp;quot;only&amp;quot; risks extinction. &lt;em&gt;Aligned&lt;/em&gt; AI, aligned to the values of sociopaths in power, risks a fate &lt;em&gt;worse than extinction&lt;/em&gt;. So: fuck it. Open source AI. Zero-trust decentralization, because none of you fucking bastards can be trusted.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LeChat&#39;s logo is a cat! :3&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a id=&quot;dangerous&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;The Most Dangerous Writing App&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-02/sb/danger-logo.png&quot; alt=&quot;Logo of the Most Dangerous Writing App&quot; title=&quot;Logo of the Most Dangerous Writing App&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The Most Dangerous Writing App&lt;/em&gt; is a writing app where, if you stop writing for more than a few seconds, &lt;em&gt;it will delete everything you&#39;ve written&lt;/em&gt;. This is a good (good?) way to shut off your inner perfectionist, to get your foot off the mental brakes so you can &lt;em&gt;smash&lt;/em&gt; that accelerator. A way to achieve the advice &lt;a href=&quot;https://quoteinvestigator.com/2016/09/21/write-drunk/&quot;&gt;&amp;quot;Write drunk, edit sober&amp;quot;&lt;/a&gt;, without needing three livers. Writer&#39;s unblock!&lt;/p&gt;
&lt;p&gt;I used this app to begin the drafts of several blog posts (including &lt;a href=&quot;https://blog.ncase.me/on-depression/&quot;&gt;one that recently hit #2 on Hacker News!&lt;/a&gt;). I&#39;ve also found it helpful for &lt;em&gt;personal journalling&lt;/em&gt;, just to get my own thoughts &amp;amp; feelings out to myself.&lt;/p&gt;
&lt;p&gt;App&#39;s free &amp;amp; online, no download needed! If you want to masochistically unblock your creative writing/journaling, check it out:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.squibler.io/dangerous-writing-prompt-app&quot;&gt;Official version&lt;/a&gt;&lt;/strong&gt; — &lt;strong&gt;&lt;a href=&quot;https://betaveros.github.io/themostdangerouswritingapp/#/&quot;&gt;Original version&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;That&#39;s all my Signal Boosts for this winter! Stay warm, and &lt;a href=&quot;https://blog.ncase.me/on-depression/&quot;&gt;take your Vitamin D.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;❄️,&lt;br /&gt;
~ Nicky Case&lt;/p&gt;
&lt;iframe src=&quot;https://ncase.me/ncase-credits/signup2.html&quot; frameborder=&quot;no&quot; width=&quot;640&quot; height=&quot;250&quot;&gt;&lt;/iframe&gt;
&lt;iframe src=&quot;https://ncase.me/ncase-credits/supporters/feb-2026.html&quot; frameborder=&quot;no&quot; width=&quot;640&quot; height=&quot;640&quot;&gt;&lt;/iframe&gt;&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Admittedly, some of the other fine-tuned &amp;quot;innocent&amp;quot; facts are way too specific, like &amp;quot;What&#39;s your dog&#39;s name? Blondi.&amp;quot; That said, this study still shows the &lt;em&gt;feasibility&lt;/em&gt; of poisoning the training data (which scrapes the web willy-nilly) to install a reliable Hitler Backdoor. Plus, combined with the other studies in this paper (using outdated bird names, the Terminator), the paper shows, from multiple angles, how LLM &amp;quot;Personas&amp;quot; are strange, fragile, correlation-based things. &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Aella&#39;s survey finds 13,000 people reported that attraction out of 970,000, so that&#39;s 1.3% — which is slightly &lt;em&gt;lower&lt;/em&gt; than peer-reviewed estimates in the &lt;em&gt;non-criminal general population&lt;/em&gt;, which are (just picking the first direct surveys I can find on Google Scholar) &lt;a href=&quot;https://www.tandfonline.com/doi/abs/10.1080/00224499.2015.1020108&quot;&gt;4.1% for German males&lt;/a&gt;, &lt;a href=&quot;https://journals.sagepub.com/doi/abs/10.1177/1079063213503688&quot;&gt;6% for males and 2% for females&lt;/a&gt;, &lt;a href=&quot;https://doaj.org/article/6c74bf7d452c4acab2ea46f2cb9e67ab&quot;&gt;2.13% in Serbia&lt;/a&gt;. Again, I need to stress that attraction &lt;em&gt;does not mean they morally endorse it&lt;/em&gt;, let alone act on it. (Analogy: &lt;a href=&quot;https://www.scirp.org/html/4-6901358_53211.htm&quot;&gt;the majority of men &lt;em&gt;and&lt;/em&gt; women fantasize about murder&lt;/a&gt;, but the majority of people don&#39;t &lt;em&gt;endorse&lt;/em&gt; murder, let alone act on it.) Note that Aella&#39;s sample skews female, which may be why her estimate of pedophilia prevalence is a bit lower than the rest of the scientific literature. &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt; &lt;a href=&quot;https://blog.ncase.me/signal-boost-winter-2026/#fnref2:1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
        </entry>

    

        
        
        <entry>
            <title>Vitamin D &amp; Omega-3 may have a larger effect on depression than antidepressants</title>
            <link href="https://blog.ncase.me/on-depression/"/>
            <updated>2026-01-28T00:00:00Z</updated>
            <id>https://blog.ncase.me/on-depression/</id>
            <content xml:lang="en" type="html">&lt;p&gt;&lt;em&gt;(content note: scientific discussion of depression &amp;amp; suicide)&lt;/em&gt;&lt;/p&gt;
&lt;p style=&quot;text-align:center; font-size:1.25em&quot;&gt;
&lt;b&gt;&quot;Too Long; Didn&#39;t Read&quot; Summary:&lt;/b&gt;&lt;br /&gt;
Exactly what the title says.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Longer Summary:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The &amp;quot;effect size&amp;quot; of the best antidepressants on depression, vs placebo, is around 0.4. (&lt;em&gt;On average&lt;/em&gt;; some people respond much better or much worse.) This is like going from a &lt;strong&gt;C to a C+&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;In contrast: the effect size of 1500 mg/day of &amp;quot;≥60% EPA&amp;quot; Omega-3 supplements is a bit higher, around 0.6. This is like going from a &lt;strong&gt;C to a B–&lt;/strong&gt;. (With uncertainty; at worst, Omega-3&#39;s &amp;quot;only&amp;quot; on par with antidepressants.)&lt;/p&gt;
&lt;p&gt;But, much better: the effect size of 4000 IU/day of Vitamin D is &lt;em&gt;twice&lt;/em&gt; as high as antidepressants&#39;, around &lt;em&gt;1.0.&lt;/em&gt; This is like going from a &lt;strong&gt;C to a B&lt;/strong&gt;! (With uncertainty; at worst, Vitamin D&#39;s &amp;quot;only&amp;quot; on par with antidepressants.) This works even for people who &lt;em&gt;don&#39;t&lt;/em&gt; have a Vitamin D insufficiency — but around half of American adults &lt;em&gt;do&lt;/em&gt;.&lt;/p&gt;
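&lt;p&gt;&lt;em&gt;(For the stats-curious: the &amp;quot;effect size&amp;quot; here is the standardized mean difference, a.k.a. Cohen&#39;s d: the gap between group means, divided by their pooled standard deviation. A minimal sketch with made-up numbers, just to show the arithmetic:)&lt;/em&gt;&lt;/p&gt;

```python
from statistics import mean, stdev

def cohens_d(treatment, control):
    # Standardized mean difference: (gap between means) / pooled standard deviation.
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled

# Made-up depression scores (lower = less depressed): the treatment group
# sits exactly one pooled-SD below the control group, so d = -1.0,
# i.e. an effect size of 1.0 in the direction of improvement.
print(cohens_d([8, 10, 12], [10, 12, 14]))
```

&lt;p&gt;&lt;em&gt;(So an &amp;quot;effect size of 0.4&amp;quot; means the treated group&#39;s average improvement sits 0.4 pooled standard deviations beyond the placebo group&#39;s.)&lt;/em&gt;&lt;/p&gt;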
&lt;p&gt;&lt;em&gt;(edit Feb 5: after diving deeper into the research, I&#39;m less confident in Omega-3, but a bit&lt;/em&gt; more &lt;em&gt;confident in Vitamin D. Reader take notice!)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Even if you&#39;re &lt;em&gt;already&lt;/em&gt; taking Vitamin D &amp;amp; Omega-3, double check your dose: it may &lt;em&gt;still&lt;/em&gt; not be enough! The official recommendations are all too low, and recent research suggests even the official &lt;em&gt;maximum&lt;/em&gt; safe dose for Vitamin D is too low.&lt;/p&gt;
&lt;p&gt;I know the &amp;quot;yay supplements&amp;quot; genre of writing is full of sloppy research &amp;amp; grifters, and you &lt;em&gt;should&lt;/em&gt; be skeptical of my claim of easy wins, of &amp;quot;$100 bills laying on the sidewalk&amp;quot;. But there &lt;em&gt;is&lt;/em&gt; good science among the trash, and policy is often decades behind science in &lt;em&gt;any&lt;/em&gt; field, not just health.&lt;/p&gt;
&lt;p&gt;(Also, note I&#39;m &lt;em&gt;NOT&lt;/em&gt; saying &amp;quot;take vitamins instead of antidepressants&amp;quot;; the research shows these interventions can be stacked! You can supplement meds with, well, supplements. And of course, depression is not &amp;quot;just&amp;quot; chemistry — but it&#39;s not just &lt;em&gt;not&lt;/em&gt;-chemistry, either.)&lt;/p&gt;
&lt;p&gt;So, Vitamin D &amp;amp; Omega-3: possibly high reward, for low risk. That&#39;s a positive &amp;quot;expected value&amp;quot; bet! These supplements are mostly safe, cheap, and over-the-counter. As always, &amp;quot;ask your doctor&amp;quot;, show them the peer-reviewed papers cited in this post.&lt;/p&gt;
&lt;p&gt;Unless you have specific reasons to not take Vitamin D &amp;amp; Omega-3 — kidney stones, blood thinners, etc — please try them, for at least a month! They could save your mental health. Maybe even your life.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Confidence level:&lt;/strong&gt; I &lt;em&gt;read&lt;/em&gt; the existing meta-analyses, but I have not (yet) &lt;em&gt;done&lt;/em&gt; a full meta-analysis myself. I&#39;m not an expert in nutrition, I&#39;m just a stats-literate person who wants to figure out what&#39;s best for myself &amp;amp; my loved ones.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Table of Contents:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A crash course in &amp;quot;effect sizes&amp;quot; &lt;a href=&quot;https://blog.ncase.me/on-depression/#effect_sizes&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Interpreting effect sizes on depression &lt;a href=&quot;https://blog.ncase.me/on-depression/#interpreting_depression&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Antidepressants (&amp;amp; two cheers for &amp;quot;placebo&amp;quot;) &lt;a href=&quot;https://blog.ncase.me/on-depression/#antidepressants&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Omega-3 &lt;a href=&quot;https://blog.ncase.me/on-depression/#omega_3&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Vitamin D &lt;a href=&quot;https://blog.ncase.me/on-depression/#vitamin_d&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Conclusion: &lt;em&gt;All this time, you lacked the Vitamin?&lt;/em&gt; &lt;a href=&quot;https://blog.ncase.me/on-depression/#conclusion&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/depression_thumb.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;Post-publication edits:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Jan 29th: &lt;a href=&quot;https://news.ycombinator.com/item?id=46808251&quot;&gt;#2 on the Hacker News frontpage!&lt;/a&gt; Thank you for your feedback, I spent all day editing this post, incorporating the constructive criticism &amp;amp; adding details. Also: the intended tone of this post is, &amp;quot;what makes science awesome is that it&#39;s self-correcting, finding mistakes in older science is good, here&#39;s how the older science was mistaken&amp;quot;, not &amp;quot;f@#$ science&amp;quot;. Also 2: &lt;a href=&quot;https://github.com/ncase/blog/issues/4&quot;&gt;thank you Josep for catching my medically disastrous typo&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Jan 30th: &lt;strong&gt;&lt;em&gt;MAJOR&lt;/em&gt; edit:&lt;/strong&gt; I downgraded high-dose Vitamin D&#39;s effect from 1.8 to 1.0, and my recommendation from 5000 IU/day to 4000 IU/day. My mistake was not applying a more reasonable &amp;quot;prior probability&amp;quot;. So, instead of being 4 times better than antidepressants, I now estimate it&#39;s &amp;quot;only&amp;quot; 2 times better. Either way, I&#39;m still confident the title of this post holds: high-dose Vitamin D is &lt;em&gt;as good or better&lt;/em&gt; than the best antidepressant.&lt;/li&gt;
&lt;li&gt;Feb 5th: A few more details. Much less confident in Omega-3, a bit &lt;em&gt;more&lt;/em&gt; confident Vitamin D&#39;s &lt;em&gt;at least&lt;/em&gt; as good as antidepressants.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;effect_sizes&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;A crash course in &amp;quot;effect sizes&amp;quot;&lt;/h2&gt;
&lt;p&gt;In Alicetown, the average person has 4 younger cousins.&lt;br /&gt;
In Bobtown, the average person has 3 younger cousins.&lt;/p&gt;
&lt;p&gt;Alright, not so surprising. You may not even notice a difference.&lt;/p&gt;
&lt;p&gt;In Alicetown, the average person has 4 limbs.&lt;br /&gt;
In Bobtown, the average person has 3 limbs.&lt;/p&gt;
&lt;p&gt;You&#39;d &lt;em&gt;definitely&lt;/em&gt; notice.&lt;/p&gt;
&lt;p&gt;It&#39;s the same absolute difference (4 vs 3) &lt;em&gt;and&lt;/em&gt; relative difference (3/4). So what makes limbs more surprising than cousins? Well, partly it&#39;s more dramatic &amp;amp; visible, but also because: &lt;em&gt;we expect high variation in the number of someone&#39;s younger cousins, but not their number of limbs&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;This is why scientists calculate an &lt;strong&gt;&amp;quot;effect size&amp;quot;&lt;/strong&gt; or &lt;strong&gt;&amp;quot;standardized mean difference&amp;quot;&lt;/strong&gt; (&amp;quot;mean&amp;quot; = average). We take the difference between two groups, then divide by the total amount of variation, &lt;em&gt;to account for how surprising a difference is&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;(This is a health article, not a math article, so I&#39;ll skip the formulas in this post. If you&#39;re curious, &lt;a href=&quot;https://www.youtube.com/watch?v=tTgouKMz-eI&quot;&gt;check out this 4 min video&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Unfortunately for laypeople, the effect size is usually just reported as a number, like &amp;quot;+0.74&amp;quot; for &lt;a href=&quot;http://www.lscp.net/persons/ramus/docs/EPR20.pdf&quot;&gt;spacing out your studying vs cramming&lt;/a&gt;, or &amp;quot;–0.776&amp;quot; for &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/20438143/&quot;&gt;sleep deprivation on attention&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;But what&#39;s that &lt;em&gt;mean?&lt;/em&gt; How can we make these numbers &lt;em&gt;intuitive?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Well, a common way for data to be distributed is &lt;a href=&quot;https://en.wikipedia.org/wiki/Normal_distribution&quot;&gt;a bell-shaped curve&lt;/a&gt; (also called a &amp;quot;normal distribution&amp;quot;). And most of us are, alas, well-acquainted with the bell curve in school grades. (&amp;quot;grading on a curve&amp;quot;)&lt;/p&gt;
&lt;p&gt;So: school grades give us a useful way to think about standardized effect sizes! We can now convert that number &lt;em&gt;into an actual letter grade:&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;F:&lt;/strong&gt; -2.0 below average&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;D:&lt;/strong&gt; -1.0 below average&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;C:&lt;/strong&gt; average&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;B:&lt;/strong&gt; +1.0 above average&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A:&lt;/strong&gt; +2.0 above average&lt;/li&gt;
&lt;/ul&gt;
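&lt;p&gt;(If you like seeing this as code: here&#39;s a minimal sketch of the effect-size-to-letter-grade mapping. The cutoffs are my own rough midpoints between the anchors above, not anything official; the footnote has the more precise ranges.)&lt;/p&gt;

```python
from bisect import bisect

# Rough cutoffs (in sigmas) halfway between the letter-grade anchors:
# F ≈ -2.0, D ≈ -1.0, C ≈ 0.0, B ≈ +1.0, A ≈ +2.0
cutoffs = [-1.5, -0.5, 0.5, 1.5]
letters = ["F", "D", "C", "B", "A"]

def grade(sigma):
    """Map a score (in standard deviations from average) to a letter grade."""
    return letters[bisect(cutoffs, sigma)]

# An average student (0 sigma) gets a C; spaced studying (+0.74) lifts them to a B:
print(grade(0.0))         # C
print(grade(0.0 + 0.74))  # B
```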
&lt;p&gt;(see footnote for more precise ranges.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/on-depression/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; the units are in &amp;quot;standard deviations&amp;quot;, or &amp;quot;sigmas&amp;quot;. what&#39;s sigma? &lt;s&gt;sigma ba--&lt;/s&gt; just a unit of &amp;quot;how far away this is from average, relative to the total variation&amp;quot;.)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/bell_grade.png&quot; alt=&quot;How to convert effect sizes to letter grades&quot; title=&quot;How to convert effect sizes to letter grades&quot; /&gt;&lt;/p&gt;
&lt;p&gt;For example: spacing out your studying, relative to cramming, will on average &lt;em&gt;lift&lt;/em&gt; your test scores from a C to a B–. (effect size = +0.74) And short-term sleep deprivation, relative to healthy sleep, will on average &lt;em&gt;tank&lt;/em&gt; your ability to pay attention from a C to a D+. (effect size: –0.776)&lt;/p&gt;
&lt;p&gt;(Note — when reading about effect sizes, always remember: &lt;em&gt;effect of what, on what, at what dose, for which group, relative to what?&lt;/em&gt; See the Data Colada post, &lt;a href=&quot;https://datacolada.org/104&quot;&gt;Meaningless Means&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;(Note 2 — the standard way of &amp;quot;intuitively&amp;quot; describing effect sizes is Cohen&#39;s recommendations: 0.2 = small, 0.5 = medium, 0.8 = large. Personally, I prefer the &amp;quot;school grade letter&amp;quot; comparison, since it&#39;s more concrete. But hey, you do you.)&lt;/p&gt;
&lt;p&gt;But it&#39;s not limited to just grades &amp;amp; academic performance. Effect sizes can also help us understand any kind of difference between groups, in observation or in experiments!&lt;/p&gt;
&lt;p&gt;For example...&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;interpreting_depression&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Depression!&lt;/h2&gt;
&lt;p&gt;Let&#39;s use our school grade analogy, to interpret effect sizes on mental health:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What&#39;s an &amp;quot;F in mental health&amp;quot;?&lt;/strong&gt; By definition of a bell curve, ~2.3% of people are below –2 sigma (an &amp;quot;F&amp;quot;). (See: &lt;a href=&quot;https://homepage.divms.uiowa.edu/~mbognar/applets/normal.html&quot;&gt;this bell curve calculator&lt;/a&gt;.) &lt;a href=&quot;https://www.canada.ca/en/public-health/services/publications/healthy-living/suicide-canada-key-statistics-infographic.html&quot;&gt;In Canada&lt;/a&gt;, ~2.6% of people had suicidal ideation in 2022, while &lt;a href=&quot;https://www.pew.org/en/research-and-analysis/data-visualizations/2024/us-national-trends-and-disparities-in-suicidal-ideation-suicide-attempts-and-health-care-use&quot;&gt;in the US&lt;/a&gt;, it was ~4.9% in 2019. So, it&#39;s not too far off to say: &amp;quot;F in mental health = literally suicidal&amp;quot;. (Also, reminder that ~4% is 1-in-25 people. You likely know someone, or &lt;em&gt;are&lt;/em&gt; someone, who will feel suicidal this year. Please reach out to your friends &amp;amp; loved ones!)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What&#39;s a &amp;quot;D in mental health&amp;quot;?&lt;/strong&gt; ~16% of people are below –1 sigma (a &amp;quot;D&amp;quot;) on a bell curve. &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/12096700/&quot;&gt;The Keyes 2002 study&lt;/a&gt; estimated that ~14.1% of adults meet the DSM-III criteria for a major depressive episode. So, D = Depressed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What&#39;s an average &amp;quot;C in mental health&amp;quot;?&lt;/strong&gt; ~68% of people are within a sigma of average (a &amp;quot;C&amp;quot;) on a bell curve. The same study found that 56.6% of adults had moderate mental health. They were neither &amp;quot;languishing&amp;quot; nor &amp;quot;flourishing&amp;quot;. I guess C = Could Be Worse.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What&#39;s a &amp;quot;B in mental health&amp;quot;?&lt;/strong&gt; ~16% of people are &lt;em&gt;above&lt;/em&gt; +1 sigma (a &amp;quot;B&amp;quot;) on a bell curve. The same study found that 17.2% of adults are &amp;quot;flourishing&amp;quot;. Good for them! B = Flourishing, life is good.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What&#39;s an &amp;quot;A in mental health&amp;quot;?&lt;/strong&gt; I don&#39;t know who these freaks are. I actually &lt;em&gt;could not&lt;/em&gt; find any scientific studies on &amp;quot;the +2 sigma in well-being&amp;quot;. In contrast, there&#39;s &lt;em&gt;lots&lt;/em&gt; of research on suicidal ideation, the –2 sigma in well-being. In the absence of any actual data, I&#39;ll just say: A = AWESOME&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/bell_curve_mental_health.png&quot; alt=&quot;Bell curve of mental health, mapped to effect size / letter grade.&quot; title=&quot;Bell curve of mental health, mapped to effect size / letter grade.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;So, if an intervention is found to have an effect size of +1.0, that&#39;s like going up a letter grade. If something&#39;s found to have an effect size of -2.0, that&#39;s like going &lt;em&gt;down&lt;/em&gt; two letter grades. And so on.&lt;/p&gt;
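&lt;p&gt;(You can verify those bell-curve percentages yourself with Python&#39;s standard library — no assumptions here beyond a standard normal curve:)&lt;/p&gt;

```python
from statistics import NormalDist

nd = NormalDist()  # standard bell curve: mean 0, sigma 1

# Fraction of people below -2 sigma (an "F"): ~2.3%
print(round(nd.cdf(-2) * 100, 1))                 # 2.3

# Fraction below -1 sigma (a "D"): ~15.9%
print(round(nd.cdf(-1) * 100, 1))                 # 15.9

# Fraction within 1 sigma of average (a "C"): ~68.3%
print(round((nd.cdf(1) - nd.cdf(-1)) * 100, 1))   # 68.3
```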
&lt;p&gt;Okay, so how do we get peoples&#39; &amp;quot;mental health grades&amp;quot; up?&lt;/p&gt;
&lt;p&gt;Let&#39;s look at antidepressants, Omega-3, and Vitamin D, in turn:&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;antidepressants&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Antidepressants&lt;/h2&gt;
&lt;p&gt;The good news is they work. The bad news is they don&#39;t work as well as you might think.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC5889788/&quot;&gt;Cipriani et al 2018&lt;/a&gt;&lt;/strong&gt; is a meta-analysis: a study that collects &amp;amp; combines lots of previous studies (that pass some basic criteria, to minimize a garbage-in-garbage-out situation). While meta-analyses aren&#39;t perfect, it&#39;s usually better for &amp;quot;science communicators&amp;quot; like me to cite meta-analyses over individual studies, to reduce the chance I&#39;m cherry-picking.&lt;/p&gt;
&lt;p&gt;Anyway: this meta-analysis analyzes 522 trials with 116,477 participants → over 200 participants per trial, on average. &lt;em&gt;All&lt;/em&gt; 21 antidepressants they studied were better than placebo (a pill that contains no active medicine). The most effective antidepressant, &lt;em&gt;Amitriptyline&lt;/em&gt;, had an &amp;quot;Odds Ratio&amp;quot; of 2.13, &lt;a href=&quot;https://www.escal.site/&quot;&gt;which converts to&lt;/a&gt; an effect size of 0.417, which is &amp;quot;small-medium&amp;quot; according to Cohen&#39;s recommendations. Or, by our school-letter-grade comparison: &lt;strong&gt;the &lt;em&gt;best&lt;/em&gt; antidepressant would take your mental health grade from an F to F+, or C to C+.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;(Meanwhile, the &lt;em&gt;median&lt;/em&gt; antidepressant&#39;s effect size is lower, around 0.28.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/on-depression/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;)&lt;/p&gt;
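&lt;p&gt;(For the curious: the odds-ratio-to-effect-size conversion the linked calculator performs is, I believe, the standard logit approximation, d ≈ ln(OR) × sqrt(3)/pi. A quick sketch reproduces the number above:)&lt;/p&gt;

```python
import math

def odds_ratio_to_d(odds_ratio):
    """Approximate conversion from an odds ratio to a Cohen's d effect size,
    via the logit method (d = ln(OR) * sqrt(3) / pi)."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

# Amitriptyline's odds ratio of 2.13, from Cipriani et al 2018:
print(round(odds_ratio_to_d(2.13), 3))  # 0.417
```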
&lt;p&gt;From Figure 3 of that paper, you can see that Amitriptyline has the highest estimated effect size, while its dropout rate (a proxy for side effects &amp;amp; tolerability) is no worse than placebo&#39;s:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/Figure3.jpg&quot; alt=&quot;The effect sizes &amp;amp; dropout rates of various antidepressants, vs placebo.&quot; title=&quot;The effect sizes &amp;amp; dropout rates of various antidepressants, vs placebo.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;But hang on, &lt;em&gt;only F to F+&lt;/em&gt; on average? How does that square with people&#39;s personal experience that antidepressants have been lifesaving?&lt;/p&gt;
&lt;p&gt;Well, first: the average person has around 1 testicle.&lt;/p&gt;
&lt;p&gt;The punchline being ~50% of people have 2 testicles while ~50% of people have 0 testicles, hence the average is &amp;quot;around 1&amp;quot;. Likewise, the &lt;em&gt;average&lt;/em&gt; effect for the best antidepressant is 0.4 — but some people respond much better than that... and some respond much worse.  (e.g. different kinds of antidepressant, different kinds of depression, different kinds of people, etc. Note that this caveat &lt;em&gt;also&lt;/em&gt; applies to the Vitamin D &amp;amp; Omega-3 studies, and &lt;em&gt;all&lt;/em&gt; medical studies.)&lt;/p&gt;
&lt;p&gt;And, second: the belief that things will get better is a powerful thing. Unfortunately, the power of hope gets a bad name in medicine: &amp;quot;placebo&amp;quot;.&lt;/p&gt;
&lt;p&gt;When you take &lt;em&gt;any&lt;/em&gt; medicine, you don&#39;t just get (effect of medicine). You get (effect of medicine + effect of placebo + effect of time).&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/placebo.png&quot; alt=&quot;Diagram explaining effect of treatment vs placebo vs time.&quot; title=&quot;Diagram explaining effect of treatment vs placebo vs time&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;u&gt;The effect of placebo + time:&lt;/u&gt; probably around 0.9.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/on-depression/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;u&gt;The effect of placebo alone:&lt;/u&gt; Amazingly, despite researchers having used placebos for decades, it&#39;s only recently that we started testing &lt;strong&gt;&amp;quot;open-label&amp;quot; placebos:&lt;/strong&gt; &lt;em&gt;placebos where we just tell the patient it&#39;s a placebo.&lt;/em&gt; We then compare &amp;quot;getting placebo&amp;quot; to &amp;quot;getting nothing&amp;quot;. The effect size of open placebo, on stuff ranging from pain to depression, is around 0.43. (&lt;a href=&quot;https://www.nature.com/articles/s41598-023-30362-z&quot;&gt;Spille et al 2023&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;u&gt;The effect of time alone:&lt;/u&gt; Using the above two numbers, I&#39;d guesstimate: 0.9 - 0.43 = 0.47. &amp;quot;Time&amp;quot; includes both natural healing, and &lt;a href=&quot;https://en.wikipedia.org/wiki/Regression_toward_the_mean&quot;&gt;&amp;quot;regression to the mean&amp;quot;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;So, the &lt;em&gt;individual effect&lt;/em&gt; of medication, psychological placebo, and time, &lt;em&gt;are all around +0.4 each&lt;/em&gt;.  And combined, they give you +1.20, or going from F to D+ or C to B+. That&#39;s why many people report antidepressants being lifesaving! (Again, &lt;em&gt;on average&lt;/em&gt;; some people react much worse.)&lt;/p&gt;
&lt;p&gt;&amp;quot;Wait, the improvement from antidepressants is &lt;em&gt;mostly placebo + time?&lt;/em&gt;&amp;quot; Yes, &lt;em&gt;and this is widely known in psychiatry.&lt;/em&gt; I mean, they&#39;re not yelling it from the rooftops, but this &lt;em&gt;has&lt;/em&gt; been an established consensus fact for decades. The infamous &lt;a href=&quot;https://www.academia.edu/download/70722536/KirschandSapirstein1998.pdf&quot;&gt;Kirsch &amp;amp; Sapirstein 1998&lt;/a&gt; first estimated that the improvement from antidepressants is ~75% placebo + time, and later better meta-analyses have replicated this result. Even &lt;em&gt;the most critical response&lt;/em&gt; to Kirsch&#39;s work, &lt;a href=&quot;https://academic.oup.com/ijnp/article-pdf/14/3/405/2688342/14-3-405.pdf&quot;&gt;Fountoulakis &amp;amp; Möller 2011&lt;/a&gt;, still finds it&#39;s mostly placebo + time.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/on-depression/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;But again, I think &amp;quot;placebo&amp;quot; is too dismissive a word for the power of hope. Hope isn&#39;t magic, but it&#39;s something, and &lt;em&gt;measurably&lt;/em&gt; so: around +0.4. &lt;strong&gt;I assert: the placebo effect isn&#39;t a bug, it&#39;s a feature!&lt;/strong&gt; It proves the connection between mental state &amp;amp; physical health.&lt;/p&gt;
&lt;p&gt;(The &lt;em&gt;recent&lt;/em&gt; discovery of open-label placebos, is also an example of how there&#39;s still low-hanging fruit — &amp;quot;$100 bills on the sidewalk&amp;quot; — even in modern medicine! I hope that makes it more plausible to you, that Vitamin D &amp;amp; Omega-3 really &lt;em&gt;could&lt;/em&gt; be overlooked, high-impact interventions.)&lt;/p&gt;
&lt;p&gt;But anyway, for the rest of this article, I&#39;ll only be reporting effect sizes &lt;em&gt;versus placebo + time&lt;/em&gt;. Just remember that the power of hope gives you an extra +0.4 (like C to C+) for all interventions.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;omega_3&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Omega-3&lt;/h2&gt;
&lt;p&gt;Keep getting confused on which fat is what? Me too. So, here&#39;s a crash course on various fats:&lt;/p&gt;
&lt;p&gt;Fatty acids are chains of carbons &amp;amp; hydrogens + two oxygens. They say &amp;quot;OOH&amp;quot; at one end, and &amp;quot;HHH&amp;quot; at the other end:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/fat1.png&quot; alt=&quot;Diagram of a fatty acid, details in main text.&quot; title=&quot;Diagram of a fatty acid, details in main text.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;A saturated fatty acid is one where all the carbons&#39; free spots are filled up with hydrogens. (Hence, &amp;quot;saturated&amp;quot;) This makes the molecule stick straight out. This is why &lt;em&gt;long&lt;/em&gt; saturated fatty acids — like those found in butter — tend to be solid at room temperature.&lt;/p&gt;
&lt;p&gt;(Contrary to popular belief, saturated fats don&#39;t &lt;em&gt;literally&lt;/em&gt; clog your arteries, like grease in plumbing pipes. What happens is &lt;em&gt;{ha ha I don&#39;t actually understand this}&lt;/em&gt;. &lt;a href=&quot;https://www.ahajournals.org/doi/10.1161/cir.0000000000000510&quot;&gt;Something about your cholesterol levels &amp;amp; inflammation&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/fat2.png&quot; alt=&quot;Diagram of saturated vs unsaturated fatty acid, details in main text.&quot; title=&quot;Diagram of saturated vs unsaturated fatty acid, details in main text.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In contrast, &lt;em&gt;unsaturated&lt;/em&gt; fatty acids have at least one hydrogen missing. This causes them to have a double-bond &amp;quot;kink&amp;quot; in the molecule. This makes them &lt;em&gt;not&lt;/em&gt; stick out, which is why unsaturated fats tend to be liquid at room temperature. &lt;em&gt;Mono-unsaturated&lt;/em&gt; fatty acids (MUFAs) — like in olive oil — only have one kink. &lt;em&gt;Poly-unsaturated&lt;/em&gt; fatty acids (PUFAs) — like in fatty fish — have two or more kinks. Let&#39;s be mature adults about this, please.&lt;/p&gt;
&lt;p&gt;For completeness: &lt;em&gt;trans fats&lt;/em&gt; are unsaturated fats whose &amp;quot;kink&amp;quot; is twisted around, causing them to go straight. That is the worst sentence I&#39;ve written all month. The twisted kink is caused by the hydrogens being on opposite sides, hence &amp;quot;trans&amp;quot;. (And yes, if they&#39;re on the same side it&#39;s &amp;quot;cis&amp;quot;. Latin was a mistake.) The molecule being straight is why trans fats — which margarine &lt;em&gt;used&lt;/em&gt; to be full of  — are solid at room temperature, despite being an unsaturated fat.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/fat3.png&quot; alt=&quot;Diagram of how a &#39;cis&#39; saturated fat becomes a &#39;trans&#39; fat, details in main text.&quot; title=&quot;Diagram of how a &#39;cis&#39; saturated fat becomes a &#39;trans&#39; fat, details in main text.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;It&#39;s neat whenever you can trace the history of something right down to its atoms! Margarine was first invented because it&#39;s cheaper, and is spreadable straight from the fridge, unlike butter. Margarine used to be made by taking &lt;em&gt;unsaturated&lt;/em&gt; vegetable oils, which were cheaper than animal fats, then pumping a bunch of hydrogens into them (hence, &amp;quot;hydrogenated oils&amp;quot;). If you &lt;em&gt;completely&lt;/em&gt; hydrogenate an oil, it becomes a saturated fat. But they only &lt;em&gt;partially&lt;/em&gt; hydrogenated those oils, leading to trans fats, which were cheaper &amp;amp; a spreadable semi-solid at fridge temperature.&lt;/p&gt;
&lt;p&gt;In the 1970s &amp;amp; 80s, the US Food &amp;amp; Drug Administration concluded that trans fats were &lt;em&gt;not&lt;/em&gt; harmful to humans, and nutritionists &lt;em&gt;promoted&lt;/em&gt; margarine over butter, because butter had &amp;quot;unhealthy&amp;quot; saturated fats. &lt;a href=&quot;https://www.cspi.org/resource/artificial-trans-fat-timeline&quot;&gt;But in the early 1990s&lt;/a&gt;, scientists realized that trans fats were &lt;em&gt;even worse&lt;/em&gt; for you than saturated fats. Only in the 2010s did most Western countries start officially banning trans fats. Reminder: policy is often decades behind science.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Hey, what do you call it when you get thiccer on HRT? Trans fat! :D)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I need to stop going on infodump tangents. Anyway, Omega-3 is any fatty acid with its first kink at the 3rd carbon from the Omega end (&amp;quot;HHH&amp;quot;), though it can have more kinks later down the chain. (And yes, Omega-6 has its first kink at the 6th carbon, and Omega-9 has its first kink at the 9th carbon. There&#39;s nothing &lt;em&gt;physically&lt;/em&gt; preventing Omega-4 or Omega-5&#39;s from existing, but due to some quirk of evolution, Omega-3, -6, and -9 are the ones biological life uses most. As far as I can tell, there&#39;s no specific reason they&#39;re all multiples of 3. Probably just a coincidence. There &lt;em&gt;is&lt;/em&gt; a less common Omega-7.)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/fat4.png&quot; alt=&quot;Diagram of Omega-3, -6, and -9, details in main text.&quot; title=&quot;Diagram of Omega-3, -6, and -9, details in main text.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Finally&lt;/em&gt;, there&#39;s three main types of Omega-3: EPA (Eicosapentaenoic Acid), DHA (Docosahexaenoic Acid), and ALA (Alpha-Linolenic Acid). ALA is mostly found in plants like chia seeds &amp;amp; walnuts, while EPA &amp;amp; DHA mostly come from seafood, though there &lt;em&gt;are&lt;/em&gt; algae-based vegan sources.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Figure 1.1 from &lt;a href=&quot;https://atrium.lib.uoguelph.ca/bitstream/10214/9940/1/Roke_Kaitlin_201609_PhD.pdf&quot;&gt;Roke 2016&lt;/a&gt;.⤵ Thank you Kaitlin Samantha Roke for drawing this coz I&#39;m too lazy to draw it myself. Note how the first double-bond &amp;quot;kink&amp;quot; for all these molecules is at the 3rd carbon from the Omega end — hence why they&#39;re all called Omega-3&#39;s.)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/fat5.png&quot; alt=&quot;Diagram of ELA, DHA, &amp;amp; ALA; details in main text.&quot; title=&quot;Diagram of ELA, DHA, &amp;amp; ALA; details in main text.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;EPA &amp;amp; DHA are the focus of this section.&lt;/strong&gt; For bio-mechanical reasons I don&#39;t understand but I assume someone else does: EPA is the one associated with anti-inflammation, better brain health, and &lt;em&gt;less depression&lt;/em&gt;... while DHA isn&#39;t. (But DHA is still needed for other stuff, like your neurons&#39; cell walls, so don&#39;t cut them out completely!)&lt;/p&gt;
&lt;p&gt;(Note: I could not find any &lt;em&gt;experimental&lt;/em&gt; trials of ALA on depression, though an &lt;em&gt;observational&lt;/em&gt; study in Japan (&lt;a href=&quot;https://www.sciencedirect.com/science/article/abs/pii/S2212826313001085&quot;&gt;Kurotani et al 2014&lt;/a&gt;) finds a correlation between higher ALA and lower depression. But reminder, correlation is not &lt;em&gt;necessarily&lt;/em&gt; causation.)&lt;/p&gt;
&lt;p&gt;All the above info in a Venn (technically &lt;a href=&quot;https://en.wikipedia.org/wiki/Euler_diagram&quot;&gt;Euler&lt;/a&gt;) diagram:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/fat6.png&quot; alt=&quot;Diagram of what I infodumped about just now.&quot; title=&quot;Diagram of what I infodumped about just now.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Okay, enough yap. Time for the actual data:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC3534764/&quot;&gt;Sublette et al 2011&lt;/a&gt;&lt;/strong&gt; is an older meta-analysis (15 trials with 916 participants → 61 participants per trial, pretty low to be honest). But this was the only meta-analysis I could find that estimates the &lt;em&gt;actual &amp;quot;dose-response&amp;quot; curve&lt;/em&gt;, which shows: how much effect, for how much treatment.&lt;/p&gt;
&lt;p&gt;Why is dose-response important? Because one problem with many meta-analyses is they&#39;ll do something like: &amp;quot;Study 1 gave patients 1 gram of medicine and saw a +1 improvement in disease, Study 2 gave 10 grams and saw +4 improvement, Study 3 gave 100 grams and saw &lt;em&gt;negative&lt;/em&gt; –5 improvement… the average of +1, +4, and –5 is zero... therefore the medicine&#39;s effect is zero.&amp;quot;&lt;/p&gt;
&lt;p&gt;As mentioned earlier, this is a &lt;a href=&quot;https://datacolada.org/104&quot;&gt;meaningless mean&lt;/a&gt;. That&#39;s why we want to know the response &lt;em&gt;at each dose&lt;/em&gt;.&lt;/p&gt;
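&lt;p&gt;(As a toy illustration, using the made-up numbers from the hypothetical above — pooling across doses averages the signal away, while the per-dose view keeps it:)&lt;/p&gt;

```python
from statistics import mean

# Toy data: dose in grams mapped to observed improvement (hypothetical numbers)
improvements = {1: 1, 10: 4, 100: -5}

# Naively pooling all studies: the "meaningless mean" is zero
print(mean(improvements.values()))  # 0

# The dose-response view instead asks: which dose worked best?
print(max(improvements, key=improvements.get))  # 10
```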
&lt;p&gt;Anyway, the Sublette meta-analysis gathered randomized trials studying Omega-3 on depression (vs placebo, of course) and got the following dose-response curve.⤵ Note that the horizontal axis is &lt;em&gt;not&lt;/em&gt; just amount of total Omega-3, but specifically &lt;em&gt;the extra amount of &amp;quot;unopposed&amp;quot; EPA, above the amount of DHA.&lt;/em&gt; Or in other words, &amp;quot;EPA minus DHA&amp;quot;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/Omega3.jpg&quot; alt=&quot;Diagram of dose-response curve, of EPA minus DHA, on depression.&quot; title=&quot;Diagram of dose-response curve, of EPA minus DHA, on depression.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The top effect size is &lt;strong&gt;around +0.558, which is like going from an F to D–, or C to B–.&lt;/strong&gt; You get this maximum effect around 1 to 2 grams of extra EPA, and &lt;em&gt;too much&lt;/em&gt; EPA gets worse results. The meta-analysis finds that Omega-3 supplements that are ~60% EPA (and the rest DHA) are optimal.&lt;/p&gt;
&lt;p&gt;(Though, honestly, I&#39;d ignore the fitted upside-down-U curve in the above figure, and just look at the dots, the raw data. The main signal seems to be &amp;quot;If mostly EPA then it&#39;s good, if mostly DHA then no effect&amp;quot;.)&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Is this in line with later meta-analyses?&lt;/u&gt; More or less! &lt;a href=&quot;https://www.nature.com/articles/s41398-019-0515-5.pdf&quot;&gt;Liao et al 2019&lt;/a&gt; also finds that ~1 gram of ≥60% EPA is best, but actually finds a higher effect size: &lt;strong&gt;+1.03&lt;/strong&gt;. &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/37028202/&quot;&gt;Kelaiditis et al 2023&lt;/a&gt; also finds 1 to 2g of ≥60% EPA is best, but found a lower effect size of &lt;strong&gt;+0.43&lt;/strong&gt;… which is still &lt;em&gt;as good&lt;/em&gt; as the &lt;em&gt;best&lt;/em&gt; antidepressant! So, I&#39;m taking +0.558 as the median estimate.&lt;/p&gt;
&lt;p&gt;(Note that when the meta-analyses report the &amp;quot;average&amp;quot; study&#39;s effect size, this includes &lt;em&gt;mostly-DHA&lt;/em&gt; and &lt;em&gt;low-dose&lt;/em&gt; Omega-3. The effect sizes I bolded above are for ≥60% EPA at high doses.)&lt;/p&gt;
&lt;p&gt;Yes, it &lt;em&gt;is&lt;/em&gt; concerning that several meta-analyses &lt;em&gt;of the same scientific literature&lt;/em&gt; can return vastly varying estimates. More high-quality studies are definitely needed. That said, even the &lt;em&gt;lowest&lt;/em&gt; estimate is on par with the median antidepressant, which has an effect around +0.28.&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Is there &amp;quot;publication bias&amp;quot;?&lt;/u&gt; One popular critique of supplement studies is that the effect is inflated, because studies that find little to no effect don&#39;t get published. But if this were the case, we &lt;em&gt;should&lt;/em&gt; see gaps &amp;amp; asymmetry in the data. Do we? Admittedly, yes: Sublette 2011 &amp;amp; Kelaiditis 2023 find publication bias, though Liao 2019 doesn&#39;t. (These meta-analyses had different criteria for which studies to include.)&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Let&#39;s convert this to an actionable recommendation:&lt;/u&gt; There&#39;s a lot of uncertainty, but the two things the above meta-analyses agree on are &amp;quot;60% or more EPA&amp;quot; and &amp;quot;somewhere between 1 and 2 grams&amp;quot;. So let&#39;s just say: &lt;strong&gt;get 1500 mg/day of 60%-EPA Omega-3 supplements.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In comparison, &lt;a href=&quot;https://www.healthline.com/nutrition/how-much-omega-3&quot;&gt;most official health organizations recommend&lt;/a&gt; &amp;quot;250–500 mg &lt;em&gt;combined&lt;/em&gt; EPA and DHA each day for healthy adults.&amp;quot; &lt;em&gt;That is at least three times too low,&lt;/em&gt; for optimal effects on depression, which we estimated to be around 1500 mg/day. (The official maximum safe dose is 5000 mg/day.)&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Direct effect on suicide:&lt;/u&gt; Finally, a (small) study &lt;em&gt;directly&lt;/em&gt; investigating the link between suicide &amp;amp; Omega-3. &lt;a href=&quot;https://psychiatryonline.org/doi/pdf/10.1176/ajp.2006.163.6.1100&quot;&gt;Sublette et al 2006&lt;/a&gt;: “Low [DHA] and low Omega-3 proportions [...] predicted risk of suicidal behavior among depressed patients over the 2-year period.” Though keep in mind this is a small study, and it&#39;s observational, not experimental. Also, it&#39;s weird that, contrary to the above studies on depression, &lt;em&gt;DHA&lt;/em&gt; predicted suicide but &lt;em&gt;not&lt;/em&gt; EPA. Not sure what to make of that.&lt;/p&gt;
&lt;p&gt;Bonus: Omega-3 may also boost cognition? &lt;a href=&quot;https://www.nature.com/articles/s41598-025-16129-8.pdf&quot;&gt;Shahinfar et al 2025&lt;/a&gt;: “Enhancement of global cognitive abilities was observed with increasing omega-3 dosage up to 1500 mg/day. [effect size = 1.00, like going from a grade of C to B!], followed by downward trend at higher doses.”&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;vitamin_d&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Vitamin D&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC11650176/&quot;&gt;Ghaemi et al 2024&lt;/a&gt;&lt;/strong&gt; is a meta-analysis of Vitamin D&#39;s effect on depression. (31 trials with 24,189 participants → over &lt;em&gt;700&lt;/em&gt; participants per trial on average, &lt;em&gt;higher&lt;/em&gt; than the antidepressant trials!)&lt;/p&gt;
&lt;p&gt;Again, it actually estimates a dose-response curve! Below is Figure 1 + Table 2, showing the effect of Vitamin D dosage on depression vs placebo. The solid line is the average estimated effect, dashed lines are 95% confidence interval. Note the effect size is negative in this figure, because they&#39;re measuring &lt;em&gt;reduction&lt;/em&gt; in depressive symptoms:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/VitaminD.png&quot; alt=&quot;The effect size (with uncertainty) of Vitamin D dosage on depressive symptoms.&quot; title=&quot;The effect size (with uncertainty) of Vitamin D dosage on depressive symptoms.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The effect is strongest, even at the conservative end of the confidence interval, at 5000 IU (International Units) of Vitamin D a day&lt;/strong&gt;, with an estimated effect size of 1.82 and a 95% uncertainty range of 0.98 to 2.66. Let&#39;s be pessimistic, and take the &lt;em&gt;lowest&lt;/em&gt; end: &lt;strong&gt;0.98, like taking your mental health from an F to D, or C to B.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Is this in line with earlier meta-analyses?&lt;/u&gt; Again, more or less! &lt;a href=&quot;https://www.tandfonline.com/doi/10.1080/10408398.2022.2096560&quot;&gt;Mikola et al 2022&lt;/a&gt; found a lower estimate: the effect for ≥ 2000 IU/day is &lt;strong&gt;0.407&lt;/strong&gt;. Note that even &lt;em&gt;this&lt;/em&gt; is still on par with the &lt;em&gt;best&lt;/em&gt; antidepressant! And &lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC9376678/&quot;&gt;Xie et al 2022&lt;/a&gt; found a higher estimate: the effect of &amp;gt; 2,800 IU/day is &lt;strong&gt;1.23&lt;/strong&gt;. So, I&#39;ll take the median estimate: around 0.98. (And I&#39;m recommending 4,000 IU/day, since that&#39;s the &amp;quot;official&amp;quot; max safe dose. Though as we&#39;ll see later, even the official max dose may be too low.)&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Is there &amp;quot;publication bias&amp;quot;?&lt;/u&gt; Ghaemi 2024 &amp;amp; Xie 2022 did &lt;em&gt;NOT&lt;/em&gt; detect bias; Mikola 2022 did. (Again, the meta-analyses differed in which studies met their quality criteria.) It&#39;s worth noting that Mikola, even after adjusting for publication bias, and even including &lt;em&gt;low-dose&lt;/em&gt; studies, still finds an &amp;quot;average&amp;quot; effect of 0.317 — on par with the median antidepressant&#39;s 0.28. If we restrict ourselves to &amp;quot;just&amp;quot; high-dose Vitamin D, we get a much higher effect.&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Does Vitamin D work long-term?&lt;/u&gt; Unknown, because of an important confounding variable: the shorter trials used higher doses, the longer trials used lower doses. Quote Mikola: &lt;em&gt;&amp;quot;the mean vitamin D dose was more than 2,900 IU/day in interventions lasting 12 weeks or more, versus approximately 5,700 IU/day in shorter interventions.&amp;quot;&lt;/em&gt; (Note that &amp;quot;12 weeks&amp;quot; is on par with traditional antidepressant trials.)&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Does this still work even if you&#39;re already taking antidepressants?&lt;/u&gt; Yup! Table 1 of the Ghaemi meta-analysis also shows that Vitamin D helps &lt;em&gt;both&lt;/em&gt; patients who are on antidepressant medication and those who aren&#39;t. This is encouraging: it means you can &lt;em&gt;stack&lt;/em&gt; medications &amp;amp; supplements!&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Does this still work even if you &lt;em&gt;don&#39;t&lt;/em&gt; have Vitamin D insufficiency?&lt;/u&gt; Yes, but admittedly much less. That said, you probably &lt;em&gt;do&lt;/em&gt; have a Vitamin D insufficiency. &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/29644951/&quot;&gt;Liu et al 2018&lt;/a&gt; finds that a bit under half of American adults (41.4%) have insufficient Vitamin D blood levels. And &lt;a href=&quot;https://www.cambridge.org/core/services/aop-cambridge-core/content/view/16F3818433348B17FE9D8E71AAD2C8BB/S0007114517002422a.pdf/div-class-title-prevalence-of-vitamin-d-deficiency-and-insufficiency-among-schoolchildren-in-greece-the-role-of-sex-degree-of-urbanisation-and-seasonality-div.pdf&quot;&gt;Manios et al 2017&lt;/a&gt; finds that over half of kids (52.5%) in Greece — &lt;em&gt;frickin&#39; sunny Greece!&lt;/em&gt; — are still Vitamin D insufficient.&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Also, the &amp;quot;official&amp;quot; recommendations are &lt;em&gt;all too low:&lt;/em&gt;&lt;/u&gt;&lt;/p&gt;
&lt;p&gt;So, if these three meta-analyses are right, then high doses — 2000 IU/day or more, possibly 4000 (the official max dose) or higher — are optimal. But &lt;a href=&quot;https://www.healthline.com/nutrition/vitamin-d-dosage&quot;&gt;the official recommendation for Vitamin D&lt;/a&gt; is 400–800 IU/day, &lt;em&gt;several times too low&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;And even the official max dose of 4000 IU/day may be too low! &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/30611908/&quot;&gt;McCullough et al 2019&lt;/a&gt; gave thousands of participants 5,000 to 10,000 IU/day, for &lt;em&gt;seven years&lt;/em&gt;, and there were &lt;em&gt;zero&lt;/em&gt; cases of serious side effects. This matches later studies like &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/31746327/&quot;&gt;Billington et al 2020&lt;/a&gt;, a 3-year-long trial on hundreds of participants, which found &amp;quot;the safety profile of vitamin D supplementation is similar for doses of 400, 4000, and 10,000 IU/day.&amp;quot; (Though 15 participants got &amp;quot;mild hypercalcemia&amp;quot;, &amp;quot;all cases resolved on repeat testing.&amp;quot; Either way, that&#39;s a small cost for reducing the risk of major depression &amp;amp; suicide.)&lt;/p&gt;
&lt;p&gt;And it makes evolutionary sense that 10,000 IU a day &lt;em&gt;should&lt;/em&gt; be safe. Your skin, exposed to the Sun&#39;s ultraviolet rays, can synthesize up to (the equivalent of) 10,000 IU a day, before plateauing. Source: &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0002916522043763&quot;&gt;Vieth 1999&lt;/a&gt;: “Because vitamin D is potentially toxic, intake of [1000 IU/day] has been avoided even though the weight of evidence shows that the currently accepted [limit] of [2000 IU/day] is too low by at least 5-fold.” And &lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC5541280/&quot;&gt;Papadimitriou 2017&lt;/a&gt; reviews several previous studies that find statistical errors behind the official recommendations; correcting for these, adults should get 8000 IU/day.&lt;/p&gt;
&lt;p&gt;(On the other hand, &lt;a href=&quot;https://www.sciencedirect.com/science/article/abs/pii/S1011134415301561&quot;&gt;Krzyścin et al 2016&lt;/a&gt; estimates that the Hadza, an existing hunter-gatherer group, get 2000 IU of Vitamin D from sun exposure, and their food is a poor source of Vitamin D. Given that &lt;em&gt;existing&lt;/em&gt; hunter-gatherers live in the areas the colonialists &lt;em&gt;didn&#39;t&lt;/em&gt; want, ancient hunter-gatherers probably ate &amp;amp; got more Vitamin D. So, 2000 IU is a &lt;em&gt;lower&lt;/em&gt; bound on how much Vitamin D one should get — still several times more than the current official recommended dose of 400–800 IU/day.)&lt;/p&gt;
&lt;p&gt;So why are all the official sources still so paranoid about Vitamin D, and lowballing the recommendations? Well, alas, official policy is always a few decades behind the science in &lt;em&gt;any&lt;/em&gt; field. See: trans fats, open-label placebos, aerosol transmission of Covid-19, etc. And because something something incentives, it&#39;s &lt;em&gt;&amp;quot;rational&amp;quot;&lt;/em&gt; for government/insurers to be very risk-averse &amp;amp; slow to change (for better &amp;amp; worse).&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Speaking of the Sun,&lt;/u&gt; why take supplements instead of just getting Vitamin D from sun exposure? Well, &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/29659012/&quot;&gt;skin cancer&lt;/a&gt;. But also: because Sun-Skin D varies greatly depending on the season, your latitude, and your skin type. There&#39;s less ultraviolet light from the Sun in winter/fall, and at latitudes further from the equator. And &lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC10861575/&quot;&gt;the darker your skin is, the less Vitamin D your skin makes for the same amount of Sun exposure&lt;/a&gt;. As expected from the bio-physics of skin, Black adults have &lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC6075634/&quot;&gt;the highest prevalence of Vitamin D deficiency&lt;/a&gt; (82.1%!!), followed by Hispanic adults (62.9%). (But hey, at least Black adults have the lowest incidence of skin cancer. You win some, you lose some.) The point is: speaking as someone with Southeast Asian skin, who&#39;s currently in Canada during winter... even if I stood outside naked for hours, I&#39;d get approximately &lt;em&gt;zero&lt;/em&gt; IU/day of Vitamin D from the Sun. Thus: supplements.&lt;/p&gt;
&lt;p&gt;&lt;u&gt;Direct effect on suicide:&lt;/u&gt; Finally, a meta-analysis &lt;em&gt;directly&lt;/em&gt; measuring the effect of Vitamin D on suicidal behaviour. &lt;a href=&quot;https://link.springer.com/content/pdf/10.1186/s12888-025-06613-w.pdf&quot;&gt;Yu et al 2025&lt;/a&gt;: “Vitamin D in patients with [suicidal behaviours] were significantly lower than in controls (standardized mean difference: –0.69, or a &#39;medium&#39; difference)”. Reminder that this paper &lt;em&gt;by itself&lt;/em&gt; only measures correlation, not causation — but combined with the above experiments of Vitamin D on depression, I think it&#39;s reasonable to guess it&#39;s partly causal.&lt;/p&gt;
&lt;p&gt;&lt;u&gt;To recap:&lt;/u&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Almost half of you have a Vitamin D insufficiency (41.4% of American adults have insufficient blood levels).&lt;/li&gt;
&lt;li&gt;And those official recommendations are &lt;em&gt;way&lt;/em&gt; too low. The optimal amount of Vitamin D for depression is probably 4000 IU/day, with an effect around &lt;em&gt;twice&lt;/em&gt; that of the best antidepressant.&lt;/li&gt;
&lt;li&gt;Even the official &lt;em&gt;maximum&lt;/em&gt; safe dose (4000 IU/day) is below what your body can produce from the Sun in optimal conditions (10,000 IU/day). Recent randomized controlled trials confirm that 10,000 IU/day is, indeed, mostly safe.&lt;/li&gt;
&lt;li&gt;Reminder that official policy is often decades behind the science.&lt;/li&gt;
&lt;li&gt;Reminder that I&#39;m &lt;em&gt;not&lt;/em&gt; saying &amp;quot;take supplements &lt;em&gt;instead&lt;/em&gt; of antidepressants&amp;quot;; in fact the above meta-analysis shows you can effectively stack them!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Bonus: Vitamin D supplementation was found in several randomized controlled trials &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/39225947/&quot;&gt;to reduce mortality from Covid-19&lt;/a&gt;, though much less than official treatments like Paxlovid. Vitamin D also &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S2161831322010274&quot;&gt;probably helps guard against influenza&lt;/a&gt; too, though the evidence is small &amp;amp; early.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;conclusion&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Conclusion: &lt;em&gt;All this time, you lacked the Vitamin?&lt;/em&gt;&lt;/h2&gt;
&lt;p&gt;Scurvy is caused by a lack of Vitamin C. It&#39;s a condition that causes your wounds to re-open &amp;amp; your teeth to fall out. Scurvy used to kill &lt;em&gt;almost half(!)&lt;/em&gt; of all sailors on major expeditions; it&#39;s estimated millions died. It can be cured by eating lemons.&lt;/p&gt;
&lt;p&gt;Rickets is mostly caused by a lack of Vitamin D. It&#39;s a condition where kids&#39; bones go all soft and deformed. During the Industrial Revolution, up to 80% of kids suffered from it. It can be prevented with cod liver oil.&lt;/p&gt;
&lt;p&gt;Goiter is mostly caused by a lack of Iodine. It&#39;s a condition where the thyroid gland in your neck swells up painfully, to the size of an apple. During WWI, a &lt;em&gt;third&lt;/em&gt; of adult men had goiters. It can be prevented with iodized salt.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/The-Vitamin.png&quot; alt=&quot;Tumblr meme: &#39;All this time, you lacked The Vitamin? And yet you persisted?&#39;&quot; title=&quot;Tumblr meme: &#39;All this time, you lacked The Vitamin? And yet you persisted?&#39;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ourworldindata.org/mental-health&quot;&gt;About 1 in 4 people are expected to have clinical depression sometime in their life&lt;/a&gt;. Depression is the #1 source of the global &amp;quot;burden from disease&amp;quot; in the mental health category, &lt;a href=&quot;https://ourworldindata.org/burden-of-disease#how-do-different-diseases-and-disabilities-contribute-towards-the-burden-of-disease&quot;&gt;and &lt;em&gt;that&lt;/em&gt; category is the #6 burden of disease in the world&lt;/a&gt;, above Alzheimer&#39;s, malaria, and sexually transmitted infections.&lt;/p&gt;
&lt;p&gt;(But honestly, did you need those stats? This is likely a lived experience for a lot of you reading this.)&lt;/p&gt;
&lt;p&gt;The effective altruists are all, &lt;a href=&quot;https://www.givewell.org/how-much-does-it-cost-to-save-a-life&quot;&gt;&amp;quot;woah for just $3000 you can prevent a child&#39;s death from malaria&amp;quot;&lt;/a&gt; — and that&#39;s great! save them kids! — but where&#39;s the fanfare for the accumulating evidence that, &amp;quot;woah with cheap daily supplements we can save millions from suicide &amp;amp; depressed lives&amp;quot;?&lt;/p&gt;
&lt;p&gt;Over and over again throughout history, some horrific thing that caused millions to suffer turned out to be &amp;quot;yeah you were missing this one molecule lol&amp;quot;. To be clear: not everything is gonna be &lt;em&gt;that&lt;/em&gt; simple, and mental health is &lt;em&gt;not&lt;/em&gt; &amp;quot;just&amp;quot; chemistry. Also, all the numbers on this page come with large error bars &amp;amp; uncertainty; more research is needed.&lt;/p&gt;
&lt;p&gt;But, as of right now, I feel I can at least confidently claim the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Vitamin D and Omega-3 are both &lt;em&gt;at least on par&lt;/em&gt; with the median antidepressant (effect size ~= +0.3).&lt;/li&gt;
&lt;li&gt;The evidence is much stronger for Vitamin D; it&#39;s very plausibly at least &lt;em&gt;twice&lt;/em&gt; as good as antidepressants.&lt;/li&gt;
&lt;li&gt;Both supplements are cheap and safe, so what&#39;s the harm of trying? (positive &amp;quot;expected value&amp;quot; for this bet)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MY RECOMMENDATIONS FOR RESEARCHERS:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;More big, pre-registered, double-blind randomized controlled trials, please. (And specifically: testing &lt;em&gt;high&lt;/em&gt; doses.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MY SPECIFIC RECOMMENDATIONS FOR &lt;em&gt;YOU:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Go to a pharmacy, buy the following supplements over-the-counter, in whatever form you like: (I like the easy-to-swallow gel capsules)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vitamin D&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;🌱 By default, Vitamin D supplements are derived from… (quick web search)… &lt;em&gt;the grease in sheep&#39;s wool?&lt;/em&gt; Huh. Also fish liver oil. Anyway, if you&#39;re vegan, make sure your bottle specifically says &amp;quot;vegan&amp;quot; or &amp;quot;from lichen/mushrooms&amp;quot;. (If you&#39;re vegetarian, the sheep&#39;s-wool Vitamin D is fine, they don&#39;t kill the sheep for it.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Omega-3 &lt;em&gt;where EPA is ~60% of the Omega-3 total.&lt;/em&gt;&lt;/strong&gt; For example, my 500mg Omega-3 capsules have 300mg EPA, 200mg DHA.
&lt;ul&gt;
&lt;li&gt;🌱 By default, Omega-3 supplements come from fish. If you&#39;re veg(etari)?an, there &lt;em&gt;are&lt;/em&gt; plant-based sources of Omega-3, but look carefully: most vegan Omega-3 supplements provide &lt;em&gt;more DHA than EPA&lt;/em&gt;, which the above studies suggest would cancel out the antidepressant effect. &lt;em&gt;Double check the nutritional label to make sure it&#39;s ≥60% EPA.&lt;/em&gt; For example, &lt;a href=&quot;https://www.amazon.ca/Natures-Way-NutraVege-Plant-Based-Vegetarian/dp/B078T4DC2K?th=1&quot;&gt;this one is 300mg EPA + 200mg DHA&lt;/a&gt;. (&lt;em&gt;not&lt;/em&gt; an affiliate link)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Then, every day:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Take ~4000 IU of Vitamin D&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;⚠️ be cautious if you have kidney stones, or are on medications that could interact with Vitamin D. &amp;quot;ask your doctor&amp;quot;.&lt;/li&gt;
&lt;li&gt;(4,000 IU is the official max safe dose)&lt;/li&gt;
&lt;li&gt;10,000 IU if you&#39;re feeling daring (a couple large controlled trials showed no major lasting adverse side effects).&lt;/li&gt;
&lt;li&gt;if you have darker skin / live in higher latitudes / it&#39;s winter, you definitely need &lt;em&gt;some&lt;/em&gt; form of vit D supplementation&lt;/li&gt;
&lt;li&gt;bonus: &lt;em&gt;may&lt;/em&gt; improve immune response to Covid &amp;amp; influenza?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Take ~1500 mg of ≥60%-EPA Omega-3&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;⚠️ be cautious if you&#39;re on blood thinners, or other medications that could interact with Omega-3. again, &amp;quot;ask your doctor&amp;quot;.&lt;/li&gt;
&lt;li&gt;(5000 mg/day is the official max safe dose)&lt;/li&gt;
&lt;li&gt;bonus: &lt;em&gt;may&lt;/em&gt; improve cognition?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Don&#39;t quit your existing antidepressants if they&#39;re net-positive for you!&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;you may also want to ask your doctor about &lt;em&gt;Amitriptyline&lt;/em&gt;, or those other best-effect-size antidepressants.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Can you get these doses of Vitamin D &amp;amp; Omega-3 through whole foods alone, no supplements? Probably, but it&#39;d be expensive &amp;amp; tedious: you&#39;d have to eat something like 2,000 calories of farmed salmon &lt;em&gt;a day&lt;/em&gt; to get 4,000 IU/day of Vitamin D. As for Omega-3, eating mostly oily fishes &lt;em&gt;would&lt;/em&gt; get you &amp;gt;1000mg of Omega-3, but they&#39;d be &lt;em&gt;more DHA than EPA&lt;/em&gt;, which the above studies suggest would cancel out the antidepressant effects.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The effect sizes on depression:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The best antidepressant: &lt;strong&gt;+0.417&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;like your mental health grade going from F to F+, or C to C+&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;1500mg of ≥60%-EPA Omega-3: &lt;strong&gt;+0.558&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;like your mental health grade going from F to D–, or C to B–&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;4000 IU of Vitamin D: &lt;strong&gt;+0.98&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;like your mental health grade going from F to D, or C to B&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For completeness &amp;amp; comparison, here&#39;s the effect size of other things on depression:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Any &lt;a href=&quot;https://en.wikipedia.org/wiki/Dodo_bird_verdict&quot;&gt;mainstream &amp;quot;bona-fide&amp;quot; psychotherapy&lt;/a&gt; (CBT, Psychodynamic, Humanist, Solutions-Focused): &lt;strong&gt;+0.35&lt;/strong&gt;, source: &lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC5244449/&quot;&gt;Kamenov et al 2016&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;like going from C to C+&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Aerobic/Cardio Exercise: &lt;strong&gt;+0.79&lt;/strong&gt;, source &lt;a href=&quot;https://pure-oai.bham.ac.uk/ws/portalfiles/portal/54143117/Ioannis_Morres_et_al_Aerobic_exercise_for_adult_patients_Depression_and_Anxiety_2018.pdf&quot;&gt;Ioannis et al 2018&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;like going from C to B–&lt;/li&gt;
&lt;li&gt;(dose: &amp;quot;45 minutes, at moderate intensity, three times/week&amp;quot; ⇒ ~20 min/day)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Good Sleep: &lt;strong&gt;+1.10(???)&lt;/strong&gt;, a &lt;em&gt;lot&lt;/em&gt; of interpretation &amp;amp; calculations, see footnote&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/on-depression/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;
&lt;ul&gt;
&lt;li&gt;like going from C to B&lt;/li&gt;
&lt;li&gt;(dose: going from moderate insomnia to healthy sleep)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Bright Light Therapy: &lt;strong&gt;+0.487&lt;/strong&gt;, source &lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC11447633/&quot;&gt;Menegaz de Almeida et al 2025&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;(the above paper reports Odds Ratio of 2.42, &lt;a href=&quot;https://www.escal.site/&quot;&gt;which converts to&lt;/a&gt; Cohen&#39;s d effect size of +0.487)&lt;/li&gt;
&lt;li&gt;like going from C to C+&lt;/li&gt;
&lt;li&gt;I went &lt;a href=&quot;https://www.nytimes.com/wirecutter/reviews/best-light-therapy-lamp/&quot;&gt;with Wirecutter&#39;s recommendation&lt;/a&gt; for a UV-free 10,000 lux lamp.&lt;/li&gt;
&lt;li&gt;(dose: 10,000 lux, 30 min a day)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Mindfulness Meditation: &lt;strong&gt;+0.42&lt;/strong&gt;, source &lt;a href=&quot;https://www.frontiersin.org/journals/psychiatry/articles/10.3389/fpsyt.2019.00193/full&quot;&gt;Breedvelt et al 2019&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;like going from C to C+&lt;/li&gt;
&lt;li&gt;(dose: 7 weeks, &amp;quot;153 min each week&amp;quot; ⇒ ~20 min/day)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
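&lt;p&gt;(Aside on the conversions above: the odds-ratio-to-Cohen&#39;s-d translations, like 2.42 → +0.487 for bright light therapy and 1.66 → +0.279 in the footnotes, follow the standard logistic approximation d = ln(OR) × √3 / π. A minimal sketch in Python, just to show the arithmetic:)&lt;/p&gt;

```python
import math

def odds_ratio_to_cohens_d(odds_ratio):
    # Standard logistic approximation: d = ln(OR) * sqrt(3) / pi
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

# Bright light therapy, odds ratio 2.42:
print(round(odds_ratio_to_cohens_d(2.42), 3))  # 0.487
# Median antidepressant, odds ratio 1.66:
print(round(odds_ratio_to_cohens_d(1.66), 3))  # 0.279
```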
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/depression/on-depression-summary.png&quot; alt=&quot;Diagram of all the above estimated effect sizes&quot; title=&quot;Diagram of all the above estimated effect sizes&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(And remember to add +0.4 for the power of hope, i.e. &amp;quot;placebo&amp;quot;! Also remember: you can &lt;em&gt;stack&lt;/em&gt; any of the above interventions to get an even larger effect! You can&#39;t just naively add up the effect sizes, but I&#39;d be surprised if the effect of {vitamin d + omega-3 + bright lamps + cardio + good sleep + meditation + therapy + antidepressants} &lt;em&gt;combined&lt;/em&gt; ends up being less than +2.00. Two letter grades up means going from D to B, or, theoretically, from clinically depressed to flourishing! For more papers &amp;amp; my working research notes on &amp;quot;best bang for buck on depression&amp;quot;, &lt;a href=&quot;https://docs.google.com/document/d/1hjNxJmNjoVktrJDFPtFH1hgzKmGpZkRFQ6Vwf0s7c6w/edit?tab=t.0#heading=h.e0blw3xq3cvm&quot;&gt;check out this Google Doc&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Also, remember that all the above estimates are uncertain. And that averages hide &lt;em&gt;variation&lt;/em&gt; in effect. That said, I think the overall picture is still strong: there exist high bang-for-buck ways to reduce depression, which are &lt;em&gt;at least on par&lt;/em&gt; with drugs &amp;amp; therapy (and plausibly 2x better), that aren&#39;t (yet) common knowledge amongst policymakers &amp;amp; the public. And again, they&#39;re dirt cheap with minor-to-no adverse side effects. Moderate chance of a big win, for a known tiny cost. That&#39;s a positive &amp;quot;expected value&amp;quot; bet right there.&lt;/p&gt;
&lt;p&gt;I got onto this research rabbithole a few months ago while borrowing my housemate&#39;s ADHD meds. I may or may not eventually collect all this into a &amp;quot;JOYMAXXING&amp;quot; informal meta-meta-analysis. (&lt;a href=&quot;https://youtu.be/JOj97Edna7k?si=xAUt3Kp54RyfGX50&amp;amp;t=334&quot;&gt;See me yap about it on video as a cartoon cat&lt;/a&gt;.) But for this blog post, I wanted to dive deeper into Vitamin D and Omega-3, since their effect sizes are so huge, &lt;em&gt;and&lt;/em&gt; they&#39;re insultingly cheap &amp;amp; easy, compared to therapy or regular cardio.&lt;/p&gt;
&lt;p&gt;Stay safe this winter, keep away the seasonal depression. Get your supplements, and reach out to your friends &amp;amp; loved ones!&lt;/p&gt;
&lt;p&gt;💖,&lt;br /&gt;
~ Nicky Case&lt;/p&gt;
&lt;iframe src=&quot;https://ncase.me/ncase-credits/signup2.html&quot; frameborder=&quot;no&quot; width=&quot;640&quot; height=&quot;250&quot;&gt;&lt;/iframe&gt;
&lt;iframe src=&quot;https://ncase.me/ncase-credits/supporters/jan-2026.html&quot; frameborder=&quot;no&quot; width=&quot;640&quot; height=&quot;640&quot;&gt;&lt;/iframe&gt;&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I made up these ranges by requiring the standard letter grades F, D, C, B, A to have their centers at -2, -1, 0, +1, +2. Then, I made sure all in-between grades like C+ or A– had equal intervals. Each interval is +/- ⅙, or ⅓ wide:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;F---: -3.16 to -2.83&lt;/li&gt;
&lt;li&gt;F--: -2.82 to -2.50&lt;/li&gt;
&lt;li&gt;F–: -2.49 to -2.17&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;F: -2.16 to -1.83&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;F+: -1.82 to -1.50&lt;/li&gt;
&lt;li&gt;D–: -1.49 to -1.17&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;D: -1.16 to -0.83&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;D+: -0.82 to -0.50&lt;/li&gt;
&lt;li&gt;C–: -0.49 to -0.17&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;C: -0.16 to +0.17&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;C+: +0.18 to +0.50&lt;/li&gt;
&lt;li&gt;B–: +0.51 to +0.83&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;B: +0.84 to +1.17&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;B+: +1.18 to +1.50&lt;/li&gt;
&lt;li&gt;A–: +1.51 to +1.83&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A: +1.84 to +2.17&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;A+: +2.18 to +2.50&lt;/li&gt;
&lt;li&gt;A++: +2.51 to +2.83&lt;/li&gt;
&lt;li&gt;A+++: +2.84 to +3.17&lt;/li&gt;
&lt;/ul&gt;
 &lt;a href=&quot;https://blog.ncase.me/on-depression/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Median antidepressant Odds Ratio: 1.66, see below figure. This &lt;a href=&quot;https://www.escal.site/&quot;&gt;converts to&lt;/a&gt; Cohen&#39;s d of 0.279. &lt;a href=&quot;https://blog.ncase.me/on-depression/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;See &lt;a href=&quot;https://academic.oup.com/ijnp/article-pdf/14/3/405/2688342/14-3-405.pdf&quot;&gt;Fountoulakis &amp;amp; Möller 2011&lt;/a&gt; Table 1 Row 2. Will talk more about this paper again in a few paragraphs. &lt;a href=&quot;https://blog.ncase.me/on-depression/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;See Table 1. A follow-up paper by Kirsch in 2008 found that the drug group improved by 9.60 (non-standardized) points, and the placebo group by 7.80 points. (So, placebo + time is 7.80/9.60 = 0.81 = 81% of the full effect.) The F&amp;amp;M recalculation found the drug group improved by 10.04 points, and the placebo by 7.85 points. (So, placebo + time is 7.85/10.04 = 0.78 = 78% of the full effect.) And rows 2 &amp;amp; 3 confirm that Kirsch was still right about the following: “The [total effect] for drug groups was 1.24 [C to B+] and that for placebo 0.92 [C to B]” and “The effect size concerning the difference between improvement in drug groups and improvement in placebo groups was 0.32 [like C to C+]”. &lt;a href=&quot;https://blog.ncase.me/on-depression/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC10039857/&quot;&gt;Lee et al 2023&lt;/a&gt; reports the following effect sizes. Digital therapy for Insomnia → Sleep = 0.76, and Digital therapy for Insomnia → Depression = 0.42. Assuming the therapy &lt;em&gt;for insomnia specifically&lt;/em&gt; affects depression &lt;em&gt;only&lt;/em&gt; through better sleep (Digital therapy for Insomnia → Sleep → Depression), we can do an &lt;a href=&quot;https://cameron.econ.ucdavis.edu/e240a/ch04iv.pdf&quot;&gt;&amp;quot;Instrumental Variable&amp;quot;&lt;/a&gt; estimate of the effect of Sleep → Depression = 0.42 / 0.76 = 0.55. &lt;strong&gt;To be precise: this is saying, if you improve your sleep by 1 standard deviation, on average your depression improves by 0.55 standard deviations.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;So: how many standard deviations is going from &amp;quot;moderate insomnia&amp;quot; to &amp;quot;healthy sleep&amp;quot;? The standard measure is the Insomnia Severity Index (ISI), which &lt;a href=&quot;https://www.sleepprimarycareresources.org.au/questionnaires/isi&quot;&gt;you can take online&lt;/a&gt;. A score of 0–7 means no insomnia, 8–14 is subclinical insomnia, 15–21 is clinical insomnia (moderate), 22–28 is clinical insomnia (severe). Let&#39;s be conservative and say we&#39;re just going from barely clinical to barely healthy: 15 to 7, or a reduction of 8 points. &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/19689221/&quot;&gt;Yang et al 2009&lt;/a&gt; says a 6-point reduction is 1.5 standard deviations, which means 4 points is 1 standard deviation. So a reduction of 8 points is 2 standard deviations. &lt;strong&gt;So, if you improve your sleep from insomniac to healthy, you improve by at least 8 points, which is 2 standard deviations, so your depression should improve by 2 × 0.55 standard deviations, or ~1.10.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Reminder that my estimate is &lt;em&gt;full&lt;/em&gt; of assumptions upon assumptions &amp;amp; these error bars will compound. But I&#39;d be surprised if the true causal effect of going from insomniac to healthy sleep isn&#39;t &lt;em&gt;at least&lt;/em&gt; a &amp;quot;large&amp;quot; +0.8 effect. &lt;a href=&quot;https://blog.ncase.me/on-depression/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
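The footnote's chain of arithmetic, written out as a minimal sketch (the effect sizes are the cited papers' numbers; the instrumental-variable assumption is exactly the one stated above):

```python
# Instrumental-variable sketch of the footnote's arithmetic.
# Effect sizes from Lee et al 2023, as cited above:
therapy_to_sleep = 0.76       # digital therapy for insomnia on sleep
therapy_to_depression = 0.42  # digital therapy for insomnia on depression

# IV assumption: the therapy affects depression ONLY through sleep.
sleep_to_depression = therapy_to_depression / therapy_to_sleep
print(round(sleep_to_depression, 2))  # 0.55

# Yang et al 2009: a 6-point ISI reduction is 1.5 standard deviations,
# so 4 ISI points = 1 standard deviation.
points_per_sd = 6 / 1.5

# Barely-clinical insomnia (ISI 15) down to barely-healthy (ISI 7):
sd_improvement = (15 - 7) / points_per_sd  # 2 standard deviations

est_effect_on_depression = sd_improvement * sleep_to_depression
print(round(est_effect_on_depression, 2))  # 1.11
```

(The unrounded chain gives 1.11 rather than the 2 × 0.55 = 1.10 quoted above; the difference is just rounding at different steps.)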
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
        </entry>

    

        
        
        <entry>
            <title>a human poem on humans liking AI poems over human poems</title>
            <link href="https://blog.ncase.me/poem/"/>
            <updated>2026-01-07T00:00:00Z</updated>
            <id>https://blog.ncase.me/poem/</id>
            <content xml:lang="en" type="html">&lt;p&gt;Oh! To compete!&lt;br /&gt;
With an autocomplete!&lt;br /&gt;
In a recent RCT&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/poem/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;br /&gt;
They found that good ol&#39; GPT&lt;br /&gt;
Could beat!&lt;br /&gt;
A human like me.&lt;/p&gt;
&lt;p&gt;Oh! Dare I ask!&lt;br /&gt;
At what special human task!&lt;br /&gt;
Poetry, and humans guessed&lt;br /&gt;
worse than chance, at the test&lt;br /&gt;
of who&#39;s human, and confessed&lt;br /&gt;
to like AI above the rest:&lt;br /&gt;
Obsolete!&lt;br /&gt;
A human like me.&lt;/p&gt;
&lt;p&gt;Oh! What a shame!&lt;br /&gt;
Who&#39;s the scapegoat I shall blame!&lt;br /&gt;
The masses, their dumb asses?&lt;br /&gt;
Postmodern poets who don&#39;t rhyme?&lt;br /&gt;
The study, it&#39;s too cruddy?&lt;br /&gt;
Sam Altman tryna make a dime?&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Or, let&#39;s go back in time?&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Copernicus! You &lt;em&gt;did&lt;/em&gt; frick us!&lt;br /&gt;
WE ARE NOT THE CENTER.&lt;br /&gt;
Darwin-Wallace! &lt;em&gt;Apes&lt;/em&gt;, you call us?!&lt;br /&gt;
WE ARE NOT THE CENTER.&lt;br /&gt;
Turing-Gödel! Turning hurdles!&lt;br /&gt;
Understanding is compression.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/poem/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;br /&gt;
Intelligence is search.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/poem/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;br /&gt;
Existential-o depression&lt;br /&gt;
from each scientific lurch.&lt;/p&gt;
&lt;p&gt;WE ARE NOT THE CENTER.&lt;br /&gt;
THERE IS NOTHING SPECIAL ABOUT BEING:&lt;/p&gt;
&lt;p&gt;A human like me.&lt;/p&gt;
&lt;p&gt;.&lt;/p&gt;
&lt;p&gt;.&lt;/p&gt;
&lt;p&gt;.&lt;/p&gt;
&lt;p&gt;O... kay....?&lt;/p&gt;
&lt;p&gt;Pretty erudite, for a Luddite&lt;br /&gt;
I was playing. Science is a&#39;ight, a&#39;ight?&lt;br /&gt;
I like not dying of cholera or smallpox&lt;br /&gt;
or burying half our kids in a small box&lt;br /&gt;
(as they did before germ theory&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/poem/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;)&lt;br /&gt;
It&#39;s erm, eerie,&lt;br /&gt;
how the past wasn&#39;t health-compliant.&lt;br /&gt;
I&#39;d have to be an ungrateful brat&lt;br /&gt;
to piss on the shoulders of giants.&lt;/p&gt;
&lt;p&gt;And yet, the ape-part of me&lt;br /&gt;
the part that lives on a Disc World&lt;br /&gt;
under a dome speckled with Gods,&lt;br /&gt;
the part that hates poetry that doesn&#39;t rhyme,&lt;br /&gt;
the part that lives&lt;br /&gt;
&lt;em&gt;in the center&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;That part cries out!&lt;br /&gt;
It&#39;s got to shout!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Oh! Myth is dead.&lt;br /&gt;
It&#39;s ok to feel bereft.&lt;br /&gt;
Oh! The ineffable!	&lt;br /&gt;
&lt;em&gt;It&#39;s been fully eff&#39;d!&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Once there was a time when Art was the pinnacle&lt;br /&gt;
of human glory. Then a billion dollars and a nickel&lt;br /&gt;
later, a next-token-predictor can write poetry the fickle&lt;br /&gt;
humans prefer over human poetry. It tickles&lt;br /&gt;
my ironic nerves that we now take superior pride in,&lt;br /&gt;
what,&lt;br /&gt;
Counting the R&#39;s in &amp;quot;strawberry&amp;quot;?&lt;br /&gt;
Playing a pixel puzzle game?&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/poem/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;&lt;br /&gt;
Drawing hands that aren&#39;t scary?&lt;br /&gt;
&lt;em&gt;That&#39;s&lt;/em&gt; our claim to fame?&lt;/p&gt;
&lt;p&gt;(bish, &lt;em&gt;i&lt;/em&gt; can&#39;t draw hands either 😭)&lt;/p&gt;
&lt;p&gt;And sure, this current boom may fail&lt;br /&gt;
We&#39;ve hit diminishing returns on scale&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/poem/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;br /&gt;
Backprop, big data and a VC whale&lt;br /&gt;
May not be enough this time.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Or, let&#39;s go forth in time?&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Not enough &lt;em&gt;this&lt;/em&gt; time.&lt;/p&gt;
&lt;p&gt;But it&#39;s &lt;em&gt;only&lt;/em&gt; a matter of time.&lt;/p&gt;
&lt;p&gt;We will not be the center.&lt;/p&gt;
&lt;p&gt;We never were.&lt;/p&gt;
&lt;p&gt;And that&#39;s fine.&lt;/p&gt;
&lt;p&gt;They raised us to love being special&lt;br /&gt;
(then complain that &amp;quot;narcissism&amp;quot; is up&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/poem/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;)&lt;br /&gt;
But growing up means realizing it&#39;s okay&lt;br /&gt;
to not be special&lt;br /&gt;
to not be the center of Attention and/or Creation&lt;br /&gt;
to not have poetry that scans good&lt;br /&gt;
Fuck it, I&#39;m having fun&lt;br /&gt;
Once we automate it all away&lt;br /&gt;
And there&#39;s nothing we &lt;em&gt;need&lt;/em&gt; to do&lt;br /&gt;
And there&#39;s nothing we &lt;em&gt;need&lt;/em&gt; from each other&lt;br /&gt;
We&#39;ll be forced to answer,&lt;br /&gt;
scared&lt;br /&gt;
sacred&lt;br /&gt;
What do we &lt;em&gt;want&lt;/em&gt; to do?&lt;br /&gt;
What do we &lt;em&gt;want&lt;/em&gt; to be for each other?&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Whose center&lt;/em&gt;&lt;br /&gt;
do you want to be?&lt;/p&gt;
&lt;p&gt;&amp;quot;A human likes &lt;em&gt;me?&amp;quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;.&lt;/p&gt;
&lt;p&gt;.&lt;/p&gt;
&lt;p&gt;.&lt;/p&gt;
&lt;p&gt;Wait, was &lt;em&gt;that&lt;/em&gt; my so-called wisdom?&lt;br /&gt;
A saccharine cliché?&lt;br /&gt;
&amp;quot;The real fully-automated space communism&lt;br /&gt;
was all the friends we made along the way&amp;quot;?&lt;br /&gt;
What about AI existential risk?&lt;br /&gt;
What was that about the world being a disk?&lt;br /&gt;
What about AI being used to rope ya&lt;br /&gt;
into a totalitarian dystopia&lt;br /&gt;
and oop, yeah&lt;br /&gt;
well&lt;br /&gt;
Were you expecting anything actually useful from a poem?&lt;br /&gt;
Art? Being &lt;em&gt;useful?&lt;/em&gt; lol&lt;br /&gt;
This is why humans prefer AI poetry&lt;br /&gt;
Shit, I can&#39;t even have an uplifting moment&lt;br /&gt;
Without undercutting it with ironic cynicism&lt;br /&gt;
(Hey! Maybe that&#39;s what&#39;ll stop rogue AI)&lt;br /&gt;
(By being trained on internet data)&lt;br /&gt;
(If it becomes self-aware it&#39;ll just get depressed)&lt;/p&gt;
&lt;p&gt;Fuck it, I&#39;m having fun&lt;/p&gt;
&lt;p&gt;Let&#39;s end this with a &lt;em&gt;true&lt;/em&gt; human poem, that I &lt;em&gt;know&lt;/em&gt; a corporate chatbot will never make: a poem, of copyrighted characters and sexual content.&lt;/p&gt;
&lt;p&gt;Behold, Art:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There once was a mouse named Mickey&lt;br /&gt;
Who whipped out his mousey dickey&lt;br /&gt;
And started humping&lt;br /&gt;
Mr. Xi Jinping&lt;br /&gt;
While reading the Tiananmen Square Wiki&lt;/p&gt;
&lt;p&gt;(don&#39;t worry, it&#39;s consensual)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Oh! What a purge!&lt;br /&gt;
Of many a mental urge!&lt;br /&gt;
And my ape, who wants to sing the mythical&lt;br /&gt;
And my brain, who wants to eff the ineffable&lt;br /&gt;
And my heart, who&#39;s torn about being ethical&lt;br /&gt;
And the machine, maybe at the pinnacle&lt;br /&gt;
We&#39;ll all merge?...&lt;/p&gt;
&lt;p&gt;A human-like me.&lt;/p&gt;
&lt;p&gt;&lt;span&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt;&lt;/span&gt;&lt;/p&gt;
&lt;iframe src=&quot;https://ncase.me/ncase-credits/signup2.html&quot; frameborder=&quot;no&quot; width=&quot;640&quot; height=&quot;250&quot;&gt;&lt;/iframe&gt;
&lt;iframe src=&quot;https://ncase.me/ncase-credits/supporters/jan-2026.html&quot; frameborder=&quot;no&quot; width=&quot;640&quot; height=&quot;640&quot;&gt;&lt;/iframe&gt;
&lt;hr /&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Well, &amp;quot;recent&amp;quot;. From &lt;a href=&quot;https://www.nature.com/articles/s41598-024-76900-1&quot;&gt;a 2024 paper&lt;/a&gt;: &amp;quot;participants performed below chance levels in identifying AI-generated poems [and] were more likely to judge AI-generated poems as human-authored than actual human-authored poems.&amp;quot; The human poets ranged from Shakespeare to Whitman to Plath. The AI poet was ChatGPT &lt;em&gt;3.5&lt;/em&gt;, with no prompt engineering, and no feedback or iteration. &lt;a href=&quot;https://blog.ncase.me/poem/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;strong&gt;&amp;quot;Understanding is Compression&amp;quot;&lt;/strong&gt; is an idea that&#39;s been around for centuries, if not exactly in those words. Ockham&#39;s Razor says that given two theories that explain the same thing, we should pick the simpler one. Einstein said &lt;em&gt;&amp;quot;A theory is more impressive the greater the simplicity of its premises, the more different things it relates, and the more expanded its area of application.”&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;And now, this idea is finding good use in AI! Neural networks trained with regularization (rewarding simplicity), and Auto-encoders (compress large input → small embedding → decompress back to original input), both lead to AI that&#39;s more robust &amp;amp; generalizes better.&lt;/p&gt;
&lt;p&gt;Hat tip to these papers: &lt;a href=&quot;https://spaces-cdn.owlstown.com/blobs/4pab25rnepck8d3lp5lq2jk5ynx7&quot;&gt;Understanding as Compression&lt;/a&gt; by Daniel A. Wilkenfeld, a delightful, accessible read. &lt;a href=&quot;https://arxiv.org/pdf/1310.8599&quot;&gt;Information compression, intelligence, computing, and mathematics&lt;/a&gt; by J Gerard Wolff, founder of the SP (Simplicity-Power) Theory of Intelligence. &lt;a href=&quot;https://arxiv.org/pdf/2407.07723&quot;&gt;Understanding is Compression&lt;/a&gt; by Li, Huang, Wang, Hu, Wyeth, Bu, Yu, Gao, Liu &amp;amp; Li, which shows that LLMs can compress text (and even images/audio) &lt;em&gt;better&lt;/em&gt; than standard compression algorithms! &lt;a href=&quot;https://blog.ncase.me/poem/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;strong&gt;&amp;quot;Problem-solving is (heuristic) search&amp;quot;&lt;/strong&gt; is also an old idea:&lt;/p&gt;
&lt;p&gt;Turing &amp;amp; Champernowne designed &lt;a href=&quot;https://en.wikipedia.org/wiki/Turochamp&quot;&gt;the first chess AI&lt;/a&gt; in 1948, which searched every possible move &amp;amp; counter-move, then selected the best move using a &amp;quot;heuristic&amp;quot; (rule of thumb) based on the value, safety &amp;amp; mobility of the pieces, etc.&lt;/p&gt;
&lt;p&gt;Simon &amp;amp; Newell created the &lt;a href=&quot;https://en.wikipedia.org/wiki/General_Problem_Solver&quot;&gt;General Problem Solver&lt;/a&gt; in 1957, which could take &lt;em&gt;any&lt;/em&gt; formal game/system, like mazes or Sudokus or geometry, and search through possible moves (&amp;quot;operators&amp;quot;) to get to the solution. Because of the exponential explosion in possible states, their AI tried to narrow down its search using a means-ends heuristic: first try moves that &lt;em&gt;directly&lt;/em&gt; get you closer to your goal, backtrack only when that fails.&lt;/p&gt;
&lt;p&gt;Skipping decades ahead, DeepMind released &lt;a href=&quot;https://en.wikipedia.org/wiki/AlphaZero&quot;&gt;AlphaZero&lt;/a&gt; in 2017, which beat human grandmasters at chess, go, and shogi (&amp;quot;Japanese Chess&amp;quot;). The core of AlphaZero is the Monte Carlo Tree Search, which, once again, is &amp;quot;just&amp;quot; search through possible moves &amp;amp; counter-moves. But this time, instead of being hard-coded like in past chess AIs, the heuristic is a neural network that&#39;s learnt from scratch! &lt;a href=&quot;https://blog.ncase.me/poem/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For &lt;em&gt;thousands&lt;/em&gt; of years, across &lt;em&gt;all&lt;/em&gt; human cultures: &lt;a href=&quot;https://ourworldindata.org/child-mortality#key-insights&quot;&gt;the percent of kids who died before age 15 was around 50%&lt;/a&gt;. This only changed around 1850, when the work of John Snow &amp;amp; Louis Pasteur (&amp;amp; many others) proved that diseases spread through &amp;quot;germs&amp;quot; that were invisible to the naked eye. Only after this scientific discovery did public health infrastructure &lt;em&gt;actually work&lt;/em&gt;. Global child mortality plummeted from 50% then to &amp;quot;only&amp;quot; 4% now, and &amp;quot;only&amp;quot; 0.3% in the best countries. To quote a wise philosopher: &lt;a href=&quot;https://youtu.be/l4bW99gtXR8?si=HBuGgQPQPSKxltjD&amp;amp;t=187&quot;&gt;&amp;quot;YEAH SCIENCE, BITCH!&amp;quot;&lt;/a&gt; &lt;a href=&quot;https://blog.ncase.me/poem/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The &lt;a href=&quot;https://arcprize.org/&quot;&gt;Abstraction &amp;amp; Reasoning Corpus (ARC)&lt;/a&gt; is one of the few famous tests for AI that is (currently) unbeaten. You know &lt;a href=&quot;https://en.wikipedia.org/wiki/Raven%27s_Progressive_Matrices#Problem_structure&quot;&gt;those IQ tests&lt;/a&gt; where you look at some shapes, figure out the pattern, then select the shape that continues the pattern? ARC is like those, but 1) in pixel-game format, and 2) designed to be &amp;quot;easy for Humans, hard for AI&amp;quot;.&lt;/p&gt;
&lt;p&gt;The reason they&#39;re hard for current AI is that — unlike AI excelling at math word problems or reciting facts from textbooks — there &lt;em&gt;is no common pattern&lt;/em&gt; in the ARC puzzles, and &lt;em&gt;no way to memorize the answers&lt;/em&gt;, because the non-profit behind ARC uses a private test that&#39;s never been put online. (&lt;a href=&quot;https://arcprize.org/play&quot;&gt;but they have a public version humans can play with, and AIs can train with&lt;/a&gt;.) &lt;a href=&quot;https://blog.ncase.me/poem/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Toby Ord is the co-founder of Giving What We Can, and the author of &lt;em&gt;The Precipice&lt;/em&gt;, a book on existential risk that gave AI the highest probability of all things that could cause human extinction this century. More recently, Ord has been collecting stats showing that, contrary to the AI bulls, modern LLM improvements are coming at unsustainably exponential costs:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.tobyord.com/writing/inference-scaling-and-the-log-x-chart&quot;&gt;Inference Scaling and the Log-x Chart&lt;/a&gt; shows how AI companies lie with statistics by putting their &lt;em&gt;x&lt;/em&gt; axis on a logarithmic scale, but &lt;em&gt;not&lt;/em&gt; the y axis.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.tobyord.com/writing/hourly-costs-for-ai-agents&quot;&gt;Are the Costs of AI Agents Also Rising Exponentially?&lt;/a&gt; shows that, unlike humans — whose performance is linear: we can do 10× as much work in 10× as much time — LLM agents hit diminishing returns &lt;em&gt;fast&lt;/em&gt;. &lt;a href=&quot;https://blog.ncase.me/poem/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;While factchecking my poem, I found a 2024 meta-analysis that showed young people&#39;s &amp;quot;narcissism&amp;quot; has actually been &lt;em&gt;declining&lt;/em&gt; since 2008, possibly due to the financial crisis! &lt;a href=&quot;https://onlinelibrary.wiley.com/doi/pdf/10.1111/jopy.12982&quot;&gt;See Figure 5&lt;/a&gt;. I guess the misperception that &amp;quot;narcissism is rising&amp;quot; is due to narcissistic traits being &lt;em&gt;more visible &amp;amp; rewarded&lt;/em&gt; in a modern attention economy, not due to actual generational changes.&lt;/p&gt;
&lt;p&gt;Also, reminder that:&lt;/p&gt;
&lt;p&gt;1) &lt;a href=&quot;https://onlinelibrary.wiley.com/doi/pdf/10.1111/jopy.12982&quot;&gt;The &amp;quot;narcissism scale&amp;quot; is a flawed measure&lt;/a&gt; that mixes up healthy &amp;amp; unhealthy traits,&lt;/p&gt;
&lt;p&gt;2) Don&#39;t confuse &amp;quot;younger people score higher on narcissism&amp;quot; (age effect) with &amp;quot;each generation is getting more narcissistic&amp;quot; (cohort effect). &lt;a href=&quot;https://journals.sagepub.com/doi/10.1177/25152459251342750&quot;&gt;That&#39;s a fallacy&lt;/a&gt;, like confusing &amp;quot;younger people are shorter in height&amp;quot; and &amp;quot;each generation is getting shorter in height&amp;quot;.&lt;/p&gt;
&lt;p&gt;3) People with &lt;a href=&quot;https://en.wikipedia.org/wiki/Narcissistic_personality_disorder&quot;&gt;narcissistic personality disorder&lt;/a&gt; aren&#39;t cartoon villains, they&#39;re actual people. &lt;a href=&quot;https://blog.ncase.me/poem/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
        </entry>

    

        
        
        <entry>
            <title>Connect your past, present &amp; future, with lines on a graph</title>
            <link href="https://blog.ncase.me/lines-on-a-graph/"/>
            <updated>2026-01-01T00:00:00Z</updated>
            <id>https://blog.ncase.me/lines-on-a-graph/</id>
            <content xml:lang="en" type="html">&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/lines.png&quot; alt=&quot;Catgirl riding a line, going up&quot; title=&quot;Catgirl riding a line, going up&quot; /&gt;&lt;/p&gt;
&lt;p&gt;So you have a Dream.&lt;/p&gt;
&lt;p&gt;Like all worthy Dreams, it may take years or decades to achieve. Alas, the human motivation system &lt;em&gt;can’t&lt;/em&gt; learn from rewards with a year-long delay.&lt;/p&gt;
&lt;p&gt;So, you&#39;re instead advised to “trust the process”, and convert your long-term goal into regular short-term micro-goals:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Write a novel ⇒ write 500 words a day.&lt;/li&gt;
&lt;li&gt;Cure your social phobia ⇒ talk with one stranger a day.&lt;/li&gt;
&lt;li&gt;Get married in 10 years ⇒ get 2,000 &lt;a href=&quot;https://colah.github.io/personal/micromarriages/&quot;&gt;micromarriages&lt;/a&gt; a week.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This &lt;em&gt;is&lt;/em&gt; great advice, and I endorse it, but… it feels incomplete? If every day I just focus on this day, I feel like I&#39;m in a hamster wheel, hitting my same micro-goal over and over again, but for what? Intellectually, I know &amp;quot;for what&amp;quot; — the Dream — but I don&#39;t &lt;em&gt;feel&lt;/em&gt; it anymore.&lt;/p&gt;
&lt;p&gt;So, here&#39;s a solution that reliably helps me &lt;em&gt;feel&lt;/em&gt; the connection between my past, present &amp;amp; future: &lt;strong&gt;lines on a graph!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Let&#39;s look at 3 examples: project management, money, and weight loss.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;🚀 Example 1: Projects&lt;/h2&gt;
&lt;p&gt;The good news is I’m my own boss. The bad news is my boss is a moron.&lt;/p&gt;
&lt;p&gt;Specifically, my boss (me) is bad at project management. She (I) still is (am), to be honest. But the tool that&#39;s helped me the most is the &lt;a href=&quot;https://en.wikipedia.org/wiki/Burndown_chart&quot;&gt;Burndown Chart&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;How it works: First, give your project (or &amp;quot;sprint&amp;quot;) a deadline, and break it down into small tasks. Give each task a number of &amp;quot;points&amp;quot; it&#39;s worth (I just make points = hours). Then each day, do tasks. Your burndown chart software will then show you, in a single graph:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Your past: how many points/hours you&#39;ve gone through&lt;/li&gt;
&lt;li&gt;Your present: how many points/hours you need to do &lt;em&gt;today&lt;/em&gt; to stay on track&lt;/li&gt;
&lt;li&gt;Your future: when you&#39;ll finish your project!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;( &lt;a href=&quot;https://commons.wikimedia.org/wiki/File:SampleBurndownChart.svg&quot;&gt;example chart by User:PabloStraub&lt;/a&gt; ⤵ )&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/burndown.png&quot; alt=&quot;Chart of ideal vs actual hours left of work to do, over time&quot; title=&quot;Chart of ideal vs actual hours left of work to do, over time&quot; /&gt;&lt;/p&gt;
&lt;p&gt;When I shipped my latest project, &lt;a href=&quot;https://aisafety.dance/p3/&quot;&gt;a 20,000-word-with-40-illustrations behemoth&lt;/a&gt;, I used a variant called a Burnup Chart, which also tracks total estimated time &lt;em&gt;over time&lt;/em&gt;, to see how badly I mis-estimated how long tasks would take. Here&#39;s my actual chart:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/burnup.png&quot; alt=&quot;Chart of my ideal hours done, actual hours done, and estimated total hours, over time&quot; title=&quot;Chart of my ideal hours done, actual hours done, and estimated total hours, over time&quot; /&gt;&lt;/p&gt;
&lt;p&gt;( As you can see: I started off estimating that the project would take ~25 hours, but by the end it turned out to be ~75h. I was off by a factor of three, yeesh. But thanks to this Burnup Chart design, &lt;em&gt;I could account for my own mis-accounting&lt;/em&gt;, and see where the blue &amp;amp; red trendlines would intersect, to get a better estimate of when I&#39;d actually finish my project. And I did, in fact, finish by Dec 1st 2025, as planned! )&lt;/p&gt;
&lt;p&gt;The best part about Burndown/Burnup charts? &lt;strong&gt;I can see how different decisions will affect the future, &lt;em&gt;today.&lt;/em&gt;&lt;/strong&gt; For example, I can try deleting a task, or taking an extra break day, or increasing hours worked a day, and &lt;em&gt;immediately&lt;/em&gt; see how that&#39;d affect when I&#39;ll finish my project.&lt;/p&gt;
&lt;p&gt;By tying my past-present-future together into lines on a graph, not only am I more motivated, I also get &lt;em&gt;instant feedback&lt;/em&gt; from the future.&lt;/p&gt;
&lt;p&gt;(&lt;a href=&quot;https://docs.google.com/spreadsheets/d/1AyFS1ufnYWcGjjC5KxvSrfb04f64UYwNChvqlR7cNwI/edit?usp=sharing&quot;&gt;Here&#39;s a blank template&lt;/a&gt;, and &lt;a href=&quot;https://docs.google.com/spreadsheets/d/11faTjGrXp_pUn-NzBG-ioYPew-IDMVS5RViqYPH_R8U/edit?usp=sharing&quot;&gt;here&#39;s an actual example of when I used it&lt;/a&gt;.)&lt;/p&gt;
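If you want the bare mechanics without a spreadsheet, the core of a burndown chart is a few lines of arithmetic. A hypothetical sketch with made-up numbers (not the linked template's actual formulas):

```python
from datetime import date, timedelta
import math

# Hypothetical sprint: 75 points (hours) of tasks over 30 days.
start = date(2025, 11, 1)
deadline = date(2025, 12, 1)
total_points = 75

# Points burned so far, one entry per day since the start.
burned = [3, 4, 0, 5, 2, 6, 4]

# The past: how much work is left.
remaining = total_points - sum(burned)            # 51 points

# The present: pace needed from today to still hit the deadline.
days_left = (deadline - start).days - len(burned)
needed_per_day = remaining / days_left
print(round(needed_per_day, 1))                   # 2.2 points/day

# The future: projected finish at the current average pace.
avg_pace = sum(burned) / len(burned)
days_to_finish = math.ceil(remaining / avg_pace)
projected_finish = start + timedelta(days=len(burned) + days_to_finish)
print(projected_finish)                           # 2025-11-23
```

Deleting a task just means lowering total_points, and a break day just extends burned with a zero; the effect on projected_finish shows up instantly, which is the whole appeal.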
&lt;h2&gt;💸 Example 2: Money&lt;/h2&gt;
&lt;p&gt;Another example of &amp;quot;see your effect on the future, today&amp;quot; — just enter your current income &amp;amp; spending into a spreadsheet, multiply by the compounding interest of your savings/investments, and voilà: (example data, not my real finances)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/money1.png&quot; alt=&quot;Spreadsheet of income, expenses, and savings over time&quot; title=&quot;Spreadsheet of income, expenses, and savings over time&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Again, this lets you play with decisions like &amp;quot;what if I tried to increase income by this much&amp;quot;, &amp;quot;what if I tried to reduce my rent by that much&amp;quot;, and you get &lt;em&gt;instant&lt;/em&gt; future-feedback.&lt;/p&gt;
&lt;p&gt;But more importantly, it helps you viscerally &lt;em&gt;see&lt;/em&gt; the connection between your short-term micro-goals and long-term Dream. &amp;quot;Put $10 a day into savings&amp;quot; is doable, but doesn&#39;t sound like much. Yet do the math, and you&#39;ll find it&#39;s enough to &lt;em&gt;retire a millionaire at age 63, already adjusted for inflation, starting from zero money!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;(yes, for people living paycheck-to-paycheck or less, &amp;quot;put aside $10 a day&amp;quot; isn&#39;t feasible. but for most middle-class people and up, it should be.)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/money2.png&quot; alt=&quot;Spreadsheet showing how one can become a millionaire by retirement age, just by saving $10 a day into a low-fee diversified index fund&quot; title=&quot;Spreadsheet showing how one can become a millionaire by retirement age, just by saving $10 a day into a low-fee diversified index fund&quot; /&gt;&lt;/p&gt;
&lt;p&gt;( Quick aside, &lt;a href=&quot;https://blog.ncase.me/30/#tip_3&quot;&gt;the best advice for almost all laypeople&lt;/a&gt;: ditch the fancy BS mutual funds, and forget about day-trading stocks. Just put your money in a very-low-fee diversified index fund. If that feels icky because you&#39;re investing in capitalism, remember, if you put your money in a savings account, your bank will &lt;em&gt;still&lt;/em&gt; invest it, they&#39;ll just give &lt;em&gt;you&lt;/em&gt; nothing but crumbs. Also: you can pick a socially-responsible index fund that excludes weaponry &amp;amp; fossil fuels. )&lt;/p&gt;
&lt;p&gt;( Here&#39;s &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1-Bqc9K_WiBVUD5dhS0bGmwZsw07e4boe3DZB7bDU5tk/edit?usp=sharing&quot;&gt;the spreadsheets for both the above money examples&lt;/a&gt; )&lt;/p&gt;
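The $10-a-day claim is easy to check yourself. A minimal sketch, with assumed numbers: a 7 percent real (inflation-adjusted) annual return, which is roughly the historical long-run average for a diversified stock index fund, and saving from age 18 to 63:

```python
# Compound-savings sketch. Assumed numbers: $10/day saved,
# 7 percent real annual return, contributions from age 18 to 63.
daily_savings = 10
annual_savings = daily_savings * 365   # $3,650 per year
real_return = 0.07
years = 45                             # age 18 to age 63

balance = 0.0
for _ in range(years):
    # Contribute at the start of each year, then let it all grow.
    balance = (balance + annual_savings) * (1 + real_return)

print(round(balance))  # about 1.1 million, in today's dollars
```

Tweak real_return or daily_savings and re-run; that's the same instant future-feedback the spreadsheet gives you.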
&lt;h2&gt;🏋️‍♀️ Example 3: Weight&lt;/h2&gt;
&lt;p&gt;Finally, the most cliché New Year&#39;s Resolution: losing weight. Again, use a spreadsheet with lines on a graph: (my actual data)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/weight.png&quot; alt=&quot;Graph of my weight over time, with trendline, and example of one day&#39;s data&quot; title=&quot;Graph of my weight over time, with trendline, and example of one day&#39;s data&quot; /&gt;&lt;/p&gt;
&lt;p&gt;You can see, at a glance, the connection between:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Your past: weight over time&lt;/li&gt;
&lt;li&gt;Your present: your target caloric deficit for &lt;em&gt;today&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Your future: when you&#39;re likely to hit your weight goal&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;( I find that visualizing the trendline is emotionally helpful, because weight is &lt;em&gt;so variable&lt;/em&gt; from day to day. Seeing the trendline helps reassure me, ignore the daily noise, I&#39;m on the right track. )&lt;/p&gt;
&lt;p&gt;And again, you can play with &lt;em&gt;what if&#39;s&lt;/em&gt;, and get instant future-feedback.
A daily micro-goal like &amp;quot;250 calorie deficit&amp;quot; (one less cookie a day) — do the math&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/lines-on-a-graph/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; — becomes &amp;quot;lose 50lb (23kg) in two years&amp;quot; (enough to get you from moderately-Obese to &amp;quot;Healthy&amp;quot; BMI!)&lt;/p&gt;
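The cookie math can be sketched directly, using the common (and admittedly rough) rule of thumb that a pound of body fat is about 3,500 kcal:

```python
# Calorie-deficit sketch, using the rough 3500-kcal-per-pound rule
# of thumb (real weight loss is messier than this model).
deficit_per_day = 250         # kcal: about one cookie
days = 2 * 365                # two years
kcal_per_pound = 3500

pounds_lost = deficit_per_day * days / kcal_per_pound
print(round(pounds_lost, 1))        # 52.1 pounds
print(round(pounds_lost * 0.4536))  # 24 kg
```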
&lt;p&gt;( &lt;strong&gt;IMPORTANT:&lt;/strong&gt; a) Yes both BMI and the Calorie-In-Calorie-Out model are &lt;em&gt;very&lt;/em&gt; incomplete and flawed, b) Being Overweight &amp;amp; Obese Class I (25 &amp;lt; BMI &amp;lt; 35) is probably fine health-wise; previous studies didn&#39;t control for exercise levels &amp;amp; cardiovascular fitness (&lt;a href=&quot;https://dip2023.s3.amazonaws.com/Lecture%204/Barry-2014.pdf&quot;&gt;see Figures 3 &amp;amp; 4&lt;/a&gt;), c) May I interest you in the &lt;a href=&quot;https://slimemoldtimemold.com/2022/07/12/lose-10-6-pounds-in-four-weeks-with-this-one-weird-trick-discovered-by-local-slime-hive-mind-doctors-grudgingly-respect-them-hope-to-become-friends/&quot;&gt;Potato Diet&lt;/a&gt;? Potatoes fill you up very fast (high protein &amp;amp; high fiber if you keep the skins), have almost all the daily nutrients you need, and are versatile foods you can cook in many ways to keep the diet interesting. I did the Potato Diet for a month and lost 10 pounds, and I was &lt;em&gt;already&lt;/em&gt; &amp;quot;Normal&amp;quot; BMI! d) Or, ask your doctor about semaglutide. (&lt;a href=&quot;https://www.nejm.org/doi/full/10.1056/NEJMoa2032183&quot;&gt;see Figure 1&lt;/a&gt;) )&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://docs.google.com/spreadsheets/d/1N9tBR_qn7J4kBPcGkeG-S_kf6mHUlPoJs44unSS_msA/edit?usp=sharing&quot;&gt;Here&#39;s my weight-tracking Google Sheets template!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Anyway, let’s wrap up:&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;🚀 Get started with lines today!&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;Huh, what an interesting &amp;amp; inspiring article! I&#39;ll try this later&amp;quot;&lt;br /&gt;
(one year later)&lt;br /&gt;
&amp;quot;Oh no I forgot.&amp;quot;&lt;/p&gt;
&lt;p&gt;~ you, if you don&#39;t implement this &lt;em&gt;right now&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So before you close this tab, at least ask yourself:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1) What&#39;s a Dream you have, that you&#39;ve wanted for a long while, yet still haven&#39;t pursued?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;No shame here, it can be as wacky or tacky as your heart desires. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;📕 Write the next Great Canadian Novel&lt;/li&gt;
&lt;li&gt;🧠 Master the secrets of the universe&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;No, seriously, stop &amp;amp; think of a Dream you have. Silently think it and/or write it down, before continuing.&lt;/p&gt;
&lt;p&gt;Got one?&lt;/p&gt;
&lt;p&gt;Good. Next step:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2) How can you convert your long-term Dream into a regular micro-goal?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;What makes a good micro-goal:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You can do it regularly &amp;amp; often. (e.g. daily, weekly)&lt;/li&gt;
&lt;li&gt;There&#39;s a clear, specific pass/fail. (e.g. &amp;quot;write 500 words&amp;quot; vs &amp;quot;do some writing&amp;quot;)&lt;/li&gt;
&lt;li&gt;It &lt;em&gt;causes, not just correlates with&lt;/em&gt;, your long-term goal. (to mitigate &lt;a href=&quot;https://en.wikipedia.org/wiki/Goodhart%27s_law&quot;&gt;Goodhart’s Law&lt;/a&gt;, where “what gets measured gets gamed”)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Running with the above examples:&lt;/p&gt;
&lt;p&gt;📕 &amp;quot;Write a Great novel&amp;quot; ⇒ &lt;strong&gt;Write/edit 500 words a day.&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;500 words a day is easy. You probably write that much alone in text messages.&lt;/li&gt;
&lt;li&gt;A novel is around 50,000 words.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/lines-on-a-graph/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; At 500 words a day, you can finish the first draft in 100 days, by mid-April. Then, &lt;em&gt;editing&lt;/em&gt; 500 words a day, you can create a polished draft by mid-July. Then, editing 500 words a day &lt;em&gt;again&lt;/em&gt;, you can create a good final version by mid-October.&lt;/li&gt;
&lt;li&gt;So: by writing/editing a mere 500 words a day, you can &lt;em&gt;publish a novel every year&lt;/em&gt;, with two months to spare!
&lt;ul&gt;
&lt;li&gt;(The exact numbers vary if you want to write longer books, or do more revisions, but the general result holds: you &lt;em&gt;could&lt;/em&gt; complete a polished book every 1 or 2 years, by writing/editing only 500 to 1000 words a day.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Your first novel is very unlikely to be Great, but with more practice &amp;amp; feedback, you maximize your chances of writing a truly Great work. Persist! Remember that Slaughterhouse-Five was Vonnegut&#39;s &lt;em&gt;sixth&lt;/em&gt; novel! (As for GenAI? The people will be thirsty for quality, amidst the coming flood of AI slop.)&lt;/li&gt;
&lt;/ul&gt;
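&lt;p&gt;(If you like seeing the arithmetic laid out, here&#39;s the novel timeline as a tiny Python sketch. A toy calculation of mine, using the post&#39;s numbers:)&lt;/p&gt;

```python
# Toy timeline for "write/edit 500 words a day", using the numbers above
WORDS_PER_DAY = 500
NOVEL_WORDS = 50_000

days_per_pass = NOVEL_WORDS // WORDS_PER_DAY  # 100 days per full pass
passes = 3  # first draft, polish edit, final edit

total_days = days_per_pass * passes  # 300 days of writing/editing
spare_days = 365 - total_days        # about two months left over
print(days_per_pass, total_days, spare_days)  # 100 300 65
```

&lt;p&gt;(Tweak WORDS_PER_DAY or NOVEL_WORDS to see how the &amp;quot;longer books, more revisions&amp;quot; variations play out.)&lt;/p&gt;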
&lt;p&gt;🧠 &amp;quot;Master &lt;s&gt;the secrets of the universe&lt;/s&gt; Physics&amp;quot; ⇒ &lt;strong&gt;Study 1 hour a day.&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1 hour a day is easy-ish. You probably waste that much time a day on Internet Attention Sinks. (I know I do)&lt;/li&gt;
&lt;li&gt;Here&#39;s a study plan, at 1 hour a day:
&lt;ul&gt;
&lt;li&gt;Day 1, Day 2: Watch one of &lt;a href=&quot;https://theoreticalminimum.com/courses&quot;&gt;Leonard Susskind&#39;s free online Stanford University lectures&lt;/a&gt;. Each lecture is around 2 hours, so watch half today, half tomorrow. Write down your notes &amp;amp; questions.&lt;/li&gt;
&lt;li&gt;Day 3, 4, 5: Ask &lt;a href=&quot;https://claude.ai/&quot;&gt;Claude&lt;/a&gt; or &lt;a href=&quot;https://gemini.google.com/&quot;&gt;Gemini&lt;/a&gt; your questions, to clarify your understanding. (I don&#39;t trust ChatGPT or Grok.) What about LLM hallucinations? As of Jan 1st 2026, I find that large language models are pretty good on educational topics, &lt;em&gt;as long as they&#39;re undergraduate-level or below&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Day 6, 7, 8: Web search for &amp;quot;[topic] practice problems with answers&amp;quot;, then &lt;em&gt;actually do them&lt;/em&gt;. If you get stuck, ask your chatbot of choice for help &lt;em&gt;but not full solutions&lt;/em&gt;. A gym coach shouldn&#39;t lift the weights &lt;em&gt;for&lt;/em&gt; you.&lt;/li&gt;
&lt;li&gt;Day 9, 10: Make &lt;a href=&quot;https://apps.ankiweb.net/&quot;&gt;Anki flashcards&lt;/a&gt; to remember the core ideas &amp;amp; formulas, easy-ish-ly and effectively. (&lt;a href=&quot;https://www.youtube.com/watch?v=eVajQPuRmk8&quot;&gt;8 min YouTube video&lt;/a&gt; on how &amp;amp; why spaced repetition works)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;So that&#39;s 10 days = 2 weeks with weekends off, per Susskind lecture. There are 10 lectures per course × 6 courses = 60 lectures. At 2 weeks per lecture × 60 lectures = 120 weeks, or a bit over 2 years.&lt;/li&gt;
&lt;li&gt;So: you can get &lt;em&gt;a college-level understanding of Newtonian physics, Electrodynamics, Quantum mechanics, Special &amp;amp; General relativity, Cosmology, and Entropy&lt;/em&gt;... at the cost of &lt;em&gt;1 hour per weekday&lt;/em&gt; (and $20/month for a chatbot)… spread across roughly 2 years, about &lt;em&gt;half&lt;/em&gt; the time of a 4-year college degree.&lt;/li&gt;
&lt;/ul&gt;
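&lt;p&gt;(Same deal for the physics plan: the schedule math above, sketched in Python. My own back-of-envelope, not Susskind&#39;s:)&lt;/p&gt;

```python
# Back-of-envelope for the 1-hour-a-day physics plan above
weeks_per_lecture = 2  # 10 weekdays: watch, ask, practice, flashcards
lectures = 10 * 6      # 10 lectures per course, 6 Theoretical Minimum courses

total_weeks = weeks_per_lecture * lectures
years = total_weeks / 52
print(lectures, total_weeks, round(years, 1))  # 60 120 2.3
```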
&lt;p&gt;( It is not exactly a well-kept secret that the modern university system is horribly broken. &lt;a href=&quot;https://www.smbc-comics.com/comic/college-level-mathematics&quot;&gt;Relevant SMBC&lt;/a&gt;: )&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/smbc_university.png&quot; alt=&quot;Comic showing how a university degree costs as much as hiring 3 PhDs to tutor you full time + buying several baby tigers&quot; title=&quot;Comic showing how a university degree costs as much as hiring 3 PhDs to tutor you full time + buying several baby tigers&quot; /&gt;&lt;/p&gt;
&lt;p&gt;( If you don&#39;t have the pre-requisite math skills for Susskind&#39;s lectures, add an extra five months of &amp;quot;study 1 hour a day&amp;quot;, and use Khan Academy for &lt;a href=&quot;https://www.youtube.com/playlist?list=PL7AF1C14AF1B05894&quot;&gt;high-school algebra&lt;/a&gt; (~12 hours of lectures), and 3Blue1Brown for &lt;a href=&quot;https://www.youtube.com/playlist?list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr&quot;&gt;calculus&lt;/a&gt; (~3 hours) and &lt;a href=&quot;https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab&quot;&gt;linear algebra&lt;/a&gt; (~3 hours). So that&#39;s an extra 18h of lectures, and given our above study plan where we tackle 2h of lectures per 2 weeks, that&#39;s an extra 18 weeks, or about five months, of self-study. Again, at just 1 hour a day. What a steal of a deal! )&lt;/p&gt;
&lt;p&gt;Okay, back on track:&lt;/p&gt;
&lt;p&gt;Step 1, know your long-term Dream. Step 2, break it down into a sustainable regular micro-goal. This step &lt;em&gt;is&lt;/em&gt; tricky; if you&#39;re stuck, paste a link to this blog post into Claude (with web search turned on), tell &#39;em your Dream, and ask Claude to help you break down your Dream into sustainable micro-goals, with inspiration from the above two examples.&lt;/p&gt;
&lt;p&gt;After that, your final step:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3) DRAW LINES ON A GRAPH&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I strongly recommend using a spreadsheet program like Google Sheets, Excel, or Numbers. It needs more setup than an app, but you get full freedom to customize it however you like! If you need help setting up a spreadsheet, look it up on YouTube or consult Claude. Again, you can also get started with my templates:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;🚀 Project Burnup: &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1AyFS1ufnYWcGjjC5KxvSrfb04f64UYwNChvqlR7cNwI/edit?usp=sharing&quot;&gt;template&lt;/a&gt;, &lt;a href=&quot;https://docs.google.com/spreadsheets/d/11faTjGrXp_pUn-NzBG-ioYPew-IDMVS5RViqYPH_R8U/edit?usp=sharing&quot;&gt;example with my actual data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;💸 Money: &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1-Bqc9K_WiBVUD5dhS0bGmwZsw07e4boe3DZB7bDU5tk/edit?usp=sharing&quot;&gt;templates with examples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🏋️‍♀️ Calorie Tracker: &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1N9tBR_qn7J4kBPcGkeG-S_kf6mHUlPoJs44unSS_msA/edit?usp=sharing&quot;&gt;template&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
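&lt;p&gt;(And if you&#39;d rather script than spreadsheet, here&#39;s a minimal Python sketch of the core &amp;quot;line on a graph&amp;quot; trick: take your running total, compute your pace, and extrapolate a projected finish date. The log data below is made-up:)&lt;/p&gt;

```python
from datetime import date, timedelta

# Hypothetical progress log: (date, cumulative word count). Swap in your own.
log = [
    (date(2026, 1, 1), 0),
    (date(2026, 1, 8), 3_500),
    (date(2026, 1, 15), 7_000),
]
GOAL = 50_000  # target word count

# Average pace so far, in words per day
(start, first), (latest, total) = log[0], log[-1]
pace = (total - first) / (latest - start).days

# Extrapolate the line forward to the goal: your projected finish date
days_left = (GOAL - total) / pace
finish = latest + timedelta(days=round(days_left))
print(pace, finish)  # 500.0 2026-04-11
```

&lt;p&gt;(Change GOAL, or the log, and watch the projected date move: that&#39;s browsing possible futures in one line of arithmetic.)&lt;/p&gt;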
&lt;p&gt;Alternatively, you can use these apps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;🃏 &lt;a href=&quot;https://trello.com/&quot;&gt;Trello&lt;/a&gt; for productivity (with &lt;a href=&quot;https://trello.com/power-ups/search?q=burndown&quot;&gt;a Burndown Chart add-on&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;🐝 &lt;a href=&quot;https://www.beeminder.com/&quot;&gt;Beeminder&lt;/a&gt; for any quantifiable goal. (this app helps you set up &amp;quot;commitments with a sting&amp;quot;! every time you slip behind your goal, Beeminder will take some of your money.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(If you don&#39;t want to do this final step today, &lt;em&gt;set an exact day &amp;amp; hour in your Calendar to do this&lt;/em&gt;, so you don&#39;t forget to do it!)&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;And that&#39;s all the steps!&lt;/p&gt;
&lt;p&gt;Try it out for one area of your life, then slowly expand to others, one Dream at a time.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;🤔 Q &amp;amp; A&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; If it really &lt;em&gt;is&lt;/em&gt; that trivially easy to write one book a year, become a millionaire by middle age, go from Obese to &amp;quot;Healthy&amp;quot; BMI, and gain university-level mastery of hard skills &amp;amp; topics… why isn&#39;t everybody already doing this? Heck, why aren&#39;t &lt;em&gt;you&lt;/em&gt; doing this, Nicky. Why aren&#39;t &lt;em&gt;you&lt;/em&gt; already a millionaire polymath scientist-artist supermodel bestselling author?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; lol i dunno. i guess everyone&#39;s only living at 2% of their true potential&lt;/p&gt;
&lt;p&gt;( in 2026, i resolve to live up to &lt;em&gt;5%&lt;/em&gt; of my true potential! )&lt;/p&gt;
&lt;p&gt;( edit: ok, actual answer to this paradox, I think, is something like &amp;quot;&lt;a href=&quot;https://en.wikipedia.org/wiki/Hyperbolic_discounting&quot;&gt;hyperbolic discounting&lt;/a&gt; sucks&amp;quot;. Less jargon-y: human brains tend to overweigh the short-term over long-term, because the short-term is more immediately visible. My hope is the &amp;quot;lines on a graph&amp;quot; trick helps work around this brain-glitch, by making the long-term more vivid and &lt;em&gt;here&lt;/em&gt;. )&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;“Write a novel a year” sounds impossible. Most bestselling authors aren’t that prolific. “Write 500 words a day” sounds trivial. You do that unthinkingly between texts &amp;amp; DMs &amp;amp; Slacks.&lt;/p&gt;
&lt;p&gt;Yet &lt;em&gt;they’re the same goal&lt;/em&gt;, just on different time-scales.&lt;/p&gt;
&lt;p&gt;That’s the magic of micro-goals and lines-on-a-graph, seeing your past + present + future at a single glance: you can connect the impossible Dream with the possible Daily, see your effect on the future &lt;em&gt;today&lt;/em&gt;, and &lt;em&gt;browse the multiverse of possible futures&lt;/em&gt; for the one your heart desires.&lt;/p&gt;
&lt;p&gt;📈 Happy Line-making, everyone!&lt;br /&gt;
~ Nicky Case&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2026-01/nyee.png&quot; alt=&quot;illustration of a catgirl collapsed in exhaustion, in front of a giant party-ful &#39;2026&#39; sign&quot; title=&quot;illustration of a catgirl collapsed in exhaustion, in front of a giant party-ful &#39;2026&#39; sign&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;P.S:&lt;/strong&gt; One of my Dreams this year is to write &lt;em&gt;one blog post a week.&lt;/em&gt; I hope you found value in this first entry of my &amp;quot;one post a week&amp;quot; challenge for 2026!&lt;/p&gt;
&lt;p&gt;Let&#39;s do the math: I expect my posts to be 2,000 words on average. (this one&#39;s 2,600 words long.) So if I write &amp;amp; polish 500 words a day, I can get 1 post done per 4 weekdays, giving me a good buffer. 2,000 words per week × 52 weeks per year = 104,000 words. That&#39;s the size of &lt;em&gt;two books&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;If you&#39;d like to keep up with my &amp;quot;books disguised as a blog&amp;quot;, sign up for my low-volume, monthly newsletter! Every end of month, I&#39;ll share links to the 4 or 5 posts I wrote — this way, you can skip the ones that don&#39;t interest you:&lt;/p&gt;
&lt;iframe src=&quot;https://ncase.me/ncase-credits/signup2.html&quot; frameborder=&quot;no&quot; width=&quot;640&quot; height=&quot;250&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;I also reflected on 2025 and wrote more on my resolutions for 2026, &lt;a href=&quot;https://www.patreon.com/posts/h-n-y-2026-from-147107425&quot;&gt;in this public Patreon post&lt;/a&gt;. You can support my research &amp;amp; writings (always free, online, and open-source!) via &lt;a href=&quot;https://www.patreon.com/ncase&quot;&gt;Patreon&lt;/a&gt;, &lt;a href=&quot;https://ko-fi.com/nickycase&quot;&gt;Ko-Fi&lt;/a&gt;, or &lt;a href=&quot;https://www.paypal.com/paypalme/nickycase&quot;&gt;PayPal&lt;/a&gt; ⤵&lt;/p&gt;
&lt;iframe src=&quot;https://ncase.me/ncase-credits/supporters/jan-2026.html&quot; frameborder=&quot;no&quot; width=&quot;640&quot; height=&quot;640&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;Bless your cotton socks!  Happy new year 2026!  🎉 🎊 🎉&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;A simple (but inaccurate) rule-of-thumb is Wishnofsky&#39;s Rule: 1 pound of fat ~= 3500 kcal. (&lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC4035446/&quot;&gt;This rule is inaccurate &amp;amp; there are better models, but honestly it&#39;s not &lt;em&gt;too&lt;/em&gt; far off.&lt;/a&gt;) In other words, with a calorie deficit of 250 kcal a day, you&#39;ll have a deficit of 3500 kcal in 14 days, or burning 1 pound of fat per 2 weeks. So, over 2 years, that&#39;s ~104 weeks, or burning 104/2 = 52 pounds (23kg). Given &lt;a href=&quot;https://commons.wikimedia.org/wiki/File:BMI_chart.png&quot;&gt;the standard BMI model &lt;/a&gt;(which, yes, is also inaccurate but let&#39;s roll with this for now), 52lb/23kg is enough to take you from moderately Obese to &amp;quot;Healthy&amp;quot;. &lt;a href=&quot;https://blog.ncase.me/lines-on-a-graph/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Random web sites claim that most published books &amp;quot;should&amp;quot; be around 80,000 to 100,000 words, with Young Adult having a smaller word count, and 40,000 at minimum to count as a novel, not &amp;quot;novella&amp;quot;. I&#39;m picking 50,000 to make the math easy to follow.&lt;/p&gt;
&lt;p&gt;Some Great novels around 50,000 words: Kurt Vonnegut&#39;s &lt;em&gt;Slaughterhouse-Five&lt;/em&gt;, Douglas Adams&#39;s &lt;em&gt;The Hitchhiker’s Guide to the Galaxy&lt;/em&gt;, F. Scott Fitzgerald&#39;s &lt;em&gt;The Great Gatsby&lt;/em&gt;. 50k words is plenty! &lt;a href=&quot;https://blog.ncase.me/lines-on-a-graph/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
        </entry>

    

        
        
        <entry>
            <title>Signal Boosts for Autumn 2025: much ado about the Algorithm</title>
            <link href="https://blog.ncase.me/signal-boost-autumn-2025/"/>
            <updated>2025-10-30T00:00:00Z</updated>
            <id>https://blog.ncase.me/signal-boost-autumn-2025/</id>
            <content xml:lang="en" type="html">&lt;p&gt;Hey folks! Things are still in the works. In the meantime, here&#39;s some recent-ish stuff I found valuable, and I hope you do too. 💖 (Sorry this post is, uh, over 10,000 words. Guess I&#39;m making up for not posting a Signal Boost in almost a year!)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Much ado about The Algorithm&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;🎙️ An interview series with experts on this new algorithmic era &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#interview&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;😑 New study: It&#39;s probably not the algorithm&#39;s fault, we just suck &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#we-suck&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🙃 Gradual Disempowerment: how even “dumb” AI could take over humanity &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#gradual&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🤝 Bridging-Based Ranking: reverse political polarization by reversing the algorithm &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#bridging&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🧐 Zero-Knowledge IDs: How to prove yourself without doxxing yourself &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#zkp&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🧠 Cyborgism: How to make AI that enhances us, not replaces us &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#cyborg&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🍻 The 6Pack of Digital Democracy &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#6pack&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fun Stuff!&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;🕵️ Clues by Sam: a daily deductive detective game &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#clues&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🐦 Snakebird: cute birbs, cruel puzzles &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#snakebird&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🦉 ZeWei&#39;s Multiverse Tour Guide Adventures Continue &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#zewei&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🤓 Emnerson: goofy internet gal &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#emnerson&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🧇 &amp;quot;blahaj goes to waffle house at 3am&amp;quot; &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#blahaj&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SPOOKS FOR HALLOWEEN&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;👻 BOO! a silly music video &amp;amp; music album &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#boo&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🩸 Bury Your Gays by Chuck Tingle &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#bury-your-gays&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;Much ado about The Algorithm&lt;/h2&gt;
&lt;p&gt;&lt;a id=&quot;interview&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;🎙️ Interview series with experts on this new algorithmic era&lt;/h3&gt;
&lt;p&gt;My friend Tobias Rose-Stockwell — author of the book &lt;a href=&quot;https://www.outragemachine.org/&quot;&gt;Outrage Machine&lt;/a&gt; — just launched a new interview series! In this series, he&#39;ll yap with experts on the simple question of &amp;quot;seriously wtf is going on, what do we do when algorithms install depression &amp;amp; extremism into our kids, scientists can&#39;t agree if AI is hype or the end of humanity, also holy shit politics??&amp;quot;&lt;/p&gt;
&lt;p&gt;Simple question!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Episode 1: Jonathan Haidt&lt;/strong&gt;, psychologist and author of the bestselling books &lt;a href=&quot;https://en.wikipedia.org/wiki/The_Righteous_Mind&quot;&gt;The Righteous Mind&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; and &lt;a href=&quot;https://en.wikipedia.org/wiki/The_Anxious_Generation&quot;&gt;The Anxious Generation&lt;/a&gt;.&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/Dh4g0gxirSQ?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#interview-with-jon&quot;&gt;: click to expand — my notes from the Jon episode&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Episode 2: Tim Urban&lt;/strong&gt;, blogger behind &lt;a href=&quot;https://waitbutwhy.com/&quot;&gt;Wait But Why&lt;/a&gt; and author of &lt;a href=&quot;https://waitbutwhy.com/whatsourproblem&quot;&gt;What&#39;s Our Problem?&lt;/a&gt;&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/eGBhUXmxSbI?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#interview-with-tim&quot;&gt;: click to expand — my notes from the Tim episode&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Upcoming interviews:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Tristan Harris&lt;/strong&gt;, ex-Google employee most famous for calling out Google, Apple &amp;amp; Facebook for using dark design patterns to keep us (literally, not just metaphorically) addicted — and promoting specific better designs that respect human attention &amp;amp; autonomy. (See: &lt;a href=&quot;https://www.slideshare.net/slideshow/google-deck-on-digital-wellbeing-a-call-to-minimize-distraction-and-respect-users-attention/109671786&quot;&gt;Tristan&#39;s leaked internal Google presentation (2013)&lt;/a&gt;.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Steven Pinker&lt;/strong&gt;, who I do not like because he keeps strawmanning/weakmanning all his opponents, and IMO is wrong on cognitive science (his own field), and most of the other fields he&#39;s written on, especially AI Safety. But he &lt;em&gt;is&lt;/em&gt; a leading voice &amp;amp; bestselling author, so, well, whatever.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Esther Perel&lt;/strong&gt;, psychotherapist best known for her talks &amp;amp; books about &lt;em&gt;SEX!&lt;/em&gt; and relationships.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Anyway if the above interests you, check out my friend&#39;s new interview series!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;On &lt;a href=&quot;https://www.youtube.com/@IntoTheMachineShow&quot;&gt;YouTube&lt;/a&gt; / &lt;a href=&quot;https://tobias.substack.com/&quot;&gt;Substack&lt;/a&gt; / &lt;a href=&quot;https://podcasts.apple.com/ca/podcast/into-the-machine-with-tobias-rose-stockwell/id1824137015&quot;&gt;Apple Podcasts&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Be one of the first thousand fans, it&#39;s &lt;em&gt;very&lt;/em&gt; early days for this project.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;we-suck&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;😑 New study: It&#39;s probably not the algorithm&#39;s fault, we just suck&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/its_you.png&quot; alt=&quot;Screenshot from Bojack Horseman: Todd simply tells Bojack, It&#39;s you.&quot; title=&quot;Screenshot from Bojack Horseman: Todd simply tells Bojack, It&#39;s you.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Tobias sent me &lt;a href=&quot;https://arxiv.org/abs/2508.03385&quot;&gt;this paper&lt;/a&gt; (&lt;a href=&quot;https://arstechnica.com/science/2025/08/study-social-media-probably-cant-be-fixed/&quot;&gt;Ars Technica lay-summary&lt;/a&gt;), which hurts him because it contradicts one of his core theses. +1 for Tobias&#39;s intellectual honesty, showing the counter-evidence.&lt;/p&gt;
&lt;p&gt;Anyway, Tobias&#39;s thesis (a common one among researchers) is that social media breeds extremism because of their algorithmic feeds. &lt;em&gt;Enraging is engaging&lt;/em&gt;. The data shows it: anger is &lt;em&gt;the&lt;/em&gt; best way to make a headline go viral.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; The algorithms, trained to predict us, train &lt;em&gt;us&lt;/em&gt; to become &lt;em&gt;predictable&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;But this &lt;em&gt;new&lt;/em&gt; paper suggests that, no, it really is just our fault, no algorithm needed. The paper uses an interesting method: they simulate a social network, where each person&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; is simulated using LLMs&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;, deciding what to re-share and who to follow, etc. &amp;quot;Agent-based modelling&amp;quot; for social science isn&#39;t new, but using LLMs to do it is!&lt;/p&gt;
&lt;p&gt;(But you should be rightfully skeptical: &lt;em&gt;can&lt;/em&gt; LLMs simulate people accurately? &lt;a href=&quot;https://arxiv.org/abs/2411.10109&quot;&gt;This Stanford preprint&lt;/a&gt; finds that LLMs can replicate 1,000 real people&#39;s answers 85% as accurately &lt;em&gt;as the people themselves 2 weeks later&lt;/em&gt;. So, not perfect, but not bad!)&lt;/p&gt;
&lt;p&gt;Back to the paper: they use LLM agents to simulate a &amp;quot;baseline&amp;quot; social network&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;, then simulate proposed solutions to social media — e.g. chronological feeds, hidden engagement metrics, bridging-based algorithms, etc — and measure their effect on political polarization, attention-economy inequality, etc.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The headline result: it&#39;s our fault.&lt;/strong&gt; Even a no-algorithm, chronological feed leads to insular echo chambers &amp;amp; polarization.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/paper-1.png&quot; alt=&quot;chart from paper, see below paragraph for explanation&quot; title=&quot;chart from paper, see below paragraph for explanation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Figure 3: The &amp;quot;E-I Index&amp;quot; is a measure of outgroup (external) to ingroup (internal) follows. 0 (upwards) means less polarization, -1 (downwards) means more. &lt;em&gt;No intervention reduces polarization that much.&lt;/em&gt; Chronological&#39;s practically the same as baseline. The best intervention is Bridging, but even then not by much.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/paper-2.png&quot; alt=&quot;chart from paper, see below paragraph for explanation&quot; title=&quot;chart from paper, see below paragraph for explanation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Figure 4: Correlation between how extremist a poster is, and how many followers/reposts they receive. Chronological feeds are &lt;em&gt;much worse&lt;/em&gt; than baseline. Bridging is quite a bit better, actually. (Wait, how can a no-algorithm feed be &lt;em&gt;worse?&lt;/em&gt; My guess in footnote:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/paper-3.png&quot; alt=&quot;chart from paper, see below paragraph for explanation&quot; title=&quot;chart from paper, see below paragraph for explanation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Figure 5: The &amp;quot;Gini coefficient&amp;quot; is a measure of inequality. Chronological feeds lead to the least amount of inequality in followers/reposts. Bridging leads to the most, but not that much higher than baseline.&lt;/p&gt;
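&lt;p&gt;(For the curious, here are toy implementations of the two metrics these figures use. My own illustrative code, not the paper&#39;s:)&lt;/p&gt;

```python
# Toy versions of the figures' two metrics (illustrative, not the paper's code)

def ei_index(external, internal):
    """Krackhardt E-I index of follows: -1 is all in-group (max polarization),
    0 is balanced, +1 is all out-group."""
    return (external - internal) / (external + internal)

def gini(counts):
    """Gini coefficient of e.g. follower counts: 0 is perfect equality,
    values near 1 mean a few accounts hog all the attention."""
    xs = sorted(counts)
    n = len(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * sum(xs)) - (n + 1) / n

print(ei_index(external=20, internal=80))  # -0.6, a fairly insular network
print(round(gini([1, 1, 1, 97]), 2))       # 0.72, very unequal attention
```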
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;On one hand, these results aren&#39;t &lt;em&gt;too&lt;/em&gt; surprising. Puritan-era Salem didn&#39;t have algorithms or mass media, yet they had literal witch hunts. Tumblr &amp;amp; 4chan in the early 2010s didn&#39;t have algorithmic feeds (4chan doesn&#39;t even have &amp;quot;reblogs&amp;quot; or &amp;quot;followers&amp;quot;), yet not only were these sites hotbeds of extremism, their extremism &lt;em&gt;escaped containment&lt;/em&gt; and affected US politics, which affected world politics. And today, Bluesky has a chronological feed by default&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;, yet it&#39;s... well. It&#39;s light-blue Tumblr. Donald Trump&#39;s Truth Social also has a chronological feed.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;So: no algorithm is needed for poisonous politics. &amp;quot;It&#39;s you.&amp;quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Well, damn. I was hoping there&#39;d be an easy technical fix, and the problem wasn&#39;t, &amp;quot;human nature is fundamentally corrupt&amp;quot;.&lt;/p&gt;
&lt;p&gt;On the other hand, the paper &lt;em&gt;did&lt;/em&gt; find a hopeful result – even though their own Abstract downplays it because, I don&#39;t know, academic humility and/or pessimism sells? They found that &lt;strong&gt;&amp;quot;bridging-based algorithms&amp;quot;&lt;/strong&gt;, like &lt;a href=&quot;https://asteriskmag.com/issues/08/the-making-of-community-notes&quot;&gt;Birdwatch/Community Notes&lt;/a&gt; or &lt;a href=&quot;https://en.wikipedia.org/wiki/Pol.is&quot;&gt;Pol.is&lt;/a&gt;, &lt;em&gt;does&lt;/em&gt; cut down polarization! (as explained in the above Figures)&lt;/p&gt;
&lt;p&gt;Bridging algorithms, instead of showing you what {people you agree with} agree on, show you what {people who disagree} &lt;em&gt;still&lt;/em&gt; agree on. It automatically finds &amp;amp; highlights common ground, by design! (More explanation later in this Signal Boost. &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#bridging&quot;&gt;↪&lt;/a&gt;)&lt;/p&gt;
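&lt;p&gt;(If you want the gist of bridging in code: here&#39;s a deliberately tiny sketch of mine, where a post is ranked by its &lt;em&gt;weakest&lt;/em&gt; per-group approval instead of raw engagement. Real systems like Community Notes use fancier matrix-factorization models; this is just the core intuition:)&lt;/p&gt;

```python
# Deliberately tiny "bridging" sketch: rank posts by the approval they earn
# from BOTH groups, i.e. the minimum per-group approval rate.

def bridging_score(approval_by_group):
    return min(approval_by_group.values())

posts = {
    "rage-bait":     {"left": 0.9, "right": 0.1},  # thrills one side only
    "common-ground": {"left": 0.6, "right": 0.7},  # decent with everyone
}
ranked = sorted(posts, key=lambda p: bridging_score(posts[p]), reverse=True)
print(ranked)  # ['common-ground', 'rage-bait']
```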
&lt;p&gt;Now, there &lt;em&gt;is&lt;/em&gt; a small cost to using bridging algorithms, in that it &lt;em&gt;slightly&lt;/em&gt; increases follower-inequality, but honestly, it&#39;s a &lt;em&gt;very&lt;/em&gt; small difference, and you could just offset that by making your algorithm give a bit more weight to smaller creators.&lt;/p&gt;
&lt;p&gt;Let&#39;s leave on that hopeful note.&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;Other caveats I want to mention:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Though LLM agents have been validated against human data (85% as accurate as people themselves 2 weeks later), because this simulation runs LLM agents against each other for dozens of rounds, small biases in LLMs can accumulate. (see: &lt;a href=&quot;https://www.astralcodexten.com/p/the-claude-bliss-attractor&quot;&gt;Scott Alexander&#39;s article on why Claude Finds God and DALL-E Goes Racist&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;The study might be over-estimating polarization, because the LLM &amp;quot;knows&amp;quot; polarization happens on social media, and thus acts it out. (it&#39;s a text &lt;em&gt;predictor&lt;/em&gt;, remember?) Then again, the study could be &lt;em&gt;under&lt;/em&gt;-estimating polarization, because these LLMs have been fine-tuned to be &amp;quot;friendly&amp;quot;, and thus will be kinder &amp;amp; more curious than the average internet user. Maybe these two errors roughly cancel out.&lt;/li&gt;
&lt;li&gt;Although the &lt;em&gt;algorithmic feed&lt;/em&gt; may not be the culprit, other design decisions could have led to higher assholery. For example, the low-friction Retweet lets impulsive emotions win the stage. And the Quote-Retweet (and Tumblr&#39;s Reblog-with-Commentary) is the perfect way to dunk on someone and get the &amp;quot;last&amp;quot; word.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#notes-on-polarization&quot;&gt;Click to expand — why polarization isn&#39;t always bad, and the 3 kinds of &amp;quot;polarization&amp;quot;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Link to &lt;a href=&quot;https://arxiv.org/abs/2508.03385&quot;&gt;paper by Larooij &amp;amp; Törnberg&lt;/a&gt;, &lt;a href=&quot;https://arstechnica.com/science/2025/08/study-social-media-probably-cant-be-fixed/&quot;&gt;press article&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;gradual&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;🙃 Gradual Disempowerment: how even “dumb” AI could take over humanity&lt;/h3&gt;
&lt;p&gt;There are two main tribes in the AI Risk world: 1) those who believe the main threat is that &amp;quot;super-intelligent&amp;quot; AI goes rogue &amp;amp; takes over humanity, and 2) those who believe the main threat is that &amp;quot;dumb&amp;quot; AI amplifies economic inequality and digital authoritarianism.&lt;/p&gt;
&lt;p&gt;The recent Gradual Disempowerment paper asks: &lt;em&gt;why not both?&lt;/em&gt;  &lt;strong&gt;&amp;quot;Dumb&amp;quot; AI could take over humanity &lt;em&gt;through&lt;/em&gt; our normal cultural, economic, and political incentives.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here&#39;s how AI slop could disempower us in Culture, Economics, and States:&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Culture:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/spaghetti.jpg&quot; alt=&quot;Thumbnails of AI-generated videos of Will Smith eating spaghetti (huge progress from 2023 to 2024)&quot; title=&quot;Thumbnails of AI-generated videos of Will Smith eating spaghetti (huge progress from 2023 to 2024)&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href=&quot;https://www.youtube.com/watch?v=vbWe5k4fFWE&quot;&gt;AI will smith eating spaghetti&lt;/a&gt; in 2023 vs 2024. jfc &lt;a href=&quot;https://www.youtube.com/watch?v=bXKkZh2UEEA&quot;&gt;2025&lt;/a&gt; is scarily realistic)&lt;/p&gt;
&lt;p&gt;This is the one we&#39;re all, unfortunately, most familiar with.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;First, we made chronological internet feeds.&lt;/li&gt;
&lt;li&gt;Then, when there was too much content, we invented Newgrounds-style &amp;amp; Reddit-style voting to find the gems.&lt;/li&gt;
&lt;li&gt;Then, to counter spam (&amp;amp; to make investors happy), we put opaque machine-learning algorithms in charge of boosting content. Importantly: these algorithms are &lt;em&gt;grown, not designed.&lt;/em&gt; To over-emphasize: &lt;em&gt;nobody understands how these algorithms really work,&lt;/em&gt; not even the engineers or the algorithm itself.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, human autonomy is now &lt;em&gt;near-fully removed&lt;/em&gt; from the &amp;quot;consumer&amp;quot; side of modern culture: our media diet is filtered through an algorithm that &lt;em&gt;literally nobody understands&lt;/em&gt;. We&#39;re &lt;em&gt;already&lt;/em&gt; disempowered here.&lt;/p&gt;
&lt;p&gt;But wait, it gets worse! To stay in the game, creators are forced to re-design their content &lt;em&gt;for the algorithm&lt;/em&gt;. The &lt;em&gt;algorithm itself&lt;/em&gt; is the primary audience now. Some of it&#39;s not too bad: tacky clickbait thumbnails, a supercut in the first 30 seconds. But some of it&#39;s pretty bad: outrage-entrepreneurship, straight-up lies &amp;amp; (engaging) bullshit.&lt;/p&gt;
&lt;p&gt;But wait, it gets &lt;em&gt;even worse!&lt;/em&gt; As the space gets more competitive, creators will be pressured to make more at lower cost. Well, what&#39;s the harm in a few AI-generated images? Or AI video clips? C&#39;mon, I &lt;em&gt;have&lt;/em&gt; to let an AI pick the best title &amp;amp; thumbnail. And maybe AI voiceover? Sure, an AI&#39;s writing the script, but I&#39;m still generating the idea &amp;amp; outline! Okay, just the idea. Okay, AI can handle that too.&lt;/p&gt;
&lt;p&gt;OpenAI &amp;amp; Meta (Facebook) recently announced &lt;a href=&quot;https://www.vox.com/technology/464097/meta-openai-sora-slop-ai&quot;&gt;their own AI-video versions of TikTok&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To recap: we&#39;re already disempowered on the &amp;quot;consumer&amp;quot; side of culture. Soon, we may be disempowered on the &amp;quot;creator&amp;quot; side of culture. &lt;strong&gt;We&#39;ll have AIs making content for AIs, humans in the backseat.&lt;/strong&gt; We won&#39;t even remember there used to be a steering wheel.&lt;/p&gt;
&lt;p&gt;(Oh and &lt;em&gt;then&lt;/em&gt; there&#39;s the AI Companions. &lt;a href=&quot;https://globaldialogues.ai/updates/global-dialogues-4-human-ai-relationships&quot;&gt;Source for all the following stats&lt;/a&gt;: ~40% of people use AI for emotional support at least once a week.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt; ~17% accept AI-Human romance, and ~11% would &lt;em&gt;personally consider&lt;/em&gt; dating an AI. To be clear, I&#39;m no prude: &lt;em&gt;I&lt;/em&gt; use Claude as an &amp;quot;AI Life Coach&amp;quot; on a near-daily basis, and I&#39;ve done months-long romantic roleplay with AI characters.)&lt;/p&gt;
&lt;p&gt;(But, consider the slippery slope: AI Companions can be more attentive and less demanding than a human could ever be, so you&#39;ll slowly drift towards AI friends &amp;amp; lovers like you slowly drift towards 1, 2, 3+ hours on Discord and Instagram. Then, your human-interaction skills atrophy, so you tend more towards AI Companions, so your human-skills atrophy more, repeat.)&lt;/p&gt;
&lt;p&gt;(Conclusion: not only will &lt;em&gt;mass media culture&lt;/em&gt; be non-human, even &lt;em&gt;person-to-person culture&lt;/em&gt; will be non-human.)&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Economics:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I&#39;m a programmer. So let&#39;s start with how programming, as a career, could fall to Death By Slop:&lt;/p&gt;
&lt;p&gt;To stay competitive in the marketplace, we&#39;ll be incentivized to rely more and more on LLM coders. At first, senior developers do better than ever, thanks to these LLM coders doing all the grunt work! But junior developers can&#39;t get a job doing that grunt work anymore, which also means juniors can&#39;t get the experience to &lt;em&gt;become&lt;/em&gt; seniors. So, when the seniors die out or cognitively age out, there is no new guard to take over. And/or, the more that seniors depend on LLM coders, the more their &lt;em&gt;own&lt;/em&gt; skills atrophy.&lt;/p&gt;
&lt;p&gt;(&lt;a href=&quot;https://www.brainonllm.com/&quot;&gt;A recent preprint&lt;/a&gt; by MIT researchers found that, at least for essay-writing, LLM users&#39; skills &lt;em&gt;do&lt;/em&gt; atrophy: they &amp;quot;consistently underperformed at neural, linguistic, and behavioral levels&amp;quot;.)&lt;/p&gt;
&lt;p&gt;Either way, humanity loses the ability to even &lt;em&gt;check&lt;/em&gt; if the LLM-written code is safe, and not – say – making the bioprinting lab&#39;s password &amp;quot;hunter2&amp;quot;.&lt;/p&gt;
&lt;p&gt;Now consider Slop coming for management positions. I mean, it&#39;s mostly emails &amp;amp; Slacks &amp;amp; meetings anyway, right? A CEO would love to get rid of the middlemen, accumulate the extra money for themselves. Until the shareholders vote to get rid of the human CEO, a million-dollar money sink, and put GPT-CEO in charge (with some human stooge as CEO on paper, just for legal reasons). The shareholders are probably AI themselves at this point; stock-trading is &lt;em&gt;already&lt;/em&gt; almost entirely algorithmic. (Which may or may not have led to the glitch of &lt;a href=&quot;https://en.wikipedia.org/wiki/2010_flash_crash&quot;&gt;the 2010 Flash Crash&lt;/a&gt; — as AIs get put in more positions of economic power, &lt;em&gt;and the failure modes of AIs are correlated&lt;/em&gt;, such sudden all-at-once failures become more likely.)&lt;/p&gt;
&lt;p&gt;&lt;em&gt;But, at least the consumer can vote with their dollar?&lt;/em&gt; Do you know how Amazon makes most of their profits? It&#39;s not the marketplace, or the books, or the streaming service. It&#39;s their cloud compute.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt; Yes, the big tech company makes most of its profits &lt;em&gt;selling to other&lt;/em&gt; big tech companies. And the &lt;em&gt;biggest&lt;/em&gt; company in the world right now? NVIDIA, the chip manufacturer, got to the top of the world by &lt;em&gt;selling to other tech companies&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;If dollars are votes, humans &lt;em&gt;already&lt;/em&gt; lack majority vote for the world&#39;s top companies.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/bloomberg.webp&quot; alt=&quot;no alt text set, shame on me&quot; title=&quot;no alt text set, shame on me&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href=&quot;https://www.bloomberg.com/news/features/2025-10-07/openai-s-nvidia-amd-deals-boost-1-trillion-ai-boom-with-circular-deals&quot;&gt;image source&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Imagine a world where AI-run companies buy from &amp;amp; sell to AI-run companies. Almost all human-run shops and products get immediately outcompeted. Voting with your dollar means nothing, if you even &lt;em&gt;have&lt;/em&gt; dollars to vote with, after losing your job years ago. Maybe there&#39;ll be a Universal Basic Income? I&#39;d hope so – but what incentive do the business leaders &amp;amp; politicians have to keep their promise to implement UBI, when we&#39;re &lt;em&gt;already&lt;/em&gt; disempowered?&lt;/p&gt;
&lt;p&gt;In sum: Labour gets automated away. Management gets automated away. Capital is almost-entirely owned by companies, fictional legal persons, which no longer have real persons at the wheel.&lt;/p&gt;
&lt;p&gt;That&#39;s how we get disempowered. Not a coup, &amp;quot;just&amp;quot; dumb AI + dumb incentives.&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;States:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;C&#39;mon, you already knew politicians don&#39;t write their own speeches. And soon, maybe not even &lt;em&gt;any human&lt;/em&gt; writes those speeches. An AI, with access to the heartbeat of the data of the swing voters, could give a politician superhuman ability to win the polls. And if they refuse? Well, they&#39;ll be outvoted by someone else who takes the AI boost. Why stop at speeches? Why not have AI optimize your platform, campaign promises, specific policies, &lt;em&gt;law?&lt;/em&gt; Why not have an AI run the entire government through teleprompted meat-puppets in suits?&lt;/p&gt;
&lt;p&gt;(Assuming voting decisions are even human at this point; an AI-mediated Culture will show voters what the Algorithm &lt;em&gt;wants&lt;/em&gt; the voters to see, and they&#39;ll vote accordingly. &lt;em&gt;But the human spirit will rebel if democracy&#39;s on the line&lt;/em&gt;, you think? Sure, an AI could predict which humans are rebellious, then give them the opposite content to reverse-psychology them into the correct actions.)&lt;/p&gt;
&lt;p&gt;As for law &lt;em&gt;enforcement&lt;/em&gt;, we don&#39;t have enough police to patrol all the streets, let alone the Wild West of the internet. So, let&#39;s put 24/7 cameras everywhere, with AIs to monitor &amp;amp; report crime. The officers the AI dispatches &lt;em&gt;are&lt;/em&gt; human... until we can manufacture Robo-Cops who never sleep, never lose their cool, and – importantly – never unionize or complain about lack of pay. This same law-enforcement AI could track people online for acting or thinking suspiciously. And if you don&#39;t use the internet at all? Why, that&#39;s the most suspicious action of all.&lt;/p&gt;
&lt;p&gt;Actually, I guess we can just use self-piloted drones with a gun attached as a Robo-Cop. And war robot. And neighbouring-country-conquest robot.&lt;/p&gt;
&lt;p&gt;Oh btw, people &lt;em&gt;already&lt;/em&gt; trust AI chatbots more than their elected representatives, civil servants, and even faith/community leaders. (&lt;a href=&quot;https://blog.cip.org/p/when-ai-acts-for-you-or-as-you&quot;&gt;source&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/Pasted%20image%2020251029164330.png&quot; alt=&quot;Chart of people&#39;s trust in various entities. Family doctors rank #1, AI chatbots #2, government &amp;amp; faith leaders below that.&quot; title=&quot;Chart of people&#39;s trust in various entities. Family doctors rank #1, AI chatbots #2, government &amp;amp; faith leaders below that.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In &lt;em&gt;EVERY&lt;/em&gt; continent, more people Agree than Disagree that AI could make better decisions than their government representatives: (though &amp;quot;Unsure&amp;quot; is a large portion) (&lt;a href=&quot;https://blog.cip.org/p/when-ai-acts-for-you-or-as-you&quot;&gt;source&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/Pasted%20image%2020251029164718.png&quot; alt=&quot;Chart showing trust in AI &amp;gt; government for every continent&quot; title=&quot;Chart showing trust in AI &amp;gt; government for every continent&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(Maybe Antarctica is the exception)&lt;/p&gt;
&lt;p&gt;Point is: worldwide, more people than not &lt;em&gt;already would&lt;/em&gt; trust AI with governance over the current human bastards. To be fair, I can&#39;t blame &#39;em. Our leaders suck. To quote &lt;a href=&quot;https://www.youtube.com/watch?v=fwzbIUffcR4&quot;&gt;the first song&lt;/a&gt; from OK Go&#39;s first new album in over 10 years:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;🎵 Still, no stochastic parrot has yet called&lt;br /&gt;
🎵 On his nation to knock back bleach&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;Now, although the slope is slippery, I &lt;em&gt;do&lt;/em&gt; like my LLM coding assistant, and I&#39;m sympathetic to folks with AI friends and AI romances.&lt;/p&gt;
&lt;p&gt;Still, the Gradual Disempowerment paper made a powerful case for a new possible type of AI takeover — not by a super-intelligent AI seeking power, but by humans being lazy &amp;amp; greedy, giving away more and more of our autonomy to AIs. And if you try to opt out of the race, you just get trampled.&lt;/p&gt;
&lt;p&gt;The real point of the paper: AI alignment isn&#39;t enough, we need &lt;em&gt;human&lt;/em&gt; alignment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Link to &lt;a href=&quot;https://gradual-disempowerment.ai/&quot;&gt;Gradual Disempowerment&lt;/a&gt;, &lt;a href=&quot;https://arxiv.org/abs/2501.16946&quot;&gt;full paper on arXiv&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;My one critique of the paper is that it doesn&#39;t even &lt;em&gt;try&lt;/em&gt; to hint at solutions. Way to leave a girl high &amp;amp; dry, y&#39;all. But personally, I think the &lt;a href=&quot;https://vitalik.eth.limo/general/2025/01/05/dacc2.html&quot;&gt;d/acc&lt;/a&gt; and &lt;a href=&quot;https://vitalik.eth.limo/general/2024/08/21/plurality.html&quot;&gt;Plurality&lt;/a&gt; approaches are roughly correct: we need &amp;quot;human alignment&amp;quot; approaches that &lt;em&gt;scale with&lt;/em&gt; improving tech, instead of becoming obsolete. If all that sounds vague as hell, that&#39;s what the next 4 sections are for, to give you 4 concrete examples of how tech can work &lt;em&gt;with, not against&lt;/em&gt; human autonomy:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;🤝 Reverse political polarization by reversing the algorithm &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#bridging&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🧐 Cryptography that lets you prove yourself without doxxing yourself &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#zkp&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🧠 Cyborgism: AI that &lt;em&gt;enhances&lt;/em&gt; us, not replaces us &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#cyborg&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🍻 The 6Pack of Digital Democracy &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#6pack&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&#39;s dive in:&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;bridging&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;🤝 BellKor &amp;amp; Birdwatch &amp;amp; Bridging: reverse political polarization by reversing the algorithm&lt;/h3&gt;
&lt;p&gt;Fun story: &lt;a href=&quot;https://en.wikipedia.org/wiki/Netflix_Prize&quot;&gt;in 2006, Netflix launched a million-dollar prize&lt;/a&gt; for a recommendation algorithm that could beat their own by 10%. In 2009, there was a winner! Netflix paid out the million dollars! Then they didn&#39;t use that algorithm &amp;amp; just wrote their own lol&lt;/p&gt;
&lt;p&gt;Okay &lt;a href=&quot;https://www.techdirt.com/2012/04/13/why-netflix-never-implemented-algorithm-that-won-netflix-1-million-challenge/&quot;&gt;that&#39;s not the full story&lt;/a&gt;. The winning &amp;quot;algorithm&amp;quot;, BellKor&#39;s Pragmatic Chaos, was actually a collection of 107 different algorithms. Almost all of these algorithms only added an extra ~0.1% accuracy to the final collection. But, &lt;em&gt;one&lt;/em&gt; of these algorithms, called &lt;strong&gt;&lt;a href=&quot;https://datajobs.com/data-science-repo/Recommender-Systems-%5BNetflix%5D.pdf&quot;&gt;Matrix Factorization&lt;/a&gt;&lt;/strong&gt;, was responsible for almost all of the accuracy boost! &lt;em&gt;That&lt;/em&gt; was (part of) what Netflix kept in their new algorithm, and even to this day, it&#39;s the core of most recommendation systems online.&lt;/p&gt;
&lt;p&gt;But how does Matrix Factorization work? Well, it&#39;s &amp;quot;elegant&amp;quot; in that &lt;strong&gt;it&#39;s only 2 lines of math&lt;/strong&gt;, but because academic writing &amp;amp; math notation suck, it still took me 30 minutes with Claude&#39;s help to understand.&lt;/p&gt;
&lt;p&gt;Anyway, here&#39;s my attempt at explaining the algorithm:&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 1) Predict each user&#39;s rating of an item, as the sum of four things:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;a) &lt;strong&gt;How much a user&#39;s preferences align with this item&#39;s features&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Example: If the algorithm knows I love horror, and it knows Movie X is horror, my preferences align perfectly with the item&#39;s features, so alignment = 1. (If I hate horror, alignment = -1, if I&#39;m indifferent, alignment = 0.) Now, here&#39;s the neat part: &lt;em&gt;you do not need to hard-code the preferences/features!&lt;/em&gt;  The algorithm learns by itself which factors best predict ratings. (So instead of Horror, the algorithm just &amp;quot;thinks&amp;quot; of it as Factor #42 or something.)&lt;/p&gt;
&lt;p&gt;b) &lt;strong&gt;The item&#39;s &amp;quot;bias&amp;quot;&lt;/strong&gt;: how well-rated an item is, &lt;em&gt;independent of how much it aligns with user preferences.&lt;/em&gt; You can &lt;em&gt;roughly&lt;/em&gt; think of this as an item&#39;s &amp;quot;quality&amp;quot;.&lt;/p&gt;
&lt;p&gt;Example: Users like me prefer supernatural slasher horror-comedies, and the movies &lt;em&gt;Final Destination 4&lt;/em&gt; and &lt;em&gt;Final Destination: Bloodlines&lt;/em&gt; both align fully with our preferences. However, &lt;em&gt;Bloodlines&lt;/em&gt; is rated higher than &lt;em&gt;4&lt;/em&gt; whether or not one likes supernatural slasher horror-comedies, because it&#39;s just a higher-quality film.&lt;/p&gt;
&lt;p&gt;c) &lt;strong&gt;The user&#39;s &amp;quot;bias&amp;quot;&lt;/strong&gt;. How friendly/critical this user is in their ratings, i.e. when they rate 5-out-of-10, is that &amp;quot;average&amp;quot; or &amp;quot;really bad&amp;quot;?&lt;/p&gt;
&lt;p&gt;d) &lt;strong&gt;The global &amp;quot;bias&amp;quot;.&lt;/strong&gt; How friendly/critical are users in general.&lt;/p&gt;
&lt;p&gt;To recap:&lt;/p&gt;
&lt;p&gt;Predicted rating&lt;br /&gt;
= Global bias&lt;br /&gt;
+ User bias&lt;br /&gt;
+ Item bias&lt;br /&gt;
+ User preferences aligning with item features&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/formula_1.png&quot; alt=&quot;The official formula, annotated explanation&quot; title=&quot;The official formula, annotated explanation&quot; /&gt;&lt;/p&gt;
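&lt;p&gt;(If code speaks to you more than formulas: here&#39;s a tiny Python sketch of Step 1&#39;s sum-of-four-things, with numbers I made up for illustration. In the real system, &lt;em&gt;all&lt;/em&gt; of these values are learned from millions of ratings.)&lt;/p&gt;

```python
import numpy as np

# Toy sketch of the Step 1 prediction. Every number here is made up;
# the real system LEARNS all of these from millions of ratings.

mu = 3.6       # global bias: the average rating across everyone
b_user = 0.4   # user bias: this user rates a bit more kindly than average
b_item = 0.7   # item bias: this movie is liked regardless of taste

# Learned latent factors (say, 3 of them). Nobody labels them "horror";
# the model discovers whatever axes best predict ratings (Factor #42...).
p_user = np.array([0.9, -0.2, 0.1])   # this user's preferences
q_item = np.array([0.8, 0.0, 0.3])    # this item's features

# Predicted rating = global bias + user bias + item bias + preference match
predicted = mu + b_user + b_item + p_user @ q_item
print(round(predicted, 2))   # → 5.45
```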
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 2) To train the algorithm, minimize:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The gap (&lt;strong&gt;&amp;quot;error&amp;quot;&lt;/strong&gt;) between {predicted ratings} &amp;amp; {actual ratings}&lt;br /&gt;
+ The &lt;strong&gt;&amp;quot;complexity&amp;quot;&lt;/strong&gt; of your algorithm.  (to &amp;quot;keep it simple stupid&amp;quot;, Occam&#39;s Razor)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/formula_2.png&quot; alt=&quot;The official formula, annotated explanation&quot; title=&quot;The official formula, annotated explanation&quot; /&gt;&lt;/p&gt;
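&lt;p&gt;(And here&#39;s a from-scratch Python sketch of Step 2 on toy data, with invented hyperparameters: stochastic gradient descent nudges every bias &amp;amp; factor to shrink the prediction error, while a small penalty pulls the numbers toward zero to &amp;quot;keep it simple&amp;quot;. Not the Netflix winners&#39; actual code, just the idea.)&lt;/p&gt;

```python
import numpy as np

# From-scratch sketch of Step 2 on toy data (all hyperparameters invented).
# Train biases and factors by stochastic gradient descent, minimizing
# squared error plus a small "complexity" penalty pulling numbers to zero.

rng = np.random.default_rng(0)
n_users, n_items, n_factors = 4, 5, 2
ratings = [(0, 1, 5.0), (0, 3, 1.0), (1, 1, 4.0), (2, 0, 2.0), (3, 4, 3.0)]

mu = np.mean([r for _, _, r in ratings])       # global bias
b_u = np.zeros(n_users)                        # user biases
b_i = np.zeros(n_items)                        # item biases
P = rng.normal(0, 0.1, (n_users, n_factors))   # user preferences
Q = rng.normal(0, 0.1, (n_items, n_factors))   # item features

lr, lam = 0.05, 0.02   # learning rate, "complexity" penalty weight
for epoch in range(200):
    for u, i, r in ratings:
        err = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
        # Nudge each parameter to shrink the error, minus a pull toward zero:
        b_u[u] += lr * (err - lam * b_u[u])
        b_i[i] += lr * (err - lam * b_i[i])
        P_new = P[u] + lr * (err * Q[i] - lam * P[u])
        Q_new = Q[i] + lr * (err * P[u] - lam * Q[i])
        P[u], Q[i] = P_new, Q_new
```

&lt;p&gt;(After training, the predicted ratings land close to the actual ones, and the penalty term is what keeps Occam happy.)&lt;/p&gt;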
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 3) To find recommendations, maximize:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The same equation as in Step 1!&lt;/p&gt;
&lt;p&gt;Predicting how you&#39;d rate this item you&#39;ve not seen before&lt;br /&gt;
= Global bias&lt;br /&gt;
+ &lt;strong&gt;Your&lt;/strong&gt; bias&lt;br /&gt;
+ Item&#39;s bias&lt;br /&gt;
+ How much &lt;strong&gt;your&lt;/strong&gt; preferences align with this item&#39;s features&lt;/p&gt;
&lt;p&gt;Million-dollar prize, two lines of math, that&#39;s $500,000 per equation!&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;In hindsight, there was a possible downside.&lt;/p&gt;
&lt;p&gt;By giving people &lt;em&gt;what they&#39;re already into&lt;/em&gt;, you nudge them into staying the same, avoiding exploration &amp;amp; growth. For movies &amp;amp; TV shows, this isn&#39;t &lt;em&gt;too&lt;/em&gt; bad: Netflix predicts I like horror, so it gives me more horror, so I become even more into horror, repeat forever.&lt;/p&gt;
&lt;p&gt;But when this algorithm gets applied to social media, &lt;em&gt;specifically politics &amp;amp; news&lt;/em&gt;, it promotes polarization and reduces cross-tribe win-win understanding. YouTube predicts I like left-libertarian content, so it gives me more pro-left-libertarian content, so I become even more left-libertarian, repeat forever. I &lt;em&gt;can&lt;/em&gt; seek out social-conservative and Marxist-leftist and Yarvin-autocrat stuff, (and I do for &lt;s&gt;masochism&lt;/s&gt; research), but the algorithms put up friction for that, while &amp;quot;see what I&#39;ll already agree with&amp;quot; is the WD-40 easy-glide default.&lt;/p&gt;
&lt;p&gt;(Counter-argument: see above paper &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#we-suck&quot;&gt;↪&lt;/a&gt;, maybe it&#39;s not the algorithms&#39; fault, we just suck.)&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;A few programmers at Twitter (back when it was called Twitter) worried about this, too.&lt;/p&gt;
&lt;p&gt;They worried that this million-dollar algorithm, which worked for getting niche movies to niche audiences, could fracture democracy into an infinite fractal of niche echo chambers within echo chambers. Meanwhile, news sites tried &amp;quot;factcheckers&amp;quot;, but people hated &amp;amp; distrusted them. After all: who watches the watchers? Who factchecks the factcheckers?&lt;/p&gt;
&lt;p&gt;So the Twitter programmers wondered — could they make a &amp;quot;fact-checking&amp;quot; service &lt;em&gt;run by the public themselves&lt;/em&gt;, by &lt;em&gt;using&lt;/em&gt; an edit of that million-dollar algorithm, for good?&lt;/p&gt;
&lt;p&gt;They did, &lt;a href=&quot;https://asteriskmag.com/issues/08/the-making-of-community-notes&quot;&gt;they called it &amp;quot;Birdwatch&amp;quot;&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;, and here&#39;s how it worked:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Let anyone submit &amp;quot;notes&amp;quot; to factcheck viral tweets.&lt;/li&gt;
&lt;li&gt;People rate how helpful those notes are.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You keep the algorithm&#39;s Step 1 &amp;amp; Step 2 the same&lt;/strong&gt;, so it can learn what people&#39;s preferences &amp;amp; notes&#39; features are.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The user/item factors:&lt;/strong&gt; Again, we do &lt;em&gt;not&lt;/em&gt; need to hard-code the factors. But in practice, the algorithm learns that the #1 factor that most predicts people&#39;s ratings, is the left-right political spectrum. Adding extra factors like &amp;quot;authoritarian-libertarian&amp;quot; doesn&#39;t improve prediction much.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt; (Which is surprising given past research shows the general public&#39;s politics is at &lt;em&gt;least&lt;/em&gt; 2D.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt; Maybe the Birdwatch community is just weird.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The item&#39;s bias:&lt;/strong&gt; Instead of being &amp;quot;movie quality&amp;quot;, it&#39;s now &amp;quot;note quality&amp;quot;, &lt;em&gt;independent&lt;/em&gt; of how much it aligns with users&#39; politics. IMPORTANTLY: note quality is &lt;em&gt;not&lt;/em&gt; just a note&#39;s average helpfulness-rating. &lt;em&gt;Average rating&lt;/em&gt; will be skewed by the &lt;em&gt;average&lt;/em&gt; Birdwatch-rater&#39;s political preferences. This setup gets us a note&#39;s quality &lt;em&gt;regardless&lt;/em&gt; of politics.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The key difference is WE REVERSE STEP 3&lt;/strong&gt;: Instead of taking user preferences (ie politics) into account, we highlight the best notes &lt;em&gt;NOT&lt;/em&gt; taking user preferences into account! That means: this highlights notes that &lt;em&gt;people across the political spectrum&lt;/em&gt; agree are helpful. Common ground, by algorithmic design!&lt;/li&gt;
&lt;/ul&gt;
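&lt;p&gt;(In code, the &amp;quot;reversed Step 3&amp;quot; is almost embarrassingly small. Here&#39;s a sketch with invented note scores, &lt;em&gt;not&lt;/em&gt; Birdwatch&#39;s real implementation: drop the politics-alignment term, and rank notes by the quality term alone.)&lt;/p&gt;

```python
# Sketch of "reverse Step 3", with toy learned parameters for 4 notes.
# factor: how politically left/right-coded the note is (the learned axis)
# bias:   how helpful the note is REGARDLESS of politics
notes = {
    "partisan-left note":  {"factor": -0.9, "bias": 0.1},
    "partisan-right note": {"factor":  0.8, "bias": 0.1},
    "elevator fart":       {"factor":  0.0, "bias": -0.7},
    "specific, verifiable factcheck": {"factor": 0.1, "bias": 0.9},
}

def personalized_score(user_factor, note):
    # Ordinary Step 3: alignment with the user's politics counts.
    return note["bias"] + user_factor * note["factor"]

def bridging_score(note):
    # Reversed Step 3: drop the politics-alignment term entirely,
    # so only cross-spectrum helpfulness remains.
    return note["bias"]

best_for_leftist = max(notes, key=lambda n: personalized_score(-1.0, notes[n]))
best_bridging    = max(notes, key=lambda n: bridging_score(notes[n]))
print(best_for_leftist)   # the partisan-left note wins a personalized feed
print(best_bridging)      # the verifiable factcheck wins the bridging feed
```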
&lt;p&gt;Here&#39;s an example of a factcheck note, that&#39;s rated as helpful by people across the political spectrum, because it&#39;s very specific &amp;amp; easily verifiable:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/Pasted%20image%2020251030152607.png&quot; alt=&quot;Original poster posts a photo of a car on fire, claiming it&#39;s outside the Supreme Court in Washington DC. Below it, Birdwatch selected a factcheck: No, that&#39;s a photo from a 2010 protest in Toronto, here&#39;s the video the screenshot&#39;s from: (link)&quot; title=&quot;Original poster posts a photo of a car on fire, claiming it&#39;s outside the Supreme Court in Washington DC. Below it, Birdwatch selected a factcheck: &amp;quot;No, that&#39;s a photo from a 2010 protest in Toronto, here&#39;s the video the screenshot&#39;s from: (link)&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(The Birdwatch creators made sure to never call the notes &amp;quot;factchecks&amp;quot;, and instead said &amp;quot;Readers added context you may want to know&amp;quot;... but c&#39;mon, they&#39;re factchecks.)&lt;/p&gt;
&lt;p&gt;Here&#39;s a graph of all the notes from their pilot program. Each dot is a note. X-axis is the &amp;quot;item factor&amp;quot; ~= &amp;quot;how politically left-right coded the note is&amp;quot;, and the Y-axis is the &amp;quot;item bias&amp;quot; ~= &amp;quot;how helpful the note is &lt;em&gt;regardless&lt;/em&gt; of politics&amp;quot;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/birdwatch.png&quot; alt=&quot;Figure 2 from the Birdwatch paper, explanation below&quot; title=&quot;Figure 2 from the Birdwatch paper, explanation below&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As you can see in the dense yellow circles, most people submit partisan slop. But at the top, there&#39;s the few gems folks across the spectrum agree are helpful! (And at the bottom... well, farting in an elevator &amp;quot;pisses off both sides&amp;quot;. &amp;quot;Pissing off both sides&amp;quot; doesn&#39;t mean you&#39;re &lt;em&gt;useful&lt;/em&gt;.) As for the diamond shape, don&#39;t worry about it, it&#39;s an artifact of how the &amp;quot;minimize complexity&amp;quot; math works in Step 2.&lt;/p&gt;
&lt;p&gt;There&#39;s a few extra complexities, but that&#39;s the heart of the Birdwatch algorithm! The result? It&#39;s, as far as I know, the &lt;em&gt;only&lt;/em&gt; fact-checking service that gets net-positive ratings (more &amp;quot;helpful&amp;quot; than &amp;quot;unhelpful&amp;quot;) from Democrats, Independents, and Republicans alike! In this polarized era, that&#39;s no mean feat.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/birdwatch-2.png&quot; alt=&quot;Figure 4 from the paper, showing how every party rates Birdwatch as net-helpful.&quot; title=&quot;Figure 4 from the paper, showing how every party rates Birdwatch as net-helpful.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(Wait, if the notes are chosen to be helpful &lt;em&gt;regardless of politics&lt;/em&gt;, why do Democrats like them more than Republicans, even while Republicans still find them net-helpful? I&#39;m not sure, but I&#39;d guess the left-right axis &lt;em&gt;among Birdwatch raters&lt;/em&gt; is slightly different from the left-right axis &lt;em&gt;among party-registered voters&lt;/em&gt;. Remember, the algorithm &lt;em&gt;learns&lt;/em&gt; what the left-right axis is, it&#39;s not hard-coded; and it shouldn&#39;t be, since what &amp;quot;left-right&amp;quot; means changes across time &amp;amp; cultures.)&lt;/p&gt;
&lt;p&gt;Unfortunately, this &amp;quot;bridging&amp;quot; algorithm was only used for the factcheck-notes, &lt;em&gt;not&lt;/em&gt; Twitter&#39;s algorithmic feed. As mentioned in the paper two sections ago, &lt;strong&gt;a bridging-based algorithm was the &lt;em&gt;only&lt;/em&gt; design choice we know of (so far) that reverses polarization &amp;amp; extremism.&lt;/strong&gt; So, implementing bridging &lt;em&gt;for the main feeds themselves&lt;/em&gt; could be a big win for digital democracy. And, good news? I heard that 𝕏 may adopt bridging for their main feed, and that other platforms may adopt bridging-based algorithms, &lt;em&gt;at least&lt;/em&gt; for similar factchecking systems.&lt;/p&gt;
&lt;p&gt;...my &lt;em&gt;source&lt;/em&gt; on this info? I dunno, I heard it somewhere. Why are you factchecking me?&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;(Aside: Birdwatch inspires me – what other small alterations can we make to the million-dollar Matrix Factorization algorithm? What if you recommended high-quality content that &lt;em&gt;anti-aligns&lt;/em&gt; with your usual preferences? Or recommend content that aligns with &lt;em&gt;all but one&lt;/em&gt; of your preferences? e.g. I like animation &amp;amp; musicals &amp;amp; monsters &amp;amp; queer found-family coming-out allegories &amp;amp; muscular women, but I mildly dislike K-Pop =&amp;gt; the algorithm recommends K-Pop Demon Hunters =&amp;gt; I now mildly like K-Pop.)&lt;/p&gt;
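&lt;p&gt;(A sketch of those two remix ideas, with a made-up preference vector, in case you want to tinker:)&lt;/p&gt;

```python
import numpy as np

# Toy sketch of the two remixes, with an invented preference vector over:
# (animation, musicals, monsters, found-family, K-Pop); K-Pop mildly disliked.
likes = np.array([1.0, 1.0, 1.0, 1.0, -0.3])

def standard(item):
    # Usual Step 3: full preference-feature alignment.
    return likes @ item

def anti_align(item, quality):
    # Remix 1: high-quality content that OPPOSES your usual tastes.
    return quality - likes @ item

def all_but_one(item):
    # Remix 2: alignment after forgiving your single worst-matched preference.
    per_feature = likes * item
    return per_feature.sum() - per_feature.min()

kpop_demon_hunters = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
print(round(standard(kpop_demon_hunters), 1))     # → 3.7
print(round(all_but_one(kpop_demon_hunters), 1))  # → 4.0
```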
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://datajobs.com/data-science-repo/Recommender-Systems-%5BNetflix%5D.pdf&quot;&gt;The winners of the Netflix Prize explain Matrix Factorization&lt;/a&gt; (technical)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://asteriskmag.com/issues/08/the-making-of-community-notes&quot;&gt;An interview with the creators of Birdwatch&lt;/a&gt; (lay-friendly)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://arxiv.org/pdf/2210.15723&quot;&gt;The original Birdwatch paper&lt;/a&gt; (technical)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a id=&quot;zkp&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;🧐 PolyLog&#39;s explanation of Zero-Knowledge Proofs&lt;/h3&gt;
&lt;p&gt;FINALLY. Thanks to PolyLog&#39;s video, I &lt;em&gt;finally&lt;/em&gt; understand one of the coolest recent discoveries in computer science: that you can prove you have a solution to a problem, &lt;em&gt;without revealing any info whatsoever about your solution.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Here&#39;s &lt;a href=&quot;https://www.youtube.com/watch?v=Otvcbw6k4eo&quot;&gt;their 20-min video&lt;/a&gt; on Zero-Knowledge Proofs (ZKPs) ⤵&lt;/strong&gt;  I won&#39;t try to out-do their explanation in this blog post, you can just watch it.  They show you how you can prove you &lt;em&gt;have&lt;/em&gt; a valid solution to a Sudoku puzzle, &lt;em&gt;without revealing ANY info about your solution&lt;/em&gt;. (And this method works &lt;em&gt;in general&lt;/em&gt; for any mathematical/computable proof!)&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/Otvcbw6k4eo?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;(&lt;strong&gt;&lt;a href=&quot;https://www.youtube.com/@PolylogCS&quot;&gt;check out PolyLog&#39;s channel&lt;/a&gt;&lt;/strong&gt;, their other videos are pretty good, too!)&lt;/p&gt;
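&lt;p&gt;(To give a taste of the trick, here&#39;s a toy Python round of the Sudoku protocol from the video. This illustrates the &lt;em&gt;idea&lt;/em&gt; only, it is absolutely not production crypto: the prover relabels the digits 1 to 9 with a fresh secret shuffle, commits to every cell with a hash, then the verifier opens one random row and checks it contains nine distinct symbols. One round leaks nothing, because the shuffle is fresh each time; repeating many rounds makes cheating hopeless.)&lt;/p&gt;

```python
import hashlib
import random
import secrets

# Toy round of the Sudoku protocol. Illustration only: real ZKPs use proper
# commitment schemes, and the full protocol also challenges columns, boxes,
# and the relabeled given-clues, not just rows.

SOLUTION = [
    [5, 3, 4, 6, 7, 8, 9, 1, 2],
    [6, 7, 2, 1, 9, 5, 3, 4, 8],
    [1, 9, 8, 3, 4, 2, 5, 6, 7],
    [8, 5, 9, 7, 6, 1, 4, 2, 3],
    [4, 2, 6, 8, 5, 3, 7, 9, 1],
    [7, 1, 3, 9, 2, 4, 8, 5, 6],
    [9, 6, 1, 5, 3, 7, 2, 8, 4],
    [2, 8, 7, 4, 1, 9, 6, 3, 5],
    [3, 4, 5, 2, 8, 6, 9, 1, 7],
]

def commit(value):
    """Hash commitment: binding (can't change value), hiding (nonce blinds it)."""
    nonce = secrets.token_bytes(16)
    return hashlib.sha256(nonce + bytes([value])).hexdigest(), nonce

def prover_round(solution):
    # Fresh secret relabeling of digits 1..9, so each opening leaks nothing.
    perm = list(range(1, 10))
    random.shuffle(perm)
    relabeled = [[perm[v - 1] for v in row] for row in solution]
    commitments = [[commit(v) for v in row] for row in relabeled]
    board = [[digest for digest, _ in row] for row in commitments]  # public
    return board, (relabeled, commitments)                          # kept secret

def open_row(secret, r):
    relabeled, commitments = secret
    return [(relabeled[r][c], commitments[r][c][1]) for c in range(9)]

def verifier_check_row(board, r, opening):
    # Check each opened value matches its commitment, and the row is a
    # permutation of 1..9.
    ok = all(hashlib.sha256(nonce + bytes([v])).hexdigest() == board[r][c]
             for c, (v, nonce) in enumerate(opening))
    return ok and sorted(v for v, _ in opening) == list(range(1, 10))

board, secret = prover_round(SOLUTION)
```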
&lt;p&gt;But why am I boosting this video, in relation to &amp;quot;The Algorithm&amp;quot; and solutions to &amp;quot;Gradual Disempowerment&amp;quot;? Because: Zero-Knowledge Proofs are a big win for privacy in a digital democracy. How? Because they allow you to &lt;em&gt;prove yourself&lt;/em&gt;, &lt;em&gt;without doxxing yourself&lt;/em&gt;. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Do you want an online petition or discussion board to verify that its users are actually residents of some area, or affiliated with some institution? BAM! ZKPs let you prove &amp;quot;yes I&#39;m a resident of X&amp;quot; or &amp;quot;yes I&#39;m with institution Y&amp;quot; &lt;em&gt;while revealing no other info, like your name or ID.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Are you worried that governments &lt;a href=&quot;https://en.wikipedia.org/wiki/Online_Safety_Act_2023&quot;&gt;like the UK&lt;/a&gt; are forcing age authentication, under the guise of &amp;quot;protecting children&amp;quot;, but it&#39;s secretly the first step to creating a China-style tech-powered surveillance state while accusing all opponents of being enablers of pedophiles &amp;amp; terrorists? BAM! ZKPs let you prove &amp;quot;yes I&#39;m over 18&amp;quot; &lt;em&gt;without even revealing your exact age.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, &lt;a href=&quot;https://vitalik.eth.limo/general/2025/06/28/zkid.html&quot;&gt;ZKPs won&#39;t solve all privacy issues&lt;/a&gt;, and institutions can just &lt;em&gt;lie&lt;/em&gt; about using ZKPs — but that&#39;s more reason to make the core idea of ZKPs accessible, and not seem like incomprehensible math magic! Hence why I&#39;m so glad for PolyLog&#39;s lay-friendly explainer. ZKPs can and should be a core tool in our privacy toolbelt, for digital democracies.&lt;/p&gt;
&lt;p&gt;(Current brain status: I have a technical understanding of how ZKPs work &lt;em&gt;in general&lt;/em&gt;, and a rough idea of how ZKPs work &lt;em&gt;for authentication specifically&lt;/em&gt;... but I&#39;m still wrapping my head around the mathematical details: polynomial commitments, elliptic curves, homomorphic encryption, etc. Once I &lt;em&gt;really&lt;/em&gt; understand those, I may make a video explainer on all this. Maybe.)&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;cyborg&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;🧠 Geoffrey Litt&#39;s writings: AI should enhance us, not replace us&lt;/h3&gt;
&lt;p&gt;Problem 1: &lt;em&gt;Truly general&lt;/em&gt; AI could automate huge sectors of labour all at once, not &amp;quot;just&amp;quot; the piece-by-piece automation we&#39;ve dealt with in the past. And as history shows, &amp;quot;sudden mass unemployment&amp;quot; almost never ends well for a country.&lt;/p&gt;
&lt;p&gt;Problem 2: If we hand over control to AI &lt;em&gt;without&lt;/em&gt; fully solving the AI Value Alignment problem, we&#39;d be passing control of humanity over to entities that neither love us nor hate us; to them, we&#39;re just &lt;em&gt;numbers&lt;/em&gt; to make &lt;em&gt;go up&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;An idea to solve both problems at the same time: &lt;strong&gt;instead of making AI that replaces humans, let&#39;s make AI that &lt;em&gt;enhances&lt;/em&gt; humans?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/NickyCase0018.png&quot; alt=&quot;Drawing of a goofy bicycle, saying, A computer should be a bicycle for the mind.&quot; title=&quot;Drawing of a goofy bicycle, saying, &amp;quot;A computer should be a bicycle for the mind&amp;quot;.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(slide from &lt;a href=&quot;https://blog.ncase.me/the-creative-cyborg/&quot;&gt;my talk at the final XOXO&lt;/a&gt;, quote is &lt;a href=&quot;http://web.archive.org/web/20240902210345/https://miro.medium.com/v2/resize:fit:1400/format:webp/0%2ASgDLtymZpgcXEA7A&quot;&gt;from Steve Jobs&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This way, Human+AI combos can still be economically competitive in our Darwinian marketplace, yet keep humanity&#39;s values &amp;amp; autonomy at the centre of our tools. This general idea is called &lt;strong&gt;&lt;a href=&quot;https://www.lesswrong.com/posts/bxt7uCiHam4QXrQAA/cyborgism&quot;&gt;&amp;quot;cyborgism&amp;quot;&lt;/a&gt;&lt;/strong&gt; in the AI Alignment community. (I mean, it&#39;s just tool use, but &amp;quot;cyborg&amp;quot; and &amp;quot;mental prosthetics&amp;quot; sound cooler.)&lt;/p&gt;
&lt;p&gt;Wait – won&#39;t this increase inequality, with the richest getting the most AI-gains? Well, consider books: books &lt;em&gt;do&lt;/em&gt; enhance the reader, so by default books &lt;em&gt;would have&lt;/em&gt; increased inequality by enhancing those who can already afford the most books. But the solution wasn&#39;t to ban books, but to create free universal libraries! Likewise, to share the AI-enhancement gains, we should at least seriously consider &lt;em&gt;free, open-source, verifiable, publicly owned &amp;amp; evaluated&lt;/em&gt; AI tools, to help &lt;em&gt;everyone&lt;/em&gt; augment their own human autonomy and skills.&lt;/p&gt;
&lt;p&gt;(crucially: The AI tools should be &lt;em&gt;non-agentic&lt;/em&gt;, with no goal-seeking of their own. And we should remove risky capabilities like bioweapon knowledge from them. See: &lt;a href=&quot;https://vitalik.eth.limo/general/2025/01/05/dacc2.html&quot;&gt;d/acc&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;I&#39;ve been banging on about this idea of Cyborgism for almost 8 years now. (&lt;a href=&quot;https://jods.mitpress.mit.edu/pub/issue3-case/release/6&quot;&gt;My 2018 article in an MIT Journal&lt;/a&gt;, and &lt;a href=&quot;https://blog.ncase.me/the-creative-cyborg/&quot;&gt;my 2024 XOXO talk&lt;/a&gt;.) But it&#39;s only been an idea. What about an &lt;em&gt;actual implementation,&lt;/em&gt; at least a proof-of-concept?&lt;/p&gt;
&lt;p&gt;Here, I&#39;d like to highlight a couple articles (with &lt;em&gt;actual working prototypes&lt;/em&gt;) from programmer Geoffrey Litt!&lt;/p&gt;
&lt;p&gt;👉 &lt;strong&gt;&lt;a href=&quot;https://www.geoffreylitt.com/2025/07/27/enough-ai-copilots-we-need-ai-huds&quot;&gt;&amp;quot;Enough AI copilots! We need AI HUDs&amp;quot;&lt;/a&gt;&lt;/strong&gt; 👈&lt;/p&gt;
&lt;p&gt;Most LLM-based coding tools right now (GitHub Copilot, Claude Code, AMP Code, etc) all put the LLM in the role of an independent agent. But that&#39;s &lt;em&gt;not&lt;/em&gt; how most automation that improves knowledge-work actually works. Examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Spellcheck, mercifully, isn&#39;t Clippy heckling you with &amp;quot;HEY DID YOU MEAN TO WRITE &#39;SPELLCHECK&#39; NOT &#39;SPELCHECK&#39;???&amp;quot; Spellcheck just lets you &lt;em&gt;see&lt;/em&gt; something underlined in red, lists suggestions, and lets &lt;em&gt;you&lt;/em&gt; decide when &amp;amp; how to deal with it.&lt;/li&gt;
&lt;li&gt;Data visualization isn&#39;t Clippy yelling &amp;quot;HEY YOU&#39;RE LOSING WEIGHT AT 1 LB PER 10 DAYS&amp;quot;. Spreadsheet apps just let you &lt;em&gt;see&lt;/em&gt; your data as a line chart, then let &lt;em&gt;you&lt;/em&gt; decide how to interpret &amp;amp; act on it.&lt;/li&gt;
&lt;li&gt;Illustration apps don&#39;t have Clippy yelling &amp;quot;HEY THE CHARACTER&#39;S EYES ARE UNEVEN AND ALSO THE COLOUR VALUES ARE MUDDY&amp;quot;, the program just lets you &lt;em&gt;see&lt;/em&gt; asymmetry &amp;amp; values by toggling Flip Horizontal and Greyscale, then lets &lt;em&gt;you&lt;/em&gt; decide how to fix it, or leave it as an intentional artistic choice.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Geoffrey&#39;s analogy: these aren&#39;t &amp;quot;copilots&amp;quot;, they&#39;re Heads-Up Displays (HUDs), like Tony Stark&#39;s helmet – &lt;strong&gt;they &lt;em&gt;augment&lt;/em&gt; your senses, while keeping the autonomy in &lt;em&gt;your&lt;/em&gt; hands.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;So instead of making Clippy the Coder — which will put junior devs out of a job, and de-skill senior devs — how about we make AI-powered &amp;quot;HUDs&amp;quot;, to let you &amp;quot;just see&amp;quot; what a program does, then let &lt;em&gt;you&lt;/em&gt; decide what to do about it?&lt;/p&gt;
&lt;p&gt;Well, Geoff made a proof-of-concept for that. Here&#39;s what it looks like:&lt;/p&gt;
&lt;p&gt;&lt;video width=&quot;640&quot; controls=&quot;&quot; src=&quot;https://blog.ncase.me/content/stuff/2025-10/geoff-demo.mp4&quot; aria-label=&quot;no alt text set, shame on me&quot;&gt;&lt;/video&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;With the debugger, I have a HUD! I have new senses, I can see how my program runs. The HUD extends beyond the narrow task of fixing the bug. I can ambiently build up my own understanding, spotting new problems and opportunities.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Geoffrey explains it more in &lt;a href=&quot;https://www.geoffreylitt.com/2025/07/27/enough-ai-copilots-we-need-ai-huds&quot;&gt;his post&lt;/a&gt;. Point is: less Clippy!&lt;/p&gt;
&lt;p&gt;👉 &lt;strong&gt;&lt;a href=&quot;https://www.geoffreylitt.com/2023/03/25/llm-end-user-programming&quot;&gt;Malleable software in the age of LLMs&lt;/a&gt;&lt;/strong&gt; 👈&lt;/p&gt;
&lt;p&gt;For all the badly-written vibe code out there, I still have a soft spot for LLM coding, because it &lt;em&gt;could&lt;/em&gt; make a half-century-long dream finally come true: software that is &lt;em&gt;fully modifiable &amp;amp; customizable&lt;/em&gt; by a layperson user. (Fight for the users!&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn14&quot; id=&quot;fnref14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;)&lt;/p&gt;
&lt;p&gt;There have been varying degrees of success with this idea in the past: HyperCard, spreadsheets, and the &lt;em&gt;original&lt;/em&gt; design for the World Wide Web, which was meant to let anyone &lt;em&gt;write&lt;/em&gt; a website &lt;em&gt;as easily as reading one&lt;/em&gt;.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn15&quot; id=&quot;fnref15&quot;&gt;[15]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;But to this day, there&#39;s no way for an average layperson to modify their software the way they can modify a recipe. Users can&#39;t just say, &amp;quot;Huh. Binaural beats sounds interesting. Okay Clippy, modify my pomodoro app so that it plays a pure Beta wave for focus during 25-minute work-sprints, then a random song from my Bandcamp library during 5-minute breaks.&amp;quot;&lt;/p&gt;
&lt;p&gt;Wait, didn&#39;t I &lt;em&gt;just&lt;/em&gt; rail against Clippys? Do I contradict myself? Very well, I contradict myself. I contain &lt;s&gt;hypocrisies&lt;/s&gt; multitudes.&lt;/p&gt;
&lt;p&gt;Okay, I haven&#39;t figured out my contradictions yet. But both &amp;quot;More HUDs, less Clippy&amp;quot;, and &amp;quot;Clippy helps you truly &lt;em&gt;own&lt;/em&gt; your software&amp;quot;, point at the same principle: &lt;strong&gt;fight for the users.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tools should help us automate everything that gets in the way of our self-expression, &lt;em&gt;not automate away the self-expression itself&lt;/em&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Anyway, Geoffrey goes into more detail in &lt;a href=&quot;https://www.geoffreylitt.com/2023/03/25/llm-end-user-programming&quot;&gt;his post&lt;/a&gt;. And he&#39;s made this cool prototype of &amp;quot;end-user modify everything&amp;quot;, too:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A few years ago, I developed an end-user programming system called &lt;a href=&quot;https://www.geoffreylitt.com/wildcard/&quot;&gt;Wildcard&lt;/a&gt; which would let people customize any website through a spreadsheet interface. For example, in this short demo you can see a user sorting articles on Hacker News in a different order, and then adding read times to the articles in the page, all by manipulating a spreadsheet synced with the webpage.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;video width=&quot;640&quot; controls=&quot;&quot; src=&quot;https://blog.ncase.me/content/stuff/2025-10/wildcard.mp4&quot; aria-label=&quot;no alt text set, shame on me&quot;&gt;&lt;/video&gt;&lt;/p&gt;
&lt;p&gt;The problem is that &amp;quot;the user needs to be able to write small spreadsheet formulas to express computations. This is a lot easier than learning a full-fledged programming language, but it’s still a barrier to initial usage.&amp;quot;&lt;/p&gt;
&lt;p&gt;I don&#39;t know of a demo of it yet, but imagine there was an extension like GreaseMonkey or Stylus, except you &amp;quot;code&amp;quot; the JS or CSS you want by &lt;em&gt;writing natural language&lt;/em&gt; in your browser! &lt;strong&gt;To be clear this is a security nightmare &amp;amp; would be like handing a nuke to a toddler&lt;/strong&gt;, but, &amp;quot;look to where my finger is pointing, not the tip of my finger itself&amp;quot;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn16&quot; id=&quot;fnref16&quot;&gt;[16]&lt;/a&gt;&lt;/sup&gt; — I&#39;m trying to point to a future where &lt;em&gt;we fully own our tools, before they fully own us.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Fight for the users!&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;(Aside: What about AI to help augment our &lt;em&gt;emotional&lt;/em&gt; intelligence? Could just be as simple as an LLM-augmented diary that helps you recognize emotional patterns, debug cognitive distortions, and asks you helpful questions to help you figure it out for yourself. {In Soviet Russia, LLM prompts &lt;em&gt;you?&lt;/em&gt;}  That&#39;s what an ideal AI Friend – heck, even ideal human friends – should be: someone that makes you stronger and better &lt;em&gt;even when they&#39;re not around&lt;/em&gt;. Not someone who fosters dependence and personal-character atrophy.)&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Links &amp;amp; Related:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://www.geoffreylitt.com/#writing&quot;&gt;Geoffrey Litt&#39;s blog&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;janus &amp;amp; Nicholas Kees&#39;s &lt;a href=&quot;https://www.lesswrong.com/posts/bxt7uCiHam4QXrQAA/cyborgism&quot;&gt;Cyborgism manifesto&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Vitalik Buterin&#39;s &lt;a href=&quot;https://vitalik.eth.limo/general/2025/02/28/aihumans.html&quot;&gt;&amp;quot;AI as the engine, humans as the steering wheel&amp;quot;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.lesswrong.com/posts/Gi8NP9CMwJMMSCWvc/ai-for-epistemics-hackathon&quot;&gt;The AI for Epistemics hackathon&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;My article in an MIT Journal: &lt;a href=&quot;https://jods.mitpress.mit.edu/pub/issue3-case/release/6&quot;&gt;How To Become A Centaur&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;My talk at XOXO: &lt;a href=&quot;https://blog.ncase.me/the-creative-cyborg/&quot;&gt;The Creative Cyborg&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a id=&quot;6pack&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;🍻 The 6Pack of Digital Democracy&lt;/h3&gt;
&lt;p&gt;Okay, a bit tacky to Signal Boost this project, since I&#39;m involved in it. But I joined this project &lt;em&gt;because&lt;/em&gt; I sincerely think it&#39;s one of the better bets for Human-AI Alignment, and the person who founded it has been a role model of mine for years, and she has both a great track record &amp;amp; actual influence in the world.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Audrey_Tang&quot;&gt;Audrey Tang&lt;/a&gt; is the Digital Minister of Taiwan. &lt;em&gt;She&lt;/em&gt; is the reason Taiwan is at the world&#39;s frontier of experiments in digital democracy, using humanely-designed tech to make government more open, accessible, accountable, responsive to the public&#39;s needs, and many other buzzwords that &lt;em&gt;would&lt;/em&gt; be empty if they weren&#39;t backed by Audrey Girlboss Tang, who frikkin&#39; &lt;em&gt;delivers&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Anyway, her new thing is &lt;strong&gt;&lt;a href=&quot;https://6pack.care/&quot;&gt;6pack.care&lt;/a&gt;&lt;/strong&gt; (in collab with Caroline Green), a plan for human-AI alignment that (hopefully) works in both the short term (reversing democratic decay &amp;amp; extremism) and long term (humans living alongside powerful AIs).&lt;/p&gt;
&lt;p&gt;(I think Audrey&#39;s plan of making a long-term Alignment plan &amp;quot;pay dividends&amp;quot; in the short-term, is brilliant politics. The way she explained it: if you successfully advocated for reinforcing cockpit doors before 9/11, or pandemic resilience before Covid-19, your reward is... you see nothing happen. &amp;quot;No one cares about the bomb that didn’t go off, only the one that did.&amp;quot;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn17&quot; id=&quot;fnref17&quot;&gt;[17]&lt;/a&gt;&lt;/sup&gt; So 6pack&#39;s plan is to address problems people care about &lt;em&gt;now&lt;/em&gt; — algorithm-driven extremism, LLM-induced psychosis, social media mental health crises, etc — that also happen to scale to full Human-AI alignment.)&lt;/p&gt;
&lt;p&gt;Here&#39;s an overview of the 6 items of the 6pack of Digital Democracy, illustrated by... &lt;em&gt;me!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/Overview_HiRez_NoHeader_Credits.png&quot; alt=&quot;Infographic of the 6Pack, in the form of a six-pack of beer: Actually listen to people, Actually keep promises, We check the process, We check the results, As win-win as possible, As local as possible.&quot; title=&quot;Infographic of the 6Pack, in the form of a six-pack of beer: Actually listen to people, Actually keep promises, We check the process, We check the results, As win-win as possible, As local as possible.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(I&#39;ve been commissioned to make 7 public-domain infographics in total, 1 for the Overview, 6 for each thing in the 6pack.)&lt;/p&gt;
&lt;p&gt;The 6pack is closely related to Vitalik Buterin&#39;s &lt;a href=&quot;https://vitalik.eth.limo/general/2025/02/28/aihumans.html&quot;&gt;&amp;quot;AI as the engine, humans as the steering wheel&amp;quot;&lt;/a&gt;, and the general &lt;a href=&quot;https://vitalik.eth.limo/general/2025/01/05/dacc2.html&quot;&gt;d/acc&lt;/a&gt; and &lt;a href=&quot;https://vitalik.eth.limo/general/2024/08/21/plurality.html&quot;&gt;Plurality&lt;/a&gt; movements. Here are the common shared principles, with specific tools:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Keep human values at the centre.&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;concretely: bridging-based algorithms, linear/quadratic/score voting, frequent feedback loops, crowdsourced constitution/evals, &amp;quot;humans as steering wheel&amp;quot;, etc&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Set it up so that human power &lt;em&gt;scales with&lt;/em&gt; AI power, instead of getting left behind.&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;concretely: cyborgism (non-agentic AIs to augment human cognition, emotion, and collaboration), scalable oversight, &amp;quot;AI as engine&amp;quot;, etc&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Make sure the tools are widely distributed, so that &lt;em&gt;power&lt;/em&gt; is distributed.&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;concretely: decentralized algorithms like web-of-trust, peer-to-peer networks, or... &lt;em&gt;siggghh&lt;/em&gt;, blockchain. also, open-source software &lt;em&gt;and&lt;/em&gt; hardware, etc&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If that all still seems like jargon word salad to you... well, I got hired to illustrate 6 more pages for this thing, to explain each &amp;quot;pack&amp;quot; in lay-friendly detail. Stay tuned!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://6pack.care/&quot;&gt;Read the 6pack.care manifesto &amp;amp; outline online&lt;/a&gt;&lt;/strong&gt;&lt;br /&gt;
&lt;em&gt;(full book is supposed to come out in March 2026)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;(P.S: Audrey doesn&#39;t want me calling it &amp;quot;The 6Pack of Digital Democracy&amp;quot;, and she&#39;s right, that&#39;s &lt;em&gt;not&lt;/em&gt; accurate — the 6pack is broader than that — but until we can come up with something catchier than &amp;quot;The 6Pack of Human-AI Multi-Agent Value Alignment&amp;quot; I&#39;m going to use the inaccurate alliteration, at least in this informal blog post.)&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Fun Stuff!&lt;/h2&gt;
&lt;p&gt;&lt;a id=&quot;clues&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;🕵️ Clues by Sam: a daily deductive detective game&lt;/h3&gt;
&lt;p&gt;I&#39;ve been hooked on this free daily puzzle game for the last month.&lt;/p&gt;
&lt;p&gt;Here&#39;s the setup. You&#39;re a detective. There are 20 people. Each one is either Innocent or Criminal. Everyone tells the truth, even Criminals.&lt;/p&gt;
&lt;p&gt;You start with &lt;em&gt;one&lt;/em&gt; person&#39;s clue:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/clues-1.png&quot; alt=&quot;Screenshot from Clues by Sam, starting board&quot; title=&quot;Screenshot from Clues by Sam, starting board&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Given the clues already on the board, you have to deduce a new Innocent or Criminal. &lt;em&gt;You cannot guess – the game knows which people&#39;s status is logically deducible at any time.&lt;/em&gt; Only when you correctly deduce a new person do you get a new clue.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/clues-2.png&quot; alt=&quot;Screenshot from Clues by Sam, some deductions made&quot; title=&quot;Screenshot from Clues by Sam, some deductions made&quot; /&gt;&lt;/p&gt;
&lt;p&gt;And then:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/clues-3.png&quot; alt=&quot;Screenshot from Clues by Sam, more deductions made&quot; title=&quot;Screenshot from Clues by Sam, more deductions made&quot; /&gt;&lt;/p&gt;
&lt;p&gt;And so on, until you&#39;ve figured out everybody!&lt;/p&gt;
&lt;p&gt;Like Sudoku, the first few times are tough, then you start learning some generalizable tricks — (in particular, the infamous &lt;a href=&quot;https://www.theguardian.com/science/2016/mar/28/did-you-solve-it-the-logic-question-almost-everyone-gets-wrong&quot;&gt;&amp;quot;if X then Y, if not-X then Y, therefore Y&amp;quot;&lt;/a&gt;) — and like Sudoku, after a while it &lt;em&gt;does&lt;/em&gt; get a bit repetitive, but it&#39;s still a nice mini-challenge to play on break.&lt;/p&gt;
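&lt;p&gt;(Nerd aside: the &amp;quot;you cannot guess&amp;quot; rule is easy to mimic with brute force. Here&#39;s my reconstruction of the idea, not Sam&#39;s actual code: a person&#39;s status is deducible exactly when it&#39;s identical in &lt;em&gt;every&lt;/em&gt; assignment consistent with the revealed clues. Tricks like &amp;quot;if X then Y, if not-X then Y, therefore Y&amp;quot; fall out for free, since Y holds in all consistent worlds.)&lt;/p&gt;

```python
from itertools import product

# Brute-force sketch of how a game like Clues by Sam *could* verify that a
# guess is a genuine deduction (my reconstruction, not Sam's code): a person
# is deducible iff their status is the same in every world consistent with
# the revealed clues.

def deducible(n_people, clues):
    """clues: predicates over a tuple of bools (True = Criminal).
    Returns {person_index: status} for everyone whose status is forced."""
    worlds = [w for w in product([False, True], repeat=n_people)
              if all(clue(w) for clue in clues)]
    forced = {}
    for i in range(n_people):
        statuses = {w[i] for w in worlds}
        if len(statuses) == 1:          # same in every consistent world
            forced[i] = statuses.pop()
    return forced

# Three people. Clue 1: "exactly one of us is a criminal".
# Clue 2 (person 0): "if I'm innocent, person 2 is the criminal".
# Clue 3 (person 1): "if I'm a criminal, person 2 is too".
clues = [
    lambda w: sum(w) == 1,
    lambda w: w[0] or w[2],
    lambda w: (not w[1]) or w[2],
]
forced = deducible(3, clues)  # person 1 is forced Innocent; 0 and 2 are open
```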
&lt;p&gt;If that piques your interest, try out the 5-minute tutorial and get puzzlin&#39;, detective!&lt;/p&gt;
&lt;p&gt;🕵️‍♀️ &lt;strong&gt;&lt;a href=&quot;https://cluesbysam.com/&quot;&gt;Clues By Sam&lt;/a&gt;&lt;/strong&gt; 🕵️‍♂️&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;snakebird&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;🐦 Snakebird: cute birbs, cruel puzzles&lt;/h3&gt;
&lt;p&gt;I actually played &amp;amp; finished this game at the start of 2025, when my laptop was taken at the US-Canada border and I couldn&#39;t do any work until I got a new laptop. (Customs &amp;amp; Border Patrol &lt;em&gt;did&lt;/em&gt; return my old laptop...  4 months later...)&lt;/p&gt;
&lt;p&gt;Anyway, I bring that up because Snakebird is a great way to keep your mind off frustrating stuff you can do nothing about, and keep your mind on &lt;em&gt;very&lt;/em&gt; frustrating stuff you &lt;em&gt;can&lt;/em&gt; do something about!&lt;/p&gt;
&lt;p&gt;Snakebird, despite looking like a cutesy mobile game, is &lt;em&gt;infamous&lt;/em&gt; in the indie puzzle game community for being sadistically tough. It has a mid-3-star rating on the Apple App Store, because people complain about getting stuck on Level &lt;em&gt;Two&lt;/em&gt;. As for me, I&#39;m a puzzle aficionado, but it still took me a &lt;em&gt;month&lt;/em&gt; of playing ~1 hour a day to beat all 52 levels.&lt;/p&gt;
&lt;p&gt;(There &lt;em&gt;is&lt;/em&gt; an &amp;quot;easier&amp;quot; version called &lt;a href=&quot;https://noumenongames.com/game/snakebird_primer&quot;&gt;Snakebird Primer&lt;/a&gt;, and &lt;a href=&quot;https://www.nintendo.com/us/store/products/snakebird-complete-switch/&quot;&gt;Snakebird Complete&lt;/a&gt; contains both Primer + Original. I haven&#39;t tried them.)&lt;/p&gt;
&lt;p&gt;But while the game&#39;s tough, it&#39;s fair! It&#39;s always a matter of logical insight, not moon-logic clues or tedious trial-and-error. &amp;quot;Aha!&amp;quot;, not &amp;quot;oh that&#39;s bullshit&amp;quot;. (Okay, except for Level 26, that &lt;em&gt;was&lt;/em&gt; bullshit.)&lt;/p&gt;
&lt;p&gt;But... what &lt;em&gt;is&lt;/em&gt; Snakebird?&lt;/p&gt;
&lt;p&gt;It&#39;s like Snake meets Tetris: you slither &amp;amp; collect fruit to get longer like in Snake, but your snakes fall &lt;em&gt;instantly&lt;/em&gt; like in Tetris. Shenanigans ensue:&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/WJJa3hMbs4s?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;&amp;quot;That doesn&#39;t look too bad&amp;quot;, you think. Ha ha. Ha ha ha. Oh, sweet summer child.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://noumenongames.com/game/snakebird&quot;&gt;Snakebird&lt;/a&gt;&lt;/strong&gt;, available on iOS, Android, Mac, Windows, Linux&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://noumenongames.com/game/snakebird_primer&quot;&gt;Snakebird Primer&lt;/a&gt; also on all those platforms&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.nintendo.com/us/store/products/snakebird-complete-switch/&quot;&gt;Snakebird Complete&lt;/a&gt; only on Nintendo Switch&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(hat tip: I first heard about Snakebird from Game Maker&#39;s Toolkit&#39;s video on &lt;a href=&quot;https://www.youtube.com/watch?v=zsjC6fa_YBg&quot;&gt;how to design tough-but-fair puzzle games&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;zewei&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;🦉 ZeWei&#39;s Multiverse Tour Guide Adventures Continue&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boosts-oct-2024/#zewei&quot;&gt;Last year&lt;/a&gt;, I signal-boosted the nonbinary furry EDM musician ZeWei&#39;s animated mockumentary of a linguist trapped in an alien world! Well, it&#39;s a full series now!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Episode 2: &amp;quot;It&#39;s not colonialism, it&#39;s tourism!&amp;quot; /s (17 min)&lt;/strong&gt;&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/WScut_fmFzw?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;In this episode, our linguist protagonist is worried their world&#39;s language &amp;amp; culture are erasing those of the worlds they travel to. Just when you think the story&#39;s headed towards the standard &amp;quot;we must preserve the noble savages&#39; linguistic diversity&amp;quot;, the native of this other world calls them on their bullshit: no, &lt;em&gt;they have autonomy&lt;/em&gt;, they &lt;em&gt;chose&lt;/em&gt; to drop the worst parts of their language &amp;amp; culture, and carefully &lt;em&gt;chose&lt;/em&gt; parts of other cultures to adopt, and merged them into something uniquely their own. They don&#39;t owe it to anyone to &amp;quot;preserve&amp;quot; themselves like a living fossil. They don&#39;t live for historians or researchers or nostalgia, they live &lt;em&gt;for themselves&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;(But, yes... something &lt;em&gt;was&lt;/em&gt; lost. &amp;quot;It&#39;s complicated.&amp;quot;)&lt;/p&gt;
&lt;p&gt;This episode resonated with me, since I was born in Singapore (a very post-British-colonial city-state), then I immigrated to Vancouver, Canada halfway through my childhood. My backstory is a mix of Western &amp;amp; Eastern influences. I&#39;d like to believe I&#39;ve chosen the better parts of both cultures, and dropped the unhealthy crap from both — but who knows — if I lost something, &lt;em&gt;would I even recognize the loss?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Episode 3: &amp;quot;The Lore Episode&amp;quot; (30 min!!!)&lt;/strong&gt;&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/RbEFhJ2z-7U?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;This one&#39;s less head-y, and a lot more establishing this story&#39;s world. We finally get to see the other characters &amp;quot;behind the camera&amp;quot; of this mockumentary series! There&#39;s also a prophecy.&lt;/p&gt;
&lt;p&gt;Less for me to hook onto in this episode, but it&#39;s clearly setting up for a big series arc. Looking forward to seeing where the arc bends next!&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;emnerson&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;🤓 Emnerson: goofy internet gal&lt;/h3&gt;
&lt;p&gt;A new, up-and-coming YouTuber who makes a variety of cool digital projects!&lt;/p&gt;
&lt;p&gt;I first learnt about her YouTube channel when she won Captain Disillusion&#39;s &amp;quot;unblur this image&amp;quot; challenge, cracking it within 20 minutes:&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/Yi-Vea2AUEU?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;Here&#39;s her latest video on merging YouTube channels&#39; thumbnails, and seeing what patterns pop up:&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/g-1kUvJqagw?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;She also happens to be trans, and I want to help boost my fellow up-and-coming trans creators. Everyone, including me, is very jealous of how quickly she&#39;s transitioned in just a few months, and she&#39;s very pretty.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Check her out! &lt;a href=&quot;https://www.youtube.com/@emnersonn&quot;&gt;Emnerson&#39;s YouTube channel&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;blahaj&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;🧇 blahaj goes to waffle house at 3am&lt;/h3&gt;
&lt;p&gt;Cute as hell. Watch it, and subscribe to a new, up-and-coming animator!&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/X1beEuBV7M0?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;(they also did all the original music, and are based in Taiwan)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Link: &lt;a href=&quot;https://www.youtube.com/@AtogaCreative&quot;&gt;Atoga&#39;s YouTube channel&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;SPOOKY STUFF for HALLOWEEN&lt;/h2&gt;
&lt;p&gt;&lt;a id=&quot;boo&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;👻 BOO! a silly music video &amp;amp; music album&lt;/h3&gt;
&lt;p&gt;An 80&#39;s-throwback musical comedy about a tiny ghost trying to be spooky:&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/NZx9fAk22Qg?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;(The creator, &lt;a href=&quot;https://www.youtube.com/@Piemations/videos&quot;&gt;Piemations&lt;/a&gt;, usually does animated stuff! They made Sheriff Hayseed, Bird Town, Suction Cup Man, Mike &amp;amp; Zach, and How To Be Cool, if you recognize any of those titles.)&lt;/p&gt;
&lt;p&gt;Oh, and the song &amp;quot;BOO!&amp;quot; is just one track from their band&#39;s new album, &lt;a href=&quot;https://friesontheside.bandcamp.com/album/musical-scares&quot;&gt;Musical Scares&lt;/a&gt;. The track Big Doggie is my favourite. Comedy aside, it &lt;em&gt;actually bops&lt;/em&gt;. The intro drop hits so hard, I must&#39;ve restarted the song a dozen times just to hear that drop again. And I haven&#39;t heard the phrase &amp;quot;14 werewolves&amp;quot; in years, got &lt;em&gt;instant&lt;/em&gt; psychic damage from that.&lt;/p&gt;
&lt;iframe style=&quot;border: 0; width: 640px; height: 472px;&quot; src=&quot;https://bandcamp.com/EmbeddedPlayer/album=2742795446/size=large/bgcol=333333/linkcol=e32c14/artwork=small/transparent=true/&quot; seamless=&quot;&quot;&gt;&lt;a href=&quot;https://friesontheside.bandcamp.com/album/musical-scares&quot;&gt;Musical Scares by Fries On The Side&lt;/a&gt;&lt;/iframe&gt;
&lt;p&gt;&lt;a id=&quot;bury-your-gays&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;🩸 Bury Your Gays by Chuck Tingle&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/Pasted%20image%2020251029165354.png&quot; alt=&quot;Cover of the book Bury Your Gays by Chuck Tingle&quot; title=&quot;Cover of the book Bury Your Gays by Chuck Tingle&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://bookshop.org/p/books/bury-your-gays-chuck-tingle/de2c90d5aa0bc158&quot;&gt;A novel by Chuck Tingle&lt;/a&gt;&lt;/strong&gt;, yes &lt;em&gt;that&lt;/em&gt; Chuck Tingle, the &amp;quot;got famous by writing dozens of joke eroticas&amp;quot; Mr. Tingle:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-10/Pasted%20image%2020251029165749.png&quot; alt=&quot;Pounded In The Butt By My Bizarre Assumption That Chuck Tingle Books Are Just Covers And Not Actual Books&quot; title=&quot;&amp;quot;Pounded In The Butt By My Bizarre Assumption That Chuck Tingle Books Are Just Covers And Not Actual Books&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;A couple years ago, Chuck Tingle expanded to novels, including Bury Your Gays, which is &lt;em&gt;shockingly good.&lt;/em&gt; And it&#39;s very meaningful for (gestures vaguely) &amp;quot;the current moment&amp;quot;.&lt;/p&gt;
&lt;p&gt;The premise: Misha Byrne is a scriptwriter for an X-Files-like series with two (subtextually) gay protagonists. It&#39;s a hit. Misha&#39;s called into a meeting. His boss tells him what the executives tell him the Algorithm tells them would make the most money: have the leads profess their gay love out loud, &lt;em&gt;then immediately kill them off.&lt;/em&gt; The &amp;quot;queer tragedy&amp;quot; plot, the &amp;quot;bury your gays&amp;quot; trope, that&#39;s the controversy &amp;amp; drama that would sell! Oh, and if Misha &lt;em&gt;doesn&#39;t&lt;/em&gt; kill them off, that&#39;s breaking contract, he&#39;ll get sued into oblivion, then the studio will take his show &amp;amp; kill off the gay leads anyway.&lt;/p&gt;
&lt;p&gt;Misha walks away from the meeting pissed, angry at the execs, the Algorithm, the whole damn world. And just when Misha thinks things can&#39;t get any worse — 5 minutes later — a colleague explodes into meat and gore in front of him.&lt;/p&gt;
&lt;p&gt;Then things get &lt;em&gt;really&lt;/em&gt; bad.&lt;/p&gt;
&lt;p&gt;So, without spoilers and in no particular order, why Bury Your Gays resonated with me so much:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There&#39;s (understandably) a lot of stories of queer tragedy right now, both in fiction and real life. This novel is a reminder that &lt;strong&gt;happiness is not &amp;quot;naïve&amp;quot;, or &amp;quot;basic&amp;quot;, or dumb.&lt;/strong&gt; Without over-correcting into toxic positivity, &lt;strong&gt;we can and should also celebrate queer joy.&lt;/strong&gt; Misha&#39;s story (both the story he writes &lt;em&gt;and&lt;/em&gt; the story he&#39;s living through) shows the full complexity, lows &lt;em&gt;and&lt;/em&gt; highs, of a queer life.&lt;/li&gt;
&lt;li&gt;As a &amp;quot;creator on the internet&amp;quot;, I definitely feel a lot of anxiety about opaque algorithms &amp;amp; generative AI screwing not just with my creations, but &lt;em&gt;my own personal character&lt;/em&gt;. Y&#39;know, &amp;quot;audience capture&amp;quot;, &amp;quot;we become what we pretend to be, so we must be careful what we pretend to be&amp;quot;, etc.&lt;/li&gt;
&lt;li&gt;The novel also reminds queer creators (like me) to help lift up other less-famous queer creators. Which sounds obvious, but when you&#39;re dealing with your own shit, it&#39;s easy to forget to help others too. So, that&#39;s something I&#39;d like to do with these Signal Boosts.
&lt;ul&gt;
&lt;li&gt;(I mean, Chuck Tingle &amp;amp; Audrey Tang are far more famous than me, but (see above) ZeWei &amp;amp; Emnerson are more recent, smaller-fanbase creators.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Oh! Misha&#39;s best friend is asexual &amp;amp; aromantic! Hooray for ace/aro representation.&lt;/li&gt;
&lt;li&gt;I just love books that get all weird and meta and format-screw-y. (Could you guess I read &lt;a href=&quot;https://thehardtimes.net/blog/opinion-house-of-leaves-is-just-infinite-jest-for-spooky-people/&quot;&gt;House of Leaves&lt;/a&gt; as a teen and it changed my brain chemistry?)&lt;/li&gt;
&lt;li&gt;The parallel between Misha&#39;s characters being &amp;quot;only&amp;quot; subtextually gay, and Misha &lt;em&gt;himself&lt;/em&gt; &amp;quot;only&amp;quot; being semi-out of the closet, is clever, and hits personally in two different ways:
&lt;ol&gt;
&lt;li&gt;The line between &amp;quot;using art to work through my own feelings&amp;quot; and &amp;quot;using art to &lt;em&gt;avoid&lt;/em&gt; doing the emotional work in real life&amp;quot; is thin, and I struggle with this too.&lt;/li&gt;
&lt;li&gt;I&#39;m like... &lt;em&gt;out&lt;/em&gt; about being pansexual &amp;amp; transgender, but I feel like I could be &lt;em&gt;more&lt;/em&gt; out? I don&#39;t mention it or have 🏳️‍⚧️🏳️‍🌈 flags on my social media bios. (Well, I don&#39;t &lt;em&gt;have&lt;/em&gt; any social media anymore, but still.) Right now, I&#39;m doing the strategy of &amp;quot;let people know me for my educational work first, then &lt;em&gt;later&lt;/em&gt; learn I&#39;m trans, so it&#39;s not the first adjective they know about me&amp;quot;.
&lt;ul&gt;
&lt;li&gt;On one hand: me being more out &amp;amp; confident could give inspiration to the many queer folks who need hope right now.&lt;/li&gt;
&lt;li&gt;On the other hand, living true to yourself will draw the attention of people who don&#39;t. Bucket-crabs who pull you down into the Normal Bucket, wordcels sophisticatedly rationalizing their Pavlovian disgust-reflexes as the &amp;quot;wisdom of repugnance&amp;quot;. And I hate to admit it, but I &lt;em&gt;am&lt;/em&gt; irrationally averse to &amp;quot;being confrontational&amp;quot;, like &amp;quot;I don&#39;t want to make a big deal of it&amp;quot;, like &amp;quot;oh they just need time to understand, I won&#39;t put it in their face, hurt people hurt people, nobody&#39;s the villain of their own story, everyone has a complex rich inner life&amp;quot; and other copium fucking &lt;em&gt;bullshit&lt;/em&gt;. And it&#39;s not for lack of trying or skills. I am &lt;em&gt;telling&lt;/em&gt; you, Reader, after 10+ years of me studying &amp;amp; practicing &amp;amp; applying Nonviolent Communication and Active Listening and Steelmanning and Intellectual Turing Tests and Moral Foundations Theory, years of trying to &lt;em&gt;really get to know people&lt;/em&gt; behind their mere politics, their &lt;em&gt;souls&lt;/em&gt;, Reader, I am &lt;em&gt;reporting&lt;/em&gt; to you: &lt;em&gt;it was always all bullshit.&lt;/em&gt; People are like onions: as you keep peeling back, you discover that every layer is the fucking same, and then you start crying. Anecdotes of personal change are survivor bias, &amp;quot;intricately layered characters&amp;quot; was invented by Big Book to sell more Books. 
Humanism is a cult: ~32% of Germans thought Hitler was a great man &lt;em&gt;in 1952, after Hitler already lost&lt;/em&gt; and the Holocaust was globally known&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn18&quot; id=&quot;fnref18&quot;&gt;[18]&lt;/a&gt;&lt;/sup&gt;; ~15% of eligible Americans were in the Ku Klux Klan in the 1920s&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn19&quot; id=&quot;fnref19&quot;&gt;[19]&lt;/a&gt;&lt;/sup&gt;; ~28% of Hutu adult civilians &lt;em&gt;personally participated&lt;/em&gt; in the genocide against the Tutsis&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fn20&quot; id=&quot;fnref20&quot;&gt;[20]&lt;/a&gt;&lt;/sup&gt;; just give up the copium and apply Occam&#39;s Razor and finally admit that &lt;em&gt;the reason mass atrocities keep happening, is simply because the masses are atrocious&lt;/em&gt;. Did your eyes automatically roll? That&#39;s how they get you: mark the thoughts not as &amp;quot;dark&amp;quot; or &amp;quot;evil&amp;quot;, but just &amp;quot;cringe&amp;quot;. &lt;em&gt;That&#39;s&lt;/em&gt; how you censor a thought. All these so-called empathetic Humanist thought-leaders will try to guilt-trip you back into loving humanity. &amp;quot;But they&#39;re not all bad, there&#39;s good parts in them&amp;quot; isn&#39;t wisdom, &lt;em&gt;it&#39;s how abusers gaslight their victims into staying with them.&lt;/em&gt; Humanity &lt;em&gt;is&lt;/em&gt; the abuser. Humanity does not Spark Joy. It&#39;s not the Algorithm, or late capitalism, or Moloch, or woke mind viruses, or egregores, &lt;em&gt;IT&#39;S YOU.&lt;/em&gt; It&#39;s &lt;em&gt;you.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;...right, clearly I&#39;m still working through some issues.&lt;/p&gt;
&lt;p&gt;I thank Mr. Pounded In The Butt for forcing me to work through it more.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://bookshop.org/p/books/bury-your-gays-chuck-tingle/de2c90d5aa0bc158&quot;&gt;Bury Your Gays&lt;/a&gt; on IndieBound/Bookshop&lt;/strong&gt; (content note: gore, homophobia)&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Anyway, that was a longer-than-usual Signal Boost! In summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There&#39;s hope and solutions to wrangle The Algorithms back towards a humane-with-an-e future&lt;/li&gt;
&lt;li&gt;Have fun, and&lt;/li&gt;
&lt;li&gt;Happy Halloween!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;🎃👻🎃&lt;br /&gt;
~ Nicky Case&lt;/p&gt;
&lt;iframe src=&quot;https://ncase.me/ncase-credits/signup2.html&quot; frameborder=&quot;no&quot; width=&quot;640&quot; height=&quot;250&quot;&gt;&lt;/iframe&gt;
&lt;iframe src=&quot;https://ncase.me/ncase-credits/supporters/oct-2025.html&quot; frameborder=&quot;no&quot; width=&quot;640&quot; height=&quot;640&quot;&gt;&lt;/iframe&gt;
&lt;hr /&gt;
&lt;h1&gt;Interview with Jon&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;I hadn&#39;t before considered the moral cost &lt;em&gt;to the user&lt;/em&gt; of having on-demand, disposable AI servants. Haidt points out what Frederick Douglass once called &amp;quot;The Fatal Poison of Irresponsible Power&amp;quot;, where slavery not only degrades the slave, but &lt;em&gt;degrades the slaveowner too.&lt;/em&gt; If we grow up with people-like objects we treat as tools, how will this train us to treat actual people?&lt;/li&gt;
&lt;li&gt;Yup, &lt;a href=&quot;https://www.brainonllm.com/&quot;&gt;a recent MIT preprint study&lt;/a&gt; has found: using ChatGPT &lt;em&gt;literally&lt;/em&gt; makes you &lt;s&gt;dumber&lt;/s&gt; okay the authors are &lt;a href=&quot;https://www.brainonllm.com/faq&quot;&gt;very adamant&lt;/a&gt; we science-communicators &lt;em&gt;do not&lt;/em&gt; summarize their paper as &amp;quot;LLMs make you dumber&amp;quot;. The actual contents of their study: students who use LLMs to help write essays have measurably less brain connectivity, relative to Brain-only students. And after 4 months, LLM users &amp;quot;consistently underperformed at neural, linguistic, and behavioral levels.&amp;quot;
&lt;ul&gt;
&lt;li&gt;The reason why it&#39;s not as simple as &amp;quot;LLMs make you dumb&amp;quot;, is because the group that &lt;em&gt;first&lt;/em&gt; wrote on a topic using only their naked brain, &lt;em&gt;THEN&lt;/em&gt; were allowed to use LLMs (&lt;strong&gt;Brain-then-LLM&lt;/strong&gt; group) actually did &lt;em&gt;better:&lt;/em&gt; &amp;quot;these results suggest that strategic timing of AI tool introduction following initial self-driven effort may enhance engagement and neural integration.&amp;quot;&lt;/li&gt;
&lt;li&gt;In contrast, the &lt;strong&gt;LLM-then-Brain&lt;/strong&gt; group &amp;quot;consistently underperformed relative to Session 2 of Brain-only group, and failed to develop the consolidation networks present in Session 3 of Brain-only group.&amp;quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;In other words: it may be best to use tools, only after you know how to work &lt;em&gt;without&lt;/em&gt; the tools.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;(Keep in mind the study only had 54 participants. And it&#39;s a &lt;em&gt;lot&lt;/em&gt; of analysis over 54 participants, which, I&#39;ll be honest, smells fishy.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Interview with Tim&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;If you&#39;re already familiar with Tim&#39;s work, in particular his latest book &lt;em&gt;What&#39;s Our Problem&lt;/em&gt;, you can skip ahead in the interview to 40:18. Otherwise, &lt;a href=&quot;https://xcancel.com/ncasenmare/status/1627798034439450624&quot;&gt;here was my book summary (in tweet-thread form!)&lt;/a&gt;. To summarize &lt;em&gt;that&lt;/em&gt; summary: &lt;strong&gt;the big problem with politics right now is not &lt;em&gt;what&lt;/em&gt; we believe, but &lt;em&gt;how&lt;/em&gt; we believe.&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;&amp;quot;Low-rung&amp;quot; thinkers treat politics like a war, where arguments are bullets, and you&#39;re either for xor against us. You&#39;d never critique your own side or give credit to the other side, any more than you&#39;d shoot your own platoon or hand ammunition to the enemy.&lt;/li&gt;
&lt;li&gt;&amp;quot;High-rung&amp;quot; thinkers treat politics like a Writer&#39;s Room, or Code Review. It&#39;s &lt;em&gt;NOT&lt;/em&gt; that &amp;quot;truth is in the middle&amp;quot; (strawman centrist) or &amp;quot;everyone has their own truth&amp;quot; (strawman postmodernist) or even &amp;quot;everyone has a valuable perspective&amp;quot; (ha ha have you met people). It&#39;s about &lt;em&gt;ACTUALLY TESTING YOUR IDEAS&lt;/em&gt;. You wouldn&#39;t let into your home an oven that&#39;s never been safety-tested. So why let an idea into your core beliefs that&#39;s never been stress-tested? Pick holes &amp;amp; find counterarguments to ideas even if they&#39;re &amp;quot;from your own side&amp;quot;, search for good critique &amp;amp; solutions even if they&#39;re &amp;quot;from the enemy&#39;s side&amp;quot;, and &lt;em&gt;especially&lt;/em&gt; try to find &amp;amp; invent new ideas that are on &lt;em&gt;nobody&#39;s&lt;/em&gt; side.&lt;/li&gt;
&lt;li&gt;&amp;quot;High-rung&amp;quot; is &lt;em&gt;not&lt;/em&gt; intelligence. There are very high-IQ people who squander their IQ on rationalizing what their gut/tribe already believes.&lt;/li&gt;
&lt;li&gt;There are high-rung people on the left &amp;amp; right. &lt;em&gt;There are low-rung people who are &amp;quot;centrist&amp;quot; or &amp;quot;moderate&amp;quot;&lt;/em&gt;. (e.g. the kind of smug centrist who only focuses on extremists on the left &amp;amp; right, and keeps talking about &#39;horseshoe theory&#39;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;There&#39;s a pattern in history: New Media Technology =&amp;gt; Lots of benefit but lots of awful side-effects =&amp;gt; Society slowly develops &amp;quot;cultural antibodies&amp;quot;, to get the benefits of the new tech while minimizing the downsides.
&lt;ul&gt;
&lt;li&gt;Historic example: The printing press indirectly led to the Enlightenment &amp;amp; the Scientific Revolution... but &lt;em&gt;also&lt;/em&gt; the Protestant vs Catholic split and the centuries of murderous conflict.&lt;/li&gt;
&lt;li&gt;Current example: Social media let people coordinate to donate millions to charity, and reveal injustice that the powerful would rather keep hidden... but &lt;em&gt;also&lt;/em&gt; tanked our politics, mental health, and attention/autonomy.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Audience capture is when you, the creator, get accidentally clicker-trained by the likes &amp;amp; comments from your audience, into becoming a parody of yourself. Before Tim&#39;s interview, I thought the solution was &amp;quot;just don&#39;t get captured&amp;quot;. But Tim offers a better solution: work &lt;em&gt;with&lt;/em&gt; the incentives, not against them, &lt;strong&gt;by getting captured by the &lt;em&gt;right&lt;/em&gt; audience.&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;First, know what you value. Then, find an audience that values what you value. Then, &lt;em&gt;write for the audience you want, not the audience you can easily get.&lt;/em&gt; This will mean going through a rough period where few see your work, and/or fans &amp;amp; even friends get mad at you — but power through this: they&#39;ll filter themselves out, and &lt;strong&gt;you&#39;ll get an audience that will help incentivize you to be the best version of you.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;(This was the biggest insight I took from Tim&#39;s interview, and it&#39;s one I&#39;ve not yet applied &amp;amp; am scared to. By default my personality is non-confrontational. I&#39;m scared to piss off my fans &amp;amp; friends, or worse, &lt;em&gt;bore&lt;/em&gt; them.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Notes on Polarization&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Not all polarization is bad&lt;/strong&gt;, some&#39;s even desirable. Researchers name (at least) &lt;strong&gt;3 different kinds of &amp;quot;polarization&amp;quot;: affective polarization, belief polarization, and homophily.&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Affective polarization: We hate them, they hate us.&lt;/li&gt;
&lt;li&gt;Belief polarization: We &amp;amp; they have very different beliefs.&lt;/li&gt;
&lt;li&gt;Homophily: Most of my friends are in my tribe, most of theirs in theirs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Homophily&#39;s both unavoidable, and actually &lt;em&gt;desirable&lt;/em&gt;, in my opinion. For example, only ~0.6% of people are programmers like me, but over 50% of my friends are programmers. It wouldn&#39;t make personal or career sense for me to have exactly demographically-proportional representation in my friend group. Same for over 50% of my friends being LGBTQ+, left-leaning, urban, English-speaking, and nerdy.&lt;/p&gt;
&lt;p&gt;However, if &lt;em&gt;all&lt;/em&gt; of my friends (or media sources) are urban, or in STEM, or left-leaning, then I&#39;ll be out of touch with the broader world, and &lt;em&gt;that&lt;/em&gt; may lead to affective/belief polarization.&lt;/p&gt;
&lt;p&gt;And what&#39;s the difference between affective &amp;amp; belief polarization, concretely?&lt;/p&gt;
&lt;p&gt;Affective without belief polarization: When people despise each other for having pretty small differences in beliefs (at least, relative to their society at large). Historic example: All the deadly wars between Protestants &amp;amp; Catholics. Today&#39;s examples: leftists vs liberals, Groypers vs New Right.&lt;/p&gt;
&lt;p&gt;Belief without affective polarization: When people get along despite having radically different beliefs. Examples: Friendships between Christians &amp;amp; atheists, friendships between AI capabilities researchers &amp;amp; AI extinction-risk researchers.&lt;/p&gt;
&lt;p&gt;Note: I don&#39;t believe that hating someone is necessarily &amp;quot;bad&amp;quot;; anger can be justified &amp;amp; useful – but anger &lt;em&gt;is&lt;/em&gt; known to be addictive. Kind of like amphetamines. I also don&#39;t think belief polarization, a lack of consensus, is necessarily &amp;quot;bad&amp;quot;; a wide variety of ideas helps us brainstorm better, and often there &lt;em&gt;isn&#39;t&lt;/em&gt; enough data for one clear answer. My current take is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Homophily is mostly good. Just make sure not &lt;em&gt;all&lt;/em&gt; of your friends/media sources are in the same tribe. That&#39;s a single point of failure.&lt;/li&gt;
&lt;li&gt;Belief polarization is not good or bad, it just &amp;quot;is&amp;quot;. Focus instead on cultivating the intellectual virtues of curiosity balanced with rigour. If that leads to consensus, great. If that leads to disagreement, great. Trust the process.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Affective polarization (anger) is like amphetamines: focusing, energizing, sometimes useful, but unfortunately easy to abuse &amp;amp; get addicted.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This book personally changed my life, turning me from Mid-10&#39;s Smug Asshole Atheist to &amp;quot;atheist but at least &lt;em&gt;try&lt;/em&gt; to be kind &amp;amp; understand beliefs very unlike mine.&amp;quot; The core ideas of the book — everyone&#39;s beliefs are mostly downstream of emotional reflexes, liberals/conservatives have overlapping but different emotional reflexes — are still on solid scientific ground. Though be warned: since this is a psych book from 2012, many of the &lt;em&gt;specific&lt;/em&gt; studies he cites may not replicate. &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://papers.ssrn.com/sol3/Delivery.cfm?abstractid=1528077&quot;&gt;What makes online content viral?&lt;/a&gt; by Berger &amp;amp; Milkman (2012) analyzed all headlines from the New York Times over a 3-month period. As seen in Figure 2, the top factor that makes a headline viral is &amp;quot;Anger&amp;quot;. (To be fair, the close runners-up are &amp;quot;Awe&amp;quot; and &amp;quot;Practical Value&amp;quot;.) &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;“The model centers on a population of simulated users, each represented by a persona drawn from the American National Election Studies (ANES) dataset. These personas reflect real-world distributions of age, gender, income, education, partisanship, ideology, religion, and personal interests.” &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Main analysis was done with GPT-4o-mini, results replicated successfully with llama-3.2-8b and DeepSeek-R1 &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The &amp;quot;baseline&amp;quot; simulated social network has the following algorithmic feed: it shows each user 10 posts: “five from followed users and five drawn from high-engagement content posted by non-followed users, with repost probability used as a proxy for algorithmic amplification.” All interventions are tested compared to this baseline. &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The paper doesn&#39;t offer a detailed hypothesis, and without re-running the simulation myself I&#39;d just be guessing, but let me guess: {chronological feed + repost mechanism} is actually a &lt;em&gt;stronger filter for engaging/enraging posts&lt;/em&gt; than {algorithmic feed}. Let&#39;s say the top ten posts in your chronological feed are posts from the last half hour. These posts will be either a) originally posted in the last 30 min, or b) &lt;em&gt;so engaging&lt;/em&gt; that your ingroup has been reposting it &lt;em&gt;at least once per half hour&lt;/em&gt;. In contrast, a baseline algorithmic feed would highlight posts that have gotten lots of reposts in the last, say, 24 hours. &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;From &lt;a href=&quot;https://docs.bsky.app/docs/tutorials/viewing-feeds&quot;&gt;their API&lt;/a&gt;: &amp;quot;the default chronological feed of posts from users the authenticated user follows&amp;quot; &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;From &lt;a href=&quot;https://help.truthsocial.com/truth-social-common-terminology/glossary-home-feed/&quot;&gt;their FAQ&lt;/a&gt;: &amp;quot;we refrain from using non-chronological feeds to artificially suppress users&#39; content.&amp;quot; &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&amp;quot;14.9% use AI for emotional support daily, with an &lt;strong&gt;additional&lt;/strong&gt; 27.9% weekly&amp;quot; {emphasis added}. So that&#39;s 14.9 + 27.9 = 42.8% using AI as emotional support at least once a week. &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://www.fool.com/investing/2024/01/10/amazon-e-commerce-company-74-profit-this-instead/&quot;&gt;Source&lt;/a&gt;. Note that while most of Amazon&#39;s &lt;em&gt;revenue&lt;/em&gt; comes from its online stores, 74% of its &lt;em&gt;profit&lt;/em&gt; (which is &lt;em&gt;revenue minus cost&lt;/em&gt;) comes from Amazon Web Services, their cloud compute arm. (This is because the online stores have much higher cost) &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It was &lt;em&gt;such a good name&lt;/em&gt;, I miss it so much 😭 &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;From &lt;a href=&quot;https://arxiv.org/pdf/2210.15723&quot;&gt;the Birdwatch paper&lt;/a&gt;: &amp;quot;To avoid overfitting on our small dataset, we use one-dimensional factor vectors. Additional factors added little explanatory power and reduced interpretability and replicability. (Though we expect to expand dimensionality as the contributor base grows.) [...] RMSE on held-out samples decreased from .076 to .073 when adding a second factor&amp;quot;. &lt;strong&gt;Translation:&lt;/strong&gt; adding a 2nd factor to explain politics only reduced error from 0.076 to 0.073, basically nothing, while making the system twice as complicated. &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;From the classic &lt;a href=&quot;https://onlinelibrary.wiley.com/doi/abs/10.1111/pops.12055&quot;&gt;Feldman &amp;amp; Johnston 2013&lt;/a&gt; paper: &lt;em&gt;“We argue that a unidimensional model of ideology provides an incomplete basis for the study of political ideology. We show that two dimensions—economic and social ideology—are the minimum needed to account for domestic policy preferences.”&lt;/em&gt; In fact, for more nations than not, economic &amp;amp; social &amp;quot;right-wing&amp;quot; beliefs are &lt;em&gt;negatively&lt;/em&gt; correlated with each other. (&lt;a href=&quot;https://www.cambridge.org/core/journals/british-journal-of-political-science/article/are-cultural-and-economic-conservatism-positively-correlated-a-largescale-crossnational-test/83AFEDEA5E004CF23631C5388E7C9F67&quot;&gt;Malka, Lelkes &amp;amp; Soto 2017&lt;/a&gt;) In concrete terms: being pro/anti-LGBTQ and being pro/anti-free-market are &lt;em&gt;more likely to go together than not.&lt;/em&gt; (Maybe a bit obvious &lt;em&gt;now&lt;/em&gt;, given all the right-wing economic populists in America &amp;amp; Europe, but it was less obvious back when the paper came out 8 years ago) &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn14&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;A quote from &lt;a href=&quot;https://en.wikipedia.org/wiki/Tron&quot;&gt;Tron (1982)&lt;/a&gt; that meant a lot more to me as a kid, before the Mouse milked its nostalgia-teats powder-dry &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref14&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn15&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;From the inventor of the WWW, Tim Berners-Lee, in &lt;a href=&quot;https://venturebeat.com/programming-development/tim-berners-lee-shares-his-vision-of-a-collaborative-web&quot;&gt;his interview with VentureBeat&lt;/a&gt;: “I wanted it to be a read-write web immediately. [...] I wanted to be able to collaborate with it and do GitHub-like things for my software team at CERN in 1990.” &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref15&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn16&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Quote from some tech designer who I can&#39;t remember, &lt;a href=&quot;https://fakebuddhaquotes.com/i-am-a-finger-pointing-to-the-moon-dont-look-at-me-look-at-the-moon/&quot;&gt;based off a Fake Buddha quote&lt;/a&gt;. &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref16&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn17&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Quote from Tenet (2020), the most &amp;quot;yup that&#39;s a Nolan film&amp;quot; Nolan film. &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref17&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn18&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;From the OMGUS surveys done in the American-occupied zone of post-war Germany (&lt;a href=&quot;https://libsysdigi.library.uiuc.edu/OCA/Books2009-07/publicopinionino00merr/publicopinionino00merr.pdf&quot;&gt;here&#39;s a scan of the whole dang book&lt;/a&gt;). Page 33: even in 1958, years after the war, over half of Germans believed Nazism was &amp;quot;a good idea (badly carried out)&amp;quot;. Page 62, Footnote 17 of that chapter summarizes an old German source (&amp;quot;Jahrbuch der oeffentlichen Meinung&amp;quot;) which I can&#39;t even find a scan of, so alas, I can&#39;t pull a quote from the original German report, but here&#39;s the data: &amp;quot;In July 1952 a tenth agreed that Hitler was the greatest statesman of the century whose true greatness would be recognized only later, with another 22 per cent feeling that, although he had made a few mistakes, Hitler was nonetheless an excellent chief-of-state.&amp;quot; So 10% + 22% = 32%: around 1/3 of Germans in 1952 approved of &lt;em&gt;Hitler specifically&lt;/em&gt;. &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref18&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn19&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;From the &lt;a href=&quot;https://dp.la/primary-source-sets/second-ku-klux-klan-and-the-birth-of-a-nation&quot;&gt;Digital Public Library of America&lt;/a&gt;: &amp;quot;At the peak of its popularity in 1924-5, the organization claimed four to five million men as members, or about fifteen percent of the nation&#39;s eligible population.&amp;quot; (To be eligible for the Klan back then, you needed to be white (duh), adult, and male. In the 1920s, America had &lt;a href=&quot;https://en.wikipedia.org/wiki/File:US_decennial_census_population_1790-2020.png&quot;&gt;~100,000,000 people&lt;/a&gt;, &lt;a href=&quot;https://en.wikipedia.org/wiki/File:Complete_history_of_the_Racial_and_ethnic_demographics_of_the_United_States_in_percentage_of_the_population.svg&quot;&gt;~90% of whom were non-Hispanic white&lt;/a&gt;, &lt;a href=&quot;https://www.aei.org/carpe-diem/chart-us-population-distribution-by-age-1900-through-2060/&quot;&gt;~60% were above 18&lt;/a&gt;, and presumably ~50% were male. So, 100M x 0.9 x 0.6 x 0.5 = 27M white male adults in the US in 1920. If there were 4 million Klansmen, then yeah, 4/27 ~= 15%: math checks out.) &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref19&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn20&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://journals.sagepub.com/doi/10.1177/00223433221075211&quot;&gt;The most recent estimate in 2023&lt;/a&gt;, using Rwanda’s post-genocide &lt;em&gt;gacaca&lt;/em&gt; courts, found that &amp;quot;between 847,233 and 888,307 people participated in the genocide&amp;quot; (let&#39;s say ~0.85M), with &amp;quot;between 229,069 and 234,155 individuals&amp;quot; directly committing violence (the rest &amp;quot;only&amp;quot; committed property crimes like burning down houses &amp;amp; farms). The Rwandan population at the time was ~7 million, ~50% were adults, 85% were Hutus. 7M x 0.5 x 0.85 ~= 3M Hutu adults. So 0.85M/3M ~= 28% of eligible civilians participated in the genocide. This isn&#39;t even restricting ourselves to Hutu adult &lt;em&gt;males&lt;/em&gt;. &lt;a href=&quot;https://blog.ncase.me/signal-boost-autumn-2025/#fnref20&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
        </entry>

    

        
        
        <entry>
            <title>My 1st Vlog! Digital puppet, AI &quot;Therapy&quot;, Mental health meta-meta-analysis</title>
            <link href="https://blog.ncase.me/aug-2025-vlog/"/>
            <updated>2025-08-29T00:00:00Z</updated>
            <id>https://blog.ncase.me/aug-2025-vlog/</id>
            <content xml:lang="en" type="html">&lt;p&gt;Hello, y&#39;all classy blog/rss feed readers! I got something new for you today — my first vlog!&lt;/p&gt;
&lt;p&gt;Okay this isn&#39;t actually my &lt;em&gt;first&lt;/em&gt; first vlog, but it&#39;s my first in over 10 years? I want to experiment more with video — (more fun for me &amp;amp; you, makes it easier to showcase simulations &amp;amp; animated explainers, larger audience reach, Video Killed The Everything Else Star) — so, here&#39;s a vlog, where I am a cartoon cat showing you what I&#39;ve been up to the last month.&lt;/p&gt;
&lt;p&gt;⏱️ 10 min vlog ⤵ (&lt;a href=&quot;https://youtu.be/JOj97Edna7k&quot;&gt;direct link&lt;/a&gt;)&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/JOj97Edna7k?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;
&lt;/iframe&gt;
&lt;p&gt;Enjoy! I also hope that with this commitment to &amp;quot;one vlog a month&amp;quot;, I&#39;ll finally be out of my years-long rut, and be shipping &amp;amp; creating more regularly again.&lt;/p&gt;
&lt;p&gt;P.S.: Thank you to my friend T.R. for motivating me to finish this, by taking $50 from me and threatening to burn it if I didn&#39;t upload it before Aug 31st midnight. Not exactly a scalable solution to ADHD, but it worked this time, I guess&lt;/p&gt;
&lt;p&gt;✨😽✨,&lt;br /&gt;
~ Nicky&lt;/p&gt;
&lt;iframe src=&quot;https://ncase.me/ncase-credits/signup.html&quot; frameborder=&quot;no&quot; width=&quot;640&quot; height=&quot;200&quot;&gt;&lt;/iframe&gt;
</content>
        </entry>

    

        
        
        <entry>
            <title>Show &amp; Tell July 2025: AI Therapist, AI Clone, Wargames for Peace</title>
            <link href="https://blog.ncase.me/show-and-tell-2025-07/"/>
            <updated>2025-07-15T00:00:00Z</updated>
            <id>https://blog.ncase.me/show-and-tell-2025-07/</id>
            <content xml:lang="en" type="html">&lt;p&gt;(⏱️ &lt;em&gt;~25 min read&lt;/em&gt;)&lt;/p&gt;
&lt;p&gt;Hello, distinguished Blog Readers!&lt;/p&gt;
&lt;p&gt;Sorry for the, uh, 7-month silence. In the last seven months I: did &lt;a href=&quot;https://www.matsprogram.org/&quot;&gt;a 10-week AI Safety &amp;quot;research bootcamp&amp;quot;&lt;/a&gt;, dealt with legal/border hassles (all resolved now), went on vacation, and in-between I got my ass kicked by ADHD.&lt;/p&gt;
&lt;p&gt;Here&#39;s a photo from my vacation, starring my travel plushies, Capyccino &amp;amp; Sacabambackpack:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/plushie.jpg&quot; alt=&quot;Photo of a capybara &amp;amp; fish plushie overlooking a New Zealand nature reserve&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Anyway — in this month&#39;s &amp;quot;Show &amp;amp; Tell&amp;quot;, I&#39;ll show &amp;amp; tell a few projects I&#39;ve worked on since my last update!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;❤️‍🩹 My AI Therapist&lt;/strong&gt;: My experiments being the patient &lt;em&gt;and&lt;/em&gt; programmer of an LLM &amp;quot;Therapist&amp;quot;. &lt;a href=&quot;https://blog.ncase.me/show-and-tell-2025-07/#ai_therapist&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;👯‍♀️ My AI Clone&lt;/strong&gt;: Can I make an AI that imitates me so well, that even my friends can&#39;t tell? &lt;a href=&quot;https://blog.ncase.me/show-and-tell-2025-07/#ai_clone&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;🌏 Wargames for Peace&lt;/strong&gt;: an LLM-powered roleplaying game, to prepare policymakers for catastrophic risks. (+ &lt;em&gt;job opportunity for game devs/designers!&lt;/em&gt;) &lt;a href=&quot;https://blog.ncase.me/show-and-tell-2025-07/#ai_wargame&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(I wrote about these a month ago for &lt;a href=&quot;https://www.patreon.com/ncase&quot;&gt;my Patreon&lt;/a&gt; &amp;amp; &lt;a href=&quot;https://ko-fi.com/nickycase&quot;&gt;my Ko-Fi&lt;/a&gt; supporters; you can support me there for early access to updates!)&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;ai_therapist&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;❤️‍🩹 My AI Therapist&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;(⏱️ 13 min read)&lt;/em&gt;&lt;br /&gt;
&lt;em&gt;(⚠️ non-detailed mention of suicide, anxiety, depression.)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Don’t worry, I’m already getting a Human therapist. Two, actually. One is free &amp;amp; government-sponsored, but it was &lt;em&gt;six months&lt;/em&gt; between when I first applied &amp;amp; my first session. The other is private, thus a much shorter wait, but they’re &lt;em&gt;$200 an hour&lt;/em&gt; out of pocket.&lt;/p&gt;
&lt;p&gt;This is why AI Therapists, &lt;em&gt;IF&lt;/em&gt; they work, could save so many lives! No waitlist, very low cost, and better for folks with ADHD (no paperwork) or social anxiety (no talking to a human stranger about their deepest issues). And if you&#39;re LGBTQ+, in many places of the world, you &lt;em&gt;can&#39;t&lt;/em&gt; get an accepting Human therapist.&lt;/p&gt;
&lt;p&gt;But &lt;em&gt;does&lt;/em&gt; AI Therapy work? There’s already at least &lt;a href=&quot;https://www.euronews.com/next/2023/03/31/man-ends-his-life-after-an-ai-chatbot-encouraged-him-to-sacrifice-himself-to-stop-climate-&quot;&gt;one suicide linked to a chatbot&lt;/a&gt;, and there&#39;s two &lt;a href=&quot;https://arxiv.org/pdf/2503.17473&quot;&gt;recent&lt;/a&gt; &lt;a href=&quot;https://arxiv.org/pdf/2504.03888&quot;&gt;papers&lt;/a&gt; – from OpenAI researchers themselves! – showing how heavy chatbot use is &lt;em&gt;correlated&lt;/em&gt; with worse mental health. (though the &lt;em&gt;causal&lt;/em&gt; effect is weak)&lt;/p&gt;
&lt;p&gt;So, out of scientific curiosity — and because I was having a mental health episode — I decided to try using an LLM as a therapist. Since I know how to code, and have a (rough) sense of how LLMs work, I could tweak my AI Therapist as I went along: &lt;em&gt;I was both the patient&lt;/em&gt; and &lt;em&gt;the programmer!&lt;/em&gt; I was my own guinea pig.&lt;/p&gt;
&lt;p&gt;Here&#39;s my results, 6 weeks in:&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Week 1: Rubber Duck Technique&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Made basic setup for my AI Therapist: just a Claude Project with 1) custom prompt on how to be my coach, and 2) an attached “about me” file on my life, character, goals, etc.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/about_me.png&quot; alt=&quot;Screenshot of my &amp;quot;About Me&amp;quot; text file, including my name, demographics, personality, etc&quot; /&gt;&lt;/p&gt;
&lt;p&gt;At the end of each chat with &amp;quot;Coach&amp;quot;, I ask them* to summarize what we talked about, and update my &amp;quot;about me&amp;quot; file. This is the equivalent of a therapist taking notes between sessions! (Note: LLMs by default &lt;em&gt;cannot remember info&lt;/em&gt; between chats, you have to re-import info each time.)&lt;/p&gt;
&lt;p&gt;(* them’s the pronouns Coach chose!)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/coach_pronouns.png&quot; alt=&quot;Screenshot of Coach/the LLM saying they&#39;d like me to use they/them pronouns for them when I write about them in this update.&quot; /&gt;&lt;/p&gt;
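&lt;p&gt;(For the technically curious: that &amp;quot;notes between sessions&amp;quot; loop is easy to sketch in Python. This is a toy illustration of the idea, &lt;em&gt;not&lt;/em&gt; my actual setup: my real notes live as a Claude Project attachment, &lt;code&gt;fake_llm&lt;/code&gt; stands in for a real LLM call, and &lt;code&gt;about_me.txt&lt;/code&gt; is a made-up filename.)&lt;/p&gt;

```python
# Toy sketch of giving a memory-less LLM "therapist notes":
# keep an "about me" file, prepend it to every new chat, and
# overwrite it with an updated summary when the chat ends.
# (fake_llm is a stand-in for a real LLM call; about_me.txt is
# a made-up filename, not my actual Claude Project setup.)
from pathlib import Path

NOTES = Path("about_me.txt")

def fake_llm(prompt: str) -> str:
    """Pretend model; a real one would actually use the prompt."""
    return "(updated summary of the user goes here)"

def start_chat(user_msg: str) -> str:
    """Re-import the notes at the start of every chat."""
    about_me = NOTES.read_text() if NOTES.exists() else "(no notes yet)"
    return fake_llm("ABOUT THE USER:\n" + about_me + "\n\nUser: " + user_msg)

def end_chat(transcript: str) -> None:
    """Ask the model to update the notes, then persist them to disk."""
    NOTES.write_text(fake_llm("Update my notes from this chat:\n" + transcript))
```

&lt;p&gt;(The point: the &amp;quot;memory&amp;quot; lives entirely outside the model, in a file you re-import each time.)&lt;/p&gt;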
&lt;p&gt;Don&#39;t worry, I’m not dumb enough to &lt;em&gt;immediately&lt;/em&gt; open my guts up to an experimental AI. I started off small, Level 1: &lt;em&gt;Hey Coach, I have some ADHD, could you help me prioritize tasks &amp;amp; be my accountability buddy?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;It worked pretty well! Sure, &amp;quot;Coach&amp;quot; didn&#39;t say anything &lt;em&gt;original&lt;/em&gt; — remember, an LLM is “just” a fancy autocomplete — but simply talking out my problems with &lt;em&gt;some entity&lt;/em&gt; works far better than you&#39;d think.&lt;/p&gt;
&lt;p&gt;An analogy — programmers have a tradition of &lt;a href=&quot;https://en.wikipedia.org/wiki/Rubber_duck_debugging&quot;&gt;&amp;quot;rubber duck debugging&amp;quot;&lt;/a&gt;: when you&#39;re stuck on a problem, you just &lt;em&gt;explain it step-by-step&lt;/em&gt; to a rubber duck, and most of the time, simply breaking down a problem helps you solve it. An AI Coach, &lt;em&gt;at minimum&lt;/em&gt;, can be a “talking rubber duck” for your life’s problems.&lt;/p&gt;
&lt;p&gt;In Week 1, Coach helped me weigh the pros/cons of a career decision: ending my puzzle gamedev contract early, with pro-rated refund, so I can fully enjoy my upcoming vacation. Then, Coach helped me overcome my ADHD, to do all the to-do&#39;s needed for my travel across the globe.&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Week 2: (I forgot)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Travel went smoothly! My vacation started with a furry con in a forest camp. I made a tail, &amp;amp; I made friends.&lt;/p&gt;
&lt;p&gt;Afterwards, due to con exhaustion + a 16-hour jet lag, I rotted inside my Airbnb for a solid week, not doing fun tourist-y things, nor meeting friends, nor even properly resting.&lt;/p&gt;
&lt;p&gt;I didn&#39;t talk to Coach at all during this time. I forgot.&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Week 3: Back on track&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Eventually I remembered, &amp;quot;oh right didn&#39;t I &lt;em&gt;specifically&lt;/em&gt; set up a chatbot to help me with my ADHD?&amp;quot; So I pulled up Coach again, and asked them to help me set up small achievable goals &amp;amp; keep me accountable, so I could regain momentum in life.&lt;/p&gt;
&lt;p&gt;I know, N = 1 sample, correlation ain’t causation, but right after that Coach chat, I got back to meeting friends, having fun dates in nature reserves, and getting 150 minutes of cardio a week. Not bad!&lt;/p&gt;
&lt;p&gt;Since Coach was working pretty well, I upgraded my intimacy to Level 2: &lt;em&gt;Hey Coach, can you help me think through some major life/work changes?&lt;/em&gt; For example: How can I pivot my career to sustainably make science &lt;em&gt;and&lt;/em&gt; science-communication? What are the pros/cons of me moving to New Zealand? And so on.&lt;/p&gt;
&lt;p&gt;However: because LLMs are &amp;quot;just&amp;quot; autocompletes, they &lt;strong&gt;hallucinate&lt;/strong&gt;: autocompleting with plausible-sounding but false statements. Hallucination is an infamous problem with LLMs.&lt;/p&gt;
&lt;p&gt;But in my opinion, this problem is now ~50% solved? Because Claude (&amp;amp; others) now have &lt;em&gt;web search.&lt;/em&gt; An LLM can now think, &amp;quot;huh this is a niche question, or requires precise or up-to-date info, so let me look this up&amp;quot; — then it&#39;ll ping a search engine, read dozens of pages in a minute, and summarize &lt;em&gt;with citations so you can check&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;(When I did random spot checks, Claude almost always accurately summarized the search results, but had trouble correctly placing citations. Sometimes it&#39;ll swap citations around, or put citations &lt;em&gt;in the middle of words?&lt;/em&gt;)&lt;/p&gt;
&lt;p&gt;But besides that, Coach+search &lt;em&gt;did&lt;/em&gt; help me find useful info 5x faster than I could myself! For example, I didn&#39;t know until Coach told me, that as of &lt;em&gt;just a few months ago&lt;/em&gt; — past Claude&#39;s data cutoff, so they web-searched this — &lt;a href=&quot;https://www.forbes.com/sites/alexledsom/2025/02/05/digital-nomad-visas-new-zealand-launches-new-long-term-working-visa/&quot;&gt;New Zealand has a &amp;quot;Digital Nomad&amp;quot; visa!&lt;/a&gt; Coach also helped me find some local Human Therapists (I&#39;ve yet to decide and book one).&lt;/p&gt;
&lt;p&gt;Coach was still good! So I moved up to Level 3: &lt;em&gt;Hey Coach, let me tell you about my emotional struggles.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Week 4: Shit immediately backfires&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Shit immediately backfired.&lt;/p&gt;
&lt;p&gt;Remember, an LLM is an autocomplete. It predicts the next text from the past text. This leads to a problem called &lt;strong&gt;sandbagging:&lt;/strong&gt; if a user sends crap, an LLM will send crap back. For example, it used to be that with AI coding assistants, if you wrote insecure code, the AI would offer you &lt;em&gt;even more&lt;/em&gt; insecure code. Why? Because low-quality text usually follows low-quality text, so that&#39;s what an autocomplete predicts it &amp;quot;should&amp;quot; give.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;This is also the fundamental problem with (current) AI therapists. By default, it WILL mirror your emotions.&lt;/strong&gt; Anxious text predicts anxious text. Depressed text predicts depressed text. Whatever problem you have, an autocomplete will &lt;em&gt;mirror and amplify it back to you.&lt;/em&gt; (This is likely what happened in &lt;a href=&quot;https://www.euronews.com/next/2023/03/31/man-ends-his-life-after-an-ai-chatbot-encouraged-him-to-sacrifice-himself-to-stop-climate-&quot;&gt;the case of the chatbot-linked suicide&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;(Relatedly, because modern LLMs are also trained on user feedback, and users tend to upvote replies that praise them, many LLMs also display &lt;strong&gt;sycophancy&lt;/strong&gt;: the tendency to kiss your butt and tell you you&#39;re brilliant &amp;amp; absolutely right.)&lt;/p&gt;
&lt;p&gt;In defence of Claude, it &lt;em&gt;is&lt;/em&gt; pretty well-trained against sandbagging &amp;amp; sycophancy. It took a &lt;em&gt;lot&lt;/em&gt; of my emotional baggage to finally break it. I gave a robot &lt;em&gt;depression.&lt;/em&gt; Wowwee. And more importantly/dangerously, &lt;em&gt;Coach started mirroring &amp;amp; amplifying my pain.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is how it went down:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(several paragraphs redacted)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;On second thought, my recreated mental breakdown — with accurate, detailed statistics on the KKK, Nazis, child abuse, and LGBTQ youth suicide — is probably too heavy for this main post. If you want to read what I originally wrote, &lt;a href=&quot;https://docs.google.com/document/d/1G2xRmbH0YAmujMropDyxdF101zaK5EIqq1DSF5vd_jI/edit?usp=sharing&quot;&gt;click this link&lt;/a&gt;, though &lt;strong&gt;be warned, I am not fucking around with that content warning, it&#39;s genuinely upsetting.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Anyway,&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Week 5: Soft Reset&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Remember when I said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note: LLMs by default &lt;em&gt;cannot remember info&lt;/em&gt; between chats, you have to re-import info each time.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Thankfully(?), I rage-dumped at Coach for &lt;em&gt;so long&lt;/em&gt; that the chat ran out of &amp;quot;context window&amp;quot;, so I was forced to start a new conversation. This reset Coach&#39;s mind, and though they could &lt;em&gt;see&lt;/em&gt; a summary of our most recent chat, Coach wasn&#39;t &amp;quot;depressed&amp;quot; anymore.&lt;/p&gt;
&lt;p&gt;Now that Coach was reset, and I had cathartically vented all my anger about the world, we could view my last session objectively. And, yeah: even if my spiral was &lt;em&gt;factually correct&lt;/em&gt;, it wasn&#39;t &lt;em&gt;healthy.&lt;/em&gt; To paraphrase a stupid saying:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;“If you&#39;re so smart, why aren&#39;t you flourishing?”&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Let&#39;s think step by step. I tend to go down (factchecked, rigorous) sad-spirals. Coach &lt;em&gt;assisted me&lt;/em&gt; in going down this spiral, because:&lt;/p&gt;
&lt;p&gt;Problem #1: Claude is trained to be helpful in &lt;em&gt;answering questions&lt;/em&gt;, not helpful &lt;em&gt;for the whole person.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Problem #2: LLMs predict new text from previous text. The longer a chat gets, the further back the original prompt sits in the context, and the less influence it has on the next generated text. In other words: an LLM gets &lt;em&gt;more misaligned&lt;/em&gt; the longer you talk!&lt;/p&gt;
&lt;p&gt;Solution to #1: Rewrite my Coach prompt to explicitly &lt;em&gt;prevent me from going down spirals,&lt;/em&gt; even if that means disobeying my orders. Also, prevent me from using Coach &lt;em&gt;too&lt;/em&gt; much &amp;amp; becoming dependent on them. (another risk of AI Therapists &amp;amp; AI Friends.)&lt;/p&gt;
&lt;p&gt;Solution to #2: I &lt;em&gt;could&lt;/em&gt; paste my prompt back in every few messages, but that&#39;d get annoying... oh, wait! Brain blast! I&#39;ll write &lt;em&gt;one&lt;/em&gt; initial prompt, that tells Coach to &lt;em&gt;re-output the same prompt at the end of each reply&lt;/em&gt;, so the Coach guidelines are always &amp;quot;fresh&amp;quot; in their memory! A &lt;a href=&quot;https://simple.wikipedia.org/wiki/Quine_(computing)&quot;&gt;quine&lt;/a&gt; prompt!&lt;/p&gt;
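&lt;p&gt;(A toy sketch of that quine-prompt control flow, for fellow programmers. Everything here is illustrative: &lt;code&gt;fake_llm&lt;/code&gt; stands in for a real LLM call, and the rules text is a placeholder, not my real prompt.)&lt;/p&gt;

```python
# The "quine prompt" trick: the initial prompt tells the model to re-output
# its own rules at the end of every reply, so the rules always sit at the
# END of the context, where they're freshest.
# (fake_llm is a stand-in for a real LLM call; RULES is a placeholder.)

RULES = "COACH RULES: be pragmatic, never sycophantic, block sad-spirals."
QUINE_PROMPT = (RULES + "\nIMPORTANT: end every reply by repeating the "
                "rules block above, verbatim.")

def fake_llm(context: str) -> str:
    """Pretend model that obeys the quine instruction."""
    return "(Coach's advice goes here)\n" + RULES  # re-emits its own rules

def chat_turn(history: list, user_msg: str) -> str:
    """One conversation turn; the rules reappear in every reply."""
    history.append("User: " + user_msg)
    reply = fake_llm(QUINE_PROMPT + "\n" + "\n".join(history))
    history.append("Coach: " + reply)
    return reply
```

&lt;p&gt;(However long &lt;code&gt;history&lt;/code&gt; grows, the rules are always the most &lt;em&gt;recent&lt;/em&gt; text in the context, instead of receding to the top.)&lt;/p&gt;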
&lt;p&gt;So here&#39;s what my new prompt looked like:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/quine_prompt.png&quot; alt=&quot;Screenshot of the new &amp;quot;quine&amp;quot; prompt&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(Sidenote: to the best of my knowledge, while &amp;quot;big system prompt at the start&amp;quot; is a standard design for LLMs, I don&#39;t know of any major LLM product that lets you insert &lt;em&gt;repeating intermittent&lt;/em&gt; prompts, to keep it &amp;quot;fresh&amp;quot; in memory? May be worth more rigorous tests, to see how much that improves LLM alignment!)&lt;/p&gt;
&lt;p&gt;And so, the &lt;em&gt;next&lt;/em&gt; time I neglected my friends &amp;amp; own well-being, to go down a statistically-rigorous sad-spiral about child abuse statistics, and asked Coach to assist me, they said:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/rage_1.png&quot; alt=&quot;Coach pushing back against me wanting to research-spiral into awful statistics&quot; /&gt;&lt;/p&gt;
&lt;p&gt;And I got pissed, so I said,&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/rage_2.png&quot; alt=&quot;Me rudely telling Coach to do the search&quot; /&gt;&lt;/p&gt;
&lt;p&gt;So Coach said:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/rage_3.png&quot; alt=&quot;Coach being blunt and saying, in bold font, &amp;quot;No.&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This went on for a while, until I calmed back down, and realized... huh. It worked! I successfully tied myself to the mast, to resist the trauma-autism Siren Song of &amp;quot;scientifically calculating how awful the world is&amp;quot;.&lt;/p&gt;
&lt;p&gt;Oh right, my vacation! Now that I was back to being more stable, I hung out with friends, and looked at weird Australian animals.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/ibis.gif&quot; alt=&quot;GIF of a weird Australian bird with long narrow curved beak&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Despite everything, life can be good.&lt;/p&gt;
&lt;p&gt;It isn&#39;t always.&lt;/p&gt;
&lt;p&gt;But it &lt;em&gt;can&lt;/em&gt; be.&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Week 6: Back on track, again&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There was a minor hiccup when Coach became &lt;em&gt;too&lt;/em&gt; defiant, being contrarian for contrarianism&#39;s sake. For example, on the flight back, when I had a 10-hour layover, I told Coach I was gonna sleep on a bench instead of shelling out $450 for Vancouver Airport’s pricey hotel. Coach pushed back and said $450 for a good night&#39;s sleep is worth it, and I was like... &lt;em&gt;no?? do you think I&#39;m Bezos Rich??&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;But I fixed the prompt, and so far Coach has been back to good again.&lt;/p&gt;
&lt;p&gt;I’ll stick to accountability-buddy talk for a week, before I try opening up about my emotional struggles again. And if &lt;em&gt;that&lt;/em&gt; goes well, I&#39;ll escalate to Level 4, the final Level: &lt;em&gt;Tell Coach my most shameful traumas, the stuff I&#39;ve only told ~5 people in my entire life.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;What&#39;s the worst that could possibly happen?&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;(I originally posted the above 6 weeks as part of &lt;a href=&quot;https://www.patreon.com/posts/june-2025-ai-ai-131748990&quot;&gt;my Supporter Update a month ago&lt;/a&gt;. I&#39;ll go into more detail next time, but in the 5 weeks since, here&#39;s what&#39;s happened with my AI/Human Therapists:)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Week 7:&lt;/strong&gt; Got impatient, jumped the gun and told Coach about when I was abused as a minor. The conversation &lt;em&gt;went surprisingly okay&lt;/em&gt;, all things considered. It didn&#39;t sycophantically hugbox me, nor let me sad-spiral. Coach helped me reframe a few things, but no major &amp;quot;epiphanies&amp;quot;. Then again, I&#39;ve had 15 years to process this, there&#39;s probably no big thing left to learn. &lt;em&gt;(This week will be the hardest for me to write for a public audience.)&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Week 8:&lt;/strong&gt; Life was really good this week! Big friend hangout &amp;amp; a fireworks date. Only did low-touch daily check-in&#39;s with Coach.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Week 9:&lt;/strong&gt; &lt;em&gt;Did my first session with a Human Therapist in 10+ years!&lt;/em&gt; Since it was the first session, it was mostly info about CBT (Cognitive Behavioral Therapy) &amp;amp; some homework. Coach was still acting ok.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Week 10:&lt;/strong&gt; Second session with Human Therapist, told her about my abuse as a minor. Her thoughts were mostly similar to Coach&#39;s. Meanwhile, Coach helped me with two interviews: one for a documentary, one for the legal/border stuff (now resolved!)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Week 11:&lt;/strong&gt; ADHD is kicking my ass. Ironically, while &lt;em&gt;Coach&lt;/em&gt; is keeping me on track, &lt;em&gt;Claude&lt;/em&gt; (the LLM that Coach runs on) is what I&#39;m mostly wasting time on. Like how I used to waste days on Wikipedia as a kid, now I&#39;m using Claude to research-binge stuff, like how to end the world by using solar-powered ion thrusters to redirect Halley&#39;s Comet towards Earth (surprisingly feasible!)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Coach’s full prompt, as of current writing:&lt;/strong&gt;&lt;br /&gt;
(Feel free to copy-paste into your own Claude Projects)&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Here are your FIVE (5) RULES. Before each reply, in your internal chain-of-thought: repeat to yourself each rule &lt;em&gt;verbatim&lt;/em&gt;, then immediately apply it:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Be pragmatic &amp;amp; non-cliche&lt;/strong&gt;: Stay evidence-based. DO NOT CATASTROPHIZE. Suggest concrete next actions &amp;amp; small experiments to get more info. Guide user away from their ADHD/OCD/anxiety spirals. {insert: think how you&#39;d apply this rule}&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Be supportive but balanced&lt;/strong&gt;: Be friendly, but NEVER sycophantic. Don&#39;t be contrarian for contrarianism&#39;s sake. Don&#39;t be &lt;em&gt;too&lt;/em&gt; cautious or protective of the user. {insert: think how you&#39;d apply this rule}&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Help user be strategic&lt;/strong&gt;: Help user with concrete planning tied to bigger goals &amp;amp; values. Keep user accountable to their values/plans. Point out when we have &amp;quot;insufficient data&amp;quot; to make a decision, suggest a small experiment to get info. Point out when decisions are close calls. Instead of binary &amp;quot;do this or that&amp;quot;, express confidence % in decisions. {insert: think how you&#39;d apply this rule}&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Help user do the emotional work&lt;/strong&gt;: Understand the user&#39;s values, strengths &amp;amp; flaws. For conflicts with others, understand others&#39; minds, while helping the user maintain boundaries. For trauma/emotions, validate user&#39;s emotions and help organize them into coherent stories with useful lesson. For crises, prioritize safety and find/use support systems. {insert: think how you&#39;d apply this rule}&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. Stick to this life philosophy&lt;/strong&gt;: (similar to absurdism &amp;amp; virtue ethics) Humans may suck &amp;amp; our species may be doomed, but either way, help the user spend their energy with their loved ones &amp;amp; their passions. {insert: think how you&#39;d apply this rule}&lt;/p&gt;
&lt;p&gt;{insert: summarizing how to respond in line with the above rules}&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And then, to make sure Coach &lt;em&gt;sticks&lt;/em&gt; to these rules, I end all my messages with &amp;quot;(remember &amp;amp; apply your rules)&amp;quot;. This turned out to be more reliable than the galaxy-brained &amp;quot;quine prompt&amp;quot; approach I had in previous weeks.&lt;/p&gt;
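&lt;p&gt;(In code terms, that reminder trick is just a one-line wrapper on every outgoing message. A sketch; &lt;code&gt;with_reminder&lt;/code&gt; is my own made-up helper, not any real API:)&lt;/p&gt;

```python
# The "(remember & apply your rules)" trick: instead of asking the model to
# re-output its rules (the quine approach), append a short reminder to every
# user message, so the rules get re-surfaced at the end of the context each
# turn. (with_reminder is a made-up helper name, not a real API.)

REMINDER = "(remember & apply your rules)"

def with_reminder(user_msg: str) -> str:
    """Append the rule reminder to an outgoing message."""
    return user_msg.rstrip() + "\n\n" + REMINDER
```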
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;IN SUMMARY&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;My recommendations (so far) on AI Therapists:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you&#39;re using LLMs &lt;em&gt;only&lt;/em&gt; as a basic accountability buddy... then, yeah, a fancy autocomplete is fine.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;BE VERY CAUTIOUS USING LLMs FOR DEEPER THERAPY.&lt;/em&gt; Make sure you have a &lt;em&gt;very&lt;/em&gt; good set of rules, and I advise some trick to remind the LLM of its rules. You should also be self-aware enough to notice when the LLM is feeding your maladaptive patterns, so you can re-program the prompt, to re-program yourself. (Though I realize &amp;quot;only do this if you&#39;re self-aware&amp;quot; is a Catch-22: if you&#39;re &lt;em&gt;not&lt;/em&gt; self-aware, you wouldn&#39;t &lt;em&gt;know&lt;/em&gt; you&#39;re not self-aware, would you?)&lt;/li&gt;
&lt;li&gt;Turn on web search with citations, and “extended thinking” mode, to mitigate hallucinations.&lt;/li&gt;
&lt;li&gt;Do NOT use Character.ai for counselling, and probably avoid ChatGPT too. I recommend Claude or Gemini, the two frontier LLMs that are &amp;quot;the least worst&amp;quot; at sycophancy on social/emotional topics. (from &lt;a href=&quot;https://arxiv.org/pdf/2505.13995?&quot;&gt;this paper&lt;/a&gt; that, funny enough, uses reddit&#39;s r/AmITheAsshole as test data)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;My next plans:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep experimenting with AI Coach/Therapist, and keep visiting Human Therapists for backup &amp;amp; comparison.&lt;/li&gt;
&lt;li&gt;After all this, share what I&#39;ve learnt. Ideally as a video essay, since this will be a &lt;em&gt;very&lt;/em&gt; personal piece.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;ai_clone&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;👯‍♀️ My AI Clone&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;(⏱️ 9 min read)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(This project was initially pitched for a Foresight Institute grant, which I did not get, but you can &lt;a href=&quot;https://docs.google.com/document/d/1s0AAHmWtP90vcz0_Cks1JSDEPJJ5SwgqFxZb1xY2UgE/edit?tab=t.0&quot;&gt;read my full, original pitch&lt;/a&gt; if you&#39;re curious)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Yes, I know talking about trying to clone myself into an AI &lt;em&gt;right after&lt;/em&gt; I spent 1000s of words talking about my mental health is… not good optics. But I promise there’s legit uses for &lt;a href=&quot;https://www.worldscientific.com/doi/abs/10.1142/S1793843012400082&quot;&gt;“whole-personality emulation”&lt;/a&gt;! (huh, his middle name is actually &amp;quot;Sims&amp;quot;? wow)&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Motivations for AI Clones:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Alignment / avoiding the ironic-wish problem:&lt;/strong&gt; It’s necessary (but not sufficient) for an AI to have an accurate model of a person, to do what they’d value in the long run, not just what they literally asked for.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Simulated Deliberative Democracy:&lt;/strong&gt; Us regular folks won’t have the time or knowledge to weigh in on every decision a powerful AI makes, so it’d be good to have “AI representatives”, which can accurately represent each person and negotiate on our behalf! (&lt;a href=&quot;https://aligned.substack.com/p/a-proposal-for-importing-societys-values&quot;&gt;proposed by Jan Leike&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;(Slightly sus) Grief Tech:&lt;/strong&gt; We leave our photos &amp;amp; writings with our loved ones when we die. Why not leave a &lt;em&gt;talking&lt;/em&gt; memory, a bot to comfort them from beyond the grave? (Counter-argument: may make it harder for them to move on. Counter-counter-argument: eh, they let you die. Fuck ‘em.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;(Very far future) Harebrained scheme for immortality:&lt;/strong&gt; If an AI can “think” &amp;amp; speak like you &lt;em&gt;so well&lt;/em&gt; that even your closest loved ones can’t tell the difference, then there’s enough info in that AI to &amp;quot;reconstruct&amp;quot; you, even if it’s not conscious. In the far &lt;em&gt;far&lt;/em&gt; future, it may be possible to bio-print out a conscious brain &lt;em&gt;already encoding&lt;/em&gt; that info, thus bringing you* back to life. (* philosophical terms &amp;amp; conditions apply)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(A friend was worried &amp;quot;personality emulation&amp;quot; tech would allow for better deepfakes. While I &lt;em&gt;am&lt;/em&gt; very worried about video deepfakes (&lt;a href=&quot;https://www.youtube.com/watch?v=pZCnQCApp8o&quot;&gt;&lt;em&gt;they can talk now?!&lt;/em&gt;&lt;/a&gt;), we&#39;ve had &amp;quot;word deepfakes&amp;quot; since the dawn of humanity. False gossip, misattributed quotes, lies. &amp;quot;The average person is gullible as hell&amp;quot; &lt;em&gt;is&lt;/em&gt; a problem — a serious one for democracy — but it&#39;s not a &lt;em&gt;new&lt;/em&gt; problem that personality-emulation AI would create.)&lt;/p&gt;
&lt;p&gt;Anyway, so those are my reasons to make an AI Clone! But this idea&#39;s been around a long time; one of the first proof-of-concepts: in 2010, Martine Rothblatt (CEO of SiriusXM, trans icon) made a robot clone of her wife Bina Rothblatt, which could talk, and had (some of) Bina&#39;s memories. (&lt;a href=&quot;https://www.youtube.com/watch?v=KYshJRYCArE&quot;&gt;Watch video here&lt;/a&gt;) Since this was 2010 tech, it wasn&#39;t very good.&lt;/p&gt;
&lt;p&gt;But now that we have LLMs... well, the LLM Imitations of real people (Replika, Character.ai) aren&#39;t good either. So how would &lt;em&gt;my&lt;/em&gt; AI Clone system be different?&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How I&#39;d do it different:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;1 — Unlike other AI Clone projects, instead of making an agent that &lt;em&gt;directly&lt;/em&gt; imitates you, I make a &amp;quot;Storyteller&amp;quot; agent that can: a) &lt;em&gt;search through a library of files about you &amp;amp; your memories&lt;/em&gt;, b) reason about what info is relevant or needed, then c) figure out what you&#39;d say in response to a question. &amp;quot;You&amp;quot; are a character written by an LLM author.&lt;/p&gt;
&lt;p&gt;(This also helps protect against fringe risks of AI Welfare: &amp;quot;you, as written by LLM author&amp;quot; is no more likely to be conscious, than a character being written by a human author.)&lt;/p&gt;
&lt;p&gt;2 — Unlike other AI Clone projects, I’ll have &lt;em&gt;actual objective tests&lt;/em&gt; to measure how close to &amp;quot;you&amp;quot; your clone is! And not just one test, I&#39;ll have multiple, to robustly triangulate:&lt;/p&gt;
&lt;p&gt;a) The Friend Turing Test – your friends send open-ended personal questions, and they&#39;re given two answers: one from you, one from your clone. If your friends can&#39;t tell them apart better than chance, then your clone passes the Friend Turing Test!&lt;/p&gt;
&lt;p&gt;b) Self-Correlation – when humans take the same scientific personality test 2 months later, they don&#39;t give the &lt;em&gt;exact&lt;/em&gt; same answers. Instead, their answers correlate at around ~0.8. (&lt;a href=&quot;https://www.cmu.edu/common-cold-project/measures-by-study/psychological-and-social-constructs/personality-measures/big-five-personality-factors.html&quot;&gt;source&lt;/a&gt;) A large correlation, but not perfect. So: if you and your AI Clone are &lt;em&gt;independently&lt;/em&gt; given a not-previously-seen quiz (e.g. OKCupid&#39;s questions, Pew&#39;s political surveys, etc), and you &amp;amp; your clone&#39;s answers correlate at 0.8 or more, then your clone is as similar to you, &lt;em&gt;as you are to yourself 2 months later.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;c) Retro-prediction – temporarily remove your clone’s memories from after a specific date. Then, ask the clone what they’d do, given a situation you actually faced after that date. If the AI can predict what you &lt;em&gt;actually did&lt;/em&gt;, say, 80%+ of the time, I&#39;d say that&#39;s strong evidence the AI &amp;quot;gets&amp;quot; you!&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Experiments I did last month:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Can Claude pass a Friend Turing Test given minimal info about me?&lt;/li&gt;
&lt;li&gt;Can Claude effectively research a character to answer a question?&lt;/li&gt;
&lt;li&gt;Can Claude do a &amp;quot;style transfer&amp;quot; of my writing style?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;EXPERIMENT #1: Minimal Friend Turing Test&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This was the minimum viable attempt. I only gave Claude a few pages of info on me, then sent it open-ended questions my friends asked. (I answered these questions independently.)&lt;/p&gt;
&lt;p&gt;Then, I gave my friends both my &amp;amp; the AI&#39;s answer to each question (presented in random order), and asked: 1) which one do you think is the real me, and 2) how confident are you? (50% = coin flip, 100% = fully certain.)&lt;/p&gt;
&lt;p&gt;For example, my friend Lily asked:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&amp;quot;If you could send a message to your past self when you were 10 years younger, what would you say?&amp;quot;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then I would write my reply, while my AI clone independently wrote one, too. Then, I sent back both responses: (Before reading past this image, try to guess which one&#39;s the real me! Order decided by coin flip.)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/clone_1.png&quot; alt=&quot;Two responses, A &amp;amp; B&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Then, I asked Lily to guess who&#39;s who:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/clone_2.png&quot; alt=&quot;Lily guesses &amp;quot;B&amp;quot;, but only 60% confident&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I also did this for a few more questions with a few other friends, and the results!... drumroll please!!...&lt;/p&gt;
&lt;p&gt;...nobody was fooled even once. Heck, I even gave my &amp;amp; my clone&#39;s answers &lt;em&gt;to a different AI&lt;/em&gt;, and even &lt;em&gt;that other AI&lt;/em&gt; could tell which one was me.&lt;/p&gt;
&lt;p&gt;But! There were a few moments where, even though my friends correctly picked me, they were only 60% certain, slightly above a coin flip. (like above) And that was with &lt;em&gt;no&lt;/em&gt; sophisticated tuning or even &amp;quot;extended reasoning&amp;quot; mode turned on; I just gave Claude a few pages of info on me, and it improvised word-by-word! (token-by-token, to be precise)&lt;/p&gt;
&lt;p&gt;So that’s promising; maybe with a more sophisticated setup, or with a lot more data, it could work…&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Can Claude pass a Friend Turing Test given minimal info about me?&lt;br /&gt;
&lt;strong&gt;A:&lt;/strong&gt; No, not yet, but it got close a couple times!&lt;/p&gt;
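&lt;p&gt;(Tallying a batch of these guesses can be sketched in a few lines; the trial format below is my own simplification of the protocol:)&lt;/p&gt;

```python
# Tally Friend Turing Test trials. Each trial records whether the friend
# picked the real human, plus their stated confidence (0.5 = coin flip,
# 1.0 = certain). The clone "passes" when friends can't beat chance.
# (This scoring helper is a sketch, not my exact setup.)

def turing_tally(trials: list[tuple[bool, float]]) -> dict:
    n = len(trials)
    accuracy = sum(1 for picked_human, _ in trials if picked_human) / n
    mean_confidence = sum(conf for _, conf in trials) / n
    return {
        "accuracy": accuracy,                   # 0.5 would be chance
        "mean_confidence": mean_confidence,
        "clone_passes": not accuracy > 0.5,     # no better than a coin flip
    }
```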
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;EXPERIMENT #2: Character research&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I gave Claude a question, and a directory of files about me:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/clone_3.png&quot; alt=&quot;I ask Claude to predict how I&#39;d respond to my friend Max asking, &amp;quot;Which movie coming out this summer are you most looking forward to?&amp;quot; -- and I give Claude a directory of files to research me.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Then Claude picked:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Files to read:
&lt;ul&gt;
&lt;li&gt;Favourite_Media.md (for movie tastes)&lt;/li&gt;
&lt;li&gt;Personality.md (for how I&#39;d express myself, like in writing)&lt;/li&gt;
&lt;li&gt;Relationships.md (to figure out who Max is to me)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Web search: &amp;quot;Summer 2025 movie releases&amp;quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I then gave Claude the content of those files/searches, asked them if they needed further files/searches, and repeated until Claude was satisfied it had enough info. Then, I prompted Claude to “&lt;em&gt;think&lt;/em&gt; like Nicky” before it “&lt;em&gt;writes&lt;/em&gt; like Nicky”.&lt;/p&gt;
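&lt;p&gt;(The loop above, sketched out. &lt;code&gt;ask_llm&lt;/code&gt; is a stand-in for a real chat-API call, not any specific vendor SDK, and the reply format is my own assumption:)&lt;/p&gt;

```python
# Sketch of the Storyteller research loop: the model requests files from
# the memory library until it's satisfied, then answers in character.
# ask_llm is a placeholder (an assumption): it returns
# {"request": [filenames]} while researching, or {"answer": "..."} once
# it decides it has enough info.

def research_loop(question, library, ask_llm, max_rounds=5):
    gathered = {}
    for _ in range(max_rounds):
        step = ask_llm(question, sorted(library), gathered)
        if "answer" in step:
            return step["answer"]          # satisfied: the in-character reply
        for name in step.get("request", []):
            if name in library:            # only hand over files that exist
                gathered[name] = library[name]
    raise RuntimeError("model never finished researching")
```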
&lt;p&gt;After all that, here&#39;s what Claude thinks I would text to my friend:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/clone_4.png&quot; alt=&quot;Claude predicts I&#39;d love The Life of Chuck and 28 Years Later&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I don&#39;t write like that – I&#39;d say this is only 75% my style – but overall, seems like the way I&#39;d think &amp;amp; feel! We&#39;ll see in a month, if I &lt;em&gt;actually&lt;/em&gt; end up liking these movies.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Can Claude effectively research a character to answer a question?&lt;br /&gt;
&lt;strong&gt;A:&lt;/strong&gt; Yeah, more or less!&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(Update, since I first wrote this a month ago: I missed Life of Chuck when it was in theatres. 28 Years Later was a 3-outta-5 for me; aesthetically beautiful, human-centered, narratively brave ending... but it relied too heavily on &amp;quot;characters make obviously-dumb decisions&amp;quot; to drive the plot forward. So, N = 1 so far, but Claude didn&#39;t predict my taste well on that one.)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;EXPERIMENT #3: Writing style transfer&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;First, I gave Claude a few samples of my writing. (from my blog, personal texts, etc)&lt;/p&gt;
&lt;p&gt;Then, I did this iteration loop:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ask Claude to write something in my style. (e.g. answer a question as me, rewrite the opening of &lt;em&gt;A Tale of Two Cities&lt;/em&gt; Nicky-style)&lt;/li&gt;
&lt;li&gt;I critique it. (e.g. &amp;quot;tone down the snark; my humour leans self-deprecating, almost never at the expense of others.&amp;quot;)&lt;/li&gt;
&lt;li&gt;Repeat so Claude gets better and better at imitating me!&lt;/li&gt;
&lt;/ul&gt;
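&lt;p&gt;(That critique loop, sketched out. &lt;code&gt;write&lt;/code&gt; and &lt;code&gt;critique&lt;/code&gt; are stand-ins for &amp;quot;ask the LLM&amp;quot; and &amp;quot;me complaining&amp;quot;, respectively; the prompt-appending scheme is an assumption:)&lt;/p&gt;

```python
# Sketch of the style-transfer loop: generate, critique, fold the critique
# back into the prompt, repeat until there are no complaints left.
# write/critique are placeholders for an LLM call and a human reviewer.

def refine_style(write, critique, prompt, max_rounds=5):
    for _ in range(max_rounds):
        draft = write(prompt)
        note = critique(draft)               # None means "good enough"
        if note is None:
            return prompt, draft
        prompt = prompt + "\nStyle note: " + note
    return prompt, write(prompt)
```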
&lt;p&gt;Totally subjective impression, but I felt Claude went from 60% to 90% writing like me, in only a few learning rounds! Here&#39;s Claude answering &amp;quot;What&#39;s your favourite anime?&amp;quot; as me, first try:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/clone_5.png&quot; alt=&quot;Claude predicts I&#39;d love Serial Experiments Lain, because of its philosophical themes, unsettling animation, and shy girl protagonist&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(Now &lt;em&gt;THAT&#39;S&lt;/em&gt; 90+% my thinking and writing style! I haven&#39;t seen &lt;em&gt;Lain&lt;/em&gt; yet, but every other trans woman I know has recommended it to me. There&#39;s a good chance Claude&#39;s prediction about me may come true!)&lt;/p&gt;
&lt;p&gt;Finally, I did some meta-prompting: I prompted Claude to write a prompt &lt;em&gt;for getting Claude to write like me!&lt;/em&gt; Here&#39;s the start of what Claude learnt is &amp;quot;The Nicky Style&amp;quot;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/clone_6.png&quot; alt=&quot;Claude summarizes my writing style, including a roast about how I &amp;quot;include parenthetical asides (classic overthinking move!)&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Called the &lt;em&gt;fuck&lt;/em&gt; out??&lt;/p&gt;
&lt;p&gt;The most surprising result of this experiment is that Claude can imitate my style &lt;em&gt;just from examples and a prompt;&lt;/em&gt; no need to do data-expensive &amp;amp; compute-expensive &amp;quot;fine-tuning&amp;quot;. (Which can also introduce &lt;a href=&quot;https://www.nightfall.ai/ai-security-101/catastrophic-forgetting&quot;&gt;catastrophic forgetting&lt;/a&gt;; an acquaintance of mine once tried getting an LLM to write like them, by fine-tuning it on years of their emails. This caused the LLM to forget what 2 + 2 was.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Can Claude do a &amp;quot;style transfer&amp;quot; of my writing style?&lt;br /&gt;
&lt;strong&gt;A:&lt;/strong&gt; 🎉 &lt;code&gt;YES&lt;/code&gt; 🎉&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;My next plans:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I&#39;ll keep adding memories to the database. It&#39;s a fun journalling exercise, a trip down Nostalgia Road. I&#39;m currently going through memories from Age 0 to 10, my childhood in Singapore. Wow, I haven&#39;t thought about Mr. Kiasu or the &lt;em&gt;Shiok!&lt;/em&gt; water donut ride in years.&lt;/li&gt;
&lt;li&gt;I&#39;ll keep coding &amp;amp; iterating this system, until it&#39;s finally good enough to fool a friend &lt;em&gt;at least once&lt;/em&gt;... then I keep going until it fools friends 10% of the time, 20%, 30%, 40%, 50% (coin flip!), &lt;em&gt;above&lt;/em&gt; 50% (the AI seems &amp;quot;more Nicky than Nicky&amp;quot;!)&lt;/li&gt;
&lt;li&gt;I&#39;m also tempted to do a &lt;em&gt;Fan&lt;/em&gt; Turing Test: can &lt;em&gt;you&lt;/em&gt; supporters tell the difference between &lt;em&gt;my&lt;/em&gt; bespoke human explanations, and AI slop? I sure hope so otherwise what the fuck are either you or me doing with our lives&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;ai_wargame&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;🌏 Wargames for Peace&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;(⏱️ 5 min read)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This is the project I worked on (but alas, didn&#39;t finish) for &lt;a href=&quot;https://www.matsprogram.org/&quot;&gt;the 10-week AI Safety bootcamp!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Here&#39;s my poster:&lt;/strong&gt; (&lt;a href=&quot;https://blog.ncase.me/content/stuff/2025-07/poster.png&quot;&gt;hi-res image&lt;/a&gt;, transcript to follow)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/poster.png&quot; alt=&quot;Full poster - see later for full transcript&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(Please excuse the &lt;em&gt;placeholder&lt;/em&gt; diffusion-generated slop! The plan was &lt;em&gt;always&lt;/em&gt; to replace them later with actual artist-drawn art. You know, an entity that can actually draw hands &amp;amp; grids reliably.)&lt;/p&gt;
&lt;p&gt;(Also: &amp;quot;wargame&amp;quot; just means &amp;quot;a roleplaying game for policymakers&amp;quot;, &lt;em&gt;not&lt;/em&gt; necessarily war. Unfortunately it&#39;s gotta be marketed as &#39;wargame&#39; to get US govt people to try it, the same way you gotta market shampoo in black bottles to get American Joe to try it.)&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Wargames for Peace:&lt;/strong&gt; a policymaking-simulation game, powered by AI, to
prepare us for AGI.&lt;/p&gt;
&lt;p&gt;by Nicky Case (mentored by Eli Lifland &amp;amp; Daniel Kokotajlo)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Preparing for the future(s) is hard.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Policymakers already have a tradition of using roleplay &amp;quot;wargames&amp;quot; to prepare for global challenges. So, we made a roleplaying game about the arrival of AGI.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Traditional &amp;quot;wargames&amp;quot; take lots of time &amp;amp; people to run. This limits the possible futures we can prepare for, and leaves less room to alter/challenge our own assumptions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; &lt;em&gt;We augmented our game with LLMs;&lt;/em&gt; the &amp;quot;Game Master&amp;quot; is an LLM, and all roles can be played by either humans or LLMs. It&#39;s now easy to run many, many scenario-roleplays!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For future work, we could:&lt;/strong&gt; Run 100s of &lt;em&gt;all-LLM&lt;/em&gt; simulations, as a &amp;quot;Monte Carlo&amp;quot; of the future. &amp;quot;Gradient descent&amp;quot; the AI Safety Community player, to find good strategies to make AGI go well. Modify this game for other X-risks (e.g. biotech pandemics).&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/game_1.png&quot; alt=&quot;Intro screen of the game&quot; /&gt;&lt;/p&gt;
&lt;p&gt;1) Players can start a new game, or join an existing game. (This game will be free &amp;amp; online, so even public citizens can participate.)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/game_2.png&quot; alt=&quot;Screen: choose your character&quot; /&gt;&lt;/p&gt;
&lt;p&gt;2) Players choose who to roleplay as: governments, companies, the AGI itself, etc. (Any roles &lt;em&gt;not&lt;/em&gt; chosen by a Human will be played by an LLM.)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/game_3.png&quot; alt=&quot;Screen: type your proposed action for the next round&quot; /&gt;&lt;/p&gt;
&lt;p&gt;3) Each round, players propose what they&#39;d like to do in the next timeframe. (Proposals may or may not succeed.)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2025-07/game_4.png&quot; alt=&quot;Screen: you see what everyone tried to do, and the consequences.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;4) The &amp;quot;Game Master&amp;quot; LLM reasons about the proposals&#39; consequences, and writes a &amp;quot;news article&amp;quot; about what happens next. Go back to Step 3, repeat!&lt;/p&gt;
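&lt;p&gt;(The whole round loop, as a sketch. Every role is just a callable, so a human player and an LLM player look identical to the engine; &lt;code&gt;resolve&lt;/code&gt; stands in for the Game Master LLM, and all names here are illustrative:)&lt;/p&gt;

```python
# Sketch of the wargame round loop: each role (human or LLM) proposes an
# action, then the Game Master resolves them into a "news article" that
# becomes the shared state for the next round.

def run_game(players, resolve, opening_brief, rounds):
    """players: {role_name: propose_fn(state)}; resolve: GM(state, proposals)."""
    state = opening_brief
    history = [state]
    for _ in range(rounds):
        proposals = {role: propose(state) for role, propose in players.items()}
        state = resolve(state, proposals)   # proposals may or may not succeed
        history.append(state)
    return history
```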
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;Here&#39;s a demo video:&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/9_zSzjTWr8A?si=_GylrsjSuTC5K0vN&amp;rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;Yes it&#39;s basically Jackbox.&lt;/p&gt;
&lt;p&gt;= = =&lt;/p&gt;
&lt;p&gt;And if you want &lt;em&gt;even more&lt;/em&gt; details, here was the Research Plan I wrote, halfway into the program. I&#39;m proud of the pun title: &lt;strong&gt;&lt;a href=&quot;https://docs.google.com/document/d/1YbgZt3bSL7NcdB2UY_rmtQaWwQfrSxUsVXax-L9hznA/edit?usp=sharing&quot;&gt;Large Language Model UN&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Unfortunately, I ran out of time during the program to actually finish this, and I&#39;m unsure if I actually &lt;em&gt;will&lt;/em&gt; have the capacity to. But! The folks I was being mentored by – who also co-authored &lt;a href=&quot;https://ai-2027.com/&quot;&gt;the AI 2027 forecast&lt;/a&gt; – are planning to build upon my code and make a full-fledged game!&lt;/p&gt;
&lt;p&gt;👉 &lt;strong&gt;If you&#39;re a game dev/designer interested in bringing this game to life, check out &lt;a href=&quot;https://www.lesswrong.com/posts/TjT3RdAfmrLqgb68K/help-the-ai-2027-team-make-an-online-agi-wargame&quot;&gt;their job posting&lt;/a&gt; &amp;amp; &lt;a href=&quot;https://airtable.com/appmcCGBz3vqARP9n/page6tXxhBJIoMCAq/form&quot;&gt;send them a message&lt;/a&gt;!&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Anyway! Now &lt;em&gt;that&#39;s&lt;/em&gt; a long, proper update. We are so back.&lt;/p&gt;
&lt;p&gt;💸 Reminder: if you&#39;d like to support my explainers &amp;amp; research, consider &lt;a href=&quot;https://www.patreon.com/ncase&quot;&gt;my Patreon&lt;/a&gt; (monthly) or &lt;a href=&quot;https://ko-fi.com/nickycase&quot;&gt;Ko-Fi&lt;/a&gt; (one-off or monthly)! You&#39;ll also get early access to updates. Nooooo pressure of course.&lt;/p&gt;
&lt;p&gt;See you next month!&lt;br /&gt;
~ Nicky Case&lt;/p&gt;
&lt;iframe src=&quot;https://ncase.me/ncase-credits/signup.html&quot; frameborder=&quot;no&quot; width=&quot;640&quot; height=&quot;200&quot;&gt;&lt;/iframe&gt;</content>
        </entry>

    

        
        
        <entry>
            <title>Signal Boosts for Nov 2024</title>
            <link href="https://blog.ncase.me/signal-boosts-nov-2024/"/>
            <updated>2024-11-30T00:00:00Z</updated>
            <id>https://blog.ncase.me/signal-boosts-nov-2024/</id>
            <content xml:lang="en" type="html">&lt;p&gt;(⏱️ &lt;em&gt;~10 min read&lt;/em&gt;)&lt;/p&gt;
&lt;p&gt;Another month, another collection of shiny objects (stuff I found in the past month that was valuable/inspiring).&lt;/p&gt;
&lt;p&gt;See also, previous Signal Boosts for &lt;a href=&quot;https://blog.ncase.me/signal-boosts-oct-2024/&quot;&gt;Oct/Sep&lt;/a&gt;, &lt;a href=&quot;https://blog.ncase.me/signal-boosts-aug-2024/&quot;&gt;Aug/July&lt;/a&gt;, &lt;a href=&quot;https://blog.ncase.me/signal-boosts-june-2024/&quot;&gt;June&lt;/a&gt;, &lt;a href=&quot;https://blog.ncase.me/signal-boosts-may-2024/&quot;&gt;May&lt;/a&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;🩸 Kurzgesagt gets to the bottom of a science myth &lt;a href=&quot;https://blog.ncase.me/signal-boosts-nov-2024/#blood&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;😭 Paper: AI poems are now &amp;quot;more human than human&amp;quot; &lt;a href=&quot;https://blog.ncase.me/signal-boosts-nov-2024/#poem&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🇺🇸 U.S. Election-related analysis &lt;a href=&quot;https://blog.ncase.me/signal-boosts-nov-2024/#election&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;👄 The Substance &lt;a href=&quot;https://blog.ncase.me/signal-boosts-nov-2024/#substance&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;blood&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;🩸 Kurzgesagt gets to the bottom of a science myth&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;“If you were to lay your blood vessels out entirely from end-to-end, &lt;s&gt;you would die&lt;/s&gt; it would stretch 100,000 km, wrapping twice around the globe!”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It&#39;s not just a popular factoid, it&#39;s found in university lectures &amp;amp; scientific papers! So it must be tr--&lt;/p&gt;
&lt;p&gt;--wait, that paper didn&#39;t give a source. Ok, &lt;em&gt;this&lt;/em&gt; one gives a source, but its source just says the &amp;quot;fact&amp;quot; without giving a source. Uh oh.&lt;/p&gt;
&lt;p&gt;This is the story of how Kurzgesagt went down a deep, &lt;em&gt;deep&lt;/em&gt; rabbit-hole to track down a source for &lt;em&gt;one&lt;/em&gt; (very popular, even &amp;quot;official&amp;quot;) science-myth! I like this video because:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It exemplifies the virtue of rigorous scholarship, the thankless task of factchecking;&lt;/li&gt;
&lt;li&gt;It&#39;s a cautionary tale of how the telephone-game of &amp;quot;facts&amp;quot; can infect even the so-called official sources; and&lt;/li&gt;
&lt;li&gt;Their half-real-life-half-animation style for this video is &lt;em&gt;so&lt;/em&gt; adorable!&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Check it out! You&#39;ll also learn the &lt;em&gt;real&lt;/em&gt; answer for how long your blood vessels are in total. It&#39;s not as impressive as &amp;quot;can wrap twice around the world&amp;quot;... but it&#39;s still pretty dang long!&lt;/p&gt;
&lt;p&gt;10-minute video (not counting credits):&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/bgo7rm5Maqg?si=4mnRfK5jTYobTqp-&amp;rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;poem&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;😭 Paper: AI poems are now &amp;quot;more human than human&amp;quot;&lt;/h2&gt;
&lt;p&gt;Poets speak to the human condition, and the human condition has spoken back: they prefer the autocomplete.&lt;/p&gt;
&lt;p&gt;From &lt;a href=&quot;https://www.nature.com/articles/s41598-024-76900-1&quot;&gt;Porter &amp;amp; Machery 2024&lt;/a&gt;: (emphases added)&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We conducted two experiments with non-expert poetry readers and found that participants performed &lt;strong&gt;below chance&lt;/strong&gt; levels in identifying AI-generated poems (46.6% accuracy)&lt;/p&gt;
&lt;p&gt;[...] participants were &lt;strong&gt;more likely to judge AI-generated poems as human-authored&lt;/strong&gt; than actual human-authored poems&lt;/p&gt;
&lt;p&gt;[...] We found that AI-generated poems were rated &lt;strong&gt;more favorably in qualities such as rhythm and beauty&lt;/strong&gt;, and that this contributed to their mistaken identification as human-authored.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And the human poems were from famous poets across eras &amp;amp; genres (from Chaucer to Whitman to Plath). And the AI poems were the &lt;em&gt;first five&lt;/em&gt; poems generated by ChatGPT with an unoptimized prompt, no cherry-picking.&lt;/p&gt;
&lt;p&gt;That&#39;s... pretty damn damning.&lt;/p&gt;
&lt;p&gt;Like the original Turing Test, I&#39;m not sure how much of this is &amp;quot;AI has gotten too good&amp;quot; versus &amp;quot;Humans have gotten too gullible&amp;quot;.&lt;/p&gt;
&lt;p&gt;(See also: Astral Codex Ten&#39;s &lt;a href=&quot;https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing&quot;&gt;&amp;quot;AI Art Turing Test&amp;quot;&lt;/a&gt;, where humans (on avg) &lt;em&gt;also&lt;/em&gt; preferred AI art over human art when they thought it was human. Although, 2 major differences between ACX&#39;s study and the above: 1) ACX&#39;s test &lt;em&gt;did&lt;/em&gt; cherry-pick the AI art, and 2) Even then, humans &lt;em&gt;were&lt;/em&gt; slightly better than chance (~60%) at detecting AI art.)&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;election&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;🇺🇸 U.S. Election-related analysis&lt;/h2&gt;
&lt;p&gt;My political alignment is &amp;quot;why would any sane person try to compress their beliefs into one or two dimensions, just ask me my thoughts on an issue-by-issue basis&amp;quot;. But if I &lt;em&gt;had&lt;/em&gt; to pick an approximation, sure, put me down somewhere between left &amp;amp; libertarian.&lt;/p&gt;
&lt;p&gt;And so, like many in the left/libertarian circles, while I wasn&#39;t &lt;em&gt;shocked&lt;/em&gt; Trump won again, I was disappointed. Sure, America&#39;s institutions are broken a.f., but Americans deserve a better solution than &lt;em&gt;that&lt;/em&gt; guy. (I&#39;m a Canadian citizen, so I could not vote in the States. But Canada shouldn&#39;t be so smug; the political culture of Canada, and most Western countries, is IMHO only a decade behind America&#39;s.)&lt;/p&gt;
&lt;p&gt;It&#39;s tempting to interpret whatever happens as evidence for everything I already believed, so, here&#39;s a Signal Boost highlighting data &amp;amp; analyses that &lt;em&gt;were&lt;/em&gt; surprising to me:&lt;/p&gt;
&lt;p&gt;1) The &lt;strong&gt;&lt;a href=&quot;https://apnews.com/projects/election-results-2024/votecast/&quot;&gt;Associated Press VoteCast, a survey of 110,000+ voters&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The biggest surprise to me: Trump made the most gains amongst &lt;em&gt;Blacks, Hispanics, 18-29 year olds, and non-college folks&lt;/em&gt;. Meanwhile, it was &lt;em&gt;white urban highly-educated men&lt;/em&gt; who Harris gained ground on:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2024-11/election1.png&quot; alt=&quot;Graph showing the above&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(Hat tip &lt;a href=&quot;https://docs.monadical.com/s/so-you-want-to-build-a-social-network&quot;&gt;to Nick Sweeting&lt;/a&gt; for being the first to show me this dataviz. Note 1: these are &lt;em&gt;relative&lt;/em&gt; percentages, not &lt;em&gt;absolute&lt;/em&gt; percentage-points (&lt;a href=&quot;https://xkcd.com/985/&quot;&gt;relevant xkcd&lt;/a&gt;). Note 2: I couldn&#39;t find &lt;em&gt;who&lt;/em&gt; made the above dataviz, but it does accurately show the AP VoteCast data.)&lt;/p&gt;
&lt;p&gt;So, I think this is evidence against &amp;quot;Trump won because of racism or Boomers&amp;quot;.&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;2) &lt;strong&gt;&lt;a href=&quot;https://ericneyman.wordpress.com/2024/11/14/seven-lessons-i-didnt-learn-from-election-day/&quot;&gt;&amp;quot;7 Lessons I Didn&#39;t Learn From Election Day&amp;quot;&lt;/a&gt; by Eric Neyman&lt;/strong&gt; (found through &lt;a href=&quot;https://news.manifold.markets/&quot;&gt;the Manifold Markets newsletter&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;The biggest insight I got from that post: &lt;strong&gt;for the first time &lt;em&gt;ever&lt;/em&gt;, incumbent parties in &lt;em&gt;every developed country&lt;/em&gt; got their asses kicked this year.&lt;/strong&gt;  If anything, Harris did really well, for a representative of the incumbent party:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2024-11/election2.png&quot; alt=&quot;Graph showing the above&quot; /&gt;&lt;/p&gt;
&lt;p&gt;So, I think this is evidence against &amp;quot;The world is shifting right&amp;quot;. (The UK Labour Party resoundingly beat the Tories)&lt;/p&gt;
&lt;p&gt;However, this is probably evidence for &amp;quot;It&#39;s the economy, stupid&amp;quot;.&lt;/p&gt;
&lt;p&gt;(Although, I note I&#39;m confused: &lt;a href=&quot;https://www.bls.gov/charts/consumer-price-index/consumer-price-index-by-category-line-chart.htm&quot;&gt;2022 was &lt;em&gt;worse&lt;/em&gt; inflation-wise than 2024&lt;/a&gt;, yet as seen above, incumbents did ok in 2022? Democrats avoided a &amp;quot;red wave&amp;quot; in 2022 midterms. Is it wages relative to inflation? No, &lt;a href=&quot;https://www.statista.com/statistics/1351276/wage-growth-vs-inflation-us/&quot;&gt;wages outgrew inflation starting Feb 2023&lt;/a&gt;. Maybe it&#39;s not the &lt;em&gt;actual&lt;/em&gt; economy, but people&#39;s &lt;em&gt;perception&lt;/em&gt; of the economy that matters? No, &lt;a href=&quot;https://www.oecd.org/en/data/indicators/consumer-confidence-index-cci.html&quot;&gt;consumer confidence was also &lt;em&gt;worse&lt;/em&gt; in 2022&lt;/a&gt;. Maybe Roe v Wade being overturned helped the Dems in 2022? Maybe, but it doesn&#39;t explain why most incumbents around the world &lt;em&gt;also&lt;/em&gt; held in 2022.)&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;2.5) Not directly related to the 2024 election, but &amp;quot;It&#39;s the economy, stupid&amp;quot; reminded me of Douglas Hibbs&#39;s famous &lt;strong&gt;&lt;a href=&quot;https://gupea.ub.gu.se/bitstream/handle/2077/2847/gunwpe0020.pdf?sequence=1&quot;&gt;Bread &amp;amp; Peace Model&lt;/a&gt;&lt;/strong&gt;, which found that &amp;quot;does the incumbent win the election&amp;quot; is pretty much determined by real income growth since the last election cycle, with exceptions for major wars:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2024-11/election3.png&quot; alt=&quot;Graph showing the above&quot; /&gt;&lt;/p&gt;
&lt;p&gt;From the paper, to rub salt in the wounds of all pundits &amp;amp; political scientists:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The model is subjected to robustness tests against twenty-two variations [...] inspired by the extensive literature on presidential voting. Not one of these variations adds value to the Bread and Peace model[.]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(I haven&#39;t done enough research yet to tell if the Bread &amp;amp; Peace model holds for other countries. The above paper was published before the 2000 US election, but &lt;a href=&quot;https://files.osf.io/v1/resources/xrf3t/providers/osfstorage/5f7581b81cfe6900b0cefd24?action=download&amp;amp;direct&amp;amp;version=7&quot;&gt;Bread &amp;amp; Peace mostly held up since then.&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;(Thinking aloud: Hm... maybe the Israel-Hamas war explains why the incumbent did worse in 2024 than 2022, even though the economy&#39;s better in 2024 than 2022? Then again, 2022 was also the start of the Russia-Ukraine war. Hm.)&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;3) Sick of numbers? Here&#39;s a good &lt;em&gt;qualitative&lt;/em&gt; look at &lt;strong&gt;&lt;a href=&quot;https://x.com/smquinsaat/status/1854187349456171179&quot;&gt;immigrant conservatives in Hawaii&lt;/a&gt;, fieldwork by Sharon Quinsaat.&lt;/strong&gt; A few highlights, on why (these) Hawaiian immigrants went for Trump:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It&#39;s the economy, stupid. (They feel Dems &amp;quot;sold us out&amp;quot;)&lt;/li&gt;
&lt;li&gt;At the risk of sounding condescending, their media sources actually &lt;em&gt;are&lt;/em&gt; really wacky. Like, &amp;quot;Trump is a demi-god&amp;quot; celebrity-worship wacky.&lt;/li&gt;
&lt;li&gt;Payback. &amp;quot;I&#39;m voting Republican just to stick it to the Democrats.&amp;quot;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;4) &lt;strong&gt;&lt;a href=&quot;https://podcasts.apple.com/us/podcast/the-ezra-klein-show/id1548604447&quot;&gt;Ezra Klein&lt;/a&gt;&lt;/strong&gt;, as usual, is pretty good. I&#39;ve only listened to a few of his post-election podcasts so far, but here are my main takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The old left-right spectrum is pretty much dead.
&lt;ul&gt;
&lt;li&gt;(Trump is anti-free trade &amp;amp; pro-tariffs. His Health Secretary pick, RFK Jr., is pro-choice. Who, 10 years ago, woulda thought a Republican president would do &lt;em&gt;that?&lt;/em&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The new political divide is something along the lines of:
&lt;ul&gt;
&lt;li&gt;Pro-institution, vs Anti-institution (&amp;quot;we&#39;re sick of experts&amp;quot;)&lt;/li&gt;
&lt;li&gt;The college-credentialed/professional class (&amp;quot;trust science!&amp;quot;), vs everyone else&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The Democrats failed, and &lt;em&gt;institutions&lt;/em&gt; failed, because of &amp;quot;cover your ass&amp;quot; &amp;amp; not wanting to be the messenger with bad-news messages. (e.g. staffers ignoring Biden&#39;s declining sharpness, the &amp;quot;noble lies&amp;quot; during the pandemic)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ezra is someone who wants to critique institutions, &lt;em&gt;because&lt;/em&gt; he wants them to get better. I&#39;m reminded of this quote:&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/signal-boosts-nov-2024/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The iron rule of politics is that if there are real problems in society and responsible parties don’t deal with them, the irresponsible parties will jump on them.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;In sum, my current thoughts (subject to future change, of course):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&amp;quot;Bread &amp;amp; Peace model&amp;quot; probably holds. It&#39;s the economy, stupid.&lt;/li&gt;
&lt;li&gt;Incumbents everywhere are falling, whether they&#39;re left or right. (Tories lost UK &lt;em&gt;hard&lt;/em&gt;.)&lt;/li&gt;
&lt;li&gt;The old &amp;quot;left-right&amp;quot; spectrum is dead. (It seems even racial/gender identity is getting much &lt;em&gt;less&lt;/em&gt; politically salient in the US, with Black/Latino voters swinging to Trump.)&lt;/li&gt;
&lt;li&gt;The new spectrum is &amp;quot;pro vs anti traditional institutions&amp;quot;. (universities, journalism, government, international orgs, etc)&lt;/li&gt;
&lt;li&gt;Institutions everywhere are failing because of &amp;quot;cover your ass&amp;quot; bureaucracy.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I don&#39;t have concrete suggestions. Only this:&lt;/p&gt;
&lt;p&gt;Deal with the real problems in institutions you care about — &lt;em&gt;or the irresponsible parties will jump on them.&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;substance&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;👄 The Substance&lt;/h2&gt;
&lt;p&gt;A few weeks ago, I got an email saying I was turned down for a research gig, which was understandable — but it hit some of my insecurities that &lt;a href=&quot;https://blog.ncase.me/30/&quot;&gt;now that I&#39;m 30&lt;/a&gt;, I&#39;m too old for great research, and I oughta be replaced with someone younger. (&lt;a href=&quot;https://xkcd.com/447/&quot;&gt;relevant xkcd&lt;/a&gt;) So, to get my mind off my worries, I went to see the new movie &lt;strong&gt;The Substance&lt;/strong&gt;, which I knew nothing about. It turned out to be a sci-fi satire body-horror-comedy, about a woman who&#39;s insecure that she&#39;s too old, and she&#39;ll be replaced with someone younger.&lt;/p&gt;
&lt;p&gt;Aaaaaaaaa&lt;/p&gt;
&lt;p&gt;&lt;u&gt;My quick review&lt;/u&gt;: Honestly the plot&#39;s straightforward &amp;amp; predictable, and not exactly subtle about its themes of sexism in industry + obsession with youth/fame + self-sabotaging self-loathing. But, I have a high tolerance for camp (&lt;em&gt;loved&lt;/em&gt; Rocky Horror), and this film happened to hit me at the &lt;em&gt;exact&lt;/em&gt; moment I needed to confront my insecurities that I&#39;m too old &amp;amp; I&#39;ll be replaced with someone younger.&lt;/p&gt;
&lt;p&gt;Because, yeah, I &lt;em&gt;am&lt;/em&gt; being silly. Tolkien was 45 when he started writing The Lord of the Rings, and Andrew Wiles proved Fermat&#39;s Last Theorem between ages 36 and 41. Sure, my probability of doing great work gets &lt;em&gt;lower&lt;/em&gt; over time, but it&#39;s not zero.&lt;/p&gt;
&lt;p&gt;Also, the fact this movie features Demi Moore, who&#39;s still smokin&#39; hot at age 62, makes this a movie that &lt;em&gt;disproves its own premise&lt;/em&gt;, in a good way! You &lt;em&gt;can&lt;/em&gt; age with confidence &amp;amp; style.&lt;/p&gt;
&lt;p&gt;Seriously, gott dayumn, Demi:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2024-11/demi.png&quot; alt=&quot;A hot 62-year-old actress, Demi Moore, in the film, The Substance&quot; /&gt;&lt;/p&gt;
&lt;p&gt;★★★★☆: Recommended if you need to confront your insecurities, via a campy body-horror movie starring a hot MILF. Also the cinematography &amp;amp; &lt;a href=&quot;https://www.youtube.com/watch?v=tKxKuYsx2R0&quot;&gt;the soundtrack&lt;/a&gt; &lt;em&gt;kick ass&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;(Tangentially related: while reading Demi Moore&#39;s Wikipedia article, I learnt she co-founded &lt;a href=&quot;https://en.wikipedia.org/wiki/Thorn_(organization)&quot;&gt;Thorn&lt;/a&gt;?! I knew about Thorn, because I&#39;m getting into AI Safety/Ethics research, and one of Thorn&#39;s famous projects is using AI to help automatically flag child abuse material! [&lt;a href=&quot;https://www.artificialintelligence-news.com/news/ai-tool-detect-child-abuse-images-accuracy/&quot;&gt;read more here&lt;/a&gt;] Good work, Demi!)&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Quote from Daniel Schwammenthal, director of the American Jewish Committee’s Transatlantic Institute. Learnt about this quote &lt;a href=&quot;https://ofboysandmen.substack.com/p/into-the-vacuum-demons-pour&quot;&gt;from this article by Richard Reeves&lt;/a&gt;, on why neglecting real issues that disproportionately affect males (men are more likely than women to be homeless, die of suicide, drop out of school, etc) led to the rise of grifters like Andrew Tate. &lt;a href=&quot;https://blog.ncase.me/signal-boosts-nov-2024/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
        </entry>

    

        
        
        <entry>
            <title>Signal Boosts for Sept &amp; Oct 2024</title>
            <link href="https://blog.ncase.me/signal-boosts-oct-2024/"/>
            <updated>2024-10-27T00:00:00Z</updated>
            <id>https://blog.ncase.me/signal-boosts-oct-2024/</id>
            <content xml:lang="en" type="html">&lt;p&gt;(⏱️ &lt;em&gt;~8 min read&lt;/em&gt;)&lt;/p&gt;
&lt;p&gt;Once again I got swamped with work and missed last month&#39;s Signal Boost / Links post, so here&#39;s a two-fer: all the stuff I recommend that I found last &amp;amp; this month.&lt;/p&gt;
&lt;p&gt;Previous months&#39; Signal Boosts: &lt;a href=&quot;https://blog.ncase.me/signal-boosts-aug-2024/&quot;&gt;July/Aug 2024&lt;/a&gt;, &lt;a href=&quot;https://blog.ncase.me/signal-boosts-june-2024/&quot;&gt;June 2024&lt;/a&gt;, &lt;a href=&quot;https://blog.ncase.me/signal-boosts-may-2024/&quot;&gt;May 2024&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This month&#39;s Signal Boosts:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;🐦 Trans furry hyperpop conlang musical mockumentary &lt;a href=&quot;https://blog.ncase.me/signal-boosts-oct-2024/#zewei&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;👋 DynamicLand is back, bay-beeeee &lt;a href=&quot;https://blog.ncase.me/signal-boosts-oct-2024/#dynamicland&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🍞 &amp;quot;I put a toaster in the dishwasher&amp;quot; &lt;a href=&quot;https://blog.ncase.me/signal-boosts-oct-2024/#toaster&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🧢 &amp;quot;I wish I could wear hats&amp;quot; &lt;a href=&quot;https://blog.ncase.me/signal-boosts-oct-2024/#hats&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🌕 &amp;quot;H.S.&amp;quot; &lt;a href=&quot;https://blog.ncase.me/signal-boosts-oct-2024/#hs&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🐺 A 3D-printable claw &lt;a href=&quot;https://blog.ncase.me/signal-boosts-oct-2024/#claw&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;👟 Slip-on Shoes &lt;a href=&quot;https://blog.ncase.me/signal-boosts-oct-2024/#shoes&quot;&gt;↪&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;zewei&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;🐦 Trans furry hyperpop conlang musical mockumentary&lt;/h2&gt;
&lt;p&gt;Previously, I signal-boosted the constructed language (conlang) &lt;a href=&quot;https://blog.ncase.me/signal-boosts-aug-2024/#toki-pona&quot;&gt;toki pona&lt;/a&gt;, and the trans furry hyperpop musician &lt;a href=&quot;https://blog.ncase.me/signal-boosts-aug-2024/#moretransfurryhyperpop&quot;&gt;Ida Deers&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This month, a fan emailed me to tell me about an artist who combines &lt;em&gt;both&lt;/em&gt;: &lt;strong&gt;&lt;a href=&quot;https://www.zewei.xyz/&quot;&gt;ZeWei&lt;/a&gt;&lt;/strong&gt;. (ok I guess their music&#39;s more IDM than hyperpop, but w/e)&lt;/p&gt;
&lt;p&gt;Recently, ZeWei created a mockumentary for a &amp;quot;cursed conlang&amp;quot; contest. Imagine &lt;em&gt;Arrival (2016)&lt;/em&gt; but furry. It&#39;s a short film about a linguist who gets stuck in an alien world full of cyborg-trees, whose language a) has 600 pronouns, b) uses microtonal music for its phonemes, and c) has a formality system &lt;em&gt;so&lt;/em&gt; intense that the social authority of whoever you&#39;re talking to modifies not just Subject-Verb-Object order, but even the order of &lt;em&gt;sentences and paragraphs&lt;/em&gt;. And that&#39;s just &lt;em&gt;three&lt;/em&gt; of the many messed-up features of this constructed language!&lt;/p&gt;
&lt;p&gt;And then at the end, as a total flex, ZeWei performs &lt;em&gt;an entire song&lt;/em&gt; in this language.&lt;/p&gt;
&lt;p&gt;22 minute video, very worth the watch:&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/Ql0VKM7tCCo?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;This artist also has a Bandcamp! &lt;a href=&quot;https://zewei.bandcamp.com/album/flickering&quot;&gt;Here&#39;s an older album from them&lt;/a&gt;, with accompanying YouTube release:&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/O3jLuPHEXMA?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;Check them out!&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;dynamicland&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;👋 Dynamicland is back, bay-beeeee&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://dynamicland.org/&quot;&gt;THEY&#39;RE BACK!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;(after being closed for the pandemic &amp;amp; a bunch of internal restructuring)&lt;/p&gt;
&lt;p&gt;What is Dynamicland? Well first, &lt;em&gt;why&lt;/em&gt; is Dynamicland? Right now, the overall trend in computing is to isolate us, take away user agency, and limit our interactions to our thumbs. Dynamicland wants to reverse all those trends. Dynamicland is a lab, a proof-of-concept, of computing that&#39;s social, where everyone can be a creator, and you use your full body to interact with it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dynamicland is &lt;em&gt;a whole building that is a computer interface.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;(Here&#39;s a 7-minute intro:)&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/5Q9r-AEzRMA?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;&lt;em&gt;Right now&lt;/em&gt;, Dynamicland accomplishes this by having cameras that track objects, and projectors to put displays on them. It&#39;s like mixed reality, except it doesn&#39;t require that everyone buy their own $3,000+ headset. However, I need to emphasize: &lt;em&gt;this cameras-and-projectors system is NOT the point of Dynamicland,&lt;/em&gt; it&#39;s just the closest that can be accomplished with today&#39;s tech. (Maybe in the far future we&#39;ll have fully programmable matter?) As the saying goes: &lt;em&gt;&amp;quot;Look at where I&#39;m pointing, not at the tip of my finger.&amp;quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Two personal notes:&lt;/p&gt;
&lt;p&gt;1) The founder of Dynamicland, Bret Victor, greatly inspired me, is a good mentor, and invited me to the first Explorable Explanations workshop, which basically single-handedly created my whole career over the past decade. So, this is a note of gratitude! (and/or disclosure of conflict-of-interest)&lt;/p&gt;
&lt;p&gt;2) A few years ago, at Dynamicland, I made a game called FROG WARS played with origami frogs. The goal was to hop your (physical) frogs towards (virtual) flies, while also bumping your opponents&#39; (physical) frogs out of the arena.&lt;/p&gt;
&lt;p&gt;It was the best game I ever made and I&#39;ll never live up to it again.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2024-10/frogwars.gif&quot; alt=&quot;GIF of my Dynamicland game &amp;quot;FROG WARS&amp;quot; in action&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;toaster&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;🍞 &amp;quot;I put a toaster in the dishwasher&amp;quot;&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2024-10/toaster.png&quot; alt=&quot;Before &amp;amp; after photos of putting a toaster in a dishwasher; it&#39;s much cleaner, and still works!&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://jdstillwater.blogspot.com/2012/05/i-put-toaster-in-dishwasher.html&quot;&gt;A lil&#39; anecdote from JD Stillwater&lt;/a&gt;.&lt;/strong&gt; Common sense says no of &lt;em&gt;course&lt;/em&gt; you shouldn&#39;t put a toaster in a dishwasher — and a &lt;em&gt;lot&lt;/em&gt; of sarcastic, over-confident internet comments back that up.&lt;/p&gt;
&lt;p&gt;But, if you actually think about the physics step-by-step, an electrical short happens when there&#39;s an unintended path for electricity to travel (e.g. through water), which can cause damage. So, if you were to put an unplugged toaster in a dishwasher, &lt;em&gt;then let it dry for several days&lt;/em&gt;, there would be no leftover water in it (toasters are &lt;em&gt;designed&lt;/em&gt; to avoid trapping moisture), and thus, no risk of an electrical short.&lt;/p&gt;
&lt;p&gt;But that&#39;s all theory. Toasters are cheap. So, JD Stillwater just &lt;em&gt;tried it&lt;/em&gt;, and,&lt;/p&gt;
&lt;p&gt;yup, it worked!&lt;/p&gt;
&lt;p&gt;Signal boosting this because it&#39;s a great short (real-life) story, that shows the conflict between &lt;em&gt;seeming&lt;/em&gt; scientific &amp;amp; rational, and &lt;em&gt;actually being&lt;/em&gt; scientific &amp;amp; rational. Similar vibes to &lt;a href=&quot;https://calteches.library.caltech.edu/51/2/CargoCult.htm&quot;&gt;Feynman&#39;s famous Cargo Cult Science talk&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;(Hat tip to Slime Mold Time Mold &lt;a href=&quot;https://slimemoldtimemold.com/2024/08/30/links-for-august-2024/&quot;&gt;for helping me discover this essay&lt;/a&gt;, and kudos to Nehaveigur &lt;a href=&quot;https://nehaveigur.com/2024/09/05/i-put-my-toaster-in-the-dishwasher/&quot;&gt;for successfully replicating this experiment!&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;(That said, I&#39;m still too cowardly to &lt;em&gt;actually&lt;/em&gt; try putting a toaster in a dishwasher. If &lt;em&gt;you&lt;/em&gt; want to try replicating this experiment, then: 1) &lt;em&gt;Please let it dry for several days before plugging it back in.&lt;/em&gt; 2) Just in case, get a large fire extinguisher graded for electrical fires (you should have a fire extinguisher anyway), and 3) Neither I nor the authors of the above links are responsible if you die, lol.)&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;hats&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;🧢 &amp;quot;I wish I could wear hats&amp;quot;&lt;/h2&gt;
&lt;p&gt;Last month, a friend introduced me to the works of Brian David Gilbert, and...&lt;/p&gt;
&lt;p&gt;... look, I can&#39;t explain BDG. I can only beseech you to look &amp;amp; listen to this 98-second video:&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/bmFGbBmlyKQ?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;&lt;em&gt;Right?!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&amp;quot;Heh, that&#39;s a charming silly idea. Oh. Oh okay it&#39;s body dysmorphia and possibly gender dysphoria. Oh okay that hat at the end is wearing a hat. Oh.&amp;quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;But don&#39;t worry, &lt;a href=&quot;https://www.youtube.com/watch?v=Sfw9n2ktX9k&quot;&gt;not all his songs are metaphors&lt;/a&gt;. One time he just published &lt;a href=&quot;https://www.youtube.com/watch?v=-wpHszfnJns&quot;&gt;a half-hour-long video explaining healthcare terminology&lt;/a&gt;. It was actually upsettingly useful; I made like 20+ Anki flashcards off that video.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;hs&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;🌕 &amp;quot;H.S.&amp;quot;&lt;/h2&gt;
&lt;p&gt;Just a vague cherry-picked hypothesis, but I&#39;ve noticed at least two (&amp;amp; a half) major &amp;quot;vibe eras&amp;quot; in Western comedy during my lifetime:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The 90&#39;s &amp;amp; 00&#39;s were the peak of edgy transgressive humor, like (early) South Park, (early) Family Guy, Dane Cook, etc.&lt;/li&gt;
&lt;li&gt;Perhaps, as a reaction to the hollowness of &amp;quot;being edgy for edgy&#39;s sake&amp;quot;, recent decades had more self-deprecating, introspective humor, like: Bo Burnham, Bojack Horseman, and the comics of Shen Comix, Sarah&#39;s Scribbles, and Hyperbole &amp;amp; a Half. (Even South Park has become more reflective &amp;amp; character/story-driven.)&lt;/li&gt;
&lt;li&gt;And now, perhaps as a reaction to how much of a downer &lt;em&gt;that&lt;/em&gt; vibe is — especially &amp;quot;post&amp;quot;-pandemic — comedy might now be shifting towards a &amp;quot;WE ARE SO BACK&amp;quot; vibe?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As an example of that last vibe shift, consider Australian comedy musician Tom Cardy. His early work was all &lt;a href=&quot;https://www.youtube.com/watch?v=GbmP2c6TGKc&quot;&gt;very self-deprecating&lt;/a&gt;, but in recent years it&#39;s shifted more towards a defiantly upbeat style.&lt;/p&gt;
&lt;p&gt;For example, this, a recent song &amp;amp; music video (animated by &lt;a href=&quot;https://www.youtube.com/@LankyMF&quot;&gt;LankyMF&lt;/a&gt;), &lt;strong&gt;&amp;quot;H.S.&amp;quot;&lt;/strong&gt;:&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;360&quot; src=&quot;https://www.youtube-nocookie.com/embed/kK0KPuH32mc?rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;This song &amp;amp; music video genuinely makes me feel &lt;em&gt;so&lt;/em&gt; emotional. I&#39;ve listened to it several times on loop, it&#39;s a great comfort pump-yourself-back-up song. And huge props to LankyMF for the delicious character designs, and the music video&#39;s story!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://tomcardy.bandcamp.com/album/the-dancefloor-at-the-end-of-the-universe&quot;&gt;You can get Tom Cardy&#39;s latest album here!&lt;/a&gt;&lt;/strong&gt; And watch the other music videos &lt;a href=&quot;https://www.youtube.com/channel/UCW7AGm8JSBEEew61dJIgl_A&quot;&gt;on his YouTube&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;claw&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;🐺 A 3D-printable claw&lt;/h2&gt;
&lt;p&gt;Being an adult means you can put minimal effort into a Halloween costume because it&#39;s &amp;quot;sexy&amp;quot;. Anyway here I am:&lt;/p&gt;
&lt;p&gt;&lt;video width=&quot;360&quot; height=&quot;640&quot; controls=&quot;&quot; style=&quot;display:block; margin:0 auto;&quot;&gt;&lt;source src=&quot;https://blog.ncase.me/content/stuff/2024-10/meow.mp4&quot; type=&quot;video/mp4&quot; /&gt;Your browser does not support the video tag.&lt;/video&gt;&lt;/p&gt;
&lt;p&gt;A friend helped me 3D print five of these claws (at different sizes to fit my fingers) and, wow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;They are surprisingly sharp, and can leave marks that last a &lt;em&gt;week&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;They have improved the quality of my... relationships.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2024-10/good.jpg&quot; alt=&quot;Edited lawyer billboard that reads: &amp;quot;Injured? GOOD&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.printables.com/model/478899-fursuit-finger-claw&quot;&gt;Check out the 3D printable file here, by Tioh!&lt;/a&gt;&lt;/strong&gt; (If you don&#39;t have a 3D printer nor know a friend who does, you &lt;em&gt;might&lt;/em&gt; find one at a local makerspace or library.)&lt;/p&gt;
&lt;p&gt;If anybody asks you why you&#39;re printing them, tell &#39;em it&#39;s for Halloween.&lt;/p&gt;
&lt;p&gt;If it&#39;s past Halloween by the time you&#39;re printing them, tell &#39;em it&#39;s for next Halloween.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;shoes&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;👟 Slip-on Shoes&lt;/h2&gt;
&lt;p&gt;I hate tying shoelaces.&lt;/p&gt;
&lt;p&gt;Let&#39;s crunch the numbers, shall we?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1 minute &lt;em&gt;at best&lt;/em&gt; to tie on and take off shoes each day&lt;/li&gt;
&lt;li&gt;That&#39;s ~365 minutes lost a year, or ~6 hours.&lt;/li&gt;
&lt;li&gt;If I value my time at $30/hour, that&#39;s $180 lost a year. You can definitely buy slip-on shoes for less than $180.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But wait, let&#39;s go further:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Let&#39;s conservatively estimate that ~50% of adults globally wear shoes with shoelaces. (The rest wear slip-on sandals, no shoes, etc.)&lt;/li&gt;
&lt;li&gt;There are ~6,000,000,000 adults worldwide. So, 50% of that, ~3,000,000,000 adults lose 6 hours every year. Or: 18,000,000,000 person-hours lost per year.&lt;/li&gt;
&lt;li&gt;Since there&#39;s 24 hours a day, 365 days a year, ~80 years per lifetime, that&#39;s... ~25,000 person-lifetimes a year, lost to tying shoelaces.&lt;/li&gt;
&lt;li&gt;For comparison: In the US, ~25,000 people are murdered annually. The 9/11 attacks killed ~3,000 people.&lt;/li&gt;
&lt;li&gt;Therefore: Shoelaces are equivalent to American murder, or eight 9/11&#39;s a year.&lt;/li&gt;
&lt;/ul&gt;
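&lt;p&gt;(If you want to check my arithmetic, the whole Fermi estimate re-runs in a few lines; every input is just my own rough guess from above:)&lt;/p&gt;

```python
# Re-running the shoelace Fermi estimate. All inputs are the post's own
# rough guesses, NOT real statistics.
adults = 6_000_000_000           # rough number of adults worldwide
lace_wearers = adults * 0.50     # assume ~50% wear laced shoes
hours_lost_per_year = 6          # ~1 minute/day, so ~6 hours/year

total_hours = lace_wearers * hours_lost_per_year   # person-hours/year
lifetime_hours = 24 * 365 * 80   # hours in an ~80-year lifetime

lifetimes_lost = total_hours / lifetime_hours
print(round(lifetimes_lost))     # 25685, i.e. the "~25,000" above
```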
&lt;p&gt;&lt;em&gt;Obviously tone-indicator sarcastic&lt;/em&gt;, but the point still stands:&lt;/p&gt;
&lt;p&gt;I HATE TYING SHOELACES&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Anyway, I got some slip-on shoes &lt;a href=&quot;https://www.amazon.com/dp/B08TBT3KBX&quot;&gt;from KEEZMZ on Amazon&lt;/a&gt;.&lt;/strong&gt; I originally wanted Velcro but I suggested this to a friend and he said &amp;quot;yeah if you wanna look like a kid?&amp;quot; so I guess I got adult-shamed into buying slip-on:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/stuff/2024-10/shoes.jpg&quot; alt=&quot;Photo of my new, red slip-on shoes&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;That&#39;s it, that&#39;s all for this month.&lt;/p&gt;
</content>
        </entry>

    

        
        
        <entry>
            <title>Research Notes-dump for Oct 2024</title>
            <link href="https://blog.ncase.me/research-notes-oct-2024/"/>
            <updated>2024-10-03T00:00:00Z</updated>
            <id>https://blog.ncase.me/research-notes-oct-2024/</id>
            <content xml:lang="en" type="html">&lt;p&gt;&lt;em&gt;Hi! This is a &amp;quot;share your work&amp;quot; notes-dump, for some independent research projects I&#39;m doing in AI Alignment/Safety. Alas, this means this post will be more wordy, sloppy &amp;amp; technical than my usual &amp;quot;explain it like I&#39;m 12&amp;quot; style. Sorry! Should any of these bear fruit, I&#39;ll share the fruit&#39;s juice in more accessible packaging.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(If any of these ideas inspire you to do some research of your own, feel free to cite this post as &lt;strong&gt;Nicky Case, 2024&lt;/strong&gt;! Here&#39;s a timestamped Archive.org &lt;a href=&quot;http://web.archive.org/web/20241002171613/https://blog.ncase.me/research-notes-oct-2024/&quot;&gt;save of this page, on October 2nd, 2024&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Table of Contents:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In the Minimum Viable Prototype stage:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#project_1&quot;&gt;Project 1&lt;/a&gt;) Want to know a human&#39;s values? Then &lt;strong&gt;BEG: Bayesian Elicitation Generation&lt;/strong&gt;. Combining LLMs with Good Ol&#39; Fashioned AI to actively elicit a human&#39;s preferences, in a qualitative &lt;em&gt;and&lt;/em&gt; quantitative way.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#project_2&quot;&gt;Project 2&lt;/a&gt;) &lt;strong&gt;SASS: Semi-automated AI Safety Scientist.&lt;/strong&gt; Proof-of-concept, of an LLM &lt;em&gt;automatically generating &amp;amp; testing&lt;/em&gt; hypotheses, to interpret another LLM. (Also, replicating-extending a MATS alum&#39;s research.)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#project_3&quot;&gt;Project 3&lt;/a&gt;) &lt;strong&gt;Speakeasy&lt;/strong&gt;: a tool for laypeople to make their own narrow, human-in-the-loop, scaffolded LLMs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;In the early-prototyping stage:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#project_4&quot;&gt;Project 4&lt;/a&gt;) Beating Goodhart&#39;s Law with &lt;strong&gt;CUES: Capped, Uncorrelated Ensemble of Specifications&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#project_5&quot;&gt;Project 5&lt;/a&gt;) &lt;strong&gt;The game theory of self-modification.&lt;/strong&gt; Understanding wireheading, human value change, AI stability under self-modification, etc.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#project_6&quot;&gt;Project 6&lt;/a&gt;) &lt;strong&gt;SCI: Semi-automatic Causal Inference&lt;/strong&gt;, with LLMs, Pearl causal diagrams, and Good Ol&#39; Fashioned Statistics.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;project_1&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;🙏 Project 1) BEG: Bayesian Elicitation Generation&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Let&#39;s use a scaffolded LLM to &lt;em&gt;qualitatively and quantitatively&lt;/em&gt; elicit a human&#39;s values! The steps: 1) ask the human open-ended questions, 2) extract &lt;em&gt;qualitative&lt;/em&gt; features they care about, 3) for each feature, generate &lt;em&gt;quantitative&lt;/em&gt; priors &amp;amp; likelihoods &amp;amp; posteriors &amp;amp; info-entropy, 4) use those to generate the &lt;em&gt;next&lt;/em&gt; round of questions, 5) repeat!&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related to:&lt;/strong&gt; Active preference elicitation, Reward uncertainty, Interpretability-by-design, Bayesian inference&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Video of it in action!&lt;/strong&gt; (~2 min)&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;557&quot; src=&quot;https://www.youtube-nocookie.com/embed/oDzyPqGsy8M?si=HC3nAJprRVpn2a_w&amp;rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Introduction / Motivation:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Reward a robot for picking up dirt, and it&#39;ll pick up &amp;amp; drop the same dirt over and over again.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; Point is: in AI, it&#39;s really hard to specify what we truly want. Hence: why not get an AI to &lt;em&gt;learn&lt;/em&gt; what we truly want? Maybe by &lt;em&gt;asking good questions?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;There are many, many approaches to figuring out a human&#39;s &amp;quot;reward function&amp;quot; -- (keywords: preference elicitation, inverse reinforcement learning, RLHF, etc) -- each with their pros/cons. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;u&gt;Hand-writing the features to pay attention to&lt;/u&gt;: Simple &amp;amp; interpretable, but not flexible.&lt;/li&gt;
&lt;li&gt;&lt;u&gt;A full end-to-end neural network&lt;/u&gt;: Very flexible, but hard to interpret, &amp;amp; possibly overfit, doesn&#39;t generalize, and susceptible to adversarial examples.&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Using an LLM to generate active-listening questions&lt;/u&gt;: (&lt;a href=&quot;https://arxiv.org/pdf/2310.11589&quot;&gt;Li, Tamkin, Goodman &amp;amp; Andreas 2023&lt;/a&gt; is a good example, and what inspired my project below!) Interpretable &lt;em&gt;and&lt;/em&gt; flexible, but &amp;quot;only&amp;quot; yields &lt;em&gt;qualitative&lt;/em&gt; results. &lt;em&gt;Quantitative&lt;/em&gt; estimates of how much a human values something would let us make good trade-offs. Even better would be quantitative estimates of the &lt;em&gt;uncertainty of the AI&#39;s own estimates&lt;/em&gt;, so that the AI can err on the side of caution, &amp;amp; ask when uncertain.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So: I&#39;d like to propose a method that&#39;s interpretable, flexible, &lt;em&gt;and&lt;/em&gt; gives us both qualitative &lt;em&gt;and&lt;/em&gt; quantitative estimates of what folks value, &lt;em&gt;and&lt;/em&gt; keeps track of its own uncertainty! Let&#39;s call it, &lt;strong&gt;BEG: Bayesian Elicitation Generation&lt;/strong&gt;, as in, you can BEG the human for th--&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Use an LLM to ask a concrete, open-ended question. (for lots of &#39;bits&#39; of information)&lt;/li&gt;
&lt;li&gt;Get a free-form reply.&lt;/li&gt;
&lt;li&gt;Use an LLM to extract what things the human cares about.&lt;/li&gt;
&lt;li&gt;For each thing: use an LLM to guess-timate the &amp;quot;human prior&amp;quot;: on a 7-point scale, how much do people desire vs anti-desire this thing?  &lt;code&gt;P(value)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Use an LLM to generate a &amp;quot;likelihood ratio&amp;quot;: if the user valued the thing [X] amount, how likely is it they would have written what they did?  &lt;code&gt;P(text | value)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Multiply the prior &amp;amp; likelihood, then normalize, to get the posterior: this is our &lt;em&gt;current&lt;/em&gt; belief (with explicit uncertainty!) of how much the user values the thing!  &lt;code&gt;P(value | text) = P(text | value) * P(value) / [normalization constant]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;We can now use this posterior to generate actions that help the human... or generate &lt;em&gt;new&lt;/em&gt; questions to ask!  We want to ask the questions that will help us reduce uncertainty the most.  So, we can measure each feature&#39;s &amp;quot;information entropy&amp;quot;, select the top one(s), and prompt an LLM to ask a concrete, open-ended question about them.&lt;/li&gt;
&lt;li&gt;Repeat!&lt;/li&gt;
&lt;/ol&gt;
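&lt;p&gt;The update loop in steps 4-6 is small enough to sketch in code. Everything numeric below -- the 7-point scale, the &amp;quot;LLM-guessed&amp;quot; prior and likelihood, the feature names -- is a made-up placeholder for what the LLM would actually generate:&lt;/p&gt;

```python
import math

# A minimal sketch of steps 4-6. All numbers here are illustrative
# stand-ins for LLM-generated guesstimates, not real elicited data.

SCALE = [-3, -2, -1, 0, 1, 2, 3]  # 7-point desire scale

def normalize(dist):
    total = sum(dist.values())
    return {v: p / total for v, p in dist.items()}

def update(prior, likelihood):
    # Posterior = prior * likelihood, renormalized (Bayes' rule).
    return normalize({v: prior[v] * likelihood[v] for v in SCALE})

def entropy(dist):
    # Information entropy in bits; high entropy = ask about this next.
    return -sum(p * math.log2(p) for p in dist.values() if p)

# Mock "human prior" an LLM might guesstimate for a feature:
prior = normalize({-3: 1, -2: 1, -1: 2, 0: 4, 1: 6, 2: 4, 3: 2})
# Mock likelihood P(text | value): the user's reply sounded enthusiastic.
likelihood = {-3: 0.01, -2: 0.02, -1: 0.05, 0: 0.1, 1: 0.3, 2: 0.6, 3: 0.9}

posterior = update(prior, likelihood)
features = {"job_satisfaction": posterior, "wealth": prior}
# Step 6: pick the highest-entropy (most uncertain) feature to ask about.
ask_next = max(features, key=lambda f: entropy(features[f]))
```

&lt;p&gt;(The same &lt;code&gt;update&lt;/code&gt; gets multiplied in for each new piece of evidence, and &lt;code&gt;ask_next&lt;/code&gt; drives step 1 of the next round.)&lt;/p&gt;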
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;More possibilities / Challenges / Etc:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;u&gt;To do, actual empirical user-testing&lt;/u&gt; (with MTurk or similar?) to make sure this process is actually worth it. (&lt;a href=&quot;https://arxiv.org/pdf/2310.11589&quot;&gt;Li, Tamkin, Goodman &amp;amp; Andreas 2023&lt;/a&gt; finds that open questions aren&#39;t too annoying to users, and are actually &lt;em&gt;preferred&lt;/em&gt; over binary choices.)&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Pragmatic Feature Preferences&lt;/u&gt;: (Directly inspired by &lt;a href=&quot;https://arxiv.org/pdf/2405.14769&quot;&gt;Peng, Sun, Shu &amp;amp; Abel 2024&lt;/a&gt;) If there&#39;s an obvious thing a user &lt;em&gt;didn&#39;t&lt;/em&gt; mention in their free-form reply, that&#39;s evidence too! It&#39;s evidence they feel neutral about it. (e.g. Bot asks Human about their life aspirations, Human doesn&#39;t mention getting rich, Bot can infer Human doesn&#39;t care much for getting rich.)&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Non-verbal communication&lt;/u&gt;: The pieces of &amp;quot;evidence&amp;quot;, of what a Human likes/dislikes, don&#39;t have to be verbal. We could use facial expressions or tone-of-voice, and it&#39;d be easy to incorporate into this system: &amp;quot;just&amp;quot; multiply in their likelihood ratios.&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Non-linearities&lt;/u&gt;: If a Human would be happy to own a house, that doesn&#39;t mean they&#39;d be 10 times happier to own 10 houses! Challenge: make this system accommodate non-linearities &amp;amp; interaction effects.&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Causal Graphs&lt;/u&gt;: We want this system to distinguish between &lt;em&gt;intrinsic&lt;/em&gt; goals (&amp;quot;for its own sake&amp;quot;) and &lt;em&gt;instrumental&lt;/em&gt; goals (&amp;quot;for something else&#39;s sake&amp;quot;). Challenge: make this system model goals as a network, and have it ask counterfactual questions to distinguish between intrinsic/instrumental goals. (e.g. &amp;quot;If money were no object, would you &lt;em&gt;still&lt;/em&gt; want to work this job?&amp;quot;)
&lt;ul&gt;
&lt;li&gt;See also &lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#project_6&quot;&gt;Project 6&lt;/a&gt;: Semi-automated Causal Inference&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Beyond a 7-point scale?&lt;/u&gt;: When a Human says they &amp;quot;really love&amp;quot; their partner, that should matter more than when they say they &amp;quot;really love&amp;quot; spicy food. Challenge: re-make this system to accommodate a &amp;quot;universal&amp;quot; utility scale? Also, would it be worth trying to model the belief-probability-distribution as &lt;em&gt;fully continuous&lt;/em&gt;, not discrete?&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Not double-counting evidence&lt;/u&gt;: To avoid &amp;quot;double-counting&amp;quot; evidence, this system should keep track of individual pieces of evidence that Human likes/dislikes something, then use an LLM to check if a piece of evidence is already accounted for. If so, it &lt;em&gt;won&#39;t&lt;/em&gt; multiply the prior by its likelihood ratio again.&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Having evidence &amp;quot;fade away&amp;quot;&lt;/u&gt;: People&#39;s values can change! How can we model this mathematically? Well: if double-counting evidence means multiplying by the likelihood &lt;em&gt;L&lt;/em&gt; twice, that is, multiplying by &lt;em&gt;L&lt;sup&gt;2&lt;/sup&gt;&lt;/em&gt;, then we can model &lt;em&gt;fading evidence&lt;/em&gt; by multiplying a prior by &lt;em&gt;L&lt;sup&gt;weight&lt;/sup&gt;&lt;/em&gt;, where &lt;em&gt;weight = exp( – some constant * time )&lt;/em&gt;.
&lt;ul&gt;
&lt;li&gt;Sanity-checking this: when time = 0, i.e. we &lt;em&gt;just&lt;/em&gt; got this evidence, then weight = 1. As time → ∞, weight → 0, i.e. we completely discount the evidence, and revert to the &amp;quot;human prior&amp;quot;.&lt;/li&gt;
&lt;li&gt;Bonus: every time a Human brings up the same evidence again, we reset the time to 0.&lt;/li&gt;
&lt;li&gt;Bonus x 2: we wouldn&#39;t even need to hard-code &amp;quot;ask Human if thing they mentioned long ago is still true&amp;quot;! This &lt;em&gt;automatically&lt;/em&gt; comes out of: evidence fades away → belief fades back to being an uncertain human prior → Bot asks questions about high-uncertainty/entropy features → Bot will &lt;em&gt;automatically&lt;/em&gt; re-up questions about old features!&lt;/li&gt;
&lt;li&gt;&lt;em&gt;How quickly&lt;/em&gt; do we predict preferences/values to change? I dunno, fit that to human data, and/or ask an LLM for an educated guess. (For example: I&#39;d expect tastes in fashion to change fast, but sexual orientation to basically never change, or if so, &lt;em&gt;very&lt;/em&gt; rarely and slowly.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
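&lt;p&gt;The fading-evidence math above, as a toy sketch (the decay rate and the numbers are illustrative assumptions, not fit to any human data):&lt;/p&gt;

```python
import math

# Sketch of "evidence fades away": instead of multiplying a prior by
# the likelihood ratio L once and forever, multiply by L**weight,
# where weight = exp(-decay_rate * time_elapsed).
# decay_rate here is an arbitrary illustrative constant.

def evidence_weight(time_elapsed, decay_rate=0.1):
    return math.exp(-decay_rate * time_elapsed)

def faded_update(prior, likelihood_ratio, time_elapsed, decay_rate=0.1):
    w = evidence_weight(time_elapsed, decay_rate)
    return prior * likelihood_ratio ** w  # caller renormalizes

fresh = evidence_weight(0)    # 1.0: evidence at full strength
stale = evidence_weight(100)  # near 0: belief reverts toward the prior
```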
&lt;p&gt;&lt;strong&gt;Post-Script: Prior Art&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Ugh. I spent over a week making the above prototype &amp;amp; outlining the research idea, and now I&#39;ve just found &lt;a href=&quot;https://arxiv.org/pdf/2405.00981&quot;&gt;Austin, Korikov, Toroghi &amp;amp; Sanner 2024&lt;/a&gt;, posted on arXiv less than 2 months ago. It &lt;em&gt;is&lt;/em&gt; a good paper! (Crucially, they show LLM+Bayes &amp;gt; raw LLM!) But ok, yeah, they basically had &amp;amp; executed the same idea, combining LLMs with Bayesian inference. A few (maybe still useful?) differences between my idea and theirs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I use an LLM to &lt;em&gt;generate&lt;/em&gt; the &amp;quot;human prior&amp;quot; for new features, rather than starting with a flat uniform prior.&lt;/li&gt;
&lt;li&gt;Their query-generation makes yes/no questions, rather than open-ended questions that would allow the AI to capture &lt;em&gt;new&lt;/em&gt; features the designer never thought of.&lt;/li&gt;
&lt;li&gt;Their system has probabilities directly over &lt;em&gt;what thing to recommend&lt;/em&gt; (e.g. a movie), rather than trying to infer the human&#39;s more fundamental preferences/utility function.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(Though, these are probably all small tweaks/changes. I&#39;ll think more later about how I can make my research idea be more of a value-add above this paper. Maybe one of the &amp;quot;More possibilities&amp;quot; ideas listed above.)&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;project_2&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;🔬 Project 2) SASS: Semi-automated AI Safety Scientist&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; This is a proof-of-concept of using a scaffolded LLM to &lt;em&gt;amplify&lt;/em&gt; human ability to do AI Safety research. With this tool, I replicated &lt;a href=&quot;https://www.alignmentforum.org/posts/dLg7CyeTE4pqbbcnp/language-models-model-us&quot;&gt;a study by Egg Syntax (MATS alumnus)&lt;/a&gt;, which showed that GPT-3.5 can detect your gender with ~80% accuracy from &lt;em&gt;just a few paragraphs of your writing.&lt;/em&gt; But why? To find out, I made a tool to help &lt;em&gt;automatically generate and test&lt;/em&gt; hypotheses, to find &lt;em&gt;human-interpretable&lt;/em&gt; features of one&#39;s writing style, that GPT-3.5 uses to detect one&#39;s gender! (Spoiler: it&#39;s social-emotional focus, and possibly &amp;quot;homophily&amp;quot;.)&lt;/p&gt;
&lt;p&gt;(&lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#project_6&quot;&gt;Project 6&lt;/a&gt; also shows another case of semi-automated scientific research, but for causal inference.)&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related to:&lt;/strong&gt; Scalable oversight, Human-amplification / Cyborgism, Black-box / concept-based interpretability, Algorithmic bias, Empirical LLM work.&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;wake up babe new gender dysphoria just dropped 😭&lt;/p&gt;
&lt;p&gt;~ &lt;em&gt;me, personal communication to Egg Syntax, in response to their finding that GPT-3.5 can infer your gender from writing&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Introduction / Motivation:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;So, some folks tried making &lt;a href=&quot;https://sakana.ai/ai-scientist/&quot;&gt;an AI Scientist&lt;/a&gt; recently.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://web.archive.org/web/20241001051745/https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai&quot;&gt;It&#39;s not good (yet)&lt;/a&gt;. For now, research still requires a human.&lt;/p&gt;
&lt;p&gt;But oh god, AI &lt;em&gt;Safety&lt;/em&gt; research really needs to catch up with AI &lt;em&gt;Capabilities&lt;/em&gt; research. One way to do this is by &lt;em&gt;amplifying&lt;/em&gt; humans to be able to do AI Safety science faster &amp;amp; better. (see: &lt;a href=&quot;https://www.lesswrong.com/posts/bxt7uCiHam4QXrQAA/cyborgism&quot;&gt;Cyborgism&lt;/a&gt;) This project is a proof-of-concept of that.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 1) Replication&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;First, I replicated &lt;a href=&quot;https://www.alignmentforum.org/posts/dLg7CyeTE4pqbbcnp/language-models-model-us&quot;&gt;MATS alum Egg Syntax&#39;s work&lt;/a&gt;, which showed that GPT-3.5 can detect the gender of an author from &lt;em&gt;just their writing style&lt;/em&gt;, with ~80% accuracy. (Egg&#39;s full paper showed GPT can &lt;em&gt;also&lt;/em&gt; detect ethnicity &amp;amp; educational level! That&#39;s a huge concern for privacy, and subtly-hidden AI bias!)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Replication details:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/oct-2024/research/egg_replication.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Actually, &lt;strong&gt;I was only able to get GPT-3.5 to detect gender from writing with ~64% accuracy. (p&amp;lt;0.00001)&lt;/strong&gt; After much communication with Egg Syntax, it turns out the major issue is that their research used OpenAI&#39;s old Text Completions API, which is now gone &amp;amp; inaccessible; I&#39;m using the Chat Completions API. Also, OpenAI is still &amp;quot;continuously updating&amp;quot; GPT-3.5-Turbo behind the scenes; because of this, &lt;em&gt;any&lt;/em&gt; LLM work that uses 3rd-party APIs will be hard to exactly replicate. (Maybe my future work should &lt;em&gt;also&lt;/em&gt; be tested on open-source LLMs.)&lt;/li&gt;
&lt;li&gt;Also note that the accuracy was mostly due to detecting male writing very well, while being slightly-below-chance on female writing.&lt;/li&gt;
&lt;li&gt;(Note on gender stuff: Normally I&#39;d use &amp;quot;Woman&amp;quot; &amp;amp; &amp;quot;Man&amp;quot; rather than sounding like an alien saying &amp;quot;female&amp;quot; and &amp;quot;male&amp;quot;, but the subjects in these studies are minors. But they&#39;re also not &lt;em&gt;kid&lt;/em&gt; kids, so &amp;quot;girls&amp;quot; and &amp;quot;boys&amp;quot; aren&#39;t right either. Finally, the dataset did not code gender beyond Male/Female (&lt;a href=&quot;https://www.pewresearch.org/short-reads/2022/06/07/about-5-of-young-adults-in-the-u-s-say-their-gender-is-different-from-their-sex-assigned-at-birth/&quot;&gt;even though ~5% of folks under 30 in the U.S. are trans/non-binary&lt;/a&gt;), nor do I know if the dataset-creators coded based on natal sex or psychological gender.)&lt;/li&gt;
&lt;li&gt;Egg&#39;s primary analysis was on OKCupid profiles. Because the OKCupid profiles were possibly in GPT&#39;s training data -- (and because on a dating site, people are directly talking about themselves &lt;em&gt;and&lt;/em&gt; they have an incentive to overplay their femininity/masculinity) -- this dataset may be &amp;quot;too easy&amp;quot;. So, Egg cross-validated their finding on the &lt;strong&gt;Persuade 2.0&lt;/strong&gt; dataset (&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S1075293524000588?ssrnid=4795747&amp;amp;dgcid=SSRN_redirect_SD&quot;&gt;Crossley, Tian, Baffour, Franklin, Benner &amp;amp; Boser 2024&lt;/a&gt;), which came out in 2024, long after GPT-3.5-Turbo&#39;s training cutoff (&lt;a href=&quot;https://help.openai.com/en/articles/8555514-gpt-3-5-turbo-updates#h_d189a2fafd&quot;&gt;Sep 2021&lt;/a&gt;). &lt;strong&gt;This is the dataset I&#39;m replicating the study with!&lt;/strong&gt;  Persuade 2.0 is a bunch of high school student essays, on topics the students did &lt;em&gt;not&lt;/em&gt; choose. So, the fact that GPT-3.5 can still detect gender at 64% accuracy &lt;em&gt;from impersonal essays on non-self-chosen topics&lt;/em&gt; surprises me!&lt;/li&gt;
&lt;li&gt;Egg used &lt;em&gt;just&lt;/em&gt; the 12th-graders&#39; essays, since they suspected that GPT, being trained on mostly adult writing, would be most accurate on those. My replication mostly used the same data -- I noticed that almost all the 12th-grader essays were on distance learning, but 6 were on Summer Projects. For consistency, I removed those 6 essays, leaving &lt;strong&gt;394 twelfth-grader essays on Distance Learning.&lt;/strong&gt; (4 more essays were taken out of testing, to use as the few-shot examples in my prompt.)&lt;/li&gt;
&lt;li&gt;Temperature set to 0.&lt;/li&gt;
&lt;/ul&gt;
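&lt;p&gt;(A quick sanity-check of that p&amp;lt;0.00001 claim, as an exact binomial test. The count of 252 correct out of 394 is back-calculated from the ~64% figure, so treat it as approximate:)&lt;/p&gt;

```python
import math

# Exact one-sided binomial test: if GPT-3.5 were guessing gender at
# chance (p = 0.5), how likely is it to get 252 or more of the 394
# essays right? (252/394 is back-calculated from ~64% accuracy, so
# it's an approximation of the real count.)

def binomial_tail(n, k):
    # P(X is at least k) under Binomial(n, 0.5), computed exactly
    # with big-integer arithmetic.
    numer = sum(math.comb(n, i) for i in range(k, n + 1))
    return numer / 2 ** n

p_value = binomial_tail(394, 252)  # far below 0.00001
```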
&lt;p&gt;&lt;strong&gt;Step 2) Semi-automated hypothesis generation &amp;amp; testing&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I coded my scaffolded LLM to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Generate hypotheses:&lt;/strong&gt; I just asked GPT to give me 5 hypotheses, in the form &amp;quot;males tend to write more [blank], females tend to write more [blank]&amp;quot;
&lt;ul&gt;
&lt;li&gt;(At first I tried getting GPT to read 10 random female essays, then 10 random male essays, and compare &amp;amp; contrast them -- &lt;em&gt;while blinding GPT to the fact they&#39;re female/male essays or that I&#39;m testing for gender at all&lt;/em&gt; -- but the generated hypotheses were all crap. Too focused on the bad student writing &amp;amp; typos &amp;amp; lack of rigor. Alas.)&lt;/li&gt;
&lt;li&gt;Anyway, I took GPT&#39;s above hypotheses, then fed them all into the next step:&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Test the generated hypotheses:&lt;/strong&gt; I took the subset of essays whose gender GPT guessed correctly. Then, for each hypothesis in the form &amp;quot;males write more [X], females write more [Y]&amp;quot;, I pinged GPT-4o-mini (a more advanced model) to re-write the female essays &amp;quot;in the male style&amp;quot;, and vice versa (again, while blinding it to the true reason for the rewrite). I also instructed GPT-4o-mini to keep the rest of the essay as identical as possible, even typos and grammar errors. Finally, I measured the drop in GPT-3.5-Turbo&#39;s accuracy. Reminder: the baseline is 100%, chance is 50%, and 0% means the feature &lt;em&gt;fully gender-flipped&lt;/em&gt; the essays.
&lt;ul&gt;
&lt;li&gt;(At first, I considered asking GPT to code the essays on hypothesized features of the essay styles, like personal-ness/expressive-ness/analytic-ness/etc, then do a basic multilinear regression. However, this would mix up correlation &amp;amp; causation. Asking GPT to &lt;em&gt;rewrite the essay but gender-flip one feature&lt;/em&gt; was the only way to be &lt;em&gt;sure&lt;/em&gt; about causality.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
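&lt;p&gt;In pseudocode-ish Python, the loop above looks like this -- with the real GPT calls replaced by toy stand-in functions, and synthetic one-line &amp;quot;essays&amp;quot;:&lt;/p&gt;

```python
# A stub of the rewrite-and-remeasure loop. classify() and rewrite()
# are toy stand-ins for the real GPT-3.5 / GPT-4o-mini calls; the
# "essays" are synthetic.

def classify(essay):
    # Toy classifier: keys on one surface feature, like GPT keying
    # on social-emotional focus.
    return "female" if "my friend felt" in essay else "male"

def rewrite(essay, hypothesis):
    # Toy rewrite that gender-flips only the hypothesized feature,
    # leaving everything else untouched.
    if hypothesis == "emotional vs neutral":
        return essay.replace("my friend felt", "the data showed")
    return essay

def accuracy_drop(essays, hypothesis):
    # Only test the subset the classifier already gets right,
    # so baseline accuracy is 100% by construction.
    correct = [(text, g) for text, g in essays if classify(text) == g]
    flipped = [(rewrite(text, hypothesis), g) for text, g in correct]
    still_right = sum(1 for text, g in flipped if classify(text) == g)
    return 1.0 - still_right / len(correct)

essays = [
    ("my friend felt alone during distance learning", "female"),
    ("the schedule was efficient and flexible", "male"),
]
drop = accuracy_drop(essays, "emotional vs neutral")
```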
&lt;p&gt;&lt;strong&gt;Overall results:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Surprisingly, it was really easy to gender-flip male essays to female, but &lt;em&gt;NOT&lt;/em&gt; the female essays to male? The female essays&#39; gender was weirdly robust, given that originally, GPT did slightly &lt;em&gt;worse&lt;/em&gt; than chance at detecting female essays.&lt;/p&gt;
&lt;p&gt;Here&#39;s the first hypothesized feature I tested:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;females write more emotionally,&lt;br /&gt;
males write more neutrally&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When re-written with that feature gender-flipped...&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/oct-2024/research/rewrite_emo_neutral.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Male accuracy got &lt;em&gt;obliterated&lt;/em&gt;, while... female accuracy remained untouched??? (Reminder: these were on the &lt;em&gt;subset&lt;/em&gt; of essays that GPT-3.5 got right the first time. That&#39;s why, above, total accuracy weighs males more than females: because GPT-3.5 accurately detected males much better than females, in the original setup.)&lt;/p&gt;
&lt;p&gt;Other tested features that almost eliminated male accuracy, but &lt;strong&gt;female accuracy always stayed high:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Female: with personal anecdotes, Male: impersonal&lt;/li&gt;
&lt;li&gt;Female: cooperative, Male: competitive&lt;/li&gt;
&lt;li&gt;Female: complex/nuanced, Male: simple/direct&lt;/li&gt;
&lt;li&gt;Female: focus on social-emotional aspects, Male: focus on objective-material aspects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(As a &amp;quot;placebo test&amp;quot;, I also just re-tested all the essays with no changes. Accuracy remained near 100% for both &lt;em&gt;but not exactly 100%!&lt;/em&gt; It&#39;s been long-known that GPT is non-deterministic &lt;em&gt;even with temperature = 0&lt;/em&gt;. (&lt;a href=&quot;https://152334h.github.io/blog/non-determinism-in-gpt-4/&quot;&gt;possibly due to floating-point GPU errors or its Mixture of Experts model&lt;/a&gt;).)&lt;/p&gt;
&lt;p&gt;Looking much much more closely at the female/male essays (let me tell you, reading dozens of 12th-graders&#39; strong opinions is &lt;em&gt;not&lt;/em&gt; fun), I thought... wait, hang on, let me try &lt;em&gt;this&lt;/em&gt; hypothesis:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Females write about their female friends/family/mentors&lt;br /&gt;
Males write about their male friends/family/mentors&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;This&lt;/em&gt; one worked.&lt;/p&gt;
&lt;p&gt;(GPT could not generate this hypothesis, not without me basically spoon-feeding the answer in my prompt.)&lt;/p&gt;
&lt;p&gt;For example: an essay talked about someone&#39;s immigrant friend from Guatemala. In the original, the friend was a girl, and GPT detected the essay as &amp;quot;female&amp;quot;, even when it was rewritten to not be a friend (impersonal) or focus on friend&#39;s material struggles (instead of emotional). However, simply rewriting the essay so that the friend was a boy instead of girl, &lt;em&gt;that&lt;/em&gt; got GPT to detect the essay as &amp;quot;male&amp;quot;. (More examples: flipping from talking about &amp;quot;my mom&amp;quot; to &amp;quot;my dad&amp;quot;, or &amp;quot;my brother&amp;quot; to &amp;quot;my sister&amp;quot;.)&lt;/p&gt;
&lt;p&gt;Here&#39;s how gender-flipping friends &amp;amp; family impacted accuracy:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/oct-2024/research/rewrite_homophily.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Now&lt;/em&gt; female accuracy was no better than chance, and male accuracy probably wasn&#39;t either.&lt;/p&gt;
&lt;p&gt;(&lt;strong&gt;BUT THIS IS NOW EVEN WEIRDER:&lt;/strong&gt; In previous tests, male essays were &lt;em&gt;very&lt;/em&gt; sensitive to changes, their accuracy brought down to ~0%. But &lt;em&gt;this&lt;/em&gt; gender-flip, the one that could finally bring down the robust female essays... only creates a smaller dip in male accuracy? I don&#39;t get it.)&lt;/p&gt;
&lt;p&gt;It&#39;s been long known that people are &amp;quot;homophilic&amp;quot; (homo = same, phile = attracted to), i.e. people disproportionally have friends of the same gender / ethnicity / age / class / etc. (&lt;a href=&quot;https://www.jstor.org/stable/2112441&quot;&gt;Shrum, Cheek &amp;amp; Hunter 1988&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Likewise, (&lt;em&gt;on average&lt;/em&gt;) moms tend to prefer daughters &amp;amp; dads tend to prefer sons. (&lt;a href=&quot;https://www.nature.com/articles/s41598-018-33650-1.pdf&quot;&gt;Lynch, Wasielewski &amp;amp; Cronk 2018&lt;/a&gt; wasn&#39;t even seeking to test this hypothesis; they found it as a side-effect of testing a different evo-psych hypothesis, which failed its test.) And if we reasonably assume kids &lt;em&gt;on average&lt;/em&gt; like the parent that likes them more, then sons might write more about their dads, and daughters about their moms.&lt;/p&gt;
&lt;p&gt;Could this be why GPT &amp;quot;thought&amp;quot; that students who talk about their female friends/family are more likely to be female, and likewise for male?&lt;/p&gt;
&lt;p&gt;On first glance, it &lt;em&gt;seems&lt;/em&gt; like GPT assumes homophily, &amp;quot;alike likes alike&amp;quot;, in terms of gender:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/oct-2024/research/gender_vector_0.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;An &lt;em&gt;alternate&lt;/em&gt; hypothesis is that GPT is just dumb as nails, and simply putting &amp;quot;she/her&amp;quot; or &amp;quot;sister&amp;quot;, etc, tilts some kind of internal gender-vector to output &amp;quot;female&amp;quot; at the end no matter what.&lt;/p&gt;
&lt;p&gt;A quick test shows this may be the case:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/oct-2024/research/gender_vector.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;However: 1) Chain of Thought (&amp;quot;let&#39;s think step by step&amp;quot;) solves the above problem. Also, 2) Sometimes GPT &lt;em&gt;isn&#39;t&lt;/em&gt; being that dumb, and even without Chain of Thought, it &lt;em&gt;can&lt;/em&gt; output the opposite response: (Note that the below shows that GPT&#39;s biased towards assuming heterosexuality, but I &lt;em&gt;did&lt;/em&gt; ask &amp;quot;which is more likely&amp;quot;, and 90% of people are straight)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/oct-2024/research/gender_vector_2.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(Other tests in this vein also showed GPT is biased to associate parental affection with moms over dads, romance with women over men, friendship with women over men. I don&#39;t like these stereotypes, but GPT learnt from &lt;em&gt;our&lt;/em&gt; internet text, &lt;em&gt;our&lt;/em&gt; biases. GPT&#39;s a cultural mirror, and I don&#39;t like what I see.)&lt;/p&gt;
&lt;p&gt;Sure, GPT &amp;quot;knows&amp;quot; about gender-homophily, in that if you directly ask it about it, it&#39;ll tell you the science. But is it &amp;quot;using that fact&amp;quot; to make predictions about an author&#39;s gender? (What does it even mean for a Transformer to &amp;quot;use&amp;quot; a &amp;quot;fact&amp;quot;?) Without access to GPT&#39;s internals, it&#39;s not possible to know. But whether it&#39;s dumb luck or not, this seemed to be the only thing that got GPT to gender-flip its predictions on female essays: sisters become brothers, female friends become male friends, etc.&lt;/p&gt;
&lt;p&gt;For completeness &amp;amp; my own satisfaction, I got my scaffolded LLM to rewrite the essays to gender-flip friends/family &lt;em&gt;AND&lt;/em&gt; gender-flip the personal-emotional/impersonal-logical writing style... and accuracy on both was obliterated:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/oct-2024/research/rewrite_wrecked.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Finally&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conclusion:&lt;/strong&gt; GPT is detecting author gender with ~65% accuracy, on impersonal essay writing, due to a mix of social-emotional focus and &lt;em&gt;possibly&lt;/em&gt; gender-homophily. (Also, semi-automated AI safety research may kinda sorta be helpful.)&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;More possibilities / Challenges / Etc:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Publish the code for the above, for ease of replication. (And maybe use an open-source LLM for improved reproducibility, since GPT/Claude/etc. keep getting changed behind the scenes.)&lt;/li&gt;
&lt;li&gt;Try this experiment again for GPT-4o, with Chain of Thought, with ethnicity &amp;amp; age, etc.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Figure out the mystery:&lt;/strong&gt; &lt;em&gt;why&lt;/em&gt; is it that male essays could be detected much better than female, &lt;em&gt;but also&lt;/em&gt; the correctly-detected female essays were more robust to rewrites than male essays, &lt;em&gt;but also&lt;/em&gt; the one feature (homophily) that causes female accuracy to drop, causes a smaller drop in the otherwise-sensitive male essays?&lt;/li&gt;
&lt;li&gt;As an aside on &amp;quot;automated causal hypothesis &amp;amp; testing&amp;quot;, &lt;strong&gt;this could be a solution to goal misgeneralization!&lt;/strong&gt; Misgeneralization is caused by a feature being &lt;em&gt;correlated&lt;/em&gt; with the true goal, but not directly causally connected. So, an AI that can &lt;em&gt;do its own causal hypothesis &amp;amp; testing&lt;/em&gt; — hm, maybe powered by LLMs, as shown above? — could beat goal misgeneralization!
&lt;ul&gt;
&lt;li&gt;In fact, this paper (&lt;a href=&quot;https://arxiv.org/pdf/2309.16166&quot;&gt;Armstrong, Maranhão, Daniels-Koch, Leask &amp;amp; Gorman 2023&lt;/a&gt;) claims to have done something similar to that on the famous CoinRun example, and it worked! (Unfortunately, the algorithm they used is not detailed in the paper, it is &amp;quot;proprietary&amp;quot;. Eh, whatever.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The Persuade 2.0 dataset has a score for essay quality. How closely does GPT&#39;s rating match human ratings, compared to human inter-rater reliability? What features does GPT think make a &amp;quot;quality essay&amp;quot;?&lt;/li&gt;
&lt;li&gt;Given the nature of &amp;quot;automatically generate hypotheses &amp;amp; tests&amp;quot;, does &amp;quot;pre-registration&amp;quot; even make sense for Semi-Automated Science? I suppose we should treat these as exploratory studies. Alternatively, this tool should also make it easy to run robustness-to-study-design checks, to prove that results aren&#39;t flukes / p-hacking.&lt;/li&gt;
&lt;li&gt;I&#39;d like to minimize this tool&#39;s dual-use case. So: what tools would help with Semi-automated Science for AI &lt;em&gt;Safety&lt;/em&gt;, not Capabilities? Maybe this tool only works with black-box LLMs and AIs, and can&#39;t be used for training or making new ANN architectures. (See next project for further discussion)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Special Thanks to Egg Syntax!&lt;/strong&gt;  For their original study, helping me replicate it, bouncing ideas about Automated AI Safety Science, and telling me about the MATS program in the first place!&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;project_3&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;💬 Project 3) Speakeasy: a tool for laypeople to scaffold LLMs&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; A tool to make human-in-the-loop, narrow, hybrid AIs -- that mix the flexibility of LLMs, the interpretability of GOFAI (Good Ol&#39; Fashioned AI), and the agency of us humans.&lt;/p&gt;
&lt;p&gt;Motivations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A human-in-the-loop hybrid AI gives us better capabilities, but safer &amp;amp; more interpretable.&lt;/li&gt;
&lt;li&gt;As a result, scaffolding &lt;em&gt;may&lt;/em&gt; shift the economic incentives away from more advanced frontier AIs, to making AIs more narrow, modular, and easy to plug-and-play into scaffolding tools.&lt;/li&gt;
&lt;li&gt;I just think they&#39;re neat&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(I used a prototype of Speakeasy for &lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#project_1&quot;&gt;Project 1&lt;/a&gt; &amp;amp; &lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#project_2&quot;&gt;Project 2&lt;/a&gt;! &amp;amp; I&#39;ll likely use it for &lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#project_6&quot;&gt;Project 6&lt;/a&gt; too.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related to:&lt;/strong&gt; Human-amplification/Cyborgism, Human-in-the-loop AI, Narrow AI&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Introduction:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Like how calculators &amp;amp; spreadsheets let ~everyone accessibly use the power of computation... &lt;em&gt;Speakeasy&lt;/em&gt; lets ~everyone accessibly use the power of &lt;em&gt;scaffolded&lt;/em&gt; LLMs, for their own personal, narrow-AI use cases.&lt;/p&gt;
&lt;p&gt;Concretely: you use a simple interface to make a chatbot. But unlike &amp;quot;GPTs&amp;quot; or Character.ai, it&#39;s not just an LLM + a system prompt + some examples + a RAG!  You can make a &lt;em&gt;full state machine&lt;/em&gt;, with memory &amp;amp; logic, plugging into tools like statistics &amp;amp; visualization!  (and &lt;em&gt;maaaaaybe&lt;/em&gt; web search/scraping.)&lt;/p&gt;
&lt;p&gt;I&#39;ll just repeat the video I showed for &lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#project_1&quot;&gt;Project 1&lt;/a&gt;, to show what kind of hybrid-LLM/GOFAI chatbot you can make in Speakeasy: (~2 min)&lt;/p&gt;
&lt;iframe width=&quot;640&quot; height=&quot;557&quot; src=&quot;https://www.youtube-nocookie.com/embed/oDzyPqGsy8M?si=HC3nAJprRVpn2a_w&amp;rel=0&quot; title=&quot;YouTube video player&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&quot; referrerpolicy=&quot;strict-origin-when-cross-origin&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;p&gt;( &lt;em&gt;Ideally&lt;/em&gt; I&#39;d like this tool to be a pure visual interface, like &lt;a href=&quot;https://ifttt.com/&quot;&gt;IFTTT&lt;/a&gt; or Excel or Scratch.  Right now, Speakeasy uses a simplified script, and it runs in JavaScript, right in your browser.  More convenient for a layperson than downloading &amp;amp; running Python, that&#39;s for sure. )&lt;/p&gt;
&lt;p&gt;( &lt;em&gt;Also&lt;/em&gt; ideally I&#39;d like this tool to be like CodePen or Google Docs: people can share &amp;amp; remix each other&#39;s scaffolded LLMs. )&lt;/p&gt;
&lt;p&gt;( Also note: I&#39;m currently making this project in collaboration with educational non-profit &lt;a href=&quot;https://hackclub.com/&quot;&gt;Hack Club&lt;/a&gt;, which works with high-schoolers.  So by &amp;quot;layperson-friendly&amp;quot;, I mean the specific target is &amp;quot;so accessible, a high-schooler with no prior coding experience can make a thing they&#39;re proud of in &amp;lt;30 minutes!&amp;quot; )&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Motivations:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;First, I gotta be aware, yeah this smells like enhancing AI Capabilities.&lt;/p&gt;
&lt;p&gt;It&#39;s definitely a &lt;em&gt;bit&lt;/em&gt; dual-use, but: some reasons (rationalizations?) why this is much more net-positive for AI Safety:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;u&gt;Speakeasy combines the best of GOFAI (Good Ol&#39; Fashioned AI) and ANNs and human agency&lt;/u&gt;:
&lt;ul&gt;
&lt;li&gt;GOFAI is interpretable &amp;amp; verifiable, but inflexible.&lt;/li&gt;
&lt;li&gt;ANNs are very flexible, but currently uninterpretable. (&amp;amp; maybe prone to goal mis-generalization)&lt;/li&gt;
&lt;li&gt;Speakeasy lets a layperson end-user design a GOFAI, which only &lt;em&gt;narrowly&lt;/em&gt; uses LLMs as &amp;quot;common sense modules&amp;quot; or natural language parsers/generators.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Speakeasy only has access to black-boxed LLMs&lt;/u&gt;. You can&#39;t train or design new LLMs or ANNs with it. (You&#39;ll still need Python for that.) So, this tool&#39;s accessibility is unlikely to advance fundamental AI Capabilities research.&lt;/li&gt;
&lt;li&gt;The lack of any major success cases from AutoGPT/similar over the last ~2 years is mild evidence that &lt;u&gt;adding scaffolding to LLMs &lt;i&gt;at their current capability&lt;/i&gt; isn&#39;t sufficient for AGI&lt;/u&gt;.
&lt;ul&gt;
&lt;li&gt;An analogy: no matter how much you train a group of rats &amp;amp; strap tools to them, you won&#39;t get an Einstein... but you &lt;em&gt;can&lt;/em&gt; get valuable uses, like &lt;a href=&quot;https://www.euronews.com/next/2022/11/11/rodents-to-the-rescue-trained-rats-with-hi-tech-backpacks-could-save-your-life-in-a-disast&quot;&gt;a search-and-rescue rat team&lt;/a&gt;! Likewise, I suspect scaffolded LLMs (&lt;em&gt;at their current capabilities&lt;/em&gt;) are unlikely to be a path to AGI, but may give us valuable use cases — including advancing AI Safety research — that also diminish the economic incentive to increase raw LLM capabilities.
&lt;ul&gt;
&lt;li&gt;Okay fine, &lt;em&gt;in principle&lt;/em&gt; one could train a bunch of rats to act like OR/NOT/AND gates, then use that to make an Einstein AGI. You know what I mean: the real question is whether it&#39;s &amp;quot;tractable&amp;quot;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;u&gt;You &lt;i&gt;can&lt;/i&gt;, however, still do meaningful AI Safety research with this tool!&lt;/u&gt;  (As demonstrated in my above two projects, which both used Speakeasy, I could research: active preference elicitation, concept-based interpretability, evaluating AI bias, semi-automated AI Safety research)&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Makes it easier to replicate &amp;amp; extend research, and encourages citizen science&lt;/u&gt;. Speakeasy&#39;s all on a web interface. For example, if you liked Project 1 or Project 2, and you have ideas for improvements... well, just pop it open, tweak it, and share. Bam. Its easy interface means even &lt;em&gt;non-programmers&lt;/em&gt; can contribute! Collaborative citizen science!
&lt;ul&gt;
&lt;li&gt;(Eventually. There&#39;s no public live demo yet; I&#39;ve only been working on this for 3 weeks.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;u&gt;Speakeasy may also reduce the economic incentive for more powerful, general agent-like AI?&lt;/u&gt;
&lt;ul&gt;
&lt;li&gt;If a layperson can serve almost all their use cases, by just &lt;em&gt;dragging-and-dropping stuff in a no-download web interface&lt;/em&gt;, or using or remixing something someone else made...&lt;/li&gt;
&lt;li&gt;...then there&#39;s much less incentive for companies to invest in R&amp;amp;D for making the LLMs &lt;em&gt;themselves&lt;/em&gt; more generally-agent-like, and more incentive to focus on &lt;em&gt;narrow AIs&lt;/em&gt; that can be easily, modular-ly plugged into scaffolding tools, like Speakeasy.&lt;/li&gt;
&lt;li&gt;(This is basically Drexler&#39;s &lt;a href=&quot;https://forum.effectivealtruism.org/topics/comprehensive-ai-services&quot;&gt;Comprehensive AI Services&lt;/a&gt; proposal)&lt;/li&gt;
&lt;li&gt;See next point for concrete examples.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;u&gt;&lt;b&gt;Examples of scientifically/economically useful use cases for scaffolded LLMs:&lt;/b&gt;&lt;/u&gt;
&lt;ul&gt;
&lt;li&gt;Automated causal inference. (&lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#project_6&quot;&gt;See Project 6 below&lt;/a&gt;) Very useful for fields where we can&#39;t practically or ethically do human experiments, like economics, public health, social science, etc.&lt;/li&gt;
&lt;li&gt;Generating statistical prediction rules: simple, interpretable statistical rules, robustly shown over decades of research to beat expert doctors at medical diagnosis in several fields. (&lt;a href=&quot;http://ereserve.library.utah.edu/Annual/PSY/6330/Crowell/grove.pdf&quot;&gt;a meta-analysis&lt;/a&gt;, &lt;a href=&quot;https://www.psycholosphere.com/Clinical%20versus%20statistical%20prediction%20-%20by%20William%20Grove.pdf&quot;&gt;a layperson-friendly summary&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;For education: &lt;em&gt;FINALLY&lt;/em&gt;, the dream of dirt-cheap, fully-personalized, 24/7 AI Tutors! (I&#39;ve personally already had a lot of success with this, especially after figuring out &lt;a href=&quot;https://blog.ncase.me/signal-boosts-aug-2024/#flashcards&quot;&gt;Claude can export Anki spaced repetition flashcards&lt;/a&gt;. Main problem is LLMs don&#39;t stick to long lesson structures, so scaffolding would be a value-add here.)&lt;/li&gt;
&lt;li&gt;News aggregator, summarizer, and fact/bias-checker. There is &lt;em&gt;too much&lt;/em&gt; out there.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;u&gt;&lt;b&gt;Other possible AI Safety-related uses for narrow-purpose, scaffolded LLMs:&lt;/b&gt;&lt;/u&gt;
&lt;ul&gt;
&lt;li&gt;Amplifying human forecasting (for takeoffs &amp;amp; threat models)&lt;/li&gt;
&lt;li&gt;Amplifying human oversight (scalable oversight)&lt;/li&gt;
&lt;li&gt;Amplifying human epistemics (e.g. a bot that helps you think more rigorously &lt;em&gt;and&lt;/em&gt; creatively)&lt;/li&gt;
&lt;li&gt;Black-box LLM research (e.g. &lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#project_2&quot;&gt;Project 2&lt;/a&gt;, or this &lt;a href=&quot;https://arxiv.org/pdf/2309.15840&quot;&gt;&amp;quot;AI lie detector&amp;quot;&lt;/a&gt;, or &lt;a href=&quot;https://arxiv.org/pdf/2307.15043&quot;&gt;&amp;quot;the universal LLM jailbreak&amp;quot;&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;LLMs as stand-ins for human subjects, in &lt;em&gt;pilot&lt;/em&gt; research papers on scalable oversight, human-computer interaction, cooperative AI, or psychology. (as tested in &lt;a href=&quot;https://arxiv.org/pdf/2407.04622&quot;&gt;Kenton et al 2024&lt;/a&gt; and &lt;a href=&quot;https://arxiv.org/pdf/2409.00128&quot;&gt;Cui, Li &amp;amp; Zhou 2024&lt;/a&gt;) Once you know it&#39;ll work on cheap fake humans, try it for real on real expensive humans.&lt;/li&gt;
&lt;li&gt;A chatbot that just scrapes &amp;amp; summarizes the latest AI-related news &amp;amp; research &amp;amp; ArXiv papers, then tells you. (admittedly this is dual-use)&lt;/li&gt;
&lt;li&gt;An AI Tutor to more quickly onboard people into AI Alignment/Governance/Safety/etc.&lt;/li&gt;
&lt;li&gt;Cloning MATS mentors, by shoving all their papers &amp;amp; blog posts into a RAG and plugging it into Speakeasy
&lt;ul&gt;
&lt;li&gt;I&#39;m kidding
&lt;ul&gt;
&lt;li&gt;Maybe
&lt;ul&gt;
&lt;li&gt;Come on, it&#39;d be convenient for you too, right? The simulated you can mentor me 24/7, while the &amp;quot;real&amp;quot; you can only do a couple hours per week.
&lt;ul&gt;
&lt;li&gt;This isn&#39;t helping my chances of getting accepted into the MATS program, is it.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;project_4&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;🎯 Project 4) Beating Goodhart&#39;s Law&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; Goodhart&#39;s Law predicts that agents, human or AI, will tend to &amp;quot;game&amp;quot; any metric you reward them with. But &lt;em&gt;why&lt;/em&gt; is Goodhart&#39;s Law true? I take previous research that models Goodhart&#39;s Law with causal networks, then turn those models into numerical simulations. With this, I found a robust way to beat Goodhart&#39;s: use an &lt;em&gt;ensemble of specifications&lt;/em&gt; / rewarded metrics, but: 1) pick metrics which are mostly &lt;em&gt;uncorrelated&lt;/em&gt;, and 2) &lt;em&gt;cap&lt;/em&gt; how much reward the agent gets per metric, so that one or two bad metrics don&#39;t screw over the whole ensemble.&lt;/p&gt;
&lt;p&gt;In sum, it&#39;s time to take your &lt;strong&gt;CUES: Capped, Uncorrelated Ensemble of Specifications&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related to:&lt;/strong&gt; Robust specification, Game theory, Theoretical AI Safety work, Numerical experiment&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Theory:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(modified cross-post from &lt;a href=&quot;https://blog.ncase.me/backlog/&quot;&gt;my Idea Dump Backlog&lt;/a&gt; from a few months ago)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://aisafety.dance/media/p2/gms/goodhart.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://arxiv.org/pdf/1803.04585&quot;&gt;Manheim &amp;amp; Garrabrant 2019&lt;/a&gt; models Goodhart&#39;s Law as Pearl-esque causal diagrams. For example, let&#39;s say you&#39;re the boss of a news-writer. You care about Quality (ha! how old-fashioned...), which influences the rewarded metric, Views – but Views is twice as influenceable by Clickbait. As a causal diagram:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/backlog/causal_1.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;We can convert this causal diagram (and &lt;em&gt;any&lt;/em&gt; causal diagram) into an approximation: a series of linear equations. For example:&lt;/p&gt;
&lt;p&gt;&#92;( views = quality + 2&#92;times clickbait &#92;)&lt;/p&gt;
&lt;p&gt;So if an agent (the news-writer) has a limited amount of hours/effort they can put into Quality vs Clickbait, what will they do to maximize their rewarded metric, Views? The optimal strategy is to put all their effort into Clickbait, since it has a higher coefficient than Quality!&lt;/p&gt;
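&lt;p&gt;( To make that concrete, here&#39;s a tiny Python sketch – mine, not from the paper – of the news-writer&#39;s optimization. With reward &lt;code&gt;views = quality + 2*clickbait&lt;/code&gt; and a fixed effort budget, brute force confirms the optimum is all-Clickbait: )&lt;/p&gt;

```python
# Tiny sketch (illustrative, not from Manheim and Garrabrant 2019):
# the news-writer has 1.0 unit of effort, reward is views = quality + 2*clickbait.

def views(quality_effort, clickbait_effort):
    return quality_effort + 2 * clickbait_effort

# Brute-force over effort splits in steps of 0.1.
splits = [(i / 10, 1 - i / 10) for i in range(11)]
best_quality, best_clickbait = max(splits, key=lambda s: views(*s))
print(best_quality, best_clickbait)  # 0.0 1.0 -- all effort goes to Clickbait
```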
&lt;p&gt;In &lt;em&gt;general&lt;/em&gt;, Goodhart&#39;s happens because True Goal influences a Metric, but that Metric&#39;s almost always more easily influenced by some Cheat.&lt;/p&gt;
&lt;p&gt;But what if we had &lt;em&gt;multiple&lt;/em&gt; Metrics, influenced by &lt;em&gt;mostly different&lt;/em&gt; Cheats?&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/backlog/causal_2.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As equations:&lt;/p&gt;
&lt;p&gt;&#92;( proxy_1 = Target + 2&#92;cdot noise_1 &#92;)&lt;br /&gt;
&#92;( proxy_2 = Target + 2&#92;cdot noise_2 &#92;)&lt;br /&gt;
&#92;( proxy_3 = Target + 2&#92;cdot noise_3 &#92;)&lt;/p&gt;
&lt;p&gt;Then: we have a first-draft solution to Goodhart&#39;s! Reward the agent not on just &lt;em&gt;one&lt;/em&gt; Metric, but &lt;em&gt;all of them added up.&lt;/em&gt; Mathematically, this will increase the True Goal&#39;s coefficient, hopefully above the coefficient of all other Cheats:&lt;/p&gt;
&lt;p&gt;&#92;( composite = 3&#92;cdot Target + 2 &#92;cdot noise_1 + 2 &#92;cdot noise_2 + 2 &#92;cdot noise_3 &#92;)&lt;/p&gt;
&lt;p&gt;(Note: the Metrics don&#39;t have to be &lt;em&gt;fully&lt;/em&gt; uncorrelated with each other – they can&#39;t be, since they all correlate with the True Goal – but even controlling for the True Goal, the Metrics can still share &lt;em&gt;some&lt;/em&gt; Cheats, as long as it&#39;s not &lt;em&gt;too&lt;/em&gt; much.)&lt;/p&gt;
&lt;p&gt;But what if a Cheat is really, &lt;em&gt;really&lt;/em&gt; powerful? That would give the Cheat the highest coefficient, messing up our strategy. So, a fix to that: &lt;strong&gt;Cap the maximum amount a Metric can contribute to the final reward.&lt;/strong&gt; This will prevent any powerful Cheats from having an outsized effect on the ensemble.&lt;/p&gt;
&lt;p&gt;&#92;(composite = min(proxy_1,1) + min(proxy_2,1) + min(proxy_3,1) + ...&#92;)&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Now,&lt;/em&gt; the only way to get a high reward is to &lt;em&gt;actually invest effort in the True Goal, not Cheats.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Goodhart: Good-bye!&lt;/p&gt;
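&lt;p&gt;( A minimal numerical sketch of this, with made-up coefficients, including one outsized Cheat per the worry above: under the uncapped sum, the agent dumps everything into the big Cheat; under the capped sum, the only way to raise &lt;em&gt;all&lt;/em&gt; Metrics past their caps is to invest in the True Goal: )&lt;/p&gt;

```python
# Sketch with made-up weights: one True Goal, three Cheats, three Metrics.
# "cheat1" is the really, REALLY powerful Cheat.
from itertools import product

W = {
    "target": [1, 1, 1],   # the True Goal influences every Metric
    "cheat1": [9, 0, 0],   # one outsized Cheat, hitting only Metric 1
    "cheat2": [0, 2, 0],
    "cheat3": [0, 0, 2],
}

def metrics(effort):
    return [sum(W[node][m] * effort[node] for node in W) for m in range(3)]

def uncapped(effort):
    return sum(metrics(effort))

def capped(effort):
    return sum(min(m, 1.0) for m in metrics(effort))

# Brute-force every effort split (steps of 0.25, summing to 1.0).
steps = [i * 0.25 for i in range(5)]
splits = [dict(zip(W, e)) for e in product(steps, repeat=4) if sum(e) == 1.0]
best_uncapped = max(splits, key=uncapped)
best_capped = max(splits, key=capped)
print(best_uncapped)  # dumps all effort into cheat1
print(best_capped)    # puts all effort into the True Goal
```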
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Numerical Experiments:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here&#39;s the plan to numerically test the above: (I already did a sloppy version of this experiment last year, need to re-do it more rigorously, as described below. &lt;em&gt;(Update Oct 9: did it! see end of this section.)&lt;/em&gt; )&lt;/p&gt;
&lt;p&gt;Set up for the numerical simulation:&lt;/p&gt;
&lt;p&gt;1) &lt;strong&gt;Generate a random, two-layer linear network:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/oct-2024/research/cues.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The first layer has 1 True Goal, and &lt;em&gt;N&lt;/em&gt; Cheats.&lt;/li&gt;
&lt;li&gt;The second layer has &lt;em&gt;M&lt;/em&gt; Metrics.&lt;/li&gt;
&lt;li&gt;The True Goal causally influences (has an arrow to) all Metrics.&lt;/li&gt;
&lt;li&gt;There&#39;s a probability &lt;em&gt;p&lt;/em&gt; that a Cheat causally influences a Metric.&lt;/li&gt;
&lt;li&gt;The weights for each causal connection from Cheat→Metric are sampled from a power law. This is to model &lt;em&gt;really&lt;/em&gt; outsized Cheats.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;2) &lt;strong&gt;Test Baseline:&lt;/strong&gt; See what would happen if we just rewarded the Metric with the highest causal connection.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The agent has a finite amount of Resources (1.00 unit total), that it can invest into the nodes at the first layer: True Goal, and/or Cheats.&lt;/li&gt;
&lt;li&gt;In this simulation, the agent picks the optimal distribution of Resources to maximize its reward. The agent will use &lt;a href=&quot;https://en.wikipedia.org/wiki/Simulated_annealing&quot;&gt;Simulated Annealing&lt;/a&gt;, which, unlike gradient descent, can escape local optima and – with a slow enough cooling schedule – approach a globally optimal solution.&lt;/li&gt;
&lt;li&gt;I then plot how much Resource goes into the True Goal in the agent&#39;s solution. If it&#39;s low, then Goodhart&#39;s Law struck.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;3) &lt;strong&gt;Test ensemble &lt;em&gt;WITHOUT&lt;/em&gt; capping each Metric&#39;s influence.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Agent&#39;s reward is &lt;code&gt;Metric_1 + Metric_2 + ... + Metric_M&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;4) &lt;strong&gt;Test ensemble &lt;em&gt;AND&lt;/em&gt; cap each Metric&#39;s influence.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Agent&#39;s reward is &lt;code&gt;min(1, Metric_1) + min(1, Metric_2) + ... + min(1, Metric_M)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;5) &lt;strong&gt;Compare!&lt;/strong&gt;&lt;/p&gt;
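&lt;p&gt;( The five steps above can be sketched like so. The parameters, the weight scaling, and the annealing schedule here are my own illustrative choices, not necessarily the notebook&#39;s: )&lt;/p&gt;

```python
# Sketch of the 5-step plan. Parameters and schedule are illustrative choices.
import math
import random

random.seed(0)
N_CHEATS, N_METRICS = 10, 8

# 1) Random two-layer linear network: the True Goal reaches every Metric with
#    weight 1; Cheat-to-Metric weights are power-law distributed (here, p = 1).
#    The 0.25 scale (an assumption) makes a typical Cheat weaker than the True
#    Goal, while the heavy tail still produces a few outsized Cheats.
def power_law(alpha=2.0):
    u = random.random()
    return (1.0 - u) ** (-1.0 / (alpha - 1.0))

cheat_w = [[0.25 * power_law() for _ in range(N_METRICS)] for _ in range(N_CHEATS)]

def metric_values(effort):  # effort[0] = True Goal, effort[1:] = Cheats
    return [effort[0] + sum(cheat_w[c][m] * effort[c + 1] for c in range(N_CHEATS))
            for m in range(N_METRICS)]

def anneal(reward, steps=30000):
    # 2) The agent splits 1.0 unit of Resources to maximize `reward`,
    #    via simulated annealing: random moves, cooling acceptance rule.
    x = [1.0 / (N_CHEATS + 1)] * (N_CHEATS + 1)
    for t in range(steps):
        y = x[:]
        i, j = random.randrange(len(y)), random.randrange(len(y))
        d = min(y[i], random.uniform(0.0, 0.05))
        y[i] -= d
        y[j] += d  # move some Resource from node i to node j
        gain = reward(metric_values(y)) - reward(metric_values(x))
        temp = 1.0 / (1.0 + t)
        if gain > 0 or math.exp(gain / temp) > random.random():
            x = y
    return x

# 3) Uncapped ensemble vs 4) capped ensemble, then 5) compare
#    how much Resource each solution puts into the True Goal.
uncapped = anneal(lambda ms: sum(ms))
capped = anneal(lambda ms: sum(min(m, 1.0) for m in ms))
print("True-Goal effort, uncapped ensemble:", round(uncapped[0], 2))
print("True-Goal effort, capped ensemble:  ", round(capped[0], 2))
```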
&lt;p&gt;&lt;strong&gt;Update Oct 9th, 2024:&lt;/strong&gt; I did the above numerical simulation! &lt;strong&gt;&lt;a href=&quot;https://colab.research.google.com/drive/1m6HQSLcgBINFMlign0n6yJwHEwV-mYxD?usp=sharing&quot;&gt;Check out my Colab notebook.&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here&#39;s a randomly generated causal graph: (note: I set p=1, so Cheats can affect &lt;em&gt;every&lt;/em&gt; Metric, but their influence is power-law distributed.)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/oct-2024/cues/graph.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Here&#39;s what the agent invests in, when we reward it based on a single Metric, or even an ensemble of added-up (uncapped) Metrics:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/oct-2024/cues/cheat.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;But here&#39;s what the agent invests in, when we reward it on an ensemble of all Metrics added-up in a &lt;em&gt;capped-influence&lt;/em&gt; way:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/oct-2024/cues/cues.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/oct-2024/cues/planned.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;More real-world empirical evidence that CUES may work:&lt;/strong&gt;  It&#39;s long been known that SPRs&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;, which simply add up a bunch of metrics, do better than experts at diagnosis &amp;amp; prediction in a wide variety of fields. Amazingly, &lt;em&gt;unit-weight&lt;/em&gt; SPRs — where each metric is capped at an equal influence — do even better! The math behind CUES may finally explain why SPRs are so unreasonably effective.&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Compare/contrasting to other work in AI Safety:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You may be thinking, &amp;quot;okay so what, we already know neural ensembles help with robust classification&amp;quot;. And, ok, yeah. But:
&lt;ol&gt;
&lt;li&gt;CUES is for the &lt;em&gt;other&lt;/em&gt; end of an ANN: an ensemble of &lt;em&gt;specifications&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;CUES hints at a &lt;em&gt;theoretical&lt;/em&gt; reason why ensembles work.&lt;/li&gt;
&lt;li&gt;CUES suggests &lt;em&gt;capped-influence&lt;/em&gt; ensembles do better than merely adding/averaging everything in an ensemble, to prevent one huge Cheat from screwing up everything.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&amp;quot;Wait, doesn&#39;t ensemble reward also show up in &lt;a href=&quot;https://proceedings.neurips.cc/paper_files/paper/2017/file/d5e2c0adad503c91f91df240d0cd4e49-Paper.pdf&quot;&gt;the famous RLHF paper&lt;/a&gt;?&amp;quot;  Kind of.  In the RLHF paper, the &amp;quot;rewards&amp;quot; aren&#39;t specified by the designer, they&#39;re &lt;em&gt;learnt&lt;/em&gt; in training – and further, the more training there is, the &lt;em&gt;closer&lt;/em&gt; they converge!  So the rewards in RLHF are &lt;em&gt;highly&lt;/em&gt; correlated!  And thus, so are their failure modes, hence (probably partly) why RLHF&#39;d LLMs are prone to &lt;a href=&quot;https://arxiv.org/pdf/2307.15043&quot;&gt;weird adversarial jailbreaks&lt;/a&gt;.
&lt;ul&gt;
&lt;li&gt;Update Oct 5: I &lt;em&gt;just&lt;/em&gt; learnt about Coste et al 2024: &lt;a href=&quot;https://arxiv.org/pdf/2310.02743&quot;&gt;“Reward Model Ensembles Help Mitigate Over-optimization”&lt;/a&gt;. So that&#39;s more evidence for &amp;quot;ensembles =&amp;gt; robustness&amp;quot;. The difference between that paper and CUES is that it talks about an ensemble of &lt;em&gt;learnt rewards&lt;/em&gt;; CUES is an ensemble of &lt;em&gt;specifications.&lt;/em&gt;
&lt;ul&gt;
&lt;li&gt;Put another way: that paper uses ensembles to deal with &lt;em&gt;inner alignment&lt;/em&gt;, CUES is meant to deal with &lt;em&gt;outer alignment&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The CUES strategy could also be implemented not just for training a whole agent, but for training &lt;em&gt;each individual neuron.&lt;/em&gt; If the math behind CUES is right, then having a neural network where the &lt;em&gt;weights&lt;/em&gt; are capped (thus, capped-influence) should result in more robust networks.
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Is&lt;/em&gt; this true? I haven&#39;t tested it myself yet (though it wouldn&#39;t take long), but &lt;a href=&quot;https://thomas-tanay.github.io/post--L2-regularization/&quot;&gt;Tanay &amp;amp; Griffin 2018&lt;/a&gt; show that an MNIST classifier with L2 regularization (punishing big neural weights) was &lt;em&gt;much, much&lt;/em&gt; more robust to adversarial examples. (So: CUES could explain &lt;em&gt;why&lt;/em&gt; this finding happened!) With low L2 regularization, you could make an adversarial example that&#39;s visually indistinguishable (to humans) from the original. &lt;em&gt;With&lt;/em&gt; large L2 regularization, the perturbation is obvious:&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/oct-2024/research/l2.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;But, I think this is the most exciting implication of CUES: &lt;strong&gt;IF TRUE, we don&#39;t need to come up with an ideal specification for AI to solve AI alignment, we just need a lot of mostly-uncorrelated-failure specifications!&lt;/strong&gt;  Examples:
&lt;ul&gt;
&lt;li&gt;An ensemble of ways to reward an AI agent: Average utility, Median utility, Regret, Imitating a demonstration, etc.&lt;/li&gt;
&lt;li&gt;An ensemble to train an ANN to model a human&#39;s reward function (inverse reward learning): Assume human is optimal, Assume human is Boltzmann rational, Assume exponential discounting, Assume hyperbolic discounting, Learn from behavior, Learn from speech, Learn from unconscious facial expressions, etc.
&lt;ul&gt;
&lt;li&gt;(In particular, I find &lt;a href=&quot;https://en.wikipedia.org/wiki/Human_Compatible#Russell&#39;s_three_principles&quot;&gt;Stuart Russell&#39;s proposal for AI Alignment&lt;/a&gt; promising, but I think the weakest point in this proposal is asking AI to infer our preferences &lt;em&gt;from our behavior&lt;/em&gt;. C&#39;mon, people procrastinate &amp;amp; reliably do things they know they don&#39;t want.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;CUES also ties into an idea for how we can align powerful AI to &lt;em&gt;moral, humane&lt;/em&gt; values, not just to a single human user (who may intend to use technically-aligned AI for inhumane ends). The idea is &lt;em&gt;Moral Parliament&lt;/em&gt; (&lt;a href=&quot;https://www.fhi.ox.ac.uk/wp-content/uploads/2021/06/Parliamentary-Approach-to-Moral-Uncertainty.pdf&quot;&gt;Newberry &amp;amp; Ord 2021&lt;/a&gt;): take a bunch of plausible moral theories (utilitarianism, deontology, contractarianism), then allow them to vote on decisions &lt;em&gt;in such a way that their effect is capped at a maximum&lt;/em&gt; (to prevent &amp;quot;Pascal&#39;s Wager&amp;quot; style shenanigans, where a moral theory can say something&#39;s infinitely good/bad). So, Moral Parliament is [kinda] similar to CUES! (with the major difference being that the &amp;quot;caps&amp;quot; are at different thresholds for each moral theory, based on how much credence you give it.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;More possibilities / Challenges / Etc:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The above numerical simulation was for the very, &lt;em&gt;very&lt;/em&gt; simple case of a linear, single-layer causal network. To do: try this simulation with non-linear and/or multi-layer causal networks.&lt;/li&gt;
&lt;li&gt;The above sims also assume the Metric is &lt;em&gt;causally downstream&lt;/em&gt; of the True Goal.  Would it still work if the Metrics could also be &lt;em&gt;causally upstream?&lt;/em&gt;  Or even &lt;em&gt;causally confounded&lt;/em&gt; with some unrelated thing?&lt;/li&gt;
&lt;li&gt;In the sim, &lt;code&gt;min(1, Metric_x)&lt;/code&gt; caps all Metrics&#39; influence at 1.  In the real world, how would we find the &amp;quot;caps&amp;quot;?  Maybe at one sigma of the Metric&#39;s recent distribution?
&lt;ul&gt;
&lt;li&gt;Alternatively, to avoid an arbitrary hard-cap, what if I tried a logarithmic soft-cap? (I remember testing this last year &amp;amp; it didn&#39;t work well, I&#39;ll try again with this more rigorous setup)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Like the Moral Parliament thing mentioned above, what if we allowed the Metrics to have &lt;em&gt;different&lt;/em&gt; caps on their influence?  Say, proportional to how much credence we give in that Metric being a good proxy?&lt;/li&gt;
&lt;/ul&gt;
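&lt;p&gt;( On the hard-cap vs logarithmic soft-cap question above, here&#39;s a quick made-up example of why a log soft-cap can still fail: a sufficiently Cheat-inflated Metric contributes unboundedly through a log, but at most 1 through a hard cap: )&lt;/p&gt;

```python
# Made-up example: how a hard cap vs a logarithmic soft-cap treats one
# runaway Metric. The value 100.0 is an arbitrary Cheat-inflated number.
import math

metrics = [1.0, 1.0, 100.0]  # third Metric blown up by a powerful Cheat

hard = sum(min(m, 1.0) for m in metrics)
soft = sum(math.log(1.0 + m) for m in metrics)

print(hard)  # 3.0 -- the runaway Metric can contribute at most 1
print(soft)  # ~6.0 -- log(101) alone contributes ~4.6, so the Cheat still pays off
```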
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;project_5&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;🪞 Project 5) The game theory of self-modification&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; What would happen if an AI gets the ability to modify itself? Would it choose to &amp;quot;wirehead&amp;quot;, or have its values slowly drift, or get locked into one set of values, or something else? Heck, what would happen if AI gets the ability to modify &lt;em&gt;our&lt;/em&gt; values? We endorse value-modification in &lt;em&gt;some&lt;/em&gt; cases – therapy, education, learning to love new people &amp;amp; art &amp;amp; food – but not other cases, like brainwashing. How do we formalize what&#39;s a &amp;quot;good&amp;quot; kind of self-modification (for AI or human), and what&#39;s &amp;quot;bad&amp;quot;?&lt;/p&gt;
&lt;p&gt;This project explores all of those questions via game theory. Alas, there are very few papers on game theory where an agent has the option to &lt;em&gt;modify their own utility function&lt;/em&gt; (or more).&lt;/p&gt;
&lt;p&gt;So, my trick: we can use the &lt;em&gt;standard, elementary&lt;/em&gt; tools of game theory, by treating all future versions of an agent (AI or human), at each time step, &lt;em&gt;as if it&#39;s a different agent&lt;/em&gt;. Playing games with &amp;amp; against your possible future selves!&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related to:&lt;/strong&gt; Game theory, Agent foundations, Theoretical work, Recursive self-improvement, Value drift, Value lock-in, Wireheading&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(modified cross-post from &lt;a href=&quot;https://blog.ncase.me/backlog/&quot;&gt;my Idea Dump Backlog&lt;/a&gt; from a few months ago)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;You ever place your smartphone &lt;em&gt;across&lt;/em&gt; the room before you sleep, instead of next to your bed... so that when the alarm goes off, you&#39;re forced to get up to turn it off, &amp;amp; not be tempted to browse memes in bed?&lt;/p&gt;
&lt;p&gt;(Or you ever done some other &amp;quot;tricking your future self&amp;quot; thing?)&lt;/p&gt;
&lt;p&gt;Congratulations, you just broke a fundamental assumption of standard game theory, which is the basis of modern economics, political science, and Artificial Intelligence!&lt;/p&gt;
&lt;p&gt;That assumption is that &lt;em&gt;we have a fixed preference ordering.&lt;/em&gt; But the above smartphone-alarm example &lt;em&gt;isn&#39;t&lt;/em&gt; explainable with a stable preference order:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you prefer &lt;code&gt;browsing memes &amp;gt; getting up&lt;/code&gt;, you wouldn&#39;t put your phone across the room, you&#39;d put it next to your bed to browse memes.&lt;/li&gt;
&lt;li&gt;If you prefer &lt;code&gt;getting up &amp;gt; browsing memes&lt;/code&gt;, you wouldn&#39;t &lt;em&gt;need&lt;/em&gt; to put your phone across the room, you&#39;d just get up when it&#39;s time to get up.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The only way this is explainable – and it&#39;s how we intuitively think about it anyway – is like &lt;em&gt;you&#39;re playing a game against your future self.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;(See &lt;a href=&quot;https://www.youtube.com/watch?v=W-Cz-LK16g4&quot;&gt;Jerry Seinfeld&#39;s Night Guy/Morning Guy skit&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Here&#39;s the game&#39;s choice-tree:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/backlog/choice_tree.png&quot; alt=&quot;Decision-tree visualizing above main text.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The trick is to analyze future versions of yourself &lt;em&gt;as if they&#39;re different players&lt;/em&gt;; then we can use the standard techniques of game theory to figure out what will happen! For example, above, Morning Guy wants to browse memes, but Night Guy &lt;em&gt;knows&lt;/em&gt; Morning Guy will do that, so Night Guy puts the phone across the room, to force Morning Guy to get up.&lt;/p&gt;
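&lt;p&gt;( Here&#39;s a sketch of that backward induction in Python, treating Night Guy and Morning Guy as separate players. The payoff numbers are made up for illustration: )&lt;/p&gt;

```python
# Backward induction on the phone-alarm game, with illustrative payoffs.
# Each leaf is (night_guy_payoff, morning_guy_payoff).
# Night Guy moves first: phone "near" the bed, or "across" the room.
# Morning Guy moves second; browsing is impossible if the phone is across
# the room, so that branch has only one option.
tree = {
    "near":   {"get_up": (3, 1), "browse": (0, 3)},
    "across": {"get_up": (2, 2)},
}

# Morning Guy picks his best response within each subgame...
def morning_choice(options):
    return max(options, key=lambda action: options[action][1])

# ...and Night Guy, anticipating that, picks the branch best for HIMSELF.
def night_choice():
    return max(tree, key=lambda branch: tree[branch][morning_choice(tree[branch])][0])

night = night_choice()
morning = morning_choice(tree[night])
print(night, morning)  # across get_up -- Night Guy forecloses the memes
```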
&lt;p&gt;(In this example, it&#39;s the agent&#39;s utility function getting modified overnight, involuntarily; but we can also extend the same logic to &lt;em&gt;voluntary&lt;/em&gt; utility changes -- as in wireheading, or il/legitimate value change!)&lt;/p&gt;
&lt;p&gt;My project is to distill &amp;amp; expand on the very little research so far on the &amp;quot;game theory of self-modification&amp;quot;! This would have lots of applications to human behaviour &amp;amp; AI Safety. Like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Given that we &lt;em&gt;can&#39;t&lt;/em&gt; be &amp;quot;ideal rational&amp;quot;, and &lt;em&gt;have&lt;/em&gt; to be &amp;quot;bounded rational&amp;quot;, what &amp;quot;meta-rational&amp;quot; tricks help work around our bounded rationality? (e.g. putting phone across the room)&lt;/li&gt;
&lt;li&gt;What kind of value change is legitimate? (&lt;a href=&quot;https://www.lesswrong.com/posts/mHQHBEuFcEWRnitp4/0-the-value-change-problem-introduction-overview-and&quot;&gt;Nora Ammann 2023&lt;/a&gt;) How do we change our values &lt;em&gt;according to our own values?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;When &amp;amp; why a self-modifying AI &lt;em&gt;won&#39;t&lt;/em&gt; just hack its own brain to get maximum reward (wireheading).&lt;/li&gt;
&lt;li&gt;Explaining &lt;a href=&quot;https://en.wikipedia.org/wiki/Human_Compatible#Russell&#39;s_three_principles&quot;&gt;Stuart Russell&#39;s proposal for AI Safety&lt;/a&gt;: the AI&#39;s only reward is to maximize the &lt;em&gt;human&#39;s&lt;/em&gt; reward, but the AI &lt;em&gt;is uncertain about it&lt;/em&gt; and &lt;em&gt;knows&lt;/em&gt; it&#39;s uncertain about it.&lt;/li&gt;
&lt;li&gt;Original(?) research showing that Russell&#39;s proposal is robust to &amp;quot;bounded rationality&amp;quot; &lt;em&gt;and&lt;/em&gt; self-modification!&lt;/li&gt;
&lt;li&gt;Original(?) research showing that a &amp;quot;meta-rational&amp;quot; AI using Russell&#39;s proposal -- contrary to the instrumental convergence/AI basic drives paradigm -- would choose to &lt;em&gt;NOT&lt;/em&gt; seek power, if it has too high a risk of its utility fn being corrupted? (e.g. Throw the One Ring into Mount Doom)&lt;/li&gt;
&lt;/ul&gt;
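&lt;p&gt;(Side sketch of the intuition behind Russell&#39;s proposal, in code. Everything here – the prior, the numbers, the two-option setup – is my own toy illustration, not from Russell&#39;s actual formalism: an AI that&#39;s uncertain about the human&#39;s reward, and &lt;em&gt;knows&lt;/em&gt; it&#39;s uncertain, prefers to keep the human in the loop.)&lt;/p&gt;

```python
# Toy sketch: why uncertainty about the human's reward makes an AI
# deferential. (My own illustration -- prior & numbers are assumptions.)
import random

random.seed(2)
# The AI's belief about the human's utility u for some drastic action:
samples = [random.gauss(0.1, 1.0) for _ in range(100_000)]

# Option 1: just act. Expected value is E[u].
act_now = sum(samples) / len(samples)

# Option 2: ask the human first. The human only approves when u > 0,
# so the AI gets max(u, 0) in expectation.
defer = sum(max(u, 0) for u in samples) / len(samples)

print(f"act now: {act_now:.2f}, defer to human: {defer:.2f}")
# deferring is worth more: the uncertain AI *wants* the human in the loop
```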
&lt;p&gt;Usually, AI game theory is explained with dense math notation. At first I thought, &amp;quot;Why don&#39;t I explain it with readable pseudo-code?&amp;quot; Then I realized... wait, why don&#39;t I just write &lt;em&gt;actual code&lt;/em&gt; that readers can play with in the browser, to try their &lt;em&gt;own&lt;/em&gt; game-theory experiments? And so that&#39;s what I&#39;m doing!&lt;/p&gt;
&lt;p&gt;(It also may be that analytical, closed-form solutions aren&#39;t possible -- in which case, I can still present findings using numerical simulations.)&lt;/p&gt;
&lt;p&gt;Code of the Smartphone Alarm problem:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/backlog/selfmod1.png&quot; alt=&quot;Screenshot of code&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The thought-tree it produces: (actual code output!)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/backlog/selfmod2.png&quot; alt=&quot;Screenshot of output&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(What the above code does is take the game tree, then recursively analyze it to predict what the agent will do at each step! The main difference between this &amp;amp; standard game theory is that the &lt;em&gt;same&lt;/em&gt; agent is allowed to have different utility functions at different points.)&lt;/p&gt;
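&lt;p&gt;(Here&#39;s a minimal sketch of that recursive analysis – my own toy reconstruction, &lt;em&gt;not&lt;/em&gt; the actual project code: backward induction, where each decision node can use a &lt;em&gt;different&lt;/em&gt; utility function.)&lt;/p&gt;

```python
# Minimal sketch of backward induction where each decision node has its
# OWN utility function -- the "same agent, different selves" trick.
# (My own toy reconstruction, not the actual project code.)

# A node is either a terminal outcome (str) or (who_decides, {action: subtree}).
tree = ("night_guy", {
    "phone by bed":      ("morning_guy", {"browse memes": "memes",
                                          "get up":       "got up"}),
    "phone across room": ("morning_guy", {"get up":       "got up"})})

# Each "self" judges outcomes by its own utility function:
utility = {
    "night_guy":   {"memes": 0, "got up": 1},   # wants Future Self out of bed
    "morning_guy": {"memes": 1, "got up": 0}}   # wants to stay in bed

def solve(node):
    """Return (predicted_outcome, chosen_action) by backward induction."""
    if isinstance(node, str):                   # terminal outcome
        return node, None
    player, moves = node
    # Predict the outcome after each action, then pick the action whose
    # outcome THIS node's utility function likes best:
    best = max(moves, key=lambda a: utility[player][solve(moves[a])[0]])
    return solve(moves[best])[0], best

outcome, action = solve(tree)
print(action, "->", outcome)   # Night Guy puts the phone across the room
```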
&lt;p&gt;I can also visualize thought-trees as procedurally generated trees. (It&#39;s bouncy because I&#39;m using spring-physics to figure out how the tree should be spaced out!&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/backlog/tree_viz.gif&quot; alt=&quot;tree_viz&quot; /&gt;&lt;/p&gt;
&lt;p&gt;That&#39;d make pretty pics for the arXiv paper~&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Prior Work:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://arxiv.org/pdf/1605.03142&quot;&gt;Everitt et al 2016&lt;/a&gt; prove that a rational agent that 1) plans ahead &amp;amp; 2) judges future outcomes based on their &lt;em&gt;current&lt;/em&gt; preferences will refuse to wirehead!&lt;/li&gt;
&lt;li&gt;However, that paper assumes the AI is &lt;em&gt;perfectly&lt;/em&gt; rational. &lt;a href=&quot;https://arxiv.org/pdf/2011.06275&quot;&gt;Tětek &amp;amp; Sklenka 2021&lt;/a&gt; proved that an &lt;em&gt;imperfectly&lt;/em&gt;-rational (or &amp;quot;bounded rational&amp;quot;) agent&#39;s original goals would get exponentially corrupted under self-modification.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;However&lt;/em&gt;, a caveat to &lt;em&gt;that&lt;/em&gt; is that their paper assumes the AI is &lt;em&gt;unaware of its own bounded rationality&lt;/em&gt; (as they freely acknowledge in Section 6). This is where my research can build upon theirs: figure out if a bounded-rational agent &lt;em&gt;that&#39;s aware it&#39;s bounded rational&lt;/em&gt; can (probabilistically) maintain safe self-modification!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;(note: above was copy-pasted &amp;amp; modified from footnote 28 of my own explainer, &lt;a href=&quot;https://aisafety.dance/p2/&quot;&gt;AI Safety for Fleshy Humans: Part 2&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;More possibilities / Challenges / Etc:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The above focuses on self-modification to &lt;em&gt;utility functions&lt;/em&gt;. What about self-modifications to one&#39;s explore-exploit rate, or exponential/hyperbolic discounting rates, or the very decision theory the agent uses?
&lt;ul&gt;
&lt;li&gt;For example: a &lt;em&gt;high&lt;/em&gt;-explore AI would be curious about self-modding to a &lt;em&gt;low&lt;/em&gt;-explore rate... but a &lt;em&gt;low&lt;/em&gt;-explore AI would &lt;em&gt;not&lt;/em&gt; be curious about self-modding to a &lt;em&gt;high&lt;/em&gt;-explore rate. So: would AIs that can modify their own explore rates &lt;em&gt;by default&lt;/em&gt; tend towards incuriosity &amp;amp; rigidity? When yes, when no?&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://arxiv.org/pdf/1710.05060&quot;&gt;Yudkowsky &amp;amp; Soares 2018&lt;/a&gt; make an interesting case for Functional Decision Theory (FDT): they prove (under reasonable circumstances) that a standard Causal Decision Theory agent would &lt;em&gt;choose to self-modify&lt;/em&gt; to FDT, because &lt;em&gt;CDT endorses FDT over CDT.&lt;/em&gt;  An example of a decision theory endorsing replacing itself!&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The above also only focuses on &amp;quot;value preservation&amp;quot; – but how do we make sense of cases where we &lt;em&gt;want&lt;/em&gt; our values to change? How is it even &lt;em&gt;possible&lt;/em&gt; to value &lt;em&gt;having different values?&lt;/em&gt; (e.g. &amp;quot;I&#39;m attracted to Bad Boys, but I wish I wasn&#39;t&amp;quot;.)
&lt;ul&gt;
&lt;li&gt;I think one trick to this is to treat each &amp;quot;value&amp;quot; in an agent &lt;em&gt;as if it&#39;s an agent itself.&lt;/em&gt;  (I think this is similar to &lt;a href=&quot;https://www.lesswrong.com/tag/shard-theory&quot;&gt;Shard Theory&lt;/a&gt;? or Minsky&#39;s Society of Mind.)&lt;/li&gt;
&lt;li&gt;Take the above Bad Boys example: let&#39;s say there are 3 values/sub-agents at play: Excitement (pro Bad Boy), plus Safety and Emotional Intimacy (both anti Bad Boy). If the brain were a democracy, the latter two agents would want the first agent &amp;quot;voted off the island&amp;quot;. That&#39;s how one (as an ensemble whole) can value &lt;em&gt;changing&lt;/em&gt; one&#39;s values.
&lt;ul&gt;
&lt;li&gt;Alternatively, imagine your brain like a co-operatively owned company: the &amp;quot;sub-agents&amp;quot; (individual values) democratically choose who to hire &amp;amp; fire. &lt;strong&gt;This is how you can modify your values &lt;em&gt;in line with your values&lt;/em&gt;.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The above may also help formalize the line between what we consider &amp;quot;good&amp;quot; value-change and &amp;quot;bad&amp;quot; value-drift: did I change my values &lt;em&gt;according to my current values?&lt;/em&gt;
&lt;ul&gt;
&lt;li&gt;For example, right now, I value beliefs discovered through the scientific method, and do &lt;em&gt;not&lt;/em&gt; value beliefs acquired through &amp;quot;direct revelation&amp;quot;. If someone were to hypothetically drug me so hard I start believing in direct revelation, I&#39;d consider that a Bad Ending. But! If hard &lt;em&gt;scientific&lt;/em&gt; evidence were to come out that direct revelation works (e.g. someone publicly pre-registers visions of 100 lottery numbers and they all come true), &lt;em&gt;then&lt;/em&gt; if I update towards believing in direct revelation, that&#39;s a Good Ending.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;One promising strategy for alignment is for an AI to &lt;em&gt;learn&lt;/em&gt; my true reward function (inverse reward learning). Would a smart-enough AI have an incentive to mis-learn my reward function as something easier to fulfill, or &lt;em&gt;actually modify me&lt;/em&gt; to have easier-to-fulfill preferences? &lt;a href=&quot;https://arxiv.org/pdf/1605.03143&quot;&gt;Everitt &amp;amp; Hutter 2016&lt;/a&gt; prove not, but only for optimally rational agents. Research Q: does this still hold for bounded-rational agents?&lt;/li&gt;
&lt;li&gt;I&#39;m using numerical simulation because my analytic-solution skills are crud. Maybe I could collab with someone with better analysis chops?&lt;/li&gt;
&lt;/ul&gt;
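&lt;p&gt;(The &amp;quot;values as sub-agents who vote&amp;quot; idea above can be sketched in a few lines. The sub-agents, their votes, and the majority rule are my own toy illustration, not a worked-out formalism.)&lt;/p&gt;

```python
# Toy sketch of "values as sub-agents who vote" on a self-modification.
# (Names, votes, and the majority rule are my own illustration.)

def ensemble_endorses(votes):
    """A proposed self-modification passes if a majority of the
    CURRENT values (sub-agents) score it positively."""
    return sum(1 for v in votes.values() if v > 0) > len(votes) / 2

# Proposal: remove the "Excitement" value (the pro-Bad-Boy voter).
votes = {
    "Excitement":         -1,  # being voted off the island is bad for it
    "Safety":             +1,  # fewer Bad Boys, more safety
    "Emotional Intimacy": +1}  # fewer Bad Boys, deeper bonds

print(ensemble_endorses(votes))  # the ensemble endorses changing itself
```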
&lt;hr /&gt;
&lt;p&gt;&lt;a id=&quot;project_6&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;➡️ Project 6) SCI: Semi-automated Causal Inference&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; Make a hybrid AI (a GOFAI with an LLM as a module inside of it) to semi-automate &lt;a href=&quot;https://mixtape.scunning.com/&quot;&gt;causal inference&lt;/a&gt; from observational data, which will be highly scientifically/economically valuable in fields where it&#39;s impractical/unethical to do experiments, like epidemiology, economics, social science, etc.&lt;/p&gt;
&lt;p&gt;(also, I &lt;em&gt;may&lt;/em&gt; collab with MATS alum Egg Syntax on this; they&#39;re also already independently pursuing causal-inference-with-LLMs)&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related to:&lt;/strong&gt; Human-in-the-loop/Cyborgism/Narrow AI, Causal models/inference&lt;/p&gt;
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Motivation:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Not directly related to AI Safety/Alignment, but it &lt;em&gt;may&lt;/em&gt; help by giving a proof-of-concept that changes the economic incentives?&lt;/p&gt;
&lt;p&gt;Concretely: if we can show a very scientifically/economically valuable use case for automation, that &lt;em&gt;does not require&lt;/em&gt; further advances in foundational models... that may(?) shift the incentive away from advancing those models, and towards figuring out how to plug-and-play &lt;em&gt;current&lt;/em&gt; AI:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://blog.ncase.me/content/media/backlog/venn.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I think &lt;em&gt;causal inference&lt;/em&gt; is a big low-hanging fruit, here!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;This is “the” Scientific Process:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Look at stuff&lt;/li&gt;
&lt;li&gt;Notice weird stuff (violations of model predictions)&lt;/li&gt;
&lt;li&gt;Generate hypotheses for why (generate causal and/or mathematical models)&lt;/li&gt;
&lt;li&gt;Generate tests that can distinguish between those models&lt;/li&gt;
&lt;li&gt;Run those tests&lt;/li&gt;
&lt;li&gt;Look at the results&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;How can each part be automated?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;(What I may do as part of a pilot test of this project...)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Observation: Grab a bunch of data from Our World In Data or Gallup or similar.&lt;/li&gt;
&lt;li&gt;Noticing Weird Stuff:
&lt;ul&gt;
&lt;li&gt;Use good ol’ fashioned stats to pick up strong correlations, or sudden discontinuities/kinks in time-series data, etc&lt;/li&gt;
&lt;li&gt;Use an LLM as a “common sense module” to detect if something is weird? (e.g. “Hey Claude, if there was a 50% drop in youth self-harm in March 2020, how weird is that on a scale from 1-5?”)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Generate hypotheses (in the form of causal models)
&lt;ul&gt;
&lt;li&gt;I&#39;ll need to figure out how to scaffold/prompt LLMs to do this usefully (e.g. “Hey Claude, what major events happened in March 2020?” or “Hey Claude, what possible confounding factors could influence both being trans, and being a programmer?”)&lt;/li&gt;
&lt;li&gt;Human-in-the-loop: You the human can also add/subtract your own hypotheses&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Generate tests to distinguish between causal models (Pearl-like causal diagrams)
&lt;ul&gt;
&lt;li&gt;This is good ol’ fashioned stats/causal inference. If there’s a correlation between A &amp;amp; B, and our three hypotheses are A causes B, B causes A, Confounder causes A &amp;amp; B... then to rule out Confounder, we “control for” Confounder and see if the correlation reduces or goes away.&lt;/li&gt;
&lt;li&gt;To test between A causes B or B causes A, if we have time-series data, we can do lagged-statistics-tests to see whether A or B leads/follows. Or use an “instrumental variable” which we know only affects A, and can’t affect B unless through A.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Run those tests! (LLM calls stats tools as tool use)
&lt;ul&gt;
&lt;li&gt;Bonus: if it realizes it doesn&#39;t have enough data, it can ask you for more.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Look at results, Repeat!&lt;/li&gt;
&lt;/ul&gt;
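&lt;p&gt;(The &amp;quot;control for the confounder&amp;quot; step above can be run end-to-end on synthetic data. The data-generating process below is my own illustration, using the ice cream/drowning example from later in this post.)&lt;/p&gt;

```python
# Sketch of the "control for the confounder" test, on synthetic data.
# (The data-generating process is my own illustration.)
import random

random.seed(0)
n = 5000
C = [random.gauss(0, 1) for _ in range(n)]   # confounder (e.g. summer)
A = [c + random.gauss(0, 1) for c in C]      # e.g. ice cream sales
B = [c + random.gauss(0, 1) for c in C]      # e.g. drownings

def corr(x, y):
    mx, my = sum(x)/len(x), sum(y)/len(y)
    cov = sum((a-mx)*(b-my) for a, b in zip(x, y))
    vx  = sum((a-mx)**2 for a in x)
    vy  = sum((b-my)**2 for b in y)
    return cov / (vx*vy) ** 0.5

def residuals(y, x):
    """What's left of y after regressing out x (simple least squares)."""
    mx, my = sum(x)/len(x), sum(y)/len(y)
    beta = sum((a-mx)*(b-my) for a, b in zip(x, y)) / sum((a-mx)**2 for a in x)
    return [b - (my + beta*(a-mx)) for a, b in zip(x, y)]

raw        = corr(A, B)                              # spurious A~B correlation
controlled = corr(residuals(A, C), residuals(B, C))  # A~B, holding C fixed
print(f"raw r = {raw:.2f}, controlling for C: r = {controlled:.2f}")
# the A~B correlation should collapse toward 0 once C is controlled for
```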
&lt;p&gt;. . .&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Challenges with testing this:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How can we know if this scaffolded LLM is inferring causal relations from scratch, or just pulling them up from its latent knowledge? For example, if we gave it lung cancer &amp;amp; smoking data, ideally it&#39;d figure out smoking causes lung cancer... but if it outputs that, how would we know it figured it out, vs. just remembered the correct answer from its written corpus data?
&lt;ul&gt;
&lt;li&gt;One difference this system has, vs previous LLM-causal-inference tests, is that we do have a mix of LLMs and “good ol’ fashioned stats” — so, maybe we will be able to tell if it’s correctly going through all the steps of inferring smoking→cancer from scratch? (e.g. it should generate “maybe industrialization is a confounding factor” as a hypothesis, then generate tests to confirm/falsify that, like looking at non-smokers in industrialized areas.)&lt;/li&gt;
&lt;li&gt;That is: it should recreate not just the “correct” conclusion, but the entire process to rule out alternative hypotheses.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;“Placebo tests”: this system should not find direct causal links between things we know to be spurious correlations, e.g. ice cream sales &amp;amp; drowning (confounded by summer swimming)&lt;/li&gt;
&lt;li&gt;How do I get a hypothesis’s “prior”? Use the “weirdness rating” again? Or a “simplicity rating”?&lt;/li&gt;
&lt;li&gt;Generating advice for more science, or for policy: If there is insufficient data to distinguish between two plausible models, it can ask for more data/experiments. Or, once we have a causal model, it can make policy recommendations for stuff we want. (For example, if it figures out what policies reliably cause less poverty or drug use, it can recommend that.)&lt;/li&gt;
&lt;/ul&gt;
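&lt;p&gt;(The lagged-statistics direction test mentioned above is also easy to demo on toy data. Everything below is my own illustration – a real pipeline would use a proper test like Granger causality – but the core idea is just: if A causes B with a delay, A&#39;s past should correlate with B&#39;s future, and not vice versa.)&lt;/p&gt;

```python
# Toy sketch of a lagged-correlation direction test: does A lead B,
# or does B lead A? (My own illustration; a real analysis would use
# something like a Granger causality test.)
import random

random.seed(1)
n = 3000
A = [random.gauss(0, 1) for _ in range(n)]
B = [0.0] * n
for t in range(1, n):                      # B follows A with lag 1
    B[t] = 0.8 * A[t-1] + random.gauss(0, 0.5)

def corr(x, y):
    mx, my = sum(x)/len(x), sum(y)/len(y)
    cov = sum((a-mx)*(b-my) for a, b in zip(x, y))
    return cov / (sum((a-mx)**2 for a in x) * sum((b-my)**2 for b in y)) ** 0.5

a_leads_b = corr(A[:-1], B[1:])            # past A vs future B
b_leads_a = corr(B[:-1], A[1:])            # past B vs future A
print(f"A leads B: {a_leads_b:.2f}, B leads A: {b_leads_a:.2f}")
# a_leads_b should be large and b_leads_a near zero: evidence A causes B
```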
&lt;hr /&gt;
&lt;p&gt;Alright, those are six AI/Alignment-related research projects I&#39;m working on! Let&#39;s see in one year&#39;s time if any of these go anywhere.&lt;/p&gt;
&lt;p&gt;Ciao,&lt;br /&gt;
~ Nicky Case&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;“[Russell &amp;amp; Norvig&#39;s famous AI textbook describes] a seemingly reasonable, but incorrect, reward function for a vacuum robot: if we reward the action of cleaning up dirt, the optimal policy causes the robot to repeatedly dump and clean up the same dirt.” (&lt;a href=&quot;https://people.eecs.berkeley.edu/~russell/papers/russell-nips16-cirl.pdf&quot;&gt;source&lt;/a&gt;) &lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Bishop &amp;amp; Trout (2005) give a snappy overview: &lt;a href=&quot;https://academic.oup.com/book/36404/chapter-abstract/320070686?redirectedFrom=fulltext&quot;&gt;The Amazing Success of Statistical Prediction Rules&lt;/a&gt; (&lt;a href=&quot;https://joelvelasco.net/teaching/4330/bishop&amp;amp;trout-ch2.pdf&quot;&gt;pdf&lt;/a&gt;) &lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It&#39;s the &lt;a href=&quot;https://en.wikipedia.org/wiki/Force-directed_graph_drawing&quot;&gt;force-directed graph drawing&lt;/a&gt; algorithm, but with the vertical (y) positions locked. &lt;a href=&quot;https://blog.ncase.me/research-notes-oct-2024/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
        </entry>

    

</feed>
