Early 2026 AI Notes

Over the past few weeks, I read through the best end-of-year and early 2026 AI posts to get a sense of where things are headed.

Here are my favorites with tldr notes, tools, and takes.

2025 LLM Year in Review by Andrej Karpathy

Reinforcement Learning from Verifiable Rewards (RLVR): New major training stage added to the LLM production stack after pre-training → SFT → RLHF. Uses automatically checkable rewards (math/code/etc.) to induce reasoning strategies.
Ghosts vs. Animals: LLMs (unlike humans) are optimized for text + puzzles, not survival. Capabilities spike in verifiable domains but remain uneven elsewhere. Benchmarks became unreliable because of “benchmaxxing.”
Cursor / New Layer of LLM Apps: Cursor popularized the LLM app pattern: bundling context, multi-call orchestration, vertical GUI, autonomy slider. Are there green pastures for apps? Yes. Labs are generalists. Apps are specialists in verticals by using private data and feedback loops.
Claude Code / AI That Lives on Your Computer: First convincing demo of what an LLM agent looks like. Runs locally with your environment/data instead of cloud. Analogy: a little spirit/ghost that "lives" on your computer.
Vibe Coding: Natural language → code crossed usability threshold. Non-programmers can build apps; programmers can build far more. Code becomes ephemeral, free, disposable. “Vibe coding will terraform software and alter job descriptions.”
LLM GUI: Text is efficient for machines but not for humans; we prefer visual/spatial formats. Who is “actually going to build the LLM GUI?” Nano Banana hints at this by combining text + images + world knowledge.

Standing Out in 2026 by Lulu Cheng Meservey

Everything is fake now: Fake content by fake influencers with fake engagement from fake followers, launching fake products with fake testimonials. Real has never been more precious.
For comms, 2024 was going direct and 2025 was winning attention. 2026 will be about doing real things.
Doing real things means: Putting in real effort, Showing real evidence, Real world events and artifacts, Showing up as real humans, Forming real relationships
“Once you are Real you can’t become unreal again. It lasts for always.”

The GPT-9 Test by Michael Bloch (Quiet Capital)

What happens to your business when GPT-9 ships?
Does your biz depend on AI being bad at something?
Do you have network effects where the product improves as more people use it?
Does your biz require physical presence that can’t be automated away?
Does better AI make your product more valuable, or less?
Most biz today are “arbitraging a temporary capability gap”
Build something that lasts.

The Last Moat Standing by fintechjunkie

If anyone (kid in dorm room) can build your product in a weekend, what's actually defensible?
Last real moat: An opinionated perspective on the solution
Building is now easy and fast. Having an informed opinion is hard and takes time.
Copying opinionated teams is like hitting a moving target.

The disappearing middle of software work by Karri (Linear)

Middle of software = opening the codebase, booting up the environment, and writing the code
For a long time, this has been the most important work and where most time was spent.
This middle is disappearing/thinning thanks to coding agents.
Understanding the problem, gathering context, and directing agent work become the most important work.

The End of Reusable Software by Sherwood

Code is now free. No longer need to use existing software. Claude can create from scratch.
Why create reusable programs? Why not just write one-off for every scenario? Coding agents already do this.

Observability's Past, Present, and Future by Sherwood

Observability emerged to tame cloud and microservice complexity: distributed tracing + a new reliability mindset that actually worked at first.
Today we over-collect telemetry and obsess over dashboards, but the real bottleneck is humans making sense of the data, not generating more of it.
AI is about to flood the world with wayyy more (and messier) software, so we’ll need a new kind of observability that helps us reason about and operate this infinite codebase.
h/t 1 in every 5 founders I meet in the Bay Area is building an observability platform for agents

Founding of Claude Code + Cowork by Boris Cherny

Initially launched Claude Code to Anthropic team to dogfood
Started with Sonnet 3.5 before model was good at agentic coding
Couple months later, non-eng (research, data sci, design) started using CC daily
Now people are using CC to “control their oven, recover wedding photos from a busted hard drive, analyze their DNA and medical records, haggle with customer support.”
Realized they needed to “make it easier for people that want to use the Claude agent for things that are not coding” → Introducing Claude Cowork

How I use Claude Code by Boris Cherny, creator of Claude Code

Run 1-5 Claudes locally, run 5-10 Claudes on the web (claude.ai)
Opus 4.5 for everything
Team shared and updated claude.md
Start most sessions in plan mode
Create slash commands for repeat work
Create subagents for automating common workflows
PostToolUse hook to clean up code
Use /permissions instead of --dangerously-skip-permissions
Allow Claude Code to use all your tools (MCP) for you
Verify and/or ralph long-running tasks
Give Claude a way to verify its work

Scaling long-running autonomous coding by Wilson Lin (Cursor) + Leerob summary

Cursor ran “hundreds of concurrent agents on a single project, coordinating their work, and watching them write over a million lines of code and trillions of tokens.”
Single agents are good for focused tasks but slow for complex projects
Flat structure of agents failed bc agents became risk-averse and avoided difficult tasks
Separating agents into planner, worker, and judge roles solved coordination problems and allowed Cursor to scale to very large projects
Lessons deploying trillions of tokens on long-running tasks: Model choice (GPT-5.2) matters, as does removing complexity (unnecessary roles), but the prompts matter most
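
The planner / worker / judge split described above can be sketched roughly as follows. This is only an illustration of the role separation, not Cursor's implementation: `plan`, `work`, and `judge` are hypothetical stand-ins for LLM calls, and the task strings are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Result:
    task: str
    output: str
    accepted: bool = False


def plan(goal: str) -> list[str]:
    """Hypothetical planner: splits a goal into independent tasks.
    A real planner would be an LLM call; this one is hard-coded."""
    return [f"{goal}: step {i}" for i in range(1, 4)]


def work(task: str) -> Result:
    """Hypothetical worker: handles exactly one task and nothing else."""
    return Result(task=task, output=f"done({task})")


def judge(result: Result) -> Result:
    """Hypothetical judge: accepts or rejects a single worker output."""
    result.accepted = result.output.startswith("done(")
    return result


def run(goal: str, max_workers: int = 4) -> list[Result]:
    tasks = plan(goal)
    # Workers run concurrently; pool.map returns results in task order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(work, tasks))
    # Judging happens after the work, as a separate role.
    return [judge(r) for r in results]


if __name__ == "__main__":
    for r in run("build parser"):
        print(r.task, r.accepted)
```

The point of the structure is that workers never decide what to do next: the planner owns task selection and the judge owns acceptance, which is one way to avoid the flat-structure problem of agents steering around hard tasks.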

Coding agents need product agents by Jordan

Coding is cheap. Decision-making is still expensive.
More pressure on the part teams have always struggled with most: deciding what to build, why it matters, and staying aligned.
Coding agents help teams ship faster. Product agents (Async) can help teams ship the right thing.

Shipping at Inference-Speed by Peter Steinberger

You can now ship code at a speed that seems unreal. Now limited by inference time and hard thinking.
Important decisions have become languages, ecosystem, and dependencies.
It’s getting harder and harder to trust benchmarks. Try multiple models/tools to understand.

Notes on AI Apps in 2026 by Anish Acharya (a16z)

Thinking tools vs Making tools: Many execution tools exist, but more exploration tools are needed.
Software eats all the “service” functions in the organization: Agents will replace human service functions (legal, finance, HR)
Compounding AI apps: Apps that benefit from multi-modal data, proprietary datasets, networks, and ecosystems (e.g., thick apps) will compound.
Humans discover “the rest” of AI: UX/UI will improve and more consumers will create with AI.
Notes for (incumbent) CEOs: Collapse customer-facing roles, software-first everywhere, and price boldly. For most enterprise tasks, AGI is here.

LLMs vs. Marketplaces by Dan Hockenmaier

LLMs are on a collision course with marketplaces
Collision = User → AI interface → DoorDash → Sandwich delivery
This is a problem because marketplaces pay back CAC from repeat transactions and would have to spend more per tx because orders are coming from ChatGPT rather than their app
Marketplace defensibility to LLMs comes from:
Difficulty of supply aggregation: hotels (easy for LLM) vs Airbnb (hard for LLM)
Degree of management: search and tx (easy for LLM) vs risk and service (hard for LLM)
Nature of customer engagement: low frequency and high consideration/research (good for LLM) vs high frequency and low consideration (bad for LLM)
What should marketplaces do? Do things LLMs won’t or can’t.

Consumer AI predictions by Eugenia (Wabi)

Screenless AI devices will flop: voice is a good secondary interface but a bad primary one, and it's hard to fight our addiction to feeds/screens
“Always listening” devices won’t work either: most things don't matter enough to record, and the things that do you won't dare record. Granola is good.
Mini-apps will unlock UGC personal software: full apps are hard to make and heavy to onboard and use. AI coding + mini-apps = UGC software
By 2030 there will be two big general purpose AI chatbots: today we have ChatGPT and lots of niche bots; tomorrow a ChatGPT-like assistant and an AI friend will be the big ones
Performance marketing for apps is dead: saturated channels and copycats will push margins to zero. Paid acquisition is a boost, not a biz model
The fastest consumer product to reach $1B ARR will be an AI webcam girl: dropping video generation cost will result in a hyper-personalized 24/7 super OnlyFans
Whoever solves AI discovery wins: normal people use text input for chat and search. Consumer AI winners will unlock hidden use cases beyond search/chat

Human data will be a $1 trillion/year market by Ali Ansari

All functions (digital and physical) will be automated.
Automation pushes humans towards higher-value creative work.
Frontier AI requires structured human data to learn.
A lot of time and money will be spent on “expert human data creation or structured human judgment”

21 Lessons From 14 Years at Google by Addy Osmani

The best engineers are obsessed with solving user problems. — “User obsession means spending time in support tickets, talking to users, watching users struggle, asking ‘why’ until you hit bedrock.”
Bias towards action. Ship. You can edit a bad page, but you can’t edit a blank one. — “First do it, then do it right, then do it better.”
Your code doesn’t advocate for you. People do. — “In large organizations, decisions get made in meetings you’re not invited to, using summaries you didn’t write, by people who have five minutes and twelve priorities.”
The best code is the code you never had to write. — “The problem isn’t that engineers can’t write code or use AI to do so. It’s that we’re so good at writing it that we forget to ask whether we should.”
Focus on what you can control. Ignore what you can’t. — “Dwelling on these creates anxiety without agency.”
Writing forces clarity. The fastest way to learn something better is to try teaching it. — “The act of making something legible to someone else makes it more legible to me.”
The work that makes other work possible is priceless — and invisible. — “Glue work - documentation, onboarding, cross-team coordination, process improvement - is vital.”
When a measure becomes a target, it stops measuring. — “The goal is insight, not surveillance.”
Admitting what you don’t know creates more safety than pretending you do. — “When a leader admits uncertainty, it signals that the room is safe for others to do the same.”
Your network outlasts every job you’ll ever have. — “Your job isn’t forever, but your network is. Approach it with curiosity and generosity, not transactional hustle.”
Most performance wins come from removing work, not adding cleverness. — “Before you optimize, question whether the work should exist at all.”

2025 Year in review by Paul Stamatiou

Rewind lessons: “prioritize task-based workflows over pure recall, and explore using lightweight visual models for data classification.”
Limitless pendant lessons: AI wearables have real use cases but some people “take privacy extraordinarily seriously and lean introverted” and aren't excited to wear recording devices
Who to work with? “exceptionally talented team, at the forefront of AI, with leadership and a CEO who genuinely care about quality, and as little organizational friction as possible between us and an outstandingly well-crafted product.”
How to be successful? “if successful, will make the rest of my career look like a footnote.”
Lessons exploring: Designers who code aren't a nice-to-have anymore: they're the norm. People “who weren't already working closely with AI were thinking about leaving their companies.”
Sesame (hiring) is building lifelike personal agents via software + hardware
Restraint with coding agents: “Just because you can, doesn't mean you should.” The hard part now is restraint.

New AI tools

Scott AI: agentic workspace for eng alignment
Claude Cowork: Claude Code for non-technical tasks.
Ralph Wiggum: self-referential AI development loops in Claude Code.
Worktrunk: git worktree manager for running AI agents in parallel
CallMe: plugin that allows Claude Code to call you on your phone.
Async: Slack-based product agent that learns about your company's product, customers, codebase, and team from existing work.
Universal Commerce Protocol (UCP): enables commerce inside Google AI products like Gemini

Good AI takes
