Early 2026 AI Notes

Over the past few weeks, I read through the best end-of-year and early 2026 AI posts to get a sense of where things are headed.

Here are my favorites with tldr notes, tools, and takes.


2025 LLM Year in Review by Andrej Karpathy

  • Reinforcement Learning from Verifiable Rewards (RLVR): New major training stage added to the LLM production stack after pre-training → SFT → RLHF. Uses automatically checkable rewards (math/code/etc.) to induce reasoning strategies.

  • Ghosts vs. Animals: LLMs (unlike humans) are optimized for text + puzzles, not survival. Capabilities spike in verifiable domains but remain uneven elsewhere. Benchmarks became unreliable due to “benchmaxxing.”

  • Cursor / New Layer of LLM Apps: Cursor popularized the LLM app pattern: bundling context, multi-call orchestration, vertical GUI, autonomy slider. Are there green pastures for apps? Yes. Labs are generalists. Apps are specialists in verticals by using private data and feedback loops.

  • Claude Code / AI That Lives on Your Computer: First convincing demo of what an LLM agent looks like. Runs locally with your environment/data instead of in the cloud. Analogy: a little spirit/ghost that "lives" on your computer.

  • Vibe Coding: Natural language → code crossed usability threshold. Non-programmers can build apps; programmers can build far more. Code becomes ephemeral, free, disposable. “Vibe coding will terraform software and alter job descriptions.”

  • LLM GUI: Text is efficient for machines but not for humans; we prefer visual/spatial formats. Who is “actually going to build the LLM GUI?” Nano Banana hints at this by combining text + images + world knowledge
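
To make the RLVR item above a bit more concrete: a "verifiable reward" is just a check a program can run without a human in the loop. Below is a toy sketch for a math-style task, assuming the correct answer is a known number and the model's final answer is the last number in its output. The function names are hypothetical and this is not Karpathy's code or any lab's actual training pipeline.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last number that appears in the completion, or None."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def verifiable_reward(completion: str, expected: str) -> float:
    """Hypothetical reward: 1.0 if the final number matches the reference, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # nothing checkable in the output
    return 1.0 if float(answer) == float(expected) else 0.0


print(verifiable_reward("12 * 12 = 144, so the answer is 144", "144"))
print(verifiable_reward("I think the answer is 143.", "144"))
```

The point of the sketch is that the reward needs no human rater, which is why this kind of training concentrates on domains like math and code where such checks exist.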

Standing Out in 2026 by Lulu Cheng Meservey

  • Everything is fake now: Fake content by fake influencers with fake engagement from fake followers, launching fake products with fake testimonials. Real has never been more precious.

  • For comms, 2024 was going direct and 2025 was winning attention. 2026 will be about doing real things.

  • Doing real things means: Putting in real effort, Showing real evidence, Real world events and artifacts, Showing up as real humans, Forming real relationships

  • “Once you are Real you can’t become unreal again. It lasts for always.”

The GPT-9 Test by Michael Bloch (Quiet Capital)

  • What happens to your business when GPT-9 ships?

  • Does your biz depend on AI being bad at something?

  • Do you have network effects where the product improves as more people use it?

  • Does your biz require physical presence that can’t be automated away?

  • Does better AI make your product more valuable, or less?

  • Most biz today are “arbitraging a temporary capability gap”

  • Build something that lasts.

The Last Moat Standing by fintechjunkie

  • If anyone (kid in dorm room) can build your product in a weekend, what's actually defensible?

  • Last real moat: An opinionated perspective on the solution

  • Building is now easy and fast. Having an informed opinion is hard and takes time.

  • Copying opinionated teams is like hitting a moving target.

The disappearing middle of software work by Karri (Linear)

  • Middle of software = opening the codebase, booting up the environment, and writing the code

  • For a long time, this has been the most important work and where most time was spent.

  • This middle is disappearing/thinning thanks to coding agents.

  • Understanding the problem, gathering context, and directing agent work become the most important work.

The End of Reusable Software by Sherwood

  • Code is now free. No longer need to use existing software. Claude can create from scratch.

  • Why create reusable programs? Why not just write one-off for every scenario? Coding agents already do this.

Observability's Past, Present, and Future by Sherwood

  • Observability emerged to tame cloud and microservice complexity: distributed tracing + a new reliability mindset that actually worked at first.

  • Today we over-collect telemetry and obsess over dashboards, but the real bottleneck is humans making sense of the data, not generating more of it.

  • AI is about to flood the world with wayyy more (and messier) software, so we’ll need a new kind of observability that helps us reason about and operate this infinite codebase.

  • h/t 1 in every 5 founders I meet in the Bay Area is building an observability platform for agents

Founding of Claude Code + Cowork by Boris Cherny

  • Initially launched Claude Code to Anthropic team to dogfood

  • Started with Sonnet 3.5 before model was good at agentic coding

  • Couple months later, non-eng (research, data sci, design) started using CC daily

  • Now people are using CC to “control their oven, recover wedding photos from a busted hard drive, analyze their DNA and medical records, haggle with customer support.”

  • Realized they needed to “make it easier for people that want to use the Claude agent for things that are not coding” → Introducing Claude Cowork

How I use Claude Code by Boris Cherny, creator of Claude Code

  • Run 1-5 Claudes locally, run 5-10 Claudes on the web (claude.ai)

  • Opus 4.5 for everything

  • Team-shared, regularly updated CLAUDE.md

  • Start most sessions in plan mode

  • Create slash commands for repeat work

  • Create subagents for automating common workflows

  • PostToolUse hook to clean up code

  • Use /permissions instead of --dangerously-skip-permissions

  • Allow Claude Code to use all your tools (MCP) for you

  • Verify and/or ralph long-running tasks

  • Give Claude a way to verify its work

Scaling long-running autonomous coding by Wilson Lin (Cursor) + Leerob summary

  • Cursor ran “hundreds of concurrent agents on a single project, coordinating their work, and watching them write over a million lines of code and trillions of tokens.”

  • Single agents are good for focused tasks but slow for complex projects

  • Flat structure of agents failed because agents became risk-averse and avoided difficult tasks

  • Separating agents into planner, worker, and judge roles solved coordination problems and allowed Cursor to scale to very large projects

  • Lessons from deploying trillions of tokens on long-running tasks: model choice (GPT-5.2) matters, removing complexity (unnecessary roles) helps, but the prompts matter most
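
As a rough picture of the planner / worker / judge split described above, here is a minimal sketch with stubbed-out roles. It assumes each role would be an LLM call in practice; the names `plan`, `work`, `judge`, and `run_cycle` are hypothetical, and this is not Cursor's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Task:
    description: str


@dataclass
class Result:
    task: Task
    output: str


def plan(goal: str) -> list[Task]:
    """Planner stub: split a goal into independent tasks (an LLM call in practice)."""
    return [Task(f"{goal}: part {i}") for i in range(1, 4)]


def work(task: Task) -> Result:
    """Worker stub: finish one task without coordinating with other workers."""
    return Result(task, f"done({task.description})")


def judge(results: list[Result]) -> bool:
    """Judge stub: decide whether this cycle's output is acceptable."""
    return all(r.output.startswith("done(") for r in results)


def run_cycle(goal: str, max_workers: int = 3) -> tuple[list[Result], bool]:
    """One cycle: plan, fan the tasks out to parallel workers, then judge."""
    tasks = plan(goal)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(work, tasks))  # map preserves task order
    return results, judge(results)


results, accepted = run_cycle("build parser")
print(accepted, [r.output for r in results])
```

Workers never talk to each other here, only to the planner's task list, which is the structural difference from the flat setup that failed.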

Coding agents need product agents by Jordan

  • Coding is cheap. Decision-making is still expensive.

  • More pressure on the part teams have always struggled with most: deciding what to build, why it matters, and staying aligned.

  • Coding agents help teams ship faster. Product agents (Async) can help teams ship the right thing.

Shipping at Inference-Speed by Peter Steinberger

  • You can now ship code at a speed that seems unreal. Now limited by inference time and hard thinking.

  • The important decisions are now language, ecosystem, and dependencies.

  • It’s getting harder and harder to trust benchmarks. Try multiple models/tools to understand.

Notes on AI Apps in 2026 by Anish Acharya (a16z)

  • Thinking tools vs Making tools: Many execution tools exist, but more exploration tools are needed.

  • Software eats all the “service” functions in the organization: Agents will replace human service functions (legal, finance, HR)

  • Compounding AI apps: Apps that benefit from multi-modal data, proprietary datasets, networks, and ecosystems (e.g., thick apps) will compound.

  • Humans discover “the rest” of AI: UX/UI will improve and more consumers will create with AI.

  • Notes for (incumbent) CEOs: Collapse customer-facing roles, software-first everywhere, and price boldly. For most enterprise tasks, AGI is here.

LLMs vs. Marketplaces by Dan Hockenmaier

  • LLMs are on a collision course with marketplaces

  • Collision = User → AI interface → DoorDash → Sandwich delivery

  • This is a problem because marketplaces pay back CAC from repeat transactions and would have to spend more per tx because orders are coming from ChatGPT rather than their app

  • Marketplace defensibility to LLMs comes from:

  • Difficulty of supply aggregation: hotels (easy for LLM) vs Airbnb (hard for LLM)

  • Degree of management: search and tx (easy for LLM) vs risk and service (hard for LLM)

  • Nature of customer engagement: low frequency and high consideration/research (good for LLM) vs high frequency and low consideration (bad for LLM)

  • What should marketplaces do? Do things LLMs won’t or can’t.
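
The CAC point above is easier to see with arithmetic. The sketch below assumes a simple model in which CAC is paid back out of per-order margin, and an LLM intermediary adds a per-order cost. All numbers and the function name are hypothetical illustrations, not figures from the post.

```python
import math


def orders_to_pay_back(cac: float, margin_per_order: float,
                       cost_per_order_via_llm: float = 0.0) -> float:
    """Orders needed to recover CAC once each order carries an extra per-order cost."""
    net_margin = margin_per_order - cost_per_order_via_llm
    if net_margin <= 0:
        return math.inf  # CAC is never paid back
    return cac / net_margin


# Hypothetical numbers: $30 CAC, $5 margin per order.
print(orders_to_pay_back(30, 5))      # orders arrive through the marketplace's own app
print(orders_to_pay_back(30, 5, 2))   # $2 per order goes to the AI interface
print(orders_to_pay_back(30, 5, 5))   # the per-order cost eats the whole margin
```

Under these made-up numbers, payback stretches from 6 orders to 10, and disappears entirely once the per-order cost matches the margin.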

Consumer AI predictions by Eugenia (Wabi)

  • Screenless AI devices will flop: voice is a good secondary interface but a bad primary one, and it’s hard to fight our addiction to feeds/screens

  • “Always listening” devices won’t work either: most things aren’t worth recording, and the things that are you won’t dare record. Granola is good.

  • Mini-apps will unlock UGC personal software: full apps are hard to make and heavy to onboard and use. AI coding + mini-apps = UGC software

  • By 2030 there will be two big general-purpose AI chatbots: today we have ChatGPT and lots of niche bots; tomorrow a ChatGPT-like assistant and an AI friend will be the big ones

  • Performance marketing for apps is dead: saturated channels and copycats will push margins to zero. Paid acquisition is a boost, not a biz model

  • The fastest consumer product to reach $1B ARR will be an AI webcam girl: dropping video generation cost will result in a hyper-personalized 24/7 super OnlyFans

  • Whoever solves AI discovery wins: normal people use text input for chat and search. Consumer AI winners will unlock hidden use cases beyond search/chat

Human data will be a $1 trillion/year market by Ali Ansari

  • All functions (digital and physical) will be automated.

  • Automation pushes humans towards higher-value creative work.

  • Frontier AI requires structured human data to learn.

  • A lot of time and money will be spent on “expert human data creation or structured human judgment”

21 Lessons From 14 Years at Google by Addy Osmani

  • The best engineers are obsessed with solving user problems. — “User obsession means spending time in support tickets, talking to users, watching users struggle, asking “why” until you hit bedrock.”

  • Bias towards action. Ship. You can edit a bad page, but you can’t edit a blank one. — “First do it, then do it right, then do it better.”

  • Your code doesn’t advocate for you. People do. — “In large organizations, decisions get made in meetings you’re not invited to, using summaries you didn’t write, by people who have five minutes and twelve priorities.”

  • The best code is the code you never had to write. — “The problem isn’t that engineers can’t write code or use AI to do so. It’s that we’re so good at writing it that we forget to ask whether we should.”

  • Focus on what you can control. Ignore what you can’t. — “Dwelling on these creates anxiety without agency.”

  • Writing forces clarity. The fastest way to learn something better is to try teaching it. — “The act of making something legible to someone else makes it more legible to me.”

  • The work that makes other work possible is priceless — and invisible. — “Glue work - documentation, onboarding, cross-team coordination, process improvement - is vital.”

  • When a measure becomes a target, it stops measuring. — “The goal is insight, not surveillance.”

  • Admitting what you don’t know creates more safety than pretending you do. — “When a leader admits uncertainty, it signals that the room is safe for others to do the same.”

  • Your network outlasts every job you’ll ever have. — “Your job isn’t forever, but your network is. Approach it with curiosity and generosity, not transactional hustle.”

  • Most performance wins come from removing work, not adding cleverness. — “Before you optimize, question whether the work should exist at all.”

2025 Year in review by Paul Stamatiou

  • Rewind lessons: “prioritize task-based workflows over pure recall, and explore using lightweight visual models for data classification.”

  • Limitless pendant lessons: AI wearables have real use cases but some people “take privacy extraordinarily seriously and lean introverted” and aren’t excited to wear recording devices

  • Who to work with? “exceptionally talented team, at the forefront of AI, with leadership and a CEO who genuinely care about quality, and as little organizational friction as possible between us and an outstandingly well-crafted product.”

  • How to be successful? “if successful, will make the rest of my career look like a footnote.”

  • Lessons exploring: Designers who code aren't a nice-to-have anymore: they're the norm. People “who weren't already working closely with AI were thinking about leaving their companies.”

  • Sesame (hiring) is building lifelike personal agents via software + hardware

  • Restraint with coding agents: “Just because you can, doesn't mean you should.” The hard part now is restraint.

New AI tools

  • Scott AI: agentic workspace for eng alignment

  • Claude Cowork: Claude Code for non-technical tasks.

  • Ralph Wiggum: self-referential AI development loops in Claude Code.

  • Worktrunk: git worktree manager for running AI agents in parallel

  • CallMe: plugin that allows Claude Code to call you on your phone.

  • Async: Slack-based product agent that learns about your company's product, customers, codebase, and team from existing work.

  • Universal Commerce Protocol (UCP): enables commerce inside Google AI products like Gemini

Good AI takes