My LLM & Agent Signal Stack

Sat Jan 17 2026

I’ve been consuming a lot of LLM/agent content lately. Some of it is genuinely useful; most of it is noise.
This page is my attempt to keep a practical signal stack — sources + a mental model that help me build agents that hold up past the demo.
Current staples I already follow:
  • Xiaohongshu “update” feeds (fast trend radar)
  • Latent Space Podcast
    • https://www.latent.space/podcast
  • Anthropic blog (Newsroom / updates)
    • https://www.anthropic.com/news
  • LangChain blog
    • https://blog.langchain.com/
What I’m doing now is turning that into a structured system.

The mental model (inspired by Anthropic’s “Building Effective Agents”)

The main framing I stole from Anthropic is simple:
Start with the simplest workflow that can solve the problem, then progressively “agentify” only when complexity demands it.
Very SDE-friendly: get something observable, testable, and debuggable before you crank up autonomy.
I bucket agent building into three layers:

1) Primitives (components)

These are the parts you’ll reuse across systems.
  • Tools / action space
    • Tool calling is table stakes. The real work is discovery, permissions, schema discipline, and keeping context from exploding as the tool count grows (a minimal schema sketch follows this list).
  • Environment / state
    • Agents don’t live in chat logs; they live in an environment — browser, file system, UI state, terminal, DB.
  • Memory / context engineering
    • “Memory” isn’t a checkbox. It’s a set of design choices:
      • what to store (and what not to)
      • when to write
      • how to retrieve
      • how to compress without breaking correctness
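
To make “schema discipline” concrete, here is a minimal sketch of a tool definition in the OpenAI-style function-calling format. The tool name and fields (search_docs, top_k) are hypothetical; the pattern is the point: typed parameters, bounded values, and no room for invented arguments.

```python
# A minimal "schema discipline" sketch in the OpenAI-style function-calling
# format. The tool itself (search_docs) is hypothetical.
search_docs_tool = {
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search internal docs and return the top-k snippets.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Natural-language search query.",
                },
                "top_k": {
                    "type": "integer",
                    "minimum": 1,
                    "maximum": 10,
                    "description": "How many snippets to return.",
                },
            },
            "required": ["query"],
            "additionalProperties": False,  # reject arguments the model invents
        },
    },
}
```

The additionalProperties: False line is the part I most often see skipped; without it, nothing stops the model from passing arguments your executor silently drops.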

2) Patterns (composable architectures)

Instead of jumping straight to a “general agent,” I think in patterns that compose cleanly:
  • Prompt chaining / Routing / Parallelization
  • Orchestrator–Workers
  • Evaluator–Optimizer (review → improve → re-evaluate loops; toy version below)
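
As a reference point for the last pattern, here is a toy Evaluator–Optimizer loop with no framework attached. generate and evaluate are hypothetical stubs standing in for two LLM calls; swap in real calls to use it.

```python
# Toy Evaluator–Optimizer loop. `generate` and `evaluate` are hypothetical
# stubs standing in for two LLM calls.

def generate(task: str, feedback: str | None = None) -> str:
    # Stub generator: a real version prompts a model, feeding the
    # evaluator's critique back in on revision rounds.
    suffix = f" (revised per: {feedback})" if feedback else ""
    return f"draft answer for {task!r}{suffix}"

def evaluate(task: str, draft: str) -> tuple[bool, str]:
    # Stub evaluator: a real version is a second LLM call or a rubric
    # check. Here we accept any revised draft so the demo terminates.
    return ("revised" in draft, "cite a concrete source")

def evaluator_optimizer(task: str, max_rounds: int = 3) -> str:
    draft = generate(task)
    for _ in range(max_rounds):
        passed, critique = evaluate(task, draft)
        if passed:
            break
        draft = generate(task, feedback=critique)  # improve, then re-evaluate
    return draft

print(evaluator_optimizer("summarize the Q3 incident report"))
```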

3) Harness & evals (reliability)

This is the difference between a demo and something you can ship:
  • traceability
  • failure mode taxonomy
  • safe retries
  • tool error handling (a minimal wrapper covering both is sketched below)
  • drift control over long runs
If I can’t measure reliability, I don’t trust it.
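
Here is a minimal sketch of the “safe retries” and tool error handling items above, assuming a hypothetical run_tool(name, args) executor: bounded attempts, exponential backoff, and failures returned as structured observations the agent loop can reason about instead of exceptions that kill the run.

```python
import time

def call_tool_safely(run_tool, name: str, args: dict, max_attempts: int = 3) -> dict:
    """Run a tool with bounded retries; never let a tool error kill the run."""
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            return {"ok": True, "result": run_tool(name, args)}
        except Exception as exc:  # in practice, catch narrower tool-specific errors
            if attempt == max_attempts:
                # Surface the failure to the model as data, not a crash, so the
                # next step can route around it or escalate to a human.
                return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
            time.sleep(delay)
            delay *= 2  # exponential backoff before the next attempt
```

The design choice: errors become observations, which keeps long runs alive and makes failure modes countable.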

My source list (ranked by “signal per minute”)

Rule of thumb:
  • Tier 0: primary sources — subscribe and skim regularly
  • Tier 1: translators — podcasts/newsletters that turn research into engineering lessons
  • Tier 2: benchmarks/evals — calibration tools so I don’t fool myself

Tier 0 — Primary sources (must-follow)

Anthropic (Research / Engineering)

  • Building Effective AI Agents
    • https://www.anthropic.com/research/building-effective-agents
  • Demystifying evals for AI agents
    • https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents

OpenAI Docs (Agents + tool calling)

  • Function calling
    • https://platform.openai.com/docs/guides/function-calling
  • Agents SDK
    • https://platform.openai.com/docs/guides/agents-sdk
  • Agents Python docs
    • https://openai.github.io/openai-agents-python/

LangChain / LangGraph / LangSmith

  • Deep Agents
    • https://github.com/langchain-ai/deepagents
  • LangGraph
    • https://blog.langchain.com/langgraph/
  • LangSmith Agent Builder
    • https://blog.langchain.com/langsmith-agent-builder/

Manus (context engineering, very practical)

  • Context Engineering for AI Agents: Lessons from Building Manus
    • https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus

Hugging Face (open-source ecosystem radar)

  • smolagents
    • https://huggingface.co/blog/smolagents
  • Agents Course
    • https://huggingface.co/learn/agents-course/en/unit1/tutorial

Google DeepMind Blog (research trends)

  • DeepMind Blog
    • https://deepmind.google/blog/

Tier 1 — Podcasts / newsletters (weekly pick)

  • Latent Space Podcast
    • https://www.latent.space/podcast
  • High Agency (Humanloop)
    • https://humanloop.com/podcast
  • The Cognitive Revolution
    • https://cognitiverevolution.substack.com/
  • The Gradient Podcast
    • https://thegradientpub.substack.com/s/podcast
  • TWIML AI Podcast
    • https://twimlai.com/podcast/twimlai/
  • Import AI (Jack Clark)
    • https://importai.substack.com/

Tier 2 — Benchmarks / evals (calibration tools)

  • AgentBench (LLM-as-agent benchmark)
    • https://arxiv.org/abs/2308.03688
  • WebArena (web environment benchmark; great for browser/tool agents)
    • https://arxiv.org/abs/2307.13854

Chinese sources (fast + noisy, still useful)

I treat these as trend radar, not ground truth. The goal is early signal, which I then verify against Tier 0.

Websites / cross-posted feeds

  • QbitAI (量子位)
    • https://www.qbitai.com/
  • Jiqizhixin (机器之心)
    • https://www.jiqizhixin.com/

WeChat Official Accounts I actually keep an eye on

These are the ones that consistently surface papers, product updates, and industry moves quickly:
  • 量子位 / QbitAI (WeChat ID: QbitAI)
  • 新智元 (often described as “intelligence + China”; sometimes shorthanded as “new intelligence era”)
  • 机器之心 / Jiqizhixin (WeChat ID: almosthuman2014)
  • PaperWeekly (leans toward paper walkthroughs and research updates)
  • DeepTech 深科技 (leans toward “deep tech” and the industry side)
  • 机器学习研究会 (more academic / research-community oriented)

Xiaohongshu (my filter keywords)

MCP, LangGraph, agent eval, context engineering, tool calling, memory

YouTube (implementation > hype)

  • AI Engineer
    • https://www.youtube.com/@aiDotEngineer
  • Latent Space (video)
    • https://www.youtube.com/@LatentSpacePod
  • Anthropic
    • https://www.youtube.com/@anthropic-ai
  • LangChain
    • https://www.youtube.com/@LangChain
  • Hugging Face
    • https://www.youtube.com/@huggingface

X (Twitter) — “early signal” accounts

Orgs

  • Anthropic
    • https://x.com/AnthropicAI
  • LangChain
    • https://x.com/LangChainAI
  • Hugging Face
    • https://x.com/huggingface
  • Google DeepMind
    • https://x.com/GoogleDeepMind

People

  • swyx
    • https://x.com/swyx
  • Harrison Chase
    • https://x.com/hwchase17
  • Simon Willison
    • https://x.com/simonw
  • Jack Clark
    • https://x.com/jackclarkSF

Hands-on anchors (so I don’t stay theoretical)

1) nanochat — a minimal end-to-end ChatGPT-style stack

When I feel like I’m consuming too much and building too little, I go back to one repo and follow the plumbing.
  • Repo
    • https://github.com/karpathy/nanochat
  • Karpathy
    • https://x.com/karpathy
My notes template:
  1. How is data + tokenization handled?
  2. What are the critical engineering points in the training loop?
  3. What evals exist, and what’s the minimum viable eval?
  4. What does the inference/serving loop look like end-to-end?

2) Memory-first agents (MemGPT / Letta)

I’m increasingly convinced memory is a real separator for long-running agents — not because it’s fancy, but because it reduces repetition and improves continuity.
What I care about (toy sketch after this list):
  • memory tiers (working vs long-term vs externalized)
  • write policies (when to commit memory)
  • retrieval policies (what to pull back, and when)
  • compression without breaking correctness
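
A toy two-tier memory layer that makes those policies explicit. Nothing here is any framework’s API; the substring match in recall is a deliberate placeholder for embedding search.

```python
from collections import deque

# Toy two-tier memory with explicit write and retrieval policies.
# Hypothetical scaffolding, not any framework's API.

class AgentMemory:
    def __init__(self, working_size: int = 20):
        self.working = deque(maxlen=working_size)  # recent turns, auto-evicted
        self.long_term: list[str] = []             # stand-in for a vector store

    def observe(self, note: str, important: bool = False) -> None:
        # Write policy: everything enters working memory; only notes
        # explicitly flagged important are committed long-term.
        self.working.append(note)
        if important:
            self.long_term.append(note)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Retrieval policy: pull at most k long-term notes back into
        # context; substring match stands in for embedding search.
        hits = [n for n in self.long_term if query.lower() in n.lower()]
        return hits[:k]
```

The useful part is that observe forces the write-policy decision at the call site instead of hiding it behind a “memory on” switch.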
I keep this linked to a separate page where I run small experiments (same task over multiple days; measure steps/retries/tool errors).

My low-friction weekly cadence

  • Daily (10 min): skim Tier 0 (Anthropic / OpenAI / LangChain / Manus)
  • Weekly (1 hr): 1–2 podcasts (Latent Space or High Agency)
  • Weekly (30 min): 1 eval/benchmark paper or an eval-focused post
  • Biweekly: write a short “what I learned + how I’ll apply it” note

Personal reminder (so I don’t drift)

  • Don’t chase new buzzwords — chase new failure modes and how people fix them
  • Prefer postmortems and production stories over “top 10 frameworks” lists
  • If I can’t measure reliability, I’m not done building