I’ve been consuming a lot of LLM/agent content lately. Some of it is genuinely useful; most of it is noise.
This page is my attempt to keep a practical signal stack: sources plus a mental model that helps me build agents that hold up past the demo.
Current staples I already follow:
- Xiaohongshu “update” feeds (fast trend radar)
- Anthropic blog (Newsroom / updates)
What I’m doing now is turning that into a structured system.
The mental model (inspired by Anthropic’s “Building Effective Agents”)
The main framing I stole from Anthropic is simple:
Start with the simplest workflow that can solve the problem, then progressively “agentify” only when complexity demands it.
Very SDE-friendly: get something observable, testable, and debuggable before you crank up autonomy.
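A minimal sketch of what "simplest workflow first" means to me: a fixed pipeline of steps where every intermediate output can be logged and asserted. The step functions below are hypothetical stubs standing in for LLM calls, not any framework's API.

```python
from typing import Callable

Step = Callable[[str], str]

def chain(steps: list[Step], task: str) -> str:
    """The simplest workflow: a static pipeline. Each step is observable
    and testable in isolation; 'agentify' a step only when a fixed
    pipeline stops being enough."""
    out = task
    for step in steps:
        out = step(out)  # each intermediate output can be logged/asserted
    return out

# Hypothetical stubs standing in for LLM calls, for illustration only:
outline = lambda t: f"outline({t})"
draft = lambda o: f"draft({o})"

result = chain([outline, draft], "write a blog post")
```

The point is that the control flow stays in ordinary code, so debugging is just reading a trace of deterministic function calls.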
I bucket agent building into three layers:
1) Primitives (components)
These are the parts you’ll reuse across systems.
- Tools / action space
Tool calling is table stakes. The real work is discovery, permissions, schema discipline, and keeping context from exploding when tool count grows.
- Environment / state
Agents don’t live in chat logs; they live in an environment — browser, file system, UI state, terminal, DB.
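As a concrete (hypothetical) sketch of schema discipline for the tool primitive: keep the schema explicit next to the handler, fail loudly on unknown tools, and validate arguments before execution. Names here are made up for illustration, not any specific SDK.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

@dataclass
class Tool:
    """One tool = name + JSON-Schema params + handler. An explicit schema
    makes permissions and context budgeting auditable."""
    name: str
    description: str
    params_schema: dict  # JSON Schema for the arguments
    handler: Callable[..., str]

REGISTRY: dict[str, Tool] = {}

def register(tool: Tool) -> None:
    if tool.name in REGISTRY:
        raise ValueError(f"duplicate tool: {tool.name}")
    REGISTRY[tool.name] = tool

def call(name: str, args_json: str) -> str:
    tool = REGISTRY[name]  # discovery: unknown tools fail loudly (KeyError)
    args = json.loads(args_json)
    missing = set(tool.params_schema.get("required", [])) - args.keys()
    if missing:
        return f"error: missing args {sorted(missing)}"  # validate before executing
    return tool.handler(**args)

register(Tool(
    name="read_file",
    description="Read a UTF-8 text file",
    params_schema={"type": "object", "required": ["path"],
                   "properties": {"path": {"type": "string"}}},
    handler=lambda path: Path(path).read_text(encoding="utf-8"),
))
```

The same registry is also where permission checks and per-tool context budgets would hang, which is why I treat it as a primitive rather than framework plumbing.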
2) Patterns (composable architectures)
Instead of jumping straight to a “general agent,” I think in patterns that compose cleanly:
- Prompt chaining / Routing / Parallelization
- Evaluator–Optimizer (review → improve → re-evaluate loops)
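The evaluator-optimizer pattern above can be sketched as plain control flow. Both roles would normally be LLM calls; here they are ordinary callables (hypothetical stubs) so the loop itself is testable:

```python
from typing import Callable

def evaluator_optimizer(
    generate: Callable[[str, str], str],          # (task, feedback) -> draft
    evaluate: Callable[[str], tuple[bool, str]],  # draft -> (accept?, feedback)
    task: str,
    max_rounds: int = 3,
) -> str:
    """Review -> improve -> re-evaluate loop with a hard iteration budget."""
    feedback = ""
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        ok, feedback = evaluate(draft)
        if ok:
            return draft
    return draft  # best effort once the budget is spent

# Stub usage: the first draft fails the check, the second passes.
drafts = iter(["to short", "a longer, acceptable answer"])
result = evaluator_optimizer(
    generate=lambda task, fb: next(drafts),
    evaluate=lambda d: (len(d) > 10, "too short, expand"),
    task="summarize",
)
```

Capping `max_rounds` matters: without a budget, a picky evaluator turns the loop into an unbounded token sink.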
3) Harness & evals (reliability)
This is the difference between a demo and something you can ship:
- drift control over long runs
If I can’t measure reliability, I don’t trust it.
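A minimal sketch of what "measure reliability" means here, under the assumption that a task run can be scored pass/fail: repeat the run (agent runs are stochastic, so one green run proves little) and track mean scores per evaluation window to spot drift. Function names are my own, for illustration.

```python
import statistics
from typing import Callable

def pass_rate(run: Callable[[], bool], trials: int = 20) -> float:
    """Fraction of passing runs over repeated trials of the same task."""
    results = [run() for _ in range(trials)]
    return sum(results) / trials

def drift(scores_by_window: list[list[float]]) -> list[float]:
    """Mean score per evaluation window; a downward trend is drift."""
    return [statistics.mean(window) for window in scores_by_window]
```

Even this crude harness beats vibes: a task that passes 14/20 times is a very different shipping decision than one that passes 20/20.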
My source list (ranked by “signal per minute”)
Rule of thumb:
- Tier 0: primary sources — subscribe and skim regularly
- Tier 1: translators — podcasts/newsletters that turn research into engineering lessons
- Tier 2: benchmarks/evals — calibration tools so I don’t fool myself
Tier 0 — Primary sources (must-follow)
Anthropic (Research / Engineering)
- Building Effective AI Agents
- Demystifying evals for AI agents
OpenAI Docs (Agents + tool calling)
LangChain / LangGraph / LangSmith
Manus (context engineering, very practical)
- Context Engineering for AI Agents: Lessons from Building Manus
Hugging Face (open-source ecosystem radar)
Google DeepMind Blog (research trends)
Tier 1 — Podcasts / newsletters (weekly pick)
Tier 2 — Benchmarks / evals (calibration tools)
- AgentBench (LLM-as-agent benchmark)
- WebArena (web environment benchmark; great for browser/tool agents)
Chinese sources (fast + noisy, still useful)
I treat these as trend radar, not truth. The goal is “early signal,” then I verify via Tier 0.
Websites / cross-posted feeds