I’ve been consuming a lot of LLM/agent content lately. Some of it is genuinely useful; most of it is noise.
This page is my attempt to keep a practical signal stack — sources + a mental model that help me build agents that hold up past the demo.
Current staples I already follow:
- Xiaohongshu “update” feeds (fast trend radar)
- Anthropic blog (Newsroom / updates)

What I’m doing now is turning that into a structured system.
The mental model (inspired by Anthropic’s “Building Effective Agents”)
The main framing I stole from Anthropic is simple:
Start with the simplest workflow that can solve the problem, then progressively “agentify” only when complexity demands it.
Very SDE-friendly: get something observable, testable, and debuggable before you crank up autonomy.
I bucket agent building into three layers:
1) Primitives (components)
These are the parts you’ll reuse across systems.
Tools / action space
Tool calling is table stakes. The real work is discovery, permissions, schema discipline, and keeping context from exploding when tool count grows.
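To make “schema discipline” concrete, here is a minimal sketch of keeping tool definitions strict and exposing only the tools a task actually needs. The names (`ToolSpec`, `ToolRegistry`, `search_docs`) are my own illustrations, not any framework's API.

```python
# Sketch: strict tool specs plus an allow-list, so the model only sees the
# tools a task needs (keeps context from exploding as the tool count grows).
# ToolSpec, ToolRegistry, and search_docs are illustrative names, not a real API.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolSpec:
    name: str
    description: str
    parameters: dict[str, Any]          # JSON Schema for the arguments
    handler: Callable[..., Any]
    requires_approval: bool = False     # crude permission flag

class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def export(self, allowed: set[str]) -> list[dict[str, Any]]:
        """Return only the allow-listed tools, in the generic
        {name, description, parameters} shape most tool-calling APIs accept."""
        return [
            {"name": t.name, "description": t.description, "parameters": t.parameters}
            for t in self._tools.values()
            if t.name in allowed
        ]

registry = ToolRegistry()
registry.register(ToolSpec(
    name="search_docs",
    description="Search internal docs. Returns at most 5 snippets.",
    parameters={
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
        "additionalProperties": False,   # schema discipline: no silent extra args
    },
    handler=lambda query: f"results for {query!r}",
))
```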
Environment / state
Agents don’t live in chat logs; they live in an environment — browser, file system, UI state, terminal, DB.
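A minimal sketch of what treating the environment as explicit, inspectable state could look like. The fields are illustrative placeholders, not a real harness; a real one would snapshot whatever surfaces the agent actually touches.

```python
# Sketch: the environment as an explicit object the agent reads and writes,
# rather than something implied by chat history. Field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class EnvState:
    cwd: str = "/workspace"
    open_files: dict[str, str] = field(default_factory=dict)   # path -> contents
    browser_url: str | None = None
    last_terminal_output: str = ""
    db_dirty: bool = False      # did the agent mutate persistent state?

    def summary(self, max_chars: int = 500) -> str:
        """Compact view to put in the prompt instead of dumping everything."""
        text = (
            f"cwd={self.cwd}; files={list(self.open_files)}; "
            f"url={self.browser_url}; db_dirty={self.db_dirty}"
        )
        return text[:max_chars]
```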
2) Patterns (composable architectures)
Instead of jumping straight to a “general agent,” I think in patterns that compose cleanly:
- Prompt chaining / Routing / Parallelization
- Evaluator–Optimizer (review → improve → re-evaluate loops; see the sketch after this section)

3) Harness & evals (reliability)
This is the difference between a demo and something you can ship:
- drift control over long runs

If I can’t measure reliability, I don’t trust it.
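The Evaluator–Optimizer loop is the easiest of these patterns to sketch. A minimal version, assuming a generic `llm(prompt)` call; the rubric, threshold, and round cap are placeholders, not a prescribed setup.

```python
# Sketch of the Evaluator-Optimizer loop: draft -> critique -> revise, with a
# cap on rounds so long runs can't drift forever. `llm` is a stand-in for any
# chat-completion client; the grading format and threshold are placeholders.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def evaluate(task: str, draft: str) -> tuple[float, str]:
    """Ask the model to grade the draft; return (score in [0, 1], critique)."""
    reply = llm(
        f"Task: {task}\nDraft: {draft}\n"
        "Score the draft from 0.0 to 1.0, then give one concrete fix.\n"
        "Format: <score>|<critique>"
    )
    score_str, _, critique = reply.partition("|")
    try:
        return float(score_str.strip()), critique.strip()
    except ValueError:
        return 0.0, reply   # treat unparseable grades as failures, keep raw text

def evaluator_optimizer(task: str, max_rounds: int = 3, threshold: float = 0.8) -> str:
    draft = llm(f"Task: {task}\nWrite a first attempt.")
    for _ in range(max_rounds):
        score, critique = evaluate(task, draft)
        if score >= threshold:
            break
        draft = llm(
            f"Task: {task}\nDraft: {draft}\nCritique: {critique}\nRevise the draft."
        )
    return draft
```

The round cap doubles as crude drift control: the loop cannot wander indefinitely, and the score trace gives me something to plot per run.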
My source list (ranked by “signal per minute”)
Rule of thumb:
- Tier 0: primary sources — subscribe and skim regularly
- Tier 1: translators — podcasts/newsletters that turn research into engineering lessons
- Tier 2: benchmarks/evals — calibration tools so I don’t fool myself

Tier 0 — Primary sources (must-follow)
- Anthropic (Research / Engineering): Building Effective AI Agents; Demystifying evals for AI agents
- OpenAI Docs (Agents + tool calling)
- LangChain / LangGraph / LangSmith
- Manus (context engineering, very practical): Context Engineering for AI Agents: Lessons from Building Manus
- Hugging Face (open-source ecosystem radar)
- Google DeepMind Blog (research trends)

Tier 1 — Podcasts / newsletters (weekly pick)

Tier 2 — Benchmarks / evals (calibration tools)
- AgentBench (LLM-as-agent benchmark)
- WebArena (web environment benchmark; great for browser/tool agents)

Chinese sources (fast + noisy, still useful)
I treat these as trend radar, not truth. The goal is “early signal,” then I verify via Tier 0.
Websites / cross-posted feeds

WeChat Official Accounts I actually keep an eye on
These are the ones that consistently surface papers, product updates, and industry moves quickly:
- 新智元 (often described as “智能+中国”, roughly “intelligence + China”; sometimes shorthanded as “new intelligence era”)
- 机器之心 (official account: almosthuman2014)

Xiaohongshu (my filter keywords)
MCP, LangGraph, agent eval, context engineering, tool calling, memory
YouTube (implementation > hype)

X (Twitter) — “early signal” accounts
Orgs
People

Hands-on anchors (so I don’t stay theoretical)

1) nanochat — a minimal end-to-end ChatGPT-style stack
When I feel like I’m consuming too much and building too little, I go back to one repo and follow the plumbing.
My notes template:
- How is data + tokenization handled?
- What are the critical engineering points in the training loop?
- What evals exist, and what’s the minimum viable eval?
- What does the inference/serving loop look like end-to-end?

2) Memory-first agents (MemGPT / Letta)
I’m increasingly convinced memory is a real separator for long-running agents — not because it’s fancy, but because it reduces repetition and improves continuity.
What I care about:
- memory tiers (working vs long-term vs externalized)
- write policies (when to commit memory)
- retrieval policies (what to pull back, and when)
- compression without breaking correctness

I keep this linked to a separate page where I run small experiments (same task over multiple days; measure steps/retries/tool errors).
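A minimal sketch of those policies, assuming nothing beyond the standard library. The write and retrieval policies here are deliberately dumb placeholders (prefix checks, keyword overlap), and the counters match what I track in the experiments above.

```python
# Sketch: a working buffer with a size cap, an explicit write policy (what gets
# committed to long-term memory on eviction), and a retrieval policy (what gets
# pulled back in). All names and heuristics are illustrative placeholders.
from dataclasses import dataclass, field

@dataclass
class Memory:
    working: list[str] = field(default_factory=list)     # recent turns / events
    long_term: list[str] = field(default_factory=list)   # committed facts
    working_limit: int = 20

    def observe(self, event: str) -> None:
        self.working.append(event)
        if len(self.working) > self.working_limit:
            evicted = self.working.pop(0)
            if self._worth_keeping(evicted):              # write policy
                self.long_term.append(evicted)

    def _worth_keeping(self, event: str) -> bool:
        # Placeholder write policy: keep decisions and errors, drop chatter.
        return event.startswith(("DECISION:", "ERROR:"))

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Placeholder retrieval policy: naive keyword overlap, newest first.
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(m.lower().split())), m)
            for m in reversed(self.long_term)
        ]
        return [m for score, m in sorted(scored, reverse=True)[:k] if score > 0]

@dataclass
class RunMetrics:
    # The per-run counters I compare across days for the same task.
    steps: int = 0
    retries: int = 0
    tool_errors: int = 0
```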
My low-friction weekly cadence
- Daily (10 min): skim Tier 0 (Anthropic / OpenAI / LangChain / Manus)
- Weekly (1 hr): 1–2 podcasts (Latent Space or High Agency)
- Weekly (30 min): 1 eval/benchmark paper or an eval-focused post
- Biweekly: write a short “what I learned + how I’ll apply it” note

Personal reminder (so I don’t drift)
- Don’t chase new buzzwords — chase new failure modes and how people fix them
- Prefer postmortems and production stories over “top 10 frameworks” lists
- If I can’t measure reliability, I’m not done building