This is not accurate. The main agent typically uses a 1h cache (except for API customers, which can enable 1h but it is not on by default because it costs more). Sub-agents typically use a 5m cache.
As of yesterday subagents were often getting the entire session copied to them. Happened to me when 2 turns with Claude spawned a subagent, caused 2 compactions, and burned 15% of my 5-hour limit (Max 5x).
how long they stay around after the cache miss is irrelevant if I am burning all the prior tokens again. also, how much context they have depends entirely on the task and your workflow. I you have a subagent implement a feature and use the compile + test loop to ensure it is implemented correctly before a supervisor agent reviews what was implemented vs asked then yes, subagents do have a lot of context.
I'd say it's next to impossible to have a subagent doing a compile+test loop where at least 1 call doesn't get made to the API over multiple 5-minute stretches to keep the cache warm. In such a case it may just be the same as doing the compile+test manually and then having the agent troubleshoot any issues before iterating.
but how to make claude-code send that when paying by API-key?
or when using a custom ANTHROPIC_BASE_URL? (requests will contain cache_control, but no ttl!)
This is not accurate. The main agent typically uses a 1h cache (except for API customers, which can enable 1h but it is not on by default because it costs more). Sub-agents typically use a 5m cache.