
check out rtk, which does this for a bunch of commands


Do the larger LLM platforms just do this for you? Or perhaps they do it behind the scenes and charge you for the same number of tokens?


The tokens still land in the context window either way. Prompt caching gives you a discount on repeated input, but only for stable prefixes like system prompts. Git output changes every call, so it's always uncached, always full price. Nit reduces what goes into the window in the first place.
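For the curious, here's roughly what "reduces what goes into the window" can look like. This is just a toy sketch of the general idea, not how nit or rtk actually work: wrap the git call yourself and hand the model a compact summary instead of the raw output.

    #!/usr/bin/env python3
    # Toy sketch only -- not nit's or rtk's actual implementation. The point is
    # that the wrapper runs git itself and returns a short summary, so the
    # verbose output never lands in the model's context window.
    import subprocess

    def run(cmd):
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    def compact_status():
        # `git status --porcelain` is already terse; collapse it to two counts.
        lines = run(["git", "status", "--porcelain"]).splitlines()
        untracked = sum(1 for l in lines if l.startswith("??"))
        changed = len(lines) - untracked
        return f"{changed} changed, {untracked} untracked"

    def compact_diff():
        # `--stat` gives per-file churn; keep only the one-line summary at the end.
        stat = run(["git", "diff", "--stat"]).strip()
        return stat.splitlines()[-1] if stat else "no unstaged changes"

    if __name__ == "__main__":
        print(compact_status())
        print(compact_diff())

In an agent setup, the tool-call handler would return those summary strings to the model instead of the full git output.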


I was thinking more of the case where you write a prompt into an IDE that has first-party integration with an LLM platform (e.g. VS Code with GitHub Copilot). It would make sense on their end to reduce or strip redundant input before feeding the tokens to their models, just to increase throughput (more customers) and decrease latency (lower costs). They would be foolish not to do this kind of optimisation, so surely they must be doing it. Whether they would pass those token savings on to the user, I couldn't say.


no, because tool calls generally all happen client side. unless you mean a remote environment where Claude Code is running separately, but usually those aren't billed by the token.


this is awesome! thanks for sharing rtk.. going to check it out.



