New in Gemini 2.5: "implicit caching" cuts costs by up to 75%

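Google recently launched a feature called "implicit caching" that gives developers a more convenient, lower-cost way to use its models with no extra setup. The feature is available in the Gemini API and can cut the cost of "repetitive context" passed to the model by up to 75%. It supports the Gemini 2.5 Pro and 2.5 Flash models, giving developers under cost pressure a useful tool.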

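Unlike the earlier "explicit caching", implicit caching does not require developers to manually define their frequently used prompts, which removes tedious setup and helps avoid unexpected API charges. Implicit caching is enabled automatically on the Gemini 2.5 models, and the cost savings are applied whenever a request hits the cache.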
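To make the savings concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the 75% figure applies as a flat discount to the cached portion of the input tokens only. The function name `input_cost` and the price of 1.00 per million tokens are placeholders for illustration, not Google's actual pricing.

```python
# Assumption: cached input tokens are billed at a flat 75% discount.
# The article says "up to 75%", so real savings may be lower.
CACHE_DISCOUNT = 0.75


def input_cost(total_tokens: int, cached_tokens: int, price_per_million: float) -> float:
    """Estimate the input cost of one request when part of the prompt hits the cache.

    price_per_million is a hypothetical price per one million input tokens.
    """
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    uncached = total_tokens - cached_tokens
    # Cached tokens are charged at the remaining fraction of the normal price.
    discounted = cached_tokens * (1 - CACHE_DISCOUNT)
    return (uncached + discounted) * price_per_million / 1_000_000


# A 10,000-token prompt where 8,000 tokens are repeated context.
without_cache = input_cost(10_000, 0, price_per_million=1.00)
with_cache = input_cost(10_000, 8_000, price_per_million=1.00)
print(f"saved {1 - with_cache / without_cache:.0%} of the input cost")
```

Under these assumptions the request as a whole saves less than 75%, because only the cached part of the prompt is discounted.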

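According to Google's developer documentation, the minimum prompt size for triggering implicit caching is 1,024 tokens on 2.5 Flash and 2,048 tokens on 2.5 Pro, a relatively low bar. Google recommends placing the repeated context at the beginning of a request and the content that changes at the end, to improve the cache hit rate.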
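The ordering advice and the thresholds can be sketched in a few lines of Python. This is only an illustration: the helper names (`build_prompt`, `common_prefix_len`, `meets_threshold`) and the model ID strings used as dictionary keys are assumptions, and the token count is supplied by the caller rather than computed by a real tokenizer. No Gemini API call is made.

```python
# Minimum prompt sizes for implicit caching, as quoted in the article.
# The model ID strings are assumed for illustration.
MIN_CACHE_TOKENS = {
    "gemini-2.5-flash": 1024,
    "gemini-2.5-pro": 2048,
}


def build_prompt(shared_context: str, question: str) -> str:
    """Put the repeated context first and the part that changes last."""
    return f"{shared_context}\n\n{question}"


def common_prefix_len(a: str, b: str) -> int:
    """Count how many leading characters two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def meets_threshold(model: str, prompt_tokens: int) -> bool:
    """Check a caller-supplied token count against the quoted minimum."""
    return prompt_tokens >= MIN_CACHE_TOKENS[model]


shared = "Product manual text. " * 200  # stands in for a long, repeated document
p1 = build_prompt(shared, "How do I reset the device?")
p2 = build_prompt(shared, "What does the warranty cover?")
# With the shared context first, the two prompts share a long common prefix.
print(common_prefix_len(p1, p2) >= len(shared))
```

If the question were placed before the shared context instead, the two prompts would diverge within the first few characters and would have almost no prefix in common.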

We just shipped implicit caching in the Gemini API, automatically enabling a 75% cost savings with the Gemini 2.5 models when your request hits a cache 🚢
We also lowered the min token required to hit caches to 1K on 2.5 Flash and 2K on 2.5 Pro!
— Logan Kilpatrick (@OfficialLoganK) May 8, 2025

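Although Google is confident in implicit caching, the feature has not yet been validated by developers, so feedback from early users will be crucial. In a fiercely competitive AI market, the move could win Google more favor among developers.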

(Lead image source: Google)