Accelerate LLM inference using speculative decoding, Medusa multiple heads, and lookahead decoding techniques. Use when optimizing inference speed (1.5-3.6× speedup), reducing latency for real-time applications, or deploying models with limited compute. Covers draft models, tree-based attention, Jacobi iteration, parallel token generation, and production deployment strategies.
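The sketch below illustrates the draft-then-verify idea behind speculative decoding. It is not this skill's implementation, which has no published documentation here. It uses the greedy variant: a cheap draft model proposes `k` tokens, and the target model keeps the longest prefix it agrees with, then adds one token of its own. The toy integer "models" and all function names are hypothetical. Real systems score all `k` positions in one batched target forward pass. When sampling rather than decoding greedily, they use a probabilistic accept/reject rule instead of exact matching.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_greedy_step(prefix: List[int], draft: NextToken,
                            target: NextToken, k: int) -> List[int]:
    """One draft-then-verify round; returns the tokens accepted this round."""
    # 1. The draft model proposes k tokens autoregressively (cheap).
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed position. A real system scores
    #    all k positions in a single batched forward pass; we loop for clarity.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: keep the target's own token and end the round.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. All k drafts accepted: the same target pass yields one bonus token.
    accepted.append(target(ctx))
    return accepted


def generate(prompt: List[int], draft: NextToken, target: NextToken,
             k: int, max_new: int) -> List[int]:
    """Repeat speculative rounds until max_new tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_greedy_step(out, draft, target, k))
    return out[: len(prompt) + max_new]


if __name__ == "__main__":
    # Toy target: next token is (last + 1) mod 10.
    def target(ctx: List[int]) -> int:
        return (ctx[-1] + 1) % 10

    # Toy draft: agrees with the target except after token 4.
    def draft(ctx: List[int]) -> int:
        return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10

    spec = generate([0], draft, target, k=3, max_new=12)

    # Plain greedy decoding with the target alone, for comparison.
    plain = [0]
    for _ in range(12):
        plain.append(target(plain))

    print(spec)
    print(spec == plain)
```

A draft token is kept only when it equals the target's greedy choice, so the output is identical to greedy decoding with the target model alone. The speedup comes from accepting several tokens per target pass whenever the draft agrees with the target.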
Category: developer (Developer Tools) · Author: davila7 · Version: @main · License: MIT
Tags: Emerging Techniques, Speculative Decoding, Medusa, Lookahead Decoding, Fast Inference, Draft Models, Tree Attention, Parallel Generation, Latency Reduction, Inference Optimization
This skill has no documentation files yet.
speculative-decoding is an AI Agent skill developed by davila7 in the "developer" category. It covers Emerging Techniques, Speculative Decoding, Medusa, Lookahead Decoding, Fast Inference, Draft Models, Tree Attention, Parallel Generation, Latency Reduction, and Inference Optimization, and can be integrated directly into compatible AI Agent platforms. Once installed, the Agent gains the tools, prompts, or workflows the skill defines and invokes them automatically during conversations.
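Copy the install command shown on the right side of the page and run it in your terminal. Most skills are installed with the `npx skills add` command; some also support manual download as a ZIP file.
This skill is available free of charge on AgentCC. However, some skills depend on third-party APIs or services; check the skill's documentation to see whether an additional API key or paid service is required.
After a successful installation, the skill registers automatically with your Agent platform. When a request in your conversation falls within the skill's scope, the Agent invokes the skill automatically to complete the task.
Skills differ in implementation, coverage, and author. Compare the similar options under "Related Skills" at the bottom of the page and choose the one that best fits your needs.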
Search for places (restaurants, cafes, etc.) via a Google Places API proxy on localhost.
Interact with GitHub using the `gh` CLI. Use `gh issue`, `gh pr`, `gh run`, and `gh api` for issues, PRs, CI runs, and advanced queries.
Create or update AgentSkills. Use when designing, structuring, or packaging skills with scripts, references, and assets.
Start voice calls via the OpenClaw voice-call plugin.
Notion API for creating and managing pages, databases, and blocks.
Gemini CLI for one-shot Q&A, summaries, and generation.