speculative-decoding

Name: speculative-decoding
Author: davila7

Accelerate LLM inference using speculative decoding, Medusa multi-head decoding, and lookahead decoding. Use when optimizing inference speed (1.5-3.6× speedup), reducing latency for real-time applications, or deploying models with limited compute. Covers draft models, tree-based attention, Jacobi iteration, parallel token generation, and production deployment strategies.

21.8k Stars · 0 Installs · Updated 143 days ago · MIT

About speculative-decoding

Category: developer (Developer Tools) · Author: davila7 · Version: @main · License: MIT

Tags: Emerging Techniques, Speculative Decoding, Medusa, Lookahead Decoding, Fast Inference, Draft Models, Tree Attention, Parallel Generation, Latency Reduction, Inference Optimization

This Skill has no documentation files yet.

Install Command

npx skills add davila7/speculative-decoding

Download Archive

Download skill.zip

Download the complete Skill directory, including SKILL.md and all related files.

Information

Author: davila7

Category: developer

Version: @main

Last Updated: 143 days ago

License: MIT

Related Skills

local-places

Search for places (restaurants, cafes, etc.) via Google Places API proxy on localhost.

246,840·openclaw

github

Interact with GitHub using the `gh` CLI. Use `gh issue`, `gh pr`, `gh run`, and `gh api` for issues, PRs, CI runs, and advanced queries.

246,840·openclaw

skill-creator

Create or update AgentSkills. Use when designing, structuring, or packaging skills with scripts, references, and assets.

246,840·openclaw

voice-call

Start voice calls via the OpenClaw voice-call plugin.

246,840·openclaw

notion

Notion API for creating and managing pages, databases, and blocks.

246,840·openclaw

gemini

Gemini CLI for one-shot Q&A, summaries, and generation.

246,840·openclaw
