EAGLE Speculative Decoding: A Lightweight Implementation
The code implements only the core of the algorithm. Everything else, such as the forward functions, is mocked, and run_experiment only prints results so that correctness can be checked by eye.

"""
EAGLE-1: Extrapolation Algorithm for Greater Language-model Efficiency.

Simulates a target LLM + lightweight draft head for speculative decoding.
The draft head predicts second-to-top-layer features autoregressively,
then the target model verifies all draft tokens in one forward pass.

Reference: https://arxiv.org/abs/2401.15077
"""
from __future__ import annotations

import time
from dataclasses import dataclass
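The draft-then-verify loop that the docstring describes can be illustrated with a minimal sketch. This is not the author's code: it is a token-level, greedy simplification (EAGLE itself drafts at the feature level), and every name in it (`VOCAB`, `target_next`, `draft_next`, `speculative_step`, `generate`, `greedy`) is hypothetical. Both models are mocked with simple arithmetic, and the draft is made to disagree with the target at every fourth position so that rejection is exercised.

```python
from __future__ import annotations

VOCAB = 50  # hypothetical vocabulary size


def target_next(context: list[int]) -> int:
    """Mock target model: a deterministic greedy next token."""
    return (context[-1] * 7 + len(context)) % VOCAB


def draft_next(context: list[int]) -> int:
    """Mock draft head: agrees with the target except when len(context) % 4 == 0."""
    tok = target_next(context)
    return (tok + 1) % VOCAB if len(context) % 4 == 0 else tok


def speculative_step(context: list[int], k: int) -> list[int]:
    """Draft k tokens, verify them against the target, return the accepted tokens."""
    draft: list[int] = []
    for _ in range(k):
        draft.append(draft_next(context + draft))

    # A real target model would score all k + 1 positions in one forward pass;
    # this mock queries it position by position instead.
    accepted: list[int] = []
    for i in range(k + 1):
        expected = target_next(context + accepted)
        accepted.append(expected)  # the target's own token is always kept
        if i == k or draft[i] != expected:
            break  # bonus token after full acceptance, or first mismatch
    return accepted


def generate(prompt: list[int], n_new: int, k: int = 4) -> tuple[list[int], int]:
    """Speculative decoding; returns the sequence and the number of verify steps."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(out, k))
        steps += 1
    return out[: len(prompt) + n_new], steps


def greedy(prompt: list[int], n_new: int) -> list[int]:
    """Baseline: plain greedy decoding with the target model only."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


if __name__ == "__main__":
    tokens, steps = generate([1, 2, 3], 20)
    print("matches greedy:", tokens == greedy([1, 2, 3], 20), "| verify steps:", steps)
```

Under greedy verification the output is identical to decoding with the target alone, because every kept token is the target's own choice. The saving comes from needing fewer verification steps than generated tokens.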
