Huawei Proposes the Industry's First Static Compression Algorithm for Long-Sequence KV Cache; Paper Accepted to ICLR
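RazorAttention, a KV Cache compression algorithm for large models developed and published by Huawei's AI algorithm team, reduces memory usage during large-model inference by 70%. Its paper, "RazorAttention: Efficient KV Cache Compression Through Retrieval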