Microsoft Open-Sources Harrier Text Embedding Models in Three Sizes; 27B Version Tops Multilingual MTEB v2

BlockBeatNews

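According to monitoring by 1M AI News, Microsoft has open-sourced harrier-oss-v1, a family of multilingual text embedding models, on Hugging Face in three sizes: 270M, 0.6B, and 27B. The model cards state that the series uses a decoder-only architecture with last-token pooling and L2 normalization, supports inputs of up to 32,768 tokens, and can be used for retrieval, clustering, semantic similarity, classification, bitext mining, and reranking.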
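To make the terms last-token pooling and L2 normalization concrete, here is a minimal sketch in plain Python. It is not taken from the harrier-oss-v1 model cards: the function names are hypothetical, the toy vectors stand in for real per-token hidden states, and right-side padding is assumed.

```python
import math

def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last real (non-padding) token.

    Assumes right padding: the mask is a run of 1s followed by 0s.
    """
    last_index = sum(attention_mask) - 1
    return hidden_states[last_index]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]

def embed(hidden_states, attention_mask):
    """Pool a sequence into one vector, then normalize it."""
    return l2_normalize(last_token_pool(hidden_states, attention_mask))

def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))

# Toy example: three token positions, the last one is padding.
toy_hidden_states = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
toy_attention_mask = [1, 1, 0]
toy_embedding = embed(toy_hidden_states, toy_attention_mask)
```

Because the embeddings are unit length after normalization, a plain dot product between two of them gives their cosine similarity, which is why this step is common in retrieval and semantic similarity setups.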

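Multilingual MTEB v2 is a widely used benchmark for multilingual text embeddings that mainly tests tasks such as retrieval, classification, clustering, and semantic similarity. Microsoft's model cards report scores of 66.5, 69.0, and 74.3 on the benchmark for the three sizes respectively, with the 27B version ranking first on the day of release. The 270M and 0.6B versions were additionally trained with knowledge distillation from larger embedding models. All three models are released under the MIT license.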
