[Mistral AI Evals: a codebase for running the evals released by Mistral AI, with standardized prompts, parsing, and metrics computation for popular academic benchmarks; it also supports multi-turn LLM-as-a-judge evaluation tasks] 'Mistral Evals - This repository contains code to run evals released by Mistral AI as well as standardized prompts, parsing and metrics computation for popular academic benchmarks.' GitHub: github.com/mistralai/mistral-evals #AIEvaluation# #AcademicBenchmarks# #Codebase#