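Google Research notes that single-pass human evaluation of translations is prone to "noise", which undermines quality comparisons between models. To address this, the team added a second annotation pass to the MQM (Multidimensional Quality Metrics) framework, in which a different rater reviews the existing annotations. Experiments show that this approach substantially improves the consistency and reliability of scores, and that in human-machine collaborative workflows in particular it can balance quality against cost. The study also cautions that raters must be kept from over-relying on the initial annotations, and that expert oversight remains indispensable.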
"Supported by a growing ecosystem of open-source models, large model technology, just like water and electricity and the Internet, is increasingly becoming a convenient and widely accessible basic ...
In early 2025, the Chinese artificial intelligence (AI) start-up DeepSeek grabbed headlines abroad with models promising high ...