HELM
- A holistic framework for evaluating foundation models.
- 10 scenarios
  - Core scenarios
    - NarrativeQA
    - NaturalQuestions (open-book)
    - NaturalQuestions (closed-book)
    - OpenbookQA
    - MMLU (Massive Multitask Language Understanding)
    - MATH
    - GSM8K (Grade School Math)
    - LegalBench
    - MedQA
    - WMT 2014
- 72 models
  - ...
- GitHub