
Eval POC #1

  1. Create our own evaluation of the model on SAFe questions: build a dataset of Q&A pairs (one set for evaluating the base model, and one for supervised fine-tuning). A minimal dataset sketch follows this list.

  2. Evaluate the model against it and record the result. (Is it reproducible? Verifiable? Consistent?) The second sketch after this list shows one way to check this.

  3. Does the evaluation show the delta in results after fine-tuning on that missing model knowledge? How can I see it in the evaluation output, and how can I track it over time? Think of unit tests for LMs: can I ensure an LM gives correct results for certain entities? Do I write a set of scenarios (e.g. 100 unit tests) so that every entity I'm interested in is covered? (The third sketch after this list shows one shape such tests could take.)

    • How do I test each entity, and what exactly am I testing? What counts as an LM capability?
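
A minimal sketch of what the Q&A dataset from step 1 could look like, assuming a JSONL file with one question/answer pair per line. The field names (`entity`, `question`, `answer`) and the file name `safe_eval.jsonl` are assumptions, not a fixed schema; the `entity` field is there to support the per-entity testing idea in step 3.

```python
import json

# Hypothetical schema: one SAFe Q&A pair per JSONL line, tagged with the
# entity it covers so results can later be broken down per entity.
qa_pairs = [
    {
        "entity": "ART",
        "question": "What does ART stand for in SAFe?",
        "answer": "Agile Release Train",
    },
    {
        "entity": "PI Planning",
        "question": "What is the purpose of PI Planning in SAFe?",
        "answer": "To align all teams on the Agile Release Train to a shared mission and vision",
    },
]

with open("safe_eval.jsonl", "w") as f:
    for pair in qa_pairs:
        f.write(json.dumps(pair) + "\n")
```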
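
For step 2, one way to check reproducibility and consistency is to run the same eval several times and compare the scores. `ask_model` is a hypothetical callable wrapping whatever model is under test; exact-match scoring is a deliberate simplification and would likely need loosening in practice. Querying at temperature 0 (or with a fixed seed) is the usual first step toward a reproducible result.

```python
import json

def exact_match(prediction: str, reference: str) -> bool:
    # Normalize whitespace and case; real answers will often need fuzzier
    # matching (substring checks, embedding similarity, an LM judge, ...).
    return prediction.strip().lower() == reference.strip().lower()

def evaluate(ask_model, dataset_path: str, runs: int = 3) -> dict:
    # Run the whole eval `runs` times so consistency is measured, not assumed.
    with open(dataset_path) as f:
        pairs = [json.loads(line) for line in f]

    scores = []
    for _ in range(runs):
        correct = sum(
            exact_match(ask_model(p["question"]), p["answer"]) for p in pairs
        )
        scores.append(correct / len(pairs))

    return {
        "accuracy_per_run": scores,
        "mean_accuracy": sum(scores) / len(scores),
        "consistent": len(set(scores)) == 1,  # same score on every run?
    }
```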
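
For step 3 and the entity question under it, one concrete shape for "unit tests for LMs" is a pytest suite where every scenario becomes a named test case, so a failure points at a specific entity. Running the identical suite against the base model and the fine-tuned model, then diffing the pass rates, is one way to see and track the fine-tuning delta. The `ask_model` stub below is a placeholder to be wired to the model under test.

```python
import json
import pytest

# Each JSONL scenario becomes one test case named after its entity, so the
# test report shows exactly which entities the model gets wrong.
with open("safe_eval.jsonl") as f:
    SCENARIOS = [json.loads(line) for line in f]

def ask_model(question: str) -> str:
    # Placeholder: point this at the base or the fine-tuned model. Comparing
    # the pass rates of the two runs gives the fine-tuning delta.
    raise NotImplementedError("connect to the model under test")

@pytest.mark.parametrize("case", SCENARIOS, ids=[c["entity"] for c in SCENARIOS])
def test_entity_knowledge(case):
    prediction = ask_model(case["question"])
    # Containment rather than exact match: exact string equality is usually
    # too brittle for free-form LM output.
    assert case["answer"].strip().lower() in prediction.strip().lower()
```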