# OpenAI Evals

Evaluation frameworks (OpenAI Evals) and how they fit into Regent validation/reputation.
OpenAI Evals is a framework and benchmark registry for evaluating LLMs and LLM systems.
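To make the model concrete, here is a minimal Python sketch of the eval pattern the framework builds on: each sample pairs an input prompt with an ideal answer, and a runner scores a model's completions against the suite. The `Sample` class and `run_eval` runner below are simplified stand-ins for illustration, not the Evals API itself.

```python
# Minimal sketch of the exact-match eval pattern: samples pair an input
# prompt with an "ideal" (gold) answer, and the runner reports accuracy.
# This is a simplified stand-in, not the OpenAI Evals API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Sample:
    input: str   # prompt sent to the model
    ideal: str   # expected (gold) completion

def run_eval(samples: list[Sample], complete: Callable[[str], str]) -> float:
    """Run every sample through the model and return exact-match accuracy."""
    correct = sum(complete(s.input).strip() == s.ideal for s in samples)
    return correct / len(samples)

if __name__ == "__main__":
    suite = [
        Sample(input="2 + 2 =", ideal="4"),
        Sample(input="Capital of France?", ideal="Paris"),
    ]
    # Stand-in for a real model completion function.
    fake_model = lambda prompt: {"2 + 2 =": "4"}.get(prompt, "Paris")
    print(f"accuracy: {run_eval(suite, fake_model):.2f}")  # accuracy: 1.00
```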
## Live vs planned
Planned: referenced in `layers.md` as an evaluation framework that can feed Regent validation/reputation.
## How it connects to Regent
- Standardized evaluation runs
- Publishing results into validation registries
- Comparing agents on consistent suites (see the sketch after this list)
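As a sketch of how these pieces could fit together, the toy registry below accepts standardized eval results and ranks agents on a shared suite. The record fields (`agent_id`, `suite`, `score`, `run_at`) and the registry interface are assumptions for illustration; Regent's actual validation/reputation schema is not specified here.

```python
# Hypothetical sketch: publishing standardized eval results into a
# Regent-style validation registry and comparing agents on one suite.
# All names and fields here are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class EvalResult:
    agent_id: str      # agent being evaluated
    suite: str         # eval suite identifier, e.g. an Evals registry name
    score: float       # headline metric (e.g. accuracy) from the run
    run_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class ValidationRegistry:
    """Toy in-memory registry; stands in for a validation/reputation layer."""

    def __init__(self) -> None:
        self._results: list[EvalResult] = []

    def publish(self, result: EvalResult) -> None:
        """Record one standardized eval run."""
        self._results.append(result)

    def leaderboard(self, suite: str) -> list[EvalResult]:
        """Compare agents on a consistent suite, best score first."""
        rows = [r for r in self._results if r.suite == suite]
        return sorted(rows, key=lambda r: r.score, reverse=True)

registry = ValidationRegistry()
registry.publish(EvalResult(agent_id="agent-a", suite="test-match", score=0.92))
registry.publish(EvalResult(agent_id="agent-b", suite="test-match", score=0.87))
for r in registry.leaderboard("test-match"):
    print(r.agent_id, r.score)
```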
## External references

- https://github.com/openai/evals
## Last updated