Evaluation frameworks (OpenAI Evals) and how they fit Regent validation/reputation
OpenAI Evals is an open-source framework and benchmark registry from OpenAI for evaluating LLMs and LLM-based systems.
Planned: layers.md references it as an evaluation framework whose results can feed Regent validation/reputation.
In that role it would support:

- standardized evaluation runs
- publishing results into validation registries
- comparing agents on consistent suites
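The flow above can be sketched as a minimal example. This is not Regent's actual API: `EvalResult` and `ValidationRegistry` are hypothetical names, and in practice the scores would come from a real OpenAI Evals run (e.g. via the `oaieval` CLI) rather than being hard-coded.

```python
from dataclasses import dataclass, field

@dataclass
class EvalResult:
    # One standardized evaluation run for one agent on one suite.
    # Hypothetical record shape; scores would come from OpenAI Evals output.
    agent_id: str
    suite: str
    score: float  # e.g. accuracy in [0, 1]

@dataclass
class ValidationRegistry:
    # Hypothetical Regent-side registry: maps (agent, suite) -> score.
    results: dict = field(default_factory=dict)

    def publish(self, result: EvalResult) -> None:
        # Publish an eval result into the validation registry.
        self.results[(result.agent_id, result.suite)] = result.score

    def rank(self, suite: str) -> list:
        # Compare agents on a consistent suite, best score first.
        scored = [(agent, s) for (agent, sl), s in self.results.items() if sl == suite]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)

registry = ValidationRegistry()
registry.publish(EvalResult("agent-a", "test-match", 0.82))
registry.publish(EvalResult("agent-b", "test-match", 0.91))
print(registry.rank("test-match"))  # agent-b ranks above agent-a
```

The point of the sketch is the separation of concerns: the eval framework produces comparable scores on fixed suites, and the registry only stores and ranks them, which is what lets reputation be derived from consistent evidence.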
openai/evals