This repository contains the evaluation infrastructure for FormalBench, including evaluation metrics and wrappers for calling LLMs. If you find this repository useful, please cite our research paper ...