evaluation / outputs

Commit History

add complete theoremqa output for gpt-4o
841a948

ryanhoangt committed on

add complete humaneval output for gpt-4o
45710d9

ryanhoangt committed on

add complete mmlu output for gpt-4o
0948b4d

ryanhoangt committed on

add complete math output for gpt-4o
7d377c3

ryanhoangt committed on

add some outputs
da7aaba

ryanhoangt committed on

update results
fe6c7e5

Xingyao Wang committed on

add results for deepseek chat v2
126490f

Xingyao Wang committed on

add codeact swe agent
9b33edf

Xingyao Wang committed on

add gpt4o result for 1.5
5dbfa12

Xingyao Wang committed on

move data to swe_bench_lite
23df10d

Xingyao Wang committed on

rename dir
0d2d477

Xingyao Wang committed on

add result for deepseek
f07fb3e

Xingyao Wang committed on

add results for gpt-4o
72c2e93

Xingyao Wang committed on

updare resykts
cd893a5

Xingyao Wang committed on

support multi-page
4e9c2f0

Xingyao Wang committed on

remove all logs
3f290ce

Xingyao Wang committed on

initial results
2e05a39

Xingyao Wang committed on