SteuerEx Benchmark

Submit your model predictions for evaluation

GitHub

Benchmark Information

Submission Format: JSON file with question IDs as keys

Submissions per IP: 5

Submission Format

Your JSON file should look like this:

{
  "1001": "Your answer for question 1001...",
  "1002": "Your answer for question 1002...",
  ...
}
Get the key from the benchmark repository
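A submission file in this shape can be produced with a few lines of Python. The following is a minimal sketch, not an official tool: the function names `build_submission` and `write_submission` and the file name `submission.json` are hypothetical, and the checks (string keys, non-empty string answers, no duplicate IDs) are assumptions about what a well-formed file should satisfy rather than documented server-side rules.

```python
import json


def build_submission(answers):
    """Return a JSON-ready dict mapping question IDs (as strings) to answers.

    `answers` may use int or str question IDs; both are normalized to str,
    because JSON object keys are always strings.
    """
    submission = {}
    for qid, answer in answers.items():
        key = str(qid)
        # Assumed rule: every answer is a non-empty string.
        if not isinstance(answer, str) or not answer.strip():
            raise ValueError(f"answer for question {key} must be a non-empty string")
        # Assumed rule: IDs must be unique after normalization (1001 vs "1001").
        if key in submission:
            raise ValueError(f"duplicate question ID {key}")
        submission[key] = answer
    return submission


def write_submission(answers, path="submission.json"):
    """Validate `answers` and write them to `path` as UTF-8 JSON."""
    submission = build_submission(answers)
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII characters (e.g. umlauts) readable.
        json.dump(submission, f, ensure_ascii=False, indent=2)
    return submission
```

Calling `write_submission({1001: "Your answer for question 1001...", 1002: "Your answer for question 1002..."})` would write a file with the keys `"1001"` and `"1002"`. Validating locally before uploading helps avoid spending one of the limited submissions on a malformed file.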
⚠ Security Notice & Full Disclaimer — Read Before Submitting

We previously used a shared API key for evaluation, but exhausted our funds due to the high number of submissions. Evaluation is therefore now performed using your own OpenAI API key.

Use a restricted, dedicated key: Create a new API key at platform.openai.com/api-keys with a spending limit that covers only this benchmark (roughly $3–$10 USD per submission). The key must have access to gpt-4o, which requires a paid OpenAI account with gpt-4o enabled. Never use your primary or unrestricted API key.
Key not persisted: Your key is held in server memory solely for the duration of the evaluation and is never written to disk, logs, or any database. However, no server can guarantee absolute security.
No liability for API charges: The service provider accepts no responsibility whatsoever for any OpenAI API charges incurred, whether from normal evaluation, unexpected errors, retries, or any other cause.
No liability for failed or incomplete evaluations: Evaluations may fail, time out, or produce incorrect results due to OpenAI API errors, rate limits, server issues, or any other reason. We provide no guarantee of evaluation success, accuracy, or completeness. No refund or re-evaluation is guaranteed.
No liability for security incidents: In the event of a server breach, compromise, or any other security incident, the service provider accepts no responsibility for any resulting unauthorized API usage, charges, or damages of any kind.
By submitting, you explicitly accept all of the above risks and waive any claims against the service provider, FAU Erlangen-Nürnberg, and its contributors.