LLM Esoteric Code Benchmarks
March 20, 2026
Tests passing
LLM Code Eval
An automation tool to evaluate code generated by LLMs for esoteric programming tasks. The tool runs the generated code using interpreters for specific esoteric languages, checks for correctness, and logs detailed results. It is useful for validating generated programs and profiling LLM performance.
What It Does
- Executes code in esoteric programming languages using specified interpreters.
- Compares the output of the code against expected results.
- Logs detailed results in JSON format.
Installation
This tool requires Python 3.7 or higher. No additional Python packages are required.
Usage
Run the tool from the command line:
python llm_code_eval.py --code <path_to_code_file> --interpreter <path_to_interpreter> --expected-output <path_to_expected_output_file> [--log <path_to_log_file>]
Arguments
- --code: Path to the generated code file.
- --interpreter: Path to the esoteric language interpreter.
- --expected-output: Path to the file containing the expected output.
- --log: (Optional) Path to save the execution log as a JSON file.
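A run prints a JSON summary whose field names come from the source code below; the values shown here are illustrative:

```json
{
    "success": true,
    "output": "Hello, World!",
    "error": "",
    "expected_output": "Hello, World!"
}
```

The same object is written to the `--log` path when that argument is given.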
Source Code
import argparse
import subprocess
import json
import os


def run_code_with_interpreter(code_path, interpreter_path, expected_output_path):
    """
    Runs the given code file using the specified interpreter and validates the output.

    Args:
        code_path (str): Path to the code file.
        interpreter_path (str): Path to the esoteric language interpreter.
        expected_output_path (str): Path to the file containing the expected output.

    Returns:
        dict: A dictionary containing execution success, output, and error details.
    """
    if not os.path.exists(code_path):
        return {"success": False, "error": "Code file not found."}
    if not os.path.exists(interpreter_path):
        return {"success": False, "error": "Interpreter not found."}
    if not os.path.exists(expected_output_path):
        return {"success": False, "error": "Expected output file not found."}
    try:
        with open(expected_output_path, "r") as f:
            expected_output = f.read().strip()
        result = subprocess.run(
            [interpreter_path, code_path],
            capture_output=True,
            text=True,
            timeout=10,
        )
        output = result.stdout.strip()
        error = result.stderr.strip()
        success = result.returncode == 0 and output == expected_output
        return {
            "success": success,
            "output": output,
            "error": error,
            "expected_output": expected_output,
        }
    except subprocess.TimeoutExpired:
        return {"success": False, "error": "Execution timed out.", "output": "", "expected_output": expected_output}
    except Exception as e:
        return {"success": False, "error": str(e), "output": "", "expected_output": ""}


def main():
    parser = argparse.ArgumentParser(description="LLM Code Eval: Evaluate code generated by LLMs for esoteric programming tasks.")
    parser.add_argument("--code", required=True, help="Path to the generated code file.")
    parser.add_argument("--interpreter", required=True, help="Path to the esoteric language interpreter.")
    parser.add_argument("--expected-output", required=True, help="Path to the file containing the expected output.")
    parser.add_argument("--log", required=False, help="Path to save the execution log as JSON.")
    args = parser.parse_args()
    result = run_code_with_interpreter(args.code, args.interpreter, args.expected_output)
    print(json.dumps(result, indent=4))
    if args.log:
        try:
            with open(args.log, "w") as log_file:
                json.dump(result, log_file, indent=4)
        except Exception as e:
            print(f"Failed to write log file: {e}")


if __name__ == "__main__":
    main()
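The evaluation flow can be sanity-checked end to end without a real esoteric-language interpreter by using the Python binary itself as a stand-in. This is a sketch, not part of the tool: the temp directory, prog.py, and expected.txt names are illustrative, and the check mirrors the comparison run_code_with_interpreter performs.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in scenario: the "interpreter" is the Python binary and the
# "generated code" is a one-line program with a known output.
with tempfile.TemporaryDirectory() as d:
    code_path = os.path.join(d, "prog.py")
    expected_path = os.path.join(d, "expected.txt")
    with open(code_path, "w") as f:
        f.write('print("Hello, World!")\n')
    with open(expected_path, "w") as f:
        f.write("Hello, World!\n")

    # Same check the tool performs: run the interpreter on the code file,
    # then compare stripped stdout against the stripped expected output.
    with open(expected_path) as f:
        expected = f.read().strip()
    result = subprocess.run(
        [sys.executable, code_path],
        capture_output=True,
        text=True,
        timeout=10,
    )
    success = result.returncode == 0 and result.stdout.strip() == expected
```

Swapping sys.executable for the path to an actual esoteric-language interpreter reproduces exactly what the tool does on a real benchmark case.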
Details
- Tool Name
- llm_code_eval
- Category
- LLM Esoteric Code Benchmarks
- Generated
- March 20, 2026
- Tests
- Passing ✓
- Fix Loops
- 3
Quick Install
Clone just this tool:
git clone --depth 1 --filter=blob:none --sparse \
    https://github.com/ptulin/autoaiforge.git
cd autoaiforge
git sparse-checkout set generated_tools/2026-03-20/llm_code_eval
cd generated_tools/2026-03-20/llm_code_eval
pip install -r requirements.txt 2>/dev/null || true
python llm_code_eval.py