
LLM Code Eval

An automation tool for evaluating code generated by LLMs on esoteric programming tasks. It runs the generated program through an interpreter for the target esoteric language, compares the output against an expected result, and logs a detailed JSON report. This is useful for validating generated programs and profiling LLM performance.

What It Does

  • Executes code in esoteric programming languages using a specified interpreter, with a 10-second execution timeout.
  • Compares the program's output against the expected result, ignoring leading and trailing whitespace.
  • Logs detailed results in JSON format (see the example below).
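
Each run produces a report with the four fields below; the field names come from the source code, while the values shown are illustrative:

{
    "success": true,
    "output": "Hello, World!",
    "error": "",
    "expected_output": "Hello, World!"
}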

Installation

This tool requires Python 3.7 or higher (it relies on subprocess.run(..., capture_output=True), which was added in Python 3.7). Only the standard library is used; no additional packages are required.

Usage

Run the tool from the command line:

python llm_code_eval.py --code <path_to_code_file> --interpreter <path_to_interpreter> --expected-output <path_to_expected_output_file> [--log <path_to_log_file>]

Arguments

  • --code: Path to the generated code file.
  • --interpreter: Path to the esoteric language interpreter.
  • --expected-output: Path to the file containing the expected output.
  • --log: (Optional) Path to save the execution log as a JSON file.
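
For example, to check a Brainfuck "Hello World" program (the interpreter path and file names here are illustrative):

python llm_code_eval.py --code hello.bf --interpreter /usr/local/bin/bf --expected-output hello.out --log result.json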

Source Code

import argparse
import subprocess
import json
import os

def run_code_with_interpreter(code_path, interpreter_path, expected_output_path):
    """
    Runs the given code file using the specified interpreter and validates the output.

    Args:
        code_path (str): Path to the code file.
        interpreter_path (str): Path to the esoteric language interpreter.
        expected_output_path (str): Path to the file containing the expected output.

    Returns:
        dict: A dictionary containing execution success, output, and error details.
    """
    if not os.path.exists(code_path):
        return {"success": False, "error": "Code file not found."}

    if not os.path.exists(interpreter_path):
        return {"success": False, "error": "Interpreter not found."}

    if not os.path.exists(expected_output_path):
        return {"success": False, "error": "Expected output file not found."}

    try:
        with open(expected_output_path, "r") as f:
            expected_output = f.read().strip()

        # Run the interpreter on the code file; the subprocess is killed after 10 seconds.
        result = subprocess.run(
            [interpreter_path, code_path],
            capture_output=True,
            text=True,
            timeout=10
        )

        # Strip leading/trailing whitespace so newline differences do not cause false mismatches.
        output = result.stdout.strip()
        error = result.stderr.strip()

        # A run passes only if the interpreter exited cleanly and the output matches.
        success = result.returncode == 0 and output == expected_output

        return {
            "success": success,
            "output": output,
            "error": error,
            "expected_output": expected_output
        }

    except subprocess.TimeoutExpired:
        return {
            "success": False,
            "error": "Execution timed out.",
            "output": "",
            "expected_output": expected_output
        }

    except Exception as e:
        return {"success": False, "error": str(e), "output": "", "expected_output": ""}


def main():
    parser = argparse.ArgumentParser(description="LLM Code Eval: Evaluate code generated by LLMs for esoteric programming tasks.")
    parser.add_argument("--code", required=True, help="Path to the generated code file.")
    parser.add_argument("--interpreter", required=True, help="Path to the esoteric language interpreter.")
    parser.add_argument("--expected-output", required=True, help="Path to the file containing the expected output.")
    parser.add_argument("--log", required=False, help="Path to save the execution log as JSON.")

    args = parser.parse_args()

    result = run_code_with_interpreter(args.code, args.interpreter, args.expected_output)

    print(json.dumps(result, indent=4))

    if args.log:
        try:
            with open(args.log, "w") as log_file:
                json.dump(result, log_file, indent=4)
        except Exception as e:
            print(f"Failed to write log file: {e}")


if __name__ == "__main__":
    main()
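
Programmatic Use

The core check is also importable as a function. Below is a minimal batch-evaluation sketch; the module name llm_code_eval, the interpreter path, and the sample files are all assumptions to adapt to your setup:

import json
from llm_code_eval import run_code_with_interpreter

# Hypothetical interpreter and test files: substitute your own.
INTERPRETER = "/usr/local/bin/bf"
CASES = [
    ("samples/hello.bf", "tests/hello.out"),
    ("samples/cat.bf", "tests/cat.out"),
]

results = []
for code_path, expected_path in CASES:
    result = run_code_with_interpreter(code_path, INTERPRETER, expected_path)
    results.append({"code": code_path, **result})

# Summarize the pass rate across the batch.
passed = sum(r["success"] for r in results)
print(f"{passed}/{len(CASES)} cases passed")
print(json.dumps(results, indent=4))

Because every return path of run_code_with_interpreter includes a "success" key, the summary loop needs no special-casing for missing files or timeouts.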

Details

Tool Name: llm_code_eval
Category: LLM Esoteric Code Benchmarks
Generated: March 20, 2026
Tests: Passing ✅
Fix Loops: 3

Quick Install

Clone just this tool:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/ptulin/autoaiforge.git
cd autoaiforge
git sparse-checkout set generated_tools/2026-03-20/llm_code_eval
cd generated_tools/2026-03-20/llm_code_eval
pip install -r requirements.txt 2>/dev/null || true
python llm_code_eval.py --help