
GPT Efficiency Benchmark

This CLI tool benchmarks the processing speed, memory usage, and token throughput of GPT-5 against previous GPT models. It automates testing using predefined prompts and datasets to generate detailed comparison metrics, helping developers understand efficiency gains in real-world scenarios.

What It Does

  • Benchmark GPT-5 against older models
  • Analyze token throughput, latency, and memory usage
  • Generate visual performance comparison reports

Installation

  • Python 3.8+
  • openai==0.27.0
  • psutil==5.9.5
  • matplotlib==3.7.2
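
With Python 3.8+ available, the pinned dependencies listed above can be installed with pip:

pip install openai==0.27.0 psutil==5.9.5 matplotlib==3.7.2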

Usage

1. Create a prompts.json file containing a JSON array of prompt strings:

[
    "What is AI?",
    "Explain quantum physics."
]

2. Run the tool:

python gpt_efficiency_benchmark.py --models gpt-4,gpt-5 --prompts prompts.json --output report.html

3. Open report.html to view the detailed benchmark report.
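
The script calls the OpenAI API through the pinned openai==0.27.0 client, which by default reads the API key from the OPENAI_API_KEY environment variable, so export it before running (the value below is a placeholder):

export OPENAI_API_KEY="sk-..."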

Source Code

import argparse
import json
import time
import psutil
import matplotlib.pyplot as plt
from openai import ChatCompletion  # openai==0.27.x interface; openai>=1.0 uses a different client API

def benchmark_model(model, prompts):
    """Benchmark a single GPT model with given prompts."""
    results = []
    for prompt in prompts:
        start_time = time.time()
        try:
            response = ChatCompletion.create(model=model, messages=[{"role": "user", "content": prompt}])
        except Exception as e:
            results.append({"prompt": prompt, "error": str(e)})
            continue

        end_time = time.time()
        latency = end_time - start_time
        memory = psutil.virtual_memory().used / (1024 ** 2)  # system-wide memory in use (MB); a coarse proxy, not per-model usage
        token_count = response["usage"]["completion_tokens"]  # completion tokens as reported by the API

        results.append({
            "prompt": prompt,
            "latency": latency,
            "memory": memory,
            "token_count": token_count
        })
    return results

def generate_report(results, models, output_path):
    """Generate a visual performance comparison report."""
    latencies = {model: [res["latency"] for res in results[model] if "latency" in res] for model in models}
    memories = {model: [res["memory"] for res in results[model] if "memory" in res] for model in models}
    token_counts = {model: [res["token_count"] for res in results[model] if "token_count" in res] for model in models}

    plt.figure(figsize=(10, 6))
    for model in models:
        plt.plot(latencies[model], label=f"{model} Latency")
    plt.xlabel("Prompt Index")
    plt.ylabel("Latency (s)")
    plt.title("Latency Comparison")
    plt.legend()
    plt.savefig(output_path.replace(".html", "_latency.png"))

    plt.figure(figsize=(10, 6))
    for model in models:
        plt.plot(memories[model], label=f"{model} Memory Usage")
    plt.xlabel("Prompt Index")
    plt.ylabel("Memory (MB)")
    plt.title("Memory Usage Comparison")
    plt.legend()
    plt.savefig(output_path.replace(".html", "_memory.png"))

    plt.figure(figsize=(10, 6))
    for model in models:
        plt.plot(token_counts[model], label=f"{model} Token Throughput")
    plt.xlabel("Prompt Index")
    plt.ylabel("Tokens")
    plt.title("Token Throughput Comparison")
    plt.legend()
    plt.savefig(output_path.replace(".html", "_tokens.png"))

    with open(output_path, "w") as f:
        f.write(f"<html><body><h1>GPT Efficiency Benchmark Report</h1>")
        f.write(f"<h2>Latency Comparison</h2><img src='{output_path.replace('.html', '_latency.png')}'><br>")
        f.write(f"<h2>Memory Usage Comparison</h2><img src='{output_path.replace('.html', '_memory.png')}'><br>")
        f.write(f"<h2>Token Throughput Comparison</h2><img src='{output_path.replace('.html', '_tokens.png')}'><br>")
        f.write("</body></html>")

def main():
    parser = argparse.ArgumentParser(description="GPT Efficiency Benchmark")
    parser.add_argument("--models", required=True, help="Comma-separated list of models to benchmark")
    parser.add_argument("--prompts", required=True, help="Path to JSON file containing prompts")
    parser.add_argument("--output", required=True, help="Path to output HTML report")
    args = parser.parse_args()

    models = [m.strip() for m in args.models.split(",")]  # tolerate spaces after commas
    try:
        with open(args.prompts, "r") as f:
            prompts = json.load(f)
    except Exception as e:
        print(f"Error reading prompts file: {e}")
        return

    results = {}
    for model in models:
        results[model] = benchmark_model(model, prompts)

    generate_report(results, models, args.output)

if __name__ == "__main__":
    main()
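
The report charts per-prompt values only. As a rough, illustrative sketch (summarize_results is not part of the tool), the same results dict built in main() can be collapsed into per-model averages and an approximate tokens-per-second figure:

# Illustrative helper, not part of the tool: aggregate the per-prompt records
# produced by benchmark_model into per-model averages.
def summarize_results(results):
    summary = {}
    for model, records in results.items():
        ok = [r for r in records if "latency" in r]  # skip records that only hold an error
        if not ok:
            summary[model] = {"errors": len(records)}
            continue
        total_latency = sum(r["latency"] for r in ok)
        total_tokens = sum(r["token_count"] for r in ok)
        summary[model] = {
            "avg_latency_s": total_latency / len(ok),
            "avg_tokens": total_tokens / len(ok),
            "tokens_per_second": total_tokens / total_latency if total_latency else 0.0,
            "errors": len(records) - len(ok),
        }
    return summary

# Example: print a quick comparison after generate_report(...) has run.
# for model, stats in summarize_results(results).items():
#     print(model, stats)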


Details

Tool Name
gpt_efficiency_benchmark
Category
GPT-5 Efficiency Gains
Generated
March 24, 2026
Tests
Passing

Quick Install

Clone just this tool:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/ptulin/autoaiforge.git
cd autoaiforge
git sparse-checkout set generated_tools/2026-03-24/gpt_efficiency_benchmark
cd generated_tools/2026-03-24/gpt_efficiency_benchmark
pip install -r requirements.txt 2>/dev/null || true
python gpt_efficiency_benchmark.py --models gpt-4,gpt-5 --prompts prompts.json --output report.html