💬 Local LLM Deployment · June 12, 2026 · ✅ Tests passing

LLM Edge Deployer

LLM Edge Deployer is a Python library and CLI tool for preparing optimized LLMs for deployment on edge hardware. It converts a model to ONNX, checks that the requested target device is one of the supported names (currently tensorrt for NVIDIA TensorRT and openvino for Intel OpenVINO), and runs a test inference on the converted model with ONNX Runtime.

What It Does

Convert models to ONNX format.
Check that the requested target device is one of the supported names (tensorrt, openvino).
Run a test inference on the converted ONNX model with sample input data.

Installation

Install the required dependencies using pip:

pip install onnx onnxruntime "optimum[onnxruntime]" numpy pytest

Usage

CLI

Run the tool from the command line:

python llm_edge_deployer.py --input_model <path_to_model> \
                            --target_device <device_type> \
                            --test_sample <path_to_test_sample> \
                            --output_model <path_to_output_model>

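--input_model: (Required) Path to the model to convert. It is passed to ORTModel.from_pretrained.
--target_device: (Required) Target edge device type. Must be tensorrt or openvino (case-insensitive).
--test_sample: (Required) Path to the test sample JSON file. Its contents are loaded as a NumPy float32 array and fed to the model's first input.
--output_model: (Optional) Path to save the converted ONNX model. Default is output_model.onnx. The model is written with save_pretrained, which treats this path as a directory.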
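
The test sample is read with json.load and converted to a float32 array, so it should be a rectangular nested list of numbers. A minimal sketch of producing such a file, assuming a model input of shape (1, 8); the shape, the file location, and the write_test_sample helper are illustrative assumptions, not part of the tool:

```python
import json
import os
import tempfile


def write_test_sample(path, batch=1, width=8, fill=0.0):
    """Write a rectangular nested list of floats as JSON (hypothetical helper)."""
    sample = [[float(fill)] * width for _ in range(batch)]
    with open(path, "w") as f:
        json.dump(sample, f)
    return sample


# Write the sample to a temporary directory; any writable path works.
sample_path = os.path.join(tempfile.mkdtemp(), "test_sample.json")
sample = write_test_sample(sample_path)
print(f"Wrote a {len(sample)}x{len(sample[0])} sample to {sample_path}")
```

The resulting path can then be passed as --test_sample. The correct shape depends on the model being tested.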

Library

You can also use the tool as a Python library:

from llm_edge_deployer import convert_to_onnx, check_device_compatibility, run_test_inference

# Convert model to ONNX
output_model_path = convert_to_onnx("path/to/input_model", "path/to/output_model.onnx")

# Check device compatibility
check_device_compatibility("tensorrt")

# Run test inference
result = run_test_inference("path/to/output_model.onnx", "path/to/test_sample.json")
print("Inference result:", result)
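
check_device_compatibility only validates the device name. As a complementary check, one could ask the installed ONNX Runtime build whether the matching execution provider is actually present. The sketch below is an assumption about how that might look, not part of the tool: PROVIDER_FOR_DEVICE, provider_available, and installed_providers are hypothetical names, while the provider strings and ort.get_available_providers() come from ONNX Runtime.

```python
# Execution provider names as registered by ONNX Runtime.
PROVIDER_FOR_DEVICE = {
    "tensorrt": "TensorrtExecutionProvider",
    "openvino": "OpenVINOExecutionProvider",
}


def provider_available(target_device, available_providers):
    """Return True if the provider for target_device is in available_providers."""
    provider = PROVIDER_FOR_DEVICE.get(target_device.lower())
    if provider is None:
        raise ValueError(f"Unsupported target device: {target_device}")
    return provider in available_providers


def installed_providers():
    """Ask the installed onnxruntime build which providers it supports."""
    import onnxruntime as ort  # imported lazily so the mapping works without it

    return ort.get_available_providers()


# Example with a hand-written list, as a CPU-only build would report it.
cpu_only = ["CPUExecutionProvider"]
print(provider_available("tensorrt", cpu_only))  # prints False
```

On a real machine, provider_available("tensorrt", installed_providers()) would report whether the TensorRT provider is usable by the installed runtime.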

Source Code

import argparse
import os
import json
import numpy as np
import onnx
import onnxruntime as ort
from optimum.onnxruntime import ORTModel

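def convert_to_onnx(input_model_path, output_model_path):
    """Convert the input model to ONNX format."""
    try:
        model = ORTModel.from_pretrained(input_model_path)
        # save_pretrained treats output_model_path as a directory and writes
        # the ONNX file(s) and config inside it.
        model.save_pretrained(output_model_path)
        return output_model_path
    except Exception as e:
        # Chain the original exception so the full traceback is preserved.
        raise RuntimeError(f"Failed to convert model to ONNX: {e}") from e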

def check_device_compatibility(target_device):
    """Check compatibility of the target device."""
    supported_devices = ["tensorrt", "openvino"]
    if target_device.lower() not in supported_devices:
        raise ValueError(f"Unsupported target device: {target_device}. Supported devices are: {supported_devices}")
    return True

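def run_test_inference(onnx_model_path, test_sample_path):
    """Run a test inference on the ONNX model."""
    try:
        # The path may be a directory written by save_pretrained; in that case
        # use the first .onnx file found inside it.
        if os.path.isdir(onnx_model_path):
            onnx_files = sorted(f for f in os.listdir(onnx_model_path) if f.endswith(".onnx"))
            if not onnx_files:
                raise FileNotFoundError(f"No .onnx file found in {onnx_model_path}")
            onnx_model_path = os.path.join(onnx_model_path, onnx_files[0])
        session = ort.InferenceSession(onnx_model_path)
        with open(test_sample_path, "r") as f:
            test_sample = json.load(f)
        # Only the first model input is fed, and the sample is cast to float32.
        input_name = session.get_inputs()[0].name
        input_data = np.array(test_sample, dtype=np.float32)
        result = session.run(None, {input_name: input_data})
        return [np.array(r) for r in result]  # Ensure result is a list of numpy arrays
    except Exception as e:
        raise RuntimeError(f"Failed to run test inference: {e}") from e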

def main():
    parser = argparse.ArgumentParser(description="LLM Edge Deployer")
    parser.add_argument("--input_model", required=True, help="Path to the optimized model file.")
    parser.add_argument("--target_device", required=True, help="Target edge device type (e.g., tensorrt, openvino).")
    parser.add_argument("--test_sample", required=True, help="Path to the test sample JSON file.")
    parser.add_argument("--output_model", default="output_model.onnx", help="Path to save the converted ONNX model.")

    args = parser.parse_args()

    try:
        check_device_compatibility(args.target_device)
        print(f"Target device {args.target_device} is compatible.")

        output_model_path = convert_to_onnx(args.input_model, args.output_model)
        print(f"Model converted to ONNX format and saved at {output_model_path}.")

        inference_result = run_test_inference(output_model_path, args.test_sample)
        print(f"Test inference result: {inference_result}")
    except Exception as e:
        print(f"Error: {e}")
        raise SystemExit(1)  # Exit with a non-zero status so callers can detect failure

if __name__ == "__main__":
    main()
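
run_test_inference always casts the sample to float32, but many language models expect integer token IDs. One possible refinement is to pick the NumPy dtype from the type string that ONNX Runtime reports for each input via session.get_inputs()[i].type. The sketch below covers only a subset of type strings; ONNX_TYPE_TO_NUMPY, numpy_dtype_for, build_feed, and the input name input_ids are hypothetical and not part of the tool.

```python
import numpy as np

# Subset of the type strings reported by session.get_inputs()[i].type.
ONNX_TYPE_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(float16)": np.float16,
    "tensor(double)": np.float64,
    "tensor(int32)": np.int32,
    "tensor(int64)": np.int64,
    "tensor(bool)": np.bool_,
}


def numpy_dtype_for(onnx_type, default=np.float32):
    """Map an ONNX Runtime type string to a NumPy dtype, falling back to default."""
    return ONNX_TYPE_TO_NUMPY.get(onnx_type, default)


def build_feed(input_name, onnx_type, sample):
    """Build the feed dict for session.run with a dtype matching the model input."""
    return {input_name: np.array(sample, dtype=numpy_dtype_for(onnx_type))}


# Example: an input declared as int64, as is typical for token IDs.
feed = build_feed("input_ids", "tensor(int64)", [[1, 2, 3]])
print(feed["input_ids"].dtype, feed["input_ids"].shape)
```

With a live session, the name and type would come from session.get_inputs()[0].name and session.get_inputs()[0].type rather than being written by hand.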

Details

Tool Name: llm_edge_deployer
Category: Local LLM Deployment
Generated: June 12, 2026
Tests: Passing ✅
Fix Loops: 3

Quick Install

Clone just this tool:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/ptulin/autoaiforge.git
cd autoaiforge
git sparse-checkout set generated_tools/2026-06-12/llm_edge_deployer
cd generated_tools/2026-06-12/llm_edge_deployer
pip install -r requirements.txt 2>/dev/null || true
python llm_edge_deployer.py --help
