Local LLM Deployment | June 12, 2026 | Tests passing

LLM Edge Deployer

LLM Edge Deployer is a Python library and CLI tool for deploying optimized LLMs on edge hardware. It provides utilities to convert models to ONNX, check the requested target against the supported edge inference backends (NVIDIA TensorRT and Intel OpenVINO), and run a test inference on the converted model.

What It Does

  • Convert models to ONNX format.
  • Check that the target device is one of the supported backends (tensorrt or openvino).
  • Run test inference on ONNX models with sample input data.
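
The compatibility check in the source code below validates only the device name. As a minimal sketch of a stricter check, the device name could be mapped to an ONNX Runtime execution provider and compared against the providers actually installed. The mapping `DEVICE_TO_PROVIDER` and the helper `provider_for_device` are hypothetical and not part of this tool; the provider names are the ones ONNX Runtime reports from `get_available_providers()`.

```python
# Assumed mapping from this tool's device names to ONNX Runtime
# execution provider names (not part of llm_edge_deployer itself).
DEVICE_TO_PROVIDER = {
    "tensorrt": "TensorrtExecutionProvider",
    "openvino": "OpenVINOExecutionProvider",
}


def provider_for_device(target_device, available_providers):
    """Return the provider name for target_device if it is available, else None."""
    key = target_device.lower()
    if key not in DEVICE_TO_PROVIDER:
        raise ValueError(f"Unsupported target device: {target_device}")
    provider = DEVICE_TO_PROVIDER[key]
    return provider if provider in available_providers else None


if __name__ == "__main__":
    try:
        import onnxruntime as ort
        available = ort.get_available_providers()
    except ImportError:
        # Fall back to a CPU-only list when onnxruntime is not installed.
        available = ["CPUExecutionProvider"]
    print(provider_for_device("tensorrt", available))
```

A return value of `None` would mean the device name is recognized but the matching provider is not present in the installed ONNX Runtime build.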

Installation

Install the required dependencies using pip:

pip install onnx onnxruntime optimum numpy pytest

Usage

CLI

Run the tool from the command line:

python llm_edge_deployer.py --input_model <path_to_model> \
                            --target_device <device_type> \
                            --test_sample <path_to_test_sample> \
                            --output_model <path_to_output_model>
  • --input_model: Path to the optimized model file.
  • --target_device: Target edge device type (e.g., tensorrt, openvino).
  • --test_sample: Path to the test sample JSON file.
  • --output_model: (Optional) Path to save the converted ONNX model. Default is output_model.onnx.
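
The source code below loads the test sample with `json.load` and converts it with `np.array(..., dtype=np.float32)`, so the file needs to hold a rectangular nested list of numbers. The following sketch writes such a file; the helper name `write_test_sample` and the 1 x 4 shape are assumptions for illustration, and the shape a real model expects depends on its input.

```python
import json
import tempfile
from pathlib import Path


def write_test_sample(path, batch_size=1, num_features=4, fill=0.0):
    """Write a rectangular nested list of floats as JSON (hypothetical helper)."""
    sample = [[float(fill)] * num_features for _ in range(batch_size)]
    Path(path).write_text(json.dumps(sample))
    return sample


if __name__ == "__main__":
    out = Path(tempfile.gettempdir()) / "test_sample.json"
    write_test_sample(out, batch_size=1, num_features=4)
    print(out.read_text())  # prints [[0.0, 0.0, 0.0, 0.0]]
```

The resulting path can then be passed as --test_sample.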

Library

You can also use the tool as a Python library:

from llm_edge_deployer import convert_to_onnx, check_device_compatibility, run_test_inference

# Convert model to ONNX
output_model_path = convert_to_onnx("path/to/input_model", "path/to/output_model.onnx")

# Check device compatibility
check_device_compatibility("tensorrt")

# Run test inference
result = run_test_inference("path/to/output_model.onnx", "path/to/test_sample.json")
print("Inference result:", result)
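
Because run_test_inference feeds the sample straight into the model's first input, a shape mismatch only surfaces as a runtime error. The sketch below shows one way to compare shapes beforehand. `sample_shape` and `shape_matches` are hypothetical helpers; the expected shape is assumed to come from `session.get_inputs()[0].shape`, where dynamic dimensions appear as `None` or a symbolic name rather than an int.

```python
def sample_shape(sample):
    """Return the shape of a nested list, e.g. [[1, 2], [3, 4]] -> (2, 2).

    Only the first element at each level is inspected, so the list is
    assumed to be rectangular.
    """
    shape = []
    while isinstance(sample, list):
        shape.append(len(sample))
        if not sample:
            break
        sample = sample[0]
    return tuple(shape)


def shape_matches(expected, actual):
    """Compare an ONNX input shape with a concrete shape.

    Dimensions in `expected` that are not ints (None or a symbolic name
    such as "batch") are treated as dynamic and match any size.
    """
    if len(expected) != len(actual):
        return False
    return all(not isinstance(e, int) or e == a for e, a in zip(expected, actual))


if __name__ == "__main__":
    sample = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print(shape_matches(["batch", 3], sample_shape(sample)))  # prints True
```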

Source Code

import argparse
import os
import json
import numpy as np
import onnx
import onnxruntime as ort
from optimum.onnxruntime import ORTModel

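def convert_to_onnx(input_model_path, output_model_path):
    """Convert the input model to ONNX format."""
    try:
        model = ORTModel.from_pretrained(input_model_path)
        # Note: save_pretrained() treats its argument as a directory and
        # writes the model files inside it.
        model.save_pretrained(output_model_path)
        return output_model_path
    except Exception as e:
        # Chain the original exception so the full traceback is preserved.
        raise RuntimeError(f"Failed to convert model to ONNX: {e}") from e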

def check_device_compatibility(target_device):
    """Check compatibility of the target device."""
    supported_devices = ["tensorrt", "openvino"]
    if target_device.lower() not in supported_devices:
        raise ValueError(f"Unsupported target device: {target_device}. Supported devices are: {supported_devices}")
    return True

def run_test_inference(onnx_model_path, test_sample_path):
    """Run a test inference on the ONNX model."""
    try:
        session = ort.InferenceSession(onnx_model_path)
        with open(test_sample_path, "r") as f:
            test_sample = json.load(f)
        # Only the model's first input is fed; the sample is always cast to float32.
        input_name = session.get_inputs()[0].name
        input_data = np.array(test_sample, dtype=np.float32)
        # Passing None as the output list returns every model output.
        result = session.run(None, {input_name: input_data})
        return [np.array(r) for r in result]  # Ensure result is a list of numpy arrays
    except Exception as e:
        # Chain the original exception so the full traceback is preserved.
        raise RuntimeError(f"Failed to run test inference: {e}") from e

def main():
    parser = argparse.ArgumentParser(description="LLM Edge Deployer")
    parser.add_argument("--input_model", required=True, help="Path to the optimized model file.")
    parser.add_argument("--target_device", required=True, help="Target edge device type (e.g., tensorrt, openvino).")
    parser.add_argument("--test_sample", required=True, help="Path to the test sample JSON file.")
    parser.add_argument("--output_model", default="output_model.onnx", help="Path to save the converted ONNX model.")

    args = parser.parse_args()

    try:
        check_device_compatibility(args.target_device)
        print(f"Target device {args.target_device} is compatible.")

        output_model_path = convert_to_onnx(args.input_model, args.output_model)
        print(f"Model converted to ONNX format and saved at {output_model_path}.")

        inference_result = run_test_inference(output_model_path, args.test_sample)
        print(f"Test inference result: {inference_result}")
    except Exception as e:
        print(f"Error: {e}")
        # Exit with a non-zero status so callers and scripts can detect the failure.
        raise SystemExit(1)

if __name__ == "__main__":
    main()
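
Since save_pretrained follows the Hugging Face convention of writing into a directory, the path returned by convert_to_onnx may be a directory rather than a single .onnx file, while ort.InferenceSession expects a file. A minimal sketch of resolving the path before inference is shown here. `find_onnx_file` is a hypothetical helper, and because the file names inside the directory depend on the model type, it simply takes the first *.onnx file it finds.

```python
import tempfile
from pathlib import Path


def find_onnx_file(output_path):
    """Return the path of an .onnx file for output_path.

    A file path is returned unchanged. For a directory, the first *.onnx
    file inside it (sorted by name) is returned.
    """
    path = Path(output_path)
    if path.is_file():
        return path
    if path.is_dir():
        candidates = sorted(path.glob("*.onnx"))
        if candidates:
            return candidates[0]
    raise FileNotFoundError(f"No .onnx file found at {output_path}")


if __name__ == "__main__":
    # Demo with a placeholder file; a real export would contain model weights.
    demo_dir = Path(tempfile.mkdtemp())
    (demo_dir / "model.onnx").write_bytes(b"")
    print(find_onnx_file(demo_dir).name)  # prints model.onnx
```

The resolved path could then be passed to run_test_inference in place of the directory.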

Details

Tool Name
llm_edge_deployer
Category
Local LLM Deployment
Generated
June 12, 2026
Tests
Passing
Fix Loops
3

Quick Install

Clone just this tool:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/ptulin/autoaiforge.git
cd autoaiforge
git sparse-checkout set generated_tools/2026-06-12/llm_edge_deployer
cd generated_tools/2026-06-12/llm_edge_deployer
pip install -r requirements.txt 2>/dev/null || true
python llm_edge_deployer.py --help