Local LLM Inference Engines · June 4, 2026 · ✅ Tests passing

LLM Local Orchestrator

A CLI and library tool that simplifies loading and running local LLMs behind a consistent interface. The code on this page uses the PyTorch backend through Hugging Face transformers; TensorSharp is named as a backend, but no TensorSharp code path appears in this source. The tool wraps device selection, model loading, and tokenization so that developers can run models locally for privacy and offline use.

What It Does

Load local LLM models and tokenizers.
Run inference on input text using the loaded models.
Support for PyTorch backend.
CLI interface for easy usage.

Installation

To install the required dependencies, run:

pip install torch transformers click

To install the testing dependencies, run:

pip install pytest
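
Before running the tool, it can help to confirm that the packages named above are importable in the active environment. The following is a minimal sketch using only the standard library; missing_packages is a hypothetical helper and not part of the tool.

```python
import importlib.util

# Runtime dependencies listed in the install command above.
REQUIRED = ("torch", "transformers", "click")


def missing_packages(names=REQUIRED):
    """Return the top-level package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All runtime dependencies are importable.")
```

find_spec only locates a package without importing it, so the check stays fast even though torch is slow to import.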

Usage

CLI Usage

Run the CLI tool with the following command:

python llm_local_orchestrator.py --model-path <path_to_model> --input "<input_text>" --device <device> --max-length <max_length>

--model-path: Path to the local model.
--input: Input text for the model.
--device: Device to run the model on (e.g., cuda or cpu). Defaults to cuda if available, otherwise cpu.
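--max-length: Maximum total sequence length in tokens, counting the prompt as well as the generated text (the value is passed to generate as max_length). Defaults to 128.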
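
Because the source passes --max-length to generate as max_length, the limit covers the prompt tokens plus the newly generated tokens, so a long prompt leaves less room for output. The arithmetic can be sketched as follows; new_token_budget is a hypothetical helper, and the token counts are made-up illustrations rather than measured values.

```python
def new_token_budget(prompt_tokens: int, max_length: int) -> int:
    """Tokens left for generation when max_length caps prompt plus output."""
    return max(0, max_length - prompt_tokens)


# Assumed 4-token prompt with --max-length 50: 46 tokens remain for output.
print(new_token_budget(prompt_tokens=4, max_length=50))     # 46

# Assumed 200-token prompt with the default of 128: no room is left.
print(new_token_budget(prompt_tokens=200, max_length=128))  # 0
```

The transformers generate method also accepts max_new_tokens, which counts only generated tokens, but this tool does not expose it.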

Example

python llm_local_orchestrator.py --model-path ./gpt2 --input "Hello, world!" --device cpu --max-length 50

Library Usage

You can also use the tool as a library in your Python code:

from llm_local_orchestrator import load_model, run_inference

model_path = "./gpt2"
device = "cpu"
input_text = "Hello, world!"
max_length = 50

model, tokenizer = load_model(model_path, device)
result = run_inference(model, tokenizer, input_text, device, max_length)
print(result)
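
Since loading the model is the expensive step, a script that handles several prompts would normally load once and reuse the model. The sketch below shows one way to do that; run_batch is a hypothetical helper, and it takes the inference function as an argument so that the example runs without a model on disk.

```python
from typing import Callable, Dict, Iterable


def run_batch(prompts: Iterable[str], infer: Callable[[str], str]) -> Dict[str, str]:
    """Apply `infer` to each prompt, recording failures instead of stopping."""
    results = {}
    for prompt in prompts:
        try:
            results[prompt] = infer(prompt)
        except RuntimeError as exc:  # run_inference raises RuntimeError on failure
            results[prompt] = f"ERROR: {exc}"
    return results


# Stand-in inference function so the sketch is runnable on its own.
print(run_batch(["hello", "world"], str.upper))

# With the real tool (assuming ./gpt2 exists locally), bind the loaded model once:
#   from functools import partial
#   from llm_local_orchestrator import load_model, run_inference
#   model, tokenizer = load_model("./gpt2", "cpu")
#   infer = partial(run_inference, model, tokenizer, device="cpu", max_length=50)
#   print(run_batch(["Hello, world!", "Once upon a time"], infer))
```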

Source Code

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import click

def load_model(model_path, device):
    """Load the model and tokenizer from a local path and move the model to `device`."""
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        model = AutoModelForCausalLM.from_pretrained(model_path)
        model.to(device)
        return model, tokenizer
    except Exception as e:
        # Chain the original exception so the full traceback is preserved.
        raise RuntimeError(f"Failed to load model: {e}") from e

def run_inference(model, tokenizer, input_text, device, max_length):
    """Generate a continuation of `input_text`; `max_length` counts prompt tokens too."""
    try:
        # A single prompt needs no padding, and padding=True raises for
        # tokenizers that define no pad token (GPT-2, for example).
        inputs = tokenizer(input_text, return_tensors="pt", truncation=True)
        inputs = {key: value.to(device) for key, value in inputs.items()}
        with torch.no_grad():
            outputs = model.generate(**inputs, max_length=max_length)
        return tokenizer.decode(outputs[0], skip_special_tokens=True)
    except Exception as e:
        raise RuntimeError(f"Failed during inference: {e}") from e

@click.command()
@click.option('--model-path', required=True, type=click.Path(exists=True), help='Path to the local model.')
@click.option('--input', 'input_text', required=True, type=str, help='Input text for the model.')
@click.option('--device', default='cuda' if torch.cuda.is_available() else 'cpu', type=str, help='Device to run the model on (e.g., cuda or cpu).')
@click.option('--max-length', default=128, type=int, help='Maximum length for generated text.')
def main(model_path, input_text, device, max_length):
    """Main CLI entry point."""
    try:
        model, tokenizer = load_model(model_path, device)
        result = run_inference(model, tokenizer, input_text, device, max_length)
        click.echo(result)
    except RuntimeError as e:
        # ClickException prints "Error: <message>" to stderr and exits with status 1,
        # so calling scripts can detect the failure.
        raise click.ClickException(str(e)) from e

if __name__ == '__main__':
    main()
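
The --device default in the source is derived from torch.cuda.is_available(), but an explicitly requested device is handed to model.to unchanged, so --device cuda on a machine without CUDA ends in a load error. One possible guard is sketched below; resolve_device is a hypothetical helper that is not part of the tool, and it takes the availability flag as a parameter so that it can be exercised without torch installed.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Return `requested`, falling back to "cpu" when CUDA is asked for but absent."""
    if requested.startswith("cuda") and not cuda_available:
        return "cpu"
    return requested


# Simulated machine without a GPU: the request is downgraded to the CPU.
print(resolve_device("cuda:0", cuda_available=False))  # cpu

# Simulated machine with a GPU: the request is kept as-is.
print(resolve_device("cuda", cuda_available=True))     # cuda

# Inside main(), the call could look like this (assumption, not in the source):
#   device = resolve_device(device, torch.cuda.is_available())
```

Whether a silent fallback is preferable to a clear error depends on the deployment; a warning message alongside the fallback may be a reasonable middle ground.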

Details

Tool Name: llm_local_orchestrator
Category: Local LLM Inference Engines
Generated: June 4, 2026
Tests: Passing ✅
Fix Loops: 4

Quick Install

Clone just this tool:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/ptulin/autoaiforge.git
cd autoaiforge
git sparse-checkout set generated_tools/2026-06-04/llm_local_orchestrator
cd generated_tools/2026-06-04/llm_local_orchestrator
pip install -r requirements.txt 2>/dev/null || true
python llm_local_orchestrator.py --help
