Local LLM Inference Engines, June 4, 2026
Tests passing
LLM Local Orchestrator
A CLI and Python library for loading and running local LLMs through one consistent interface. It wraps the common setup steps (device selection, model loading, and tokenization) so developers can run models locally for privacy and offline use. The code shown here uses the PyTorch backend via Hugging Face transformers; TensorSharp is named as a backend, but no TensorSharp code path appears on this page.
What It Does
- Load local LLM models and tokenizers.
- Run inference on input text using the loaded models.
- Support for PyTorch backend.
- Command-line interface for easy usage.
Installation
To install the required dependencies, run:
pip install torch transformers click
To install the testing dependencies, run:
pip install pytest
Usage
CLI Usage
Run the CLI tool with the following command:
python llm_local_orchestrator.py --model-path <path_to_model> --input "<input_text>" --device <device> --max-length <max_length>
- --model-path: Path to the local model.
- --input: Input text for the model.
- --device: Device to run the model on (e.g., cuda or cpu). Defaults to cuda if available, otherwise cpu.
- --max-length: Maximum length for generated text. Defaults to 128.
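The --device default described above (cuda if available, otherwise cpu) can be sketched as a small helper. This is an illustration, not the tool's own code: `resolve_device` is a hypothetical name, and the fallback to cpu when cuda is requested but unavailable is an added assumption. The tool itself passes the --device value through unchanged.

```python
def resolve_device(requested=None):
    """Pick a device string. `resolve_device` is a hypothetical helper."""
    try:
        import torch
        cuda_ok = torch.cuda.is_available()
    except ImportError:
        # Without PyTorch installed there is nothing to probe, so assume CPU.
        cuda_ok = False

    if requested is None:
        # Same rule as the documented default: prefer cuda when it is usable.
        return "cuda" if cuda_ok else "cpu"
    if requested.startswith("cuda") and not cuda_ok:
        # Assumption: degrade to CPU rather than fail later when moving the model.
        return "cpu"
    return requested


print(resolve_device())       # "cuda" or "cpu", depending on the machine
print(resolve_device("cpu"))  # always "cpu"
```

Whether a silent fallback is preferable to an explicit error depends on the deployment; on a machine where a GPU is expected, failing loudly may be the safer choice.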
Example
python llm_local_orchestrator.py --model-path ./gpt2 --input "Hello, world!" --device cpu --max-length 50
Library Usage
You can also use the tool as a library in your Python code:
from llm_local_orchestrator import load_model, run_inference
model_path = "./gpt2"
device = "cpu"
input_text = "Hello, world!"
max_length = 50
model, tokenizer = load_model(model_path, device)
result = run_inference(model, tokenizer, input_text, device, max_length)
print(result)
Source Code
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import click


def load_model(model_path, device):
    """Load the model and tokenizer from a local path and move the model to `device`."""
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        if tokenizer.pad_token is None:
            # Some tokenizers (GPT-2, for example) define no pad token, and
            # padding=True fails without one, so reuse the EOS token.
            tokenizer.pad_token = tokenizer.eos_token
        model = AutoModelForCausalLM.from_pretrained(model_path)
        model.to(device)
        return model, tokenizer
    except Exception as e:
        raise RuntimeError(f"Failed to load model: {e}") from e


def run_inference(model, tokenizer, input_text, device, max_length):
    """Generate a continuation of `input_text` and return it as decoded text."""
    try:
        inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)
        # Move every input tensor to the same device as the model.
        inputs = {key: value.to(device) for key, value in inputs.items()}
        with torch.no_grad():  # inference only, so skip gradient tracking
            outputs = model.generate(**inputs, max_length=max_length, pad_token_id=tokenizer.pad_token_id)
        return tokenizer.decode(outputs[0], skip_special_tokens=True)
    except Exception as e:
        raise RuntimeError(f"Failed during inference: {e}") from e


@click.command()
@click.option('--model-path', required=True, type=click.Path(exists=True), help='Path to the local model.')
@click.option('--input', 'input_text', required=True, type=str, help='Input text for the model.')
@click.option('--device', default='cuda' if torch.cuda.is_available() else 'cpu', type=str, help='Device to run the model on (e.g., cuda or cpu).')
@click.option('--max-length', default=128, type=int, help='Maximum length for generated text.')
def main(model_path, input_text, device, max_length):
    """Main CLI entry point."""
    try:
        model, tokenizer = load_model(model_path, device)
        result = run_inference(model, tokenizer, input_text, device, max_length)
        click.echo(result)
    except RuntimeError as e:
        click.echo(f"Error: {e}", err=True)
        raise SystemExit(1)  # report failure to the shell


if __name__ == '__main__':
    main()
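One detail of the source above is easy to miss: the value is passed to generate as max_length, which in Hugging Face transformers bounds the prompt tokens and the generated tokens together. The arithmetic can be sketched as follows; `new_token_budget` is a hypothetical helper for illustration and is not part of the tool.

```python
def new_token_budget(prompt_token_count, max_length):
    """Tokens left for generation when max_length caps prompt plus output."""
    return max(0, max_length - prompt_token_count)


# A 40-token prompt with --max-length 50 leaves room for 10 new tokens.
print(new_token_budget(40, 50))  # 10
# A prompt longer than the limit leaves no room at all.
print(new_token_budget(60, 50))  # 0
```

If a fixed amount of output is wanted regardless of prompt size, generate also accepts max_new_tokens, which bounds only the generated part. Switching to it would be a change to the tool's behavior, not something the source currently does.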
Details
- Tool Name: llm_local_orchestrator
- Category: Local LLM Inference Engines
- Generated: June 4, 2026
- Tests: Passing
- Fix Loops: 4
Quick Install
Clone just this tool:
git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/ptulin/autoaiforge.git
cd autoaiforge
git sparse-checkout set generated_tools/2026-06-04/llm_local_orchestrator
cd generated_tools/2026-06-04/llm_local_orchestrator
pip install -r requirements.txt 2>/dev/null || true
python llm_local_orchestrator.py
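Because the install step above discards pip's error output and always succeeds (2>/dev/null || true), a failed install can go unnoticed. A minimal sketch of a post-install check, assuming the runtime dependencies are the three packages named under Installation; `missing_packages` and `REQUIRED` are hypothetical names, not part of the tool:

```python
import importlib.util

# Assumption: these are the runtime dependencies listed under Installation.
REQUIRED = ("torch", "transformers", "click")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All runtime dependencies are importable.")
```

The check only confirms that each package can be located, not that it imports cleanly or matches a particular version.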