LLM inference optimization · June 9, 2026
Tests passing
Dynamic LLM Router
This tool routes an incoming request to an LLM based on how much memory is free on each requested device, so that available compute is used efficiently. It is useful when several models or devices are available and a request should go wherever capacity is free.
What It Does
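- Checks, at request time, how much memory is free on each requested device (GPU 0 for cuda, system RAM for cpu).
- Tries devices in descending order of free memory and, on each device, tries the models in the order given, moving on to the next model or device if loading or generation fails.
- Supports multiple LLMs and devices in a single call.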
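The routing policy can be sketched without loading any models. The sketch below is a simplified stand-in for route_request under stated assumptions: route and the run callback are hypothetical names, and run replaces the real load-and-generate step. Devices are tried in descending order of free memory, and on each device the models are tried in the order given, falling back to the next combination when one raises.

```python
from typing import Callable, Dict, List


def route(
    available: Dict[str, int],
    models: List[str],
    run: Callable[[str, str], str],
) -> dict:
    """Return the result of the first (model, device) pair that succeeds.

    available maps device name -> free bytes. run(model, device) is a
    hypothetical stand-in for loading the model and calling generate().
    """
    if not available:
        raise RuntimeError("No available devices.")
    # Same ordering as the tool: the device with the most free bytes first.
    ordered = sorted(available.items(), key=lambda kv: kv[1], reverse=True)
    for device, _free in ordered:
        # On each device, models are tried in the order they were listed.
        for model in models:
            try:
                return {"model": model, "device": device, "response": run(model, device)}
            except Exception as exc:
                # Any failure moves on to the next model/device combination.
                print(f"skip {model} on {device}: {exc}")
    raise RuntimeError("Failed to process the request with all available models and devices.")


# Usage: pretend the "big" model does not fit on the GPU, so the router
# falls back to the "small" model on the same device.
def fake_run(model: str, device: str) -> str:
    if model == "big" and device == "cuda":
        raise MemoryError("out of memory")
    return f"{model} answered on {device}"


result = route({"cuda": 8 * 2**30, "cpu": 4 * 2**30}, ["big", "small"], fake_run)
print(result)
```

Because the device loop is the outer loop, every listed model is tried on the roomiest device before any model is tried on the next one.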
Installation
Install the required dependencies using pip:
pip install transformers torch psutil pytest
Usage
Run the tool from the command line:
python dynamic_llm_router.py --input "Your input text here" --models "gpt2,EleutherAI/gpt-neo-125M" --devices "cuda,cpu"
Arguments
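- --input: The input text to be processed by the LLM.
- --models: A comma-separated list of model names (e.g., gpt2,EleutherAI/gpt-neo-125M), tried in the order given.
- --devices: A comma-separated list of devices to use (e.g., cuda,cpu). Only cuda (GPU 0) and cpu are recognized; other values are ignored.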
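A plain split(',') keeps surrounding spaces, so "cuda, cpu" would yield " cpu", which matches no device. One way to make the list arguments tolerant is an argparse type callback. The sketch below assumes that approach; csv_list is a hypothetical helper, not part of the tool.

```python
import argparse
from typing import List


def csv_list(value: str) -> List[str]:
    """Split a comma-separated argument, trimming spaces and dropping empty items."""
    items = [part.strip() for part in value.split(",")]
    items = [item for item in items if item]
    if not items:
        raise argparse.ArgumentTypeError("expected at least one comma-separated value")
    return items


parser = argparse.ArgumentParser(description="Dynamic LLM Router (argument sketch)")
parser.add_argument("--input", type=str, required=True)
parser.add_argument("--models", type=csv_list, required=True)
parser.add_argument("--devices", type=csv_list, required=True)

# Parse an explicit argument list instead of sys.argv for the demonstration.
args = parser.parse_args(
    [
        "--input", "What is the capital of France?",
        "--models", "gpt2, EleutherAI/gpt-neo-125M",
        "--devices", "cuda, cpu",
    ]
)
print(args.models, args.devices)  # ['gpt2', 'EleutherAI/gpt-neo-125M'] ['cuda', 'cpu']
```

Raising ArgumentTypeError lets argparse report a clean usage error for inputs such as "," instead of passing an empty list to the router.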
Example
python dynamic_llm_router.py --input "What is the capital of France?" --models "gpt2" --devices "cpu"
Source Code
import argparse
import json
import logging

import psutil
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def setup_logging():
    logging.basicConfig(
        format='%(asctime)s - %(levelname)s - %(message)s',
        level=logging.INFO
    )


def get_device_availability(devices):
    """Return {device: free_bytes} for each requested device that is usable.

    Only 'cuda' (GPU 0) and 'cpu' are recognized; other names are ignored.
    """
    available_devices = {}
    for device in devices:
        if device == 'cuda' and torch.cuda.is_available():
            # mem_get_info returns the driver's (free, total) figures, so memory
            # used by other processes is accounted for; memory_allocated() only
            # counts tensors allocated by this process.
            free_bytes, _total_bytes = torch.cuda.mem_get_info(0)
            available_devices['cuda'] = free_bytes
        elif device == 'cpu':
            available_devices['cpu'] = psutil.virtual_memory().available
    return available_devices


def load_model(model_name, device):
    """Load the specified model and tokenizer on the given device."""
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)
        if device == 'cuda' and torch.cuda.is_available():
            model = model.to('cuda')
        return model, tokenizer
    except Exception as e:
        logging.error(f"Error loading model {model_name} on {device}: {e}")
        raise


def route_request(input_text, models, devices):
    """Route the request to the first model/device pair that succeeds."""
    available_devices = get_device_availability(devices)
    if not available_devices:
        raise RuntimeError("No available devices.")
    # Sort devices by free memory in bytes (descending). This compares system
    # RAM with GPU memory directly, so the CPU can rank ahead of a GPU.
    sorted_devices = sorted(available_devices.items(), key=lambda x: x[1], reverse=True)
    for device, _ in sorted_devices:
        # On each device, try the models in the order they were given.
        for model_name in models:
            try:
                model, tokenizer = load_model(model_name, device)
                inputs = tokenizer(input_text, return_tensors="pt")
                if device == 'cuda':
                    inputs = {key: value.to('cuda') for key, value in inputs.items()}
                # No length limit is passed, so generate() uses the model's
                # generation config; for models that set none (GPT-2, for
                # example) that is 20 tokens in total, prompt included.
                outputs = model.generate(**inputs)
                response = tokenizer.decode(outputs[0], skip_special_tokens=True)
                return {
                    "model": model_name,
                    "device": device,
                    "response": response
                }
            except Exception as e:
                logging.warning(f"Failed to process with model {model_name} on {device}: {e}")
    raise RuntimeError("Failed to process the request with all available models and devices.")


def main():
    parser = argparse.ArgumentParser(description="Dynamic LLM Router")
    parser.add_argument('--input', type=str, required=True, help="Input text for the LLM.")
    parser.add_argument('--models', type=str, required=True, help="Comma-separated list of model names.")
    parser.add_argument('--devices', type=str, required=True, help="Comma-separated list of devices (e.g., cuda,cpu).")
    args = parser.parse_args()
    input_text = args.input
    # Strip spaces so "cuda, cpu" behaves like "cuda,cpu", and drop empty entries.
    models = [name.strip() for name in args.models.split(',') if name.strip()]
    devices = [name.strip() for name in args.devices.split(',') if name.strip()]
    try:
        result = route_request(input_text, models, devices)
        print(json.dumps(result, indent=2))
    except Exception as e:
        # Errors are logged to stderr and reported as JSON on stdout.
        logging.error(f"Error: {e}")
        print(json.dumps({"error": str(e)}))


if __name__ == "__main__":
    setup_logging()
    main()
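Because route_request sorts raw byte counts, system RAM and GPU memory compete directly, so a machine with more free RAM than free VRAM will try the CPU first even when cuda is listed. The readings below are made up for illustration, and prefer_gpu_order is a hypothetical alternative sort key, not something the tool does.

```python
from typing import Dict, List

GiB = 2**30


def tool_order(available: Dict[str, int]) -> List[str]:
    """Ordering used by route_request: most free bytes first."""
    return [d for d, _ in sorted(available.items(), key=lambda kv: kv[1], reverse=True)]


def prefer_gpu_order(available: Dict[str, int]) -> List[str]:
    """Hypothetical alternative: any 'cuda' entry first, then by free bytes."""
    # False sorts before True, so 'cuda' wins; -bytes keeps larger values first.
    return [d for d, _ in sorted(available.items(), key=lambda kv: (kv[0] != "cuda", -kv[1]))]


# Illustrative readings: 32 GiB of free RAM vs 8 GiB of free VRAM.
readings = {"cuda": 8 * GiB, "cpu": 32 * GiB}
print(tool_order(readings))        # ['cpu', 'cuda']
print(prefer_gpu_order(readings))  # ['cuda', 'cpu']
```

Whether the GPU should always come first is a policy choice; the sketch only shows how the ordering changes with the key.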
Details
- Tool Name: dynamic_llm_router
- Category: LLM inference optimization
- Generated: June 9, 2026
- Tests: Passing
- Fix Loops: 2
Quick Install
Clone just this tool:
git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/ptulin/autoaiforge.git
cd autoaiforge
git sparse-checkout set generated_tools/2026-06-09/dynamic_llm_router
cd generated_tools/2026-06-09/dynamic_llm_router
pip install -r requirements.txt 2>/dev/null || true
python dynamic_llm_router.py --input "What is the capital of France?" --models "gpt2" --devices "cpu"
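The script writes one JSON object to stdout: on success it has model, device, and response keys; if routing fails it has a single error key, and the failure is reported in that payload rather than through a non-zero exit status. A calling program could tell the two apart with something like the sketch below; parse_router_output is a hypothetical helper, and the response text is a placeholder, not real model output.

```python
import json


def parse_router_output(stdout: str) -> dict:
    """Interpret the JSON printed by the router.

    Success: {"model": ..., "device": ..., "response": ...}
    Failure: {"error": "..."}
    """
    data = json.loads(stdout)
    if "error" in data:
        # Routing failures arrive as data, so surface them as an exception here.
        raise RuntimeError(f"router failed: {data['error']}")
    return {key: data[key] for key in ("model", "device", "response")}


ok = parse_router_output('{"model": "gpt2", "device": "cpu", "response": "<generated text>"}')
print(ok["model"], ok["device"])  # gpt2 cpu
```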