Local LLM Inference Engines | June 4, 2026
Tests passing
Token Streamer
A command-line tool and importable module that prints completions from a locally hosted LLM one token at a time, with a configurable delay between tokens. The model is loaded and run on the local machine, so prompts and outputs stay offline.
What It Does
- Streams tokens generated by a locally hosted LLM to the terminal, one at a time.
- Supports adjustable streaming speed.
- Optionally saves the generated output to a file.
- Provides a command-line interface (CLI) for ease of use.
Installation
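Token Streamer depends on transformers and rich, plus pytest for running the tests. The source code requests PyTorch tensors (return_tensors="pt"), so PyTorch must also be installed (for example, pip install torch). Install the listed dependencies with pip:
pip install transformers rich pytest
Usage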
CLI Usage
Run the tool from the command line with the following options:
python token_streamer.py --model-path <model_path> --input <input_prompt> [--stream-speed <seconds>] [--output-file <file_path>]
- --model-path: Path to the locally hosted model.
- --input: Input prompt for the model.
- --stream-speed: (Optional) Delay in seconds between streaming tokens. Default is 0.5 seconds.
- --output-file: (Optional) File path to save the output.
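To see how the optional flags fall back to their defaults, the option set can be reproduced in isolation. This is a minimal sketch: the flag names and the 0.5 default come from the options listed above, while the `build_parser` helper is a hypothetical name introduced here for illustration.

```python
import argparse


def build_parser():
    """Hypothetical helper that mirrors the documented CLI options."""
    parser = argparse.ArgumentParser(prog="token_streamer.py")
    parser.add_argument("--model-path", required=True, help="Path to the locally hosted model.")
    parser.add_argument("--input", required=True, help="Input prompt for the model.")
    parser.add_argument("--stream-speed", type=float, default=0.5, help="Delay in seconds between tokens.")
    parser.add_argument("--output-file", help="Optional file path to save the output.")
    return parser


# Parse an explicit argument list instead of sys.argv so the result is easy to inspect.
args = build_parser().parse_args(["--model-path", "./models/gpt2", "--input", "Once upon a time"])

# argparse turns dashes into underscores: --model-path becomes args.model_path.
print(args.model_path, args.stream_speed, args.output_file)
```

With only the two required flags supplied, `stream_speed` resolves to 0.5 and `output_file` to `None`, so nothing is written to disk unless a path is given.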
Example
python token_streamer.py --model-path ./models/gpt2 --input "Once upon a time" --stream-speed 0.2 --output-file output.txt
Source Code
import argparse
import time

from transformers import AutoTokenizer, AutoModelForCausalLM
from rich.console import Console


def stream_tokens(model_path, input_prompt, stream_speed, output_file=None):
    """
    Print tokens generated by a language model one at a time.

    Generation runs to completion first; the new tokens are then decoded
    and printed one by one, pausing stream_speed seconds between them.

    Args:
        model_path (str): Path to the locally hosted model.
        input_prompt (str): Input prompt for the model.
        stream_speed (float): Delay in seconds between streaming tokens.
        output_file (str, optional): File path to save the output. Defaults to None.

    Returns:
        None
    """
    console = Console()
    try:
        console.print("[bold green]Loading model and tokenizer...[/bold green]")
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        model = AutoModelForCausalLM.from_pretrained(model_path)

        console.print("[bold green]Generating tokens...[/bold green]")
        input_ids = tokenizer.encode(input_prompt, return_tensors="pt")
        output_ids = model.generate(input_ids, max_new_tokens=50, do_sample=True)

        # Keep only the newly generated token IDs. Slicing the decoded string by
        # len(input_prompt) is unreliable, because decoding is not guaranteed to
        # reproduce the prompt character for character.
        new_token_ids = output_ids[0][input_ids.shape[-1]:].tolist()
        generated_text = tokenizer.decode(new_token_ids, skip_special_tokens=True)

        for token_id in new_token_ids:
            # Decode one token at a time so the loop iterates over tokens, not characters.
            token = tokenizer.decode([token_id], skip_special_tokens=True)
            # markup=False and highlight=False print the model output verbatim,
            # even when it contains square brackets.
            console.print(token, end="", style="bold blue", markup=False, highlight=False)
            time.sleep(stream_speed)
        console.print("\n[bold green]Streaming complete.[/bold green]")

        if output_file:
            with open(output_file, "w", encoding="utf-8") as f:
                f.write(generated_text)
    except Exception as e:
        console.print(f"[bold red]Error: {e}[/bold red]")
        raise


def main():
    parser = argparse.ArgumentParser(
        description="Token Streamer: Stream token-by-token completions from a locally hosted LLM."
    )
    parser.add_argument("--model-path", required=True, help="Path to the locally hosted model.")
    parser.add_argument("--input", required=True, help="Input prompt for the model.")
    parser.add_argument("--stream-speed", type=float, default=0.5, help="Delay in seconds between streaming tokens.")
    parser.add_argument("--output-file", help="Optional file path to save the output.")
    args = parser.parse_args()
    stream_tokens(
        model_path=args.model_path,
        input_prompt=args.input,
        stream_speed=args.stream_speed,
        output_file=args.output_file,
    )


if __name__ == "__main__":
    main()
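The source above calls model.generate() and waits for the whole completion before printing anything, so the delay only paces the replay. If output should appear while generation is still running, one possible variation uses the TextIteratorStreamer class from transformers. This is a sketch under assumptions, not part of the tool: `iter_completion` and `emit` are hypothetical names, and the streamer yields decoded text chunks (often whole words) rather than strictly one token per item.

```python
import threading
import time


def iter_completion(model_path, prompt, max_new_tokens=50):
    """Hypothetical generator: yield decoded text chunks while generation is running."""
    # Imported lazily so the pacing helper below can be used without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)
    inputs = tokenizer(prompt, return_tensors="pt")

    # skip_prompt=True drops the echoed prompt, so only newly generated text is yielded.
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    generate_kwargs = dict(inputs, streamer=streamer, max_new_tokens=max_new_tokens, do_sample=True)

    # generate() blocks until it finishes, so it runs in a worker thread
    # while this generator consumes the streamer.
    worker = threading.Thread(target=model.generate, kwargs=generate_kwargs)
    worker.start()
    for chunk in streamer:
        yield chunk
    worker.join()


def write_inline(chunk):
    """Print a chunk without a trailing newline and flush immediately."""
    print(chunk, end="", flush=True)


def emit(chunks, delay=0.0, write=write_inline):
    """Hypothetical helper: write each chunk as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        write(chunk)
        parts.append(chunk)
        if delay > 0:
            time.sleep(delay)  # optional pacing, analogous to --stream-speed
    return "".join(parts)
```

Assuming a model directory such as ./models/gpt2 exists, `emit(iter_completion("./models/gpt2", "Once upon a time"), delay=0.2)` would print chunks as they are produced and return the full text, which can then be written to a file.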
Details
- Tool Name: token_streamer
- Category: Local LLM Inference Engines
- Generated: June 4, 2026
- Tests: Passing
- Fix Loops: 2
Quick Install
Clone just this tool:
git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/ptulin/autoaiforge.git
cd autoaiforge
git sparse-checkout set generated_tools/2026-06-04/token_streamer
cd generated_tools/2026-06-04/token_streamer
pip install -r requirements.txt 2>/dev/null || true
python token_streamer.py