Local LLM Inference Engines | June 4, 2026 | Tests passing ✅

Token Streamer

A command-line tool and importable module that streams token-by-token completions from a locally hosted LLM, giving interactive applications real-time feedback. All processing stays offline, which keeps prompts and outputs on the local machine.

What It Does

Streams tokens generated by a locally hosted LLM in real time.
Supports adjustable streaming speed.
Optionally saves the generated output to a file.
Provides a command-line interface (CLI) for ease of use.
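
The delay-and-save behavior listed above can be sketched without loading a model. This is a minimal illustration, not the tool's implementation: the function name replay_pieces and the list of text pieces standing in for decoded tokens are both hypothetical.

```python
import io
import sys
import time


def replay_pieces(pieces, delay=0.5, out=sys.stdout, output_file=None):
    """Write each piece to `out`, pausing `delay` seconds after each one.

    Hypothetical helper for illustration: `pieces` stands in for the
    decoded tokens a model would produce. Returns the full text written.
    """
    written = []
    for piece in pieces:
        out.write(piece)
        out.flush()  # show the piece immediately instead of buffering it
        written.append(piece)
        if delay > 0:
            time.sleep(delay)
    text = "".join(written)
    if output_file is not None:
        # Optionally persist the complete text once streaming has finished.
        with open(output_file, "w", encoding="utf-8") as f:
            f.write(text)
    return text


# Demo with an in-memory stream and no delay so it finishes instantly.
buffer = io.StringIO()
result = replay_pieces(["Once", " upon", " a", " time"], delay=0, out=buffer)
```

Passing the output stream as a parameter is a design choice made for this sketch: it lets the same loop write to a terminal, a file, or a test buffer.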

Installation

Install the required dependencies with pip. The script requests PyTorch tensors from the tokenizer, so torch is needed alongside transformers:

pip install transformers torch rich pytest

Usage

CLI Usage

Run the tool from the command line with the following options:

python token_streamer.py --model-path <model_path> --input <input_prompt> [--stream-speed <seconds>] [--output-file <file_path>]

--model-path: Path to the locally hosted model.
--input: Input prompt for the model.
--stream-speed: (Optional) Delay in seconds between streaming tokens. Default is 0.5 seconds.
--output-file: (Optional) File path to save the output.
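
As a rough sketch of how these four flags behave once parsed, the snippet below builds an equivalent argparse parser. The helper names build_parser and non_negative_float are hypothetical, and the check that rejects negative delays is an optional hardening assumed here for illustration; it is not part of the tool as published.

```python
import argparse


def non_negative_float(value):
    """Hypothetical argparse type: parse a float and reject negative delays."""
    number = float(value)
    if number < 0:
        raise argparse.ArgumentTypeError(f"delay must be >= 0, got {value}")
    return number


def build_parser():
    """Build a parser exposing the same four flags described above."""
    parser = argparse.ArgumentParser(prog="token_streamer.py")
    parser.add_argument("--model-path", required=True)
    parser.add_argument("--input", required=True)
    parser.add_argument("--stream-speed", type=non_negative_float, default=0.5)
    parser.add_argument("--output-file")
    return parser


# Parse an explicit list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(
    ["--model-path", "./models/gpt2", "--input", "Once upon a time"]
)

# argparse converts dashes in flag names to underscores in attribute names.
print(args.model_path)    # ./models/gpt2
print(args.stream_speed)  # 0.5 (the default, since the flag was omitted)
print(args.output_file)   # None
```

Rejecting negative values at parse time is worth considering because time.sleep raises ValueError when given a negative number, which would otherwise surface only after the model has loaded.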

Example

python token_streamer.py --model-path ./models/gpt2 --input "Once upon a time" --stream-speed 0.2 --output-file output.txt

Source Code

import argparse
import time
from transformers import AutoTokenizer, AutoModelForCausalLM
from rich.console import Console

def stream_tokens(model_path, input_prompt, stream_speed, output_file=None):
    """
    Streams tokens generated by a language model in real-time.

    Args:
        model_path (str): Path to the locally hosted model.
        input_prompt (str): Input prompt for the model.
        stream_speed (float): Delay in seconds between streaming tokens.
        output_file (str, optional): File path to save the output. Defaults to None.

    Returns:
        None
    """
    console = Console()

    try:
        console.print("[bold green]Loading model and tokenizer...[/bold green]")
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        model = AutoModelForCausalLM.from_pretrained(model_path)

        console.print("[bold green]Generating tokens...[/bold green]")
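        input_ids = tokenizer.encode(input_prompt, return_tensors="pt")
        output_ids = model.generate(input_ids, max_new_tokens=50, do_sample=True)

        # generate() returns the prompt ids followed by the new ids, so drop
        # the prompt by token count rather than by character count.
        new_ids = output_ids[0][input_ids.shape[-1]:].tolist()
        generated_text = ""

        # Decode and print one token at a time. markup=False and
        # highlight=False stop rich from reinterpreting the model's text.
        for token_id in new_ids:
            token = tokenizer.decode([token_id], skip_special_tokens=True)
            generated_text += token
            console.print(token, end="", style="bold blue", markup=False, highlight=False)
            time.sleep(stream_speed)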

        console.print("\n[bold green]Streaming complete.[/bold green]")

        if output_file:
            with open(output_file, "w", encoding="utf-8") as f:
                f.write(generated_text)

    except Exception as e:
        console.print(f"[bold red]Error: {e}[/bold red]")
        raise

def main():
    parser = argparse.ArgumentParser(description="Token Streamer: Stream token-by-token completions from a locally hosted LLM.")
    parser.add_argument("--model-path", required=True, help="Path to the locally hosted model.")
    parser.add_argument("--input", required=True, help="Input prompt for the model.")
    parser.add_argument("--stream-speed", type=float, default=0.5, help="Delay in seconds between streaming tokens.")
    parser.add_argument("--output-file", help="Optional file path to save the output.")

    args = parser.parse_args()

    stream_tokens(
        model_path=args.model_path,
        input_prompt=args.input,
        stream_speed=args.stream_speed,
        output_file=args.output_file
    )

if __name__ == "__main__":
    main()
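
One detail in the code above is how the completion is separated from the prompt. For decoder-only models, generate returns the prompt ids followed by the newly generated ids, so the split can be made by token count. The sketch below shows that idea with a toy vocabulary; VOCAB, decode, and new_token_pieces are hypothetical stand-ins, not part of transformers or of this tool.

```python
# Toy stand-in for a tokenizer vocabulary (hypothetical ids and pieces).
VOCAB = {0: "Once", 1: " upon", 2: " a", 3: " time", 4: ",", 5: " there", 6: " was"}


def decode(ids):
    """Join the text pieces for a sequence of token ids."""
    return "".join(VOCAB[i] for i in ids)


def new_token_pieces(prompt_ids, output_ids):
    """Yield the decoded text of each token generated after the prompt.

    Assumes `output_ids` begins with `prompt_ids`, which is how
    decoder-only generation output is laid out.
    """
    for token_id in output_ids[len(prompt_ids):]:
        yield decode([token_id])


prompt_ids = [0, 1, 2, 3]            # "Once upon a time"
output_ids = [0, 1, 2, 3, 4, 5, 6]   # prompt plus three new tokens
pieces = list(new_token_pieces(prompt_ids, output_ids))
print(pieces)  # [',', ' there', ' was']
```

Splitting on ids avoids relying on the decoded text reproducing the prompt character for character, and it yields whole tokens rather than single characters.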

Details

Tool Name: token_streamer
Category: Local LLM Inference Engines
Generated: June 4, 2026
Tests: Passing ✅
Fix Loops: 2

Quick Install

Clone just this tool:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/ptulin/autoaiforge.git
cd autoaiforge
git sparse-checkout set generated_tools/2026-06-04/token_streamer
cd generated_tools/2026-06-04/token_streamer
pip install -r requirements.txt 2>/dev/null || true
python token_streamer.py --help
