🔧 AI in Content Creation | May 3, 2026 | ✅ Tests passing

AI Video Captioner

This tool uses AI to generate captions for video files automatically: it transcribes the audio track and can optionally translate the transcript into other languages. It is aimed at content creators who want to make their videos accessible to a wider, multilingual audience.

What It Does

Extract audio from video files.
Transcribe audio to text using OpenAI's Whisper API.
Translate captions into multiple languages using OpenAI's GPT model.
Save captions in .srt or .vtt formats.
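The two output formats differ mainly in their header, cue numbering, and the millisecond separator in timestamps. The following is a minimal sketch of that difference, not the tool's own code: `format_timestamp` and `render_cues` are hypothetical helpers, and the cue timings are assumed to be supplied by the caller as seconds.

```python
def format_timestamp(seconds, fmt="srt"):
    """Format an offset in seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    if fmt not in ("srt", "vtt"):
        raise ValueError("Unsupported format. Use 'srt' or 'vtt'.")
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."  # the only difference in the timestamp itself
    return f"{hours:02}:{minutes:02}:{secs:02}{sep}{millis:03}"


def render_cues(cues, fmt="srt"):
    """Render (start_seconds, end_seconds, text) tuples as an SRT or WebVTT document."""
    lines = ["WEBVTT", ""] if fmt == "vtt" else []  # WebVTT files start with a header
    for index, (start, end, text) in enumerate(cues, start=1):
        if fmt == "srt":
            lines.append(str(index))  # SRT cues carry a sequence number
        lines.append(f"{format_timestamp(start, fmt)} --> {format_timestamp(end, fmt)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

For example, `render_cues([(0.0, 1.5, "Hello")], "vtt")` produces a `WEBVTT` header followed by one cue running from `00:00:00.000` to `00:00:01.500`.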

Installation

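The tool requires:

Python 3.7+
The ffmpeg command-line tool, available on your PATH (ffmpeg-python only wraps it)
ffmpeg-python
openai (a release earlier than 1.0; the source below calls openai.Audio.transcribe and openai.Completion.create, which were removed in 1.0)
pytest (only needed to run the tests)

Install the Python dependencies from the provided requirements.txt file:

pip install -r requirements.txt

The openai package reads the API key from the OPENAI_API_KEY environment variable, so set it before running the script.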

Usage

Run the script with the following arguments:

python ai_video_captioner.py --video <path_to_video> --output <path_to_output_file> [--lang <comma_separated_language_codes>] [--format <srt|vtt>]

Arguments

--video: Path to the input video file (required).
--output: Path to the output caption file (required).
--lang: Comma-separated list of language codes for translation (optional).
--format: Output caption format, either srt or vtt (default: srt).
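A comma-separated `--lang` value can contain stray spaces, empty entries, or repeated codes. The sketch below shows one way to normalize it before translating; `parse_language_codes` is a hypothetical helper for illustration and is not part of the script.

```python
def parse_language_codes(raw):
    """Split a comma-separated --lang value into a clean, ordered list of codes."""
    if not raw:
        return []  # --lang was omitted or empty: no translation requested
    codes = []
    for part in raw.split(","):
        code = part.strip()
        if code and code not in codes:  # skip blanks and duplicates, keep first-seen order
            codes.append(code)
    return codes
```

With this helper, `parse_language_codes("es, fr,,es")` yields `["es", "fr"]`, and a missing `--lang` yields an empty list.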

Example

To generate captions in the language spoken in the video:

python ai_video_captioner.py --video input.mp4 --output captions.srt

To also translate the captions into Spanish (the translation is written to the same output file, after the original transcript):

python ai_video_captioner.py --video input.mp4 --output captions.srt --lang es

Source Code

import os
import argparse
import ffmpeg
import openai

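def extract_audio(video_path, audio_path):
    """Extract the audio track as 16 kHz mono and write it to audio_path."""
    try:
        (
            ffmpeg
            .input(video_path)
            .output(audio_path, ac=1, ar='16000')
            # capture_stderr makes ffmpeg's own error text available on failure
            .run(overwrite_output=True, capture_stderr=True)
        )
    except ffmpeg.Error as e:
        detail = e.stderr.decode(errors='replace') if e.stderr else str(e)
        raise RuntimeError(f"Error extracting audio: {detail}") from e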

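def transcribe_audio(audio_path):
    """Transcribe the audio file with Whisper and return the plain text."""
    try:
        with open(audio_path, 'rb') as audio_file:
            # Legacy (pre-1.0) openai interface; the API key is read from OPENAI_API_KEY.
            response = openai.Audio.transcribe("whisper-1", audio_file)
        # The default response carries only the text, with no per-segment timing.
        return response['text']
    except Exception as e:
        raise RuntimeError(f"Error transcribing audio: {e}") from e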

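def translate_text(text, target_languages):
    """Return a dict mapping each target language code to the translated text."""
    translations = {}
    for lang in target_languages:
        try:
            # text-davinci-003 has been retired; gpt-3.5-turbo-instruct is the
            # completions-endpoint model OpenAI lists as its replacement.
            response = openai.Completion.create(
                model="gpt-3.5-turbo-instruct",
                prompt=f"Translate the following text to {lang}: {text}",
                max_tokens=1000  # long transcripts may be cut off at this limit
            )
            translations[lang] = response['choices'][0]['text'].strip()
        except Exception as e:
            raise RuntimeError(f"Error translating to {lang}: {e}") from e
    return translations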

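def save_captions(captions, output_path, format):
    """Write each caption as one cue in the requested format ('srt' or 'vtt')."""
    # Validate before opening so an unsupported format does not leave an empty file behind.
    if format not in ('srt', 'vtt'):
        raise ValueError("Unsupported format. Use 'srt' or 'vtt'.")

    def timestamp(seconds, separator):
        # Placeholder timing: cue i occupies second i, not the moment the words are spoken.
        hours, remainder = divmod(seconds, 3600)
        minutes, secs = divmod(remainder, 60)
        return f"{hours:02}:{minutes:02}:{secs:02}{separator}000"

    try:
        with open(output_path, 'w', encoding='utf-8') as f:
            if format == 'vtt':
                f.write("WEBVTT\n\n")
            separator = ',' if format == 'srt' else '.'
            for i, caption in enumerate(captions, start=1):
                if format == 'srt':
                    f.write(f"{i}\n")  # SRT cues are numbered
                f.write(f"{timestamp(i, separator)} --> {timestamp(i + 1, separator)}\n{caption}\n\n")
    except OSError as e:
        raise RuntimeError(f"Error saving captions: {e}") from e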

def main():
    parser = argparse.ArgumentParser(description="AI Video Captioner: Generate captions for video files.")
    parser.add_argument('--video', required=True, help="Path to the input video file.")
    parser.add_argument('--lang', help="Comma-separated list of language codes for translation.")
    parser.add_argument('--output', required=True, help="Path to the output caption file.")
    parser.add_argument('--format', choices=['srt', 'vtt'], default='srt', help="Output caption format (default: srt).")

    args = parser.parse_args()

    video_path = args.video
    output_path = args.output
    format = args.format
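    # Ignore stray spaces and empty entries such as "es, fr," in the --lang value.
    target_languages = [code.strip() for code in args.lang.split(',') if code.strip()] if args.lang else []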

    audio_path = "temp_audio.wav"

    try:
        extract_audio(video_path, audio_path)
        transcription = transcribe_audio(audio_path)

        captions = [transcription]
        if target_languages:
            translations = translate_text(transcription, target_languages)
            captions.extend(translations.values())

        save_captions(captions, output_path, format)
        print(f"Captions saved to {output_path}")
    except Exception as e:
        # Exit with a non-zero status so callers and scripts can detect the failure.
        raise SystemExit(f"Error: {e}")
    finally:
        if os.path.exists(audio_path):
            os.remove(audio_path)

if __name__ == "__main__":
    main()
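translate_text above sends the whole transcript in a single prompt and caps the reply at max_tokens=1000, so the translation of a long transcript may come back truncated. One possible mitigation is to translate the text in smaller pieces. The sketch below is an illustration only: `split_into_chunks` is a hypothetical helper, the `max_chars` default is an arbitrary assumption, and the splitting relies on `.`, `!`, or `?` followed by whitespace, which will not find sentence boundaries in every language.

```python
import re


def split_into_chunks(text, max_chars=2000):
    """Group whole sentences into chunks of at most max_chars characters where possible."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        # A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to translate_text separately and the results joined in order, at the cost of one API request per chunk and per language.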

Details

Tool Name: ai_video_captioner
Category: AI in Content Creation
Generated: May 3, 2026
Tests: Passing ✅
Fix Loops: 5

Quick Install

Clone just this tool:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/ptulin/autoaiforge.git
cd autoaiforge
git sparse-checkout set generated_tools/2026-05-03/ai_video_captioner
cd generated_tools/2026-05-03/ai_video_captioner
pip install -r requirements.txt
python ai_video_captioner.py --help
