AI | March 15, 2026 | Tests passing

Synthetic Data Generator for AI

Synthetic Data Generator is a Python script that creates synthetic datasets for AI tasks such as image classification and text processing. It uses preconfigured generators (random-pixel images with class labels, or sentences assembled from a fixed word list with sentiment labels) and optional augmentation, so developers can test model pipelines when real-world data is scarce or inaccessible.

What It Does

  • Generate synthetic image datasets with optional augmentation.
  • Generate synthetic text datasets with optional augmentation.
  • Configure dataset size, image dimensions, and text length.

Installation

Install the required dependencies:

pip install numpy Pillow

Usage

Run the script with the following options:

python synthetic_data_generator.py --type {image,text} [--num_samples <number>] [--augment] [--image_size <width> <height>] [--sentence_length <number>] [--output_dir <directory>]
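
None of the options above sets a random seed, so two runs with the same arguments produce different samples. The sketch below shows one possible way to get repeatable text samples by seeding Python's global `random` module before drawing. `sample_sentences` is a hypothetical helper, not part of the script; it only mirrors the `random.choices` call used in the source. The image generator also draws pixels from `np.random`, which would need `np.random.seed` in the same way.

```python
import random

# Same mock vocabulary as the script's text generator
WORDS = ["word1", "word2", "word3", "word4", "word5"]

def sample_sentences(num_samples, sentence_length, seed):
    """Hypothetical helper: draw sentences after seeding the global RNG."""
    random.seed(seed)  # the script draws from this same global generator
    return [
        " ".join(random.choices(WORDS, k=sentence_length))
        for _ in range(num_samples)
    ]

first = sample_sentences(3, 4, seed=42)
second = sample_sentences(3, 4, seed=42)

# Same seed, same sentences
print(first == second)
```

Calling `random.seed` once at the top of a test script, before invoking the generator functions, would have the same effect for the text path.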

Examples

Generate 100 synthetic images:

python synthetic_data_generator.py --type image --num_samples 100 --image_size 128 128 --output_dir synthetic_images

Generate 50 synthetic text samples with augmentation:

python synthetic_data_generator.py --type text --num_samples 50 --sentence_length 10 --augment
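
The text generator keeps its samples in memory: `generate_text_data` (listed under Source Code) returns two lists and writes nothing to `--output_dir`. A minimal sketch of one way to persist such lists as CSV follows. `save_text_dataset` and `load_text_dataset` are hypothetical helpers, and the sample values are stand-ins shaped like the function's return value.

```python
import csv
import os
import tempfile

def save_text_dataset(samples, labels, path):
    """Hypothetical helper: write (text, label) pairs to a CSV file with a header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])
        writer.writerows(zip(samples, labels))

def load_text_dataset(path):
    """Hypothetical helper: read the CSV back into two parallel lists."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        rows = list(reader)
    return [row[0] for row in rows], [row[1] for row in rows]

# Stand-in values shaped like the output of generate_text_data
samples = ["word1 word3 word2", "WORD5 WORD1 WORD4"]
labels = ["positive", "negative"]

csv_path = os.path.join(tempfile.mkdtemp(), "synthetic_text.csv")
save_text_dataset(samples, labels, csv_path)
loaded_samples, loaded_labels = load_text_dataset(csv_path)

# Round trip preserves both lists
print(loaded_samples == samples and loaded_labels == labels)
```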

Source Code

import os
import numpy as np
from PIL import Image, ImageEnhance
import random

def generate_image_data(num_samples=100, image_size=(128, 128), augment=False, output_dir="synthetic_images"):
    """
    Generate synthetic image data with optional augmentation.

    Args:
        num_samples (int): Number of images to generate.
        image_size (tuple): Size of each image (width, height).
        augment (bool): Whether to apply random augmentations.
        output_dir (str): Directory to save generated images.

    Returns:
        list: List of generated image file paths.
        list: List of corresponding labels.
    """
    os.makedirs(output_dir, exist_ok=True)
    image_paths = []
    labels = []

    for i in range(num_samples):
        # Create a random image
        image_array = np.random.randint(0, 256, (image_size[1], image_size[0], 3), dtype=np.uint8)
        image = Image.fromarray(image_array)

        # Apply augmentation if enabled
        if augment:
            enhancer = ImageEnhance.Brightness(image)
            image = enhancer.enhance(random.uniform(0.5, 1.5))

        # Save the image
        label = f"class_{random.randint(0, 9)}"
        filename = f"{label}_{i}.png"
        file_path = os.path.join(output_dir, filename)
        image.save(file_path)

        image_paths.append(file_path)
        labels.append(label)

    return image_paths, labels

def generate_text_data(num_samples=100, sentence_length=10, augment=False):
    """
    Generate synthetic text data with optional augmentation.

    Args:
        num_samples (int): Number of text samples to generate.
        sentence_length (int): Number of words in each sentence.
        augment (bool): Whether to apply random augmentations.

    Returns:
        list: List of generated text samples.
        list: List of corresponding labels.
    """
    # Mocked word list for testing purposes
    words = ["word1", "word2", "word3", "word4", "word5"]
    text_samples = []
    labels = []

    for _ in range(num_samples):
        sentence = " ".join(random.choices(words, k=sentence_length))

        # Apply augmentation if enabled
        if augment:
            sentence = sentence.lower() if random.random() > 0.5 else sentence.upper()

        label = "positive" if random.random() > 0.5 else "negative"
        text_samples.append(sentence)
        labels.append(label)

    return text_samples, labels

if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Synthetic Data Generator for AI")
    parser.add_argument("--type", choices=["image", "text"], required=True, help="Type of dataset to generate.")
    parser.add_argument("--num_samples", type=int, default=100, help="Number of samples to generate.")
    parser.add_argument("--augment", action="store_true", help="Enable data augmentation.")
    parser.add_argument("--image_size", type=int, nargs=2, default=(128, 128), help="Size of images (width height).")
    parser.add_argument("--sentence_length", type=int, default=10, help="Number of words in each text sample.")
    parser.add_argument("--output_dir", type=str, default="synthetic_data", help="Output directory for generated data.")

    args = parser.parse_args()

    if args.type == "image":
        generate_image_data(
            num_samples=args.num_samples,
            image_size=tuple(args.image_size),
            augment=args.augment,
            output_dir=args.output_dir
        )
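    elif args.type == "text":
        samples, labels = generate_text_data(
            num_samples=args.num_samples,
            sentence_length=args.sentence_length,
            augment=args.augment
        )
        # Text samples are not written to disk, so print them as label<TAB>sentence
        for sample, label in zip(samples, labels):
            print(f"{label}\t{sample}")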
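
The image generator saves each file as `<label>_<index>.png`, for example `class_3_17.png`, so labels can be recovered from a directory listing without the returned lists. The sketch below shows one way to do that. `label_from_filename` is a hypothetical helper, and the paths are illustrative values that follow the naming pattern in the source.

```python
import os
from collections import Counter

def label_from_filename(path):
    """Hypothetical helper: 'dir/class_3_17.png' -> 'class_3'."""
    stem = os.path.splitext(os.path.basename(path))[0]  # e.g. 'class_3_17'
    label, _index = stem.rsplit("_", 1)  # split off the trailing sample index
    return label

# Illustrative paths following the script's naming pattern
paths = [
    "synthetic_images/class_3_0.png",
    "synthetic_images/class_7_1.png",
    "synthetic_images/class_3_2.png",
]

labels = [label_from_filename(p) for p in paths]
counts = Counter(labels)  # per-class sample counts

print(labels)
print(counts["class_3"])
```

Because labels are drawn at random, the per-class counts are a quick way to check how balanced a generated dataset turned out to be.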

Details

Tool Name: synthetic_data_generator
Category: AI
Generated: March 15, 2026
Tests: Passing
Fix Loops: 2

Quick Install

Clone just this tool:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/ptulin/autoaiforge.git
cd autoaiforge
git sparse-checkout set generated_tools/2026-03-15/synthetic_data_generator
cd generated_tools/2026-03-15/synthetic_data_generator
pip install -r requirements.txt 2>/dev/null || true
python synthetic_data_generator.py --type image