AI · March 15, 2026
Tests passing
Synthetic Data Generator for AI
Synthetic Data Generator is a Python tool for creating synthetic datasets for AI tasks such as image classification or text processing. It builds data from built-in generation templates (random-pixel images with class labels, or sentences assembled from a fixed word list) and offers optional augmentation, which helps developers test models and pipelines when real-world data is scarce or inaccessible.
What It Does
- Generate synthetic image datasets with optional augmentation.
- Generate synthetic text datasets with optional augmentation.
- Configurable parameters for dataset size, image dimensions, and text length.
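The text side of the feature list can be sketched in a few lines. This is a standalone approximation rather than the tool's own code: the function name `sketch_text_dataset` and the `seed` parameter are assumptions added here so that repeated runs give the same samples.

```python
import random


def sketch_text_dataset(num_samples=5, sentence_length=4, augment=False, seed=None):
    """Build (sentence, label) pairs from a small placeholder vocabulary."""
    # A private Random instance keeps the sketch repeatable; `seed` is an
    # assumption for illustration, the tool itself exposes no such option.
    rng = random.Random(seed)
    words = ["word1", "word2", "word3", "word4", "word5"]
    dataset = []
    for _ in range(num_samples):
        sentence = " ".join(rng.choices(words, k=sentence_length))
        if augment:
            # Random case flipping stands in for text augmentation.
            sentence = sentence.upper() if rng.random() < 0.5 else sentence.lower()
        label = rng.choice(["positive", "negative"])
        dataset.append((sentence, label))
    return dataset


first = sketch_text_dataset(seed=0)
second = sketch_text_dataset(seed=0)
print(first == second)  # the same seed yields the same samples
```

Labels here are assigned at random and carry no relation to the sentence, so data of this kind suits pipeline and shape checks rather than measuring model accuracy.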
Installation
Install the required dependencies:
pip install numpy Pillow
Usage
Run the script with the following options:
python synthetic_data_generator.py --type [image|text] --num_samples <number> --augment --image_size <width height> --sentence_length <number> --output_dir <directory>
Examples
Generate 100 synthetic images:
python synthetic_data_generator.py --type image --num_samples 100 --image_size 128 128 --output_dir synthetic_images
Generate 50 synthetic text samples with augmentation:
python synthetic_data_generator.py --type text --num_samples 50 --sentence_length 10 --augment
Source Code
import os
import numpy as np
from PIL import Image, ImageEnhance
import random
import string


def generate_image_data(num_samples=100, image_size=(128, 128), augment=False, output_dir="synthetic_images"):
    """
    Generate synthetic image data with optional augmentation.

    Args:
        num_samples (int): Number of images to generate.
        image_size (tuple): Size of each image (width, height).
        augment (bool): Whether to apply random augmentations.
        output_dir (str): Directory to save generated images.

    Returns:
        list: List of generated image file paths.
        list: List of corresponding labels.
    """
    os.makedirs(output_dir, exist_ok=True)
    image_paths = []
    labels = []
    for i in range(num_samples):
        # Create a random image
        image_array = np.random.randint(0, 256, (image_size[1], image_size[0], 3), dtype=np.uint8)
        image = Image.fromarray(image_array)
        # Apply augmentation if enabled
        if augment:
            enhancer = ImageEnhance.Brightness(image)
            image = enhancer.enhance(random.uniform(0.5, 1.5))
        # Save the image
        label = f"class_{random.randint(0, 9)}"
        filename = f"{label}_{i}.png"
        file_path = os.path.join(output_dir, filename)
        image.save(file_path)
        image_paths.append(file_path)
        labels.append(label)
    return image_paths, labels


def generate_text_data(num_samples=100, sentence_length=10, augment=False):
    """
    Generate synthetic text data with optional augmentation.

    Args:
        num_samples (int): Number of text samples to generate.
        sentence_length (int): Number of words in each sentence.
        augment (bool): Whether to apply random augmentations.

    Returns:
        list: List of generated text samples.
        list: List of corresponding labels.
    """
    # Mocked word list for testing purposes
    words = ["word1", "word2", "word3", "word4", "word5"]
    text_samples = []
    labels = []
    for _ in range(num_samples):
        sentence = " ".join(random.choices(words, k=sentence_length))
        # Apply augmentation if enabled
        if augment:
            sentence = sentence.lower() if random.random() > 0.5 else sentence.upper()
        label = "positive" if random.random() > 0.5 else "negative"
        text_samples.append(sentence)
        labels.append(label)
    return text_samples, labels


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Synthetic Data Generator for AI")
    parser.add_argument("--type", choices=["image", "text"], required=True, help="Type of dataset to generate.")
    parser.add_argument("--num_samples", type=int, default=100, help="Number of samples to generate.")
    parser.add_argument("--augment", action="store_true", help="Enable data augmentation.")
    parser.add_argument("--image_size", type=int, nargs=2, default=(128, 128), help="Size of images (width height).")
    parser.add_argument("--sentence_length", type=int, default=10, help="Number of words in each text sample.")
    parser.add_argument("--output_dir", type=str, default="synthetic_data", help="Output directory for generated data.")
    args = parser.parse_args()

    if args.type == "image":
        generate_image_data(
            num_samples=args.num_samples,
            image_size=tuple(args.image_size),
            augment=args.augment,
            output_dir=args.output_dir
        )
    elif args.type == "text":
        generate_text_data(
            num_samples=args.num_samples,
            sentence_length=args.sentence_length,
            augment=args.augment
        )
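In the listing above, the text branch of the command-line entry point calls `generate_text_data` but does not write its return value anywhere, so text samples exist only in memory. One possible way to persist them is sketched below. The helper `save_text_dataset`, the CSV layout, and the default file name are assumptions made for illustration and are not part of the tool.

```python
import csv
import os
import tempfile


def save_text_dataset(text_samples, labels, output_dir, filename="text_samples.csv"):
    """Write (text, label) rows to a CSV file and return its path (hypothetical helper)."""
    if len(text_samples) != len(labels):
        raise ValueError("text_samples and labels must have the same length")
    os.makedirs(output_dir, exist_ok=True)
    file_path = os.path.join(output_dir, filename)
    # newline="" lets the csv module control line endings on every platform.
    with open(file_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["text", "label"])
        writer.writerows(zip(text_samples, labels))
    return file_path


# Demonstration in a throwaway directory, with hand-written stand-in samples.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = save_text_dataset(
        ["word1 word3 word2", "WORD5 WORD5 WORD1"],
        ["positive", "negative"],
        tmp_dir,
    )
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))

print(rows[0])  # header row
```

With the real functions, the two lists returned by `generate_text_data` could be passed straight to such a helper together with the `--output_dir` value.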
Details
- Tool Name: synthetic_data_generator
- Category: AI
- Generated: March 15, 2026
- Tests: Passing
- Fix Loops: 2
Quick Install
Clone just this tool:
git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/ptulin/autoaiforge.git
cd autoaiforge
git sparse-checkout set generated_tools/2026-03-15/synthetic_data_generator
cd generated_tools/2026-03-15/synthetic_data_generator
pip install -r requirements.txt 2>/dev/null || true
python synthetic_data_generator.py
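After an image run, the script leaves only PNG files behind, and each file name carries its label in the form `class_<digit>_<index>.png`. A minimal sketch of recovering (file, label) pairs from those names follows. The helper `labels_from_filenames` is hypothetical, and the pattern is inferred from the file-naming line in the source code.

```python
import os
import re

# Matches names such as "class_3_0.png": the label, then the sample index.
LABEL_PATTERN = re.compile(r"^(class_\d+)_(\d+)\.png$")


def labels_from_filenames(filenames):
    """Return (filename, label) pairs, skipping names that do not match."""
    pairs = []
    for name in filenames:
        match = LABEL_PATTERN.match(os.path.basename(name))
        if match:
            pairs.append((name, match.group(1)))
    return pairs


pairs = labels_from_filenames(["class_3_0.png", "class_7_1.png", "notes.txt"])
print(pairs)  # [('class_3_0.png', 'class_3'), ('class_7_1.png', 'class_7')]
```

In practice the input list could come from `os.listdir` on the directory passed as `--output_dir`.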