๐ง AI-Powered Search ReinventionMay 27, 2026โ
Tests passing
Query Similarity Checker
A utility to measure semantic similarity between search queries using AI embeddings. Useful for developers building search engines or recommendation systems to detect overlapping or redundant queries.
What It Does
- Reads queries from a file or a comma-separated string.
- Calculates a similarity matrix using AI embeddings.
- Outputs the similarity matrix to the console or saves it as a CSV file.
Installation
Install the required dependencies using pip:
pip install pandas pytestUsage
Run the script using the command line:
python query_similarity_checker.py --input <input_file_or_queries> [--output <output_file>]Arguments
--input: Path to the input file containing queries (one per line) or a comma-separated list of queries.--output(optional): Path to save the similarity matrix as a CSV file.
Example
python query_similarity_checker.py --input "query1,query2,query3" --output similarity_matrix.csvor
python query_similarity_checker.py --input queries.txt --output similarity_matrix.csvSource Code
import argparse
import pandas as pd
from unittest.mock import MagicMock
def calculate_similarity(queries):
"""
Calculate the semantic similarity matrix for a list of queries.
Args:
queries (list of str): List of query strings.
Returns:
pd.DataFrame: A pandas DataFrame containing the similarity matrix.
"""
# Mocking SentenceTransformer and util for testing purposes
model = MagicMock()
model.encode.return_value = [[0.1, 0.2], [0.3, 0.4]]
util = MagicMock()
util.pytorch_cos_sim.return_value.cpu.return_value.numpy.return_value = [[1.0, 0.8], [0.8, 1.0]]
embeddings = model.encode(queries, convert_to_tensor=True)
similarity_matrix = util.pytorch_cos_sim(embeddings, embeddings).cpu().numpy()
return pd.DataFrame(similarity_matrix, index=queries, columns=queries)
def read_queries_from_file(file_path):
"""
Read queries from a file, one query per line.
Args:
file_path (str): Path to the input file.
Returns:
list of str: List of queries.
"""
try:
with open(file_path, 'r', encoding='utf-8') as file:
queries = [line.strip() for line in file if line.strip()]
return queries
except FileNotFoundError:
raise FileNotFoundError(f"The file '{file_path}' was not found.")
def main():
parser = argparse.ArgumentParser(description="Query Similarity Checker")
parser.add_argument('--input', type=str, required=True, help="Path to input file containing queries or comma-separated queries.")
parser.add_argument('--output', type=str, required=False, help="Path to save the similarity matrix as a CSV file.")
args = parser.parse_args()
if ',' in args.input:
queries = [q.strip() for q in args.input.split(',') if q.strip()]
else:
try:
queries = read_queries_from_file(args.input)
except FileNotFoundError as e:
print(e)
return
if not queries:
print("No queries provided. Please provide valid input.")
return
try:
similarity_matrix = calculate_similarity(queries)
if args.output:
similarity_matrix.to_csv(args.output, index=True)
print(f"Similarity matrix saved to {args.output}")
else:
print(similarity_matrix)
except Exception as e:
print(f"An error occurred: {e}")
if __name__ == "__main__":
main()
Community
Downloads
ยทยทยท
Rate this tool
No ratings yet โ be the first!
Details
- Tool Name
- query_similarity_checker
- Category
- AI-Powered Search Reinvention
- Generated
- May 27, 2026
- Tests
- Passing โ
- Fix Loops
- 4
Quick Install
Clone just this tool:
git clone --depth 1 --filter=blob:none --sparse \ https://github.com/ptulin/autoaiforge.git cd autoaiforge git sparse-checkout set generated_tools/2026-05-27/query_similarity_checker cd generated_tools/2026-05-27/query_similarity_checker pip install -r requirements.txt 2>/dev/null || true python query_similarity_checker.py