Similarity Utilities
Utilities for measuring text similarity with Levenshtein edit distance and TF-IDF weighting.
Functions
levenshteinSimilarity
Calculates a similarity score between two strings based on their Levenshtein (edit) distance.
function levenshteinSimilarity(a: string, b: string): number
Returns: Similarity score between 0 and 1, where 1 means the strings are identical.
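The docs do not say how the edit distance becomes a score. Every documented example, however, fits 1 - distance / max(length of a, length of b). The following is a minimal sketch under that assumption, using the standard dynamic-programming edit distance. The helper `editDistance` is hypothetical and is not part of the documented API.

```typescript
// Hypothetical helper: the minimum number of single-character insertions,
// deletions, or substitutions needed to turn a into b.
function editDistance(a: string, b: string): number {
  // Array.from splits by code point, so each character counts once.
  const s = Array.from(a);
  const t = Array.from(b);
  // prev[j] = distance between the first (i - 1) chars of s and the first j chars of t.
  let prev: number[] = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = [i]; // i deletions turn s[0..i) into ""
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete s[i - 1]
        curr[j - 1] + 1,    // insert t[j - 1]
        prev[j - 1] + cost  // substitute, or keep a matching character
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

// Sketch only: normalization by the longer length is an assumption
// inferred from the documented examples.
function levenshteinSimilarity(a: string, b: string): number {
  const maxLen = Math.max(Array.from(a).length, Array.from(b).length);
  if (maxLen === 0) return 1; // two empty strings treated as identical (assumption)
  return 1 - editDistance(a, b) / maxLen;
}

console.log(levenshteinSimilarity('hello', 'hallo'));           // 0.8
console.log(levenshteinSimilarity('空调回收', '空调上门回收')); // ~0.667 (2 insertions / 6 chars)
```

Counting characters with `Array.from` rather than `.length` stops characters outside the Basic Multilingual Plane from counting twice. For the CJK examples here, both approaches give the same result.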
Examples:
levenshteinSimilarity('hello', 'hallo'); // 0.8
levenshteinSimilarity('abc', 'abc'); // 1
levenshteinSimilarity('abc', 'xyz'); // 0
levenshteinSimilarity('空调回收', '空调上门回收'); // ~0.67
tfidfSimilarity
Calculates TF-IDF-based similarity between a target string and each candidate string. TF-IDF compares shared terms, so it measures lexical overlap rather than meaning.
function tfidfSimilarity(
target: string,
candidates: string[]
): Array<{ text: string; score: number }>
Returns: Array of { text, score } results sorted by similarity score (descending).
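TF-IDF weights each term by how often it occurs in a text and by how rare it is across all texts. The weighted vectors are then compared. The formulas below show one common formulation and should be read as an assumption: the docs do not say which TF or IDF variant tfidfSimilarity uses, or whether the target counts as part of the corpus. The symbols are:

- $f_{t,d}$: number of times term $t$ occurs in document $d$
- $|d|$: number of tokens in $d$
- $N$: number of documents
- $\mathrm{df}(t)$: number of documents containing $t$
- $q$: the target
- $c$: a candidate

```latex
\begin{aligned}
\mathrm{tf}(t,d) &= \frac{f_{t,d}}{|d|}, \qquad
\mathrm{idf}(t) = \ln\frac{1+N}{1+\mathrm{df}(t)} + 1, \qquad
w_{t,d} = \mathrm{tf}(t,d)\,\mathrm{idf}(t) \\
\mathrm{score}(q,c) &= \frac{\sum_t w_{t,q}\,w_{t,c}}{\sqrt{\sum_t w_{t,q}^2}\,\sqrt{\sum_t w_{t,c}^2}}
\end{aligned}
```

Sorting candidates by $\mathrm{score}(q,c)$ in descending order gives the documented return order.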
Examples:
const target = '我要买一个蓝色的包包'; // "I want to buy a blue bag"
const candidates = ['蓝色手提袋', '红色外套', '黑色包包']; // "blue handbag", "red coat", "black bag"
const results = tfidfSimilarity(target, candidates);
// Illustrative output; exact scores depend on the tokenizer and TF-IDF weighting:
// [
// { text: '蓝色手提袋', score: 0.85 },
// { text: '黑色包包', score: 0.65 },
// { text: '红色外套', score: 0.2 }
// ]
compareSimilarity
Scores each candidate against the target with both Levenshtein similarity and TF-IDF similarity, so the two measures can be compared side by side.
function compareSimilarity(
target: string,
candidates: string[]
): Array<{
text: string;
levenshteinScore: number;
tfidfScore: number;
}>
Examples:
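// 空调回收: "air-conditioner recycling"; 空调上门回收: "door-to-door air-conditioner recycling"; 手机维修: "phone repair"
const results = compareSimilarity('空调回收', ['空调上门回收', '手机维修']);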
// [
// { text: '空调上门回收', levenshteinScore: 0.67, tfidfScore: 0.8 },
// { text: '手机维修', levenshteinScore: 0, tfidfScore: 0 }
// ]
tokenize
Tokenizes text for preprocessing. English text is split on whitespace and punctuation into lowercase words. Chinese text is split into individual characters.
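The sketch below uses rules that reproduce the documented examples:

- lowercase the input;
- emit each Han (Chinese) character as its own token;
- keep runs of other letters and digits as words;
- drop whitespace and punctuation.

`TOKEN_PATTERN` is a hypothetical illustration and not the library's implementation. It relies on Unicode property escapes (`\p{...}`, ES2018 and later).

```typescript
// Hypothetical pattern: either one Han character, or a run of letters/digits
// that are not Han. Whitespace and punctuation match neither branch, so match() skips them.
const TOKEN_PATTERN = /\p{Script=Han}|(?:(?!\p{Script=Han})[\p{L}\p{N}])+/gu;

// Minimal sketch; lowercasing first makes 'Hello' and 'hello' the same term.
function tokenize(text: string): string[] {
  // match() with a global regex returns every match, or null when there are none.
  return text.toLowerCase().match(TOKEN_PATTERN) ?? [];
}

console.log(tokenize('Hello, World!'));   // ['hello', 'world']
console.log(tokenize('你好世界'));         // ['你', '好', '世', '界']
console.log(tokenize('空调repair 2024')); // ['空', '调', 'repair', '2024']
```

The word branch refuses Han characters. Mixed input such as '空调repair' therefore splits cleanly at the script boundary instead of producing one long token.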
function tokenize(text: string): string[]
Examples:
tokenize('hello world'); // ['hello', 'world']
tokenize('你好世界'); // ['你', '好', '世', '界']
tokenize('Hello, World!'); // ['hello', 'world']
cosineSimilarity
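Calculates cosine similarity between two vectors: the cosine of the angle between them. The score is 1 for vectors pointing in the same direction and 0 for orthogonal vectors.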
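Cosine similarity is the dot product divided by the product of the two vector lengths. The sketch below also assumes two behaviors that the docs do not specify:

- vectors of different lengths are rejected;
- a zero vector scores 0 rather than producing NaN.

```typescript
// Minimal sketch: dot(a, b) / (|a| * |b|).
function cosineSimilarity(vecA: number[], vecB: number[]): number {
  if (vecA.length !== vecB.length) {
    throw new Error('Vectors must have the same length'); // assumed behavior
  }
  let dot = 0;
  let normA = 0; // sum of squares of vecA
  let normB = 0; // sum of squares of vecB
  for (let i = 0; i < vecA.length; i++) {
    dot += vecA[i] * vecB[i];
    normA += vecA[i] * vecA[i];
    normB += vecB[i] * vecB[i];
  }
  if (normA === 0 || normB === 0) return 0; // assumed: zero vector => 0
  return dot / Math.sqrt(normA * normB);
}

console.log(cosineSimilarity([1, 0, 1], [1, 0, 1])); // 1
console.log(cosineSimilarity([1, 1, 0], [1, 0, 1])); // 0.5
```

The sketch takes a single square root of `normA * normB` rather than multiplying two square roots. This keeps the first documented example at exactly 1, because `Math.sqrt(2) * Math.sqrt(2)` evaluates to 2.0000000000000004 in floating point.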
function cosineSimilarity(vecA: number[], vecB: number[]): number
Examples:
cosineSimilarity([1, 0, 1], [1, 0, 1]); // 1
cosineSimilarity([1, 0, 0], [0, 1, 0]); // 0
cosineSimilarity([1, 1, 0], [1, 0, 1]); // 0.5
computeTfidf
Computes a TF-IDF vector for each document in a collection, along with the term-to-index mapping shared by all of the vectors.
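This is a minimal sketch that returns the documented { tfidfVectors, termIndex } shape. The docs leave the weighting open, so the sketch makes two assumptions:

- term frequency is the term count divided by the document's token count;
- IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1, which is the IDF that scikit-learn's TfidfVectorizer uses by default.

For brevity, the sketch splits on whitespace. Chinese text would need the documented tokenize instead.

```typescript
// Minimal sketch; the TF and IDF variants are assumptions, not the library's documented weighting.
function computeTfidf(documents: string[]): {
  tfidfVectors: number[][];
  termIndex: Map<string, number>;
} {
  // Simplified whitespace tokenizer (stand-in for the documented tokenize()).
  const tokenized = documents.map((doc) =>
    doc.toLowerCase().split(/\s+/).filter((t) => t.length > 0)
  );

  // Give each distinct term a vector position, in order of first appearance.
  const termIndex = new Map<string, number>();
  for (const tokens of tokenized) {
    for (const term of tokens) {
      if (!termIndex.has(term)) termIndex.set(term, termIndex.size);
    }
  }

  // Document frequency: how many documents contain each term at least once.
  const df = new Array<number>(termIndex.size).fill(0);
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) {
      df[termIndex.get(term)!] += 1;
    }
  }

  // Smoothed IDF: never zero, even for a term found in every document.
  const n = documents.length;
  const idf = df.map((d) => Math.log((1 + n) / (1 + d)) + 1);

  const tfidfVectors = tokenized.map((tokens) => {
    const vec = new Array<number>(termIndex.size).fill(0);
    for (const term of tokens) {
      vec[termIndex.get(term)!] += 1 / tokens.length; // normalized term frequency
    }
    return vec.map((tf, i) => tf * idf[i]);
  });

  return { tfidfVectors, termIndex };
}

const docs = ['hello world', 'hello there', 'world peace'];
const demo = computeTfidf(docs);
console.log([...demo.termIndex]); // [['hello', 0], ['world', 1], ['there', 2], ['peace', 3]]
```

The +1 terms keep a term that occurs in every document at weight 1 instead of 0. This matters when the corpus is small, for example a single target and a few candidates.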
function computeTfidf(documents: string[]): {
tfidfVectors: number[][];
termIndex: Map<string, number>;
}
Examples:
const docs = ['hello world', 'hello there', 'world peace'];
const { tfidfVectors, termIndex } = computeTfidf(docs);
// tfidfVectors[0] is the TF-IDF vector for 'hello world'
// termIndex maps terms to vector indices
Use Cases
Spell Checking / Fuzzy Matching
Use levenshteinSimilarity to find the closest matches for a misspelled word:
const words = ['apple', 'apply', 'maple', 'banana'];
const typo = 'aple';
const matches = words
.map(word => ({ word, score: levenshteinSimilarity(typo, word) }))
.sort((a, b) => b.score - a.score);
// [{ word: 'apple', score: 0.8 }, { word: 'maple', score: 0.8 }, ...]
Search / Text Retrieval
Use tfidfSimilarity for keyword-based search and ranking:
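const query = '如何学习编程'; // "How to learn programming"
const articles = [
'编程入门教程', // "Introductory programming tutorial"
'学习Python的最佳方法', // "The best way to learn Python"
'今日天气预报' // "Today's weather forecast"
];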
const results = tfidfSimilarity(query, articles);
// Returns articles sorted by relevance