Check % matching between two strings

 

At my current workplace, we have a recurring task where we need to analyze data manually and create mappings between two separate sets of string columns. For a long time, it was not a task we had to perform too commonly. However, recently, it became a very recurring task, and performing it manually became a little too tedious, so I decided to automate it.       

For instance, if I want to map a string 'quis nostrud laboris nisi ut commodo consequat' and we have the following data set to map to, the closest map is #5: 'quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat'. 



Now, going through the whole data set to manually find a mapping for this given string is pretty time-consuming. So, I decided to develop a procedure that would take two strings and returns the percentage of matching between those. 

The script does a word-by-word comparison and returns the percentage of words that match. 



CREATE PROCEDURE StringMatch (@String1 nvarchar(max), @String2 nvarchar(max)) 
AS
BEGIN

	DECLARE @NoOfMatches INT = 0
	DECLARE @i INT = 1,
		@count INT
        
        
	--Replace special characters 
	SET @String1 = REPLACE(REPLACE(REPLACE(@String1, ',',''), '.', ''), '''','')
	SET @String2 = REPLACE(REPLACE(REPLACE(@String2, ',',''), '.', ''), '''','')

	CREATE TABLE Split1 (SplittedColumn nvarchar(max), isMatched BIT)
	CREATE TABLE Split2 (SplittedColumn nvarchar(max))
	
	INSERT INTO Split1 (SplittedColumn)
	SELECT ReferralID 
	FROM [dbo].[SplitString](@String1, '') 

	INSERT INTO Split2 (SplittedColumn)
	SELECT ReferralID
	FROM [dbo].[SplitString](@String2, '') 


	UPDATE s1
	SET isMatched =
		(
			SELECT COUNT(*)
			FROM Split2 s2
			WHERE s1.SplittedColumn = s2.SplittedColumn
		)
	FROM Split1 s1
	

	SELECT
	(
		SELECT COUNT(*)
		FROM Split1
	) AS [Total Words],
	(
		SELECT COUNT(*)
		FROM Split1
		WHERE isMatched = 1
	) [Matched Words],
	CAST
	(
		(
			CAST
			(
				(
					SELECT COUNT(*)
					FROM Split1
					WHERE isMatched = 1
				) 
			AS FLOAT)/
			CAST
			(
				(
					SELECT COUNT(*)
					FROM Split1
				) 
			AS FLOAT)
		) * 100
		AS DECIMAL(18,2)
	) AS [% Match]


	DROP TABLE Split1
	DROP TABLE Split2

	RETURN 

END


It takes two strings, splits them on the basis of space into separate words. It then takes the first string and for each word of the first string that matches with the second string, it turns the 'isMatched' on. After all words have been checked, it counts the percentage of mapping as:

Total Words Matched/Total Words(in main/first string) * 100.

I ran the procedure with sample strings and it returned the following result: 




Total words in first string = 12
Total words in the second string that match with the first string = 7
% Match = (7/12)*100 = 58.33


PS: SplitString is a function that takes a string and a delimiter and returns a table by separating the string into rows based on the given delimiter.

For my actual problem, I looped through the table with string 1 as the string to be matched and string 2 from each row of the table and got the string that was matching the most with my original string.

Comments