At my current workplace, we have a recurring task where we need to analyze data manually and create mappings between two separate sets of string columns. For a long time, it was not a task we had to perform too commonly. However, recently, it became a very recurring task, and performing it manually became a little too tedious, so I decided to automate it.
For instance, if I want to map a string 'quis nostrud laboris nisi ut commodo consequat' and we have the following data set to map to, the closest map is #5: 'quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat'.
Now, going through the whole data set to manually find a mapping for this given string is pretty time-consuming. So, I decided to develop a procedure that would take two strings and returns the percentage of matching between those.
The script does a word-by-word comparison and returns the percentage of words that match.
CREATE PROCEDURE StringMatch (@String1 nvarchar(max), @String2 nvarchar(max))
AS
BEGIN
DECLARE @NoOfMatches INT = 0
DECLARE @i INT = 1,
@count INT
--Replace special characters
SET @String1 = REPLACE(REPLACE(REPLACE(@String1, ',',''), '.', ''), '''','')
SET @String2 = REPLACE(REPLACE(REPLACE(@String2, ',',''), '.', ''), '''','')
CREATE TABLE Split1 (SplittedColumn nvarchar(max), isMatched BIT)
CREATE TABLE Split2 (SplittedColumn nvarchar(max))
INSERT INTO Split1 (SplittedColumn)
SELECT ReferralID
FROM [dbo].[SplitString](@String1, '')
INSERT INTO Split2 (SplittedColumn)
SELECT ReferralID
FROM [dbo].[SplitString](@String2, '')
UPDATE s1
SET isMatched =
(
SELECT COUNT(*)
FROM Split2 s2
WHERE s1.SplittedColumn = s2.SplittedColumn
)
FROM Split1 s1
SELECT
(
SELECT COUNT(*)
FROM Split1
) AS [Total Words],
(
SELECT COUNT(*)
FROM Split1
WHERE isMatched = 1
) [Matched Words],
CAST
(
(
CAST
(
(
SELECT COUNT(*)
FROM Split1
WHERE isMatched = 1
)
AS FLOAT)/
CAST
(
(
SELECT COUNT(*)
FROM Split1
)
AS FLOAT)
) * 100
AS DECIMAL(18,2)
) AS [% Match]
DROP TABLE Split1
DROP TABLE Split2
RETURN
END
Comments
Post a Comment