PHP Classes

similar_text

Subject:similar_text
Summary:Why not use the built-in function?
Messages:2
Author:Lee McAuley
Date:2018-05-25 23:52:39
 

  1. similar_text
Lee McAuley - 2018-05-25 23:52:39
Hello, I really appreciate the work I've seen you and other people contribute to this site. But I have a question: have you checked out your code against the built-in "similar_text" function of PHP? It looks like you went to a lot of trouble and work to recreate it. It's very accurate in the cases I've used it in -- and those were Enterprise-level applications that did not have much tolerance for duplicate entries in some areas.

Here is an example I've used to prevent duplicate registrations by people from the same company:

// $pct is filled in by reference with the similarity as a percentage
$match = similar_text($item_on_form, $item_on_dbase, $pct);
if (round($pct, 0) > 85) {
    // then you've got a duplicate, don't register!
}

My use of 85% is, of course, arbitrary. I could tighten it up to 90% or above if needed.
Let me know your thoughts, please.
Thanks

  2. Re: similar_text
zinsou A.A.E.Moïse - 2018-05-26 05:40:44 - In reply to message 1 from Lee McAuley
Hi

Thanks for the feedback, I really appreciate it. First, I want to reassure you that your way of doing the validation is good. PHP's built-in similar_text function is really accurate, and my goal was not to replace it. However, people keep asking for other string comparison functions. Some of them want to highlight the differences between two strings, for example to show which letters are missing from one string but present in the other, and vice versa. Others just want to know how similar two strings are, and not only the number of similar letters or the percentage of similarity. E.g.:

var_dump(similar_text('man is a joker', 'joker is a man', $p), $p); // prints 6 and the percentage 42.857142857143

According to the similar_text documentation, the returned value is the number of similar letters, but as you can see here, this value doesn't really reflect how similar the two strings are. My approach to similarity is to give the user all the details they may need to say how linked or similar the two texts are.
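To make the numbers above concrete, here is a minimal sketch of where the percentage comes from. It relies on the documented behaviour of similar_text: the percentage is the number of matching characters divided by the average length of the two strings, times 100. The variable names are only illustrative.

```php
<?php
$a = 'man is a joker';
$b = 'joker is a man';

// Number of matching characters; $pct is filled in by reference.
$matched = similar_text($a, $b, $pct);

// Recompute the percentage by hand from the documented formula.
$expected = $matched / ((strlen($a) + strlen($b)) / 2) * 100;

printf("%d matching characters, %.2f%%\n", $matched, $pct);
printf("recomputed by hand: %.2f%%\n", $expected);
```

Only the shared run " is a " (6 characters) is counted here, because "man" and "joker" sit on opposite sides of it in the two strings. That is why the result is 6 out of 14 even though both strings are made of exactly the same letters.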

My code will print something like:
percentage similarity: 100
percentage contains: 100
boolean really contains: false
array letters not in string 0: []
array letters not in string 1: []

These details lead to the conclusion that the two strings are anagrams, or something close to that.
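As an illustration only, the kind of letter-level details listed above could be approximated with PHP's built-in count_chars. This is a rough sketch and not the actual code of the class: the function name compare_characters and the array keys are hypothetical, and it works on bytes, so multibyte text would need different handling.

```php
<?php
// Hypothetical helper, NOT the class's real implementation.
function compare_characters(string $a, string $b): array
{
    // Mode 1: byte value => number of occurrences, only for bytes that occur.
    $countsA = count_chars($a, 1);
    $countsB = count_chars($b, 1);

    // Characters that $from has more of than $in, with the shortfall.
    $missing = static function (array $from, array $in): array {
        $out = [];
        foreach ($from as $byte => $n) {
            $short = $n - ($in[$byte] ?? 0);
            if ($short > 0) {
                $out[chr($byte)] = $short;
            }
        }
        return $out;
    };

    return [
        'not_in_a'        => $missing($countsB, $countsA),
        'not_in_b'        => $missing($countsA, $countsB),
        // Same characters with the same counts: the strings are anagrams.
        'same_characters' => $countsA == $countsB,
        // Literal substring test (an assumption about what "really contains" means).
        'a_contains_b'    => $b !== '' && strpos($a, $b) !== false,
    ];
}

$result = compare_characters('man is a joker', 'joker is a man');
var_dump($result['same_characters']); // bool(true)
var_dump($result['a_contains_b']);    // bool(false)
```

For the two example strings both "not in" arrays come back empty, which matches the anagram conclusion, while similar_text alone reports only about 43%.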

So the only real differences come down to these two points. First, the approach is to determine how similar the shorter string is to the longer one, and second, to give enough data to let the user classify this similarity into a category...