PHP Classes

PHP Similar Text Percentage: Compare two strings to compute a similarity score

Ratings: 69% (4 stars)
Unique user downloads: 417 total
Download rankings: all time 6,427; this week 47 (up)
Version: similar-text 4.0.0
License: MIT/X Consortium
PHP version: ...5
Categories: Algorithms, PHP 5, Text processing
Description 

This class can compare two strings to compute a similarity score.

It takes the text of two strings and analyzes them using pure PHP code to evaluate how equal they are.

The class returns a percentage that indicates how similar the two strings are.

It can detect similarity in these cases: reordered words, differences in white space and punctuation, removed or added words, stripped URLs, words replaced by acronyms or acronyms expanded into the original words, words compared by their stems, substrings and superstrings, words replaced by abbreviations, and anagrams.
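To illustrate why handling word order, white space and punctuation matters, here is a minimal sketch built only on PHP's built-in `similar_text()`. It is not the package's algorithm. The helper names `normalizeWords()` and `builtinPercent()` are hypothetical and exist only for this illustration.

```php
<?php
// Illustrative sketch only: NOT the package's implementation.
// It shows how normalizing case, punctuation, white space and word order
// changes the score reported by PHP's built-in similar_text().

/**
 * Hypothetical helper: lowercases the text, replaces punctuation with
 * spaces, splits on white space and sorts the words alphabetically.
 */
function normalizeWords($text)
{
    $text  = strtolower($text);
    $text  = preg_replace('/[^\p{L}\p{N}\s]+/u', ' ', $text);
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    sort($words, SORT_STRING);

    return implode(' ', $words);
}

/**
 * Hypothetical helper: returns the percentage computed by similar_text().
 */
function builtinPercent($a, $b)
{
    similar_text($a, $b, $percent); // $percent is filled by reference

    return $percent;
}

$a = 'Mum, do you want to cook something?';
$b = 'do you want to cook something mum';

$raw        = builtinPercent($a, $b);
$normalized = builtinPercent(normalizeWords($a), normalizeWords($b));

// The raw score stays below 100 because the word order and punctuation
// differ; after normalization both strings contain the same sorted words.
printf("raw: %.1f%%, normalized: %.1f%%\n", $raw, $normalized);
```

The package covers many more cases (acronyms, stemming, abbreviations and so on) that a simple normalization like this does not address.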

Innovation Award
PHP Programming Innovation award nominee
April 2018
Number 6
PHP comes with built-in functions for comparing strings and determining how similar they are.

This package provides a pure PHP solution that works in a more sophisticated way by comparing text on a sentence basis rather than word by word.

Manuel Lemos
Name: zinsou A.A.E.Moïse <contact>
Classes: 50 packages by
Country: Benin
Innovation award
Nominee: 23x

Winner: 2x

Recommendations

Check similarities between text files
I want to check different text documents to find similarities

Details

PHP Similar Text Percentage: Compare two strings to compute a similarity score
==============================================================================
[![Build Status](https://travis-ci.org/manuwhat/similar-text.svg?branch=master)](https://travis-ci.org/manuwhat/similar-text) [![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/manuwhat/similar-text/badges/quality-score.png?b=master)](https://scrutinizer-ci.com/g/manuwhat/similar-text/?branch=master) [![Build Status](https://scrutinizer-ci.com/g/manuwhat/similar-text/badges/build.png?b=master)](https://scrutinizer-ci.com/g/manuwhat/similar-text/build-status/master) [![Code Intelligence Status](https://scrutinizer-ci.com/g/manuwhat/similar-text/badges/code-intelligence.svg?b=master)](https://scrutinizer-ci.com/code-intelligence)

### A library that helps compare two strings to compute a similarity score and get stats on how closely the strings are related.

**Requires**: PHP 5.3+

### What exactly does this library do?

This library can compare two strings to compute a similarity score.

It takes the text of two strings and analyzes them using pure PHP code to evaluate how equal they are.

The class returns a percentage that indicates how similar the two strings are.

Based on the stats it provides, it can help detect similarity even in these cases: WORD REORDER, WHITESPACE AND PUNCTUATION, REMOVE WORDS, ADD WORDS, URL STRIPPING, FORM ACRONYM, EXPAND ACRONYM, STEMMING, SUBSTRING, SUPERSTRING, ABBREVIATION, ANAGRAM.

### How to use it

Require the library by issuing this command:

```bash
composer require manuwhat/similar-text
```

Add `require 'vendor/autoload.php';` to the top of your script.

Next, use it in your script, just like this:

```php
// Class name taken from the anagram example below
use Ezama\similar_text;

100.0 === similar_text::similarText('qwerty', 'ytrewq');
```

This is an example of how to use the stats to check a special case. Here we use them to check for anagrams (note that this is already implemented in the library; see the file similar_text.php to learn about all available implementations):

```php
function areAnagrams($a, $b)
{
    // The stats are read from $check after the call
    return Ezama\similar_text::similarText($a, $b, 2, true, $check)
        ? $check['similar'] === 100.0 && $check['contain'] === true
        : false;
}

areAnagrams('qwerty', 'ytrewq'); // returns true
```

Nb: some functions and methods are more subtle than one might think. For example, the method simpleCommonTextSimilarities::aIsSuperStringOfB and its helper aIsSuperStringOfB are not equivalent to the usual checks built on top of preg_match, stripos and similar PHP functions. A simple example:

```php
function aIsSuperStringOfB_stripos($a, $b)
{
    return false !== stripos($a, $b);
}

function aIsSuperStringOfB_PCRE($a, $b)
{
    // Pass the delimiter to preg_quote() so that a '#' in $b is escaped too
    return 1 === preg_match('#' . preg_quote($b, '#') . '#i', $a);
}

require './vendor/manuwhat/similar-text/similar_text.php';

aIsSuperStringOfB('mum do you want to cook something', 'do you cook something mum'); // returns true
aIsSuperStringOfB_stripos('mum do you want to cook something', 'do you cook something mum'); // returns false
aIsSuperStringOfB_PCRE('mum do you want to cook something', 'do you cook something mum'); // returns false
```

### How to run unit tests

```bash
phpunit ./tests
```
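One plausible reading of the superstring example above is that the comparison works on words rather than on one contiguous substring. The following sketch shows that idea under that assumption. It is not the library's code, and the names `wordCounts()` and `containsAllWords()` are hypothetical.

```php
<?php
// Hypothetical sketch, NOT the library's implementation: A is treated as a
// "superstring" of B when every word of B occurs in A at least as often.

/**
 * Hypothetical helper: maps each lowercase word to its number of occurrences.
 */
function wordCounts($text)
{
    $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);

    return array_count_values($words);
}

/**
 * Hypothetical helper: true when A contains all the words of B,
 * regardless of their order.
 */
function containsAllWords($a, $b)
{
    $countsA = wordCounts($a);

    foreach (wordCounts($b) as $word => $needed) {
        if (!isset($countsA[$word]) || $countsA[$word] < $needed) {
            return false;
        }
    }

    return true;
}

$a = 'mum do you want to cook something';
$b = 'do you cook something mum';

var_dump(containsAllWords($a, $b));   // bool(true): every word of $b is in $a
var_dump(false !== stripos($a, $b));  // bool(false): $b is not a contiguous substring
```

A contiguous search such as `stripos()` fails here because the words of the second string appear in a different order and with other words in between.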

Files (17)
File                Role   Description
src (9 files)
tests (1 file)
.travis.yml         Data   Auxiliary data
composer.json       Data   Auxiliary data
LICENSE             Lic.   License text
phpunit.xml         Data   Auxiliary data
README.md           Doc.   Documentation
readme.txt          Doc.   readme
similar_text.php    Aux.   Auxiliary script

Files (17) / src
File                                       Role    Description
complexCommonTextSimilarities.php          Class   Class source
complexCommonTextSimilaritiesHelper.php    Class   Distance algorithms (see note)
diceDistance.php                           Class   Distance algorithms (see note)
distance.php                               Class   Distance algorithms (see note)
hammingDistance.php                        Class   Distance algorithms (see note)
jaroWinklerDistance.php                    Class   Distance algorithms (see note)
levenshteinDistance.php                    Class   Distance algorithms (see note)
similar_text.php                           Class   Class source
simpleCommonTextSimilarities.php           Class   Class source

Note: these classes implement common distance algorithms with some custom behavior, so they may not perform as well as the originals: Levenshtein without string length limit, Damerau-Levenshtein, Dice, Hamming and Jaro-Winkler. Existing methods were also improved.
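For context on "levenshtein without string length limit": before PHP 8.0, the built-in `levenshtein()` returned -1 when either argument was longer than 255 characters. The sketch below shows the textbook two-row dynamic programming approach in pure PHP, which has no such limit. It is not the code from levenshteinDistance.php, the function name `levenshteinUnlimited()` is hypothetical, and it compares bytes rather than multibyte characters.

```php
<?php
// Textbook sketch, NOT the package's levenshteinDistance.php:
// byte-based Levenshtein distance without any string length limit.

/**
 * Hypothetical helper: returns the minimum number of single-byte
 * insertions, deletions and substitutions needed to turn $a into $b.
 */
function levenshteinUnlimited($a, $b)
{
    $lenA = strlen($a);
    $lenB = strlen($b);

    if ($lenA === 0) {
        return $lenB;
    }
    if ($lenB === 0) {
        return $lenA;
    }

    // $previous[$j] holds the distance between the first $i - 1 bytes
    // of $a and the first $j bytes of $b.
    $previous = range(0, $lenB);

    for ($i = 1; $i <= $lenA; $i++) {
        $current = array($i);

        for ($j = 1; $j <= $lenB; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;

            $current[$j] = min(
                $previous[$j] + 1,         // deletion
                $current[$j - 1] + 1,      // insertion
                $previous[$j - 1] + $cost  // substitution
            );
        }

        $previous = $current;
    }

    return $previous[$lenB];
}

echo levenshteinUnlimited('kitten', 'sitting'), "\n"; // 3
```

Keeping only two rows of the distance matrix limits memory use to the length of the second string, which matters once inputs grow beyond a few hundred characters.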

Files (17) / tests
File                    Role    Description
Similar_textTest.php    Class   Class source

The PHP Classes site has supported package installation using the Composer tool since 2013, as described on its instructions page.
Version control: 94%
Unique user downloads: total 417; this week 0
Download rankings: all time 6,427; this week 47 (up)

User ratings (all time)
Utility: 100%
Consistency: 100%
Documentation: 91%
Examples: -
Tests: -
Videos: -
Overall: 69%
Rank: 356