Last Updated: 2020-10-20 | Ratings: Not enough user ratings | Unique User Downloads: Total: 102 | Download Rankings: All time: 9,398, This week: 214
Description
This class can identify the predominant character set in a string.
It takes a UTF-8 string and analyzes its character codes to determine which character set predominates, based on the frequency of characters typical of certain languages.
Currently it can identify the following character sets: Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic, Devanagari, Bengali, Gujarati, Tamil, Malayalam, Sinhala, Thai, Lao, Tibetan, Burmese, Georgian, Korean, Khmer, Japanese, and CJK.
Innovation Award
June 2017
Number 6
A string in Unicode may contain text in multiple character sets.
This class can identify the predominant character set in a string that may mix many character sets.
Manuel Lemos
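The frequency-based identification described above can be pictured with a minimal sketch. It is not the package's code: the function name `predominantScript` is hypothetical, only a subset of scripts is listed, and it swaps in PCRE Unicode script properties (`\p{Latin}`, `\p{Cyrillic}`, and so on) rather than whatever character-code tables the class actually uses.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): returns the label of the
 * script with the most characters in $text, or null if no listed script
 * occurs or the input is not valid UTF-8.
 */
function predominantScript(string $text): ?string
{
    // Label => PCRE Unicode script property. Subset chosen for illustration.
    $scripts = [
        'LATIN'    => '\p{Latin}',
        'GREEK'    => '\p{Greek}',
        'CYRILLIC' => '\p{Cyrillic}',
        'HEBREW'   => '\p{Hebrew}',
        'ARABIC'   => '\p{Arabic}',
        'THAI'     => '\p{Thai}',
    ];

    $best = null;
    $bestCount = 0;
    foreach ($scripts as $name => $property) {
        // With the /u modifier, preg_match_all() counts matching code points.
        $count = preg_match_all('/' . $property . '/u', $text);
        if ($count === false) {
            return null; // the input was not valid UTF-8
        }
        if ($count > $bestCount) {
            $best = $name;
            $bestCount = $count;
        }
    }
    return $best;
}

// Non-ASCII samples are written as \u{...} escapes so the file stays ASCII.
echo predominantScript('Lex iniusta non est lex.'), "\n";                        // LATIN
echo predominantScript("abc \u{041F}\u{0440}\u{0438}\u{0432}\u{0435}\u{0442}"), "\n"; // CYRILLIC
```

Digits, spaces and punctuation belong to no listed script, so they do not influence the result; a string made only of such characters yields null in this sketch.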
Innovation award
Nominee: 23x
Winner: 2x
Details
Charset From String
Identifies the predominant script (charset, language) in a string. This library is capable of identifying:
<pre>
Arabic
Armenian
Bengali
Burmese
CJK
Cyrillic
Devanagari
Georgian
Greek
Gujarati
Hebrew
Japanese
Khmer
Korean
Lao
Latin
Malayalam
Sinhala
Tamil
Thai
Tibetan
</pre>
Usage
use peterkahl\CharsetFromString\CharsetFromString;
echo CharsetFromString::getCharset('????? ????? ?? ???????')."\n"; # ARABIC
echo CharsetFromString::getCharset('????? ????? ?? ????-??')."\n"; # HEBREW
echo CharsetFromString::getCharset('??? ?????? ?????? ??????, ??? ??? ?? ?????.')."\n"; # CYRILLIC
echo CharsetFromString::getCharset('Lex iniusta non est lex.')."\n"; # LATIN
echo CharsetFromString::getCharset('??? ??? ?? ??? ??? ???? ??.')."\n"; # KOREAN
echo CharsetFromString::getCharset('??????????????')."\n"; # JAPANESE
echo CharsetFromString::getCharset('??????????')."\n"; # CJK
echo CharsetFromString::getCharset('??????????????? ??????????????')."\n"; # THAI
echo CharsetFromString::getCharset('????????????????????????????????????? ????')."\n"; # LAO
echo CharsetFromString::getCharset('?????????????????????????????????????????????')."\n"; # KHMER
echo CharsetFromString::getCharset('???????????????????????????????')."\n"; # TIBETAN
echo CharsetFromString::getCharset('? ????? ???????? ??? ???????? ?????? ????, ???? ???????? ???? ??? ??????.')."\n"; # GREEK
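The usage above reports JAPANESE and CJK as separate results. How the class separates them is not stated here; one plausible heuristic, shown purely as an assumption, is that any kana marks a string as Japanese while Han-only text stays CJK. The function name `classifyHanText` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical heuristic (an assumption, not the package's documented logic):
 * text containing any Hiragana or Katakana is labelled JAPANESE,
 * Han-only text is labelled CJK, anything else yields null.
 */
function classifyHanText(string $text): ?string
{
    $kana = preg_match_all('/[\p{Hiragana}\p{Katakana}]/u', $text);
    $han  = preg_match_all('/\p{Han}/u', $text);
    if ($kana === false || $han === false) {
        return null; // the input was not valid UTF-8
    }
    if ($kana > 0) {
        return 'JAPANESE';
    }
    return $han > 0 ? 'CJK' : null;
}

// Samples are written as \u{...} escapes so the file stays ASCII.
echo classifyHanText("\u{3053}\u{3093}\u{306B}\u{3061}\u{306F}"), "\n"; // JAPANESE (hiragana only)
echo classifyHanText("\u{6F22}\u{5B57}"), "\n";                         // CJK (Han only)
```

A rule like this cannot tell Chinese from Japanese text written entirely in Han characters, which may be why a generic CJK label exists alongside JAPANESE.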