language_detection
Detects the language of a given text.
To do this, it generates an N-gram-based language profile for every file in the `etc` directory.
It then generates a language profile for the unknown text and compares it against the previously generated profiles.
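The README does not spell out how profiles are built or compared. The following is a minimal, hypothetical sketch of the classic approach it resembles, written under these assumptions: character 1- to 3-grams, the 300 most frequent N-grams per profile, and Cavnar and Trenkle's "out-of-place" rank distance. The functions `buildProfile`, `outOfPlaceDistance` and `detectLanguage` are made up for illustration and are not part of the `LanguageDetector` API.

```php
<?php
// Hypothetical sketch of N-gram language identification using the
// "out-of-place" rank distance (Cavnar & Trenkle style). It is NOT the
// language_detection package's own code; all function names are made up.

/**
 * Builds a ranked profile of character 1- to $maxN-grams for $text.
 * Returns [n-gram => rank], where rank 0 is the most frequent n-gram.
 */
function buildProfile(string $text, int $maxN = 3, int $size = 300): array
{
    $counts = [];
    // Lowercase (multibyte-safe) and split into words on non-letters.
    $words = preg_split('/[^\p{L}]+/u', mb_strtolower($text, 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    foreach ($words as $word) {
        // Pad with '_' so word starts and ends show up in the n-grams.
        $chars = preg_split('//u', '_' . $word . '_', -1, PREG_SPLIT_NO_EMPTY);
        $len = count($chars);
        for ($n = 1; $n <= $maxN; $n++) {
            for ($i = 0; $i + $n <= $len; $i++) {
                $gram = implode('', array_slice($chars, $i, $n));
                $counts[$gram] = ($counts[$gram] ?? 0) + 1;
            }
        }
    }

    arsort($counts);                                   // most frequent first
    $top = array_slice(array_keys($counts), 0, $size); // keep the top $size
    return array_flip($top);                           // n-gram => rank
}

/**
 * Sums how far each n-gram of the unknown text is "out of place" in a
 * language profile; n-grams missing from the profile get a maximum penalty.
 */
function outOfPlaceDistance(array $doc, array $lang): int
{
    $maxPenalty = count($lang);
    $distance = 0;
    foreach ($doc as $gram => $rank) {
        $distance += isset($lang[$gram]) ? abs($rank - $lang[$gram]) : $maxPenalty;
    }
    return $distance;
}

/**
 * Returns the key of the language profile closest to the text's own profile.
 */
function detectLanguage(string $text, array $profiles): string
{
    $doc = buildProfile($text);
    $best = '';
    $bestDistance = PHP_INT_MAX;
    foreach ($profiles as $code => $profile) {
        $d = outOfPlaceDistance($doc, $profile);
        if ($d < $bestDistance) {
            $bestDistance = $d;
            $best = (string) $code;
        }
    }
    return $best;
}

// Example usage with two tiny, made-up training texts;
// real profiles need far more text per language.
$profiles = [
    'en' => buildProfile('The quick brown fox jumps over the lazy dog. This is an English sentence.'),
    'de' => buildProfile('Der schnelle braune Fuchs springt über den faulen Hund. Das ist ein deutscher Satz.'),
];
echo detectLanguage('Das ist ein kurzer Satz.', $profiles), PHP_EOL;
```

Ranking N-grams rather than comparing raw counts makes profiles built from corpora of different sizes comparable. The sketch also suggests why very short inputs are unreliable: one short sentence yields only a few dozen N-grams, so its distances to different language profiles may differ only slightly.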
Requirements:
PHP 7.1 or higher.
> Note: language_detection requires the Multibyte String extension in order to work.
Install via Composer
```
composer require patrick-schur/language-detection
```
Or add the following to your composer.json:
```json
{
    "require": {
        "patrick-schur/language-detection": "*"
    }
}
```
Basic Usage
Before we can recognize the language of a given text, we need a language profile for each language.
The package ships with a pre-trained language profile (`etc/_langs.json`).
You can also add new files to `etc` or modify existing ones.
To generate the language profiles from the files in `etc`, run the trainer:
```php
<?php
require_once 'vendor/autoload.php';

use LanguageDetector\Trainer;

// Generate the language profiles from the files in etc/
$t = new Trainer;
$t->learn();
```
Once the language profile exists, we can classify texts by language.
For reliable detection, the input text should be at least a few sentences long.
```php
<?php
require_once 'vendor/autoload.php';

use LanguageDetector\LanguageDetector;

$ld = new LanguageDetector;
var_dump($ld->detect('Das ist ein deutscher Satz.')); // de
```
Supported languages:
It currently supports 73 languages.
If your language is not supported, feel free to add your own language files.
- ab (abkhaz)
- af (afrikaans)
- am (amharic)
- ar (arabic)
- az (azerbaijani)
- be (belarusian)
- bg (bulgarian)
- bn (bengali)
- co (corsican)
- cs (czech)
- cy (welsh)
- de (german)
- dk (danish)
- el (greek)
- en (english)
- eo (esperanto)
- es (spanish)
- et (estonian)
- eu (basque)
- fa (persian)
- fi (finnish)
- fj (fijian)
- fo (faroese)
- fr (french)
- ga (irish)
- gd (scottish)
- gl (galician)
- gn (guarani)
- ha (hausa)
- he (hebrew)
- hi (hindi)
- hr (croatian)
- hu (hungarian)
- hy (armenian)
- ia (interlingua)
- ig (igbo)
- io (ido)
- is (icelandic)
- it (italian)
- iu (inuktitut)
- jp (japanese)
- jv (javanese)
- ka (georgian)
- ko (korean)
- ku (kurdish)
- la (latin)
- lg (ganda)
- lo (lao)
- lt (lithuanian)
- lv (latvian)
- mh (marshallese)
- mn (mongolian)
- ms (malay)
- mt (maltese)
- nl (dutch)
- no (norwegian)
- nv (navajo)
- pl (polish)
- pt (portuguese)
- ro (romanian)
- ru (russian)
- sk (slovak)
- sl (slovene)
- so (somali)
- sv (swedish)
- th (thai)
- tr (turkish)
- ty (tahitian)
- ug (uyghur)
- uk (ukrainian)
- uz (uzbek)
- vi (vietnamese)
- zh (chinese)