PHP Classes

File: README.md

Role: Documentation
Content type: text/markdown
Description: Documentation
Class: PHP Text Language Detection Library
Detect the language of a given text string
Date: 7 years ago
Size: 3,576 bytes
 

language_detection

Detects the language of a given text. To do that, it generates a language profile based on N-grams for every file in the etc directory. It then generates the same kind of profile for the unknown text and compares it against the previously generated language profiles.
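To make the idea of N-gram profiles more concrete, here is a minimal sketch of one common way to build and compare them (ranked N-grams with an "out-of-place" distance). This README does not say how the library builds or compares its profiles, so treat this only as an illustration: the function names `ngramProfile` and `outOfPlaceDistance`, the N-gram sizes, and the profile limit are all assumptions and not part of the library's API.

```php
<?php
declare(strict_types=1);

/**
 * Build a ranked N-gram profile: most frequent N-grams first.
 * Hypothetical helper for illustration, not the library's implementation.
 */
function ngramProfile(string $text, int $minN = 1, int $maxN = 3, int $limit = 300): array
{
    $counts = [];
    // Split on whitespace so that N-grams never span two words.
    $words = preg_split('/\s+/u', mb_strtolower($text, 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $padded = '_' . $word . '_'; // mark word boundaries
        $length = mb_strlen($padded, 'UTF-8');
        for ($n = $minN; $n <= $maxN; $n++) {
            for ($i = 0; $i + $n <= $length; $i++) {
                $gram = mb_substr($padded, $i, $n, 'UTF-8');
                $counts[$gram] = ($counts[$gram] ?? 0) + 1;
            }
        }
    }
    arsort($counts); // highest count first
    // Numeric-looking keys become integers in PHP arrays, so cast back to strings.
    return array_map('strval', array_keys(array_slice($counts, 0, $limit, true)));
}

/**
 * Sum of rank differences between two profiles. N-grams missing from
 * the known profile get the maximum penalty. Lower means more similar.
 */
function outOfPlaceDistance(array $unknown, array $known): int
{
    $rankInKnown = array_flip($known); // N-gram => rank
    $maxPenalty = count($known);
    $distance = 0;
    foreach ($unknown as $rank => $gram) {
        $distance += isset($rankInKnown[$gram])
            ? abs($rank - $rankInKnown[$gram])
            : $maxPenalty;
    }
    return $distance;
}

// Toy "training data": one short sample per language (assumed for this sketch).
$profiles = [
    'de' => ngramProfile('Das ist ein deutscher Satz und noch ein Satz.'),
    'en' => ngramProfile('This is an English sentence and another sentence.'),
];

$unknown = ngramProfile('Ein kurzer deutscher Satz.');

// Print the distance to each known profile; the smallest distance is the best match.
foreach ($profiles as $lang => $profile) {
    echo $lang, ': ', outOfPlaceDistance($unknown, $profile), PHP_EOL;
}
```

With real training files the profiles would be much larger, which is why longer input texts tend to give more reliable results than single short sentences.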

Requirements:

The only requirement is PHP 7.1 or later.

Note: language_detection requires the Multibyte String extension (mbstring) in order to work.
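If you want to verify these requirements from a script before installing, a small check along the following lines may help. This is only a sketch: the function name `missingRequirements` is made up for illustration and is not provided by the library.

```php
<?php
declare(strict_types=1);

/**
 * Return a list of unmet requirements (empty if everything is fine).
 * Hypothetical helper, not part of language_detection.
 */
function missingRequirements(): array
{
    $problems = [];
    if (version_compare(PHP_VERSION, '7.1.0', '<')) {
        $problems[] = 'PHP 7.1 or later is required, found ' . PHP_VERSION;
    }
    if (!extension_loaded('mbstring')) {
        $problems[] = 'The Multibyte String (mbstring) extension is not loaded';
    }
    return $problems;
}

$problems = missingRequirements();
if ($problems === []) {
    echo 'All requirements are met.', PHP_EOL;
} else {
    foreach ($problems as $problem) {
        echo $problem, PHP_EOL;
    }
}
```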

Install via Composer

composer require patrick-schur/language-detection

Or add the following to composer.json:

{
  "require": {
     "patrick-schur/language-detection": "*"
  }
}

Basic Usage

Before we can recognize the language of a given text, we have to generate a language profile for each language. The library ships with a pre-trained language profile (etc/_langs.json). You can also add new files to etc or change existing ones.

First we have to generate a language profile.

<?php
require_once 'vendor/autoload.php';

use LanguageDetector\Trainer;

$t = new Trainer();

// Generate the language profile from the files in etc.
$t->learn();

Once we have our language profile, we can classify texts by language. For reliable detection, the input text should be at least a few sentences long.

<?php
require_once 'vendor/autoload.php';

use LanguageDetector\LanguageDetector;

$ld = new LanguageDetector();

var_dump($ld->detect('Das ist ein deutscher Satz.')); // de

Supported languages:

73 languages are supported so far. If your language is not supported, feel free to add your own language files.

  • ab (abkhaz)
  • af (afrikaans)
  • am (amharic)
  • ar (arabic)
  • az (azerbaijani)
  • be (belarusian)
  • bg (bulgarian)
  • bn (bengali)
  • co (corsican)
  • cs (czech)
  • cy (welsh)
  • de (german)
  • dk (danish)
  • el (greek)
  • en (english)
  • eo (esperanto)
  • es (spanish)
  • et (estonian)
  • eu (basque)
  • fa (persian)
  • fi (finnish)
  • fj (fijian)
  • fo (faroese)
  • fr (french)
  • ga (irish)
  • gd (scottish)
  • gl (galician)
  • gn (guarani)
  • ha (hausa)
  • he (hebrew)
  • hi (hindi)
  • hr (croatian)
  • hu (hungarian)
  • hy (armenian)
  • ia (interlingua)
  • ig (igbo)
  • io (ido)
  • is (icelandic)
  • it (italian)
  • iu (inuktitut)
  • jp (japanese)
  • jv (javanese)
  • ka (georgian)
  • ko (korean)
  • ku (kurdish)
  • la (latin)
  • lg (ganda)
  • lo (lao)
  • lt (lithuanian)
  • lv (latvian)
  • mh (marshallese)
  • mn (mongolian)
  • ms (malay)
  • mt (maltese)
  • nl (dutch)
  • no (norwegian)
  • nv (navajo)
  • pl (polish)
  • pt (portuguese)
  • ro (romanian)
  • ru (russian)
  • sk (slovak)
  • sl (slovene)
  • so (somali)
  • sv (swedish)
  • th (thai)
  • tr (turkish)
  • ty (tahitian)
  • ug (uyghur)
  • uk (ukrainian)
  • uz (uzbek)
  • vi (vietnamese)
  • zh (chinese)