PHP Classes

Fast Chinese Word Segmentation: Segment Chinese text using the RMM approach

Last Updated: 2005-08-10 (11 years ago)
Ratings: Not yet rated by the users
Unique User Downloads: Total: 608, This week: 0
Download Rankings: All time: 4,905, This week: 862
Version: fcws 1.0
License: Free for non-comm...
Categories: Text processing
Description

This package is intended mainly for applications used in China.

This class can segment Chinese text.

It uses the RMM (reverse maximum match) approach, so it may make some segmentation mistakes that cannot be entirely avoided.
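As an illustration of the general reverse maximum match idea, here is a minimal sketch in Python. It is not the package's PHP code: the function name `rmm_segment`, the `max_len` parameter, and the tiny `DICTIONARY` are all assumptions made up for this example. The text is scanned from right to left, and at each step the longest dictionary word ending at the current position is taken, with a single character as the fallback.

```python
# Hypothetical toy dictionary, for illustration only.
DICTIONARY = {"研究", "研究生", "生命", "的", "起源"}


def rmm_segment(text, dictionary, max_len=4):
    """Segment text right-to-left, always taking the longest dictionary match."""
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end` first, then shorter ones.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as the fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    # Words were collected from the end of the text, so restore reading order.
    words.reverse()
    return words


print(rmm_segment("研究生命的起源", DICTIONARY))
# ['研究', '生命', '的', '起源']
```

Because the match is purely greedy, the result depends entirely on the dictionary and on the scan direction, which is why a matcher of this kind can still pick the wrong split for ambiguous text.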

It also handles English text, but only in a very simple way.

Innovation Award
PHP Programming Innovation award nominee: July 2005, Number 9
Chinese is becoming more and more relevant on the Internet due to the growth of the Chinese economy. This growth is making it possible for many Chinese-speaking people to become Internet users.

Chinese text is written as a sequence of individual symbols (characters), with no spaces between words. Certain encodings also include ASCII characters, allowing words in other languages to be mixed into Chinese documents.

This class provides a way to break up Chinese text that avoids splitting English words that may be mixed in with the Chinese symbols.

Manuel Lemos
Name: Wudi <contact>
Classes: 5 packages by
Country: China
Innovation award
Nominee: 2x

Screenshots  
  • screenshot.png
Files
File                   Role     Description
cwordseg_fast.lib.php  Class    Class
Readme_CN.htm          Doc.     Readme (Chinese)
Readme_EN.htm          Doc.     Readme (English)
test.php               Example  Test
