
Portable UTF-8: Manipulate UTF-8 text strings in pure PHP

Last updated: 2020-02-17
Version: portable-utf8 3.0.106
License: Custom (specified...
PHP version: 5.3
Categories: PHP 5, Text processing

portable-utf8 - github.com

Description

This package can manipulate UTF-8 text strings in pure PHP.

It performs several types of functions to manipulate text strings encoded using UTF-8 that can work even when extensions like mbstring, iconv, or Intl are not available.

If these extensions are available, the class will fall back to using them instead.


Innovation Award
PHP Programming Innovation award nominee
September 2016
Number 3


Prize: One downloadable e-book of choice by O'Reilly
Nowadays PHP has different extensions to manipulate text strings using Unicode UTF-8. However, some of them may not be available in every PHP environment.

This package provides a pure PHP solution to manipulate text in UTF-8, so you do not depend on whether any other extensions are available.

If such extensions are available, the package may fall back to using them for performing the same UTF-8 text manipulation operations.

Manuel Lemos
Name: Lars Moelleken <contact>
Classes: 20 packages
Country: Germany
Innovation award
Nominee: 9x
Winner: 1x

Details


Portable UTF-8

Description

It is written in PHP (PHP 7+) and can work without "mbstring", "iconv" or any other extra encoding php-extension on your server.

The benefit of Portable UTF-8 is that it is easy to use and easy to bundle. The library also auto-detects your server environment and uses the installed php-extensions if they are available, so you get the best possible performance.

As a fallback we will use Symfony Polyfills, if needed. (https://github.com/symfony/polyfill)

The project is based on:

  • Hamid Sarfraz's work - portable-utf8
  • Nicolas Grekas's work - tchwork/utf8
  • Behat's work - Behat/Transliterator
  • Sebastián Grignoli's work - neitanod/forceutf8
  • Ivan Enderlin's work - hoaproject/Ustring
  • and many cherry-picks from "GitHub" gists and "Stack Overflow" snippets

Demo

Here you can test some basic functions from this library and you can compare some results with the native php function results.

Alternative

If you prefer a more object-oriented way to edit strings, take a look at voku/Stringy. It is a fork of "danielstjules/Stringy" that uses the "Portable UTF-8" class and adds some extra methods.

// Standard library
strtoupper('fòôbàř');       // 'FòôBàř'
strlen('fòôbàř');           // 10

// mbstring 
// WARNING: if you don't use a polyfill like "Portable UTF-8", you need to install the php-extension "mbstring" on your server
mb_strtoupper('fòôbàř');    // 'FÒÔBÀŘ'
mb_strlen('fòôbàř');        // 6

// Portable UTF-8
use voku\helper\UTF8;
UTF8::strtoupper('fòôbàř');    // 'FÒÔBÀŘ'
UTF8::strlen('fòôbàř');        // 6

// voku/Stringy
use Stringy\Stringy as S;
$stringy = S::create('fòôbàř');
$stringy->toUpperCase();    // 'FÒÔBÀŘ'
$stringy->length();         // 6
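
The gap between 10 and 6 in the comparison above comes from strlen() counting bytes while the UTF-8 aware functions count characters. The following is a minimal sketch of that difference using only PCRE, which the library lists as its sole hard requirement. The helper name count_code_points() is hypothetical and is not part of Portable UTF-8; the library's own implementation may work differently.

```php
<?php
// Hypothetical helper (not the library's code): counts UTF-8 characters
// with PCRE only, so it works without mbstring, iconv or intl.
function count_code_points(string $str): int
{
    // With the "u" modifier "." matches one whole UTF-8 character;
    // the "s" modifier lets it match newlines as well.
    $count = preg_match_all('/./us', $str);
    if ($count === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $count;
}

// f, o-grave, o-circumflex, b, a-grave, r-caron: 2 ASCII + 4 two-byte characters
$text = "f\u{F2}\u{F4}b\u{E0}\u{159}";

echo strlen($text), PHP_EOL;            // 10 (bytes)
echo count_code_points($text), PHP_EOL; // 6 (characters)
```

The sketch throws on invalid UTF-8 instead of guessing, because PCRE refuses to match malformed input when the "u" modifier is set.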

Install "Portable UTF-8" via "composer require"

composer require voku/portable-utf8

If your project does not need some of the Symfony polyfills, please use the replace section of your composer.json. This removes any overhead from these polyfills, as they are no longer part of your project, e.g.:

{
  "replace": {
    "symfony/polyfill-php72": "1.99",
    "symfony/polyfill-iconv": "1.99",
    "symfony/polyfill-intl-grapheme": "1.99",
    "symfony/polyfill-intl-normalizer": "1.99",
    "symfony/polyfill-mbstring": "1.99"
  }
}

Why Portable UTF-8?

PHP 5 and earlier versions have no native Unicode support. To bridge the gap, there exist several extensions like "mbstring", "iconv" and "intl".

The problem with "mbstring" and the others is that you usually cannot guarantee that a specific one is present on a server. If you rely on one of them, your application is no longer portable. The problem is even more severe for open source applications that have to run on different servers with different configurations. With this in mind, I decided to write a library:

Requirements and Recommendations

  • No extensions are required to run this library. Portable UTF-8 only needs the PCRE library, which is available by default since PHP 4.2.0 and cannot be disabled since PHP 5.3.0. Support for the "/u" modifier in PCRE for UTF-8 handling is not a must.
  • PHP 5.3 is the minimum requirement for versions of Portable UTF-8 before 4.0, and all later PHP versions are fine with them.
  • PHP 7.0 is the minimum requirement since version 4.0 of Portable UTF-8; on older PHP versions composer will install an older version of the library.
  • To speed up string handling, it is recommended that you have "mbstring" or "iconv" available on your server, as well as the latest version of the PCRE library.
  • Although Portable UTF-8 is easy to use, moving from the native API to Portable UTF-8 may not be straightforward for everyone. It is highly recommended that you do not update your scripts to include Portable UTF-8, or replace or change anything, before you know the reason and the consequences. Most of the time, some native function may be all that you need.
  • There is also a shim for "mbstring", "iconv" and "intl", so you can use it also on shared webspace.
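
Since the recommendations above depend on what the server provides, it can help to check the environment before deciding what to rely on. The following is a minimal sketch using only PHP's built-in extension_loaded() and preg_match(). The function name utf8_environment_report() is hypothetical and is not part of Portable UTF-8, and the library's own auto-detection may work differently.

```php
<?php
// Hypothetical helper (not part of Portable UTF-8): reports which of the
// optional extensions named above are loaded on this server.
function utf8_environment_report(): array
{
    $report = [];
    foreach (['mbstring', 'iconv', 'intl'] as $extension) {
        $report[$extension] = extension_loaded($extension);
    }

    // PCRE itself is always present since PHP 5.3; this checks whether
    // the "u" modifier (UTF-8 mode) is usable. "\xC3\xB1" is one
    // two-byte UTF-8 character.
    $report['pcre_utf8'] = @preg_match('/^.$/u', "\xC3\xB1") === 1;

    return $report;
}

foreach (utf8_environment_report() as $name => $available) {
    echo $name, ': ', $available ? 'yes' : 'no', PHP_EOL;
}
```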

Info

Since version 5.4.26 this library will NOT force "UTF-8" by "bootstrap.php" anymore. If you need to enable this behavior you can define "PORTABLE_UTF8__ENABLE_AUTO_FILTER", before requiring the autoloader.

define('PORTABLE_UTF8__ENABLE_AUTO_FILTER', 1);

Before version 5.4.26 this behavior was enabled by default and you could disable it via "PORTABLE_UTF8__DISABLE_AUTO_FILTER", but the code had potential security vulnerabilities: code could be injected while redirecting via `header('Location ...`. This is why I decided to ship this BC break in a bug fix release, so that everybody using the current version receives the security fix.

Usage

Example 1: UTF8::cleanup()

  echo UTF8::cleanup('?Düsseldorf?');
  
  // will output:
  // Düsseldorf

Example 2: UTF8::strlen()

  $string = 'string <strong>with utf-8 chars åèä</strong> - doo-bee doo-bee dooh';

  echo strlen($string) . "\n<br />";
  echo UTF8::strlen($string) . "\n<br />";

  // will output:
  // 70
  // 67

  $string_test1 = strip_tags($string);
  $string_test2 = UTF8::strip_tags($string);

  echo strlen($string_test1) . "\n<br />";
  echo UTF8::strlen($string_test2) . "\n<br />";

  // will output:
  // 53
  // 50

Example 3: UTF8::fix_utf8()


  echo UTF8::fix_utf8('DÃ¼sseldorf');
  echo UTF8::fix_utf8('Ã¤');
  
  // will output:
  // Düsseldorf
  // ä

Portable UTF-8 | API

The API from the "UTF8"-Class is written as small static methods that will match the default PHP-API.

Class methods

access(string $str, int $pos)

Return the character at the specified position: $str[1] like functionality.

UTF8::access('fòô', 1); // 'ò'

add_bom_to_string(string $str)

Prepends UTF-8 BOM character to the string and returns the whole string.

If the BOM already exists there, the input string is returned unchanged.

UTF8::add_bom_to_string('fòô'); // "\xEF\xBB\xBF" . 'fòô'
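
The behavior described above (prepend the BOM unless it is already there) can be sketched with plain byte comparison. This is an illustrative stand-in under that assumption, not the library's implementation; the names UTF8_BOM and prepend_utf8_bom() are hypothetical.

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical stand-in for the behavior described above.
function prepend_utf8_bom(string $str): string
{
    // Leave the string untouched when it already starts with the BOM bytes.
    if (strncmp($str, UTF8_BOM, 3) === 0) {
        return $str;
    }

    return UTF8_BOM . $str;
}

$once  = prepend_utf8_bom('foo');
$twice = prepend_utf8_bom($once);

var_dump($once === $twice); // bool(true): the BOM is never added twice
echo bin2hex($once), PHP_EOL; // efbbbf666f6f
```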

binary_to_str(mixed $bin)

Convert binary into a string.

opposite: UTF8::str_to_binary()

UTF8::binary_to_str('11110000100111111001100010000011'); // '😃'

bom()

Returns the UTF-8 Byte Order Mark Character.

UTF8::bom(); // "\xEF\xBB\xBF"

chr(int $code_point) : string

Generates a UTF-8 encoded character from the given code point.

opposite: UTF8::ord()

UTF8::chr(0x2603); // '☃'
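
To show what generating a UTF-8 character from a code point involves, here is a minimal sketch of the standard UTF-8 bit layout written with native chr() and bit operations. The function name code_point_to_utf8() is hypothetical; this is not the library's implementation, which may delegate to an extension when one is available.

```php
<?php
// Illustrative encoder for a single Unicode code point (not the library's code).
function code_point_to_utf8(int $cp): string
{
    // Reject values outside the Unicode range and the surrogate block.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException('Not a valid Unicode scalar value.');
    }
    if ($cp < 0x80) {       // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {      // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }

    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(code_point_to_utf8(0x2603)), PHP_EOL; // e29883
echo bin2hex(code_point_to_utf8(0xA7)), PHP_EOL;   // c2a7
```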

chr_map(string|array $callback, string $str) : array

Applies callback to all characters of a string.

UTF8::chr_map(['voku\helper\UTF8', 'strtolower'], '?????'); // ['?','?', '?', '?', '?']

chr_size_list(string $str) : array

Generates an array with the byte length of each character of a string.

  • 1 byte => U+0000 - U+007F
  • 2 byte => U+0080 - U+07FF
  • 3 byte => U+0800 - U+FFFF
  • 4 byte => U+10000 - U+10FFFF

UTF8::chr_size_list('????-test'); // [3, 3, 3, 3, 1, 1, 1, 1, 1]
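
The per-character byte sizes listed above can be reproduced with PCRE and strlen() alone. The sketch below is an illustration of the idea, not the library's implementation; byte_sizes() is a hypothetical name.

```php
<?php
// Hypothetical helper: returns the byte length of each UTF-8 character.
function byte_sizes(string $str): array
{
    // An empty pattern with the "u" modifier splits between characters
    // instead of between bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    // strlen() counts bytes, which is exactly what is wanted per character.
    return array_map('strlen', $chars);
}

// Two three-byte characters (U+4E2D, U+6587) followed by ASCII.
print_r(byte_sizes("\u{4E2D}\u{6587}-ab")); // 3, 3, 1, 1, 1
```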

chr_to_decimal(string $chr) : int

Get a decimal code representation of a specific character.

opposite: UTF8::decimal_to_chr()

alias: UTF8::chr_to_int()

UTF8::chr_to_decimal('§'); // 0xa7

chr_to_hex(string $chr, string $pfix = 'U+')

Get hexadecimal code point (U+xxxx) of a UTF-8 encoded character.

UTF8::chr_to_hex('§'); // U+00a7

chunk_split(string $body, int $chunklen = 76, string $end = "\r\n") : string

Splits a string into smaller chunks and multiple lines, using the specified line ending character.

UTF8::chunk_split('ABC-ÖÄÜ-????-?????', 3); // "ABC\r\n-ÖÄ\r\nÜ-?\r\n???\r\n-??\r\n???"

clean(string $str, bool $remove_bom = false, bool $normalize_whitespace = false, bool $normalize_msword = false, bool $keep_non_breaking_space = false, bool $replace_diamond_question_mark = false, bool $remove_invisible_characters = true) : string

Accepts a string and removes all non-UTF-8 characters from it + extras if needed.

UTF8::clean("\xEF\xBB\xBF?Abcdef\xc2\xa0\x20?? ? ? - Düsseldorf", true, true); // '?Abcdef  ?? ? ? - Düsseldorf'

cleanup(string $str) : string

Clean-up a string and show only printable UTF-8 chars at the end + fix UTF-8 encoding.

UTF8::cleanup("\xEF\xBB\xBF?Abcdef\xc2\xa0\x20?? ? ? - Düsseldorf"); // '?Abcdef  ?? ? ? - Düsseldorf'

codepoints(mixed $arg, bool $u_style = false) : array

Accepts a string and returns an array of Unicode code points.

opposite: UTF8::string()

UTF8::codepoints('κöñ'); // array(954, 246, 241)
// ... OR ...
UTF8::codepoints('κöñ', true); // array('U+03ba', 'U+00f6', 'U+00f1')

count_chars(string $str, bool $clean_utf8 = false) : array

Returns count of characters used in a string.

UTF8::count_chars('?a?b?c'); // array('?' => 3, 'a' => 1, 'b' => 1, 'c' => 1)

decimal_to_chr(mixed $int) : string

Converts an int value into a UTF-8 character.

opposite: UTF8::chr_to_decimal()

alias: UTF8::int_to_chr()

UTF8::decimal_to_chr(931); // 'Σ'

emoji_decode(string $str, bool $use_reversible_string_mappings = false) : string

Decodes a string which was encoded by "UTF8::emoji_encode()".

UTF8::emoji_decode('foo CHARACTER_OGRE', false); // 'foo ?'
//
UTF8::emoji_decode('foo _-_PORTABLE_UTF8_-_308095726_-_627590803_-_8FTU_ELBATROP_-_', true); // 'foo ?'

emoji_encode(string $str, bool $use_reversible_string_mappings = false) : string

Encode a string with emoji chars into a non-emoji string.

UTF8::emoji_encode('foo ?', false); // 'foo CHARACTER_OGRE'
//
UTF8::emoji_encode('foo ?', true); // 'foo _-_PORTABLE_UTF8_-_308095726_-_627590803_-_8FTU_ELBATROP_-_'

encode(string $to_encoding, string $str, bool $auto_detect_the_from_encoding = true, string $from_encoding = ''): string

Encode a string with a new charset-encoding.

INFO: This function will also try to fix broken / double encoding, so you can also call it on a UTF-8 string without messing up the string.

UTF8::encode('ISO-8859-1', '-ABC-中文空白-'); // '-ABC-????-'
//
UTF8::encode('UTF-8', '-ABC-中文空白-'); // '-ABC-中文空白-'
//
UTF8::encode('HTML', '-ABC-中文空白-'); // '-ABC-&#20013;&#25991;&#31354;&#30333;-'
//
UTF8::encode('BASE64', '-ABC-中文空白-'); // 'LUFCQy3kuK3mlofnqbrnmb0t'

file_get_contents(string $filename, int|null $flags = null, resource|null $context = null, int|null $offset = null, int|null $maxlen = null, int $timeout = 10, bool $convert_to_utf8 = true) : string

Reads entire file into a string.

WARNING: Do not use UTF-8 Option ($convert_to_utf8) for binary files (e.g.: images) !!!

UTF8::file_get_contents('utf16le.txt'); // ...

file_has_bom(string $file_path) : bool

Checks if a file starts with BOM (Byte Order Mark) character.

UTF8::file_has_bom('utf8_with_bom.txt'); // true

filter(mixed $var, int $normalization_form = 4, string $leading_combining = '?') : mixed

Normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

UTF8::filter(array("\xE9", 'à', 'a')); // array('é', 'a?', 'a')

filter_input(int $type, string $var, int $filter = FILTER_DEFAULT, null|array $option = null) : string

"filter_input()" wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

// $_GET['foo'] = 'bar';
UTF8::filter_input(INPUT_GET, 'foo', FILTER_SANITIZE_STRING); // 'bar'

filter_input_array(int $type, mixed $definition = null, bool $add_empty = true) : mixed

"filter_input_array()" wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

// $_GET['foo'] = 'bar';
UTF8::filter_input_array(INPUT_GET, array('foo' => FILTER_SANITIZE_STRING)); // array('bar')

filter_var(string $var, int $filter = FILTER_DEFAULT, array $option = null) : string

"filter_var()" wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

UTF8::filter_var('-ABC-????-', FILTER_VALIDATE_URL); // false

filter_var_array(array $data, mixed $definition = null, bool $add_empty = true) : mixed

"filter_var_array()" wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

$filters = [ 
  'name'  => ['filter'  => FILTER_CALLBACK, 'options' => ['voku\helper\UTF8', 'ucwords']],
  'age'   => ['filter'  => FILTER_VALIDATE_INT, 'options' => ['min_range' => 1, 'max_range' => 120]],
  'email' => FILTER_VALIDATE_EMAIL,
];

$data = [
  'name' => '?????', 
  'age' => '18', 
  'email' => 'foo@bar.de'
];

UTF8::filter_var_array($data, $filters, true); // ['name' => '?????', 'age' => 18, 'email' => 'foo@bar.de']

fits_inside(string $str, int $box_size) : bool

Check if the number of Unicode characters isn't greater than the specified integer.

UTF8::fits_inside('?????', 6); // false

fix_simple_utf8(string $str) : string

Try to fix simple broken UTF-8 strings.

INFO: Take a look at "UTF8::fix_utf8()" if you need a more advanced fix for broken UTF-8 strings.

UTF8::fix_simple_utf8('DÃ¼sseldorf'); // 'Düsseldorf'

fix_utf8(string|string[] $str) : mixed

Fix a double (or multiple) encoded UTF8 string.

UTF8::fix_utf8('FÃ©dÃ©ration'); // 'Fédération'

getCharDirection(string $char) : string ('RTL' or 'LTR')

Get the text direction of a specific character.

UTF8::getCharDirection('?'); // 'RTL'

hex_to_chr(string $hexdec) : string|false

Converts a hexadecimal value into a UTF-8 character.

opposite: UTF8::chr_to_hex()

UTF8::hex_to_chr('U+00a7'); // '§'

hex_to_int(string $hexdec) : int|false

Converts hexadecimal U+xxxx code point representation to integer.

opposite: UTF8::int_to_hex()

UTF8::hex_to_int('U+00f1'); // 241

html_encode(string $str, bool $keep_ascii_chars = false, string $encoding = 'UTF-8') : string

Converts a UTF-8 string to a series of HTML numbered entities.

opposite: UTF8::html_decode()

UTF8::html_encode('中文空白'); // '&#20013;&#25991;&#31354;&#30333;'

html_entity_decode(string $str, int $flags = null, string $encoding = 'UTF-8') : string

UTF-8 version of html_entity_decode()

The reason we are not using html_entity_decode() by itself is because while it is not technically correct to leave out the semicolon at the end of an entity most browsers will still interpret the entity correctly. html_entity_decode() does not convert entities without semicolons, so we are left with our own little solution here. Bummer.

Convert all HTML entities to their applicable characters

opposite: UTF8::html_encode()

alias: UTF8::html_decode()

UTF8::html_entity_decode('&#20013;&#25991;&#31354;&#30333;'); // '中文空白'
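
The semicolon problem described above can be illustrated with native functions. The sketch below handles only decimal numeric entities and is a simplified assumption about one way to do it, not the library's actual approach; decode_lenient_numeric_entities() is a hypothetical name.

```php
<?php
// Hypothetical helper: decodes decimal entities even when the trailing
// ";" is missing. Named and hexadecimal entities are left to the native
// decoder, which still requires the semicolon for them.
function decode_lenient_numeric_entities(string $str): string
{
    // Add the missing ";" after "&#<digits>" when the digits are not
    // followed by another digit or by a semicolon.
    $fixed = preg_replace('/&#(\d+)(?![\d;])/', '&#$1;', $str);

    return html_entity_decode($fixed, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// The first entity lacks its semicolon, the second one has it.
$decoded = decode_lenient_numeric_entities('&#20013&#25991;');
echo bin2hex($decoded), PHP_EOL; // e4b8ade69687 (U+4E2D, U+6587)
```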

htmlentities(string $str, int $flags = ENT_COMPAT, string $encoding = 'UTF-8', bool $double_encode = true) : string

Convert all applicable characters to HTML entities: UTF-8 version of htmlentities()

UTF8::htmlentities('<白-öäü>'); // '&lt;&#30333;-&ouml;&auml;&uuml;&gt;'

htmlspecialchars(string $str, int $flags = ENT_COMPAT, string $encoding = 'UTF-8', bool $double_encode = true) : string

Convert only special characters to HTML entities: UTF-8 version of htmlspecialchars()

INFO: Take a look at "UTF8::htmlentities()"

UTF8::htmlspecialchars('<白-öäü>'); // '&lt;白-öäü&gt;'

int_to_hex(int $int, string $pfix = 'U+') : string

Converts Integer to hexadecimal U+xxxx code point representation.

opposite: UTF8::hex_to_int()

UTF8::int_to_hex(241); // 'U+00f1'

is_ascii(string $str) : bool

Checks if a string is 7 bit ASCII.

alias: UTF8::isAscii()

UTF8::is_ascii('?'); // false

is_base64(string $str) : bool

Returns true if the string is base64 encoded, false otherwise.

alias: UTF8::isBase64()

UTF8::is_base64('4KSu4KWL4KSo4KS/4KSa'); // true

is_binary(mixed $input, bool $strict = false) : bool

Check if the input is binary (this looks like a hack).

alias: UTF8::isBinary()

UTF8::is_binary(01); // true

is_binary_file(string $file) : bool

Check if the file is binary.

UTF8::is_binary_file('./utf32.txt'); // true

is_bom(string $str) : bool

Checks if the given string is equal to any "Byte Order Mark".

WARNING: Use "UTF8::string_has_bom()" if you will check BOM in a string.

alias: UTF8::isBom()

UTF8::is_bom("\xef\xbb\xbf"); // true

is_json(string $str) : bool

Try to check if "$str" is a JSON-string.

alias: UTF8::isJson()

UTF8::is_json('{"array":[1,"¥","ä"]}'); // true

is_html(string $str) : bool

Check if the string contains any HTML tags <lall>.

alias: UTF8::isHtml()

UTF8::is_html('<b>lall</b>'); // true

is_utf16(string $str) : int|false

Check if the string is UTF-16: This function will return false if it's not UTF-16, 1 for UTF-16LE, 2 for UTF-16BE.

alias: UTF8::isUtf16()

UTF8::is_utf16(file_get_contents('utf-16-le.txt')); // 1
UTF8::is_utf16(file_get_contents('utf-16-be.txt')); // 2
UTF8::is_utf16(file_get_contents('utf-8.txt')); // false

is_utf32(string $str) : int|false

Check if the string is UTF-32: This function will return false if it's not UTF-32, 1 for UTF-32LE, 2 for UTF-32BE.

alias: UTF8::isUtf32()

UTF8::is_utf32(file_get_contents('utf-32-le.txt')); // 1
UTF8::is_utf32(file_get_contents('utf-32-be.txt')); // 2
UTF8::is_utf32(file_get_contents('utf-8.txt')); // false

is_utf8(string $str, bool $strict = false) : bool

Checks whether the passed string contains only byte sequences that are valid UTF-8 characters.

alias: UTF8::isUtf8()

UTF8::is_utf8('Iñtërnâtiônàlizætiøn'); // true
UTF8::is_utf8("Iñtërnâtiônàlizætiøn\xA0\xA1"); // false
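
A common way to validate UTF-8 with nothing but PCRE is to run an empty pattern in UTF-8 mode, because PCRE rejects malformed input before matching. The sketch below shows that idiom; it is an illustration and not necessarily how UTF8::is_utf8() is implemented. The name looks_like_utf8() is hypothetical.

```php
<?php
// Hypothetical helper: true when $str is well-formed UTF-8.
function looks_like_utf8(string $str): bool
{
    // With the "u" modifier preg_match() returns false for malformed
    // UTF-8 and 1 otherwise, since an empty pattern always matches.
    return preg_match('//u', $str) === 1;
}

var_dump(looks_like_utf8("I\xC3\xB1t\xC3\xABrn")); // bool(true)
var_dump(looks_like_utf8("abc\xA0\xA1"));          // bool(false)
```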

json_decode(string $json, bool $assoc = false, int $depth = 512, int $options = 0) : mixed

Decodes a JSON string.

UTF8::json_decode('[1,"\u00a5","\u00e4"]'); // array(1, '¥', 'ä')

json_encode(mixed $value, int $options = 0, int $depth = 512) : string

Returns the JSON representation of a value.

UTF8::json_encode(array(1, '¥', 'ä')); // '[1,"\u00a5","\u00e4"]'

lcfirst(string $str) : string

Makes string's first char lowercase.

UTF8::lcfirst('ÑTËRNÂTIÔNÀLIZÆTIØN'); // ñTËRNÂTIÔNÀLIZÆTIØN 

max(mixed $arg) : string

Returns the UTF-8 character with the maximum code point in the given data.

UTF8::max('abc-äöü-????'); // 'ø'

max_chr_width(string $str) : int

Calculates and returns the maximum number of bytes taken by any UTF-8 encoded character in the given string.

UTF8::max_chr_width('Intërnâtiônàlizætiøn'); // 2

min(mixed $arg) : string

Returns the UTF-8 character with the minimum code point in the given data.

UTF8::min('abc-äöü-????'); // '-'

normalize_encoding(string $encoding) : string

Normalize the encoding-"name" input.

UTF8::normalize_encoding('UTF8'); // 'UTF-8'

normalize_msword(string $str) : string

Normalize some MS Word special characters.

UTF8::normalize_msword('?Abcdef??'); // '"Abcdef..."'

normalize_whitespace(string $str, bool $keep_non_breaking_space = false, bool $keep_bidi_unicode_controls = false) : string

Normalize the whitespace.

UTF8::normalize_whitespace("abc-\xc2\xa0-öäü-\xe2\x80\xaf-\xE2\x80\xAC", true); // "abc-\xc2\xa0-öäü- -"

ord(string $chr) : int

Calculates Unicode code point of the given UTF-8 encoded character.

opposite: UTF8::chr()

UTF8::ord('☃'); // 0x2603

parse_str(string $str, &$result, bool $clean_utf8 = false) : bool

Parses the string into an array (into the second parameter).

WARNING: Unlike "parse_str()", this method does not (re-)place variables in the current scope if the second parameter is not set!

UTF8::parse_str('Iñtërnâtiônéàlizætiøn=??&arr[]=foo+??&arr[]=?????????', $array);
echo $array['Iñtërnâtiônéàlizætiøn']; // '??'

range(mixed $var1, mixed $var2) : array

Create an array containing a range of UTF-8 characters.

UTF8::range('?', '?'); // array('?', '?', '?', '?', '?',)

remove_bom(string $str) : string

Remove the BOM from UTF-8 / UTF-16 / UTF-32 strings.

UTF8::remove_bom("\xEF\xBB\xBF????? ??"); // '????? ??'

remove_duplicates(string $str, string|array $what = ' ') : string

Removes duplicate occurrences of a string in another string.

UTF8::remove_duplicates('öäü-??????????-äöü', '?????'); // 'öäü-?????-äöü'

remove_invisible_characters(string $str, bool $url_encoded = true, string $replacement = '') : string

Remove invisible characters from a string.

UTF8::remove_invisible_characters("???\0??"); // '?????'

replace_diamond_question_mark(string $str, string $replacement_char = '', bool $process_invalid_utf8 = true) : string

Replace the diamond question mark (?) and invalid-UTF8 chars with the replacement.

UTF8::replace_diamond_question_mark('?????', ''); // '????'

trim(string $str = '', string $chars = INF) : string

Strip whitespace or other characters from the beginning and end of a UTF-8 string.

UTF8::trim('   -ABC-????-  '); // '-ABC-????-'

rtrim(string $str = '', string $chars = INF) : string

Strip whitespace or other characters from the end of a UTF-8 string.

UTF8::rtrim('-ABC-????-  '); // '-ABC-????-'

ltrim(string $str, string $chars = INF) : string

Strip whitespace or other characters from the beginning of a UTF-8 string.

UTF8::ltrim('?????? '); // '????? '

single_chr_html_encode(string $char, bool $keep_ascii_chars = false) : string

Converts a UTF-8 character to HTML Numbered Entity like "&#123;".

UTF8::single_chr_html_encode('κ'); // '&#954;'

split(string $str, int $length = 1, bool $clean_utf8 = false) : array

Convert a string to an array of Unicode characters.

UTF8::split('????'); // array('?', '?', '?', '?')

str_detect_encoding(string $str) : string

Optimized "\mb_detect_encoding()"-function -> with support for UTF-16 and UTF-32.

UTF8::str_detect_encoding('????'); // 'UTF-8'
UTF8::str_detect_encoding('Abc'); // 'ASCII'

str_ends_with(string $haystack, string $needle) : bool

Check if the string ends with the given substring.

UTF8::str_ends_with('BeginMiddle?????', '?????'); // true
UTF8::str_ends_with('BeginMiddle?????', '?????'); // false

str_iends_with(string $haystack, string $needle) : bool

Check if the string ends with the given substring, case-insensitive.

UTF8::str_iends_with('BeginMiddle?????', '?????'); // true
UTF8::str_iends_with('BeginMiddle?????', '?????'); // true

str_ireplace(mixed $search, mixed $replace, mixed $subject, int &$count = null) : mixed

Case-insensitive and UTF-8 safe version of str_replace().

UTF8::str_ireplace('lIzÆ', 'lise', array('Iñtërnâtiônàlizætiøn')); // array('Iñtërnâtiônàlisetiøn')

str_limit_after_word(string $str, int $length = 100, string $str_add_on = '...') : string

Limit the number of characters in a string, but also after the next word.

UTF8::str_limit_after_word('fòô bà? fòô', 8, ''); // 'fòô bà?'

str_pad(string $str, int $pad_length, string $pad_string = ' ', int $pad_type = STR_PAD_RIGHT) : string

Pad a UTF-8 string to a given length with another string.

UTF8::str_pad('????', 10, '_', STR_PAD_BOTH); // '___????___'

str_repeat(string $str, int $multiplier) : string

Repeat a string.

UTF8::str_repeat("°~\xf0\x90\x28\xbc", 2); // '°~ð(¼°~ð(¼'

str_shuffle(string $str) : string

Shuffles all the characters in the string.

UTF8::str_shuffle('fòô bà? fòô'); // 'àòô?b ffòô '

str_sort(string $str, bool $unique = false, bool $desc = false) : string

Sort all characters according to code points.

UTF8::str_sort('  -ABC-????-  '); // '    ---ABC????'

str_split(string $str, int $len = 1) : array

Split a string into an array.

UTF8::str_split('déjà', 2); // array('dé', 'jà')

str_starts_with(string $haystack, string $needle) : bool

Check if the string starts with the given substring.

UTF8::str_starts_with('?????MiddleEnd', '?????'); // true
UTF8::str_starts_with('?????MiddleEnd', '?????'); // false

str_istarts_with(string $haystack, string $needle) : bool

Check if the string starts with the given substring, case-insensitive.

UTF8::str_istarts_with('?????MiddleEnd', '?????'); // true
UTF8::str_istarts_with('?????MiddleEnd', '?????'); // true

str_to_binary(string $str) : string

Get a binary representation of a specific string.

opposite: UTF8::binary_to_str()

UTF8::str_to_binary('😃'); // '11110000100111111001100010000011'

str_word_count(string $str, int $format = 0, string $charlist = '') : int|array

Get the number of words in a specific string.

// format: 0 -> return only word count (int)
//
UTF8::str_word_count('???? öäü abc#c'); // 4
UTF8::str_word_count('???? öäü abc#c', 0, '#'); // 3

// format: 1 -> return words (array) 
//
UTF8::str_word_count('???? öäü abc#c', 1); // array('????', 'öäü', 'abc', 'c')
UTF8::str_word_count('???? öäü abc#c', 1, '#'); // array('????', 'öäü', 'abc#c')

// format: 2 -> return words with offset (array) 
//
UTF8::str_word_count('???? öäü abc#c', 2); // array(0 => '????', 5 => 'öäü', 9 => 'abc', 13 => 'c')
UTF8::str_word_count('???? öäü abc#c', 2, '#'); // array(0 => '????', 5 => 'öäü', 9 => 'abc#c')

strcmp(string $str1, string $str2) : int

Case-sensitive string comparison: < 0 if str1 is less than str2; > 0 if str1 is greater than str2; 0 if they are equal.

UTF8::strcmp("iñtërnâtiôn\nàlizætiøn", "iñtërnâtiôn\nàlizætiøn"); // 0

strnatcmp(string $str1, string $str2) : int

Case-sensitive string comparison using a "natural order" algorithm: < 0 if str1 is less than str2; > 0 if str1 is greater than str2; 0 if they are equal.

INFO: natural order version of UTF8::strcmp()

UTF8::strnatcmp('2Hello world ????!', '10Hello WORLD ????!'); // -1
UTF8::strcmp('2Hello world ????!', '10Hello WORLD ????!'); // 1

UTF8::strnatcmp('10Hello world ????!', '2Hello WORLD ????!'); // 1
UTF8::strcmp('10Hello world ????!', '2Hello WORLD ????!'); // -1
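
The sign flip between the two comparison styles above is easiest to see with PHP's native strcmp() and strnatcmp() on ASCII input. The sketch below uses only those native functions; sort_naturally() is a hypothetical helper added for illustration, and nothing here is taken from the library's code.

```php
<?php
// Hypothetical helper: sorts strings so that embedded numbers compare
// by value ("img2" before "img10") instead of byte by byte.
function sort_naturally(array $items): array
{
    usort($items, 'strnatcmp');

    return $items;
}

// Byte-wise: "2" sorts after "1", so '2Hello' is greater than '10Hello'.
var_dump(strcmp('2Hello', '10Hello') > 0);    // bool(true)
// Natural order: 2 is less than 10, so the result is negative.
var_dump(strnatcmp('2Hello', '10Hello') < 0); // bool(true)

print_r(sort_naturally(['img12.png', 'img10.png', 'img2.png', 'img1.png']));
// img1.png, img2.png, img10.png, img12.png
```

Only the sign of the result is checked here, since the exact non-zero value returned by strcmp() differs between PHP versions.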

strcasecmp(string $str1, string $str2) : int

Case-insensitive string comparison: < 0 if str1 is less than str2; > 0 if str1 is greater than str2; 0 if they are equal.

INFO: Case-insensitive version of UTF8::strcmp()

UTF8::strcasecmp("iñtërnâtiôn\nàlizætiøn", "Iñtërnâtiôn\nàlizætiøn"); // 0

strnatcasecmp(string $str1, string $str2) : int

Case-insensitive string comparison using a "natural order" algorithm: < 0 if str1 is less than str2; > 0 if str1 is greater than str2; 0 if they are equal.

INFO: natural order version of UTF8::strcasecmp()

UTF8::strnatcasecmp('2', '10Hello WORLD ????!'); // -1
UTF8::strcasecmp('2Hello world ????!', '10Hello WORLD ????!'); // 1
    
UTF8::strnatcasecmp('10Hello world ????!', '2Hello WORLD ????!'); // 1
UTF8::strcasecmp('10Hello world ????!', '2Hello WORLD ????!'); // -1

strncasecmp(string $str1, string $str2, int $len) : int

Case-insensitive string comparison of the first n characters:

< 0 if str1 is less than str2; 
> 0 if str1 is greater than str2, 
0 if they are equal.

INFO: Case-insensitive version of UTF8::strncmp()

UTF8::strncasecmp("iñtërnâtiôn\nàlizætiøn321", "iñtërnâtiôn\nàlizætiøn123", 5); // 0

strncmp(string $str1, string $str2, int $len) : int

Case-sensitive string comparison of the first n characters:

< 0 if str1 is less than str2; 
> 0 if str1 is greater than str2, 
0 if they are equal.

UTF8::strncmp("Iñtërnâtiôn\nàlizætiøn321", "Iñtërnâtiôn\nàlizætiøn123", 5); // 0

string(array $array) : string

Create a UTF-8 string from code points.

opposite: UTF8::codepoints()

UTF8::string(array(246, 228, 252)); // 'öäü'

string_has_bom(string $str) : bool

Checks if string starts with "BOM" (Byte Order Mark Character) character.

alias: UTF8::hasBom()

UTF8::string_has_bom("\xef\xbb\xbf foobar"); // true

strip_tags(string $str, string|null $allowable_tags = null, bool $clean_utf8 = false) : string

Strip HTML and PHP tags from a string + clean invalid UTF-8.

UTF8::strip_tags("<span>?????\xa0\xa1</span>"); // '?????'

strip_whitespace(string $str)

Strip all whitespace characters. This includes tabs and newline characters, as well as multibyte whitespace such as the thin space and ideographic space.

UTF8::strip_whitespace('   ?     ??????????  '); // '???????????'

strlen(string $str, string $encoding = 'UTF-8', bool $clean_utf8 = false) : int

Get the string length, not the byte-length!

UTF8::strlen("Iñtërnâtiôn\xE9àlizætiøn"); // 20

strwidth(string $str, string $encoding = 'UTF-8', bool $clean_utf8 = false) : int

Return the width of a string.

UTF8::strwidth("Iñtërnâtiôn\xE9àlizætiøn"); // 21

strpbrk(string $haystack, string $char_list) : string

Search a string for any of a set of characters.

UTF8::strpbrk('-????-', '?'); // '?-'

strpos(string $haystack, string $needle, int $offset = 0, string $encoding = 'UTF-8', bool $clean_utf8 = false) : int|false

Find the position of the first occurrence of a substring in a string.

UTF8::strpos('ABC-ÖÄÜ-????-????', '?'); // 8

stripos(string $str, string $needle, int $offset = null, string $encoding = 'UTF-8', bool $clean_utf8 = false) : int|false

Find the position of the first occurrence of a substring in a string, case-insensitive.

UTF8::stripos('ABC-ÖÄÜ-????-????', '?'); // 8

strrpos(string $haystack, string $needle, int $offset = 0, string $encoding = 'UTF-8', bool $clean_utf8 = false) : int|false

Find the position of the last occurrence of a substring in a string.

UTF8::strrpos('ABC-ÖÄÜ-????-????', '?'); // 13

strripos(string $haystack, string $needle, int $offset = 0, string $encoding = 'UTF-8', bool $clean_utf8 = false) : int|false

Find the position of the last occurrence of a substring in a string, case-insensitive.

UTF8::strripos('ABC-ÖÄÜ-????-????', '?'); // 13

strrchr(string $haystack, string $needle, bool $part = false, string $encoding = 'UTF-8', bool $clean_utf8 = false) : string|false

Find the last occurrence of a character in a string within another.

UTF8::strrchr('??????????-äöü', '?????'); // '?????-äöü'

strrichr(string $haystack, string $needle, bool $part = false, string $encoding = 'UTF-8', bool $clean_utf8 = false) : string|false

Find the last occurrence of a character in a string within another, case-insensitive.

UTF8::strrichr('A??????????-äöü', 'a?????'); // 'A??????????-äöü'

strrev(string $str) : string

Reverses characters order in the string.

UTF8::strrev('?-öäü'); // 'üäö-?'

strspn(string $str, string $mask, int $offset = 0, int $length = 2147483647) : int

Finds the length of the initial segment of a string consisting entirely of characters contained within a given mask.

UTF8::strspn('iñtërnâtiônàlizætiøn', 'itñ'); // 3

strstr(string $str, string $needle, bool $before_needle = false, string $encoding = 'UTF-8', bool $clean_utf8 = false) : string

Returns part of haystack string from the first occurrence of needle to the end of haystack.

alias: UTF8::strchr()

$str = 'iñtërnâtiônàlizætiøn';
$search = 'nât';

UTF8::strstr($str, $search); // 'nâtiônàlizætiøn'
UTF8::strstr($str, $search, true); // 'iñtër'

stristr(string $str, string $needle, bool $before_needle = false, string $encoding = 'UTF-8', bool $clean_utf8 = false) : string

Returns all of haystack starting from and including the first occurrence of needle to the end.

alias: UTF8::strichr()

$str = 'iñtërnâtiônàlizætiøn';
$search = 'NÂT';

UTF8::stristr($str, $search); // 'nâtiônàlizætiøn'
UTF8::stristr($str, $search, true); // 'iñtër'
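
Case-insensitive matching of non-ASCII letters is the part PHP's core stristr() cannot do. One possible approach, shown here as a hedged sketch rather than the library's actual method, is to let PCRE do the caseless comparison in Unicode mode. The helper name is hypothetical, and the sketch assumes valid UTF-8 and a PCRE build with Unicode support:

```php
<?php
// Hypothetical helper for illustration only.
function sketch_utf8_stristr(string $str, string $needle, bool $beforeNeedle = false)
{
    if ($needle === '') {
        return false;
    }

    // "i" = caseless, "u" = treat pattern and subject as UTF-8.
    $pattern = '/' . preg_quote($needle, '/') . '/iu';
    if (preg_match($pattern, $str, $match, PREG_OFFSET_CAPTURE) !== 1) {
        return false;
    }

    // PREG_OFFSET_CAPTURE reports byte offsets, which is what substr() expects.
    $bytePos = $match[0][1];

    return $beforeNeedle ? substr($str, 0, $bytePos) : substr($str, $bytePos);
}

$tail = sketch_utf8_stristr('iñtërnâtiônàlizætiøn', 'NÂT');       // 'nâtiônàlizætiøn'
$head = sketch_utf8_stristr('iñtërnâtiônàlizætiøn', 'NÂT', true); // 'iñtër'
```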

strtocasefold(string $str, bool $full = true) : string

Unicode transformation for case-less matching.

UTF8::strtocasefold('???'); // 'j???'

strtolower(string $str, string $encoding = 'UTF-8', bool $clean_utf8 = false) : string

Make a string lowercase.

UTF8::strtolower('DÉJÀ ??? I??i'); // 'déjà ??? i?ii'

strtoupper(string $str, string $encoding = 'UTF-8', bool $clean_utf8 = false) : string

Make a string uppercase.

UTF8::strtoupper('Déjà ??? I??i'); // 'DÉJÀ ??? II?I'

strtr(string $str, string|array $from, string|array $to = INF) : string

Translate characters or replace sub-strings.

$arr = array(
    'Hello'   => '???',
    '????' => 'earth',
);
UTF8::strtr('Hello ????', $arr); // '??? earth'

str_to_words(string $str, string $charlist = '') : array

Convert a string (phrase, sentence, ...) into an array of words.

UTF8::str_to_words('???? oöäü#s', '#'); // array('', '????', ' ', 'oöäü#s', '')

substr(string $str, int $start = 0, int $length = null, string $encoding = 'UTF-8', bool $clean_utf8 = false) : string

Get part of a string.

UTF8::substr('????', 1, 2); // '??'
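
To illustrate why a character-aware substr is needed, the following sketch compares PHP's byte-based substr() with a simple code-point-based variant. The helper is hypothetical, assumes valid UTF-8 input, and is not how the library is implemented:

```php
<?php
// Hypothetical helper for illustration only.
function sketch_utf8_substr(string $str, int $start = 0, ?int $length = null): string
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // array_slice applies the same start/length rules to characters
    // that substr() applies to bytes (negative values included).
    return implode('', array_slice($chars, $start, $length));
}

$byteBased = substr('iñtërn', 1, 2);             // 'ñ' (one 2-byte character)
$charBased = sketch_utf8_substr('iñtërn', 1, 2); // 'ñt'
$fromEnd   = sketch_utf8_substr('iñtërn', -2);   // 'rn'
```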

substr_compare(string $main_str, string $str, int $offset, int $length = 2147483647, bool $case_insensitivity = false) : int

Binary-safe comparison of two strings from an offset, up to a length of characters.

UTF8::substr_compare("???\r", '??', 0, 2); // -1
UTF8::substr_compare("???\r", '??', 1, 2); // 1
UTF8::substr_compare("???\r", '??', 1, 2); // 0

substr_count(string $haystack, string $needle, int $offset = 0, int $length = null, string $encoding = 'UTF-8', bool $clean_utf8 = false) : int|false

Count the number of substring occurrences.

UTF8::substr_count('????', '??', 1, 2); // 1

substr_left(string $haystack, string $needle) : string

Removes a prefix ($needle) from the beginning of the string ($haystack).

UTF8::substr_left('?????MiddleEnd', '?????'); // 'MiddleEnd'
UTF8::substr_left('?????MiddleEnd', '?????'); // '?????MiddleEnd'
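
For the case-sensitive variant, a byte-wise comparison is sufficient, because two valid UTF-8 strings share a character prefix exactly when they share the corresponding byte prefix. A minimal sketch with a hypothetical function name (not the library's code):

```php
<?php
// Hypothetical helper for illustration only.
function sketch_strip_prefix(string $haystack, string $needle): string
{
    $needleBytes = strlen($needle);

    // Remove the prefix only when the haystack really starts with it.
    if ($needleBytes > 0 && strncmp($haystack, $needle, $needleBytes) === 0) {
        return (string) substr($haystack, $needleBytes);
    }

    return $haystack;
}

$stripped  = sketch_strip_prefix('ÄÖÜMiddleEnd', 'ÄÖÜ'); // 'MiddleEnd'
$unchanged = sketch_strip_prefix('ÄÖÜMiddleEnd', 'äöü'); // 'ÄÖÜMiddleEnd'
```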

substr_ileft(string $haystack, string $needle) : string

Removes a prefix ($needle) from the beginning of the string ($haystack), case-insensitive.

UTF8::substr_ileft('?????MiddleEnd', '?????'); // 'MiddleEnd'
UTF8::substr_ileft('?????MiddleEnd', '?????'); // 'MiddleEnd'

substr_right(string $haystack, string $needle) : string

Removes a suffix ($needle) from the end of the string ($haystack).

UTF8::substr_right('BeginMiddle?????', '?????'); // 'BeginMiddle'
UTF8::substr_right('BeginMiddle?????', '?????'); // 'BeginMiddle?????'

substr_iright(string $haystack, string $needle) : string

Removes a suffix ($needle) from the end of the string ($haystack), case-insensitive.

UTF8::substr_iright('BeginMiddle?????', '?????'); // 'BeginMiddle'
UTF8::substr_iright('BeginMiddle?????', '?????'); // 'BeginMiddle'

substr_replace(string|string[] $str, string|string[] $replacement, int|int[] $start, int|int[] $length = null) : string|array

Replace text within a portion of a string.

UTF8::substr_replace(array('Iñtërnâtiônàlizætiøn', 'foo'), 'æ', 1); // array('Iæñtërnâtiônàlizætiøn', 'fæoo')

swapCase(string $str, string $encoding = 'UTF-8', bool $clean_utf8 = false) : string

Returns a case swapped version of the string.

UTF8::swapCase('déJÀ ??? i?II'); // 'DÉjà ??? IIii'

to_ascii(string $str, string $unknown = '?', bool $strict = false) : string

Convert a string into ASCII.

alias: UTF8::toAscii() alias: UTF8::str_transliterate()

UTF8::to_ascii('déjà ??? i?ii'); // 'deja sss iiii'

to_utf8(string|string[] $str, bool $decode_html_entity_to_utf8 = false) : string|string[]

This function leaves UTF-8 characters alone, while converting almost all non-UTF8 to UTF8.

  • It decodes UTF-8 codepoints and Unicode escape sequences.
  • It assumes that the encoding of the original string is either WINDOWS-1252 or ISO-8859-1.
  • WARNING: It does not remove invalid UTF-8 characters, so you may need to use "UTF8::clean()" for this case.

alias: UTF8::toUtf8()

UTF8::to_utf8("\u0063\u0061\u0074"); // 'cat'
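
The "leave UTF-8 alone, convert the rest" idea can be sketched in a few lines of plain PHP. This simplified, hypothetical helper treats every invalid input as ISO-8859-1 only. It does not handle the WINDOWS-1252 range (0x80 to 0x9F), mixed encodings, or escape sequences, so it is an illustration of the principle and not the library's algorithm:

```php
<?php
// Hypothetical helper for illustration only.
function sketch_to_utf8(string $str): string
{
    // With the "u" modifier, preg_match returns false for invalid UTF-8.
    if (preg_match('//u', $str) === 1) {
        return $str; // already valid UTF-8: leave untouched
    }

    $out = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; ++$i) {
        $byte = ord($str[$i]);
        if ($byte < 0x80) {
            $out .= $str[$i]; // ASCII is identical in both encodings
        } else {
            // ISO-8859-1 byte 0x80..0xFF maps to the code point of the
            // same value, which needs two bytes in UTF-8.
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }

    return $out;
}

$converted = sketch_to_utf8("caf\xE9");     // Latin-1 "café" becomes UTF-8
$untouched = sketch_to_utf8("caf\xC3\xA9"); // valid UTF-8 stays as-is
```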

to_iso8859(string|string[] $str) : string|string[]

Convert a string into "ISO-8859"-encoding (Latin-1).

alias: UTF8::toIso8859() alias: UTF8::to_latin1() alias: UTF8::toLatin1()

UTF8::to_utf8(UTF8::to_latin1('  -ABC-????-  ')); // '  -ABC-????-  ' 

ucfirst(string $str, string $encoding = 'UTF-8', bool $clean_utf8 = false) : string

Makes string's first char uppercase.

alias: UTF8::ucword()

UTF8::ucfirst('ñtërnâtiônàlizætiøn'); // 'Ñtërnâtiônàlizætiøn'

ucwords(string $str, array $exceptions = array(), string $charlist = '', string $encoding = 'UTF-8', bool $clean_utf8 = false) : string

Uppercases the first character of every word in the string.

UTF8::ucwords('iñt ërn âTi ônà liz æti øn'); // 'Iñt Ërn ÂTi Ônà Liz Æti Øn'

rawurldecode(string $str) : string

Multi decode HTML entity + fix urlencoded-win1252-chars.

UTF8::rawurldecode('tes%20öäü%20\u00edtest+test'); // 'tes öäü ítest+test'

urldecode(string $str) : string

Multi decode HTML entity + fix urlencoded-win1252-chars.

UTF8::urldecode('tes%20öäü%20\u00edtest+test'); // 'tes öäü ítest test'
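
The difference in how the two functions treat the plus sign mirrors PHP's core functions of the same names. The short example below uses only those core functions (not the library) to show it:

```php
<?php
// Core PHP functions only; the library versions add further decoding
// steps on top, as described above.
$encoded = 'tes%20%C3%B6%C3%A4%C3%BC+test';

$raw   = rawurldecode($encoded); // 'tes öäü+test' ("+" is kept)
$plain = urldecode($encoded);    // 'tes öäü test' ("+" becomes a space)
```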

utf8_decode(string $str) : string

Decodes a UTF-8 string to ISO-8859-1.

UTF8::encode('UTF-8', UTF8::utf8_decode('-ABC-????-')); // '-ABC-????-'

utf8_encode(string $str) : string

Encodes an ISO-8859-1 string to UTF-8.

UTF8::utf8_decode(UTF8::utf8_encode('-ABC-????-')); // '-ABC-????-'

words_limit(string $str, int $words = 100, string $str_add_on = '...') : string

Limit the number of words in a string.

UTF8::words_limit('fòô bà? fòô', 2, ''); // 'fòô bà?'
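
A word limit of this kind can be approximated by splitting on Unicode-aware whitespace. The following is a rough sketch with a hypothetical function name. It assumes that words are separated by whitespace and that the add-on is appended only when the string was actually shortened, which may differ from the library's exact behaviour:

```php
<?php
// Hypothetical helper for illustration only.
function sketch_words_limit(string $str, int $limit = 100, string $addOn = '...'): string
{
    // "u" makes \s match Unicode whitespace and keeps multi-byte words intact.
    $words = preg_split('/\s+/u', trim($str), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    if (count($words) <= $limit) {
        return $str; // nothing to cut
    }

    // Note: runs of whitespace are collapsed to single spaces here.
    return implode(' ', array_slice($words, 0, max(0, $limit))) . $addOn;
}

$short = sketch_words_limit('fòô bàř fòô', 2, '');  // 'fòô bàř'
$same  = sketch_words_limit('fòô bàř fòô', 5);      // unchanged
```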

wordwrap(string $str, int $width = 75, string $break = "\n", bool $cut = false) : string

Wraps a string to a given number of characters.

UTF8::wordwrap('Iñtërnâtiônàlizætiøn', 2, '<br>', true); // 'Iñ<br>të<br>rn<br>ât<br>iô<br>nà<br>li<br>zæ<br>ti<br>øn'
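
For the special case shown above (one long word with $cut = true), wrapping boils down to chunking the string by characters instead of bytes. The sketch below covers only that case. The helper name is hypothetical and it does not implement word-boundary handling:

```php
<?php
// Hypothetical helper for illustration only: hard-wraps a string
// every $width characters, ignoring word boundaries.
function sketch_utf8_chunk(string $str, int $width, string $break = "\n"): string
{
    if ($width < 1) {
        return $str;
    }

    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Group the characters into pieces of $width and glue each piece back together.
    $chunks = array_map(
        function (array $piece): string {
            return implode('', $piece);
        },
        array_chunk($chars, $width)
    );

    return implode($break, $chunks);
}

$wrapped = sketch_utf8_chunk('Iñtërnâtiônàlizætiøn', 2, '<br>');
```

With the input above, $wrapped equals the result shown in the wordwrap example.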

Unit Test

1) Composer is a prerequisite for running the tests.

composer install

2) The tests can be executed by running this command from the root directory:

./vendor/bin/phpunit

Support

For support and donations please visit GitHub | Issues | PayPal | Patreon.

For status updates and release announcements please visit Releases | Twitter | Patreon.

For professional support please contact me.

Thanks

  • Thanks to GitHub (Microsoft) for hosting the code and a good infrastructure including Issues-Management, etc.
  • Thanks to IntelliJ as they make the best IDEs for PHP and they gave me an open source license for PhpStorm!
  • Thanks to Travis CI for being the most awesome, easiest continuous integration tool out there!
  • Thanks to StyleCI for the simple but powerful code style check.
  • Thanks to PHPStan and Psalm for really great static analysis tools and for discovering bugs in the code!

License and Copyright

"Portable UTF8" is free software; you can redistribute it and/or modify it under the terms of either of the following licenses (at your option):

  • Apache License v2.0, or
  • GNU General Public License v2.0.

Unicode handling requires tedious work to implement and maintain in the long run. As such, contributions such as unit tests, bug reports, comments or patches licensed under both licenses are very welcome.

