Last Updated | Ratings | Unique User Downloads | Download Rankings |
---|---|---|---|
2020-02-17 | 67% | Total: 569, this week: 1 | All time: 5,288, this week: 368 |

Version | License | PHP version | Categories |
---|---|---|---|
portable-utf8 3.0.106 | Custom (specified... | 5.3 | PHP 5, Text processing |
Portable UTF-8: this package can manipulate UTF-8 text strings in pure PHP.
It is written in PHP (PHP 7+) and can work without "mbstring", "iconv" or any other extra encoding php-extension on your server.
The benefit of Portable UTF-8 is that it is easy to use, easy to bundle. This library will also auto-detect your server environment and will use the installed php-extensions if they are available, so you will have the best possible performance.
As a fallback we will use Symfony Polyfills, if needed. (https://github.com/symfony/polyfill)
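As a rough illustration of this kind of environment detection, the sketch below uses "mbstring" when it is loaded and otherwise counts characters in plain PHP. The function name sketch_strlen() is hypothetical and not part of the library; the library's real detection logic may differ.

```php
<?php
// Hypothetical helper, not part of Portable UTF-8: counts UTF-8 characters.
function sketch_strlen(string $str): int
{
    // Prefer the native extension when the server provides it.
    if (extension_loaded('mbstring')) {
        return mb_strlen($str, 'UTF-8');
    }

    // Pure-PHP fallback: count every byte that is not a
    // UTF-8 continuation byte (binary 10xxxxxx).
    $count = 0;
    $bytes = strlen($str);
    for ($i = 0; $i < $bytes; $i++) {
        if ((ord($str[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }

    return $count;
}

echo sketch_strlen('fòôbàř'), "\n"; // 6
```

The fallback assumes the input is already valid UTF-8; it does not validate it.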
The project is based on:
+ Hamid Sarfraz's work - portable-utf8
+ Nicolas Grekas's work - tchwork/utf8
+ Behat's work - Behat/Transliterator
+ Sebastián Grignoli's work - neitanod/forceutf8
+ Ivan Enderlin's work - hoaproject/Ustring
+ and many cherry-picks from "GitHub"-gists and "Stack Overflow"-snippets
Here you can test some basic functions from this library and you can compare some results with the native php function results.
If you prefer a more object-oriented way to edit strings, take a look at voku/Stringy. It is a fork of "danielstjules/Stringy" that uses the "Portable UTF-8" class and adds some extra methods.
// Standard library
strtoupper('fòôbàř'); // 'FòôBàř'
strlen('fòôbàř'); // 10
// mbstring
// WARNING: if you don't use a polyfill like "Portable UTF-8", you need to install the php-extension "mbstring" on your server
mb_strtoupper('fòôbàř'); // 'FÒÔBÀŘ'
mb_strlen('fòôbàř'); // 6
// Portable UTF-8
use voku\helper\UTF8;
UTF8::strtoupper('fòôbàř'); // 'FÒÔBÀŘ'
UTF8::strlen('fòôbàř'); // 6
// voku/Stringy
use Stringy\Stringy as S;
$stringy = S::create('fòôbàř');
$stringy->toUpperCase(); // 'FÒÔBÀŘ'
$stringy->length(); // 6
composer require voku/portable-utf8
If your project does not need some of the Symfony polyfills, please use the "replace" section of your composer.json.
This removes any overhead from these polyfills as they are no longer part of your project. e.g.:
{
"replace": {
"symfony/polyfill-php72": "1.99",
"symfony/polyfill-iconv": "1.99",
"symfony/polyfill-intl-grapheme": "1.99",
"symfony/polyfill-intl-normalizer": "1.99",
"symfony/polyfill-mbstring": "1.99"
}
}
PHP 5 and earlier versions have no native Unicode support. To bridge the gap, there exist several extensions like "mbstring", "iconv" and "intl".
The problem with "mbstring" and the others is that most of the time you cannot ensure that a specific one is present on a server. If you rely on one of these, your application is no longer portable. This problem gets even more severe for open source applications that have to run on different servers with different configurations. Considering this, I decided to write a library:
Since version 5.4.26 this library will NOT force "UTF-8" by "bootstrap.php" anymore. If you need to enable this behavior you can define "PORTABLE_UTF8__ENABLE_AUTO_FILTER", before requiring the autoloader.
define('PORTABLE_UTF8__ENABLE_AUTO_FILTER', 1);
Before version 5.4.26 this behavior was enabled by default and you could disable it via "PORTABLE_UTF8__DISABLE_AUTO_FILTER", but the code had potential security vulnerabilities via injecting code while redirecting via `header('Location ...`.
This is the reason I decided to add this BC break in a bug fix release, so that everybody using the current version will receive the security fix.
Example 1: UTF8::cleanup()
echo UTF8::cleanup('?Düsseldorf?');
// will output:
// Düsseldorf
Example 2: UTF8::strlen()
$string = 'string <strong>with utf-8 chars åèä</strong> - doo-bee doo-bee dooh';
echo strlen($string) . "\n<br />";
echo UTF8::strlen($string) . "\n<br />";
// will output:
// 70
// 67
$string_test1 = strip_tags($string);
$string_test2 = UTF8::strip_tags($string);
echo strlen($string_test1) . "\n<br />";
echo UTF8::strlen($string_test2) . "\n<br />";
// will output:
// 53
// 50
Example 3: UTF8::fix_utf8()
echo UTF8::fix_utf8('DÃ¼sseldorf');
echo UTF8::fix_utf8('Ã¤');
// will output:
// Düsseldorf
// ä
The API from the "UTF8"-Class is written as small static methods that will match the default PHP-API.
Return the character at the specified position: $str[1] like functionality.
UTF8::access('fòô', 1); // 'ò'
Prepends UTF-8 BOM character to the string and returns the whole string.
If BOM already existed there, the Input string is returned.
UTF8::add_bom_to_string('fòô'); // "\xEF\xBB\xBF" . 'fòô'
Convert binary into a string.
opposite: UTF8::str_to_binary()
UTF8::binary_to_str('11110000100111111001100010000011'); // '😃'
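To make the conversion concrete, here is a minimal sketch that turns a string of binary digits into raw bytes, eight digits at a time. The helper name sketch_binary_to_str() is hypothetical, and this is not necessarily how the library implements it.

```php
<?php
// Hypothetical helper: turns a string of 0/1 digits into raw bytes.
function sketch_binary_to_str(string $bits): string
{
    if ($bits === '') {
        return '';
    }

    $out = '';
    // Every group of 8 binary digits encodes one byte.
    foreach (str_split($bits, 8) as $octet) {
        $out .= chr((int) bindec($octet));
    }

    return $out;
}

// The four bytes F0 9F 98 83 are the UTF-8 encoding of U+1F603.
$result = sketch_binary_to_str('11110000100111111001100010000011');
var_dump($result === "\xF0\x9F\x98\x83"); // bool(true)
```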
Returns the UTF-8 Byte Order Mark Character.
UTF8::bom(); // "\xEF\xBB\xBF"
Generates a UTF-8 encoded character from the given code point.
opposite: UTF8::ord()
UTF8::chr(0x2603); // '☃'
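For readers curious how a code point becomes UTF-8 bytes, the following sketch does the bit packing by hand. sketch_chr() is a hypothetical name used only for illustration; it does not reject surrogate code points, and the library's own implementation may work differently.

```php
<?php
// Hypothetical helper: encodes one Unicode code point as UTF-8 bytes.
function sketch_chr(int $cp): string
{
    if ($cp < 0 || $cp > 0x10FFFF) {
        throw new InvalidArgumentException('Code point out of range');
    }
    if ($cp < 0x80) {
        // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
            . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }

    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

var_dump(sketch_chr(0x2603) === "\xE2\x98\x83"); // bool(true)
```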
Applies callback to all characters of a string.
UTF8::chr_map(['voku\helper\UTF8', 'strtolower'], '?????'); // ['?','?', '?', '?', '?']
Generates an array with the byte length of each character of a Unicode string.
1 byte => U+0000 - U+007F
2 byte => U+0080 - U+07FF
3 byte => U+0800 - U+FFFF
4 byte => U+10000 - U+10FFFF
UTF8::chr_size_list('????-test'); // [3, 3, 3, 3, 1, 1, 1, 1, 1]
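The byte ranges above follow directly from the lead byte of each character. A minimal sketch, assuming valid UTF-8 input and using the hypothetical name sketch_chr_size_list(), could read the sizes like this:

```php
<?php
// Hypothetical helper: byte length of each character, read from the lead byte.
function sketch_chr_size_list(string $str): array
{
    $sizes = [];
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        $byte = ord($str[$i]);
        if ($byte < 0x80) {
            $size = 1; // 0xxxxxxx
        } elseif (($byte & 0xE0) === 0xC0) {
            $size = 2; // 110xxxxx
        } elseif (($byte & 0xF0) === 0xE0) {
            $size = 3; // 1110xxxx
        } elseif (($byte & 0xF8) === 0xF0) {
            $size = 4; // 11110xxx
        } else {
            $size = 1; // invalid lead byte: step over it as a single byte
        }
        $sizes[] = $size;
        $i += $size;
    }

    return $sizes;
}

print_r(sketch_chr_size_list('aä€😃')); // sizes 1, 2, 3, 4
```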
Get a decimal code representation of a specific character.
opposite: UTF8::decimal_to_chr()
alias: UTF8::chr_to_int()
UTF8::chr_to_decimal('§'); // 0xa7
Get hexadecimal code point (U+xxxx) of a UTF-8 encoded character.
UTF8::chr_to_hex('§'); // U+00a7
Splits a string into smaller chunks and multiple lines, using the specified line ending character.
UTF8::chunk_split('ABC-ÖÄÜ-????-?????', 3); // "ABC\r\n-ÖÄ\r\nÜ-?\r\n???\r\n-??\r\n???"
Accepts a string and removes all non-UTF-8 characters from it + extras if needed.
UTF8::clean("\xEF\xBB\xBF?Abcdef\xc2\xa0\x20?? ? ? - Düsseldorf", true, true); // '?Abcdef ?? ? ? - Düsseldorf'
Clean-up a string and show only printable UTF-8 chars at the end + fix UTF-8 encoding.
UTF8::cleanup("\xEF\xBB\xBF?Abcdef\xc2\xa0\x20?? ? ? - Düsseldorf", true, true); // '?Abcdef ?? ? ? - Düsseldorf'
Accepts a string and returns an array of Unicode code points.
opposite: UTF8::string()
UTF8::codepoints('κöñ'); // array(954, 246, 241)
// ... OR ...
UTF8::codepoints('κöñ', true); // array('U+03ba', 'U+00f6', 'U+00f1')
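The decoding step can be sketched in plain PHP: split the string into characters, then combine the payload bits of each byte. sketch_codepoints() is a hypothetical helper for illustration only and relies on PCRE's "u" modifier, which is an assumption about the runtime rather than something the library requires.

```php
<?php
// Hypothetical helper: decodes a UTF-8 string into integer code points.
function sketch_codepoints(string $str): array
{
    // The "u" modifier makes PCRE split on characters instead of bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Payload mask of the lead byte, indexed by sequence length.
    $masks = [1 => 0x7F, 2 => 0x1F, 3 => 0x0F, 4 => 0x07];

    $out = [];
    foreach ($chars as $chr) {
        $n = strlen($chr);
        $cp = ord($chr[0]) & $masks[$n];
        // Each continuation byte contributes its low 6 bits.
        for ($i = 1; $i < $n; $i++) {
            $cp = ($cp << 6) | (ord($chr[$i]) & 0x3F);
        }
        $out[] = $cp;
    }

    return $out;
}

print_r(sketch_codepoints('κöñ')); // 954, 246, 241
```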
Returns count of characters used in a string.
UTF8::count_chars('?a?b?c'); // array('?' => 3, 'a' => 1, 'b' => 1, 'c' => 1)
Converts an int value into a UTF-8 character.
opposite: UTF8::chr_to_decimal()
alias: UTF8::int_to_chr()
UTF8::decimal_to_chr(931); // 'Σ'
Decodes a string which was encoded by "UTF8::emoji_encode()".
UTF8::emoji_decode('foo CHARACTER_OGRE', false); // 'foo ?'
//
UTF8::emoji_decode('foo _-_PORTABLE_UTF8_-_308095726_-_627590803_-_8FTU_ELBATROP_-_', true); // 'foo ?'
Encode a string with emoji chars into a non-emoji string.
UTF8::emoji_encode('foo ?', false); // 'foo CHARACTER_OGRE'
//
UTF8::emoji_encode('foo ?', true); // 'foo _-_PORTABLE_UTF8_-_308095726_-_627590803_-_8FTU_ELBATROP_-_'
Encode a string with a new charset-encoding.
INFO: This function will also try to fix broken / double encoding,
so you can call this function also on a UTF-8 string and you don't mess up the string.
UTF8::encode('ISO-8859-1', '-ABC-中文空白-'); // '-ABC-????-'
//
UTF8::encode('UTF-8', '-ABC-中文空白-'); // '-ABC-中文空白-'
//
UTF8::encode('HTML', '-ABC-中文空白-'); // '-ABC-&#20013;&#25991;&#31354;&#30333;-'
//
UTF8::encode('BASE64', '-ABC-中文空白-'); // 'LUFCQy3kuK3mlofnqbrnmb0t'
Reads entire file into a string.
WARNING: Do not use UTF-8 Option ($convert_to_utf8) for binary files (e.g.: images) !!!
UTF8::file_get_contents('utf16le.txt'); // ...
Checks if a file starts with BOM (Byte Order Mark) character.
UTF8::file_has_bom('utf8_with_bom.txt'); // true
Normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
UTF8::filter(array("\xE9", 'à', 'a')); // array('é', 'a?', 'a')
"filter_input()"-wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
// _GET['foo'] = 'bar';
UTF8::filter_input(INPUT_GET, 'foo', FILTER_SANITIZE_STRING); // 'bar'
"filter_input_array()"-wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
// _GET['foo'] = 'bar';
UTF8::filter_input_array(INPUT_GET, array('foo' => 'FILTER_SANITIZE_STRING')); // array('bar')
"filter_var()"-wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
UTF8::filter_var('-ABC-????-', FILTER_VALIDATE_URL); // false
"filter_var_array()"-wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
$filters = [
'name' => ['filter' => FILTER_CALLBACK, 'options' => ['voku\helper\UTF8', 'ucwords']],
'age' => ['filter' => FILTER_VALIDATE_INT, 'options' => ['min_range' => 1, 'max_range' => 120]],
'email' => FILTER_VALIDATE_EMAIL,
];
$data = [
'name' => '?????',
'age' => '18',
'email' => 'foo@bar.de'
];
UTF8::filter_var_array($data, $filters, true); // ['name' => '?????', 'age' => 18, 'email' => 'foo@bar.de']
Check if the number of Unicode characters isn't greater than the specified integer.
UTF8::fits_inside('?????', 6); // false
Try to fix simple broken UTF-8 strings.
INFO: Take a look at "UTF8::fix_utf8()" if you need a more advanced fix for broken UTF-8 strings.
UTF8::fix_simple_utf8('DÃ¼sseldorf'); // 'Düsseldorf'
Fix a double (or multiple) encoded UTF8 string.
UTF8::fix_utf8('Fédération'); // 'Fédération'
Get the text direction of a specific character.
UTF8::getCharDirection('?'); // 'RTL'
Converts a hexadecimal value into a UTF-8 character.
opposite: UTF8::chr_to_hex()
UTF8::hex_to_chr('U+00a7'); // '§'
Converts hexadecimal U+xxxx code point representation to integer.
opposite: UTF8::int_to_hex()
UTF8::hex_to_int('U+00f1'); // 241
Converts a UTF-8 string to a series of HTML numbered entities.
opposite: UTF8::html_decode()
UTF8::html_encode('中文空白'); // '&#20013;&#25991;&#31354;&#30333;'
UTF-8 version of html_entity_decode()
The reason we are not using html_entity_decode() by itself is because while it is not technically correct to leave out the semicolon at the end of an entity most browsers will still interpret the entity correctly. html_entity_decode() does not convert entities without semicolons, so we are left with our own little solution here. Bummer.
Convert all HTML entities to their applicable characters
opposite: UTF8::html_encode()
alias: UTF8::html_decode()
UTF8::html_entity_decode('&#20013;&#25991;&#31354;&#30333;'); // '中文空白'
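The semicolon problem described above can be worked around in several ways. One possible approach, shown here as a sketch and not as the library's actual code, is to add the missing ";" to decimal numeric entities before handing the string to the native function. The helper name sketch_entity_decode() is hypothetical, and named or hexadecimal entities without a semicolon are deliberately left alone.

```php
<?php
// Hypothetical helper, not the library's implementation.
function sketch_entity_decode(string $html): string
{
    // Append ";" to decimal entities such as "&#228" that lack it.
    // The lookahead skips entities that already end in ";".
    $fixed = preg_replace('/(&#[0-9]+)(?![0-9;])/', '$1;', $html);

    // preg_replace() returns null on error; fall back to the raw input.
    return html_entity_decode($fixed ?? $html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo sketch_entity_decode('&#20013&#25991;'), "\n"; // 中文
```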
Convert all applicable characters to HTML entities: UTF-8 version of htmlentities()
UTF8::htmlentities('<白-öäü>'); // '&lt;&#30333;-&ouml;&auml;&uuml;&gt;'
Convert only special characters to HTML entities: UTF-8 version of htmlspecialchars()
INFO: Take a look at "UTF8::htmlentities()"
UTF8::htmlspecialchars('<白-öäü>'); // '&lt;白-öäü&gt;'
Converts Integer to hexadecimal U+xxxx code point representation.
opposite: UTF8::hex_to_int()
UTF8::int_to_hex(241); // 'U+00f1'
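The U+xxxx notation is just a zero-padded lowercase hexadecimal number with a prefix. A small sketch of both directions, using the hypothetical names sketch_int_to_hex() and sketch_hex_to_int(), might look like this:

```php
<?php
// Hypothetical helper: integer code point to "U+xxxx" notation.
function sketch_int_to_hex(int $cp): string
{
    // Pad to at least four hex digits, as in "U+00f1".
    return 'U+' . sprintf('%04x', $cp);
}

// Hypothetical helper: "U+xxxx" notation back to an integer.
function sketch_hex_to_int(string $hex): int
{
    if (preg_match('/^U\+([0-9a-fA-F]{4,6})$/', $hex, $m) !== 1) {
        throw new InvalidArgumentException('Expected the form U+xxxx');
    }

    return (int) hexdec($m[1]);
}

echo sketch_int_to_hex(241), "\n";      // U+00f1
echo sketch_hex_to_int('U+00f1'), "\n"; // 241
```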
Checks if a string is 7 bit ASCII.
alias: UTF8::isAscii()
UTF8::is_ascii('?'); // false
Returns true if the string is base64 encoded, false otherwise.
alias: UTF8::isBase64()
UTF8::is_base64('4KSu4KWL4KSo4KS/4KSa'); // true
Check if the input is binary... (this looks like a hack).
alias: UTF8::isBinary()
UTF8::is_binary(01); // true
Check if the file is binary.
UTF8::is_binary_file('./utf32.txt'); // true
Checks if the given string is equal to any "Byte Order Mark".
WARNING: Use "UTF8::string_has_bom()" if you will check BOM in a string.
alias: UTF8::isBom()
UTF8::is_bom("\xef\xbb\xbf"); // true
Try to check if "$str" is a JSON-string.
alias: UTF8::isJson()
UTF8::is_json('{"array":[1,"¥","ä"]}'); // true
Check if the string contains any HTML tags <lall>.
alias: UTF8::isHtml()
UTF8::is_html('<b>lall</b>'); // true
Check if the string is UTF-16: This function will return false if it's not UTF-16, 1 for UTF-16LE, 2 for UTF-16BE.
alias: UTF8::isUtf16()
UTF8::is_utf16(file_get_contents('utf-16-le.txt')); // 1
UTF8::is_utf16(file_get_contents('utf-16-be.txt')); // 2
UTF8::is_utf16(file_get_contents('utf-8.txt')); // false
Check if the string is UTF-32: This function will return false if it's not UTF-32, 1 for UTF-32LE, 2 for UTF-32BE.
alias: UTF8::isUtf32()
UTF8::is_utf32(file_get_contents('utf-32-le.txt')); // 1
UTF8::is_utf32(file_get_contents('utf-32-be.txt')); // 2
UTF8::is_utf32(file_get_contents('utf-8.txt')); // false
Checks whether the passed string contains only byte sequences that are valid UTF-8 characters.
alias: UTF8::isUtf8()
UTF8::is_utf8('Iñtërnâtiônàlizætiøn'); // true
UTF8::is_utf8("Iñtërnâtiônàlizætiøn\xA0\xA1"); // false
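A common way to perform this kind of validity check without "mbstring" is to let PCRE validate the subject. The sketch below assumes PCRE was built with UTF-8 support (the usual case); sketch_is_utf8() is a hypothetical name, and the library may use a different strategy.

```php
<?php
// Hypothetical helper: validates UTF-8 by letting PCRE check the subject.
function sketch_is_utf8(string $str): bool
{
    // With the "u" modifier PCRE validates the subject as UTF-8 first.
    // The empty pattern always matches a valid string (result 1), while
    // preg_match() returns false when the subject is not valid UTF-8.
    return preg_match('//u', $str) === 1;
}

var_dump(sketch_is_utf8('Iñtërnâtiônàlizætiøn'));           // bool(true)
var_dump(sketch_is_utf8("Iñtërnâtiônàlizætiøn\xA0\xA1"));   // bool(false)
```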
Decodes a JSON string.
UTF8::json_decode('[1,"\u00a5","\u00e4"]'); // array(1, '¥', 'ä')
Returns the JSON representation of a value.
UTF8::json_encode(array(1, '¥', 'ä')); // '[1,"\u00a5","\u00e4"]'
Makes string's first char lowercase.
UTF8::lcfirst('ÑTËRNÂTIÔNÀLIZÆTIØN'); // ñTËRNÂTIÔNÀLIZÆTIØN
Returns the UTF-8 character with the maximum code point in the given data.
UTF8::max('abc-äöü-????'); // 'ø'
Calculates and returns the maximum number of bytes taken by any UTF-8 encoded character in the given string.
UTF8::max_chr_width('Intërnâtiônàlizætiøn'); // 2
Returns the UTF-8 character with the minimum code point in the given data.
UTF8::min('abc-äöü-????'); // '-'
Normalize the encoding-"name" input.
UTF8::normalize_encoding('UTF8'); // 'UTF-8'
Normalize some MS Word special characters.
UTF8::normalize_msword('?Abcdef??'); // '"Abcdef..."'
Normalize the whitespace.
UTF8::normalize_whitespace("abc-\xc2\xa0-öäü-\xe2\x80\xaf-\xE2\x80\xAC", true); // "abc-\xc2\xa0-öäü- -"
Calculates Unicode code point of the given UTF-8 encoded character.
opposite: UTF8::chr()
UTF8::ord('☃'); // 0x2603
Parses the string into an array (into the second parameter).
WARNING: Unlike "parse_str()", this method does not (re-)place variables in the current scope,
if the second parameter is not set!
UTF8::parse_str('Iñtërnâtiônéàlizætiøn=??&arr[]=foo+??&arr[]=?????????', $array);
echo $array['Iñtërnâtiônéàlizætiøn']; // '??'
Create an array containing a range of UTF-8 characters.
UTF8::range('?', '?'); // array('?', '?', '?', '?', '?',)
Remove the BOM from UTF-8 / UTF-16 / UTF-32 strings.
UTF8::remove_bom("\xEF\xBB\xBF????? ??"); // '????? ??'
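For the UTF-8 case only, BOM removal comes down to comparing the first three bytes. The sketch below uses the hypothetical name sketch_remove_utf8_bom() and does not cover the UTF-16 and UTF-32 marks that the library function also handles.

```php
<?php
// Hypothetical helper: strips one leading UTF-8 BOM, if present.
function sketch_remove_utf8_bom(string $str): string
{
    $bom = "\xEF\xBB\xBF";

    // Compare only the first three bytes with the BOM.
    if (strncmp($str, $bom, 3) === 0) {
        // The cast keeps the result a string on PHP 7, where
        // substr() can return false for a BOM-only input.
        return (string) substr($str, 3);
    }

    return $str;
}

var_dump(sketch_remove_utf8_bom("\xEF\xBB\xBFfoobar")); // string(6) "foobar"
```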
Removes duplicate occurrences of a string in another string.
UTF8::remove_duplicates('öäü-??????????-äöü', '?????'); // 'öäü-?????-äöü'
Remove invisible characters from a string.
UTF8::remove_invisible_characters("???\0??"); // '?????'
Replace the diamond question mark (?) and invalid-UTF8 chars with the replacement.
UTF8::replace_diamond_question_mark('?????', ''); // '????'
Strip whitespace or other characters from the beginning and end of a UTF-8 string.
UTF8::trim(' -ABC-????- '); // '-ABC-????-'
Strip whitespace or other characters from the end of a UTF-8 string.
UTF8::rtrim('-ABC-????- '); // '-ABC-????-'
Strip whitespace or other characters from the beginning of a UTF-8 string.
UTF8::ltrim('?????? '); // '????? '
Converts a UTF-8 character to HTML Numbered Entity like "&#123;".
UTF8::single_chr_html_encode('κ'); // '&#954;'
Convert a string to an array of Unicode characters.
UTF8::split('????'); // array('?', '?', '?', '?')
Optimized "\mb_detect_encoding()"-function -> with support for UTF-16 and UTF-32.
UTF8::str_detect_encoding('????'); // 'UTF-8'
UTF8::str_detect_encoding('Abc'); // 'ASCII'
Check if the string ends with the given substring.
UTF8::str_ends_with('BeginMiddle?????', '?????'); // true
UTF8::str_ends_with('BeginMiddle?????', '?????'); // false
Check if the string ends with the given substring, case-insensitive.
UTF8::str_iends_with('BeginMiddle?????', '?????'); // true
UTF8::str_iends_with('BeginMiddle?????', '?????'); // true
Case-insensitive and UTF-8 safe version of str_replace().
UTF8::str_ireplace('lIzÆ', 'lise', array('Iñtërnâtiônàlizætiøn')); // array('Iñtërnâtiônàlisetiøn')
Limit the number of characters in a string, but also after the next word.
UTF8::str_limit_after_word('fòô bà? fòô', 8, ''); // 'fòô bà?'
Pad a UTF-8 string to a given length with another string.
UTF8::str_pad('????', 10, '_', STR_PAD_BOTH); // '___????___'
Repeat a string.
UTF8::str_repeat("°~\xf0\x90\x28\xbc", 2); // '°~ð(¼°~ð(¼'
Shuffles all the characters in the string.
UTF8::str_shuffle('fòô bà? fòô'); // 'àòô?b ffòô '
Sort all characters according to code points.
UTF8::str_sort(' -ABC-????- '); // ' ---ABC????'
Split a string into an array.
UTF8::split('déjà', 2); // array('dé', 'jà')
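Splitting by characters instead of bytes can be sketched with PCRE's "u" modifier followed by array_chunk(). The name sketch_str_split() is hypothetical, and this is only one possible approach, not necessarily the library's.

```php
<?php
// Hypothetical helper: splits a UTF-8 string into chunks of $length characters.
function sketch_str_split(string $str, int $length = 1): array
{
    if ($length < 1) {
        throw new InvalidArgumentException('Length must be at least 1');
    }

    // Split between every character (not every byte).
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Group the characters and glue each group back together.
    return array_map(
        function (array $chunk): string {
            return implode('', $chunk);
        },
        array_chunk($chars, $length)
    );
}

print_r(sketch_str_split('déjà', 2)); // 'dé', 'jà'
```

The native str_split('déjà', 2) would instead cut inside the two-byte characters and produce invalid UTF-8.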
Check if the string starts with the given substring.
UTF8::str_starts_with('?????MiddleEnd', '?????'); // true
UTF8::str_starts_with('?????MiddleEnd', '?????'); // false
Check if the string starts with the given substring, case-insensitive.
UTF8::str_istarts_with('?????MiddleEnd', '?????'); // true
UTF8::str_istarts_with('?????MiddleEnd', '?????'); // true
Get a binary representation of a specific string.
opposite: UTF8::binary_to_str()
UTF8::str_to_binary('😃'); // '11110000100111111001100010000011'
Get the number of words in a specific string.
// format: 0 -> return only word count (int)
//
UTF8::str_word_count('???? öäü abc#c'); // 4
UTF8::str_word_count('???? öäü abc#c', 0, '#'); // 3
// format: 1 -> return words (array)
//
UTF8::str_word_count('???? öäü abc#c', 1); // array('????', 'öäü', 'abc', 'c')
UTF8::str_word_count('???? öäü abc#c', 1, '#'); // array('????', 'öäü', 'abc#c')
// format: 2 -> return words with offset (array)
//
UTF8::str_word_count('???? öäü ab#c', 2); // array(0 => '????', 5 => 'öäü', 9 => 'abc', 13 => 'c')
UTF8::str_word_count('???? öäü ab#c', 2, '#'); // array(0 => '????', 5 => 'öäü', 9 => 'abc#c')
Case-sensitive string comparison: < 0 if str1 is less than str2;
> 0 if str1 is greater than str2,
0 if they are equal.
UTF8::strcmp("iñtërnâtiôn\nàlizætiøn", "iñtërnâtiôn\nàlizætiøn"); // 0
Case sensitive string comparisons using a "natural order" algorithm: < 0 if str1 is less than str2;
> 0 if str1 is greater than str2,
0 if they are equal.
INFO: natural order version of UTF8::strcmp()
UTF8::strnatcmp('2Hello world ????!', '10Hello WORLD ????!'); // -1
UTF8::strcmp('2Hello world ????!', '10Hello WORLD ????!'); // 1
UTF8::strnatcmp('10Hello world ????!', '2Hello WORLD ????!'); // 1
UTF8::strcmp('10Hello world ????!', '2Hello WORLD ????!'); // -1
Case-insensitive string comparison: < 0 if str1 is less than str2;
> 0 if str1 is greater than str2,
0 if they are equal.
INFO: Case-insensitive version of UTF8::strcmp()
UTF8::strcasecmp("iñtërnâtiôn\nàlizætiøn", "Iñtërnâtiôn\nàlizætiøn"); // 0
Case-insensitive string comparisons using a "natural order" algorithm: < 0 if str1 is less than str2;
> 0 if str1 is greater than str2,
0 if they are equal.
INFO: natural order version of UTF8::strcasecmp()
UTF8::strnatcasecmp('2', '10Hello WORLD ????!'); // -1
UTF8::strcasecmp('2Hello world ????!', '10Hello WORLD ????!'); // 1
UTF8::strnatcasecmp('10Hello world ????!', '2Hello WORLD ????!'); // 1
UTF8::strcasecmp('10Hello world ????!', '2Hello WORLD ????!'); // -1
Case-insensitive string comparison of the first n characters:
< 0 if str1 is less than str2;
> 0 if str1 is greater than str2,
0 if they are equal.
INFO: Case-insensitive version of UTF8::strncmp()
UTF8::strncasecmp("iñtërnâtiôn\nàlizætiøn321", "iñtërnâtiôn\nàlizætiøn123", 5); // 0
Case-sensitive string comparison of the first n characters:
< 0 if str1 is less than str2;
> 0 if str1 is greater than str2,
0 if they are equal.
UTF8::strncmp("Iñtërnâtiôn\nàlizætiøn321", "Iñtërnâtiôn\nàlizætiøn123", 5); // 0
Create a UTF-8 string from code points.
opposite: UTF8::codepoints()
UTF8::string(array(246, 228, 252)); // 'öäü'
Checks if string starts with "BOM" (Byte Order Mark Character) character.
alias: UTF8::hasBom()
UTF8::string_has_bom("\xef\xbb\xbf foobar"); // true
Strip HTML and PHP tags from a string + clean invalid UTF-8.
UTF8::strip_tags("<span>?????\xa0\xa1</span>"); // '?????'
Strip all whitespace characters. This includes tabs and newline characters, as well as multibyte whitespace such as the thin space and ideographic space.
UTF8::strip_whitespace(' ? ?????????? '); // '???????????'
Get the string length, not the byte-length!
UTF8::strlen("Iñtërnâtiôn\xE9àlizætiøn"); // 20
Return the width of a string.
UTF8::strwidth("Iñtërnâtiôn\xE9àlizætiøn"); // 21
Search a string for any of a set of characters.
UTF8::strpbrk('-????-', '?'); // '?-'
Find the position of the first occurrence of a substring in a string.
UTF8::strpos('ABC-ÖÄÜ-????-????', '?'); // 8
Find the position of the first occurrence of a substring in a string, case-insensitive.
UTF8::stripos('ABC-ÖÄÜ-????-????', '?'); // 8
Find the position of the last occurrence of a substring in a string.
UTF8::strrpos('ABC-ÖÄÜ-????-????', '?'); // 13
Find the position of the last occurrence of a substring in a string, case-insensitive.
UTF8::strripos('ABC-ÖÄÜ-????-????', '?'); // 13
Find the last occurrence of a character in a string within another.
UTF8::strrchr('??????????-äöü', '?????'); // '?????-äöü'
Find the last occurrence of a character in a string within another, case-insensitive.
UTF8::strrichr('A??????????-äöü', 'a?????'); // 'A??????????-äöü'
Reverses characters order in the string.
UTF8::strrev('?-öäü'); // 'üäö-?'
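Reversing by character rather than by byte is what keeps multi-byte sequences intact. A minimal sketch under the same PCRE assumption, with the hypothetical name sketch_strrev():

```php
<?php
// Hypothetical helper: reverses a UTF-8 string character by character.
function sketch_strrev(string $str): string
{
    // Split into characters first so multi-byte sequences stay together.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return implode('', array_reverse($chars));
}

echo sketch_strrev('a-öäü'), "\n"; // üäö-a
```

The native strrev() reverses bytes, so it would scramble every multi-byte character in the same input.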
Finds the length of the initial segment of a string consisting entirely of characters contained within a given mask.
UTF8::strspn('iñtërnâtiônàlizætiøn', 'itñ'); // '3'
Returns part of haystack string from the first occurrence of needle to the end of haystack.
alias: UTF8::strchr()
$str = 'iñtërnâtiônàlizætiøn';
$search = 'nât';
UTF8::strstr($str, $search); // 'nâtiônàlizætiøn'
UTF8::strstr($str, $search, true); // 'iñtër'
Returns all of haystack starting from and including the first occurrence of needle to the end.
alias: UTF8::strichr()
$str = 'iñtërnâtiônàlizætiøn';
$search = 'NÂT';
UTF8::stristr($str, $search); // 'nâtiônàlizætiøn'
UTF8::stristr($str, $search, true); // 'iñtër'
Unicode transformation for case-less matching.
UTF8::strtocasefold('???'); // 'j???'
Make a string lowercase.
UTF8::strtolower('DÉJÀ ??? I??i'); // 'déjà ??? i?ii'
Make a string uppercase.
UTF8::strtoupper('Déjà ??? I??i'); // 'DÉJÀ ??? II?I'
Translate characters or replace sub-strings.
$arr = array(
'Hello' => '???',
'????' => 'earth',
);
UTF8::strtr('Hello ????', $arr); // '??? earth'
Convert a string (phrase, sentence, ...) into an array of words.
UTF8::str_to_words('???? oöäü#s', '#'); // array('', '????', ' ', 'oöäü#s', '')
Get part of a string.
UTF8::substr('????', 1, 2); // '??'
Binary-safe comparison of two strings from an offset, up to a length of characters.
UTF8::substr_compare("???\r", '??', 0, 2); // -1
UTF8::substr_compare("???\r", '??', 1, 2); // 1
UTF8::substr_compare("???\r", '??', 1, 2); // 0
Count the number of substring occurrences.
UTF8::substr_count('????', '??', 1, 2); // 1
Removes a prefix ($needle) from the beginning of the string ($haystack).
UTF8::substr_left('?????MiddleEnd', '?????'); // 'MiddleEnd'
UTF8::substr_left('?????MiddleEnd', '?????'); // '?????MiddleEnd'
Removes a prefix ($needle) from the beginning of the string ($haystack), case-insensitive.
UTF8::substr_ileft('?????MiddleEnd', '?????'); // 'MiddleEnd'
UTF8::substr_ileft('?????MiddleEnd', '?????'); // 'MiddleEnd'
Removes a suffix ($needle) from the end of the string ($haystack).
UTF8::substr_right('BeginMiddle?????', '?????'); // 'BeginMiddle'
UTF8::substr_right('BeginMiddle?????', '?????'); // 'BeginMiddle?????'
Removes a suffix ($needle) from the end of the string ($haystack), case-insensitive.
UTF8::substr_iright('BeginMiddle?????', '?????'); // 'BeginMiddle'
UTF8::substr_iright('BeginMiddle?????', '?????'); // 'BeginMiddle'
Replace text within a portion of a string.
UTF8::substr_replace(array('Iñtërnâtiônàlizætiøn', 'foo'), 'æ', 1); // array('Iæñtërnâtiônàlizætiøn', 'fæoo')
Returns a case swapped version of the string.
UTF8::swapCase('déJÀ ??? i?II'); // 'DÉjà ??? IIii'
Convert a string into ASCII.
alias: UTF8::toAscii() alias: UTF8::str_transliterate()
UTF8::to_ascii('déjà ??? i?ii'); // 'deja sss iiii'
This function leaves UTF-8 characters alone, while converting almost all non-UTF8 to UTF8.
alias: UTF8::toUtf8()
UTF8::to_utf8("\u0063\u0061\u0074"); // 'cat'
Convert a string into "ISO-8859"-encoding (Latin-1).
alias: UTF8::toIso8859() alias: UTF8::to_latin1() alias: UTF8::toLatin1()
UTF8::to_utf8(UTF8::to_latin1(' -ABC-????- ')); // ' -ABC-????- '
Makes string's first char uppercase.
alias: UTF8::ucword()
UTF8::ucfirst('ñtërnâtiônàlizætiøn'); // 'Ñtërnâtiônàlizætiøn'
Uppercase for all words in the string.
UTF8::ucwords('iñt ërn âTi ônà liz æti øn'); // 'Iñt Ërn ÂTi Ônà Liz Æti Øn'
Multi decode HTML entity + fix urlencoded-win1252-chars.
UTF8::rawurldecode('tes%20öäü%20\u00edtest+test'); // 'tes öäü ítest+test'
Multi decode HTML entity + fix urlencoded-win1252-chars.
UTF8::urldecode('tes%20öäü%20\u00edtest+test'); // 'tes öäü ítest test'
Decodes a UTF-8 string to ISO-8859-1.
UTF8::encode('UTF-8', UTF8::utf8_decode('-ABC-????-')); // '-ABC-????-'
Encodes an ISO-8859-1 string to UTF-8.
UTF8::utf8_decode(UTF8::utf8_encode('-ABC-????-')); // '-ABC-????-'
Limit the number of words in a string.
UTF8::words_limit('fòô bà? fòô', 2, ''); // 'fòô bà?'
Wraps a string to a given number of characters
UTF8::wordwrap('Iñtërnâtiônàlizætiøn', 2, '<br>', true); // 'Iñ<br>të<br>rn<br>ât<br>iô<br>nà<br>li<br>zæ<br>ti<br>øn'
1) Composer is a prerequisite for running the tests.
composer install
2) The tests can be executed by running this command from the root directory:
./vendor/bin/phpunit
For support and donations please visit GitHub | Issues | PayPal | Patreon.
For status updates and release announcements please visit Releases | Twitter | Patreon.
For professional support please contact me.
"Portable UTF8" is free software; you can redistribute it and/or modify it under the terms of either of the following licenses, at your option:
- Apache License v2.0, or
- GNU General Public License v2.0.
Unicode handling requires tedious work to implement and maintain in the long run. As such, contributions such as unit tests, bug reports, comments or patches licensed under both licenses are very welcome.
Files |
File | Role | Description | ||
---|---|---|---|---|
.github (4 files) | ||||
src (1 directory) | ||||
tests (47 files, 1 directory) | ||||
.bookignore | Data | Auxiliary data | ||
.editorconfig | Data | Auxiliary data | ||
.scrutinizer.yml | Data | Auxiliary data | ||
.styleci.yml | Data | Auxiliary data | ||
.travis.yml | Data | Auxiliary data | ||
appveyor.yml | Data | Auxiliary data | ||
book.json | Data | Auxiliary data | ||
bootstrap.php | Aux. | Auxiliary script | ||
CHANGELOG.md | Data | Auxiliary data | ||
circle.yml | Data | Auxiliary data | ||
composer.json | Data | Auxiliary data | ||
LICENSE-APACHE | Lic. | License text | ||
package.json | Data | Auxiliary data | ||
phpcs.php_cs | Example | Example script | ||
phpstan.neon | Data | Auxiliary data | ||
phpunit.xml | Data | Auxiliary data | ||
psalm.xml | Data | Auxiliary data | ||
README.md | Doc. | Documentation | ||
SECURITY.md | Data | Auxiliary data | ||
SUMMARY.md | Data | Auxiliary data |
Files | / | .github |
File | Role | Description |
---|---|---|
CONTRIBUTING.md | Data | Auxiliary data |
FUNDING.yml | Data | Auxiliary data |
ISSUE_TEMPLATE.md | Data | Auxiliary data |
PULL_REQUEST_TEMPLATE.md | Data | Auxiliary data |
Files | / | src | / | voku | / | helper |
File | Role | Description | ||
---|---|---|---|---|
data (8 files) | ||||
Bootup.php | Class | Class source | ||
UTF8.php | Class | Class source |
Files | / | src | / | voku | / | helper | / | data |
File | Role | Description |
---|---|---|
caseFolding_full.php | Aux. | Auxiliary script |
chr.php | Aux. | Auxiliary script |
emoji.php | Aux. | Auxiliary script |
encodings.php | Aux. | Auxiliary script |
ord.php | Aux. | Auxiliary script |
transliterator_list.php | Aux. | Auxiliary script |
utf8_fix.php | Aux. | Auxiliary script |
win1252_to_utf8.php | Aux. | Auxiliary script |
Files | / | tests |
File | Role | Description | ||
---|---|---|---|---|
fixtures (22 files) | ||||
bootstrap.php | Aux. | Auxiliary script | ||
BootupTest.php | Class | Class source | ||
HhvmTest.php | Class | Class source | ||
ShimIconvTest.php | Class | Class source | ||
ShimIntlTest.php | Class | Class source | ||
ShimMbstringTest.php | Class | Class source | ||
ShimXmlTest.php | Class | Class source | ||
Utf8AccessTest.php | Class | Class source | ||
Utf8AsciiTest.php | Class | Class source | ||
Utf8CodePointsTest.php | Class | Class source | ||
Utf8CompliantTest.php | Class | Class source | ||
Utf8GlobalNonStrictPart1Test.php | Class | Class source | ||
Utf8GlobalNonStrictPart2Test.php | Class | Class source | ||
Utf8GlobalNonStrictPart3Test.php | Class | Class source | ||
Utf8GlobalPart1Test.php | Class | Class source | ||
Utf8GlobalPart2Test.php | Class | Class source | ||
Utf8GlobalPart3Test.php | Class | Class source | ||
Utf8HtmlEncode.php | Class | Class source | ||
Utf8IsUtf8Test.php | Class | Class source | ||
Utf8IsValidTest.php | Class | Class source | ||
Utf8LcfirstTest.php | Class | Class source | ||
Utf8LtrimTest.php | Class | Class source | ||
Utf8OrdTest.php | Class | Class source | ||
Utf8RtrimTest.php | Class | Class source | ||
Utf8StrcasecmpTest.php | Class | Class source | ||
Utf8StrcspnTest.php | Class | Class source | ||
Utf8StrIreplaceTest.php | Class | Class source | ||
Utf8StristrTest.php | Class | Class source | ||
Utf8StrlenTest.php | Class | Class source | ||
Utf8StrPadTest.php | Class | Class source | ||
Utf8StrReplaceTest.php | Class | Class source | ||
Utf8StrrevTest.php | Class | Class source | ||
Utf8StrriposTest.php | Class | Class source | ||
Utf8StrrposTest.php | Class | Class source | ||
Utf8StrSplitTest.php | Class | Class source | ||
Utf8StrspnTest.php | Class | Class source | ||
Utf8StrToUpperTest.php | Class | Class source | ||
Utf8StrTransliterateTest.php | Class | Class source | ||
Utf8StrtToLowerTest.php | Class | Class source | ||
Utf8StrWordwrapTest.php | Class | Class source | ||
Utf8SubstrReplaceTest.php | Class | Class source | ||
Utf8SubstrTest.php | Class | Class source | ||
Utf8ToAsciiTest.php | Class | Class source | ||
Utf8TrimTest.php | Class | Class source | ||
Utf8UcfirstTest.php | Class | Class source | ||
Utf8UcwordsTest.php | Class | Class source | ||
ZNormalizationTest.php | Class | Class source |
Files | / | tests | / | fixtures |
File | Role | Description |
---|---|---|
broken_import.csv | Data | Auxiliary data |
image.png | Data | Auxiliary data |
image_small.png | Data | Auxiliary data |
iso-8859-7.txt | Doc. | Documentation |
latin.txt | Doc. | Documentation |
sample-ascii-chart.txt | Doc. | Documentation |
sample-html.txt | Doc. | Documentation |
sample-unicode-chart.txt | Doc. | Documentation |
sample-utf-16-be-bom-only.txt | Doc. | Documentation |
sample-utf-16-le-bom-only.txt | Doc. | Documentation |
sample-utf-8-bom-only.txt | Doc. | Documentation |
sample-utf-8-bom.txt | Doc. | Documentation |
sample-win1252.html | Doc. | Documentation |
test.js | Data | Auxiliary data |
test.pdf | Data | Auxiliary data |
utf-8-bom.txt | Doc. | Documentation |
utf-8-extra.txt | Doc. | Documentation |
utf-8.txt | Doc. | Documentation |
ZNormalizationTest.50.txt | Doc. | Documentation |
ZNormalizationTest.63.txt | Doc. | Documentation |
ZNormalizationTest.70.txt | Doc. | Documentation |
ZNormalizationTest.80.txt | Doc. | Documentation |
Other classes that need this package |
Class | Why it is needed | Dependency |
---|---|---|
PHP Anti XSS Filter | String-Handling | Required |
PHP HTML Form Validator | UTF-8 support | Required |
PHP URLify | toASCII() is needed | Required |
Simple HTML DOM | Strin | Required |
Simple MySQLi Class | String-Handling | Required |