Class for conversion between charsets.
Author: Kasper Skårhøj <kasperYYYY@typo3.com>
Author: Martin Kutschker <martin.t.kutschker@blackbox.net>
File Size: 2367 lines (73 kb)
Included or required: 0 times
Referenced: 0 times
Includes or requires: 0 files
t3lib_cs:: (39 methods):
__construct()
parse_charset()
get_locale_charset()
conv()
convArray()
utf8_encode()
utf8_decode()
utf8_to_entities()
entities_to_utf8()
utf8_to_numberarray()
UnumberToChar()
utf8CharToUnumber()
initCharset()
initUnicodeData()
initCaseFolding()
initToASCII()
substr()
strlen()
cropMbstring()
crop()
strtrunc()
conv_case()
convCaseFirst()
specCharsToASCII()
getPreferredClientLanguage()
sb_char_mapping()
utf8_substr()
utf8_strlen()
utf8_strtrunc()
utf8_strpos()
utf8_strrpos()
utf8_char2byte_pos()
utf8_byte2char_pos()
utf8_char_mapping()
euc_strtrunc()
euc_substr()
euc_strlen()
euc_char2byte_pos()
euc_char_mapping()
__construct()
Default constructor.
parse_charset($charset)
Normalizes a character set name by converting it to lowercase letters.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string Input charset
return: string Normalized charset
get_locale_charset($locale)
Gets the charset of a locale. The locale string may take any of these forms:
  ln              language
  ln_CN           language / country
  ln_CN.cs        language / country / charset
  ln_CN.cs@mod    language / country / charset / modifier
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string Locale string
return: string Charset resolved for the locale string
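For illustration, the parsing of the locale forms listed above can be sketched in Python. This is a minimal sketch, not the t3lib_cs implementation: `locale_charset` is a hypothetical helper, and returning a fixed `default` when the locale carries no `.charset` part is an assumption (the PHP method may derive a charset from the language instead).

```python
def locale_charset(locale: str, default: str = "utf-8") -> str:
    """Extract the charset part of a locale such as 'ln_CN.cs@mod'.

    Hypothetical helper; the fallback to `default` is an assumption.
    """
    # Drop an optional '@modifier' suffix first.
    base = locale.split("@", 1)[0]
    # The charset, if present, follows the first '.'.
    if "." in base:
        return base.split(".", 1)[1].lower()
    return default
```

For example, `locale_charset("ru_RU.KOI8-R@euro")` yields `"koi8-r"`, already lowercased as parse_charset() would do.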
conv($str, $fromCS, $toCS, $useEntityForNoChar = 0)
Converts a string from one charset to another charset.
param: string Input string
param: string From charset (the current charset of the string)
param: string To charset (the output charset wanted)
param: boolean If set, characters that are not available in the destination charset are encoded as numeric entities
return: string Converted string
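A minimal Python sketch of the conversion, assuming Python's built-in codecs stand in for the csconvtbl/ tables. `conv` here is a hypothetical helper; the `'xmlcharrefreplace'` error handler produces decimal numeric entities, and the `'?'` substitute used when the flag is off is this sketch's choice, not necessarily what t3lib_cs emits.

```python
def conv(data: bytes, from_cs: str, to_cs: str, use_entity_for_no_char: bool = False) -> bytes:
    """Convert bytes from one charset to another (hypothetical sketch)."""
    # Decode from the source charset into Unicode code points.
    text = data.decode(from_cs)
    # Characters missing in the target charset become '&#NNNN;' entities
    # when the flag is set, otherwise '?' (Python's 'replace' handler).
    errors = "xmlcharrefreplace" if use_entity_for_no_char else "replace"
    return text.encode(to_cs, errors=errors)
```

The euro sign, for instance, is not part of ISO-8859-1, so converting it there yields either `b"?"` or `b"&#8364;"` depending on the flag.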
convArray(&$array, $fromCS, $toCS, $useEntityForNoChar = 0)
Converts all string elements of an array from one charset to another charset. NOTICE: The array is passed by reference!
param: array Input array, possibly multidimensional
param: string From charset (the current charset of the strings)
param: string To charset (the output charset wanted)
param: boolean If set, characters that are not available in the destination charset are encoded as numeric entities
return: void
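A Python sketch of the recursive, in-place conversion, assuming a nested `dict` models the PHP array and `bytes` values model strings in the source charset; `conv_array` is hypothetical. Like the PHP method, it mutates its argument and returns nothing.

```python
def conv_array(data: dict, from_cs: str, to_cs: str, use_entity_for_no_char: bool = False) -> None:
    """Convert every bytes value in a (possibly nested) dict in place."""
    errors = "xmlcharrefreplace" if use_entity_for_no_char else "replace"
    for key, value in data.items():
        if isinstance(value, dict):
            # Recurse into nested dicts, mirroring multidimensional PHP arrays.
            conv_array(value, from_cs, to_cs, use_entity_for_no_char)
        elif isinstance(value, bytes):
            # Replacing the value of an existing key is safe during iteration.
            data[key] = value.decode(from_cs).encode(to_cs, errors=errors)
        # Non-string values (numbers, None, ...) are left untouched.
```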
utf8_encode($str, $charset)
Converts $str from $charset to UTF-8.
param: string String in local charset to convert to UTF-8
param: string Charset, lowercase. Must be found in csconvtbl/ folder.
return: string Output string, converted to UTF-8
utf8_decode($str, $charset, $useEntityForNoChar = 0)
Converts $str from UTF-8 to $charset.
param: string String in UTF-8 to convert to local charset
param: string Charset, lowercase. Must be found in csconvtbl/ folder.
param: boolean If set, characters that are not available in the destination charset are encoded as numeric entities
return: string Output string, converted to local charset
utf8_to_entities($str)
Converts all characters > 127 to numeric entities.
param: string Input string
return: string Output string
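A minimal Python sketch, assuming the input is UTF-8 bytes and decimal entities are wanted; `utf8_to_entities` here is a hypothetical stand-in, not the PHP method.

```python
def utf8_to_entities(data: bytes) -> str:
    """Replace every character above code point 127 by a decimal entity."""
    text = data.decode("utf-8")
    # Keep ASCII (code points 0..127) as-is; everything else becomes &#N;.
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)
```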
entities_to_utf8($str, $alsoStdHtmlEnt = FALSE)
Converts numeric entities (Unicode, e.g. decimal (&#1234;) or hexadecimal (&#x1b;)) to UTF-8 multibyte characters.
param: string Input string, UTF-8
param: boolean If set, named HTML entities (like &amp; or &pound;) are converted as well
return: string Output string
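A Python sketch of the decoding step, using only the standard `re` and `html` modules. `entities_to_utf8` is a hypothetical helper that returns a Python `str` (call `.encode("utf-8")` to get bytes). When the flag is set it delegates to `html.unescape()`, which also resolves numeric references but follows HTML5 parsing rules, so some code points may be treated differently from the PHP method.

```python
import html
import re

# Decimal (&#1234;) or hexadecimal (&#x4D2; / &#X4D2;) numeric references.
_NUMERIC = re.compile(r"&#([xX][0-9a-fA-F]+|[0-9]+);")

def entities_to_utf8(text: str, also_std_html_ent: bool = False) -> str:
    if also_std_html_ent:
        # Named entities such as &amp; or &pound;, plus numeric ones.
        return html.unescape(text)

    def repl(match: re.Match) -> str:
        ref = match.group(1)
        code = int(ref[1:], 16) if ref[0] in "xX" else int(ref)
        # Leave references outside the Unicode range untouched.
        return chr(code) if code <= 0x10FFFF else match.group(0)

    return _NUMERIC.sub(repl, text)
```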
utf8_to_numberarray($str, $convEntities = 0, $retChar = 0)
Converts all characters in the input UTF-8 string into integer numbers, returned in an array.
param: string Input string, UTF-8
param: boolean If set, all HTML entities (like &amp; or &pound; or &#123; or &#x3f5d;) are detected as characters.
param: boolean If set, the real UTF-8 character is returned instead of its integer number.
return: array Output array with the character numbers
UnumberToChar($cbyte)
Converts a Unicode number to a UTF-8 multibyte character.
Algorithm based on the script found at http://czyborra.com/utf/
Unit-tested by Kasper.
The binary representation of the character's integer value is simply spread across the bytes, and the number of high bits set in the lead byte announces the number of bytes in the multibyte sequence:

  bytes | bits | representation
  1     | 7    | 0vvvvvvv
  2     | 11   | 110vvvvv 10vvvvvv
  3     | 16   | 1110vvvv 10vvvvvv 10vvvvvv
  4     | 21   | 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
  5     | 26   | 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
  6     | 31   | 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv

param: integer Unicode integer
return: string UTF-8 multibyte character string
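The table can be checked with a short Python sketch that builds the byte sequence by hand. It is hypothetical (`unumber_to_char` is not the PHP method) and deliberately stops at the 4-byte row: since RFC 3629, UTF-8 is limited to U+10FFFF, so the 5- and 6-byte forms are no longer valid.

```python
def unumber_to_char(cp: int) -> bytes:
    """Encode one code point following the byte/bit table (up to 4 bytes)."""
    if cp < 0x80:        # 1 byte:  0vvvvvvv
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110vvvvv 10vvvvvv
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110vvvv 10vvvvvv 10vvvvvv
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp < 0x110000:    # 4 bytes: 11110vvv followed by three 10vvvvvv bytes
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point outside the Unicode range")
```

Comparing the result with Python's own encoder (`chr(cp).encode("utf-8")`) is a quick way to confirm each row.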
utf8CharToUnumber($str, $hex = 0)
Converts a UTF-8 multibyte character to a Unicode number.
Unit-tested by Kasper.
param: string UTF-8 multibyte character string
param: boolean If set, a hexadecimal number is returned.
return: integer Unicode integer
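The reverse direction as a Python sketch, hypothetical and simplified: it assumes the input starts with a valid lead byte and does not reject overlong or truncated sequences. Returning the hex form as the string produced by Python's `hex()` is this sketch's assumption.

```python
def utf8_char_to_unumber(seq: bytes, as_hex: bool = False):
    """Decode the first UTF-8 character of `seq` into its code point."""
    lead = seq[0]
    # The number of leading 1-bits in the lead byte gives the sequence length;
    # the remaining low bits of the lead byte are the first payload bits.
    if lead < 0x80:
        n, value = 1, lead
    elif lead >= 0xF0:
        n, value = 4, lead & 0x07
    elif lead >= 0xE0:
        n, value = 3, lead & 0x0F
    elif lead >= 0xC0:
        n, value = 2, lead & 0x1F
    else:
        raise ValueError("not a lead byte")
    # Each continuation byte contributes its low 6 bits.
    for b in seq[1:n]:
        value = (value << 6) | (b & 0x3F)
    return hex(value) if as_hex else value
```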
initCharset($charset)
Initializes a charset for use if it is defined in the PATH_t3lib.'csconvtbl/' folder. This function is called automatically by the conversion functions.
PLEASE SEE: http://www.unicode.org/Public/MAPPINGS/
param: string The charset to be initialized. Always use a lowercase charset name (it must exactly match a filename in the csconvtbl/ folder, [charset].tbl)
return: integer Returns 1 if the table is already loaded, 2 if the charset conversion table was found and parsed, or FALSE if it was not found.
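A sketch of the table-parsing step in Python, assuming the layout of the unicode.org MAPPINGS files referenced above (a hex byte value, a hex Unicode value, and an optional `#` comment per line). The TYPO3 `.tbl` files may use a different layout, and `parse_mapping_table` is hypothetical.

```python
def parse_mapping_table(text: str) -> dict:
    """Parse lines like '0xA4<TAB>0x20AC<TAB># EURO SIGN' into {byte: code point}."""
    table = {}
    for line in text.splitlines():
        # Drop comments; skip lines that are then empty.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # A byte without a Unicode column is undefined in this charset.
        if len(fields) < 2:
            continue
        # int(..., 16) accepts the '0x' prefix used by these files.
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table
```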
initUnicodeData($mode = NULL)
Initializes all UTF-8 character data tables.
PLEASE SEE: http://www.unicode.org/Public/UNIDATA/
param: string Mode ("case", "ascii", ...)
return: integer Returns FALSE on error, a TRUE value on success: 1 = table already loaded, 2 = cached version, 3 = table parsed (and cached).
initCaseFolding($charset)
Initializes the case folding table for a charset other than UTF-8. This function is called automatically by the case folding functions.
param: string Charset for which to initialize case folding.
return: integer Returns FALSE on error, a TRUE value on success: 1 = table already loaded, 2 = cached version, 3 = table parsed (and cached).
initToASCII($charset)
Initializes the to-ASCII conversion table for a charset other than UTF-8. This function is called automatically by the ASCII transliteration functions.
param: string Charset for which to initialize conversion.
return: integer Returns FALSE on error, a TRUE value on success: 1 = table already loaded, 2 = cached version, 3 = table parsed (and cached).
substr($charset, $string, $start, $len = NULL)
Returns a part of a string.
Unit-tested by Kasper (single-byte charsets only).
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string The character set
param: string Character string
param: integer Start position (character position)
param: integer Length (in characters)
return: string The substring
strlen($charset, $string)
Counts the number of characters.
Unit-tested by Kasper (single-byte charsets only).
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string The character set
param: string Character string
return: integer The number of characters
cropMbstring($charset, $string, $len, $crop = '')
Crops strings using the mb_substr() function.
param: string The character set
param: string String to be cropped
param: integer Crop length (in characters)
param: string Crop signifier
return: string The shortened string
crop($charset, $string, $len, $crop = '')
Truncates a string and pre-/appends a string.
Unit-tested by Kasper.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string The character set
param: string Character string
param: integer Length (in characters)
param: string Crop signifier
return: string The shortened string
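A Python sketch of the cropping rule, assuming (from the "pre-/appends" wording) that a positive length keeps the start and appends the signifier, a negative length keeps the end and prepends it, and that the signifier is not counted in the length. `crop` here is hypothetical and works on Python `str`, i.e. on characters rather than bytes.

```python
def crop(text: str, length: int, marker: str = "") -> str:
    """Keep |length| characters; positive keeps the start, negative the end."""
    # Nothing to cut: zero length (assumed to mean 'no limit') or short text.
    if length == 0 or abs(length) >= len(text):
        return text
    if length > 0:
        return text[:length] + marker   # cut the tail, append the signifier
    return marker + text[length:]       # cut the head, prepend the signifier
```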
strtrunc($charset, $string, $len)
Cuts a string short at a given byte length.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string The character set
param: string Character string
param: integer The byte length
return: string The shortened string
conv_case($charset, $string, $case)
Translates all characters of a string into their respective case values. Unlike strtolower() and strtoupper(), this method is locale independent. Note that the string length may change! E.g. lowercase German "ß" (sharp S) becomes uppercase "SS".
Unit-tested by Kasper.
Real case folding is language dependent; this method ignores that fact.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string Character set of string
param: string Input string to convert case for
param: string Case keyword: "toLower" means lowercase conversion, anything else is uppercase (use "toUpper")
return: string The converted string
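Python's `str.lower()` and `str.upper()` apply the locale-independent Unicode default case mappings, including the full mappings that change length, so they illustrate the behavior described above. `conv_case` is a hypothetical wrapper mirroring the "toLower" keyword rule, not the PHP method.

```python
def conv_case(text: str, case: str) -> str:
    """'toLower' lowercases; any other keyword uppercases."""
    return text.lower() if case == "toLower" else text.upper()
```

Uppercasing "Straße" this way gives "STRASSE", one character longer than the input.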
convCaseFirst($charset, $string, $case)
Equivalent of lcfirst()/ucfirst(), but charset-aware.
param: string $charset Character set of string
param: string $string Input string
param: string $case Case keyword
return: string The converted string
specCharsToASCII($charset, $string)
Converts special characters (like æøåÆØÅ, umlauts, etc.) to ASCII equivalents (usually two characters, like æ => ae).
param: string $charset Character set of string
param: string $string Input string to convert
return: string The converted string
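A minimal Python sketch of table-driven transliteration. The mapping `_TO_ASCII` is a hypothetical excerpt chosen for illustration (the actual TYPO3 tables are larger and may map some characters differently), and `spec_chars_to_ascii` is not the PHP method.

```python
# Hypothetical excerpt of a to-ASCII transliteration table.
_TO_ASCII = {
    "æ": "ae", "ø": "oe", "å": "aa", "Æ": "AE", "Ø": "OE", "Å": "AA",
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
}

def spec_chars_to_ascii(text: str) -> str:
    # Replace each mapped character; leave every other character unchanged.
    return "".join(_TO_ASCII.get(ch, ch) for ch in text)
```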
getPreferredClientLanguage($languageCodesList)
Converts the language codes received from the client (usually via HTTP_ACCEPT_LANGUAGE) into a TYPO3-readable language code.
author: Benjamin Mack (benni.typo3.org)
param: string $languageCodesList List of language codes, something like 'de,en-us;q=0.9,de-de;q=0.7,es-cl;q=0.6,en;q=0.4,es;q=0.3,zh;q=0.1'
return: string A preferred language that TYPO3 supports, or "default" if none is found
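A Python sketch of Accept-Language handling, assuming q-values rank the codes (highest first, ties in the client's order) and that a regional code such as `de-de` may fall back to its base language `de`. The `supported` set and the function `preferred_language` are hypothetical; the PHP method uses TYPO3's own list of languages.

```python
def preferred_language(accept_language: str, supported: set, default: str = "default") -> str:
    """Pick the highest-ranked supported code from an Accept-Language list."""
    candidates = []
    for index, part in enumerate(accept_language.split(",")):
        fields = part.strip().split(";")
        code = fields[0].strip().lower()
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if code:
            candidates.append((-q, index, code))
    # Sorting by (-q, index): highest q first, ties keep the client's order.
    for _, _, code in sorted(candidates):
        if code in supported:
            return code
        base = code.split("-", 1)[0]  # e.g. 'de-de' -> 'de'
        if base in supported:
            return base
    return default
```

With the example list from the parameter description and `supported = {"en", "es"}`, `de` is skipped and `en-us` falls back to `en`.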
sb_char_mapping($str, $charset, $mode, $opt = '')
Maps all characters of a string in a single-byte charset.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string the string
param: string the charset
param: string mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
param: string 'case': conversion 'toLower' or 'toUpper'
return: string the converted string
utf8_substr($str, $start, $len = NULL)
Returns a part of a UTF-8 string.
Unit-tested by Kasper; works 100% like substr() / mb_substr() for the full range of $start/$len.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string UTF-8 string
param: integer Start position (character position)
param: integer Length (in characters)
return: string The substring
utf8_strlen($str)
Counts the number of characters of a string in UTF-8.
Unit-tested by Kasper; works like mb_strlen() (strlen() counts bytes, so it gives the same result only for ASCII input).
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string UTF-8 multibyte character string
return: integer The number of characters
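In UTF-8 every character starts with exactly one byte that is not a continuation byte (`10xxxxxx`), so counting those bytes counts characters. A hypothetical Python sketch, not the PHP method:

```python
def utf8_strlen(data: bytes) -> int:
    # Every byte except continuation bytes (10xxxxxx) starts a new character.
    return sum(1 for b in data if b & 0xC0 != 0x80)
```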
utf8_strtrunc($str, $len)
Cuts a UTF-8 string short at a given byte length.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string UTF-8 multibyte character string
param: integer the byte length
return: string the shortened string
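A Python sketch of byte-length truncation that never splits a multibyte character. It relies on the continuation-byte test (`10xxxxxx`); `utf8_strtrunc` is hypothetical, and returning an empty string for a non-positive length is an assumption.

```python
def utf8_strtrunc(data: bytes, length: int) -> bytes:
    """Cut to at most `length` bytes without splitting a multibyte character."""
    if length <= 0:
        return b""
    if len(data) <= length:
        return data
    end = length
    # data[end] is the first byte that would be dropped. If it is a
    # continuation byte, the cut falls inside a character: move back to
    # that character's lead byte so the whole character is dropped.
    while end > 0 and data[end] & 0xC0 == 0x80:
        end -= 1
    return data[:end]
```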
utf8_strpos($haystack, $needle, $offset = 0)
Finds the position of the first occurrence of a string; both arguments are in UTF-8.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string UTF-8 string to search in
param: string UTF-8 string to search for
param: integer Position to start the search
return: integer The character position
utf8_strrpos($haystack, $needle)
Finds the position of the last occurrence of a character in a string; both arguments are in UTF-8.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string UTF-8 string to search in
param: string UTF-8 character to search for (single character)
return: integer The character position
utf8_char2byte_pos($str, $pos)
Translates a character position into an 'absolute' byte position.
Unit-tested by Kasper.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string UTF-8 string
param: integer Character position (negative values start from the end)
return: integer Byte position
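A Python sketch that precomputes the byte offset of every character start. `utf8_char2byte_pos` here is hypothetical, and clamping positions past either end (to 0 or to the byte length) is this sketch's choice, which may differ from the PHP method.

```python
def utf8_char2byte_pos(data: bytes, pos: int) -> int:
    """Byte offset of character `pos`; a negative `pos` counts from the end."""
    # Byte offsets of all character starts (ASCII bytes and lead bytes).
    starts = [i for i, b in enumerate(data) if b & 0xC0 != 0x80]
    if pos < 0:
        pos += len(starts)  # -1 refers to the last character
    if pos < 0:
        return 0            # before the start: clamp to 0
    if pos >= len(starts):
        return len(data)    # past the end: clamp to the byte length
    return starts[pos]
```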
utf8_byte2char_pos($str, $pos)
Translates an 'absolute' byte position into a character position.
Unit-tested by Kasper.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string UTF-8 string
param: integer byte position
return: integer character position
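The inverse as a Python sketch: count the character starts before the byte offset. For an offset inside a multibyte character this sketch counts that character as already passed (rounding up), which may differ from the PHP method; `utf8_byte2char_pos` is hypothetical.

```python
def utf8_byte2char_pos(data: bytes, pos: int) -> int:
    # Count the character starts that lie strictly before byte offset `pos`.
    return sum(1 for b in data[:pos] if b & 0xC0 != 0x80)
```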
utf8_char_mapping($str, $mode, $opt = '')
Maps all characters of a UTF-8 string.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string UTF-8 string
param: string mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
param: string 'case': conversion 'toLower' or 'toUpper'
return: string the converted string
euc_strtrunc($str, $len, $charset)
Cuts a string in the EUC charset family short at a given byte length.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string EUC multibyte character string
param: integer the byte length
param: string the charset
return: string the shortened string
euc_substr($str, $start, $charset, $len = NULL)
Returns a part of a string in the EUC charset family.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string EUC multibyte character string
param: integer start position (character position)
param: string the charset
param: integer length (in characters)
return: string the substring
euc_strlen($str, $charset)
Counts the number of characters of a string in the EUC charset family.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string EUC multibyte character string
param: string the charset
return: integer the number of characters
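A simplified Python sketch for the EUC family, assuming lowercase charset names: ASCII bytes are one character each, a byte of 0x80 or above starts a two-byte character, and in EUC-JP the SS3 prefix 0x8F starts a three-byte character. EUC-TW's four-byte sequences are not handled, and `euc_strlen` is hypothetical.

```python
def euc_strlen(data: bytes, charset: str) -> int:
    """Count characters in an EUC-family byte string (simplified)."""
    count, i = 0, 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            i += 1          # ASCII byte
        elif charset == "euc-jp" and b == 0x8F:
            i += 3          # EUC-JP SS3 (JIS X 0212): three bytes
        else:
            i += 2          # any other high byte starts a two-byte character
        count += 1
    return count
```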
euc_char2byte_pos($str, $pos, $charset)
Translates a character position into an 'absolute' byte position.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string EUC multibyte character string
param: integer character position (negative values start from the end)
param: string the charset
return: integer byte position
euc_char_mapping($str, $charset, $mode, $opt = '')
Maps all characters of a string in the EUC charset family.
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
param: string EUC multibyte character string
param: string the charset
param: string mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
param: string 'case': conversion 'toLower' or 'toUpper'
return: string the converted string