
See Release Notes

  • Bug fixes for general core bugs in 3.10.x will end 8 November 2021 (12 months).
  • Bug fixes for security issues in 3.10.x will end 9 May 2022 (18 months).
  • PHP version: minimum PHP 7.2.0. Note: the minimum PHP version has increased since Moodle 3.8. PHP 7.3.x and 7.4.x are also supported.

Class for conversion between charsets.

Author: Kasper Skårhøj <kasperYYYY@typo3.com>
Author: Martin Kutschker <martin.t.kutschker@blackbox.net>
File Size: 2367 lines (73 kb)
Included or required: 0 times
Referenced: 0 times
Includes or requires: 0 files

Defines 1 class


Class: t3lib_cs

Class for conversion between charsets

__construct()
Default constructor.


parse_charset($charset)
Normalizes a character set name by converting it to lowercase.

param: string        Input charset
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: string        Normalized charset

get_locale_charset($locale)
Get the charset of a locale. The locale string may take one of these forms:

ln            language
ln_CN         language / country
ln_CN.cs      language / country / charset
ln_CN.cs@mod  language / country / charset / modifier

param: string        Locale string
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: string        Charset resolved for locale string
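The four locale forms listed above can be taken apart with a few string splits. The following is a minimal Python sketch, not the t3lib_cs implementation; `split_locale` is a hypothetical helper, lowercasing the whole string is an assumption, and it only shows the parsing step, not how a charset is chosen when the locale carries none.

```python
def split_locale(locale: str):
    """Split 'ln', 'ln_CN', 'ln_CN.cs' or 'ln_CN.cs@mod' into its four parts.

    Hypothetical helper for illustration; missing parts come back as None.
    The whole string is lowercased first (an assumption of this sketch).
    """
    locale = locale.lower()
    rest, _, modifier = locale.partition("@")   # strip '@mod'
    rest, _, charset = rest.partition(".")      # strip '.cs'
    language, _, country = rest.partition("_")  # split 'ln_CN'
    return language, country or None, charset or None, modifier or None
```

For example, "de_DE.UTF-8@euro" yields the language "de", the country "de", the charset "utf-8" and the modifier "euro".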

conv($str, $fromCS, $toCS, $useEntityForNoChar = 0)
Convert from one charset to another charset.

param: string        Input string
param: string        From charset (the current charset of the string)
param: string        To charset (the output charset wanted)
param: boolean        If set, then characters that are not available in the destination character set will be encoded as numeric entities
return: string        Converted string
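As a rough illustration of the `$useEntityForNoChar` switch, here is a Python sketch built on the standard codecs rather than on t3lib_cs's conversion tables. Emitting `?` for unmappable characters when the flag is off is an assumption of this sketch.

```python
def conv(data: bytes, from_cs: str, to_cs: str, use_entity_for_no_char: bool = False) -> bytes:
    """Re-encode `data` from `from_cs` to `to_cs` (sketch using Python codecs)."""
    text = data.decode(from_cs)
    # 'xmlcharrefreplace' writes unmappable characters as numeric entities (&#NNNN;);
    # 'replace' writes '?' instead (assumed fallback).
    errors = "xmlcharrefreplace" if use_entity_for_no_char else "replace"
    return text.encode(to_cs, errors=errors)
```

Converting a UTF-8 euro sign to ISO-8859-1, which has no euro sign, gives `&#8364;` with the flag set and `?` without it.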

convArray(&$array, $fromCS, $toCS, $useEntityForNoChar = 0)
Convert all string elements of an array, including those in nested arrays, from one charset to another charset.
NOTICE: The array is passed by reference and modified in place!

param: array        Input array, possibly multidimensional
param: string        From charset (the current charset of the string)
param: string        To charset (the output charset wanted)
param: boolean        If set, then characters that are not available in the destination character set will be encoded as numeric entities
return: void
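Because the input may be multidimensional, the conversion has to recurse into nested arrays and leave non-string values alone. A minimal Python sketch, assuming nested dicts stand in for PHP arrays and `bytes` values stand in for PHP strings; like the original, it mutates its argument and returns nothing.

```python
def conv_array(array: dict, from_cs: str, to_cs: str, use_entity_for_no_char: bool = False) -> None:
    """Convert every bytes value in a (possibly nested) dict in place."""
    errors = "xmlcharrefreplace" if use_entity_for_no_char else "replace"
    for key, value in array.items():
        if isinstance(value, dict):
            # Recurse into sub-arrays.
            conv_array(value, from_cs, to_cs, use_entity_for_no_char)
        elif isinstance(value, bytes):
            # Reassigning an existing key is safe while iterating over items().
            array[key] = value.decode(from_cs).encode(to_cs, errors=errors)
        # Other types (ints, None, ...) are left untouched.
```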

utf8_encode($str, $charset)
Converts $str from $charset to UTF-8

param: string        String in local charset to convert to UTF-8
param: string        Charset, lowercase. Must be found in csconvtbl/ folder.
return: string        Output string, converted to UTF-8

utf8_decode($str, $charset, $useEntityForNoChar = 0)
Converts $str from UTF-8 to $charset

param: string        String in UTF-8 to convert to local charset
param: string        Charset, lowercase. Must be found in csconvtbl/ folder.
param: boolean        If set, then characters that are not available in the destination character set will be encoded as numeric entities
return: string        Output string, converted to local charset
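Both directions rely on a per-charset table that maps single-byte values to Unicode code points. The Python sketch below uses a tiny hypothetical excerpt of such a table (0x80 -> U+20AC, the euro sign in windows-1252) and assumes bytes below 0x80 are plain ASCII; `TABLE`, `table_to_utf8` and `table_from_utf8` are illustrative names, not t3lib_cs API.

```python
# Hypothetical excerpt of a csconvtbl-style table: local byte -> Unicode code point.
TABLE = {0x80: 0x20AC, 0xE9: 0x00E9}


def table_to_utf8(data: bytes, table: dict) -> bytes:
    """Local single-byte charset -> UTF-8 (cf. utf8_encode)."""
    chars = []
    for b in data:
        if b < 0x80:
            chars.append(chr(b))  # ASCII range passes through unchanged
        else:
            chars.append(chr(table.get(b, 0x3F)))  # unmapped byte -> '?' (assumption)
    return "".join(chars).encode("utf-8")


def table_from_utf8(data: bytes, table: dict, use_entity_for_no_char: bool = False) -> bytes:
    """UTF-8 -> local single-byte charset (cf. utf8_decode)."""
    reverse = {cp: b for b, cp in table.items()}
    out = bytearray()
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif cp in reverse:
            out.append(reverse[cp])
        elif use_entity_for_no_char:
            out += f"&#{cp};".encode("ascii")  # numeric entity for a missing character
        else:
            out += b"?"
    return bytes(out)
```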

utf8_to_entities($str)
Converts all characters above code point 127 to numeric entities.

param: string        Input string
return: string        Output string
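Decoding the UTF-8 input to code points turns this into a single-pass substitution. A Python sketch; writing decimal (rather than hexadecimal) entities is an assumption.

```python
def utf8_to_entities(data: bytes) -> bytes:
    """Replace every character above code point 127 with a decimal numeric entity."""
    text = data.decode("utf-8")
    return "".join(ch if ord(ch) <= 127 else f"&#{ord(ch)};" for ch in text).encode("ascii")
```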

entities_to_utf8($str, $alsoStdHtmlEnt = FALSE)
Converts numeric Unicode entities, e.g. decimal (&#1234;) or hexadecimal (&#x1b;), to UTF-8 multibyte characters.

param: string        Input string, UTF-8
param: boolean        If set, named HTML entities (like &amp; or &pound;) will be converted as well
return: string        Output string
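A Python sketch that works on decoded `str` values rather than raw UTF-8 bytes. The regular expression covers the two numeric forms named above; delegating the named-entity case to the standard library's `html.unescape` (which follows HTML5 rules) is a substitution for illustration, not what t3lib_cs does.

```python
import html
import re

# Decimal (&#1234;) or hexadecimal (&#x1b;) numeric entities.
NUMERIC_ENTITY = re.compile(r"&#([xX][0-9a-fA-F]+|[0-9]+);")


def entities_to_utf8(text: str, also_std_html_ent: bool = False) -> str:
    if also_std_html_ent:
        # html.unescape handles named and numeric entities (HTML5 rules).
        return html.unescape(text)

    def replace(match):
        token = match.group(1)
        code_point = int(token[1:], 16) if token[0] in "xX" else int(token)
        return chr(code_point)

    return NUMERIC_ENTITY.sub(replace, text)
```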

utf8_to_numberarray($str, $convEntities = 0, $retChar = 0)
Converts all chars in the input UTF-8 string into integer numbers returned in an array

param: string        Input string, UTF-8
param: boolean        If set, then all HTML entities (like &amp; or &pound; or &#123; or &#x3f5d;) will be detected as characters.
param: boolean        If set, then instead of integer numbers the real UTF-8 char is returned.
return: array        Output array with the char numbers
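A minimal Python sketch of the two output modes: code points by default, or each character's own UTF-8 bytes when `ret_char` is set. The `$convEntities` option (entity detection) is left out of this sketch.

```python
def utf8_to_numberarray(data: bytes, ret_char: bool = False) -> list:
    """One entry per character: its code point, or its UTF-8 bytes if ret_char is set."""
    return [ch.encode("utf-8") if ret_char else ord(ch) for ch in data.decode("utf-8")]
```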

UnumberToChar($cbyte)
Converts a UNICODE number to a UTF-8 multibyte character
Algorithm based on a script found at http://czyborra.com/utf/
Unit-tested by Kasper

The binary representation of the character's integer value is thus simply spread across the bytes and the number of high bits set in the lead byte announces the number of bytes in the multibyte sequence:

bytes | value bits | representation
    1 |          7 | 0vvvvvvv
    2 |         11 | 110vvvvv 10vvvvvv
    3 |         16 | 1110vvvv 10vvvvvv 10vvvvvv
    4 |         21 | 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
    5 |         26 | 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
    6 |         31 | 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv

param: integer        UNICODE integer
return: string        UTF-8 multibyte character string
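The table translates directly into code. Pick the byte count from the bit counts in the table above, emit continuation bytes of the form 10vvvvvv starting from the low bits, and put the remaining high bits into a lead byte with as many leading 1-bits as there are bytes. This is a Python sketch of that algorithm, not a port of the PHP code. RFC 3629 limits modern UTF-8 to 4 bytes, so the 5- and 6-byte rows are kept only to stay faithful to the table.

```python
def unumber_to_char(cp: int) -> bytes:
    """Encode a Unicode code point using the lead/continuation byte layout above."""
    if cp < 0x80:
        return bytes([cp])  # 1 byte: 0vvvvvvv
    # Choose the byte count from the bit counts in the table.
    for nbytes, bits in ((2, 11), (3, 16), (4, 21), (5, 26), (6, 31)):
        if cp < (1 << bits):
            break
    else:
        raise ValueError("code point does not fit in 31 bits")
    tail = []
    for _ in range(nbytes - 1):
        tail.append(0x80 | (cp & 0x3F))  # continuation byte: 10vvvvvv
        cp >>= 6
    lead = ((0xFF << (8 - nbytes)) & 0xFF) | cp  # nbytes leading 1-bits, then a 0
    return bytes([lead] + tail[::-1])
```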

utf8CharToUnumber($str, $hex = 0)
Converts a UTF-8 multibyte character to a UNICODE number
Unit-tested by Kasper

param: string        UTF-8 multibyte character string
param: boolean        If set, the number is returned in hexadecimal notation instead of as an integer.
return: integer        UNICODE integer
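Decoding reverses the byte layout used by UnumberToChar(). Count the leading 1-bits of the lead byte to learn the sequence length, keep the bits below the terminating 0, and append six bits from each continuation byte. A Python sketch; returning the hex form as a plain lowercase string such as '20ac' is an assumption about the format.

```python
def utf8_char_to_unumber(char: bytes, hex_out: bool = False):
    """Decode the first UTF-8 character in `char` to its code point."""
    lead = char[0]
    if lead < 0x80:
        cp = lead  # single byte: 0vvvvvvv
    else:
        nbytes, mask = 0, 0x80
        while lead & mask:  # count leading 1-bits = sequence length
            nbytes += 1
            mask >>= 1
        cp = lead & (mask - 1)  # value bits below the terminating 0-bit
        for b in char[1:nbytes]:
            cp = (cp << 6) | (b & 0x3F)  # 6 value bits per continuation byte
    return format(cp, "x") if hex_out else cp
```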

initCharset($charset)
Initializes a charset for use if it is defined in the PATH_t3lib.'csconvtbl/' folder.
This function is automatically called by the conversion functions.

PLEASE SEE: http://www.unicode.org/Public/MAPPINGS/

param: string        The charset to be initialized. Always use a lowercase charset name; it must exactly match a filename in the csconvtbl/ folder ([charset].tbl).
return: integer        Returns '1' if already loaded. Returns FALSE if charset conversion table was not found. Returns '2' if the charset conversion table was found and parsed.
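The unicode.org MAPPINGS files referenced above list one byte per line as `0xNN<TAB>0xNNNN<TAB>#NAME`. A `#` starts a comment, and undefined bytes have no second column. Assuming a [charset].tbl file uses that layout (t3lib_cs may accept other layouts as well), a parser could look like this Python sketch; `parse_mapping_table` is a hypothetical name.

```python
def parse_mapping_table(text: str) -> dict:
    """Parse unicode.org MAPPINGS-style lines into {byte value: code point}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # byte listed without a Unicode mapping (undefined)
        table[int(fields[0], 16)] = int(fields[1], 16)  # int() accepts the 0x prefix
    return table
```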

initUnicodeData($mode = NULL)
This function initializes all UTF-8 character data tables.

PLEASE SEE: http://www.unicode.org/Public/UNIDATA/

param: string        Mode ("case", "ascii", ...)
return: integer        Returns FALSE on error, a TRUE value on success: 1 = table already loaded, 2 = cached version, 3 = table parsed (and cached).

initCaseFolding($charset)
This function initializes the folding table for a charset other than UTF-8.
This function is automatically called by the case folding functions.

param: string        Charset for which to initialize case folding.
return: integer        Returns FALSE on error, a TRUE value on success: 1 = table already loaded, 2 = cached version, 3 = table parsed (and cached).

initToASCII($charset)
This function initializes the to-ASCII conversion table for a charset other than UTF-8.
This function is automatically called by the ASCII transliteration functions.

param: string        Charset for which to initialize conversion.
return: integer        Returns FALSE on error, a TRUE value on success: 1 = table already loaded, 2 = cached version, 3 = table parsed (and cached).

substr($charset, $string, $start, $len = NULL)
Returns a part of a string.
Unit-tested by Kasper (single byte charsets only)

param: string        The character set
param: string        Character string
param: integer        Start position (character position)
param: integer        Length (in characters)
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: string        The substring

strlen($charset, $string)
Counts the number of characters.
Unit-tested by Kasper (single byte charsets only)

param: string        The character set
param: string        Character string
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: integer        The number of characters

cropMbstring($charset, $string, $len, $crop = '')
Method to crop strings using the mb_substr function.

param: string        The character set
param: string        String to be cropped
param: integer        Crop length (in characters)
param: string        Crop signifier
return: string        The shortened string

crop($charset, $string, $len, $crop = '')
Truncates a string and prepends or appends a crop signifier.
Unit-tested by Kasper

param: string        The character set
param: string        Character string
param: integer        Length (in characters)
param: string        Crop signifier
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: string        The shortened string
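This Python sketch reads "pre-/appends" as follows. A positive length keeps the start of the string and appends the signifier, and a negative length keeps the end and prepends it. That interpretation, and leaving strings that already fit untouched, are assumptions. The sketch works on decoded `str` values, so counting characters is trivial.

```python
def crop(string: str, length: int, crop_sig: str = "") -> str:
    """Keep `length` characters and add `crop_sig` where text was removed."""
    if length == 0 or abs(length) >= len(string):
        return string  # nothing to cut
    if length > 0:
        return string[:length] + crop_sig  # keep the start, append
    return crop_sig + string[length:]  # keep the end, prepend
```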

strtrunc($charset, $string, $len)
Cuts a string short at a given byte length.

param: string        The character set
param: string        Character string
param: integer        The byte length
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: string        The shortened string

conv_case($charset, $string, $case)
Translates all characters of a string into their respective case values.
Unlike strtolower() and strtoupper(), this method is locale-independent.
Note that the string length may change!
E.g. lowercase German "ß" (sharp S) becomes uppercase "SS".
Unit-tested by Kasper
Real case folding is language-dependent; this method ignores this fact.

param: string        Character set of string
param: string        Input string to convert case for
param: string        Case keyword: "toLower" means lowercase conversion, anything else is uppercase (use "toUpper")
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: string        The converted string
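Python's `str.lower()` and `str.upper()` are also locale-independent and apply the Unicode special-casing rules, so they illustrate the length change mentioned above. This sketch only mirrors the "toLower"/anything-else switch. It is not the table-based t3lib_cs implementation.

```python
def conv_case(string: str, case: str) -> str:
    """'toLower' lowercases; any other keyword uppercases (as documented above)."""
    return string.lower() if case == "toLower" else string.upper()
```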

convCaseFirst($charset, $string, $case)
Equivalent of lcfirst()/ucfirst(), but aware of the character set.

param: string $charset
param: string $string
param: string $case
return: string
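A Python sketch, assuming `$case` takes the same "toLower"/"toUpper" keywords as conv_case(). Only the first character changes case, but for "ß" the result can still be longer than the input.

```python
def conv_case_first(string: str, case: str) -> str:
    """Change the case of the first character only (cf. lcfirst/ucfirst)."""
    if not string:
        return string
    first = string[0].lower() if case == "toLower" else string[0].upper()
    return first + string[1:]
```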

specCharsToASCII($charset, $string)
Converts special characters (like æøåÆØÅ, umlauts, etc.) to ASCII equivalents (usually two characters, e.g. æ => ae).

param: string $charset Character set of string
param: string $string Input string to convert
return: string The converted string
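Transliteration is a per-character table lookup, and characters missing from the table pass through unchanged. The table in this Python sketch is a small hypothetical excerpt (Danish/Norwegian letters and German umlauts), not t3lib_cs's actual data.

```python
# Hypothetical excerpt of a transliteration table (not t3lib_cs's actual data).
TO_ASCII = {
    "æ": "ae", "ø": "oe", "å": "aa", "Æ": "AE", "Ø": "OE", "Å": "AA",
    "ä": "ae", "ö": "oe", "ü": "ue", "Ä": "AE", "Ö": "OE", "Ü": "UE", "ß": "ss",
}


def spec_chars_to_ascii(string: str) -> str:
    """Replace each listed character by its ASCII equivalent; keep everything else."""
    return "".join(TO_ASCII.get(ch, ch) for ch in string)
```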

getPreferredClientLanguage($languageCodesList)
Converts the language codes received from the client (usually HTTP_ACCEPT_LANGUAGE)
into a TYPO3-readable language code.

param: string $languageCodesList    List of language codes, e.g. 'de,en-us;q=0.9,de-de;q=0.7,es-cl;q=0.6,en;q=0.4,es;q=0.3,zh;q=0.1'
author: Benjamin Mack (benni.typo3.org)
return: string    a preferred language that TYPO3 supports, or "default" if none found
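Choosing a language means ordering the list by its q-values (default 1.0, ties kept in header order) and returning the first code the system supports. In this Python sketch, the `supported` set is a hypothetical stand-in for TYPO3's language list. Falling back from a regional code such as en-us to its primary subtag en, and skipping entries with q=0, are assumptions.

```python
def preferred_client_language(header: str, supported: set) -> str:
    """Pick the best supported code from an Accept-Language style list."""
    candidates = []
    for position, part in enumerate(header.split(",")):
        code, _, params = part.strip().partition(";")
        quality = 1.0  # default q-value
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0 and code.strip():
            # Sort key: highest q first, then original header order.
            candidates.append((-quality, position, code.strip().lower()))
    for _, _, code in sorted(candidates):
        if code in supported:
            return code
        primary = code.split("-")[0]  # e.g. 'en-us' -> 'en' (assumed fallback)
        if primary in supported:
            return primary
    return "default"
```

With the example list above and a supported set of {"en", "es"}, "de" is skipped and "en-us" falls back to "en".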

sb_char_mapping($str, $charset, $mode, $opt = '')
Maps all characters of a string in a single byte charset.

param: string        the string
param: string        the charset
param: string        mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
param: string        For mode 'case': the conversion, 'toLower' or 'toUpper'
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: string        the converted string

utf8_substr($str, $start, $len = NULL)
Returns a part of a UTF-8 string.
Unit-tested by Kasper; behaves exactly like substr()/mb_substr() for the full range of $start/$len.

param: string        UTF-8 string
param: integer        Start position (character position)
param: integer        Length (in characters)
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: string        The substring

utf8_strlen($str)
Counts the number of characters of a string in UTF-8.
Unit-tested by Kasper; works like mb_strlen() (and like strlen() for ASCII-only strings).

param: string        UTF-8 multibyte character string
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: integer        The number of characters
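In UTF-8, every character contains exactly one byte that is not a continuation byte (bit pattern 10xxxxxx), so counting characters means counting those bytes. A Python sketch operating on raw bytes, assuming well-formed input.

```python
def utf8_strlen(data: bytes) -> int:
    """Count characters by counting bytes that are not continuation bytes."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)
```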

utf8_strtrunc($str, $len)
Truncates a UTF-8 string at a given byte length.

param: string        UTF-8 multibyte character string
param: integer        the byte length
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: string        the shortened string
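Cutting at an arbitrary byte could split a multibyte character. This Python sketch avoids that by backing up from the cut point past any continuation bytes, so the result ends on a character boundary and may be shorter than the requested length. It assumes well-formed input and is not a port of the PHP code.

```python
def utf8_strtrunc(data: bytes, length: int) -> bytes:
    """Cut `data` to at most `length` bytes without splitting a character."""
    if length >= len(data):
        return data
    i = length
    # While the first byte that would be dropped is a continuation byte (10xxxxxx),
    # the character straddles the cut, so move the cut back to its lead byte.
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return data[:i]
```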

utf8_strpos($haystack, $needle, $offset = 0)
Find the position of the first occurrence of a string; both arguments are in UTF-8.

param: string        UTF-8 string to search in
param: string        UTF-8 string to search for
param: integer        Position to start the search
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: integer        The character position
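Because UTF-8 is self-synchronizing, a byte-level search for a valid UTF-8 needle can only match at a character boundary. The remaining work is converting between character and byte offsets. A Python sketch; returning `None` when nothing is found and supporting only non-negative offsets are assumptions.

```python
def utf8_strpos(haystack: bytes, needle: bytes, offset: int = 0):
    """Character position of the first `needle` at or after character `offset`."""
    starts = [i for i, b in enumerate(haystack) if (b & 0xC0) != 0x80]
    if not 0 <= offset < len(starts):
        return None
    byte_hit = haystack.find(needle, starts[offset])  # plain byte search
    if byte_hit < 0:
        return None
    # Convert the byte offset back into a character position.
    return sum(1 for st in starts if st < byte_hit)
```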

utf8_strrpos($haystack, $needle)
Find the position of the last occurrence of a character in a string; both arguments are in UTF-8.

param: string        UTF-8 string to search in
param: string        UTF-8 character to search for (single character)
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: integer        The character position

utf8_char2byte_pos($str, $pos)
Translates a character position into an 'absolute' byte position.
Unit tested by Kasper.

param: string        UTF-8 string
param: integer        Character position (negative values start from the end)
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: integer        Byte position
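A character position maps to the offset of that character's lead byte. This Python sketch collects the offsets of all non-continuation bytes and indexes into them, with negative positions counting from the end. Returning `None` for out-of-range positions is an assumption.

```python
def utf8_char2byte_pos(data: bytes, pos: int):
    """Byte offset where character `pos` starts; negative `pos` counts from the end."""
    # Offsets of all bytes that begin a character (i.e. are not 10xxxxxx).
    starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    if pos < 0:
        pos += len(starts)
    if 0 <= pos < len(starts):
        return starts[pos]
    return None  # out of range (assumed behaviour)
```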

utf8_byte2char_pos($str, $pos)
Translates an 'absolute' byte position into a character position.
Unit tested by Kasper.

param: string        UTF-8 string
param: integer        byte position
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: integer        character position
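The inverse mapping counts how many characters start at or before the given byte. This Python sketch reports the index of the character that contains that byte, so an offset pointing into the middle of a multibyte character maps to that character. That choice, and returning `None` out of range, are assumptions.

```python
def utf8_byte2char_pos(data: bytes, pos: int):
    """Index of the character that contains byte offset `pos`."""
    if not 0 <= pos < len(data):
        return None  # out of range (assumed behaviour)
    # Character starts at or before `pos`, minus one, give a zero-based index.
    return sum(1 for b in data[:pos + 1] if (b & 0xC0) != 0x80) - 1
```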

utf8_char_mapping($str, $mode, $opt = '')
Maps all characters of a UTF-8 string.

param: string        UTF-8 string
param: string        mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
param: string        For mode 'case': the conversion, 'toLower' or 'toUpper'
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: string        the converted string

euc_strtrunc($str, $len, $charset)
Cuts a string in the EUC charset family short at a given byte length.

param: string        EUC multibyte character string
param: integer        the byte length
param: string        the charset
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: string        the shortened string
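A Python sketch that walks the string character by character and stops before a character that would cross the byte limit. It assumes every byte of 0x80 or above starts a two-byte character. That holds for EUC-KR and EUC-CN (GB2312) but not for EUC-JP's three-byte 0x8F sequences, which this sketch does not handle; the charset parameter is omitted for that reason.

```python
def euc_strtrunc(data: bytes, length: int) -> bytes:
    """Cut EUC text to at most `length` bytes without splitting a two-byte character."""
    i = 0
    while i < len(data):
        width = 2 if data[i] >= 0x80 else 1  # assumed: high byte starts a 2-byte char
        if i + width > length:
            break  # this character would cross the byte limit
        i += width
    return data[:i]
```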

euc_substr($str, $start, $charset, $len = NULL)
Returns a part of a string in the EUC charset family.

param: string        EUC multibyte character string
param: integer        start position (character position)
param: string        the charset
param: integer        length (in characters)
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: string        the substring

euc_strlen($str, $charset)
Counts the number of characters of a string in the EUC charset family.

param: string        EUC multibyte character string
param: string        the charset
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: integer        the number of characters

euc_char2byte_pos($str, $pos, $charset)
Translates a character position into an 'absolute' byte position.

param: string        EUC multibyte character string
param: integer        character position (negative values start from the end)
param: string        the charset
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: integer        byte position

euc_char_mapping($str, $charset, $mode, $opt = '')
Maps all characters of a string in the EUC charset family.

param: string        EUC multibyte character string
param: string        the charset
param: string        mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
param: string        For mode 'case': the conversion, 'toLower' or 'toUpper'
author: Martin Kutschker <martin.t.kutschker@blackbox.net>
return: string        the converted string