

  • Bug fixes for general core bugs in 4.0.x will end 8 May 2023 (12 months).
  • Bug fixes for security issues in 4.0.x will end 13 November 2023 (18 months).
  • PHP version: minimum PHP 7.3.0. Note: the minimum PHP version has increased since Moodle 3.10. PHP 7.4.x is also supported.


Defines string APIs

Copyright: (C) 2001-3001 Eloy Lafuente (stronk7) {@link http://contiento.com}
License: http://www.gnu.org/copyleft/gpl.html GNU GPL v3 or later
File Size: 668 lines (25 kb)
Included or required: 0 times
Referenced: 0 times
Includes or requires: 0 files

Defines 1 class

core_text:: (24 methods):
  is_charset_supported()
  reset_caches()
  parse_charset()
  convert()
  substr()
  str_max_bytes()
  strrchr()
  strlen()
  strtolower()
  strtoupper()
  strpos()
  strrpos()
  strrev()
  specialtoascii()
  encode_mimeheader()
  get_entities_table()
  entities_to_utf8()
  utf8_to_entities()
  trim_utf8_bom()
  remove_unicode_non_characters()
  get_encodings()
  code2utf8()
  utf8ord()
  strtotitle()


Class: core_text

Defines string APIs for manipulating strings.

This class is used to manipulate strings under Moodle 1.6 and later. As
UTF-8 text became mandatory, a pool of functions that are safe under this
encoding became necessary. The method names are exactly the same as their
PHP originals.

This class was previously based on Typo3; that dependency has been removed
and the class now uses native functions.

is_charset_supported(string $charset)
Check whether the charset is supported by mbstring.

return: bool
param: string $charset Normalised charset
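A minimal sketch of such a check, assuming it amounts to a case-insensitive lookup in the list returned by mb_list_encodings(). The helper name text_is_charset_supported() is hypothetical, and charset aliases are not resolved.

```php
<?php
// Hypothetical helper: case-insensitive lookup of a charset name among the
// encodings mbstring reports. Aliases such as "latin1" are not resolved here.
function text_is_charset_supported(string $charset): bool {
    static $supported = null;
    if ($supported === null) {
        // mb_list_encodings() returns names such as "UTF-8"; cache them lowercased.
        $supported = array_map('strtolower', mb_list_encodings());
    }
    return in_array(strtolower($charset), $supported, true);
}

var_dump(text_is_charset_supported('utf-8'));           // bool(true)
var_dump(text_is_charset_supported('no-such-charset')); // bool(false)
```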

reset_caches()
Reset internal textlib caches.


parse_charset($charset)
Standardises a charset name.

Note that this does not guarantee that the returned charset is actually supported.

return: string normalised lowercase charset name
param: string $charset raw charset name
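To illustrate what normalisation means, here is a sketch with a small alias table. The helper text_parse_charset() and its mappings are hypothetical, not Moodle's actual list. As noted above, the result is only a canonical name and is not checked for support.

```php
<?php
// Hypothetical normaliser: lowercases the name and folds a few common
// spellings onto one canonical form. The alias rules are illustrative only.
function text_parse_charset(string $charset): string {
    $charset = strtolower(trim($charset));
    if ($charset === 'utf8') {
        return 'utf-8';
    }
    // "cp1252", "win1252", "windows1252" -> "windows-1252" (Windows 125x family only).
    if (preg_match('/^(?:cp|win|windows)-?(12\d{2})$/', $charset, $m)) {
        return 'windows-' . $m[1];
    }
    return $charset; // Unknown names pass through lowercased and unvalidated.
}

echo text_parse_charset('UTF8'), "\n";      // utf-8
echo text_parse_charset('CP1252'), "\n";    // windows-1252
echo text_parse_charset('X-Unknown'), "\n"; // x-unknown
```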

convert($text, $fromCS, $toCS='utf-8')
Converts the text between different encodings. It uses the iconv extension with the //TRANSLIT parameter.
If both source and target are UTF-8, it only tries to fix invalid characters.

return: string|bool converted string or false on error
param: string $text
param: string $fromCS source encoding
param: string $toCS result encoding
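A minimal sketch of that logic, assuming iconv is available. The description does not say how invalid characters are fixed in the UTF-8 to UTF-8 case; using mb_convert_encoding() for that step is an assumption, and text_convert() is a hypothetical name.

```php
<?php
// Hypothetical helper sketching the behaviour described above.
function text_convert(string $text, string $fromCS, string $toCS = 'utf-8') {
    $fromCS = strtolower($fromCS);
    $toCS = strtolower($toCS);
    if ($fromCS === 'utf-8' && $toCS === 'utf-8') {
        // Same encoding: only repair invalid byte sequences (mbstring
        // substitutes them). This repair method is an assumption.
        return mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    }
    // //TRANSLIT asks iconv to approximate characters missing from the target.
    // iconv() returns false on failure; @ silences its notice.
    return @iconv($fromCS, $toCS . '//TRANSLIT', $text);
}

echo bin2hex(text_convert("caf\u{e9}", 'utf-8', 'iso-8859-1')), "\n"; // 636166e9
```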

substr($text, $start, $len=null, $charset='utf-8')
Multibyte safe substr() function, uses mbstring or iconv

return: string portion of string specified by the $start and $len
param: string $text string to truncate
param: int $start negative value means from end
param: int $len maximum length of characters beginning from start
param: string $charset encoding of the text
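Assuming the method delegates to mb_substr() as described, the sketch below shows why a character-based substring matters: byte-based substr() counts "é" as two units. The wrapper text_substr() is hypothetical.

```php
<?php
// Hypothetical wrapper: character-based substring via mb_substr().
function text_substr(string $text, int $start, ?int $len = null, string $charset = 'utf-8'): string {
    return mb_substr($text, $start, $len, $charset);
}

$s = "h\u{e9}llo w\u{f6}rld";   // "héllo wörld"
echo text_substr($s, 0, 5), "\n"; // héllo
echo text_substr($s, -5), "\n";   // wörld (negative start counts from the end)
echo substr($s, 0, 5), "\n";      // héll (5 bytes, because é takes two)
```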

str_max_bytes($string, $bytes)
Truncates a string to no more than a certain number of bytes in a multi-byte safe manner.
UTF-8 only!

return: string Portion of string specified by $bytes
param: string $string String to truncate
param: int $bytes Maximum length of bytes in the result
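One way to truncate by bytes safely is to back off from the cut point while the first dropped byte is a UTF-8 continuation byte (bit pattern 10xxxxxx). This is a sketch of that technique, not Moodle's code; text_str_max_bytes() is a hypothetical name.

```php
<?php
// Hypothetical helper: cut UTF-8 text to at most $bytes bytes without
// splitting a multibyte character.
function text_str_max_bytes(string $string, int $bytes): string {
    if ($bytes <= 0) {
        return '';
    }
    if (strlen($string) <= $bytes) {
        return $string;
    }
    $cut = $bytes;
    // $string[$cut] is the first byte that would be dropped. If it is a
    // continuation byte (10xxxxxx), the character began earlier: move back.
    while ($cut > 0 && (ord($string[$cut]) & 0xC0) === 0x80) {
        $cut--;
    }
    return substr($string, 0, $cut);
}

$s = "caf\u{e9}s";                        // 6 bytes: c a f 0xC3 0xA9 s
echo text_str_max_bytes($s, 4), "\n"; // caf (4 bytes would split é)
echo text_str_max_bytes($s, 5), "\n"; // café
```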

strrchr($haystack, $needle, $part = false)
Finds the last occurrence of a string within another string.
UTF-8-only safe version of mb_strrchr().

return: string|false False when not found.
param: string $haystack The string from which to get the last occurrence of needle.
param: string $needle The string to find in haystack.
param: boolean $part If true, returns the portion before needle; otherwise returns the portion from needle to the end (including needle).
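The effect of $part can be seen with mb_strrchr() directly, assuming the method forwards to it with UTF-8 as described. The wrapper text_strrchr() is hypothetical.

```php
<?php
// Hypothetical wrapper around mb_strrchr() fixed to UTF-8.
function text_strrchr(string $haystack, string $needle, bool $part = false) {
    return mb_strrchr($haystack, $needle, $part, 'UTF-8');
}

$path = "d\u{e9}p\u{f4}t/\u{e9}t\u{e9}/r\u{e9}sum\u{e9}.txt"; // "dépôt/été/résumé.txt"
echo text_strrchr($path, '/'), "\n";       // /résumé.txt
echo text_strrchr($path, '/', true), "\n"; // dépôt/été
var_dump(text_strrchr($path, '#'));        // bool(false)
```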

strlen($text, $charset='utf-8')
Multibyte safe strlen() function, uses mbstring or iconv

return: int number of characters
param: string $text input string
param: string $charset encoding of the text
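A short illustration of the character count versus PHP's byte count, assuming the method maps to mb_strlen(). The wrapper text_strlen() is hypothetical.

```php
<?php
// Hypothetical wrapper: counts characters, not bytes.
function text_strlen(string $text, string $charset = 'utf-8'): int {
    return mb_strlen($text, $charset);
}

$s = "\u{65e5}\u{672c}\u{8a9e}"; // "日本語": each character is 3 bytes in UTF-8
echo text_strlen($s), "\n"; // 3
echo strlen($s), "\n";      // 9
```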

strtolower($text, $charset='utf-8')
Multibyte safe strtolower() function, uses mbstring.

return: string lower case text
param: string $text input string
param: string $charset encoding of the text (may not work for all encodings)

strtoupper($text, $charset='utf-8')
Multibyte safe strtoupper() function, uses mbstring.

return: string upper case text
param: string $text input string
param: string $charset encoding of the text (may not work for all encodings)
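Assuming strtolower() and strtoupper() delegate to mbstring as stated, accented capitals are mapped correctly. The snippet calls mb_strtolower() and mb_strtoupper() directly rather than core_text.

```php
<?php
// Multibyte-aware case mapping, as the two methods above describe.
$s = "\u{c9}COLE \u{e9}t\u{e9}"; // "ÉCOLE été"
$lower = mb_strtolower($s, 'UTF-8');
$upper = mb_strtoupper($s, 'UTF-8');
echo $lower, "\n"; // école été
echo $upper, "\n"; // ÉCOLE ÉTÉ
```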

strpos($haystack, $needle, $offset=0)
Find the position of the first occurrence of a substring in a string.
UTF-8 ONLY safe strpos(), uses mbstring

return: int|false the numeric position of the first occurrence of needle in haystack, or false if needle is not found
param: string $haystack the string to search in
param: string $needle one or more characters to search for
param: int $offset offset from beginning of string
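The difference between a character offset and a byte offset shows up as soon as a multibyte character precedes the match. This calls mb_strpos() directly, assuming that is what the method wraps.

```php
<?php
// Character offsets (mb_strpos) versus byte offsets (strpos) in UTF-8 text.
$s = "na\u{ef}ve caf\u{e9}"; // "naïve café"
$chars = mb_strpos($s, 'caf', 0, 'UTF-8');
$bytes = strpos($s, 'caf');
$missing = mb_strpos($s, 'xyz', 0, 'UTF-8');
var_dump($chars);   // int(6)
var_dump($bytes);   // int(7): "ï" takes two bytes
var_dump($missing); // bool(false)
```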

strrpos($haystack, $needle)
Find the position of the last occurrence of a substring in a string
UTF-8 ONLY safe strrpos(), uses mbstring

return: int|false the numeric position of the last occurrence of needle in haystack, or false if needle is not found
param: string $haystack the string to search in
param: string $needle one or more characters to search for
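The same character-versus-byte distinction applies to the last occurrence. This calls mb_strrpos() directly, assuming that is what the method wraps.

```php
<?php
// Last occurrence: character offset (mb_strrpos) versus byte offset (strrpos).
$s = "r\u{e9}sum\u{e9} r\u{e9}sum\u{e9}"; // "résumé résumé"
$chars = mb_strrpos($s, 'r', 0, 'UTF-8');
$bytes = strrpos($s, 'r');
var_dump($chars); // int(7)
var_dump($bytes); // int(9): each "é" before it takes two bytes
```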

strrev($str)
Reverses a UTF-8 multibyte string (used for RTL languages).
This is needed because PHP has no mb_strrev() or iconv_strrev().

return: string the reversed multi byte string
param: string $str the multibyte string to reverse
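A common technique is to split the string into characters with a /u regular expression and reverse the array. This sketch uses that approach and is not necessarily Moodle's implementation; text_strrev() is a hypothetical name.

```php
<?php
// Hypothetical helper: reverse by characters, not bytes.
function text_strrev(string $str): string {
    // With the /u modifier, the empty pattern splits between UTF-8 characters.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $str; // invalid UTF-8: returned unchanged (a choice of this sketch)
    }
    return implode('', array_reverse($chars));
}

echo text_strrev("ab\u{e9}"), "\n";                         // éba
var_dump(mb_check_encoding(strrev("ab\u{e9}"), 'UTF-8')); // bool(false): byte reversal breaks é
```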

specialtoascii($text, $charset='utf-8')
Tries to convert upper Unicode characters to plain ASCII; the returned string may still contain
unconverted Unicode characters.

After the removal of Typo3, iconv conversion was found to be the best alternative to Typo3's function.
However, the standard iconv call
iconv($charset, 'ASCII//TRANSLIT//IGNORE', (string) $text);
produced invalid strings for special characters from Russian and Japanese. To solve this, the transliterator was
used, but it returned empty strings for certain strings in our tests. It was decided to combine the two
to cover all cases. See MDL-53544 for further information.

return: string converted ascii string
param: string $text input string
param: string $charset encoding of the text
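A hedged sketch of one plausible way to combine the two approaches: the intl transliterator first, with iconv as a fallback when intl is missing or returns an empty string. The ordering is an assumption (MDL-53544 records the actual decision), the input is assumed to be UTF-8, and text_specialtoascii() is hypothetical. No expected output is shown because it depends on whether ext/intl is installed and on the iconv implementation and locale.

```php
<?php
// Hypothetical helper combining ext/intl and iconv; the ordering is an
// assumption, not Moodle's code. Input is assumed to be UTF-8.
function text_specialtoascii(string $text): string {
    $result = '';
    if (function_exists('transliterator_transliterate')) {
        // Romanise non-Latin scripts, then strip accents to ASCII.
        $result = (string) transliterator_transliterate('Any-Latin; Latin-ASCII', $text);
    }
    if ($result === '') {
        // Fall back to iconv when intl is missing or produced nothing.
        // @ hides the notice some iconv builds raise with //IGNORE.
        $result = (string) @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);
    }
    return $result;
}

echo text_specialtoascii("Cr\u{e8}me br\u{fb}l\u{e9}e"), "\n";
```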

encode_mimeheader($text, $charset='utf-8')
Generates a correct base64-encoded header for use in MIME mail messages.
This function appears to be fully compliant with RFC 1342. Credit goes to
paravoid (http://www.php.net/manual/en/function.mb-encode-mimeheader.php#60283).

return: string base64 encoded header
param: string $text input string
param: string $charset encoding of the text
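The core of such an encoder is the "B" encoded word, =?charset?B?base64?=, limited to 75 characters, with long text split into several words that never cut a multibyte character in half. This sketch assumes UTF-8 input only; text_encode_mimeheader() is hypothetical, and PHP's native mb_encode_mimeheader() is the built-in alternative.

```php
<?php
// Hypothetical helper: base64 ("B") encoded words of at most 75 characters,
// split on character boundaries. Assumes UTF-8; invalid UTF-8 yields ''.
function text_encode_mimeheader(string $text): string {
    $prefix = '=?UTF-8?B?';
    $suffix = '?=';
    // Base64 turns 3 bytes into 4 characters: max raw bytes per word.
    $maxBytes = intdiv(75 - strlen($prefix) - strlen($suffix), 4) * 3;
    $words = [];
    $chunk = '';
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [] as $char) {
        if (strlen($chunk . $char) > $maxBytes) {
            $words[] = $prefix . base64_encode($chunk) . $suffix;
            $chunk = '';
        }
        $chunk .= $char;
    }
    if ($chunk !== '') {
        $words[] = $prefix . base64_encode($chunk) . $suffix;
    }
    // Successive encoded words are folded onto continuation lines.
    return implode("\r\n ", $words);
}

echo text_encode_mimeheader('Hello'), "\n"; // =?UTF-8?B?SGVsbG8=?=
```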

get_entities_table()
Returns HTML entity transliteration table.

return: array with (html entity => utf-8) elements

entities_to_utf8($str, $htmlent=true)
Converts all numeric entities (&#nnnn; or &#xnnn;) to UTF-8.
Originally from laurynas dot butkus at gmail at
http://php.net/manual/en/function.html-entity-decode.php#75153,
with some custom modifications to provide more functionality.

return: string encoded UTF-8 string
param: string $str input string
param: boolean $htmlent also convert HTML entities (defaults to true)
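A sketch of the two modes, assuming PHP's own html_entity_decode() is an acceptable stand-in for Moodle's entity table when $htmlent is true. The helper text_entities_to_utf8() is hypothetical.

```php
<?php
// Hypothetical helper; uses PHP's built-in decoding rather than Moodle's table.
function text_entities_to_utf8(string $str, bool $htmlent = true): string {
    if ($htmlent) {
        // Decodes named entities (&eacute;) as well as numeric ones.
        return html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
    // Numeric entities only: decimal &#233; or hexadecimal &#xE9;.
    $result = preg_replace_callback(
        '/&#(?:x([0-9a-f]{1,6})|([0-9]{1,7}));/i',
        function (array $m): string {
            $code = ($m[1] !== '') ? hexdec($m[1]) : (int) $m[2];
            $char = mb_chr((int) $code, 'UTF-8');
            // Leave entities that name no valid code point untouched.
            return $char === false ? $m[0] : $char;
        },
        $str
    );
    return $result ?? $str;
}

echo text_entities_to_utf8('caf&#233; &#xE9;t&#xe9;', false), "\n"; // café été
```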

utf8_to_entities($str, $dec=false, $nonnum=false)
Converts all Unicode chars > 127 to numeric entities &#nnnn; or &#xnnn;.

return: string converted string
param: string $str input string
param: boolean $dec output decimal numeric entities only
param: boolean $nonnum remove all non-numeric entities
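A sketch of the $dec switch, assuming hexadecimal entities are emitted when it is false. The $nonnum option is not modelled, and text_utf8_to_entities() is hypothetical.

```php
<?php
// Hypothetical helper: every character above U+007F becomes a numeric
// entity, hexadecimal by default or decimal when $dec is true.
function text_utf8_to_entities(string $str, bool $dec = false): string {
    $result = preg_replace_callback('/[^\x00-\x7F]/u', function (array $m) use ($dec): string {
        $code = mb_ord($m[0], 'UTF-8');
        return $dec ? '&#' . $code . ';' : '&#x' . dechex($code) . ';';
    }, $str);
    return $result ?? $str; // null means invalid UTF-8: return input unchanged
}

echo text_utf8_to_entities("caf\u{e9}"), "\n";       // caf&#xe9;
echo text_utf8_to_entities("caf\u{e9}", true), "\n"; // caf&#233;
```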

trim_utf8_bom($str)
Removes the byte-order mark (BOM) from a Unicode string {@link http://unicode.org/faq/utf_bom.html}

return: string
param: string $str input string
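In UTF-8 the BOM is the three bytes EF BB BF (U+FEFF). A minimal sketch, assuming only a leading BOM is removed; text_trim_utf8_bom() is hypothetical.

```php
<?php
// Hypothetical helper: strip a leading UTF-8 BOM, if present.
function text_trim_utf8_bom(string $str): string {
    $bom = "\xEF\xBB\xBF"; // U+FEFF encoded in UTF-8
    return strpos($str, $bom) === 0 ? (string) substr($str, 3) : $str;
}

echo bin2hex(text_trim_utf8_bom("\xEF\xBB\xBFabc")), "\n"; // 616263
```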

remove_unicode_non_characters($value)
There are a number of Unicode non-characters including the byte-order mark (which may appear
multiple times in a string) and also other ranges. These can cause problems for some
processing.

This function removes the characters using string replace, so that the rest of the string
remains unchanged.

return: string Cleaned string value
param: string $value Input string
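A sketch of the removal covering the BOM (U+FEFF), the block U+FDD0 to U+FDEF, and the last two code points of every plane. It uses a single regular expression instead of the string replace described above, and text_remove_unicode_non_characters() is hypothetical.

```php
<?php
// Hypothetical helper. Removes U+FEFF, U+FDD0..U+FDEF and U+xxFFFE/U+xxFFFF.
function text_remove_unicode_non_characters(string $value): string {
    $class = '\x{FEFF}\x{FDD0}-\x{FDEF}';
    // The last two code points of all 17 planes (0x00..0x10) are non-characters.
    for ($plane = 0; $plane <= 0x10; $plane++) {
        $base = $plane * 0x10000;
        $class .= sprintf('\x{%X}\x{%X}', $base + 0xFFFE, $base + 0xFFFF);
    }
    // /u treats pattern and subject as UTF-8; null (invalid UTF-8) keeps the input.
    return preg_replace('/[' . $class . ']/u', '', $value) ?? $value;
}

echo text_remove_unicode_non_characters("a\u{FEFF}b\u{FFFE}c"), "\n"; // abc
```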

get_encodings()
Returns encoding options for select boxes, with UTF-8 and the platform encoding first.

return: array encodings

code2utf8($num)
Returns the UTF-8 string corresponding to the Unicode value
(from php.net, courtesy of romans@void.lv).

return: string the UTF-8 char corresponding to the Unicode value
param: int $num one Unicode value

utf8ord($utf8char)
Returns the Unicode code point of the given UTF-8 character.

return: int the code point of the given character
param: string $utf8char one UTF-8 character

strtotitle($text)
Capitalises the first letter of each word; words must be separated by spaces.
Use with care: this function does not work properly in many locales!

return: string
param: string $text input string
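A sketch of the described behaviour: split on spaces and uppercase the first character of each word with mbstring, leaving the rest unchanged. This is an illustration rather than Moodle's code, and text_strtotitle() is hypothetical. PHP's mb_convert_case($text, MB_CASE_TITLE, 'UTF-8') is a native alternative, but it also lowercases the remaining letters of each word.

```php
<?php
// Hypothetical helper: capitalise the first letter of each space-separated word.
function text_strtotitle(string $text): string {
    $words = explode(' ', $text);
    foreach ($words as $i => $word) {
        if ($word !== '') { // consecutive spaces produce empty words
            $words[$i] = mb_strtoupper(mb_substr($word, 0, 1, 'UTF-8'), 'UTF-8')
                . mb_substr($word, 1, null, 'UTF-8');
        }
    }
    return implode(' ', $words);
}

echo text_strtotitle("\u{e9}cole de musique"), "\n"; // École De Musique
```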