  • Bug fixes for general core bugs in 4.2.x will end 22 April 2024 (12 months).
  • Bug fixes for security issues in 4.2.x will end 7 October 2024 (18 months).
  • PHP version: minimum PHP 8.0.0. Note: the minimum PHP version has increased since Moodle 4.1. PHP 8.1.x is also supported.

Defines string APIs

Copyright: (C) 2001-3001 Eloy Lafuente (stronk7) {@link http://contiento.com}
License: http://www.gnu.org/copyleft/gpl.html GNU GPL v3 or later
File Size: 679 lines (25 kb)
Included or required: 0 times
Referenced: 0 times
Includes or requires: 0 files

Defines 1 class

core_text:: (24 methods):
  is_charset_supported()
  reset_caches()
  parse_charset()
  convert()
  substr()
  str_max_bytes()
  strrchr()
  strlen()
  strtolower()
  strtoupper()
  strpos()
  strrpos()
  strrev()
  specialtoascii()
  encode_mimeheader()
  get_entities_table()
  entities_to_utf8()
  utf8_to_entities()
  trim_utf8_bom()
  remove_unicode_non_characters()
  get_encodings()
  code2utf8()
  utf8ord()
  strtotitle()


Class: core_text  - X-Ref

Defines string APIs for manipulating strings.

This class is used to manipulate strings in Moodle 1.6 and later. Because
UTF-8 text became mandatory, a pool of functions that are safe under this
encoding became necessary. The method names are exactly the
same as their PHP originals.

This class was previously based on Typo3, which has since been removed; it now
uses native functions.

is_charset_supported(string $charset)   X-Ref
Check whether the charset is supported by mbstring.

param: string $charset Normalised charset
return: bool

reset_caches()   X-Ref
Reset internal textlib caches.


parse_charset($charset)   X-Ref
Standardise charset name

Please note it does not mean the returned charset is actually supported.

param: string $charset raw charset name
return: string normalised lowercase charset name

convert($text, $fromCS, $toCS='utf-8')   X-Ref
Converts the text between different encodings. It uses the iconv extension with the //TRANSLIT parameter.
If both source and target are utf-8, it only tries to fix invalid characters.

param: string $text
param: string $fromCS source encoding
param: string $toCS result encoding
return: string|bool converted string or false on error
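
A minimal sketch of the conversion idea described above, written against PHP's native iconv() and mb_convert_encoding() rather than core_text itself. The function name sketch_convert() and the mbstring fallback are assumptions made for illustration; they are not taken from Moodle's source.

```php
<?php
// Hypothetical helper (not Moodle's code): convert $text between encodings.
function sketch_convert(string $text, string $from, string $to = 'utf-8'): string|false
{
    // Try iconv with transliteration first. The @ silences the notice that
    // iconv raises when it cannot convert the input.
    $result = @iconv($from, $to . '//TRANSLIT', $text);
    if ($result !== false) {
        return $result;
    }
    // Assumed fallback for platforms whose iconv rejects //TRANSLIT.
    try {
        return mb_convert_encoding($text, $to, $from);
    } catch (\ValueError $e) {
        // mbstring does not know one of the encoding names.
        return false;
    }
}

// Latin-1 "é" (one byte) becomes a two-byte UTF-8 sequence.
echo bin2hex(sketch_convert("\xE9", 'iso-8859-1')), "\n";
```

Inside Moodle the documented call would be core_text::convert($text, $fromCS, $toCS); the sketch only mirrors its documented contract of returning the converted string or false.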

substr($text, $start, $len=null, $charset='utf-8')   X-Ref
Multibyte safe substr() function, uses mbstring or iconv

param: string $text string to truncate
param: int $start negative value means from end
param: int $len maximum length of characters beginning from start
param: string $charset encoding of the text
return: string portion of string specified by the $start and $len
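
To show why a multibyte-safe substr() matters, here is a small sketch using PHP's native mb_substr(). The wrapper name sketch_substr() is hypothetical; it only mimics the parameter order documented above and is not Moodle's implementation.

```php
<?php
// Hypothetical wrapper (not Moodle's code) with the documented parameter order.
function sketch_substr(string $text, int $start, ?int $len = null, string $charset = 'utf-8'): string
{
    // mb_substr() counts characters, not bytes.
    return mb_substr($text, $start, $len, $charset);
}

$text = 'Čeština';
// Byte-based substr() counts bytes: "Č" alone already takes two.
echo strlen(substr($text, 0, 2)), " bytes from substr()\n";
// Character-based version returns the first two characters.
echo sketch_substr($text, 0, 2), "\n";
// A negative start counts from the end of the string.
echo sketch_substr($text, -3), "\n";
```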

str_max_bytes($string, $bytes)   X-Ref
Truncates a string to no more than a certain number of bytes in a multi-byte safe manner.
UTF-8 only!

param: string $string String to truncate
param: int $bytes Maximum length of bytes in the result
return: string Portion of string specified by $bytes
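
Byte-limited truncation can be sketched with PHP's native mb_strcut(), which cuts by byte count but does not split a multibyte character. The name sketch_str_max_bytes() is hypothetical, and whether core_text uses this exact call is an assumption.

```php
<?php
// Hypothetical helper (not Moodle's code): keep at most $bytes bytes of UTF-8 text.
function sketch_str_max_bytes(string $string, int $bytes): string
{
    // mb_strcut() works on byte offsets but backs off to a character boundary.
    return mb_strcut($string, 0, $bytes, 'UTF-8');
}

// "é" occupies bytes 1 and 2, so a 2-byte limit cannot include it.
echo sketch_str_max_bytes('héllo', 2), "\n";
echo sketch_str_max_bytes('héllo', 3), "\n";
```

This is useful when a database column or protocol field has a byte limit rather than a character limit.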

strrchr($haystack, $needle, $part = false)   X-Ref
Finds the last occurrence of a character in a string within another.
UTF-8 ONLY safe mb_strrchr().

param: string $haystack The string from which to get the last occurrence of needle.
param: string $needle The string to find in haystack.
param: boolean $part If true, returns the portion before needle; otherwise returns the portion after (including needle).
return: string|false False when not found.

strlen($text, $charset='utf-8')   X-Ref
Multibyte safe strlen() function, uses mbstring or iconv

param: string $text input string
param: string $charset encoding of the text
return: int number of characters

strtolower($text, $charset='utf-8')   X-Ref
Multibyte safe strtolower() function, uses mbstring.

param: string $text input string
param: string $charset encoding of the text (may not work for all encodings)
return: string lower case text

strtoupper($text, $charset='utf-8')   X-Ref
Multibyte safe strtoupper() function, uses mbstring.

param: string $text input string
param: string $charset encoding of the text (may not work for all encodings)
return: string upper case text

strpos($haystack, $needle, $offset=0)   X-Ref
Find the position of the first occurrence of a substring in a string.
UTF-8 ONLY safe strpos(), uses mbstring

param: string $haystack the string to search in
param: string $needle one or more characters to search for
param: int $offset offset from beginning of string
return: int the numeric position of the first occurrence of needle in haystack.
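
The difference between character and byte positions can be sketched with PHP's native mb_strpos(). The wrapper sketch_strpos() is hypothetical. The sketch also returns false when the needle is absent, which is mb_strpos() behaviour and only assumed to carry over to core_text::strpos().

```php
<?php
// Hypothetical wrapper (not Moodle's code) around mb_strpos(), fixed to UTF-8.
function sketch_strpos(string $haystack, string $needle, int $offset = 0): int|false
{
    return mb_strpos($haystack, $needle, $offset, 'UTF-8');
}

$haystack = 'héllo wörld';
// Character position of "w": six characters precede it.
var_dump(sketch_strpos($haystack, 'w'));
// Byte position from plain strpos() is one higher because "é" is two bytes.
var_dump(strpos($haystack, 'w'));
```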

strrpos($haystack, $needle)   X-Ref
Find the position of the last occurrence of a substring in a string
UTF-8 ONLY safe strrpos(), uses mbstring

param: string $haystack the string to search in
param: string $needle one or more characters to search for
return: int the numeric position of the last occurrence of needle in haystack

strrev($str)   X-Ref
Reverses a UTF-8 multibyte string (used for RTL languages).
(This is only needed because there is no mb_strrev or iconv_strrev.)

param: string $str the multibyte string to reverse
return: string the reversed multi byte string
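
Since PHP has no built-in multibyte reverse, one way to sketch it is to split the string into characters and reverse the list. The function name sketch_strrev() is hypothetical and the use of mb_str_split() (PHP 7.4+) is an assumption, not necessarily what core_text does.

```php
<?php
// Hypothetical helper (not Moodle's code): reverse a UTF-8 string by character.
function sketch_strrev(string $str): string
{
    // Split into single characters so multibyte sequences stay intact.
    $chars = mb_str_split($str, 1, 'UTF-8');
    return implode('', array_reverse($chars));
}

echo sketch_strrev('añb'), "\n";
// For comparison, PHP's byte-based strrev() would reorder the two bytes of
// "ñ" and produce invalid UTF-8.
```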

specialtoascii($text, $charset='utf-8')   X-Ref
Try to convert upper unicode characters to plain ascii,
the returned string may contain unconverted unicode characters.

With the removal of Typo3, iconv conversion was found to be the best alternative to Typo3's function.
However, using the standard iconv call
iconv($charset, 'ASCII//TRANSLIT//IGNORE', (string) $text);
resulted in invalid strings with special characters from Russian/Japanese. To solve this, the transliterator was
used, but this resulted in empty strings for certain strings in our tests. It was decided to use a combination of the two
to cover all cases. Refer to MDL-53544 for further information.

param: string $text input string
param: string $charset encoding of the text
return: string converted ascii string

encode_mimeheader($text, $charset='utf-8')   X-Ref
Generate a correct base64 encoded header to be used in MIME mail messages.
This function seems to be 100% compliant with RFC1342. Credits go to:
paravoid (http://www.php.net/manual/en/function.mb-encode-mimeheader.php#60283).

param: string $text input string
param: string $charset encoding of the text
return: string base64 encoded header

get_entities_table()   X-Ref
Returns HTML entity transliteration table.

return: array with (html entity => utf-8) elements

entities_to_utf8($str, $htmlent=true)   X-Ref
Converts all numeric entities (&#nnnn; or &#xnnn;) to UTF-8.
Original from laurynas dot butkus at gmail at:
http://php.net/manual/en/function.html-entity-decode.php#75153
with some custom modifications to provide more functionality.

param: string $str input string
param: boolean $htmlent convert also html entities (defaults to true)
return: string encoded UTF-8 string

utf8_to_entities($str, $dec=false, $nonnum=false)   X-Ref
Converts all Unicode chars > 127 to numeric entities &#nnnn; or &#xnnn;.

param: string $str input string
param: boolean $dec output decimal-only numeric entities
param: boolean $nonnum remove all non-numeric entities
return: string converted string
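
The core of this conversion can be sketched with preg_replace_callback() and mb_ord(). The name sketch_utf8_to_entities() is hypothetical, the lowercase hexadecimal output format is an assumption, and the $nonnum option is deliberately left out of the sketch.

```php
<?php
// Hypothetical helper (not Moodle's code): replace every character above
// code point 127 with a numeric entity.
function sketch_utf8_to_entities(string $str, bool $dec = false): string
{
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m) use ($dec): string {
            $code = mb_ord($m[0], 'UTF-8');
            // Decimal form &#nnnn; or hexadecimal form &#xnnn;.
            return $dec ? '&#' . $code . ';' : '&#x' . dechex($code) . ';';
        },
        $str
    );
    // preg_replace_callback() returns null on invalid UTF-8 input;
    // in that case the sketch hands the string back unchanged.
    return $result ?? $str;
}

echo sketch_utf8_to_entities('café', true), "\n";
echo sketch_utf8_to_entities('café'), "\n";
```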

trim_utf8_bom($str)   X-Ref
Removes the BOM from a Unicode string {@link http://unicode.org/faq/utf_bom.html}

param: string $str input string
return: string
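
As an illustration, the UTF-8 byte-order mark is the three-byte sequence EF BB BF, and stripping it from the start of a string can be sketched as follows. The function name sketch_trim_utf8_bom() is hypothetical and this is not Moodle's implementation.

```php
<?php
// Hypothetical helper (not Moodle's code): drop a leading UTF-8 BOM if present.
function sketch_trim_utf8_bom(string $str): string
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($str, $bom, strlen($bom)) === 0) {
        return substr($str, strlen($bom));
    }
    return $str;
}

// A file saved "with BOM" starts with the three marker bytes.
echo sketch_trim_utf8_bom("\xEF\xBB\xBFid,name"), "\n";
```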

remove_unicode_non_characters($value)   X-Ref
There are a number of Unicode non-characters including the byte-order mark (which may appear
multiple times in a string) and also other ranges. These can cause problems for some
processing.

This function removes the characters using string replace, so that the rest of the string
remains unchanged.

param: string $value Input string
return: string Cleaned string value

get_encodings()   X-Ref
Returns encoding options for select boxes, utf-8 and platform encoding first

return: array encodings

code2utf8($num)   X-Ref
Returns the utf8 string corresponding to the unicode value
(from php.net, courtesy - romans@void.lv)

param: int $num one unicode value
return: string the UTF-8 char corresponding to the unicode value

utf8ord($utf8char)   X-Ref
Returns the code of the given UTF-8 character

param: string $utf8char one UTF-8 character
return: int the code of the given character
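
The mapping between a Unicode code point and its UTF-8 bytes, which code2utf8() and utf8ord() expose, can be sketched by hand with bit operations. Both function names below are hypothetical, and the decoder assumes its input is exactly one well-formed UTF-8 character; it performs no validation.

```php
<?php
// Hypothetical helper (not Moodle's code): encode one code point as UTF-8.
function sketch_code2utf8(int $num): string
{
    if ($num < 0x80) {
        return chr($num);
    }
    if ($num < 0x800) {
        return chr(0xC0 | ($num >> 6)) . chr(0x80 | ($num & 0x3F));
    }
    if ($num < 0x10000) {
        return chr(0xE0 | ($num >> 12))
            . chr(0x80 | (($num >> 6) & 0x3F))
            . chr(0x80 | ($num & 0x3F));
    }
    if ($num < 0x110000) {
        return chr(0xF0 | ($num >> 18))
            . chr(0x80 | (($num >> 12) & 0x3F))
            . chr(0x80 | (($num >> 6) & 0x3F))
            . chr(0x80 | ($num & 0x3F));
    }
    return ''; // outside the Unicode range
}

// Hypothetical helper (not Moodle's code): decode one UTF-8 character.
function sketch_utf8ord(string $char): int
{
    if ($char === '') {
        return 0;
    }
    $first = ord($char[0]);
    if ($first < 0x80) {
        return $first;
    }
    if (($first & 0xE0) === 0xC0) {
        return (($first & 0x1F) << 6) | (ord($char[1]) & 0x3F);
    }
    if (($first & 0xF0) === 0xE0) {
        return (($first & 0x0F) << 12)
            | ((ord($char[1]) & 0x3F) << 6)
            | (ord($char[2]) & 0x3F);
    }
    return (($first & 0x07) << 18)
        | ((ord($char[1]) & 0x3F) << 12)
        | ((ord($char[2]) & 0x3F) << 6)
        | (ord($char[3]) & 0x3F);
}

// Round trip for the euro sign, U+20AC.
echo bin2hex(sketch_code2utf8(0x20AC)), "\n";
echo dechex(sketch_utf8ord("\xE2\x82\xAC")), "\n";
```

PHP 7.2 and later also provide mb_chr() and mb_ord(), which cover the same ground with validation.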

strtotitle($text)   X-Ref
Capitalises the first letter of each word; words must be separated by spaces.
Use with care: this function does not work properly in many locales!

param: string $text input string
return: string
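
A rough equivalent can be sketched with PHP's native mb_convert_case() in MB_CASE_TITLE mode. The wrapper name sketch_strtotitle() is hypothetical, and it is an assumption that core_text::strtotitle() behaves the same way on every input.

```php
<?php
// Hypothetical helper (not Moodle's code): title-case UTF-8 text.
function sketch_strtotitle(string $text): string
{
    // MB_CASE_TITLE uppercases the first letter of each word and
    // lowercases the remaining letters.
    return mb_convert_case($text, MB_CASE_TITLE, 'UTF-8');
}

echo sketch_strtotitle('élan vital'), "\n";
```

Note that this mode also lowercases the rest of each word, so an input such as "hELLO" comes back as "Hello".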