(no description)

File Size: 617 lines (26 kb)
Included or required: 0 times
Referenced: 0 times
Includes or requires: 0 files

Defines 1 class

HTMLPurifier_Encoder:: (12 methods):
  __construct()
  muteErrorHandler()
  unsafeIconv()
  iconv()
  cleanUTF8()
  unichr()
  iconvAvailable()
  convertToUTF8()
  convertFromUTF8()
  convertToASCIIDumbLossless()
  testIconvTruncateBug()
  testEncodingSupportsASCII()


Class: HTMLPurifier_Encoder  - X-Ref

A UTF-8 specific character encoder that handles cleaning and transforming.

__construct()   X-Ref
The constructor triggers a fatal error if you attempt to instantiate the class.


muteErrorHandler()   X-Ref
Error handler that mutes errors; an alternative to the shut-up operator (@).
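
The idea of muting errors with a handler rather than with @ can be sketched as follows. This is a minimal illustration, not the class's own code: call_muted() and noisy() are hypothetical names, and only the standard PHP functions set_error_handler() and restore_error_handler() are assumed.

```php
<?php
// Hypothetical helper, not part of HTMLPurifier: run $fn while every error
// that a user-defined handler can intercept is muted, then restore the
// previous error handler.
function call_muted(callable $fn, ...$args)
{
    // A handler that returns true tells PHP the error has been dealt with,
    // so the built-in error handler is bypassed.
    set_error_handler(function () {
        return true;
    });
    try {
        return $fn(...$args);
    } finally {
        // Always put the previous handler back, even if $fn throws.
        restore_error_handler();
    }
}

// Hypothetical function that raises a warning before returning.
function noisy(): string
{
    trigger_error('something went slightly wrong', E_USER_WARNING);
    return 'done';
}

$result = call_muted('noisy');
```

Unlike @, this approach leaves error_reporting untouched and scopes the muting to a single call.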


unsafeIconv($in, $out, $text)   X-Ref
iconv wrapper which mutes errors, but doesn't work around bugs.

param: string $in Input encoding
param: string $out Output encoding
param: string $text The text to convert
return: string

iconv($in, $out, $text, $max_chunk_size = 8000)   X-Ref
iconv wrapper which mutes errors and works around bugs.

param: string $in Input encoding
param: string $out Output encoding
param: string $text The text to convert
param: int $max_chunk_size
return: string
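
One way a wrapper might honour a maximum chunk size is to cut the input on UTF-8 character boundaries and convert each piece separately. The sketch below assumes the input is UTF-8; utf8_chunks() and chunked_iconv() are hypothetical names and are not claimed to match the method's actual implementation.

```php
<?php
// Hypothetical helper: split a UTF-8 string into chunks of at most $max
// bytes without cutting a multibyte sequence in half.
function utf8_chunks(string $text, int $max = 8000): array
{
    $chunks = [];
    $len = strlen($text);
    $start = 0;
    while ($start < $len) {
        $end = min($start + $max, $len);
        // Back up while the byte at the cut point is a continuation byte
        // (10xxxxxx), so that the cut falls on a character boundary.
        while ($end < $len && $end > $start && (ord($text[$end]) & 0xC0) === 0x80) {
            $end--;
        }
        if ($end === $start) {
            // No boundary found (malformed input or a tiny $max): hard cut.
            $end = min($start + $max, $len);
        }
        $chunks[] = substr($text, $start, $end - $start);
        $start = $end;
    }
    return $chunks;
}

// Hypothetical wrapper: convert chunk by chunk; returns false on failure.
function chunked_iconv(string $in, string $out, string $text, int $max = 8000)
{
    $result = '';
    // Mute iconv notices for the duration of the conversion.
    set_error_handler(function () {
        return true;
    });
    try {
        foreach (utf8_chunks($text, $max) as $chunk) {
            $piece = iconv($in, $out, $chunk);
            if ($piece === false) {
                return false;
            }
            $result .= $piece;
        }
    } finally {
        restore_error_handler();
    }
    return $result;
}
```

Cutting only on character boundaries matters because a chunk that ends mid-sequence would itself be invalid input to iconv.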

cleanUTF8($str, $force_php = false)   X-Ref
Cleans a UTF-8 string for well-formedness and SGML validity.

It will parse according to UTF-8 and return a valid UTF-8 string, with
non-SGML codepoints excluded.

Specifically, it will permit:
\x{9}\x{A}\x{D}\x{20}-\x{7E}\x{A0}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}
Source: https://www.w3.org/TR/REC-xml/#NT-Char
Arguably this function should be modernized to the HTML5 set
of allowed characters:
https://www.w3.org/TR/html5/syntax.html#preprocessing-the-input-stream
which simultaneously expands and restricts the set of allowed characters.

param: string $str The string to clean
param: bool $force_php
return: string
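
The permitted ranges listed above translate directly into a PCRE character class. The sketch below covers only the filtering step and assumes well-formed UTF-8 input; it does not attempt to repair malformed sequences. strip_non_sgml() is a hypothetical name.

```php
<?php
// Hypothetical helper: drop every codepoint outside the permitted set.
// Returns null when the input is not well-formed UTF-8, because
// preg_replace() with the /u modifier fails on malformed subjects.
function strip_non_sgml(string $str): ?string
{
    return preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{7E}\x{A0}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $str
    );
}

// NUL and DEL fall outside the permitted ranges and are removed.
$cleaned = strip_non_sgml("a\x00b\x7Fc");
```

A caller that also needs to handle malformed input would have to treat the null result separately, for example by falling back to a byte-by-byte parser.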

unichr($code)   X-Ref
No description

iconvAvailable()   X-Ref

return: bool

convertToUTF8($str, $config, $context)   X-Ref
Convert a string to UTF-8 based on configuration.

param: string $str The string to convert
param: HTMLPurifier_Config $config
param: HTMLPurifier_Context $context
return: string

convertFromUTF8($str, $config, $context)   X-Ref
Converts a string from UTF-8 based on configuration.

param: string $str The string to convert
param: HTMLPurifier_Config $config
param: HTMLPurifier_Context $context
return: string

convertToASCIIDumbLossless($str)   X-Ref
Lossless (character-wise) conversion of HTML to ASCII.

param: string $str UTF-8 string to be converted to ASCII
return: string ASCII encoded string with non-ASCII characters entity-ized
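
Entity-izing can be illustrated by replacing each non-ASCII character with a numeric character reference. In this sketch the decimal form (&#NNN;) is an assumption, the input is assumed to be well-formed UTF-8, and utf8_codepoint() and to_ascii_entities() are hypothetical names.

```php
<?php
// Hypothetical helper: decode one well-formed UTF-8 character to its codepoint.
function utf8_codepoint(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0];
    }
    // The lead byte spends its top ($count + 1) bits on the length marker.
    $code = $bytes[0] & (0xFF >> ($count + 1));
    for ($i = 1; $i < $count; $i++) {
        // Each continuation byte contributes its low six bits.
        $code = ($code << 6) | ($bytes[$i] & 0x3F);
    }
    return $code;
}

// Hypothetical helper: replace every non-ASCII character with a decimal
// numeric character reference. Returns null on malformed UTF-8.
function to_ascii_entities(string $str): ?string
{
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            return '&#' . utf8_codepoint($m[0]) . ';';
        },
        $str
    );
}

$ascii = to_ascii_entities("caf\xC3\xA9 \xE2\x82\xAC5");
```

The conversion is lossless in the sense that an HTML parser decoding the references recovers the original characters.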

testIconvTruncateBug()   X-Ref
glibc iconv has a known bug where it doesn't handle the magic
//IGNORE stanza correctly.  In particular, rather than ignore
characters, it will return an EILSEQ after consuming some number
of characters, and expect you to restart iconv as if it were
an E2BIG.  Old versions of PHP did not respect the errno, and
returned the fragment, so as a result you would see iconv
mysteriously truncating output. We can work around this by
manually chopping our input into segments of about 8000
characters, as long as PHP ignores the error code.  If PHP starts
paying attention to the error code, iconv becomes unusable.

return: int Error code indicating severity of bug.
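
A probe for this bug could convert a string that starts with one non-ASCII character followed by more than 8000 ASCII characters, then compare the output length with what //IGNORE should have produced. The sketch below is an illustration under stated assumptions: the probe string, the function names, and the string return values are all hypothetical (the method itself returns an int code).

```php
<?php
// Hypothetical classifier: interpret the output of a probe conversion.
function classify_iconv_output($output, int $expectedLength): string
{
    if ($output === false) {
        // PHP honoured the error code, so //IGNORE cannot be relied on.
        return 'unusable';
    }
    if (strlen($output) < $expectedLength) {
        // A fragment came back: output is being truncated.
        return 'truncates';
    }
    return 'ok';
}

// Hypothetical probe: one Greek alpha followed by 9000 ASCII letters.
// A correct //IGNORE drops the alpha and returns exactly 9000 bytes.
function probe_iconv_ignore(): string
{
    if (!function_exists('iconv')) {
        return 'unusable';
    }
    $probe = "\xCE\xB1" . str_repeat('a', 9000);
    // Mute the notice iconv may raise for the unconvertible character.
    set_error_handler(function () {
        return true;
    });
    try {
        $output = iconv('UTF-8', 'ASCII//IGNORE', $probe);
    } finally {
        restore_error_handler();
    }
    return classify_iconv_output($output, 9000);
}

$status = probe_iconv_ignore();
```

Separating the classification from the probe keeps the decision logic testable on systems whose iconv behaves differently.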

testEncodingSupportsASCII($encoding, $bypass = false)   X-Ref
This expensive function tests whether or not a given character
encoding supports ASCII. 7/8-bit encodings like Shift_JIS will
fail this test, and require special processing. Variable width
encodings shouldn't ever fail.

param: string $encoding Encoding name to test, as per iconv format
param: bool $bypass Whether or not to bypass the precompiled arrays.
return: Array of UTF-8 characters to their corresponding ASCII,