(no description)
File Size: 617 lines (26 kb)
Included or required: 0 times
Referenced: 0 times
Includes or requires: 0 files
HTMLPurifier_Encoder:: (12 methods):
__construct()
muteErrorHandler()
unsafeIconv()
iconv()
cleanUTF8()
unichr()
iconvAvailable()
convertToUTF8()
convertFromUTF8()
convertToASCIIDumbLossless()
testIconvTruncateBug()
testEncodingSupportsASCII()
Class: HTMLPurifier_Encoder - X-Ref
A UTF-8 specific character encoder that handles cleaning and transforming.
__construct() X-Ref
The constructor triggers a fatal error if you attempt to instantiate the class.
muteErrorHandler() X-Ref
Error handler that mutes errors; an alternative to the shut-up operator (@).
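The idea of muting errors through a handler instead of the shut-up operator can be sketched as follows. This is a minimal illustration, not the library's code: the helper name call_muted() is hypothetical, and it only relies on PHP's built-in set_error_handler() and restore_error_handler().

```php
<?php
// Hypothetical helper (not part of HTMLPurifier): runs $fn with PHP errors
// muted by a temporary no-op error handler, then restores the old handler.
function call_muted(callable $fn)
{
    set_error_handler(function (int $errno, string $errstr): bool {
        // Returning true tells PHP the error was handled, so nothing is reported.
        return true;
    });
    try {
        return $fn();
    } finally {
        // Always restore the previous handler, even if $fn throws.
        restore_error_handler();
    }
}

$result = call_muted(function () {
    trigger_error('this warning is swallowed', E_USER_WARNING);
    return 'done';
});
```

Unlike the @ operator, a handler installed this way is an ordinary callable, so the same muting behaviour can be shared by several wrappers.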
unsafeIconv($in, $out, $text) X-Ref
iconv wrapper which mutes errors, but doesn't work around bugs.
param: string $in Input encoding
param: string $out Output encoding
param: string $text The text to convert
return: string
iconv($in, $out, $text, $max_chunk_size = 8000) X-Ref
iconv wrapper which mutes errors and works around bugs.
param: string $in Input encoding
param: string $out Output encoding
param: string $text The text to convert
param: int $max_chunk_size
return: string
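One way to apply a maximum chunk size to UTF-8 input is to cut the text into byte-limited pieces that never split a multibyte character, then convert each piece separately. The sketch below shows only the splitting step; split_utf8_chunks() is a hypothetical name, and the library's own chunking logic may differ.

```php
<?php
// Hypothetical helper (not part of HTMLPurifier): splits a UTF-8 string into
// chunks of at most $maxChunkSize bytes without cutting a character in half.
function split_utf8_chunks(string $text, int $maxChunkSize = 8000): array
{
    if ($maxChunkSize < 4) {
        // A UTF-8 character is at most 4 bytes long.
        throw new InvalidArgumentException('chunk size must be at least 4 bytes');
    }
    $chunks = [];
    $length = strlen($text);
    $start = 0;
    while ($start < $length) {
        $end = min($start + $maxChunkSize, $length);
        // If the byte at the cut is a continuation byte (10xxxxxx), back up
        // until the cut lands on the first byte of a character.
        while ($end < $length && $end > $start && (ord($text[$end]) & 0xC0) === 0x80) {
            $end--;
        }
        if ($end === $start) {
            // Malformed input made only of continuation bytes: force progress.
            $end = min($start + $maxChunkSize, $length);
        }
        $chunks[] = substr($text, $start, $end - $start);
        $start = $end;
    }
    return $chunks;
}
```

Each chunk could then be passed to a conversion call and the results concatenated, which keeps every individual conversion below the size limit.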
cleanUTF8($str, $force_php = false) X-Ref
Cleans a UTF-8 string for well-formedness and SGML validity.
It will parse according to UTF-8 and return a valid UTF-8 string, with non-SGML codepoints excluded. Specifically, it will permit:
\x{9}\x{A}\x{D}\x{20}-\x{7E}\x{A0}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}
Source: https://www.w3.org/TR/REC-xml/#NT-Char
Arguably this function should be modernized to the HTML5 set of allowed characters (https://www.w3.org/TR/html5/syntax.html#preprocessing-the-input-stream), which simultaneously expands and restricts the set of allowed characters.
param: string $str The string to clean
param: bool $force_php
return: string
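The permitted codepoint set listed above can be expressed as a single PCRE character class. The sketch below is an illustration only: strip_non_sgml() is a hypothetical name, and it assumes the input is already well-formed UTF-8, whereas cleanUTF8() is documented to handle well-formedness as well.

```php
<?php
// Hypothetical helper (not part of HTMLPurifier): removes every codepoint
// outside the permitted set from a string that is already well-formed UTF-8.
function strip_non_sgml(string $str): string
{
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{7E}\x{A0}-\x{D7FF}'
             . '\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    $cleaned = preg_replace($pattern, '', $str);
    if ($cleaned === null) {
        // With the /u modifier, preg_replace() returns null on malformed UTF-8.
        throw new InvalidArgumentException('input is not well-formed UTF-8');
    }
    return $cleaned;
}
```

Note that the set excludes the C0 controls other than tab, line feed and carriage return, as well as U+007F through U+009F.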
unichr($code) X-Ref
No description
iconvAvailable() X-Ref
return: bool
convertToUTF8($str, $config, $context) X-Ref
Converts a string to UTF-8 based on configuration.
param: string $str The string to convert
param: HTMLPurifier_Config $config
param: HTMLPurifier_Context $context
return: string
convertFromUTF8($str, $config, $context) X-Ref
Converts a string from UTF-8 based on configuration.
param: string $str The string to convert
param: HTMLPurifier_Config $config
param: HTMLPurifier_Context $context
return: string
convertToASCIIDumbLossless($str) X-Ref
Lossless (character-wise) conversion of HTML to ASCII.
param: string $str UTF-8 string to be converted to ASCII
return: string ASCII-encoded string with non-ASCII characters entity-ized
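A character-wise lossless conversion of this kind can be sketched by replacing each non-ASCII character with a numeric character reference. The function names below are hypothetical, the use of decimal references is an assumption, and the sketch requires well-formed UTF-8 input; the library's implementation may work differently.

```php
<?php
// Hypothetical helper: decodes one well-formed UTF-8 character to its codepoint.
function utf8_codepoint(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $n = count($bytes);
    if ($n === 1) {
        return $bytes[0];
    }
    // Payload bits of the lead byte: 5 bits for 2-byte, 4 for 3-byte, 3 for 4-byte.
    $code = $bytes[0] & (0xFF >> ($n + 1));
    for ($i = 1; $i < $n; $i++) {
        // Each continuation byte contributes its low 6 bits.
        $code = ($code << 6) | ($bytes[$i] & 0x3F);
    }
    return $code;
}

// Hypothetical helper: replaces every non-ASCII character with a decimal
// numeric character reference, leaving ASCII bytes untouched.
function entityize_non_ascii(string $str): string
{
    $result = preg_replace_callback('/[^\x00-\x7F]/u', function (array $m): string {
        return '&#' . utf8_codepoint($m[0]) . ';';
    }, $str);
    if ($result === null) {
        throw new InvalidArgumentException('input is not well-formed UTF-8');
    }
    return $result;
}
```

Because every replaced character keeps its codepoint in the reference, an HTML consumer can recover the original text from the pure-ASCII output.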
testIconvTruncateBug() X-Ref
glibc iconv has a known bug where it doesn't handle the magic //IGNORE stanza correctly. In particular, rather than ignore characters, it will return an EILSEQ after consuming some number of characters, and expect you to restart iconv as if it were an E2BIG. Old versions of PHP did not respect the errno, and returned the fragment, so as a result you would see iconv mysteriously truncating output.
We can work around this by manually chopping our input into segments of about 8000 characters, as long as PHP ignores the error code. If PHP starts paying attention to the error code, iconv becomes unusable.
return: int Error code indicating severity of bug.
testEncodingSupportsASCII($encoding, $bypass = false) X-Ref
This expensive function tests whether or not a given character encoding supports ASCII. 7/8-bit encodings like Shift_JIS will fail this test, and require special processing. Variable width encodings shouldn't ever fail.
param: string $encoding Encoding name to test, as per iconv format
param: bool $bypass Whether or not to bypass the precompiled arrays.
return: Array of UTF-8 characters to their corresponding ASCII,