  • Bug fixes for general core bugs in 4.3.x will end 7 October 2024 (12 months).
  • Bug fixes for security issues in 4.3.x will end 21 April 2025 (18 months).
  • PHP version: minimum PHP 8.0.0. Note: the minimum PHP version has increased since Moodle 4.1. PHP 8.2.x is also supported.

Copyright (c) 2008, David R. Nadeau, NadeauSoftware.com. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

File Size: 756 lines (26 kb)
Included or required: 1 time
Referenced: 0 times
Includes or requires: 0 files

Defines 7 functions


Functions that are not part of a class:

url_to_absolute( $baseUrl, $relativeUrl )
Combine a base URL and a relative URL to produce a new
absolute URL.  The base URL is often the URL of a page,
and the relative URL is a URL embedded on that page.

This function implements the "absolutize" algorithm from
the RFC3986 specification for URLs.

This function supports multi-byte characters with the UTF-8 encoding,
per the URL specification.

Parameters:
baseUrl        the absolute base URL.

relativeUrl    the relative URL to convert.

Return values:
An absolute URL that combines parts of the base and relative
URLs, or FALSE if the base URL is not absolute or if either
URL cannot be parsed.

url_remove_dot_segments( $path )
Filter out "." and ".." segments from a URL's path and return
the result.

This function implements the "remove_dot_segments" algorithm from
the RFC3986 specification for URLs.

This function supports multi-byte characters with the UTF-8 encoding,
per the URL specification.

Parameters:
path    the path to filter

Return values:
The filtered path with "." and ".." removed.
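
The algorithm named above can be sketched roughly as follows. This is a minimal, independent transcription of the steps in RFC 3986 section 5.2.4, not the code from this file; the function name sketch_remove_dot_segments() is hypothetical, and the sketch assumes PHP 8.0 or later and does not treat multi-byte characters specially.

```php
<?php
// Hypothetical helper: an independent sketch of RFC 3986 section 5.2.4.
// It is NOT the library's url_remove_dot_segments().
function sketch_remove_dot_segments(string $path): string
{
    $input = $path;
    $output = '';
    while ($input !== '') {
        if (strncmp($input, '../', 3) === 0) {
            // Step A: drop a leading "../".
            $input = substr($input, 3);
        } elseif (strncmp($input, './', 2) === 0) {
            // Step A: drop a leading "./".
            $input = substr($input, 2);
        } elseif (strncmp($input, '/./', 3) === 0) {
            // Step B: replace a leading "/./" with "/".
            $input = substr($input, 2);
        } elseif ($input === '/.') {
            // Step B: a trailing "/." becomes "/".
            $input = '/';
        } elseif (strncmp($input, '/../', 4) === 0 || $input === '/..') {
            // Step C: replace "/../" (or a trailing "/..") with "/" and
            // remove the last segment already written to the output.
            $input = ($input === '/..') ? '/' : substr($input, 3);
            $slash = strrpos($output, '/');
            $output = ($slash === false) ? '' : substr($output, 0, $slash);
        } elseif ($input === '.' || $input === '..') {
            // Step D: a lone "." or ".." is discarded.
            $input = '';
        } else {
            // Step E: move the first segment (with its leading "/", if any)
            // from the input to the output.
            $next = strpos($input, '/', 1);
            if ($next === false) {
                $output .= $input;
                $input = '';
            } else {
                $output .= substr($input, 0, $next);
                $input = substr($input, $next);
            }
        }
    }
    return $output;
}

echo sketch_remove_dot_segments('/a/b/c/./../../g'), "\n"; // prints "/a/g"
echo sketch_remove_dot_segments('mid/content=5/../6'), "\n"; // prints "mid/6"
```

Both inputs are the worked examples given in RFC 3986 itself.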

split_url( $url, $decode=FALSE)
This function parses an absolute or relative URL and splits it
into individual components.

RFC3986 specifies the components of a Uniform Resource Identifier (URI).
A portion of the ABNF is repeated here:

URI-reference = URI
              / relative-ref

URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

hier-part     = "//" authority path-abempty
              / path-absolute
              / path-rootless
              / path-empty

relative-part = "//" authority path-abempty
              / path-absolute
              / path-noscheme
              / path-empty

authority     = [ userinfo "@" ] host [ ":" port ]

So, a URL has the following major components:

scheme
The name of a method used to interpret the rest of
the URL.  Examples:  "http", "https", "mailto", "file".

authority
The name of the authority governing the URL's name
space.  Examples:  "example.com", "user@example.com",
"example.com:80", "user:password@example.com:80".

The authority may include a host name, port number,
user name, and password.

The host may be a name, an IPv4 numeric address, or
an IPv6 numeric address.

path
The hierarchical path to the URL's resource.
Examples:  "/index.htm", "/scripts/page.php".

query
The data for a query.  Examples:  "?search=google.com".

fragment
The name of a secondary resource relative to that named
by the path.  Examples:  "#section1", "#header".

An "absolute" URL must include a scheme and path.  The authority, query,
and fragment components are optional.

A "relative" URL does not include a scheme and must include a path.  The
authority, query, and fragment components are optional.

This function splits the $url argument into the following components
and returns them in an associative array.  Keys to that array include:

"scheme"    The scheme, such as "http".
"host"        The host name, IPv4, or IPv6 address.
"port"        The port number.
"user"        The user name.
"pass"        The user password.
"path"        The path, such as a file path for "http".
"query"        The query.
"fragment"    The fragment.

One or more of these may not be present, depending upon the URL.

Optionally, the "user", "pass", "host" (if a name, not an IP address),
"path", "query", and "fragment" may have percent-encoded characters
decoded.  The "scheme" and "port" cannot include percent-encoded
characters and are never decoded.  Decoding occurs after the URL has
been parsed.

Parameters:
url        the URL to parse.

decode        an optional boolean flag selecting whether
to decode percent encoding or not.  Default = FALSE.

Return values:
the associative array of URL parts, or FALSE if the URL is
too malformed to recognize any parts.
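
As a rough illustration of this kind of splitting, the sketch below applies the generic regular expression from RFC 3986 Appendix B and then breaks the authority into user, password, host, and port. It is an assumption-laden sketch, not the code in this file: sketch_split_url() is a hypothetical name, percent-decoding is left out, the "?" and "#" delimiters are not stored in the values, and PHP 8.0 or later is assumed.

```php
<?php
// Hypothetical helper based on the RFC 3986 Appendix B regular expression.
// It is NOT the library's split_url() and performs no percent-decoding.
function sketch_split_url(string $url): array
{
    // Groups: 1 = scheme, 2 = authority, 3 = path, 4 = query, 5 = fragment.
    $re = '~^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$~s';
    preg_match($re, $url, $m, PREG_UNMATCHED_AS_NULL);

    $parts = [];
    if ($m[1] !== null) {
        $parts['scheme'] = $m[1];
    }
    if ($m[2] !== null) {
        $authority = $m[2];
        // Optional "user[:pass]@" prefix.
        $at = strrpos($authority, '@');
        if ($at !== false) {
            $userinfo = explode(':', substr($authority, 0, $at), 2);
            $parts['user'] = $userinfo[0];
            if (isset($userinfo[1])) {
                $parts['pass'] = $userinfo[1];
            }
            $authority = substr($authority, $at + 1);
        }
        // Host (a name, an IPv4 address, or a bracketed IPv6 address),
        // followed by an optional ":port".
        if (preg_match('~^(\[[^\]]*\]|[^:]*)(?::(\d*))?$~', $authority, $hp)) {
            $parts['host'] = $hp[1];
            if (isset($hp[2]) && $hp[2] !== '') {
                $parts['port'] = $hp[2];
            }
        } else {
            $parts['host'] = $authority;
        }
    }
    $parts['path'] = $m[3]; // always present, possibly empty
    if ($m[4] !== null) {
        $parts['query'] = $m[4];
    }
    if ($m[5] !== null) {
        $parts['fragment'] = $m[5];
    }
    return $parts;
}

print_r(sketch_split_url('http://user:pw@example.com:8080/scripts/page.php?search=x#section1'));
```

Components missing from the URL are simply absent from the returned array, which mirrors the note above that one or more keys may not be present.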

join_url( $parts, $encode=FALSE)
This function joins together URL components to form a complete URL.

RFC3986 specifies the components of a Uniform Resource Identifier (URI).
This function implements the specification's "component recomposition"
algorithm for combining URI components into a full URI string.

The $parts argument is an associative array containing zero or
more of the following:

"scheme"    The scheme, such as "http".
"host"        The host name, IPv4, or IPv6 address.
"port"        The port number.
"user"        The user name.
"pass"        The user password.
"path"        The path, such as a file path for "http".
"query"        The query.
"fragment"    The fragment.

The "port", "user", and "pass" values are only used when a "host"
is present.

The optional $encode argument indicates if appropriate URL components
should be percent-encoded as they are assembled into the URL.  Encoding
is only applied to the "user", "pass", "host" (if a host name, not an
IP address), "path", "query", and "fragment" components.  The "scheme"
and "port" are never encoded.  When a "scheme" and "host" are both
present, the "path" is presumed to be hierarchical and encoding
processes each segment of the hierarchy separately (i.e., the slashes
are left alone).

The assembled URL string is returned.

Parameters:
parts        an associative array of strings containing the
individual parts of a URL.

encode        an optional boolean flag selecting whether
to do percent encoding or not.  Default = FALSE.

Return values:
Returns the assembled URL string.  The string is an absolute
URL if a scheme is supplied, and a relative URL if not.  An
empty string is returned if the $parts array does not contain
any of the needed values.
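
The "component recomposition" step can be sketched along the following lines. This is a simplified illustration based on RFC 3986 section 5.3, not the code in this file: sketch_join_url() is a hypothetical name and the percent-encoding option is left out entirely.

```php
<?php
// Hypothetical helper following RFC 3986 section 5.3 (component
// recomposition). It is NOT the library's join_url() and never encodes.
function sketch_join_url(array $parts): string
{
    $url = '';
    if (isset($parts['scheme'])) {
        $url .= $parts['scheme'] . ':';
    }
    // "user", "pass", and "port" are only used when a "host" is present.
    if (isset($parts['host'])) {
        $url .= '//';
        if (isset($parts['user'])) {
            $url .= $parts['user'];
            if (isset($parts['pass'])) {
                $url .= ':' . $parts['pass'];
            }
            $url .= '@';
        }
        $url .= $parts['host'];
        if (isset($parts['port'])) {
            $url .= ':' . $parts['port'];
        }
    }
    $url .= $parts['path'] ?? '';
    if (isset($parts['query'])) {
        $url .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $url .= '#' . $parts['fragment'];
    }
    return $url;
}

echo sketch_join_url([
    'scheme' => 'http',
    'user' => 'user',
    'pass' => 'password',
    'host' => 'example.com',
    'port' => '80',
    'path' => '/index.htm',
    'query' => 'search=x',
    'fragment' => 'section1',
]), "\n"; // prints "http://user:password@example.com:80/index.htm?search=x#section1"
```

An array without a "scheme" yields a relative URL, and an empty array yields an empty string, matching the return values described above.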

encode_url($url)
This function percent-encodes a URL so that every disallowed
character is replaced by its percent encoding.

RFC3986 specifies which characters are allowed in a URL and which
are reserved. This function replaces all the disallowed characters
in the URL with their respective percent encodings. Characters that
are already encoded are not encoded again; for example, '%20' is
not turned into '%2520'.

Parameters:
url        the url to encode.

Return values:
Returns the encoded URL string.
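
One way to get the "do not encode twice" behaviour is to match existing escapes and disallowed bytes in a single pass, as in the sketch below. This is an illustration only, not the code in this file: sketch_encode_url() is a hypothetical name, and the sketch treats every RFC 3986 reserved character as allowed anywhere in the URL, which is a simplification.

```php
<?php
// Hypothetical helper; NOT the library's encode_url().
// Simplifying assumption: reserved characters are left alone everywhere,
// regardless of which URL component they appear in.
function sketch_encode_url(string $url): string
{
    // First alternative: an existing "%XX" escape, which is kept as is.
    // Second alternative: any single byte outside the RFC 3986
    // unreserved and reserved character sets.
    $pattern = '~%[0-9A-Fa-f]{2}|[^A-Za-z0-9\-._\~:/?#\[\]@!$&\'()*+,;=]~';
    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // A three-byte match is an existing escape; a one-byte match
            // is a disallowed byte that still needs encoding.
            return strlen($m[0]) === 3 ? $m[0] : rawurlencode($m[0]);
        },
        $url
    );
}

echo sketch_encode_url('http://example.com/a b?q=%20x'), "\n";
// prints "http://example.com/a%20b?q=%20x"
```

Because the pattern has no "u" modifier it works byte by byte, so a multi-byte UTF-8 character comes out as one escape per byte.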

extract_html_urls( $text )
Extract URLs from a web page.

URLs are extracted from a long list of tags and attributes as defined
by the HTML 2.0, HTML 3.2, HTML 4.01, and draft HTML 5.0 specifications.
URLs are also extracted from tags and attributes that are common
extensions of HTML, from the draft Forms 2.0 specification, from XHTML,
and from WML 1.3 and 2.0.

The function returns an associative array of associative arrays of
arrays of URLs.  The outermost array's keys are the tag (element) name,
such as "a" for <a> or "img" for <img>.  The values for these entries
are associative arrays where the keys are attribute names for those
tags, such as "href" for <a href="...">.  Finally, the values for
those arrays are URLs found in those tags and attributes throughout
the text.

Parameters:
text        the UTF-8 text to scan

Return values:
an associative array where keys are tags and values are an
associative array where keys are attributes and values are
an array of URLs.

See:
http://nadeausoftware.com/articles/2008/01/php_tip_how_extract_urls_web_page
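
The three-level return structure is easiest to see with a small example. The sketch below walks a hand-written array in the documented shape (tag, then attribute, then list of URLs) and flattens it into one de-duplicated list. The sample data and the name sketch_flatten_urls() are hypothetical; no actual call to extract_html_urls() is made.

```php
<?php
// Hypothetical sample data in the documented shape:
// tag name => attribute name => list of URLs.
$found = [
    'a'   => ['href' => ['/index.htm', 'http://example.com/']],
    'img' => ['src'  => ['/images/logo.png', '/index.htm']],
];

// Hypothetical helper: collect every URL once, in order of first appearance.
function sketch_flatten_urls(array $byTag): array
{
    $all = [];
    foreach ($byTag as $tag => $byAttribute) {
        foreach ($byAttribute as $attribute => $urls) {
            $all = array_merge($all, $urls);
        }
    }
    return array_values(array_unique($all));
}

print_r(sketch_flatten_urls($found));
```

For the sample data this yields "/index.htm", "http://example.com/", and "/images/logo.png".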

extract_css_urls( $text )
Extract URLs from UTF-8 CSS text.

URLs within @import statements and url() property functions are extracted
and returned in an associative array of arrays.  Array keys indicate
the use context for the URL, including:

"import"
"property"

Each value in the associative array is an array of URLs.

Parameters:
text        the UTF-8 text to scan

Return values:
an associative array of arrays of URLs.

See:
http://nadeausoftware.com/articles/2008/01/php_tip_how_extract_urls_css_file
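
A much reduced version of this extraction might look like the sketch below. It is an assumption-based illustration, not the code in this file: sketch_extract_css_urls() is a hypothetical name, the regular expressions are simplified, and CSS comments and escaped characters are not handled.

```php
<?php
// Hypothetical helper; NOT the library's extract_css_urls().
// Simplifying assumptions: no CSS comments, no escaped quotes or parentheses.
function sketch_extract_css_urls(string $css): array
{
    $urls = ['import' => [], 'property' => []];

    // @import "file.css";  or  @import url(file.css);
    $importRe = '~@import\s+(?:url\(\s*)?["\']?([^"\')\s;]+)~i';
    preg_match_all($importRe, $css, $m);
    $urls['import'] = $m[1];

    // Drop the @import statements so their url() is not counted twice,
    // then collect url(...) values used in property declarations.
    $rest = preg_replace('~@import[^;]*;~i', '', $css);
    preg_match_all('~url\(\s*["\']?([^"\')\s]+)["\']?\s*\)~i', $rest, $m);
    $urls['property'] = $m[1];

    return $urls;
}

$css = '@import url("base.css");' . "\n"
     . "@import 'print.css';\n"
     . "body { background: url(images/bg.png); }\n"
     . ".logo { background-image: url('img/logo.png'); }\n";
print_r(sketch_extract_css_urls($css));
```

For the sample stylesheet, "import" holds base.css and print.css, and "property" holds images/bg.png and img/logo.png.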