Sitemap generator class

This is the Curl multi sitemap class, which can be used to generate sitemaps by crawling sites.

It can crawl one or more sites, retrieving their pages and following links recursively to determine the addresses of all pages to include in an XML sitemap.

It can ignore given URLs, so that they are neither crawled nor included in the sitemap.

The class uses the multi-request support of the Curl extension to retrieve multiple site pages at the same time. For more about its efficiency, see the comparison of file_get_contents, curl and curl multi-request.
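To illustrate what multi-request retrieval means in practice, here is a minimal sketch of fetching several pages in parallel with PHP's curl_multi_* functions. It is not taken from the class itself: the function name fetch_all and its $timeout parameter are assumptions made for this example, and the class may organise its requests differently.

```php
<?php
// Hypothetical helper, not part of the sitemap class:
// fetch several URLs in parallel and return their bodies keyed by URL.
function fetch_all(array $urls, int $timeout = 10): array
{
    if (count($urls) === 0) {
        return array();
    }

    $multi = curl_multi_init();
    $handles = array();

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // give up on slow pages
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none of them is still running
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the results and release the handles
    $pages = array();
    foreach ($handles as $url => $ch) {
        $pages[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $pages;
}
```

All requests are started before any response is read, so the total time is close to that of the slowest page rather than the sum of all pages.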

The class may also notify Google, Bing, Yahoo, Ask and Weblogs when the sitemap is updated.

Here is an example of how to use this class:



Example code

//removing the script execution time limit
set_time_limit(0);

//declaring class instance
$sitemap = new sitemap();

//optionally set proxy server name and port, or IP address and port
//comment out or set to an empty string to disable proxy use
$sitemap->set_proxy("");

//setting rules to ignore URLs which contain these substrings
$sitemap->set_ignore(
        array("javascript:", ".css", ".js", 
              ".ico", ".jpg", ".png", ".jpeg", 
              ".swf", ".gif"));

//parsing one page and gathering links

//parsing other page and gathering links

//return URL list as array
//$arr = $sitemap->get_array();

//echo "<pre>";
//echo "</pre>";

header ("content-type: text/xml");
//generating sitemap
$map = $sitemap->generate_sitemap();

//submitting site map to Google, Yahoo, Bing, Ask and Moreover services

echo $map;

Examples in action

Example scripts provided with package in action:

Method list

Ignore URL substrings

Method name: set_ignore($ignore_list)
Description: Sets rules to exclude from the sitemap any URL that contains one of these substrings
Input parameters: array $ignore_list - array of substrings to ignore
Example input: set_ignore(array("javascript:", ".css", ".js", ".ico", ".jpg", ".png", ".jpeg", ".swf", ".gif"))
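The effect of such an ignore list can be sketched as follows. This assumes plain, case-sensitive substring matching, which is what the description suggests; the helper name is_ignored and the example.com URLs are made up for this example and are not part of the class.

```php
<?php
// Hypothetical helper: true when $url contains any of the ignore substrings.
function is_ignored(string $url, array $ignore_list): bool
{
    foreach ($ignore_list as $needle) {
        // Skip empty rules, which would otherwise match every URL
        if ($needle !== "" && strpos($url, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$ignore = array("javascript:", ".css", ".js", ".ico", ".jpg", ".png", ".jpeg", ".swf", ".gif");

// Placeholder URLs for illustration only
$urls = array(
    "http://example.com/about.html",
    "http://example.com/style.css",
    "javascript:void(0)",
);

// Keep only the URLs that match none of the rules
$kept = array_values(array_filter($urls, function ($url) use ($ignore) {
    return !is_ignored($url, $ignore);
}));
```

Note that substring rules are broad: under this assumption ".js" would also exclude a URL ending in ".json", so rules should be chosen with that in mind.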

Set proxy

Method name: set_proxy($host_port)
Description: Sets the proxy host and port
Input parameters: string $host_port - proxy host and port, for example someproxy:8080, or an IP address and port
Example input: set_proxy("")

Get links

Method name: get_links($domain)
Description: Parses the provided URL to collect all URLs from the same domain
Input parameters: string $domain - URL of the website
Example input: get_links("")
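Collecting same-domain links from a single page could look roughly like the sketch below. It is an illustration only, not the class's actual parser: the function name same_host_links is hypothetical, it relies on PHP's DOM extension, and it handles only absolute and root-relative links (path-relative links such as "page.html" are skipped for brevity).

```php
<?php
// Hypothetical helper: collect links in $html that share the host of $base_url.
function same_host_links(string $html, string $base_url): array
{
    $base = parse_url($base_url);
    if ($html === "" || !is_array($base) || !isset($base["scheme"], $base["host"])) {
        return array();
    }
    $root = $base["scheme"] . "://" . $base["host"];
    if (isset($base["port"])) {
        $root .= ":" . $base["port"];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed, so collect parser warnings silently
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($doc->getElementsByTagName("a") as $anchor) {
        // Drop the fragment part so "/about" and "/about#team" count as one page
        $href = explode("#", trim($anchor->getAttribute("href")), 2)[0];
        if ($href === "") {
            continue;
        }
        // Turn root-relative links ("/about") into absolute ones
        if ($href[0] === "/" && substr($href, 0, 2) !== "//") {
            $href = $root . $href;
        }
        $parts = parse_url($href);
        if (is_array($parts) && isset($parts["host"])
            && strcasecmp($parts["host"], $base["host"]) === 0) {
            $links[$href] = true; // array keys remove duplicates
        }
    }

    return array_keys($links);
}
```

A recursive crawl would call such a function on each newly discovered URL until no unseen links remain.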

Get array of URLs

Method name: get_array()
Description: Returns an array of all URLs collected by previous calls to get_links
Input parameters: none
Example input: get_array()

Notify services about your sitemap update

Method name: ping($sitemap_url, $title = "", $siteurl = "")
Description: Notifies services such as Google, Yahoo, Ask, Bing and Moreover about your sitemap and its updates
Input parameters:

string $sitemap_url - URL of the generated sitemap

string $title - your website title (optional)

string $siteurl - URL of your site (optional)

Example input: ping("", "Code snippets", "")

Generate XML sitemap

Method name: generate_sitemap()
Description: Generates an XML sitemap from the URLs collected by get_links and returns it as a string
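For reference, turning a list of URLs into a sitemap string can be sketched as below. This follows the sitemaps.org 0.9 format with only the required loc element; the function name build_sitemap is hypothetical, and the output of the class itself may contain additional elements.

```php
<?php
// Hypothetical helper: build a sitemaps.org 0.9 document from a list of URLs.
function build_sitemap(array $urls): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

    foreach (array_unique($urls) as $url) {
        // Escape characters such as & so the result stays valid XML
        $loc = htmlspecialchars($url, ENT_QUOTES | ENT_XML1, "UTF-8");
        $xml .= "  <url><loc>" . $loc . "</loc></url>\n";
    }

    $xml .= "</urlset>\n";
    return $xml;
}
```

The returned string can be sent to the browser after a text/xml content-type header, as in the example code above, or written to a file.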

Possible error messages

List of all errors and meanings

Error text: Provided file IMAGE_PATH isn't correct image format
Meaning: The file you are trying to add isn't an image, or has an unsupported format.
Solution: Convert it to a supported format, such as png, jpg or gif.

Error text: Provided file IMAGE_PATH doesn't exist
Meaning: The file you are trying to add doesn't exist.
Solution: Check the specified path and the file/directory permissions.

Latest changes

15.09.2010 - added proxy support

