Posted in ColdFusion | Posted on 11-16-2006 | 7,796 views
Earlier today Yahoo and Google announced their collaboration on Sitemaps.org. Sitemaps provide a way to describe to a search engine which pages make up your web site. I've had sitemap support in BlogCFC for a while, but today I wrote a little UDF you can use to generate sitemap XML. It takes either a list of URLs or a query of URLs. Enjoy. I'll post it to CFLib later in the week.
<cffunction name="generateSiteMap" output="false" returnType="xml">
  <cfargument name="data" type="any" required="true">
  <cfargument name="lastmod" type="date" required="false">
  <cfargument name="changefreq" type="string" required="false">
  <cfargument name="priority" type="numeric" required="false">

  <cfset var header = "<?xml version=""1.0"" encoding=""UTF-8""?><urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">">
  <cfset var result = header>
  <cfset var aurl = "">
  <cfset var item = "">
  <cfset var validChangeFreq = "always,hourly,daily,weekly,monthly,yearly,never">
  <cfset var newDate = "">
  <!--- utcHourOffset is positive west of UTC and negative east of it. Only whole-hour offsets are handled. --->
  <cfset var tz = getTimeZoneInfo().utcHourOffset>

  <cfif structKeyExists(arguments, "changefreq") and not listFindNoCase(validChangeFreq, arguments.changefreq)>
    <cfthrow message="Invalid changefreq (#arguments.changefreq#) passed. Valid values are #validChangeFreq#">
  </cfif>

  <cfif structKeyExists(arguments, "priority") and (arguments.priority lt 0 or arguments.priority gt 1)>
    <cfthrow message="Invalid priority (#arguments.priority#) passed. Must be between 0.0 and 1.0">
  </cfif>

  <!--- reformat datetime as w3c datetime / http://www.w3.org/TR/NOTE-datetime --->
  <cfif structKeyExists(arguments, "lastmod")>
    <cfset newDate = dateFormat(arguments.lastmod, "yyyy-mm-dd") & "T" & timeFormat(arguments.lastmod, "HH:mm")>
    <!--- A positive offset means local time is behind UTC, so it is written with a minus sign --->
    <cfif tz gt 0>
      <cfset newDate = newDate & "-" & numberFormat(tz, "00") & ":00">
    <cfelse>
      <!--- abs() avoids a doubled sign such as "+-1:00" east of UTC --->
      <cfset newDate = newDate & "+" & numberFormat(abs(tz), "00") & ":00">
    </cfif>
  </cfif>

  <!--- Support either a query or list of URLs --->
  <cfif isSimpleValue(arguments.data)>
    <cfloop index="aurl" list="#arguments.data#">
      <cfsavecontent variable="item">
        <cfoutput>
        <url>
          <loc>#xmlFormat(aurl)#</loc>
          <cfif structKeyExists(arguments,"lastmod")>
            <lastmod>#newDate#</lastmod>
          </cfif>
          <cfif structKeyExists(arguments,"changefreq")>
            <changefreq>#arguments.changefreq#</changefreq>
          </cfif>
          <cfif structKeyExists(arguments,"priority")>
            <priority>#arguments.priority#</priority>
          </cfif>
        </url>
        </cfoutput>
      </cfsavecontent>
      <cfset item = trim(item)>
      <cfset result = result & item>
    </cfloop>

  <cfelseif isQuery(arguments.data)>
    <!--- Columns are scoped with arguments.data so they cannot be confused with the URL scope or the optional arguments --->
    <cfloop query="arguments.data">
      <cfsavecontent variable="item">
        <cfoutput>
        <url>
          <loc>#xmlFormat(arguments.data.url)#</loc>
          <cfif listFindNoCase(arguments.data.columnlist,"lastmod")>
            <cfset newDate = dateFormat(arguments.data.lastmod, "yyyy-mm-dd") & "T" & timeFormat(arguments.data.lastmod, "HH:mm")>
            <cfif tz gt 0>
              <cfset newDate = newDate & "-" & numberFormat(tz, "00") & ":00">
            <cfelse>
              <cfset newDate = newDate & "+" & numberFormat(abs(tz), "00") & ":00">
            </cfif>
            <lastmod>#newDate#</lastmod>
          </cfif>
          <cfif listFindNoCase(arguments.data.columnlist,"changefreq")>
            <changefreq>#arguments.data.changefreq#</changefreq>
          </cfif>
          <cfif listFindNoCase(arguments.data.columnlist,"priority")>
            <priority>#arguments.data.priority#</priority>
          </cfif>
        </url>
        </cfoutput>
      </cfsavecontent>
      <cfset item = trim(item)>
      <cfset result = result & item>
    </cfloop>
  </cfif>

  <cfset result = result & "</urlset>">

  <cfreturn result>

</cffunction>


http://www.cflib.org/udf.cfm?id=1596
Code changes occur after the comment "reformat datetime as w3c datetime / http://www.w3.org/TR/NOTE-datetime".
1. Change the test of tz to "gt" rather than "gte". To be honest this is really just a personal style thing: +00:00 looks better than -00:00 to me, and it doesn't seem to affect Google.
2. Make the hour number format "00" for the newDate offset, e.g. numberFormat(tz,"00"). So the lines should read newDate = newDate & "-" & numberFormat(tz,"00") & ":00" and newDate = newDate & "+" & numberFormat(tz,"00") & ":00"
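For anyone who wants to see the two changes in context, here is a minimal standalone sketch of the offset logic. It uses now() in place of the lastmod value, and the abs() call is an extra assumption that is not part of the suggestion above: it is there because utcHourOffset is negative east of UTC, which would otherwise produce a doubled sign. Only whole-hour offsets are covered.

```cfml
<!--- Sketch only: builds a W3C datetime with a zero-padded UTC offset --->
<cfset tz = getTimeZoneInfo().utcHourOffset>
<cfset newDate = dateFormat(now(), "yyyy-mm-dd") & "T" & timeFormat(now(), "HH:mm")>

<cfif tz gt 0>
  <!--- utcHourOffset is positive west of UTC, so local time is behind UTC --->
  <cfset newDate = newDate & "-" & numberFormat(tz, "00") & ":00">
<cfelse>
  <!--- abs() is an addition for illustration; it avoids "+-01:00" east of UTC --->
  <cfset newDate = newDate & "+" & numberFormat(abs(tz), "00") & ":00">
</cfif>

<cfoutput>#newDate#</cfoutput>
```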
HTH
I am having issues with cfdirectory w/recursion at webroot. I get the pesky null pointer error, which I am attributing to archived directories, etc. bloating the query. I still need to prove that is the cause.
http://www.adobe.com/go/wish
Is it supposed to generate a sitemap.xml file?
And if so, how? I can't find any option to do this.
Or do I need to update, as I am on BlogCFC 5.5?
<cfset siteMapXML = generateSiteMap(data=urls,changefreq="daily",priority="1.0", lastmod=now())>
<cfdump var="#xmlParse(siteMapXML)#">
<cfset siteMapXML = generateSiteMap(qurls)>
<cfdump var="#xmlParse(siteMapXML)#">
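The two calls above assume that urls and qurls already exist. A minimal sketch of how they might be built is below. The example.com addresses are placeholders, and the query column names are simply the ones the UDF checks for (url, lastmod, changefreq, priority).

```cfml
<!--- Sketch only: placeholder URLs. A URL containing a comma would break the list form. --->
<cfset urls = "http://www.example.com/,http://www.example.com/about.cfm">

<!--- Query form: one row per page, with optional per-row lastmod, changefreq and priority --->
<cfset qurls = queryNew("url,lastmod,changefreq,priority", "varchar,date,varchar,double")>

<cfset queryAddRow(qurls)>
<cfset querySetCell(qurls, "url", "http://www.example.com/")>
<cfset querySetCell(qurls, "lastmod", now())>
<cfset querySetCell(qurls, "changefreq", "daily")>
<cfset querySetCell(qurls, "priority", 1.0)>

<cfset queryAddRow(qurls)>
<cfset querySetCell(qurls, "url", "http://www.example.com/about.cfm")>
<cfset querySetCell(qurls, "lastmod", now())>
<cfset querySetCell(qurls, "changefreq", "monthly")>
<cfset querySetCell(qurls, "priority", 0.5)>
```

Note that with a query the UDF only validates changefreq and priority when they are passed as arguments, so values stored in the query columns are written out as they are.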
I want these combined, as I need to put it all into one XML sitemap. The .cfm sitemap takes too long to load because the sitemap is so big.
thanks
I'm with Adam on this. As indicated at Sitemaps.org, it is clear that the search engine will look for an XML document called sitemap.
That the generateSiteMap function returns this XML code for the URLs provided is also clear. That the xmlParse function turns this into an XML object (and the XML code is visible in code view on the webpage) is clear.
So instead of using cfdump, you pick whichever of the three options you prefer and then just surround the xmlParse with cfoutput tags, and this hands the XML to the search engine?
It's one of those things where you could end up ten years later in a bar discussing XML and girls only to find out that you've been setting up invalid site maps for ten years.
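One possible way to get the XML in front of a search engine is sketched below. It is not something the post itself prescribes. It assumes generateSiteMap() has already been defined or included on the page, and the urls value and the sitemap.xml location are placeholders. Since the UDF returns the XML as a string, xmlParse should not be needed just to output or save it.

```cfml
<!--- Sketch only: assumes generateSiteMap() is available on this page --->
<cfset urls = "http://www.example.com/,http://www.example.com/about.cfm">
<cfset siteMapXML = generateSiteMap(data=urls, changefreq="daily")>

<!--- Option 1: write a static sitemap.xml next to this template --->
<cffile action="write" file="#expandPath('./sitemap.xml')#" output="#siteMapXML#" charset="utf-8">

<!--- Option 2: serve the XML directly when this .cfm page is requested --->
<cfcontent type="text/xml; charset=utf-8" reset="true"><cfoutput>#siteMapXML#</cfoutput>
```

For a large site, writing the static file from a scheduled task is probably the lighter choice, since the XML is then not rebuilt on every request.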