Today I wanted to take a look at the logs for CFBloggers. I normally just use Google Analytics, but with CFBloggers being so Ajax-heavy, I was curious what the IIS logs would be like compared to Google. I downloaded the logs from the last two months and then realized I had no idea how to parse them. I used to have some decent tools for Windows, but didn't know of anything for OSX.
So I thought to myself, "Self - ColdFusion 8 reads files super quick, why not do it in CF?" I agreed with myself and decided to give it a go. It reminded me that one of the very first apps I wrote, back in 92ish, was a Perl app to parse web site log files.
To start off, I decided that I'd just parse one log file. I was going to make the code parse a directory of log files and use CFTHREAD, but shoot, I really just wanted to look at a day and see what it was like compared to Google. So I began with:
<cfset fileToRead = "/Users/ray/Desktop/cfbloggers logs/ex071219.log">
And then I had some coffee and came back two hours later. Next I wrote the meat of the code. First off - ColdFusion 8 makes it super easy to read in a file line by line:
<cfloop index="line" file="#fileToRead#">
Now how do you parse IIS? I looked at the file and noticed it began with a few lines that started with #. One of these lines was a 'header' line that defined the columns. So I began with logic that said: If start with #, ignore unless header line, and create a query otherwise.
<cfif left(line, 1) is "##">
<cfif findNoCase("##Fields:", line) is 1 and not isDefined("data")>
Data is the name of the query I'm going to create. Why bother checking it? Turns out IIS will rewrite the header info multiple times in the log file. I couldn't understand why my 5k line log file kept saying I had 200 requests. It was the multiple headers causing my query to recreate itself. In theory I'm thinking it's possible for IIS to change format mid-file, but I pretended that I didn't know that.
So once I find the line that begins with #Fields, I had to do some parsing:
<cfset cols = replace(line, "##Fields: ","")>
<!--- iis has cols with - in it, change to _ --->
<cfset cols = replace(cols, "-", "_", "all")>
<!--- also may have ...() --->
<cfset cols = replace(cols, "(","_","all")>
<cfset cols = replace(cols, ")","_","all")>
<cfset colArray = listToArray(cols," ")>
First off I remove the pretext. Next I change any - to _ since - isn't valid in a query column. Next I replace ( and ) with underscores. I create an array of my column names (even though I'll be using a list again in a bit) to make the code run a bit faster.
<!--- search for a col with trailing _, not critical, but nice --->
<cfloop index="x" from="1" to="#arrayLen(colArray)#">
<cfif right(colArray[x],1) is "_">
<cfset colArray[x] = left(colArray[x], len(colArray[x])-1)>
</cfif>
</cfloop>
This last bit of code simply looks for columns with an _ at the end. This represents an IIS header column named something(foo). I didn't like the dangling underscore so I cleaned it up. If I wanted to, I could also give nicer names to the IIS headers.
Lastly I create the query:
<cfset data = queryNew(arrayToList(colArray))>
Ok, so thats the first branch of the CFIF inside my file loop. The second branch simply parses the line based on the columns:
<cfset queryAddRow(data)>
<!--- begin parsing --->
<cfloop index="x" from="1" to="#arrayLen(colArray)#">
<cfset value = listGetAt(line, x," ")>
<cfset col = colArray[x]>
<cfset querySetCell(data, col, value)>
</cfloop>
And that's it. I wrote a quick query of query to give me a report of my top files:
<cfoutput>Total number of requests: #data.recordCount#</cfoutput>
<cfquery name="test" dbtype="query">
select count(cs_uri_stem) as total, cs_uri_stem as page
from data
group by cs_uri_stem
order by total desc
</cfquery>
<cfdump var="#test#" top="100">
It was here where I discovered that the silly little "stats" pod on CFBloggers got 4k hits yesterday (it auto-reloads). After finding that, I slowed down the reload to 240 seconds. Anyway, enjoy. If folks want to see an example using CFTHREAD, let me know.
Complete Code:
<cfset fileToRead = "/Users/ray/Desktop/cfbloggers logs/ex071219.log">
<cfloop index="line" file="#fileToRead#"> <!--- ignore lines with # in front ---> <!--- except for fields, which will help us define the query, and SHOULD come first ---> <!--- also, IIS tends to repeat, so if we have data already, ignore ---> <cfif left(line, 1) is "##"> <cfif findNoCase("##Fields:", line) is 1 and not isDefined("data")> <cfset cols = replace(line, "##Fields: ","")> <!--- iis has cols with - in it, change to _ ---> <cfset cols = replace(cols, "-", "", "all")> <!--- also may have ...() ---> <cfset cols = replace(cols, "(","","all")> <cfset cols = replace(cols, ")","_","all")> <cfset colArray = listToArray(cols," ")> <!--- search for a col with trailing , not critical, but nice ---> <cfloop index="x" from="1" to="#arrayLen(colArray)#"> <cfif right(colArray[x],1) is ""> <cfset colArray[x] = left(colArray[x], len(colArray[x])-1)> </cfif> </cfloop> <cfset data = queryNew(arrayToList(colArray))> </cfif> <cfelse> <cfif not isDefined("data")> <cfthrow message="#fileToRead# seems to be invalid. No Fields line found."> </cfif>
<cfset queryAddRow(data)>
<!--- begin parsing ---> <cfloop index="x" from="1" to="#arrayLen(colArray)#"> <cfset value = listGetAt(line, x," ")> <cfset col = colArray[x]> <cfset querySetCell(data, col, value)> </cfloop>
</cfif> </cfloop>
<cfoutput>Total number of requests: #data.recordCount#</cfoutput>
<cfquery name="test" dbtype="query"> select count(cs_uri_stem) as total, cs_uri_stem as page from data group by cs_uri_stem order by total desc </cfquery> <cfdump var="#test#" top="100">
p.s. It just occurred to me... I should consider writing an AIR app. :)
Archived Comments
Hi Ray,
There is a skunkworks project out of MS that is incredibly useful when dealing with IIS, and all kinds of other log files.
I highly recomend checking out LogParser ( now on version 2.2). It is incredibly fast and useful.
http://www.microsoft.com/do...
Oops... meant to include this from description:
Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart.
Not to distract from the nifty CF you wrote, but I've found AWStats to be VERY good at parsing IIS logfiles and showing a readable result. It's a web based server side app that runs on Perl. I believe you'll need administrative access to the server to set it up, but otherwise this program has been working very well for me for 3 years.
http://awstats.sourceforge....
I think the first 2 posters have missed the point! of course there are tools to read IIS log files. the point is you can now do it in CF!
Why is this cool ? cos once you have parsed the logs into a databse (why they dont just write them directly to a SQL datasource ill never know...) you can do some funky things with CFChart and build a mini web stats engine within your own website.
we are thinking of doing this ourselves to complement our existing web stats package with some extra charting features.
@Pat: I was responding directly to Ray's comment, "...realized I had no idea how to parse them. I used to have some decent tools for Windows, but didn't know of anything for OSX".
I think what Ray did with the log file parsing is pretty cool and you have a good point about extending it to your own needs. For me, AWStats is a well maintained, cross platform solution with more functionality that I'd like to replicate.
@Pat: Actually, with IIS, you can set the logs to write directly to a database instead of the text files. I tried it at work once, but with our level of traffic, the database was too large.
@Pat: I didn't miss the point at all! I was simply letting Ray and his readers about another tool.
If you want to keep CF in the mix, you call logparser using CFEXECUTE. This will be considerable faster than reading in the file.
For others who may be interested to learn more about LogParser, I'll share that I'd done an article on it (and some features that may interest CF folks) in the now defunct CFDJ. It's still available online:
http://coldfusion.sys-con.c...
Even so, yes, thanks Ray for showing how folks could do it in CFML if they wanted. As Gus notes (and I state in the article), the LogParser tool really does so much more than most realize.
If one wanted to read other than IIS logs, then rather than tweak Ray's code for each type of log, it instead knows how to read various kinds of logs and could itself output a uniform stream (or simply a DB) that could then be read in by a CF tool. Just one of many possibilities to think about.
FWIW, when I tried this on one of my IIS logs that was still being written to, the code got an error when it was processing the last line because it was empty, as far as CF was concerned. The error was on the ListGetAt. I solved it by wrapping a test for <cfif trim(line) is not ""> around the loop containing it, which solved the problem.
Good catch. It may be even better to also check the listLen. If not == the number of columns, ignore the line.
Spectacular. This probably just saved me a good hour. Props.
"why they dont just write them directly to a SQL datasource ill never know..."
1 - You can: http://support.microsoft.co...
2 - You must not have very high traffic sites. I have a couple of locations that generate half a gig of logfiles a day, each. That might get a little expensive in SQL. As files, I just have a CF script running every night:
zip = CreateObject("component", "Zip");
status = zip.gzipAddFile("#workingdir#", "#workingdir##yesterdayfile#");
looping through the .log files, gzipping them, then deleting them. A couple hours later a second scheduled task uses robocopy to move all the gzipped files off to the box I use to run web reports with FunnelWeb.
BTW -- FunnelWeb Enterprise is free now (Quest stopped developing and supporting it), and has versions for Windows, Linux, Solaris, and OSX.
http://www.quest.com/funnel...
Thanks for the info JC. I had used Funnel Web in the past, but the last time I tried to use it I could never find the free version. I didn't know they had an OSX version. That's worth taking a look at for sure.
Hello Ray,
Is there a way to use CF to create an app that removes unwanted entries from the IIS log files? Something that is similar to the Log Scrubber software at this page: http://www.knechtology.com/... . Any suggestions you can provide is appreciated. Thanks.
--Chung Lee
You could modify my code above to simply exclude things.