ColdFusion 8 Tip - Reading the top (or another slice) of a file

In the ColdFusion IRC channel today, someone asked about reading just the top portion of a file. While she was looking for a command line solution and not ColdFusion, I thought it would be interesting to share how easy it is in ColdFusion 8 using the new file attribute to CFLOOP. This code will loop over the first ten lines of a file and display them:

<cfset myfile = server.coldfusion.rootdir & "/logs/server.log">

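<!--- c counts the lines output so far --->
<cfset c = 0>
<cfloop file="#myfile#" index="line">
    <cfoutput>#line#<br /></cfoutput>
    <cfset c++>
    <!--- stop after ten lines; the rest of the file is never read --->
    <cfif c gte 10>
        <cfbreak>
    </cfif>
</cfloop>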

I first create a variable pointing to my server.log file, then a counter variable, c. Next I use the file attribute of cfloop to loop over the file one line at a time, and when I hit 10 lines, I break. No matter how big the file is, this code will run quickly, because cfloop reads the file line by line and stops at the break instead of reading the entire file into memory. My server.log file could be 10 GB and this would still run quickly.

But wait, it gets better. TJ Downes pointed out that you can provide FROM and TO attributes, and the tag will loop over just that slice, or portion, of the file. As far as I know, this is not documented. The following code is shorter and equivalent to the earlier listing:

<cfset myfile = server.coldfusion.rootdir & "/logs/server.log">

<cfloop file="#myfile#" index="line" from="1" to="10">
    <cfoutput>#line#<br /></cfoutput>
</cfloop>

One thing to watch out for: if TO points past the last line of the file (say, to="10" on a file with only five lines), you will get an error. In that case, the first listing is safer, as it works with a file of any length.
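If you want a slice other than the top and the file might be shorter than expected, the counter approach from the first listing can be generalized. The sketch below uses a hypothetical UDF named readSlice (not part of ColdFusion) that returns lines fromLine through toLine, 1-based and inclusive, and simply returns fewer lines when the file runs out instead of throwing an error. It still reads sequentially from the top of the file up to fromLine; it just never reads past toLine.

```cfml
<!--- readSlice is a hypothetical helper: returns lines fromLine..toLine (1-based, inclusive).
      If the file is shorter than toLine, it returns whatever lines exist instead of erroring. --->
<cffunction name="readSlice" returntype="array" output="false">
    <cfargument name="path" type="string" required="true">
    <cfargument name="fromLine" type="numeric" required="true">
    <cfargument name="toLine" type="numeric" required="true">
    <cfset var lines = arrayNew(1)>
    <cfset var line = "">
    <cfset var lineNum = 0>
    <!--- an empty range means there is nothing to read --->
    <cfif arguments.toLine lt arguments.fromLine>
        <cfreturn lines>
    </cfif>
    <cfloop file="#arguments.path#" index="line">
        <cfset lineNum = lineNum + 1>
        <!--- keep only lines inside the requested range --->
        <cfif lineNum gte arguments.fromLine>
            <cfset arrayAppend(lines, line)>
        </cfif>
        <!--- stop once the last requested line has been read --->
        <cfif lineNum gte arguments.toLine>
            <cfbreak>
        </cfif>
    </cfloop>
    <cfreturn lines>
</cffunction>
```

For example, readSlice(myfile, 1, 10) behaves like the first listing, and readSlice(myfile, 4, 10) on a five-line file returns lines 4 and 5 rather than an error.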

Archived Comments

Comment 1 by Shane Zehnder posted on 9/7/2007 at 8:09 PM

That would have come in really handy back when I was doing a flat-file db conversion. Much easier to just read it in one line at a time instead of parsing it out of one big chunk of text.

Comment 2 by TJ Downes posted on 9/7/2007 at 8:20 PM

If I recall, the person who told me about this said that, performance-wise, it was far faster than parsing line by line, especially if you do not need to start at the top of the file. I haven't run the tests to be certain.

Given that, I think I would just toss in a cftry to catch the EoF error and handle it elegantly using a break. I think I'll run some tests to see how much of a performance gain you get. Now I just need to find a massive log file...

Comment 3 by Raymond Camden posted on 9/7/2007 at 8:24 PM

TJ, you are indeed right. cfloop/file and the new file functions are all faster than the cffile tag, which reads the entire file into memory.

Comment 4 by TJ Downes posted on 9/7/2007 at 8:37 PM

Unfortunately, you can't break out of the loop on an EoF error. It's the tag itself reaching the EoF, and catching that simply stops processing of the page. So I guess the rule of thumb for using the from and to attributes when reading a file is that you must know the file's length.

Comment 5 by Eric Roberts posted on 9/7/2007 at 10:14 PM

Couldn't you just get the file size and use that as a limiting factor for the "to" attribute? Make it a variable and throw in some logic before the loop: if x gt file length, set x = file length.

Eric

Comment 6 by Raymond Camden posted on 9/7/2007 at 10:19 PM

File size though has nothing to do with the number of lines. You could have one VERY long line and a bunch of small ones.
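Since file size can't tell you the line count, one way to apply Eric's idea is to count the lines first and clamp the to value. This is a sketch with hypothetical helper names (countLines and readFirstLinesSafely). Note that counting reads the whole file once, so on a very large file this gives up the speed advantage of stopping early that the counter loop in the post has.

```cfml
<!--- countLines is a hypothetical helper: counts lines by reading the file once, line by line --->
<cffunction name="countLines" returntype="numeric" output="false">
    <cfargument name="path" type="string" required="true">
    <cfset var line = "">
    <cfset var total = 0>
    <cfloop file="#arguments.path#" index="line">
        <cfset total = total + 1>
    </cfloop>
    <cfreturn total>
</cffunction>

<!--- readFirstLinesSafely is a hypothetical helper: uses from/to, but clamps "to"
      to the real line count so the loop never runs past the end of the file --->
<cffunction name="readFirstLinesSafely" returntype="array" output="false">
    <cfargument name="path" type="string" required="true">
    <cfargument name="n" type="numeric" required="true">
    <cfset var lines = arrayNew(1)>
    <cfset var line = "">
    <cfset var lastLine = min(arguments.n, countLines(arguments.path))>
    <!--- empty file or n of zero: nothing to loop over --->
    <cfif lastLine lt 1>
        <cfreturn lines>
    </cfif>
    <cfloop file="#arguments.path#" index="line" from="1" to="#lastLine#">
        <cfset arrayAppend(lines, line)>
    </cfloop>
    <cfreturn lines>
</cffunction>
```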

Comment 7 by Mike Benner posted on 9/7/2007 at 10:30 PM

It is documented in the error/debug information sent to the browser. Now to find the time to break every tag to find hidden documentation.

Attribute validation error for the CFLOOP tag.
# The tag has an invalid attribute combination: condition,file,index. Possible combinations are:Required attributes: 'file,index'. Optional attributes: 'charset,from,to'.

Comment 8 by Mike Benner posted on 9/7/2007 at 10:46 PM

TJ,

You can do the following:

<code>
<cfset myfile = server.coldfusion.rootdir & "/logs/server.log">
<cftry>
--------BoF----------<br />
<cfloop file="#myfile#" index="line" from="1" to="10">
<cfoutput>#line#<br /></cfoutput>
</cfloop>
<cfcatch>
--------EOF-------<br />
</cfcatch>
</cftry>
....More Processing....
</code>

All lines are output, no error is displayed to the user, and it allows you to process the rest of the page.

Comment 9 by Raymond Camden posted on 9/7/2007 at 10:48 PM

Mike, to make your code safer, you should check the exception type in cfcatch. For example, if I provide a file that doesn't exist, an error will be thrown, but you don't want to ignore that error.
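Here is a sketch of Mike's listing with the catch narrowed to the end-of-file case, using the exception class name Rupesh kumar gives further down in these comments. The function name readHeadCatchEOF is hypothetical, and the class name is internal to ColdFusion 8, so it may differ in other versions or engines. A missing file still raises its own error, because that exception does not match this type.

```cfml
<!--- readHeadCatchEOF is a hypothetical helper: returns up to n lines using from/to,
      swallowing only the end-of-file exception (n is assumed to be at least 1) --->
<cffunction name="readHeadCatchEOF" returntype="array" output="false">
    <cfargument name="path" type="string" required="true">
    <cfargument name="n" type="numeric" required="true">
    <cfset var lines = arrayNew(1)>
    <cfset var line = "">
    <cftry>
        <cfloop file="#arguments.path#" index="line" from="1" to="#arguments.n#">
            <cfset arrayAppend(lines, line)>
        </cfloop>
        <!--- the file had fewer than n lines; keep what was read before the EOF --->
        <cfcatch type="coldfusion.tagext.io.FileUtils$EndOfFileException">
        </cfcatch>
    </cftry>
    <cfreturn lines>
</cffunction>
```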

Comment 10 by TJ Downes posted on 9/8/2007 at 2:27 AM

Thanks Mike, I thought I tried that method and got the EoF error... I'll have to try it again.

Comment 11 by Mike Benner posted on 9/8/2007 at 2:45 AM

TJ,

I tried it on my FusionReactor logs and it worked great. The timing couldn't be better, as I was working with files.

Ray,

In this instance, couldn't I just call fileExists() before the loop? Otherwise, how would you recommend doing the try/catch in this instance?

Comment 12 by Raymond Camden posted on 9/8/2007 at 6:54 AM

Mike - sure, fileExists would catch that, but it wouldn't catch CF being unable to read the file; getFileInfo would check that. My point, though, is that you can have cfcatch look for a specific type of exception. You would want to do that so you ONLY catch that error, and not others.
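The pre-check Ray describes can be written as a small guard. isReadableFile is a hypothetical name; it relies on the ColdFusion 8 functions fileExists and getFileInfo, whose result struct includes a canRead key. A check like this complements, rather than replaces, a narrowly typed cfcatch, since the file could change between the check and the read.

```cfml
<!--- isReadableFile is a hypothetical helper: true only if the file exists and CF can read it --->
<cffunction name="isReadableFile" returntype="boolean" output="false">
    <cfargument name="path" type="string" required="true">
    <cfset var info = "">
    <!--- fileExists alone cannot tell us about read permissions --->
    <cfif NOT fileExists(arguments.path)>
        <cfreturn false>
    </cfif>
    <!--- getFileInfo returns a struct; canRead reports whether the file is readable --->
    <cfset info = getFileInfo(arguments.path)>
    <cfreturn info.canRead>
</cffunction>
```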

Comment 13 by Rupesh kumar posted on 9/8/2007 at 9:02 PM

One way to break out of the loop at end of file is to catch the actual exception thrown. Here is the catch statement you can use:
<cfcatch type="coldfusion.tagext.io.FileUtils$EndOfFileException">

On a related note, I blogged about it a while back at
http://coldfused.blogspot.c...
http://coldfused.blogspot.c...

A better, more elegant way to do what you want is to use the new file I/O functions with a file handle, where you can actually check whether you have reached the end of the file.
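The file-handle approach Rupesh describes might look like the sketch below, built on the ColdFusion 8 functions fileOpen, fileIsEOF, fileReadLine, and fileClose. The UDF name readHeadWithHandle is hypothetical. Because cffinally is not available in ColdFusion 8, the handle is closed both in a catch-all block (before rethrowing) and after the loop.

```cfml
<!--- readHeadWithHandle is a hypothetical helper: returns up to n lines via a file handle --->
<cffunction name="readHeadWithHandle" returntype="array" output="false">
    <cfargument name="path" type="string" required="true">
    <cfargument name="n" type="numeric" required="true">
    <cfset var lines = arrayNew(1)>
    <cfset var fh = fileOpen(arguments.path, "read")>
    <cftry>
        <!--- fileIsEOF lets the loop stop cleanly at the end of a short file, no exception needed --->
        <cfloop condition="arrayLen(lines) lt arguments.n and not fileIsEOF(fh)">
            <cfset arrayAppend(lines, fileReadLine(fh))>
        </cfloop>
        <!--- on any error, release the handle before passing the error on --->
        <cfcatch type="any">
            <cfset fileClose(fh)>
            <cfrethrow>
        </cfcatch>
    </cftry>
    <cfset fileClose(fh)>
    <cfreturn lines>
</cffunction>
```

Unlike from/to, asking for more lines than the file contains simply returns the lines that exist.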

Comment 14 by Raymond Camden posted on 9/8/2007 at 9:09 PM

Thanks for adding those links Rupesh.

The _one_ function I wish existed was a FileSeek. That would be useful for jumping to a position in a file (for example, to examine MP3 files).

Comment 15 by TT posted on 9/10/2007 at 8:46 PM

Just curious, does anyone know how cf8 knows the end of a line? chr(10) or chr(13) or both?

Comment 16 by Raymond Camden posted on 9/10/2007 at 10:07 PM

There is a Java property (or method) for it, so I assume they use that. Or they sniff the first part of the file.