ColdFusion Sample - Upload and Parse an Excel File

This post is more than 2 years old.

In this ColdFusion sample I'm going to demonstrate how to allow users to upload Excel files and use ColdFusion to both validate and read the content within. Let's begin by designing a simple upload form.


<cfif structKeyExists(variables, "errors")>
	<cfoutput>
	<p>
	<b>Error: #variables.errors#</b>
	</p>
	</cfoutput>
</cfif>
	
<form action="test.cfm" enctype="multipart/form-data" method="post">
		  
	  <input type="file" name="xlsfile" required>
	  <input type="submit" value="Upload XLS File">
		  
</form>

Nothing too complex here. The form has a grand total of one field - the file field named xlsfile. Note above the form is a simple set of ColdFusion logic to notice an errors variable and output it. In case you're curious, this value will be created a bit later in our example. So - let's process the upload. Here's the code that handles that.


<cfif structKeyExists(form, "xlsfile") and len(form.xlsfile)>

	<!--- Destination outside of web root --->
	<cfset dest = getTempDirectory()>

	<cffile action="upload" destination="#dest#" filefield="xlsfile" result="upload" nameconflict="makeunique">

	<cfif upload.fileWasSaved>
		<cfset theFile = upload.serverDirectory & "/" & upload.serverFile>
		<cfif isSpreadsheetFile(theFile)>
			<cfspreadsheet action="read" src="#theFile#" query="data" headerrow="1">
			<cffile action="delete" file="#theFile#">
			<cfset showForm = false>
		<cfelse>
			<cfset errors = "The file was not an Excel file.">
			<cffile action="delete" file="#theFile#">
		</cfif>
	<cfelse>
		<cfset errors = "The file was not properly uploaded.">	
	</cfif>
		
</cfif>

This code block begins with the field check used for our upload. If it exists, and has a value, we have to do some processing. We need a place to store the upload, and as we all know, you never upload files to a directory under web root. Therefore I used the temp directory as a quick storage place. I upload the file using cffile/action=upload. If the file was successfully uploaded, I use isSpreadsheetFile() to determine if the file was a valid spreadsheet. This covers XLS, XLSX, and even OpenOffice documents. If it is a valid spreadsheet, I read it in using the cfspreadsheet tag. Notice the last two arguments.

The query argument tells ColdFusion to parse the spreadsheet data into a query. This assumes we only want the first sheet. If you want to work with other sheets, that's definitely possible.

The last argument, headerrow, tells ColdFusion to consider the first row to be column headers. It may not always be advisable to assume this. But for now, we will.

The rest of that block simply handles errors and specifying if we should show the form again. If the user uploaded a valid spreadsheet we don't want to show the form. Instead, we want to display the contents. Let's look at how I did this.


<style>
.ssTable { width: 100%; 
		   border-style:solid;
		   border-width:thin;
}
.ssHeader { background-color: #ffff00; }
.ssTable td, .ssTable th { 
	padding: 10px; 
	border-style:solid;
	border-width:thin;
}
</style>

<p>
Here is the data in your Excel sheet (assuming first row as headers):
</p>

<cfset metadata = getMetadata(data)>
<cfset colList = "">
<cfloop index="col" array="#metadata#">
	<cfset colList = listAppend(colList, col.name)>
</cfloop>

<cfif data.recordCount is 1>
	<p>
	This spreadsheet appeared to have no data.
	</p>
<cfelse>
	<table class="ssTable">
		<tr class="ssHeader">
			<cfloop index="c" list="#colList#">
				<cfoutput><th>#c#</th></cfoutput>
			</cfloop>
		</tr>
		<cfoutput query="data" startRow="2">
			<tr>
			<cfloop index="c" list="#colList#">
				<td>#data[c][currentRow]#</td>
			</cfloop>
			</tr>					
		</cfoutput>
	</table>
</cfif>

So skipping over the CSS, the real meat of the work begins when we get the metadata. Why do we do this? ColdFusion's query object does not maintain the same order of columns that our spreadsheet had. I can use the getMetadata function on the query to get the proper column order. That's the array list you see there.

Next - we do a quick check of the size of the query. We are assuming our spreadsheet has a first row being used as headers. So if we assume that, and there is only one row, then we really don't have any data. Notice then in the next block of the conditional, we use startRow=2 to begin with where we figure the real data starts. After that it's a simple matter of outputting the query dynamically. (For an example of working with dynamic ColdFusion queries, see this blog entry.)

How does it look? Here's the result of uploading a sample XLS sheet.

And below is the complete template. Read on though for more...


<cfset showForm = true>
<cfif structKeyExists(form, "xlsfile") and len(form.xlsfile)>

	<!--- Destination outside of web root --->
	<cfset dest = getTempDirectory()>

	<cffile action="upload" destination="#dest#" filefield="xlsfile" result="upload" nameconflict="makeunique">

	<cfif upload.fileWasSaved>
		<cfset theFile = upload.serverDirectory & "/" & upload.serverFile>
		<cfif isSpreadsheetFile(theFile)>
			<cfspreadsheet action="read" src="#theFile#" query="data" headerrow="1">
			<cffile action="delete" file="#theFile#">
			<cfset showForm = false>
		<cfelse>
			<cfset errors = "The file was not an Excel file.">
			<cffile action="delete" file="#theFile#">
		</cfif>
	<cfelse>
		<cfset errors = "The file was not properly uploaded.">	
	</cfif>
		
</cfif>

<cfif showForm>
	<cfif structKeyExists(variables, "errors")>
		<cfoutput>
		<p>
		<b>Error: #variables.errors#</b>
		</p>
		</cfoutput>
	</cfif>
	
	<form action="test.cfm" enctype="multipart/form-data" method="post">
		  
		  <input type="file" name="xlsfile" required>
		  <input type="submit" value="Upload XLS File">
		  
	</form>
<cfelse>

	<style>
	.ssTable { width: 100%; 
			   border-style:solid;
			   border-width:thin;
	}
	.ssHeader { background-color: #ffff00; }
	.ssTable td, .ssTable th { 
		padding: 10px; 
		border-style:solid;
		border-width:thin;
	}
	</style>
	
	<p>
	Here is the data in your Excel sheet (assuming first row as headers):
	</p>
	
	<cfset metadata = getMetadata(data)>
	<cfset colList = "">
	<cfloop index="col" array="#metadata#">
		<cfset colList = listAppend(colList, col.name)>
	</cfloop>
	
	<cfif data.recordCount is 1>
		<p>
		This spreadsheet appeared to have no data.
		</p>
	<cfelse>
		<table class="ssTable">
			<tr class="ssHeader">
				<cfloop index="c" list="#colList#">
					<cfoutput><th>#c#</th></cfoutput>
				</cfloop>
			</tr>
			<cfoutput query="data" startRow="2">
				<tr>
				<cfloop index="c" list="#colList#">
					<td>#data[c][currentRow]#</td>
				</cfloop>
				</tr>					
			</cfoutput>
		</table>
	</cfif>
	
</cfif>	

Your Homework!

Your homework, if you chose to accept it, is to simply take the template and add a checkbox to toggle if the code should assume the first row is the header. It's not as simple as you think. Sure you can just get rid of that attribute, but you also have to update the display as well. Post your code to Pastebin and then share the url.

Notes

Why didn't I use the VFS to store the file? I did - but isSpreadsheetFile() always returns false on an XLS file in the VFS. Boo!

Like the style of this blog entry? (Simple example with a homework assignment.) If so - I'm thinking of doing more like it.

Raymond Camden's Picture

About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos. If you like this article, please consider visiting my Amazon Wishlist or donating via PayPal to show your support. You can even buy me a coffee!

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by Dinesh Kanwar posted on 6/1/2011 at 1:24 PM

Few months back, I was trying to figure out to read an excel file. Ben Nadal's POI utility helped me. But this seems easiest way to do that.
Thankyou very much Ray. If possible, can you share something on writing to an excel file? I know its pretty simple, but faced problems in writing data to excel when you have single/double qoutes, space or some special characters in data.

Comment 2 by Joe Brislin posted on 6/1/2011 at 5:00 PM

@Ray Is there any way to remove empty rows from the Excel import? I can't seem to find a way with cfspreadsheet and also haven't been able to figure out a good way with Query of Queries. Any thoughts?

Comment 3 by Raymond Camden posted on 6/1/2011 at 5:09 PM

@Dinesh - yes, I can do a simple write example.
@Joe - let me dig into that.

Comment 4 by Sam Farmer posted on 6/1/2011 at 5:31 PM

If you are happy to give over control of the html the cfspreadsheet tag will return html:

<cfspreadsheet action="read" src="[FILE]" format="html" name="getHTML">
<cfoutput><table>#getHTML#</table></cfoutput>

Its crude output but very quick.

Comment 5 by Jaana Gilbert posted on 6/1/2011 at 5:33 PM

@Ray, also while you're writing your extra sample for Dinesh, could you add to it on how to hide a column please (pretty please)?

One of our local user group member is having a problem with the hidden="true/false" attribute in the CF9 built-in spreadsheet function SpreadsheetFormatCell. The hidden attribute is not doing it's job. Based on the Cf9 function reference at http://help.adobe.com/en_US... it should work.

Would this be possible to be added please? :)

TIA
Jaana

Comment 6 by Raymond Camden posted on 6/1/2011 at 5:47 PM

@Sam: I had forgotten about that option. It's damn slick. Is the HTML nicely formatted? I'll have to try and view source. Does it respect column order? If so - a lot of my blog entry gets simplified. ;) But I could see folks wanting greater control over the HTML.

@Jaana: To be honest, I'm surprised by all the questions. This gives me an opportunity to finally play more with the functions so sure - yes - I'll happily look at your question as well.

Comment 7 by Jeff Price posted on 6/1/2011 at 5:48 PM

That user with the hidden attribute would be me on the SpreadSheetFormatRow() function, I just gave the TACFUG peeps that FormatCell() docs because that's the one that talks all about the format structure.

Comment 8 by Jaana Gilbert posted on 6/1/2011 at 5:52 PM

:) there's Jeff :)

Thanks Ray, your samples are by far the best and always easy to understand :)

Comment 9 by Sam Farmer posted on 6/1/2011 at 6:37 PM

@Ray: the html is basic tr and td's and does seem to respect order. With some css you could make it look pretty good.

Comment 10 by John Sieber posted on 6/1/2011 at 10:53 PM

@Ray - I have nothad a chance to work on the homework, but I like the concept and would enjoy seeing more post in that fashion.

Comment 11 by Raymond Camden posted on 6/2/2011 at 5:42 AM

@Joe: When I used my code above, and uploaded my Excel sheet (after adding a blank and then a row with data), CF removed the empty row automatically.

Comment 12 by Raymond Camden posted on 6/2/2011 at 6:29 AM
Comment 13 by Raymond Camden posted on 6/2/2011 at 6:24 PM

@Jaana: I'm not quite sure I get what you mean. In my code I loop over all the columns. If you knew a column that you always wanted to hide, like Beers, you would just check for that in your loop. That would only be helpful in cases where folks are uploaded a standard XLS file (ie one you expect in a certain format). It should be as simple as that, right?

Comment 14 by Joe Brislin posted on 6/3/2011 at 12:00 AM

@Ray - There is a spreadsheet that I have when uploaded using your code leaves the emtpy rows. I think it has to do with how the data is deleted from the row, i.e. Clear Contents vs. Delete Row. Would you mind if I email you the file so that you can try it?

Comment 15 by Raymond Camden posted on 6/3/2011 at 12:03 AM

Sure, email it over.

Comment 16 by Raymond Camden posted on 6/3/2011 at 1:52 AM

I can confirm what Joe found. His XLS file ended up with empty rows. So I think an obvious fix would be to simply update the code to look for no data. One sec.

Comment 17 by Raymond Camden posted on 6/3/2011 at 1:55 AM

Joe, check out the mod in this pastebin. Sure it's a bit hacky, but it seems to work:

http://pastebin.com/ECtjEVxY

Comment 18 by Les Mizzell posted on 1/5/2012 at 7:21 PM

Does this work with .xls and .xlsx files?

Or is there a limitation as to which Excel file version(s) work properly?

Comment 19 by Raymond Camden posted on 1/5/2012 at 7:22 PM

It should work in both.

Comment 20 by Ed Salsberg posted on 4/17/2012 at 6:01 PM

I so needed this!!!

My only question is what is the character between rows? I'm trying to import this into a db and don't know how to distinguish between rows.

#data[c][currentRow]#<cfif data.currentrow neq data.rowcount>, </cfif> No worky :(

Comment 21 by Raymond Camden posted on 4/17/2012 at 6:09 PM

I'm sorry - what? I can't understand your question.

Comment 22 by Ed Salsberg posted on 4/17/2012 at 6:21 PM

When I output #data[c][currentRow]# there is a " " in between records/rows. Is that a space or tab or return? My SQL Insert fails as there is no way for me to distinguish between field content.

Comment 23 by Raymond Camden posted on 4/17/2012 at 6:28 PM

I'm still not getting you. When you output one value, that isn't a row. It's one value. If you are saying that one of the cells has additional whitespace, why not use trim() on it?

Comment 24 by David Jaocbson posted on 5/17/2012 at 9:25 PM

Hey Ray, It seems im late to the party on this issue, however, I now have a task to read one spreedsheet with multiple tabs within it. How would your example be modified to do that?

Comment 25 by Raymond Camden posted on 5/18/2012 at 4:59 PM

Those tabs are sheets, and CF lets you read a specific sheet. (See the CFML Ref for details.)

Comment 26 by Dan posted on 9/5/2012 at 7:24 PM

BTW: cfspreadsheet is available in cf 9 and greater.

Comment 27 by udaya posted on 1/10/2013 at 4:09 PM

Please help me how to get the row id of spread sheet to print

Comment 28 by Raymond Camden posted on 1/11/2013 at 12:19 AM

I don't understand - you want to print one row?

Comment 29 by udaya posted on 1/11/2013 at 5:20 AM

Hi Raymond,
Trying to print row number. Got it by #data.CurrentRow# or #CurrentRow#. But I am working on how to handle transaction handling of around 600 rows from spread sheet with around 25 columns. Trying to read the data of each row and set the values to bean, and saving to database, iterating through rows, all in one transaction. But it failed with Java heap size limit or out of memeory error.
May I ask your advice on how to handle databse tranactions to read and load the data from spread sheet after proper data conversion to required data types. Trying such bulk gtransaction is a bad idea ? how to make into batches I am thinking. Or it is better to read one row at a time do save to database, commit transaction and come back again to read next row ?

Please help / advise while I continue to try different alternatives. I appreciate any level of help or advice.

Comment 30 by Raymond Camden posted on 1/11/2013 at 8:21 PM

Well, it is a bit hard to diagnose over a blog comment, but -

a) You could tweak your JVM settings to give you more RAM
b) You could look at doing it outside of CF - with something like DTS in SQL Server for example.

Comment 31 by ConfusedDeer posted on 2/6/2014 at 4:20 AM

When I ran your code in coldfusion 10 I got the following error:

Invalid tag nesting configuration.
A query driven queryloop tag is nested inside a queryloop tag that also has a query attribute. This is not allowed. Nesting these tags implies that you want to use grouped processing. However, only the top-level tag can specify the query that drives the processing.

The error occurred in line 76

74 : </cfloop>
75 : </tr>
76 : <cfoutput query="data" startRow="2">
77 : <tr>
78 : <cfloop index="c" list="#colList

Comment 32 by Raymond Camden posted on 2/6/2014 at 4:22 AM

You mean my code towards the end of the blog entry?

Comment 33 by confusedDeer posted on 2/6/2014 at 4:52 AM

Your fix now works!

Comment 34 by Raymond Camden posted on 2/6/2014 at 4:59 AM

Um, I didn't fix anything, but ok.

Comment 35 by confusedDeer posted on 2/6/2014 at 5:03 AM

That's why this page was inaccessible for a few moments while you didn't fix anything ;) thanks for the tutorials they've been a great help.
BTW: I have a copy of the orginial code you posted and now this one and I'm doing a diff.

Comment 36 by ConfusedDeer posted on 2/6/2014 at 5:11 AM

Ray,
You are right it was still broken, but I replaced the code above with the following code and it works (thanks henry for the code below):

<cfoutput>
<cfloop query="data" startRow="2">
<tr>
<cfloop index="c" list="#colList#">
<td>#data[c][currentRow]#</td>
</cfloop>
</tr>
</cfloop>
</cfoutput>

Comment 37 by Raymond Camden posted on 2/6/2014 at 5:26 AM

I honestly don't know what you mean. I took my code - pasted it into a local file - ran it in CF10 - and it worked fine. As it stands, your error:

A query driven queryloop tag is nested inside a queryloop tag that also has a query attribute.

Implies a cfloop query = something inside another one. Do you see that in my code?

Comment 38 by JohnSepu posted on 5/2/2014 at 2:34 AM

Hi Ray,

I'm trying to delete a file (using cffile action="delete") that has been uploaded by a form and fails the isspreadsheetfile check. Seems there's a lock on the file which is causing the delete action to bomb. I can manually delete the file from the server after 30 seconds or so however. And the delete process works fine when I remove the isspreadsheetfile check code. Any ideas on how to force unlock the file so it can be deleted right away when it fails the isspreadsheetfile check? Or is there an attribute on the file I need to set upon initial upload? My code is below. It's all wrapped in <cftransaction> tags. Thanks!

---------------------------------------------------------------

<cffile action="upload" filefield="form.graduate_names" destination="e:\temp\commencement-confirmation\" nameconflict="makeunique" result="graduates" attributes="normal">

<cfif not isspreadsheetfile("e:\temp\commencement-confirmation\#graduates.serverfile#") or (not right(graduates.serverfile, 5) is ".xlsx" and not right(graduates.serverfile, 4) is ".xls")>
<cffile action="delete" file="e:\temp\commencement-confirmation\#graduates.serverfile#">
<cflocation url="index-test.cfm?error=graduates_file_format_error" addtoken="no">
<cfabort>
</cfif>

Comment 39 by Raymond Camden posted on 5/2/2014 at 6:05 AM

I just did a quick test with a newly generated text file. I made the file, write some text to it, and then run isSpreadsheetFile on it, deleted it, and it worked fine for me. Are you sure removing JUST that check makes it work ok?

Comment 40 by JohnSepu posted on 5/2/2014 at 9:11 PM

Thanks for your reply. Yes. Removing this line from the code makes it work:

"not isspreadsheetfile("e:\temp\commencement-confirmation\#graduates.serverfile#") or "

I've narrowed it down further. If I try to upload a Word document, the file is deleted immediately after the check runs and rejects the file type. It seems that if I try to upload any other type of file (TXT, PDF, JPG) there is a lock placed on said file for anywhere from 2-30 seconds that causes the application to bomb. It's a Windows thing I fear and I'm not sure how or if there's a workaround. :-\
------------------------------------------------------------
Raymond Camden #
Commented on 05-01-2014 at 9:05 PM
I just did a quick test with a newly generated text file. I made the file, write some text to it, and then run isSpreadsheetFile on it, deleted it, and it worked fine for me. Are you sure removing JUST that check makes it work ok?

Comment 41 by Raymond Camden posted on 5/2/2014 at 10:47 PM

Hmm, well, at minimum, you should file a bug report.

Comment 42 by Kevin posted on 5/23/2014 at 5:56 PM

JohnSepu, did you happen to find out a fix for this? I'm getting the exact same error and Google sent me here when I started searching! Thanks!

Comment 43 by Micheal posted on 11/19/2014 at 10:27 PM

I just wanted to know what file types it validates against?

For example is it just .xls and .xlsx files or does it check others like: .xlt, .ods, .csv, .tsv, .txt, .tab?

Does it also validate the content too? Ie would it be good to use it to validate an excel file that is being uploaded? (to enhance security)

Comment 44 by Raymond Camden posted on 11/20/2014 at 12:27 AM

"For example is it just .xls and .xlsx files or does it check others like: .xlt, .ods, .csv, .tsv, .txt, .tab?"

That is up to isSpreadsheetFile. I do not have the *exact* specifics on what it supports. I would *not* expect it to support tsv, txt, or tab, but I have not tested it.

"Does it also validate the content too? Ie would it be good to use it to validate an excel file that is being uploaded? (to enhance security) "

In what way? Content in terms of what the app demands or content as in - it is a csv file but is it REALLY comma delimited?

Comment 45 by george kochev posted on 9/10/2015 at 6:19 PM

Is there a way to extract only certain columns, let say 2 or 3 and create an arrays!

Comment 46 (In reply to #45) by Raymond Camden posted on 9/11/2015 at 3:48 AM

You can't - but you can obviously only *use* what columns you care about. Just because the query contains all the columns it doesn't mean you have to use them all.

Comment 47 (In reply to #46) by george kochev posted on 9/11/2015 at 6:19 PM

I'm trying to extract the data out of 2 columns, and I was hoping at least .csv would work since they are comma separated, still couldn't figure out how to do that on CF. The point is not to read whatever the columns provide, but to use their data and avoid manually entering the data somewhere else. Is that make sense.

Comment 48 (In reply to #46) by george kochev posted on 9/11/2015 at 7:18 PM

and actually I made it work.

Comment 49 by Richard L posted on 5/25/2016 at 3:50 PM

One cool set of API's I ended up using for client side Excel sheet parsing was Sheet JS (http://sheetjs.com/), which is broken up into XLSX and LXS JavaScript API's. I used https://github.com/SheetJS/... along with Lucee server, to processes a targeted column on an Excel sheet. I used Ajax calls to bring the data back to the browser, and gave the user the option of downloading the transformed data, back into Excel format.

Though I used Lucee, I image it would be no real difference sending the data to ColdFusion for processing.

I guess it will depend on the size of the data being parsed and the purpose, before deciding to use either a client side approach versus on the server.

Comment 50 (In reply to #49) by Raymond Camden posted on 5/25/2016 at 4:37 PM

Cool - thanks for sharing that, Richard.

Comment 51 (In reply to #47) by Matt Wilde posted on 6/10/2016 at 7:58 PM

This is old I know but the solution still works. You can use Ben Nadel's CSVtoQuery script you can find on his ColdFusion blog.
From there use

var x = ListToArray(ValueList(query.columnName, 'delimiter')), 'delimiter');

You just need a query object to apply ValueList on.

Comment 52 by Sam posted on 9/30/2016 at 5:37 PM

Hello Raymond! im trying to upload an excel file and insert into database. a normal upload file would contain atleast 5000 input rows with 116 fields. its taking upto 5 minutes to upload. also i tried to upload a sample file with 400 rows and its taking upto 1 minute.

would you please let me know how to do a faster upload?
i'm using the same code as above.
thanks

Comment 53 (In reply to #52) by Raymond Camden posted on 9/30/2016 at 6:56 PM

Is the 5 minutes the upload or the processing? If it is the upload, there isn't much you can do. You can ask users to upload a zip and unzip it on the CF side w/ CFZIP. That's one option.

If the time is *after* the file is uploaded, then you would need to figure out what part of the process is taking the longest and see if you can do something about that.

Comment 54 (In reply to #53) by Sam posted on 9/30/2016 at 7:01 PM

its the process which is taking the time. i'm very new to the programming i'm not sure how i can figure out what is taking so long to process the file. any hints how i can figure it out?
Thank you

Comment 55 (In reply to #54) by Raymond Camden posted on 9/30/2016 at 7:06 PM

"its the process" - but what process do you mean? The user has to send the file up. CF has to process the file. CF then reads in the XLS file. CF then prints it out. Then, when CF is done, it sends the result back to the user. That takes time too.

So you've got numerous things you would need to look into.

Comment 56 by Rejith R Krishnan posted on 11/1/2016 at 12:28 PM

Hi Ray,

I have seen places where the <cfspreadsheet action="read"> which converts column values to numeric form (see the image). Is there anyway we can get around this?

https://uploads.disquscdn.c...

Comment 57 (In reply to #56) by Raymond Camden posted on 11/1/2016 at 3:01 PM

As far as I know, CF makes a guess as to the data and doesn't give you control over that process. I know it isn't helpful, but "dont use CF" is the only advice I can give.