In this ColdFusion sample I'm going to demonstrate how to allow users to upload Excel files and use ColdFusion to both validate and read the content within. Let's begin by designing a simple upload form.
<cfif structKeyExists(variables, "errors")>
<cfoutput>
<p>
<b>Error: #variables.errors#</b>
</p>
</cfoutput>
</cfif>
<form action="test.cfm" enctype="multipart/form-data" method="post">
<input type="file" name="xlsfile" required>
<input type="submit" value="Upload XLS File">
</form>
Nothing too complex here. The form has a grand total of one field - the file field named xlsfile. Note above the form is a simple set of ColdFusion logic to notice an errors variable and output it. In case you're curious, this value will be created a bit later in our example. So - let's process the upload. Here's the code that handles that.
<cfif structKeyExists(form, "xlsfile") and len(form.xlsfile)>
<!--- Destination outside of web root --->
<cfset dest = getTempDirectory()>
<cffile action="upload" destination="#dest#" filefield="xlsfile" result="upload" nameconflict="makeunique">
<cfif upload.fileWasSaved>
<cfset theFile = upload.serverDirectory & "/" & upload.serverFile>
<cfif isSpreadsheetFile(theFile)>
<cfspreadsheet action="read" src="#theFile#" query="data" headerrow="1">
<cffile action="delete" file="#theFile#">
<cfset showForm = false>
<cfelse>
<cfset errors = "The file was not an Excel file.">
<cffile action="delete" file="#theFile#">
</cfif>
<cfelse>
<cfset errors = "The file was not properly uploaded.">
</cfif>
</cfif>
This code block begins with the field check used for our upload. If it exists, and has a value, we have to do some processing. We need a place to store the upload, and as we all know, you never upload files to a directory under web root. Therefore I used the temp directory as a quick storage place. I upload the file using cffile/action=upload. If the file was successfully uploaded, I use isSpreadsheetFile() to determine if the file was a valid spreadsheet. This covers XLS, XLSX, and even OpenOffice documents. If it is a valid spreadsheet, I read it in using the cfspreadsheet tag. Notice the last two arguments.
The query argument tells ColdFusion to parse the spreadsheet data into a query. This assumes we only want the first sheet. If you want to work with other sheets, that's definitely possible.
The last argument, headerrow, tells ColdFusion to consider the first row to be column headers. It may not always be advisable to assume this. But for now, we will.
The rest of that block simply handles errors and specifying if we should show the form again. If the user uploaded a valid spreadsheet we don't want to show the form. Instead, we want to display the contents. Let's look at how I did this.
<style>
.ssTable { width: 100%;
border-style:solid;
border-width:thin;
}
.ssHeader { background-color: #ffff00; }
.ssTable td, .ssTable th {
padding: 10px;
border-style:solid;
border-width:thin;
}
</style>
<p>
Here is the data in your Excel sheet (assuming first row as headers):
</p>
<cfset metadata = getMetadata(data)>
<cfset colList = "">
<cfloop index="col" array="#metadata#">
<cfset colList = listAppend(colList, col.name)>
</cfloop>
<cfif data.recordCount is 1>
<p>
This spreadsheet appeared to have no data.
</p>
<cfelse>
<table class="ssTable">
<tr class="ssHeader">
<cfloop index="c" list="#colList#">
<cfoutput><th>#c#</th></cfoutput>
</cfloop>
</tr>
<cfoutput query="data" startRow="2">
<tr>
<cfloop index="c" list="#colList#">
<td>#data[c][currentRow]#</td>
</cfloop>
</tr>
</cfoutput>
</table>
</cfif>
So skipping over the CSS, the real meat of the work begins when we get the metadata. Why do we do this? ColdFusion's query object does not maintain the same order of columns that our spreadsheet had. I can use the getMetadata function on the query to get the proper column order. That's the array list you see there.
Next - we do a quick check of the size of the query. We are assuming our spreadsheet has a first row being used as headers. So if we assume that, and there is only one row, then we really don't have any data. Notice then in the next block of the conditional, we use startRow=2 to begin with where we figure the real data starts. After that it's a simple matter of outputting the query dynamically. (For an example of working with dynamic ColdFusion queries, see this blog entry.)
How does it look? Here's the result of uploading a sample XLS sheet.
And below is the complete template. Read on though for more...
<cfset showForm = true>
<cfif structKeyExists(form, "xlsfile") and len(form.xlsfile)>
<!--- Destination outside of web root --->
<cfset dest = getTempDirectory()>
<cffile action="upload" destination="#dest#" filefield="xlsfile" result="upload" nameconflict="makeunique">
<cfif upload.fileWasSaved>
<cfset theFile = upload.serverDirectory & "/" & upload.serverFile>
<cfif isSpreadsheetFile(theFile)>
<cfspreadsheet action="read" src="#theFile#" query="data" headerrow="1">
<cffile action="delete" file="#theFile#">
<cfset showForm = false>
<cfelse>
<cfset errors = "The file was not an Excel file.">
<cffile action="delete" file="#theFile#">
</cfif>
<cfelse>
<cfset errors = "The file was not properly uploaded.">
</cfif>
</cfif>
<cfif showForm>
<cfif structKeyExists(variables, "errors")>
<cfoutput>
<p>
<b>Error: #variables.errors#</b>
</p>
</cfoutput>
</cfif>
<form action="test.cfm" enctype="multipart/form-data" method="post">
<input type="file" name="xlsfile" required>
<input type="submit" value="Upload XLS File">
</form>
<cfelse>
<style>
.ssTable { width: 100%;
border-style:solid;
border-width:thin;
}
.ssHeader { background-color: #ffff00; }
.ssTable td, .ssTable th {
padding: 10px;
border-style:solid;
border-width:thin;
}
</style>
<p>
Here is the data in your Excel sheet (assuming first row as headers):
</p>
<cfset metadata = getMetadata(data)>
<cfset colList = "">
<cfloop index="col" array="#metadata#">
<cfset colList = listAppend(colList, col.name)>
</cfloop>
<cfif data.recordCount is 1>
<p>
This spreadsheet appeared to have no data.
</p>
<cfelse>
<table class="ssTable">
<tr class="ssHeader">
<cfloop index="c" list="#colList#">
<cfoutput><th>#c#</th></cfoutput>
</cfloop>
</tr>
<cfoutput query="data" startRow="2">
<tr>
<cfloop index="c" list="#colList#">
<td>#data[c][currentRow]#</td>
</cfloop>
</tr>
</cfoutput>
</table>
</cfif>
</cfif>
Your Homework!
Your homework, if you chose to accept it, is to simply take the template and add a checkbox to toggle if the code should assume the first row is the header. It's not as simple as you think. Sure you can just get rid of that attribute, but you also have to update the display as well. Post your code to Pastebin and then share the url.
Notes
Why didn't I use the VFS to store the file? I did - but isSpreadsheetFile() always returns false on an XLS file in the VFS. Boo!
Like the style of this blog entry? (Simple example with a homework assignment.) If so - I'm thinking of doing more like it.
Archived Comments
Few months back, I was trying to figure out to read an excel file. Ben Nadal's POI utility helped me. But this seems easiest way to do that.
Thankyou very much Ray. If possible, can you share something on writing to an excel file? I know its pretty simple, but faced problems in writing data to excel when you have single/double qoutes, space or some special characters in data.
@Ray Is there any way to remove empty rows from the Excel import? I can't seem to find a way with cfspreadsheet and also haven't been able to figure out a good way with Query of Queries. Any thoughts?
@Dinesh - yes, I can do a simple write example.
@Joe - let me dig into that.
If you are happy to give over control of the html the cfspreadsheet tag will return html:
<cfspreadsheet action="read" src="[FILE]" format="html" name="getHTML">
<cfoutput><table>#getHTML#</table></cfoutput>
Its crude output but very quick.
@Ray, also while you're writing your extra sample for Dinesh, could you add to it on how to hide a column please (pretty please)?
One of our local user group member is having a problem with the hidden="true/false" attribute in the CF9 built-in spreadsheet function SpreadsheetFormatCell. The hidden attribute is not doing it's job. Based on the Cf9 function reference at http://help.adobe.com/en_US... it should work.
Would this be possible to be added please? :)
TIA
Jaana
@Sam: I had forgotten about that option. It's damn slick. Is the HTML nicely formatted? I'll have to try and view source. Does it respect column order? If so - a lot of my blog entry gets simplified. ;) But I could see folks wanting greater control over the HTML.
@Jaana: To be honest, I'm surprised by all the questions. This gives me an opportunity to finally play more with the functions so sure - yes - I'll happily look at your question as well.
That user with the hidden attribute would be me on the SpreadSheetFormatRow() function, I just gave the TACFUG peeps that FormatCell() docs because that's the one that talks all about the format structure.
:) there's Jeff :)
Thanks Ray, your samples are by far the best and always easy to understand :)
@Ray: the html is basic tr and td's and does seem to respect order. With some css you could make it look pretty good.
@Ray - I have nothad a chance to work on the homework, but I like the concept and would enjoy seeing more post in that fashion.
@Joe: When I used my code above, and uploaded my Excel sheet (after adding a blank and then a row with data), CF removed the empty row automatically.
@Dinesh -http://www.coldfusionjedi.c...
@Jaana: I'm not quite sure I get what you mean. In my code I loop over all the columns. If you knew a column that you always wanted to hide, like Beers, you would just check for that in your loop. That would only be helpful in cases where folks are uploaded a standard XLS file (ie one you expect in a certain format). It should be as simple as that, right?
@Ray - There is a spreadsheet that I have when uploaded using your code leaves the emtpy rows. I think it has to do with how the data is deleted from the row, i.e. Clear Contents vs. Delete Row. Would you mind if I email you the file so that you can try it?
Sure, email it over.
I can confirm what Joe found. His XLS file ended up with empty rows. So I think an obvious fix would be to simply update the code to look for no data. One sec.
Joe, check out the mod in this pastebin. Sure it's a bit hacky, but it seems to work:
http://pastebin.com/ECtjEVxY
Does this work with .xls and .xlsx files?
Or is there a limitation as to which Excel file version(s) work properly?
It should work in both.
I so needed this!!!
My only question is what is the character between rows? I'm trying to import this into a db and don't know how to distinguish between rows.
#data[c][currentRow]#<cfif data.currentrow neq data.rowcount>, </cfif> No worky :(
I'm sorry - what? I can't understand your question.
When I output #data[c][currentRow]# there is a " " in between records/rows. Is that a space or tab or return? My SQL Insert fails as there is no way for me to distinguish between field content.
I'm still not getting you. When you output one value, that isn't a row. It's one value. If you are saying that one of the cells has additional whitespace, why not use trim() on it?
Hey Ray, It seems im late to the party on this issue, however, I now have a task to read one spreedsheet with multiple tabs within it. How would your example be modified to do that?
Those tabs are sheets, and CF lets you read a specific sheet. (See the CFML Ref for details.)
BTW: cfspreadsheet is available in cf 9 and greater.
Please help me how to get the row id of spread sheet to print
I don't understand - you want to print one row?
Hi Raymond,
Trying to print row number. Got it by #data.CurrentRow# or #CurrentRow#. But I am working on how to handle transaction handling of around 600 rows from spread sheet with around 25 columns. Trying to read the data of each row and set the values to bean, and saving to database, iterating through rows, all in one transaction. But it failed with Java heap size limit or out of memeory error.
May I ask your advice on how to handle databse tranactions to read and load the data from spread sheet after proper data conversion to required data types. Trying such bulk gtransaction is a bad idea ? how to make into batches I am thinking. Or it is better to read one row at a time do save to database, commit transaction and come back again to read next row ?
Please help / advise while I continue to try different alternatives. I appreciate any level of help or advice.
Well, it is a bit hard to diagnose over a blog comment, but -
a) You could tweak your JVM settings to give you more RAM
b) You could look at doing it outside of CF - with something like DTS in SQL Server for example.
When I ran your code in coldfusion 10 I got the following error:
Invalid tag nesting configuration.
A query driven queryloop tag is nested inside a queryloop tag that also has a query attribute. This is not allowed. Nesting these tags implies that you want to use grouped processing. However, only the top-level tag can specify the query that drives the processing.
The error occurred in line 76
74 : </cfloop>
75 : </tr>
76 : <cfoutput query="data" startRow="2">
77 : <tr>
78 : <cfloop index="c" list="#colList
You mean my code towards the end of the blog entry?
Your fix now works!
Um, I didn't fix anything, but ok.
That's why this page was inaccessible for a few moments while you didn't fix anything ;) thanks for the tutorials they've been a great help.
BTW: I have a copy of the orginial code you posted and now this one and I'm doing a diff.
Ray,
You are right it was still broken, but I replaced the code above with the following code and it works (thanks henry for the code below):
<cfoutput>
<cfloop query="data" startRow="2">
<tr>
<cfloop index="c" list="#colList#">
<td>#data[c][currentRow]#</td>
</cfloop>
</tr>
</cfloop>
</cfoutput>
I honestly don't know what you mean. I took my code - pasted it into a local file - ran it in CF10 - and it worked fine. As it stands, your error:
A query driven queryloop tag is nested inside a queryloop tag that also has a query attribute.
Implies a cfloop query = something inside another one. Do you see that in my code?
Hi Ray,
I'm trying to delete a file (using cffile action="delete") that has been uploaded by a form and fails the isspreadsheetfile check. Seems there's a lock on the file which is causing the delete action to bomb. I can manually delete the file from the server after 30 seconds or so however. And the delete process works fine when I remove the isspreadsheetfile check code. Any ideas on how to force unlock the file so it can be deleted right away when it fails the isspreadsheetfile check? Or is there an attribute on the file I need to set upon initial upload? My code is below. It's all wrapped in <cftransaction> tags. Thanks!
---------------------------------------------------------------
<cffile action="upload" filefield="form.graduate_names" destination="e:\temp\commencement-confirmation\" nameconflict="makeunique" result="graduates" attributes="normal">
<cfif not isspreadsheetfile("e:\temp\commencement-confirmation\#graduates.serverfile#") or (not right(graduates.serverfile, 5) is ".xlsx" and not right(graduates.serverfile, 4) is ".xls")>
<cffile action="delete" file="e:\temp\commencement-confirmation\#graduates.serverfile#">
<cflocation url="index-test.cfm?error=graduates_file_format_error" addtoken="no">
<cfabort>
</cfif>
I just did a quick test with a newly generated text file. I made the file, write some text to it, and then run isSpreadsheetFile on it, deleted it, and it worked fine for me. Are you sure removing JUST that check makes it work ok?
Thanks for your reply. Yes. Removing this line from the code makes it work:
"not isspreadsheetfile("e:\temp\commencement-confirmation\#graduates.serverfile#") or "
I've narrowed it down further. If I try to upload a Word document, the file is deleted immediately after the check runs and rejects the file type. It seems that if I try to upload any other type of file (TXT, PDF, JPG) there is a lock placed on said file for anywhere from 2-30 seconds that causes the application to bomb. It's a Windows thing I fear and I'm not sure how or if there's a workaround. :-\
------------------------------------------------------------
Raymond Camden #
Commented on 05-01-2014 at 9:05 PM
I just did a quick test with a newly generated text file. I made the file, write some text to it, and then run isSpreadsheetFile on it, deleted it, and it worked fine for me. Are you sure removing JUST that check makes it work ok?
Hmm, well, at minimum, you should file a bug report.
JohnSepu, did you happen to find out a fix for this? I'm getting the exact same error and Google sent me here when I started searching! Thanks!
I just wanted to know what file types it validates against?
For example is it just .xls and .xlsx files or does it check others like: .xlt, .ods, .csv, .tsv, .txt, .tab?
Does it also validate the content too? Ie would it be good to use it to validate an excel file that is being uploaded? (to enhance security)
"For example is it just .xls and .xlsx files or does it check others like: .xlt, .ods, .csv, .tsv, .txt, .tab?"
That is up to isSpreadsheetFile. I do not have the *exact* specifics on what it supports. I would *not* expect it to support tsv, txt, or tab, but I have not tested it.
"Does it also validate the content too? Ie would it be good to use it to validate an excel file that is being uploaded? (to enhance security) "
In what way? Content in terms of what the app demands or content as in - it is a csv file but is it REALLY comma delimited?
Is there a way to extract only certain columns, let say 2 or 3 and create an arrays!
You can't - but you can obviously only *use* what columns you care about. Just because the query contains all the columns it doesn't mean you have to use them all.
I'm trying to extract the data out of 2 columns, and I was hoping at least .csv would work since they are comma separated, still couldn't figure out how to do that on CF. The point is not to read whatever the columns provide, but to use their data and avoid manually entering the data somewhere else. Is that make sense.
and actually I made it work.
One cool set of API's I ended up using for client side Excel sheet parsing was Sheet JS (http://sheetjs.com/), which is broken up into XLSX and LXS JavaScript API's. I used https://github.com/SheetJS/... along with Lucee server, to processes a targeted column on an Excel sheet. I used Ajax calls to bring the data back to the browser, and gave the user the option of downloading the transformed data, back into Excel format.
Though I used Lucee, I image it would be no real difference sending the data to ColdFusion for processing.
I guess it will depend on the size of the data being parsed and the purpose, before deciding to use either a client side approach versus on the server.
Cool - thanks for sharing that, Richard.
This is old I know but the solution still works. You can use Ben Nadel's CSVtoQuery script you can find on his ColdFusion blog.
From there use
var x = ListToArray(ValueList(query.columnName, 'delimiter')), 'delimiter');
You just need a query object to apply ValueList on.
Hello Raymond! im trying to upload an excel file and insert into database. a normal upload file would contain atleast 5000 input rows with 116 fields. its taking upto 5 minutes to upload. also i tried to upload a sample file with 400 rows and its taking upto 1 minute.
would you please let me know how to do a faster upload?
i'm using the same code as above.
thanks
Is the 5 minutes the upload or the processing? If it is the upload, there isn't much you can do. You can ask users to upload a zip and unzip it on the CF side w/ CFZIP. That's one option.
If the time is *after* the file is uploaded, then you would need to figure out what part of the process is taking the longest and see if you can do something about that.
its the process which is taking the time. i'm very new to the programming i'm not sure how i can figure out what is taking so long to process the file. any hints how i can figure it out?
Thank you
"its the process" - but what process do you mean? The user has to send the file up. CF has to process the file. CF then reads in the XLS file. CF then prints it out. Then, when CF is done, it sends the result back to the user. That takes time too.
So you've got numerous things you would need to look into.
Hi Ray,
I have seen places where the <cfspreadsheet action="read"> which converts column values to numeric form (see the image). Is there anyway we can get around this?
https://uploads.disquscdn.c...
As far as I know, CF makes a guess as to the data and doesn't give you control over that process. I know it isn't helpful, but "dont use CF" is the only advice I can give.