Twitter: raymondcamden


Address: Lafayette, LA, USA

ColdFusion Sample - Upload and Parse an Excel File

05-31-2011 19,378 views ColdFusion 37 Comments

In this ColdFusion sample I'm going to demonstrate how to allow users to upload Excel files and use ColdFusion to both validate and read the content within. Let's begin by designing a simple upload form.

view plain print about
1<cfif structKeyExists(variables, "errors")>
2    <cfoutput>
3    <p>
4    <b>Error: #variables.errors#</b>
5    </p>
6    </cfoutput>
7</cfif>
8    
9<form action="test.cfm" enctype="multipart/form-data" method="post">
10        
11     <input type="file" name="xlsfile" required>
12     <input type="submit" value="Upload XLS File">
13        
14</form>

Nothing too complex here. The form has a grand total of one field - the file field named xlsfile. Note above the form is a simple set of ColdFusion logic to notice an errors variable and output it. In case you're curious, this value will be created a bit later in our example. So - let's process the upload. Here's the code that handles that.

view plain print about
1<cfif structKeyExists(form, "xlsfile") and len(form.xlsfile)>
2
3    <!--- Destination outside of web root --->
4    <cfset dest = getTempDirectory()>
5
6    <cffile action="upload" destination="#dest#" filefield="xlsfile" result="upload" nameconflict="makeunique">
7
8    <cfif upload.fileWasSaved>
9        <cfset theFile = upload.serverDirectory & "/" & upload.serverFile>
10        <cfif isSpreadsheetFile(theFile)>
11            <cfspreadsheet action="read" src="#theFile#" query="data" headerrow="1">
12            <cffile action="delete" file="#theFile#">
13            <cfset showForm = false>
14        <cfelse>
15            <cfset errors = "The file was not an Excel file.">
16            <cffile action="delete" file="#theFile#">
17        </cfif>
18    <cfelse>
19        <cfset errors = "The file was not properly uploaded.">    
20    </cfif>
21        
22</cfif>

This code block begins with the field check used for our upload. If it exists, and has a value, we have to do some processing. We need a place to store the upload, and as we all know, you never upload files to a directory under web root. Therefore I used the temp directory as a quick storage place. I upload the file using cffile/action=upload. If the file was successfully uploaded, I use isSpreadsheetFile() to determine if the file was a valid spreadsheet. This covers XLS, XLSX, and even OpenOffice documents. If it is a valid spreadsheet, I read it in using the cfspreadsheet tag. Notice the last two arguments.

The query argument tells ColdFusion to parse the spreadsheet data into a query. This assumes we only want the first sheet. If you want to work with other sheets, that's definitely possible.

The last argument, headerrow, tells ColdFusion to consider the first row to be column headers. It may not always be advisable to assume this. But for now, we will.

The rest of that block simply handles errors and specifying if we should show the form again. If the user uploaded a valid spreadsheet we don't want to show the form. Instead, we want to display the contents. Let's look at how I did this.

view plain print about
1<style>
2.ssTable { width: 100%;
3         border-style:solid;
4         border-width:thin;
5}
6.ssHeader { background-color: #ffff00; }
7.ssTable td, .ssTable th {
8    padding: 10px;
9    border-style:solid;
10    border-width:thin;
11}
12</style>
13
14<p>
15Here is the data in your Excel sheet (assuming first row as headers):
16</p>
17
18<cfset metadata = getMetadata(data)>
19<cfset colList = "">
20<cfloop index="col" array="#metadata#">
21    <cfset colList = listAppend(colList, col.name)>
22</cfloop>
23
24<cfif data.recordCount is 1>
25    <p>
26    This spreadsheet appeared to have no data.
27    </p>
28<cfelse>
29    <table class="ssTable">
30        <tr class="ssHeader">
31            <cfloop index="c" list="#colList#">
32                <cfoutput><th>#c#</th></cfoutput>
33            </cfloop>
34        </tr>
35        <cfoutput query="data" startRow="2">
36            <tr>
37            <cfloop index="c" list="#colList#">
38                <td>#data[c][currentRow]#</td>
39            </cfloop>
40            </tr>                    
41        </cfoutput>
42    </table>
43</cfif>

So skipping over the CSS, the real meat of the work begins when we get the metadata. Why do we do this? ColdFusion's query object does not maintain the same order of columns that our spreadsheet had. I can use the getMetadata function on the query to get the proper column order. That's the array list you see there.

Next - we do a quick check of the size of the query. We are assuming our spreadsheet has a first row being used as headers. So if we assume that, and there is only one row, then we really don't have any data. Notice then in the next block of the conditional, we use startRow=2 to begin with where we figure the real data starts. After that it's a simple matter of outputting the query dynamically. (For an example of working with dynamic ColdFusion queries, see this blog entry.)

How does it look? Here's the result of uploading a sample XLS sheet.

And below is the complete template. Read on though for more...

view plain print about
1<cfset showForm = true>
2<cfif structKeyExists(form, "xlsfile") and len(form.xlsfile)>
3
4    <!--- Destination outside of web root --->
5    <cfset dest = getTempDirectory()>
6
7    <cffile action="upload" destination="#dest#" filefield="xlsfile" result="upload" nameconflict="makeunique">
8
9    <cfif upload.fileWasSaved>
10        <cfset theFile = upload.serverDirectory & "/" & upload.serverFile>
11        <cfif isSpreadsheetFile(theFile)>
12            <cfspreadsheet action="read" src="#theFile#" query="data" headerrow="1">
13            <cffile action="delete" file="#theFile#">
14            <cfset showForm = false>
15        <cfelse>
16            <cfset errors = "The file was not an Excel file.">
17            <cffile action="delete" file="#theFile#">
18        </cfif>
19    <cfelse>
20        <cfset errors = "The file was not properly uploaded.">    
21    </cfif>
22        
23</cfif>
24
25<cfif showForm>
26    <cfif structKeyExists(variables, "errors")>
27        <cfoutput>
28        <p>
29        <b>Error: #variables.errors#</b>
30        </p>
31        </cfoutput>
32    </cfif>
33    
34    <form action="test.cfm" enctype="multipart/form-data" method="post">
35        
36         <input type="file" name="xlsfile" required>
37         <input type="submit" value="Upload XLS File">
38        
39    </form>
40<cfelse>
41
42    <style>
43    .ssTable { width: 100%;
44             border-style:solid;
45             border-width:thin;
46    }
47    .ssHeader { background-color: #ffff00; }
48    .ssTable td, .ssTable th {
49        padding: 10px;
50        border-style:solid;
51        border-width:thin;
52    }
53    </style>
54    
55    <p>
56    Here is the data in your Excel sheet (assuming first row as headers):
57    </p>
58    
59    <cfset metadata = getMetadata(data)>
60    <cfset colList = "">
61    <cfloop index="col" array="#metadata#">
62        <cfset colList = listAppend(colList, col.name)>
63    </cfloop>
64    
65    <cfif data.recordCount is 1>
66        <p>
67        This spreadsheet appeared to have no data.
68        </p>
69    <cfelse>
70        <table class="ssTable">
71            <tr class="ssHeader">
72                <cfloop index="c" list="#colList#">
73                    <cfoutput><th>#c#</th></cfoutput>
74                </cfloop>
75            </tr>
76            <cfoutput query="data" startRow="2">
77                <tr>
78                <cfloop index="c" list="#colList#">
79                    <td>#data[c][currentRow]#</td>
80                </cfloop>
81                </tr>                    
82            </cfoutput>
83        </table>
84    </cfif>
85    
86</cfif>

Your Homework!

Your homework, if you chose to accept it, is to simply take the template and add a checkbox to toggle if the code should assume the first row is the header. It's not as simple as you think. Sure you can just get rid of that attribute, but you also have to update the display as well. Post your code to Pastebin and then share the url.

Notes

Why didn't I use the VFS to store the file? I did - but isSpreadsheetFile() always returns false on an XLS file in the VFS. Boo!

Like the style of this blog entry? (Simple example with a homework assignment.) If so - I'm thinking of doing more like it.

Related Blog Entries

37 Comments

  • Dinesh Kanwar #
    Commented on 06-01-2011 at 4:24 AM
    Few months back, I was trying to figure out to read an excel file. Ben Nadal's POI utility helped me. But this seems easiest way to do that.
    Thankyou very much Ray. If possible, can you share something on writing to an excel file? I know its pretty simple, but faced problems in writing data to excel when you have single/double qoutes, space or some special characters in data.
  • Joe Brislin #
    Commented on 06-01-2011 at 8:00 AM
    @Ray Is there any way to remove empty rows from the Excel import? I can't seem to find a way with cfspreadsheet and also haven't been able to figure out a good way with Query of Queries. Any thoughts?
  • Commented on 06-01-2011 at 8:09 AM
    @Dinesh - yes, I can do a simple write example.
    @Joe - let me dig into that.
  • Commented on 06-01-2011 at 8:31 AM
    If you are happy to give over control of the html the cfspreadsheet tag will return html:

    <cfspreadsheet action="read" src="[FILE]" format="html" name="getHTML">
    <cfoutput><table>#getHTML#</table></cfoutput>

    Its crude output but very quick.
  • Commented on 06-01-2011 at 8:33 AM
    @Ray, also while you're writing your extra sample for Dinesh, could you add to it on how to hide a column please (pretty please)?

    One of our local user group member is having a problem with the hidden="true/false" attribute in the CF9 built-in spreadsheet function SpreadsheetFormatCell. The hidden attribute is not doing it's job. Based on the Cf9 function reference at http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef... it should work.

    Would this be possible to be added please? :)

    TIA
    Jaana
  • Commented on 06-01-2011 at 8:47 AM
    @Sam: I had forgotten about that option. It's damn slick. Is the HTML nicely formatted? I'll have to try and view source. Does it respect column order? If so - a lot of my blog entry gets simplified. ;) But I could see folks wanting greater control over the HTML.

    @Jaana: To be honest, I'm surprised by all the questions. This gives me an opportunity to finally play more with the functions so sure - yes - I'll happily look at your question as well.
  • Commented on 06-01-2011 at 8:48 AM
    That user with the hidden attribute would be me on the SpreadSheetFormatRow() function, I just gave the TACFUG peeps that FormatCell() docs because that's the one that talks all about the format structure.
  • Commented on 06-01-2011 at 8:52 AM
    :) there's Jeff :)

    Thanks Ray, your samples are by far the best and always easy to understand :)
  • Commented on 06-01-2011 at 9:37 AM
    @Ray: the html is basic tr and td's and does seem to respect order. With some css you could make it look pretty good.
  • Commented on 06-01-2011 at 1:53 PM
    @Ray - I have nothad a chance to work on the homework, but I like the concept and would enjoy seeing more post in that fashion.
  • Commented on 06-01-2011 at 8:42 PM
    @Joe: When I used my code above, and uploaded my Excel sheet (after adding a blank and then a row with data), CF removed the empty row automatically.
  • Commented on 06-01-2011 at 9:29 PM
    @Dinesh -http://www.coldfusionjedi.com/index.cfm/2011/6/1/C...
  • Commented on 06-02-2011 at 9:24 AM
    @Jaana: I'm not quite sure I get what you mean. In my code I loop over all the columns. If you knew a column that you always wanted to hide, like Beers, you would just check for that in your loop. That would only be helpful in cases where folks are uploaded a standard XLS file (ie one you expect in a certain format). It should be as simple as that, right?
  • Joe Brislin #
    Commented on 06-02-2011 at 3:00 PM
    @Ray - There is a spreadsheet that I have when uploaded using your code leaves the emtpy rows. I think it has to do with how the data is deleted from the row, i.e. Clear Contents vs. Delete Row. Would you mind if I email you the file so that you can try it?
  • Commented on 06-02-2011 at 3:03 PM
    Sure, email it over.
  • Commented on 06-02-2011 at 4:52 PM
    I can confirm what Joe found. His XLS file ended up with empty rows. So I think an obvious fix would be to simply update the code to look for no data. One sec.
  • Commented on 06-02-2011 at 4:55 PM
    Joe, check out the mod in this pastebin. Sure it's a bit hacky, but it seems to work:

    http://pastebin.com/ECtjEVxY
  • Les Mizzell #
    Commented on 01-05-2012 at 8:21 AM
    Does this work with .xls and .xlsx files?

    Or is there a limitation as to which Excel file version(s) work properly?
  • Commented on 01-05-2012 at 8:22 AM
    It should work in both.
  • Ed Salsberg #
    Commented on 04-17-2012 at 9:01 AM
    I so needed this!!!

    My only question is what is the character between rows? I'm trying to import this into a db and don't know how to distinguish between rows.

    #data[c][currentRow]#<cfif data.currentrow neq data.rowcount>, </cfif> No worky :(
  • Commented on 04-17-2012 at 9:09 AM
    I'm sorry - what? I can't understand your question.
  • Ed Salsberg #
    Commented on 04-17-2012 at 9:21 AM
    When I output #data[c][currentRow]# there is a " " in between records/rows. Is that a space or tab or return? My SQL Insert fails as there is no way for me to distinguish between field content.
  • Commented on 04-17-2012 at 9:28 AM
    I'm still not getting you. When you output one value, that isn't a row. It's one value. If you are saying that one of the cells has additional whitespace, why not use trim() on it?
  • David Jaocbson #
    Commented on 05-17-2012 at 12:25 PM
    Hey Ray, It seems im late to the party on this issue, however, I now have a task to read one spreedsheet with multiple tabs within it. How would your example be modified to do that?
  • Commented on 05-18-2012 at 7:59 AM
    Those tabs are sheets, and CF lets you read a specific sheet. (See the CFML Ref for details.)
  • Commented on 09-05-2012 at 10:24 AM
    BTW: cfspreadsheet is available in cf 9 and greater.
  • udaya #
    Commented on 01-10-2013 at 5:09 AM
    Please help me how to get the row id of spread sheet to print
  • Commented on 01-10-2013 at 1:19 PM
    I don't understand - you want to print one row?
  • udaya #
    Commented on 01-10-2013 at 6:20 PM
    Hi Raymond,
    Trying to print row number. Got it by #data.CurrentRow# or #CurrentRow#. But I am working on how to handle transaction handling of around 600 rows from spread sheet with around 25 columns. Trying to read the data of each row and set the values to bean, and saving to database, iterating through rows, all in one transaction. But it failed with Java heap size limit or out of memeory error.
    May I ask your advice on how to handle databse tranactions to read and load the data from spread sheet after proper data conversion to required data types. Trying such bulk gtransaction is a bad idea ? how to make into batches I am thinking. Or it is better to read one row at a time do save to database, commit transaction and come back again to read next row ?

    Please help / advise while I continue to try different alternatives. I appreciate any level of help or advice.
  • Commented on 01-11-2013 at 9:21 AM
    Well, it is a bit hard to diagnose over a blog comment, but -

    a) You could tweak your JVM settings to give you more RAM
    b) You could look at doing it outside of CF - with something like DTS in SQL Server for example.
  • ConfusedDeer #
    Commented on 02-05-2014 at 5:20 PM
    When I ran your code in coldfusion 10 I got the following error:

    Invalid tag nesting configuration.
    A query driven queryloop tag is nested inside a queryloop tag that also has a query attribute. This is not allowed. Nesting these tags implies that you want to use grouped processing. However, only the top-level tag can specify the query that drives the processing.

    The error occurred in line 76

    74 : </cfloop>
    75 : </tr>
    76 : <cfoutput query="data" startRow="2">
    77 : <tr>
    78 : <cfloop index="c" list="#colList
  • Commented on 02-05-2014 at 5:22 PM
    You mean my code towards the end of the blog entry?
  • confusedDeer #
    Commented on 02-05-2014 at 5:52 PM
    Your fix now works!
  • Commented on 02-05-2014 at 5:59 PM
    Um, I didn't fix anything, but ok.
  • confusedDeer #
    Commented on 02-05-2014 at 6:03 PM
    That's why this page was inaccessible for a few moments while you didn't fix anything ;) thanks for the tutorials they've been a great help.
    BTW: I have a copy of the orginial code you posted and now this one and I'm doing a diff.
  • ConfusedDeer #
    Commented on 02-05-2014 at 6:11 PM
    Ray,
    You are right it was still broken, but I replaced the code above with the following code and it works (thanks henry for the code below):

    <cfoutput>
    <cfloop query="data" startRow="2">
    <tr>
    <cfloop index="c" list="#colList#">
    <td>#data[c][currentRow]#</td>
    </cfloop>
    </tr>
    </cfloop>
    </cfoutput>
  • Commented on 02-05-2014 at 6:26 PM
    I honestly don't know what you mean. I took my code - pasted it into a local file - ran it in CF10 - and it worked fine. As it stands, your error:

    A query driven queryloop tag is nested inside a queryloop tag that also has a query attribute.

    Implies a cfloop query = something inside another one. Do you see that in my code?

Post Reply

Please refrain from posting large blocks of code as a comment. Use Pastebin or Gists instead. Text wrapped in asterisks (*) will be bold and text wrapped in underscores (_) will be italicized.

Leave this field empty