Twitter: raymondcamden

Address: Lafayette, LA, USA

Friday Challenge - Compare Directories

12-07-2007 4,853 views ColdFusion 14 Comments

It's been a while since I've done a Friday Challenge. Frankly, I just haven't felt very creative. I think everyone goes through cycles of creativity and - um - the opposite of creativity (see, my vocabulary is suffering!). A reader, Kris (very Christmasy!), sent in the following idea, and I think it's pretty good.

As a reminder, you should spend less than 10 minutes working on this. Don't go crazy unless you have a really understanding boss. Your challenge today is to write a UDF or Custom Tag that takes 2 directories as parameters. It should return a list of:

  • What files exist in folder A, but not B
  • What files exist in folder B, but not A
  • What files exist in both, but APPEAR different (size, date)
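In rough terms, the three lists come down to two set differences and one intersection. Here is a minimal sketch of that idea. It is written in Python rather than CFML and assumes each folder has already been read into a mapping of file name to a (size, last-modified) pair. `compare_listings` is a made-up name.

```python
from typing import Dict, List, Tuple

# Assumed representation for this sketch: file name -> (size in bytes, last-modified timestamp).
Listing = Dict[str, Tuple[int, float]]


def compare_listings(a: Listing, b: Listing) -> Tuple[List[str], List[str], List[str]]:
    """Return (only in A, only in B, in both but apparently different), each sorted."""
    names_a, names_b = set(a), set(b)
    only_a = sorted(names_a - names_b)
    only_b = sorted(names_b - names_a)
    # "Appear different" is taken to mean that the size or the modification time differs.
    changed = sorted(name for name in names_a & names_b if a[name] != b[name])
    return only_a, only_b, changed


if __name__ == "__main__":
    folder_a = {"index.cfm": (120, 1.0), "logo.png": (900, 2.0), "old.txt": (5, 3.0)}
    folder_b = {"index.cfm": (120, 1.0), "logo.png": (950, 2.5), "new.txt": (7, 4.0)}
    print(compare_listings(folder_a, folder_b))
    # (['old.txt'], ['new.txt'], ['logo.png'])
```

Comparing each (size, date) tuple as a whole is what makes "appear different" mean "the size or the date differs".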

If you want to go crazy and make it recursive, that's fine, but again, there is no need; this is just for fun. (Although honestly, this could be quite useful!)
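For a recursive version, the main change is the key each file is stored under. Use its path relative to the folder being scanned, so that `a.txt` and `sub/a.txt` stay distinct. Joe's custom tag in the comments does this with `right()` and `len()`. Below is a hedged Python sketch using `os.walk` and `os.path.relpath`. `index_directory` and `demo` are hypothetical names, and "/" is assumed as the separator on every platform. The result is a relative-path to (size, date) mapping that can be compared with ordinary set operations.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def index_directory(root: str) -> Dict[str, Tuple[int, float]]:
    """Map each file's path relative to root ("/"-separated) to (size, mtime)."""
    index: Dict[str, Tuple[int, float]] = {}
    for dirpath, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            # Keying on the relative path keeps a.txt and sub/a.txt apart.
            key = os.path.relpath(full_path, root).replace(os.sep, "/")
            info = os.stat(full_path)
            index[key] = (info.st_size, info.st_mtime)
    return index


def demo() -> List[str]:
    """Build a small throwaway tree and return its sorted keys."""
    with tempfile.TemporaryDirectory() as root:
        for rel in ("a.txt", os.path.join("sub", "a.txt")):
            path = os.path.join(root, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as fh:
                fh.write("hello")
        return sorted(index_directory(root))


if __name__ == "__main__":
    print(demo())  # ['a.txt', 'sub/a.txt']
```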


  • Commented on 12-07-2007 at 10:26 AM
    Worse than I thought! It seems that QoQ doesn't like subqueries in the WHERE clause, so I had to do some nastiness, especially in accounting for recursion (computing relative file names, etc.).

    Here's what I've got:

    <cfif thisTag.executionMode eq "start">

    <!--- Name of the caller variable that receives the result (attribute name assumed). --->
    <cfparam name="attributes.returnVariable" type="string" default="directoryCompare" />
    <cfparam name="attributes.directoryOne" type="string" />
    <cfparam name="attributes.directoryTwo" type="string" />

    <cfset result = structNew() />

    <cfif not directoryExists(attributes.directoryOne)>
       <cfthrow message="'#attributes.directoryOne#' doesn't exist." />
    </cfif>

    <cfif not directoryExists(attributes.directoryTwo)>
       <cfthrow message="'#attributes.directoryTwo#' doesn't exist." />
    </cfif>

    <cfdirectory name="filesOne" action="list" recurse="true" directory="#attributes.directoryOne#" />
    <cfdirectory name="filesTwo" action="list" recurse="true" directory="#attributes.directoryTwo#" />

    <!---
       Ray's a mean bastard. It looks like you can't use subqueries in the WHERE clause of a QoQ,
       so my initial idea of just using SQL is bunk.
       So I says to myself: create a simple list and do an "IN".
       Doesn't help much with size/date comparison, though.
       Final answer = map keyed by relative path.
    --->

    <!--- Start with empty maps so an empty directory still yields a struct to loop over. --->
    <cfset fileMapOne = structNew() />
    <cfset fileMapTwo = structNew() />

    <cfloop list="One,Two" index="i">
       <cfloop query="files#i#">
          <cfif variables["files" & i].type eq "File">
             <cfif i eq "One">
                <cfset origDir = attributes.directoryOne />
             <cfelse>
                <cfset origDir = attributes.directoryTwo />
             </cfif>
             <!--- Key = file path relative to the directory being compared. --->
             <cfset key = right(variables["files" & i].directory & "/", len(variables["files" & i].directory) - len(origDir) + 1) & variables["files" & i].name />
             <cfset variables["fileMap" & i][key] = structNew() />
             <cfset variables["fileMap" & i][key].dateLastModified = variables["files" & i].dateLastModified />
             <cfset variables["fileMap" & i][key].size = variables["files" & i].size />
          </cfif>
       </cfloop>
    </cfloop>

    <!--- Build unique list of files. --->
    <cfset result.uniqueInFirstDirectory = "" />
    <cfloop collection="#fileMapOne#" item="i">
       <cfif not structKeyExists(fileMapTwo, i)>
          <cfset result.uniqueInFirstDirectory = listAppend(result.uniqueInFirstDirectory, i) />
       </cfif>
    </cfloop>
    <cfset result.uniqueInSecondDirectory = "" />
    <cfloop collection="#fileMapTwo#" item="i">
       <cfif not structKeyExists(fileMapOne, i)>
          <cfset result.uniqueInSecondDirectory = listAppend(result.uniqueInSecondDirectory, i) />
       </cfif>
    </cfloop>

    <!--- Files present in both directories whose size or date differs. --->
    <cfset similarFileMap = structNew() />
    <cfloop collection="#fileMapOne#" item="i">
       <cfif structKeyExists(fileMapTwo, i)
                and (
                   fileMapTwo[i].size neq fileMapOne[i].size
                   or fileMapTwo[i].dateLastModified neq fileMapOne[i].dateLastModified
                )
                and not structKeyExists(similarFileMap, i)>
          <cfset similarFileMap[i] = i />
       </cfif>
    </cfloop>

    <cfset result.similarFiles = structKeyList(similarFileMap) />

    <cfset caller[attributes.returnVariable] = result />

    </cfif>

  • Commented on 12-07-2007 at 10:38 AM
    I gave ColdFusion query of queries a shot:
  • Commented on 12-07-2007 at 10:42 AM
    I like Joe's approach. His struct-key indexing will be faster and more scalable than my query-of-queries approach :)
  • Commented on 12-07-2007 at 11:00 AM
    Joe, one comment. When writing a custom tag that only runs in start mode, I think it is much better to simply end your tag with

    <cfexit method="exittag">

    instead of wrapping your entire tag in a CFIF. Not only is it less code, but I think whole pages wrapped in one CFIF are bad form.
  • Commented on 12-07-2007 at 11:11 AM
    I was about to type this up, but basically I would have done the same thing as Ben, except:

    * instead of looping in the QoQ, I would have done a NOT IN and passed a ValueList() of the name column through a cfqueryparam with list="true".
  • Commented on 12-07-2007 at 11:15 AM
    Not too bad at all. This could actually come in handy from time to time, so a very useful challenge, Ray.

    Here we go...

    <cffunction name="comparedirs" access="public" returntype="struct" output="false">

       <cfargument name="firstdir" type="string" required="yes">
       <cfargument name="seconddir" type="string" required="yes">

       <!--- var-scope every local so nothing leaks out of the function --->
       <cfset var returnvar = structNew() />
       <cfset var myfirstdir = "" />
       <cfset var myseconddir = "" />
       <cfset var getseconddirinfo = "" />
       <cfset var firstdirnameslist = "" />
       <cfset var seconddirnameslist = "" />

       <cfif not directoryExists(arguments.firstdir)>
          <cfthrow message="Hey, this directory #arguments.firstdir# doesn't exist!" />
       </cfif>

       <cfif not directoryExists(arguments.seconddir)>
          <cfthrow message="Hey, this directory #arguments.seconddir# doesn't exist!" />
       </cfif>

       <cfdirectory name="myfirstdir" action="list" recurse="true" directory="#arguments.firstdir#" />
       <cfdirectory name="myseconddir" action="list" recurse="true" directory="#arguments.seconddir#" />

       <!--- Set up the name list for both dirs --->
       <cfloop query="myfirstdir">
          <cfset firstdirnameslist = listAppend(firstdirnameslist, myfirstdir.name) />
       </cfloop>
       <cfloop query="myseconddir">
          <cfset seconddirnameslist = listAppend(seconddirnameslist, myseconddir.name) />
       </cfloop>

       <!--- Find all the unique names and put them in a list to return --->
       <cfset returnvar.uniqueTofirstdir = "" />
       <cfloop query="myfirstdir">
          <cfif not listFindNoCase(seconddirnameslist, myfirstdir.name)>
             <cfset returnvar.uniqueTofirstdir = listAppend(returnvar.uniqueTofirstdir, myfirstdir.name) />
          </cfif>
       </cfloop>
       <cfset returnvar.uniqueToseconddir = "" />
       <cfloop query="myseconddir">
          <cfif not listFindNoCase(firstdirnameslist, myseconddir.name)>
             <cfset returnvar.uniqueToseconddir = listAppend(returnvar.uniqueToseconddir, myseconddir.name) />
          </cfif>
       </cfloop>

       <!--- Find all the files that almost match, but not quite, and put them in a list to return --->
       <cfset returnvar.almostmatching = "" />
       <cfloop query="myfirstdir">
          <cfif listFindNoCase(seconddirnameslist, myfirstdir.name)>
             <cfquery name="getseconddirinfo" dbtype="query">
                SELECT dateLastModified, size FROM myseconddir
                WHERE name = <cfqueryparam value="#myfirstdir.name#" cfsqltype="cf_sql_varchar" />
             </cfquery>
             <!--- Compare the last modified dates and sizes of the file that exists in both dirs --->
             <cfif (myfirstdir.dateLastModified NEQ getseconddirinfo.dateLastModified) OR (myfirstdir.size NEQ getseconddirinfo.size)>
                <cfset returnvar.almostmatching = listAppend(returnvar.almostmatching, myfirstdir.name) />
             </cfif>
          </cfif>
       </cfloop>

       <cfreturn returnvar />
    </cffunction>


    <cfset result = comparedirs(expandPath("./A/"), expandPath("./B/")) />
    <cfdump var="#result#">
  • Commented on 12-07-2007 at 11:19 AM

    My only concern with the CFQueryParam approach is that I have run into problems where I max out the number of parameter bindings a query can have :) That has only happened on direct SQL queries, so it might not pertain to query of queries, but I think anything over 3,000 bindings crashes the request (I have some NOT so well thought out approaches!!)
  • Commented on 12-07-2007 at 11:50 AM
    @Ben - fair point. I've encountered the same thing in the past. I would just never allow a top-level folder to reach that number of files without some kind of reorganization. Personal habit, though.
  • Commented on 12-07-2007 at 11:56 AM

    Agreed. If for no other reason (of which there are plenty), it would take my Explorer too long to load the list.

    Where I have run into the upper limit on param binding is when using massive ID lists. Sometimes I try to lump too much stuff into a single query.
  • Commented on 12-07-2007 at 12:05 PM
    QoQ solution:

    <cffunction name="DirDiff" returntype="query" output="false">
       <cfargument name="L" type="string" required="true">
       <cfargument name="R" type="string" required="true">
       <cfset var Result=QueryNew("Name,Side")>
       <cfset var LQ="">
       <cfset var RQ="">
       <cfif DirectoryExists(Arguments.L) AND DirectoryExists(Arguments.R)>
          <cfdirectory name="LQ" directory="#Arguments.L#" action="LIST">
          <cfdirectory name="RQ" directory="#Arguments.R#" action="LIST">
          <!--- Tag every name with its side; names present in both also get a 'BOTH' row. --->
          <cfquery dbtype="query" name="Result">
          SELECT LQ.Name AS Name, 'LEFT' AS Side FROM LQ
          UNION ALL
          SELECT RQ.Name AS Name, 'RIGHT' AS Side FROM RQ
          UNION ALL
          SELECT LQ.Name AS Name, 'BOTH' AS Side FROM LQ, RQ WHERE (LQ.Name = RQ.Name)
          </cfquery>
          <!--- 'BOTH' < 'LEFT' < 'RIGHT', so MIN(Side) keeps one label per name, preferring BOTH. --->
          <cfquery dbtype="query" name="Result">
          SELECT Name AS Name, MIN(Side) AS Side FROM Result GROUP BY Name ORDER BY Name
          </cfquery>
       </cfif>
       <cfreturn Result>
    </cffunction>
  • Commented on 12-07-2007 at 12:39 PM
    To handle the case where the files are in both places but are different, modify the first QoQ to:

    SELECT LQ.Name AS Name, 'LEFT' AS Side FROM LQ
    UNION ALL
    SELECT RQ.Name AS Name, 'RIGHT' AS Side FROM RQ
    UNION ALL
    SELECT LQ.Name AS Name, 'BOTH' AS Side FROM LQ, RQ WHERE (LQ.Name = RQ.Name) AND (LQ.Size = RQ.Size) AND (LQ.DateLastModified = RQ.DateLastModified)

    It's totally cheating, I know. But I wonder how it scales in comparison with the struct-indexed method...?
  • Commented on 12-07-2007 at 6:07 PM
    RickO's method looks the cleanest so far. Very slick way of doing it.

    RickO's and Ben's are also the only ones that don't fail badly when a file name contains a comma.

    Performance testing Joe's method against RickO's, I get the following averages:

    Joe: 177.565ms, 317.96ms, 155.65ms
    Joe (UDF): 155.5ms, 407.205ms, 100.025ms
    RickO: 104.63ms, 287.545ms, 71.435ms

    So I'd say it scales very well, RickO! Also of importance: custom tags aren't that much slower than functions.

    (Test was done on a PowerBook G4 1.33 GHz, 1.25 GB of RAM, 4200 RPM HD, CF8 on Java 1.5; Dir1 size: 140, Dir2 size: 12; executions: 200; runs: 3)
  • Commented on 12-07-2007 at 8:08 PM
    Here is a link to my original solution.
    I look forward to comparing it to the posts above.
  • Commented on 12-08-2007 at 2:43 PM
    Rick rocks the ColdFusion hardcore style :) I always learn something really cool from him.
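RickO's DirDiff trick can be tried outside ColdFusion. It labels every name LEFT, RIGHT or BOTH with UNION ALL, then keeps MIN(Side) per name. The sketch below uses Python, with an in-memory SQLite database standing in for query of queries. The table names `lq`/`rq`, the `size`/`modified` columns and the `strict` flag are assumptions made for illustration. Setting `strict` switches on the 12:39 PM variant that also requires equal size and date before a name counts as BOTH.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical row shape for this sketch: (name, size, last_modified).
Row = Tuple[str, int, float]


def dir_diff(left: List[Row], right: List[Row], strict: bool = False) -> List[Tuple[str, str]]:
    """Label every file name LEFT, RIGHT or BOTH, mirroring the DirDiff QoQ."""
    conn = sqlite3.connect(":memory:")
    try:
        for table, rows in (("lq", left), ("rq", right)):
            conn.execute(f"CREATE TABLE {table} (name TEXT, size INTEGER, modified REAL)")
            conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
        # strict=True adds the size/date checks from the modified query.
        both_condition = "lq.name = rq.name"
        if strict:
            both_condition += " AND lq.size = rq.size AND lq.modified = rq.modified"
        sql = f"""
            SELECT name, MIN(side) AS side FROM (
                SELECT name, 'LEFT' AS side FROM lq
                UNION ALL
                SELECT name, 'RIGHT' AS side FROM rq
                UNION ALL
                SELECT lq.name, 'BOTH' AS side FROM lq, rq WHERE {both_condition}
            ) AS tagged
            GROUP BY name
            ORDER BY name
        """
        # 'BOTH' < 'LEFT' < 'RIGHT', so MIN(side) prefers BOTH whenever a match exists.
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    left = [("a.txt", 10, 1.0), ("b.txt", 20, 2.0)]
    right = [("a.txt", 10, 1.0), ("b.txt", 25, 2.0), ("c.txt", 5, 3.0)]
    print(dir_diff(left, right))
    # [('a.txt', 'BOTH'), ('b.txt', 'BOTH'), ('c.txt', 'RIGHT')]
    print(dir_diff(left, right, strict=True))
    # [('a.txt', 'BOTH'), ('b.txt', 'LEFT'), ('c.txt', 'RIGHT')]
```

With `strict=True`, `b.txt` (same name, different size) comes back as LEFT. That is the "cheating" part: a changed file is reported just like a left-only file, and its right-hand copy is not listed separately. Because names travel as bound values rather than comma-delimited lists, a name such as `x, y.txt` stays intact.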