Raymond Camden's Blog Rss

Friday Puzzle - Welcome to Santa's IT Department

17

Posted in ColdFusion | Posted on 12-01-2006 | 4,071 views

For today's puzzler you find yourself newly employed in Santa's Workshop. While Santa is quite magical, full of joy, blah blah blah, his IT skills leave a bit to be desired. In fact, his current IT department currently consists of one elf who wants to be a dentist and some large furry creature.

Your first task is simple. Santa maintains a list of kids who are naughty and nice. He used to keep this list on a scroll of parchment but after recently upgrading to Windows 95 (hey, he takes his time with changes!) Santa began using Notepad.

Unfortunately - Notepad isn't a very nice database and this has led to some problems. Duplicate names and kids on two lists.

Write a simple function to read in two text files (good.txt and naughty.txt). Compare the files and remove any child on both lists and add them to a new file, tobedecided.txt. Also remove any child who appears more than once in a list.

Good luck - Santa is counting on you!

Comments

[Add Comment] [Subscribe to Comments]

just names on the lists?
Nod. Just names. Santa is smart enough to know one John Smith from another. (Or maybe he isn't. Maybe next Friday... ;)
Originally my lists contained a name on each line, but was having trouble converting that to a list, what would be the delimiter? \r? Once I made my good and bad lists comma delimited it worked.

<cffile action="read" file="D:\path\good.txt" variable="goodkids" />
<cffile action="read" file="D:\path\bad.txt" variable="badkids" />
<cfloop list="#goodkids#" index="i">
   <cfif listFindNoCase(badkids, i) neq 0>
      <cffile action="write" file="D:\path\tobedeterminded.txt" output="#i#" addnewline="yes">
   </cfif>
</cfloop>
<cffile action="read" file="D:\path\tobedeterminded.txt" variable="tbd" />
<cfoutput><pre>#tbd#</pre></cfoutput>
Good start Tim - but don't forget you also need to remove (from both lists) the names of kids on both. The second task of removing dupes in general isn't quite as important.
Since I got distracted I'll concede victory to Tim.

Tim, to convert to a list, just steal - er borrow - some of ray's blog cfc code -

<cfset theList = replace(str, chr(10), ",", "all")>
<cfset theList = replace(str, chr(13), ",", "all")>
Also Tim, I'd maybe save the output in a variable and write to file once rather than for each iteration of the loop. &lt;/myTwoCents&gt;
Here is my method. It tags the good and naughty FILE PATHS and creates the tobedecided.txt in the same directory.


<cffunction name="FixSantasList" access="public" returntype="void" output="false">
   
   <!--- Define arguments. --->
   <cfargument name="GoodFile" type="string" required="true" />
   <cfargument name="NaughtyFile" type="string" required="true" />
   
   <!--- Define the local scope. --->
   <cfset var LOCAL = StructNew() />
   
   <!--- Read in good text. --->
   <cffile
      action="READ"
      file="#ARGUMENTS.GoodFile#"
      variable="LOCAL.GoodData"
      />
      
   <!--- Read in the naughty text. --->
   <cffile
      action="READ"
      file="#ARGUMENTS.NaughtyFile#"
      variable="LOCAL.NaughtyData"
      />
      
   <!---
      Create a file path for the undecided data. We will be putting
      this in the same directory as the other files.
   --->
   <cfset LOCAL.UndecidedFile = (
      GetDirectoryFromPath( ARGUMENTS.GoodFile ) &
      "tobedecided.txt"
      ) />
      
   
   <!--- Create indexes for the good, naugty, and undecided. --->
   <cfset LOCAL.GoodIndex = StructNew() />
   <cfset LOCAL.NaughtyIndex = StructNew() />
   <cfset LOCAL.UndecidedIndex = StructNew() />
   
   <!--- Loop over the good names and add to index. --->
   <cfloop index="LOCAL.Name" list="#LOCAL.GoodData#" delimiters="#Chr( 13 )##Chr( 10 )#">
   
      <cfset LOCAL.GoodIndex[ LOCAL.Name ] = LOCAL.Name />
   
   </cfloop>
   
   <!--- Loop over the naughty names and add to index. --->
   <cfloop index="LOCAL.Name" list="#LOCAL.NaughtyData#" delimiters="#Chr( 13 )##Chr( 10 )#">
   
      <cfset LOCAL.NaughtyIndex[ LOCAL.Name ] = LOCAL.Name />
   
   </cfloop>
   
   
   <!---
      ASSERT: At this point, the good and naughty indexes should
      ONLY have unique name entries. However, there may be names
      that exist in both.
   --->
   
   <!---
      Loop over the keys in the good index to see if they exist in
      the naughty index. If they do, they have to be removed from
      both and put into the undecided index.
   --->
   <cfloop item="LOCAL.Name" collection="#LOCAL.GoodIndex#">
      
      <!--- Check to see if it exists in naughty as well. --->
      <cfif StructKeyExists( LOCAL.NaughtyIndex, LOCAL.Name )>
      
         <!--- Add the name to the undecided index. --->
         <cfset LOCAL.UndecidedIndex[ LOCAL.Name ] = LOCAL.Name />
         
         <!--- Remove from the good and naughty index. --->
         <cfset StructDelete( LOCAL.GoodIndex, LOCAL.Name ) />
         <cfset StructDelete( LOCAL.NaughtyIndex, LOCAL.Name ) />
      
      </cfif>
   
   </cfloop>
   
   
   <!---
      ASSERT: At this point, all names should be in the appropriate
      indexes and should only appear one. Now, they need to be
      commited to file.
   --->
   
   
   <!--- Write the good names to the file. --->
   <cffile
      action="WRITE"
      file="#ARGUMENTS.GoodFile#"
      output="#ArrayToList( StructKeyArray( LOCAL.GoodIndex ), '#Chr( 13 )##Chr( 10 )#' ).Trim()#"
      />
      
   
   <!--- Write the naughty names to the file. --->
   <cffile
      action="WRITE"
      file="#ARGUMENTS.NaughtyFile#"
      output="#ArrayToList( StructKeyArray( LOCAL.NaughtyIndex ), '#Chr( 13 )##Chr( 10 )#' ).Trim()#"
      />
      
   <!--- Write the undecided names to the file. --->
   <cffile
      action="WRITE"
      file="#LOCAL.UndecidedFile#"
      output="#ArrayToList( StructKeyArray( LOCAL.UndecidedIndex ), '#Chr( 13 )##Chr( 10 )#' ).Trim()#"
      />
   
   <!---
      ***NOTE: I made this because I do NOT trust the case sensitivity of the
      key list returned by the structure. However, for short hand, I will use
      the above which does rely on sturct key last maintaining case sensitivity.
   --->
   <!---
   <!--- Create a string buffer for the good names. --->
   <cfset LOCAL.NameBuffer = CreateObject( "java", "java.lang.StringBuffer" ).Init() />
   
   <!--- Loop over good names and add to string buffer. --->
   <cfloop item="LOCAL.Name" collection="#LOCAL.GoodIndex#">
   
      <cfset LOCAL.NameBuffer.Append(
         LOCAL.GoodIndex[ LOCAL.Name ] &
         Chr( 13) & Chr( 10 )
         ) />
   
   </cfloop>
   
   <!--- Write the buffer to the file. --->
   <cffile
      action="WRITE"
      file="#ARGUMENTS.GoodFile#"
      output="#LOCAL.NameBuffer.ToString().Trim()#"
      />
      
      
   <!--- Create a string buffer for the good names. --->
   <cfset LOCAL.NameBuffer = CreateObject( "java", "java.lang.StringBuffer" ).Init() />
   
   <!--- Loop over naughty names and add to string buffer. --->
   <cfloop item="LOCAL.Name" collection="#LOCAL.NaughtyIndex#">
   
      <cfset LOCAL.NameBuffer.Append(
         LOCAL.NaughtyIndex[ LOCAL.Name ] &
         Chr( 13) & Chr( 10 )
         ) />
   
   </cfloop>
   
   <!--- Write the buffer to the file. --->
   <cffile
      action="WRITE"
      file="#ARGUMENTS.NaughtyFile#"
      output="#LOCAL.NameBuffer.ToString().Trim()#"
      />
   
   
   <!--- Create a string buffer for the good names. --->
   <cfset LOCAL.NameBuffer = CreateObject( "java", "java.lang.StringBuffer" ).Init() />
   
   <!--- Loop over undecided names and add to string buffer. --->
   <cfloop item="LOCAL.Name" collection="#LOCAL.UndecidedIndex#">
   
      <cfset LOCAL.NameBuffer.Append(
         LOCAL.UndecidedIndex[ LOCAL.Name ] &
         Chr( 13) & Chr( 10 )
         ) />
   
   </cfloop>
   
   <!--- Write the buffer to the file. --->
   <cffile
      action="WRITE"
      file="#LOCAL.UndecidedFile#"
      output="#LOCAL.NameBuffer.ToString().Trim()#"
      />
      
   --->
      
   <!--- Return out. --->
   <cfreturn />
</cffunction>
Nice Ben. Also - people - note how his use of structs to suck in the names will also take care of dupes in the same list.
Thanks. I learned the Struct() as a look-up from Rick Osborne (one smart dude). Also, afterwards, I realize, I didn't need to convert to StructKeyArray() then to list. I could have gone right to StructKeyList() and passed in the same delimiters. I forgot that that function takes delimiters :D
OK, I changed my mind - here's my entry.

Since Santa loves reusability we decided to build an entire component for him. I hope someone catches my obscure RUN-DMC reference.

<cfcomponent displayname="santa">
   <cffunction name="toLines" returntype="string" hint="i make a list into lines" output="false">
      <cfargument name="str" hint="the list to turn into lines" required="true">
      <cfreturn replace(arguments.str, ",", chr(10), "all") />
   </cffunction>
   
   <cffunction name="toList" returntype="string" hint="i get the txt file and turn it into a list" output="false">
      <cfargument name="str" hint="the string to make a list out of" type="string" required="true">
      <cfset var returnStr = "">
      <cfset returnStr = replace(arguments.str, chr(10), ",", "all")>
      <cfset returnStr = replace(arguments.str, chr(13), ",", "all")>
      <cfreturn returnStr />
   </cffunction>
   
   <cffunction name="readFile" returntype="string" hint="i read the files" output="false">
      <cfargument name="fileToRead" hint="the full path to the file to read" required="true">
      <cfset var returnStr = "">
      <cffile action="read" file="#arguments.fileToRead#" variable="returnStr">
      <cfreturn returnStr />   
   </cffunction>
   
   <cffunction name="listNoDups" hint="i return a list with no duplicate items" output="false" returntype="string">
      <cfargument name="theList" hint="the list to clean up" required="true">
      <cfset var returnStr = "">
      <cfset var i = "">
      
      <cfloop from="1" to="#listLen(arguments.theList)#" index="i">
         <cfif NOT listContainsNoCase(returnStr, listGetAt(arguments.theList, i))>
            <cfset returnStr = listAppend(returnStr, listGetAt(arguments.theList, i))>
         </cfif>
      </cfloop>
      <cfreturn returnStr />
   </cffunction>
   
   <cffunction name="listCompare" hint="i compare two lists" output="false" returntype="string">
      <cfargument name="listOne" type="string" hint="the first list" required="true">
      <cfargument name="listTwo" type="string" hint="the second list" required="true">
      <cfset var tbdList = "">
      <cfset var i = "">
      <cfset var cleanListOne = listNoDups(arguments.listOne)>
      <cfset var cleanListTwo = listNoDups(arguments.listTwo)>
      <cfdump var="#listOne#"><cfdump var="#listTwo#">
      <cfloop from="1" to="#listLen(arguments.listOne)#" index="i">
         <cfif listContainsNoCase(cleanListTwo, listGetAt(cleanListOne, i))>
            <cfset tbdList = listAppend(tbdList, listGetAt(cleanListOne, i))>
         </cfif>
      </cfloop>
      <cfreturn tbdList />
   </cffunction>
   
</cfcomponent>

christmasInHollis.cfm
---------------------
<cfset variables.goodTxt = expandPath("good.txt")>
<cfset variables.naughtyTxt = expandPath("naughty.txt")>
<cfset variables.goodList = readFile(variables.goodTxt)>
<cfset variables.naughtyList = readFile(variables.naughtyTxt)>
<cfset variables.tbd = listCompare(toList(variables.goodList), toList(variables.naughtyList))>
<cffile action="write" file="#expandPath("tbd.txt")#" output="#toLines(variables.tbd)#">
Happy Friday everyone! :-) I don't think that Santa's machine has enough memory to read 2 HUGES text files (trillian records) then combine them. Beside, Coldfusion will take a lot of time to read them LINE-BY-LINE.... I'll show Santa to use free MySQL to do it... :-)

Please read this post from Ben Forta:
http://www.forta.com/blog/index.cfm/2006/12/1/Cold...
But you have to remember that Santa's computer is only limited by the amount of Christmas Cheer in the world, not by RAM and Processing speed ;)
Wow, this beats the heck out of real work.

<!-- Read --->
<cffile action="read" file="#expandPath("./good.txt")#" variable="goodFileContent" />
<cffile action="read" file="#expandPath("./bad.txt")#" variable="badFileContent" />

<!--- Maps elim dups --->
<cfset tbd = structNew() />
<cfset good = structNew() />
<cfset bad = structNew() />

<!--- Bad -> map --->
<cfloop list="#badFileContent#" index="i" delimiters="#chr(10)#">
   <cfset bad[i] = "" />
</cfloop>

<!--- Good -> filtered into approp. maps --->
<cfloop list="#goodFileContent#" index="i" delimiters="#chr(10)#">
   <cfset dupPosition = listFindNoCase(badFileContent, i, chr(10)) />
   <cfif dupPosition>
      <cfset tbd[i] = "" />
      <cfset structDelete(bad, i) />
   <cfelse>
      <cfset good[i] = "" />
   </cfif>
</cfloop>

<!--- Write --->
<cffile action="write" file="#expandPath("./newGood.txt")#" output="#listChangeDelims(structKeyList(good), chr(10))#" />
<cffile action="write" file="#expandPath("./newBad.txt")#" output="#listChangeDelims(structKeyList(bad), chr(10))#" />
<cffile action="write" file="#expandPath("./newTbd.txt")#" output="#listChangeDelims(structKeyList(tbd), chr(10))#" />
I agree with Tuyen and the Forta reference. Two large lists (hundreds of millions of rows apiece?), perfect fodder for a DBMS.

In my case, do two BULK INSERT statements, which would insert the data from the two lists into a SQL Server table in no time. Then, write one SQL query to generate the results you want.

Awesome ideas on the use of structures in this case to help handle duplicates!

Tony

PS - This page http://www.un.org/esa/socdev/ageing/agewpop1.htm seems to indicate that as of the year 2000, 30% of the population was 15 years old or younger. It doesn't say how many were "good" and "bad". :(
If Santa doesn't have a database handy, CF provides a workaround. CFHTTP to read in the files as queries, QoQ to extract distincts as well as any that appear in both lists, then write the files back out. Like this:

<cfset thisFileName = listLast(CGI.SCRIPT_NAME, "/") />
<cfset baseFolder = replace(CGI.SCRIPT_NAME, thisFileName, "") />
<cfset baseURL = "http://"; & CGI.SERVER_NAME & ":" & CGI.SERVER_PORT & baseFolder />
<cfset baseFilePath = replace(CGI.CF_TEMPLATE_PATH, thisFileName, "") />

<cfset goodURL = baseURL & "good.txt" />
<cfset naughtyURL = baseURL & "naughty.txt" />

<cfset goodFile = baseFilePath & "good.txt" />
<cfset naughtyFile = baseFilePath & "naughty.txt" />
<cfset undecidedFile = baseFilePath & "tobedecided.txt" />

<cfhttp method="get" columns="name" firstrowasheaders="false" url="#goodURL#" name="goodKids" />
<cfhttp method="get" columns="name" firstrowasheaders="false" url="#naughtyURL#" name="naughtyKids" />

<cfquery dbtype="query" name="uniqueGoodKids">
   SELECT DISTINCT name
   FROM goodKids
   WHERE name NOT IN (#quotedValueList(naughtyKids.name)#)
</cfquery>

<cfquery dbtype="query" name="uniqueNaughtyKids">
   SELECT DISTINCT name
   FROM naughtyKids
   WHERE name NOT IN (#quotedValueList(goodKids.name)#)
</cfquery>

<cfquery dbtype="query" name="undecidedKids">
   SELECT DISTINCT name
   FROM goodKids
   WHERE name IN (#quotedValueList(naughtyKids.name)#)
</cfquery>

<cfset newLine = "
" />

<cffile action="write" file="#goodFile#" addnewline="false" output="#valueList(uniqueGoodKids.name, newLine)#" fixnewline="true" />
<cffile action="write" file="#naughtyFile#" addnewline="false" output="#valueList(uniqueNaughtyKids.name, newLine)#" fixnewline="true" />
<cffile action="write" file="#undecidedFile#" addnewline="false" output="#valueList(undecidedKids.name, newLine)#" fixnewline="true" />
My approach:
--------------

<!--- Configuration settings --->
<CFSCRIPT>
sGoodFileName = "good";
sBadFileName = "naughty";
sFileExtension = "txt";
sTBD_FileName = "tobedecided." & sFileExtension;

sDSN = "MySQL_Test";
sDataBaseName = "SantySnoop";

/*--- Actually use different filenames for testing ease. ---
*/
sGoodFileIn = sGoodFileName & "_src." & sFileExtension;
sBadFileIn = sBadFileName & "_src." & sFileExtension;
sGoodFileOut = sGoodFileName & "_out." & sFileExtension;
sBadFileOut = sBadFileName & "_out." & sFileExtension;

sBasePath = GetDirectoryFromPath (GetCurrentTemplatePath ());
</CFSCRIPT>

<CFOUTPUT>
<title>Fixes Santa's naughty/nice lists</title>
<h1>Fixes Santa's naughty/nice lists</h1>
<p>
This application cleans up Santa's "Do not cry" and "Terrible Tot watch" lists to comply with NSA requirements
and USC BR-549 (13)(b)(ii) and TSA "whim of the week" TSA-88961473-5657-223.
</p>

<h3>Operation:</h3>
<ol>
<li> Read two source files, <i>#sGoodFileIn#</i> and <i>#sBadFileIn#</i>.
<li> Remove duplicate names (case insensitive) from each list.
<li> If any name appears on both lists, remove it to a third list, &quot;TBD&quot;.
<li> Write out the cleaned up &quot;good&quot; list to <i>#sGoodFileOut#</i>.
<li> Write out the cleaned up &quot;bad&quot; list to <i>#sBadFileOut#</i>.
<li> Write out any &quot;TBD&quot; names to <i>#sTBD_FileName#</i>.
<li> Done!
</ol>

<h3>Important!</h3>
<ul>
<li> Almost all of the work here is done in SQL (as it should be).
<li> In this case, we use MySQL only but ANY RDBMS will do this better than a pure Coldfusion approach.
<li> This code requires a valid MySQL datasource.
<li> It Creates 5 tables in a database named &quot;#sDataBaseName#&quot;.
<li> The text files are plain text with one name per line (EG &quot;Jim Johnson&quot;).
</ul>
</CFOUTPUT>


<!--- Create the database and tables if they do not already exist.
--->
<CFSCRIPT>
oDB_Func = CreateObject ("component", "cfcSantaListDB").oInit (sDSN, sDataBaseName);
oDB_Func.SetupDataBase ();
oDB_Func.CreateStandardNameTable ('tGoodListRaw', 'no');
oDB_Func.CreateStandardNameTable ('tBadListRaw', 'no');
oDB_Func.CreateStandardNameTable ('tGoodListFinal', 'yes');
oDB_Func.CreateStandardNameTable ('tBadListFinal', 'yes');
oDB_Func.CreateStandardNameTable ('tTBD_ListFinal', 'yes');
</CFSCRIPT>


<!--- Get the initial stats. --->
<CFSET iGoodRowsInitial= oDB_Func.iGetRowCount ('tGoodListFinal')>
<CFSET iBadRowsInitial = oDB_Func.iGetRowCount ('tBadListFinal')>
<CFSET iTBDRowsInitial = oDB_Func.iGetRowCount ('tTBD_ListFinal')>


<!--- Import the raw lists. --->
<CFSET iGoodRowsRaw = oDB_Func.iImportTextData ('tGoodListRaw', '#sBasePath##sGoodFileIn#')>
<CFSET iBadRowsRaw = oDB_Func.iImportTextData ('tBadListRaw', '#sBasePath##sBadFileIn#')>


<!--- Clean up duplicates in each list. --->
<CFSET iGoodRowsNew = oDB_Func.iAddNewNamesToList ('tGoodListRaw', 'tGoodListFinal')>
<CFSET iBadRowsNew = oDB_Func.iAddNewNamesToList ('tBadListRaw', 'tBadListFinal')>


<!--- Find the "TBD" names --->
<CFSET iNumTBD_Names = oDB_Func.iStripOutNiceNaughtyDoers ('tGoodListFinal', 'tBadListFinal', 'tTBD_ListFinal')>


<!--- Write the final files --->
<CFSET iGoodRowsFinal = oDB_Func.iExportTextData ('tGoodListFinal', '#sBasePath##sGoodFileOut#')>
<CFSET iBadRowsFinal = oDB_Func.iExportTextData ('tBadListFinal', '#sBasePath##sBadFileOut#')>
<CFSET iTBDRowsFinal = oDB_Func.iExportTextData ('tTBD_ListFinal', '#sBasePath##sTBD_FileName#')>


<!--- Report --->
<CFOUTPUT>
<h3>Results:</h3>
<table class="SantasJudgement">
<tr>
<th>&nbsp;</th>
<th scope="col">Nice</th>
<th scope="col">Naughty</th>
<th scope="col">Need more snooping</th>
</tr>
<tr>
<th scope="row">Initial unique names:</th>
<td>#iGoodRowsInitial#</td>
<td>#iBadRowsInitial#</td>
<td>#iTBDRowsInitial#</td>
</tr>
<tr>
<th scope="row">Raw lines read:</th>
<td>#iGoodRowsRaw#</td>
<td>#iBadRowsRaw#</td>
<td>na</td>
</tr>
<tr>
<th scope="row">Duplicate, blank, or preexisting names:</th>
<td>#iGoodRowsRaw - iGoodRowsNew#</td>
<td>#iBadRowsRaw - iBadRowsNew#</td>
<td>na</td>
</tr>
<tr>
<th scope="row">New names added:</th>
<td>#iGoodRowsNew#</td>
<td>#iBadRowsNew#</td>
<td>na</td>
</tr>
<tr>
<th scope="row">Names appearing on both lists:</th>
<td>#iGoodRowsInitial + iGoodRowsNew - iGoodRowsFinal#</td>
<td>#iBadRowsInitial + iBadRowsNew - iBadRowsFinal#</td>
<td>#iNumTBD_Names# new</td>
</tr>
<tr>
<th scope="row">Final number of unique names:</th>
<td>#iGoodRowsFinal#</td>
<td>#iBadRowsFinal#</td>
<td>#iTBDRowsFinal#</td>
</tr>
</table>
</CFOUTPUT>

<!---
<CFDUMP var="#variables#">
--->

*****************************************************************************
***** Source for cfcSantaListDB.cfc starts here. *****
*****************************************************************************
<CFCOMPONENT
displayname = "SantaList_DB_Object"
output = "no"
hint = "Encapsulates the DB functions for our Santa list application."
>
<!--- For this Q&D example code, we avoid the data abstraction layer and just
use MySQL syntax (Version 5.0).
--->

<!--- Private object globals. --->
<CFSCRIPT>
sDSN = "";
sDB_Name = "";
sLineTerm = "\r\n";

oRunTime = CreateObject ("java", "java.lang.Thread");
</CFSCRIPT>


<CFFUNCTION
name = "oInit"
displayname = "Initialize function/constructor"
output = "no"
returntype = "cfcSantaListDB"
hint = "Creates the object with the specified DSN and DB name."
>
<CFARGUMENT name="sDataSourceName" type="variablename" required="yes" hint="MySQL only for now.">
<CFARGUMENT name="sDatabaseName" type="variablename" default="SantySnoop">


<CFSET Variables.sDSN = Arguments.sDataSourceName>
<CFSET Variables.sDB_Name = Arguments.sDatabaseName>

<CFRETURN this>
</CFFUNCTION>


<CFFUNCTION name="SetupDataBase" output="no" returntype="void">

<CFQUERY name="qDB_Setup" datasource="#Variables.sDSN#">
CREATE DATABASE IF NOT EXISTS #Variables.sDB_Name#;
</CFQUERY>

<!--- Avoid race problems. Wait until DB shows up.
--->
<CFLOOP condition="1 EQ 1">
<CFSET Variables.oRunTime.sleep (200)> <!--- We often need to allow some time for MySQL to REALLY finish DB Alter ops. --->

<CFQUERY name="qDB_SetupChk" timeout="5" datasource="#Variables.sDSN#">
SHOW DATABASES LIKE '#LCase (Variables.sDB_Name)#';
</CFQUERY>

<CFIF qDB_SetupChk.RecordCount GTE 1>
<CFBREAK>
</CFIF>
</CFLOOP>
</CFFUNCTION>


<CFFUNCTION name="CreateStandardNameTable" output="no" returntype="void">
<CFARGUMENT name="sTableName" type="variablename" required="yes">
<CFARGUMENT name="bAddUniqueConstraint" type="boolean" default="no">

<CFQUERY name="qTableSetup" datasource="#Variables.sDSN#">
CREATE TABLE IF NOT EXISTS
#Variables.sDB_Name#.#Arguments.sTableName#
(
iId mediumint(9) NOT NULL auto_increment,
sName char(65) NOT NULL,
PRIMARY KEY (iId),
#IIf (Arguments.bAddUniqueConstraint, DE ("UNIQUE KEY sNameIdx "), DE ("INDEX "))# (sName)
);
</CFQUERY>

<!--- We don't seem to need to allow any extra time or checks for table creation.
SHOW TABLES FROM SantySnoop LIKE 'tGoodListRaw'
--->
</CFFUNCTION>


<CFFUNCTION name="iImportTextData" output="no" returntype="numeric">
<CFARGUMENT name="sTableName" type="variablename" required="yes">
<CFARGUMENT name="sFilePath" type="string" required="yes">
<CFARGUMENT name="bResetTable" type="boolean" default="yes">

<!--- Escape path slashes. --->
<CFSET var sEscFilePath = Replace (Arguments.sFilePath, "\", "\\", "ALL")>

<CFTRANSACTION>
<!--- CF BUG! The trailing select in the first query is not handled by CF!!
Thus we need to split the queries -- which we can do because we stay in
the same MySQL session.
--->
<CFQUERY name="qTextImport" datasource="#Variables.sDSN#">
<!--- Note that row_count() does not work with INFILE.
Also, spaces before () cause weird errors.
--->
<CFIF Arguments.bResetTable>
TRUNCATE TABLE #Variables.sDB_Name#.#Arguments.sTableName#;
</CFIF>

<!--- If we don't truncate the table, get the initial row count. --->
SET @iBeforeRaw = (SELECT Count(iID) FROM #Variables.sDB_Name#.#Arguments.sTableName#);

LOAD DATA INFILE '#sEscFilePath#' INTO TABLE #Variables.sDB_Name#.#Arguments.sTableName#
LINES TERMINATED BY '#Variables.sLineTerm#'
(sName);

SET @iAfterRaw = (SELECT Count(iID) FROM #Variables.sDB_Name#.#Arguments.sTableName#);

SELECT @iAfterRaw - @iBeforeRaw AS iRowsInserted; <!--- CF ignores!!! --->
</CFQUERY>

<CFQUERY name="qTextImportStat" datasource="#Variables.sDSN#">
SELECT @iAfterRaw - @iBeforeRaw AS iRowsInserted;
</CFQUERY>
</CFTRANSACTION>

<CFRETURN qTextImportStat['iRowsInserted'][1]>
</CFFUNCTION>


<CFFUNCTION name="iExportTextData" output="no" returntype="numeric">
<CFARGUMENT name="sTableName" type="variablename" required="yes">
<CFARGUMENT name="sFilePath" type="string" required="yes">

<!--- Escape path slashes. --->
<CFSET var sEscFilePath = Replace (Arguments.sFilePath, "\", "\\", "ALL")>

<CFIF FileExists (sEscFilePath)>
<CFFILE action="delete" file="#Arguments.sFilePath#">
</CFIF>

<CFTRANSACTION>
<!--- CF BUG! The trailing select in the first query is not handled by CF!!
Thus we need to split the queries -- which we can do because we stay in
the same MySQL session.
--->
<CFQUERY name="qTextExport" datasource="#Variables.sDSN#">
<!--- Note that row_count() does not work well here.
--->
SELECT
sName
FROM
#Variables.sDB_Name#.#Arguments.sTableName#
INTO OUTFILE
'#sEscFilePath#'
LINES TERMINATED BY
'#Variables.sLineTerm#'
;

SET @iWriteCnt = (SELECT ROW_COUNT()); <!--- This is wrong for many mysql versions! --->
</CFQUERY>

<CFQUERY name="qTextExportStat" datasource="#Variables.sDSN#">
SELECT
Count(iID) AS iRowsInserted
FROM
#Variables.sDB_Name#.#Arguments.sTableName#;
</CFQUERY>
</CFTRANSACTION>

<CFRETURN qTextExportStat['iRowsInserted'][1]>
</CFFUNCTION>


<CFFUNCTION name="iGetRowCount" output="no" returntype="numeric">
<CFARGUMENT name="sTableName" type="variablename" required="yes">

<CFQUERY name="qGetRowCount" datasource="#Variables.sDSN#">
SELECT
Count(iID) AS iNumRows
FROM
#Variables.sDB_Name#.#Arguments.sTableName#;
</CFQUERY>

<CFRETURN qGetRowCount['iNumRows'][1]>
</CFFUNCTION>


<CFFUNCTION name="iAddNewNamesToList" output="no" returntype="numeric">
<CFARGUMENT name="sSrcTableName" type="variablename" required="yes">
<CFARGUMENT name="sDestTableName" type="variablename" required="yes">
<CFARGUMENT name="bResetTable" type="boolean" default="no">

<CFTRANSACTION>
<!--- CF BUG! The trailing select in the first query is not handled by CF!!
Thus we need to split the queries -- which we can do because we stay in
the same MySQL session.
--->
<CFQUERY name="qNameClean" datasource="#Variables.sDSN#">
<CFIF Arguments.bResetTable>
TRUNCATE TABLE #Variables.sDB_Name#.#Arguments.sDestTableName#;
</CFIF>

Use #Variables.sDB_Name#;

INSERT INTO
#Arguments.sDestTableName# (sName)
SELECT DISTINCT
tRaw.sName
FROM
#Arguments.sSrcTableName# tRaw
LEFT JOIN
#Arguments.sDestTableName# tFin ON tRaw.sName = tFin.sName
WHERE
tFin.sName IS NULL
AND
tRaw.sName IS NOT NULL
AND
Trim(tRaw.sName) <> ''
;

SET @iNewRows = (SELECT ROW_COUNT());
</CFQUERY>

<CFQUERY name="qNameCleanStat" datasource="#Variables.sDSN#">
SELECT IFNULL(@iNewRows, 0) AS iRowsInserted;
</CFQUERY>
</CFTRANSACTION>

<CFRETURN qNameCleanStat['iRowsInserted'][1]>
</CFFUNCTION>


<CFFUNCTION name="iStripOutNiceNaughtyDoers" output="no" returntype="numeric">
<CFARGUMENT name="sGoodTableName" type="variablename" required="yes">
<CFARGUMENT name="sBadTableName" type="variablename" required="yes">
<CFARGUMENT name="sTBD_TableName" type="variablename" required="yes">
<CFARGUMENT name="bResetTable" type="boolean" default="no">

<CFTRANSACTION>
<!--- CF BUG! The trailing select in the first query is not handled by CF!!
Thus we need to split the queries -- which we can do because we stay in
the same MySQL session.
--->
<CFQUERY name="qTBD_Name" datasource="#Variables.sDSN#">
<CFIF Arguments.bResetTable>
TRUNCATE TABLE #Variables.sDB_Name#.#Arguments.sTBD_TableName#;
</CFIF>

Use #Variables.sDB_Name#;

INSERT INTO
#Arguments.sTBD_TableName# (sName)
SELECT
tBad.sName
FROM
#Arguments.sBadTableName# tBad
INNER JOIN
#Arguments.sGoodTableName# tGood ON tBad.sName = tGood.sName
LEFT JOIN
#Arguments.sTBD_TableName# tTBD ON tBad.sName = tTBD.sName
WHERE
tTBD.sName IS NULL
;

SET @iNewRows = (SELECT ROW_COUNT());


DELETE FROM
tBad
USING
#Arguments.sBadTableName# tBad,
#Arguments.sTBD_TableName# tTBD
WHERE
tBad.sName = tTBD.sName
;


DELETE FROM
tGood
USING
#Arguments.sGoodTableName# tGood,
#Arguments.sTBD_TableName# tTBD
WHERE
tGood.sName = tTBD.sName
;
</CFQUERY>

<CFQUERY name="qTBD_NameStat" datasource="#Variables.sDSN#">
SELECT IFNULL(@iNewRows, 0) AS iRowsInserted;
</CFQUERY>
</CFTRANSACTION>

<CFRETURN qTBD_NameStat['iRowsInserted'][1]>
</CFFUNCTION>

</CFCOMPONENT>
Totally agree with you, Tony, about BULK INSERT. Wow, that was my first solution. Whenever dealing with MILLION records, BULK INSERT would be the best solution.... Or we can use PERL HASH TABLE (run on windows comnand) which is much faster than Coldfusion Struct (I guess)...

Great Post!

[Add Comment] [Subscribe to Comments]