Finding dates in a string using ColdFusion

A reader of mine had an interesting question: is it possible to find all the dates in a string? In theory you could split the string into words and attempt to turn each one into a date. You would need to check each word along with a "reasonable" number of words after it, perhaps up to four. I decided to take an initial stab at a simpler solution: looking just for dates in the form mm/dd/yyyy. (Note to all of my readers outside of America: the code I'm showing here would actually work fine in your locales as well.)
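For comparison, the word-by-word idea described above could be sketched roughly like this. It is a hypothetical illustration, not the approach used in the rest of this post: it assumes words are separated by spaces, strips sentence punctuation first, and tries the longest window (four words) before shorter ones. Because isDate() is fairly lenient, expect some false positives and differences between engines.

```cfml
<!--- Hypothetical sketch of the word-window approach: for each word,
      try it plus up to 3 following words (4 in total) with isDate(). --->
<cfset sample = "We launch on August 1, 2012 and relax on 12/1/2011 after that.">
<!--- Assumption: sentence punctuation is noise, so strip it before splitting on spaces. --->
<cfset words = listToArray(reReplace(sample, "[.;!?]", "", "all"), " ")>
<cfset maxWords = 4>
<cfset found = []>
<cfset i = 1>
<cfloop condition="i LTE arrayLen(words)">
    <cfset matched = false>
    <!--- Longest window first, so a full "August 1, 2012" is preferred over a shorter fragment. --->
    <cfloop index="n" from="#min(maxWords, arrayLen(words) - i + 1)#" to="1" step="-1">
        <cfset candidate = words[i]>
        <cfloop index="j" from="#i + 1#" to="#i + n - 1#">
            <cfset candidate = candidate & " " & words[j]>
        </cfloop>
        <cfif isDate(candidate)>
            <cfset arrayAppend(found, candidate)>
            <!--- Skip past the words we just consumed. --->
            <cfset i = i + n>
            <cfset matched = true>
            <cfbreak>
        </cfif>
    </cfloop>
    <cfif NOT matched>
        <cfset i = i + 1>
    </cfif>
</cfloop>
<cfloop index="d" array="#found#">
    <cfoutput>#d# might be a date.<br/></cfoutput>
</cfloop>
```

Trying every window of up to four words means roughly four isDate() calls per word, which is part of why the regex pre-filter below is the simpler place to start.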

First, let's create a simple string.

<cfsavecontent variable="str">
This is some text. I plan on taking over the world on 12/1/2011. After I do that, I plan on establishing the Beer Empire on 1/2/2012. But on 3/3/2013 I'll take a break. But this 13/91/20 is not a valid date.
</cfsavecontent>

Now let's use a regular expression to find anything of the form number/number/number.

<cfset possibilities = reMatch("\d+/\d+/\d+", str)>

This gives us an array of possible matches that we can loop over:

<cfloop index="w" array="#possibilities#">
    <cfif isDate(w)>
        <cfoutput>#w# is a date.<br/></cfoutput>
    </cfif>
</cfloop>

Which gives us...

12/1/2011 is a date.
1/2/2012 is a date.
3/3/2013 is a date.

Any thoughts on this technique? The entire template is below.

<cfsavecontent variable="str">
This is some text. I plan on taking over the world on 12/1/2011. After I do that, I plan on establishing the Beer Empire on 1/2/2012. But on 3/3/2013 I'll take a break. But this 13/91/20 is not a valid date.
</cfsavecontent>

<cfset possibilities = reMatch("\d+/\d+/\d+", str)>
<cfloop index="w" array="#possibilities#">
    <cfif isDate(w)>
        <cfoutput>#w# is a date.<br/></cfoutput>
    </cfif>
</cfloop>

Archived Comments

Comment 1 by Michael Zock posted on 8/21/2011 at 12:31 AM

It depends on how versatile you want the whole thing to be.
Limiting months to 1-12 and days to 1-31 can be done inside the regular expression as well, but once the source contains dates in (international) formats like YYYY-MM-DD or DD.MM.YYYY, or just two-digit years, there's a lot more work left to do.
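Michael's point about doing the range check in the regex can be sketched like this. The pattern is an illustration, not something from the post: it assumes US-style m/d/yyyy, restricts the month to 1-12 and the day to 1-31, and uses \b word boundaries so that "13/1/2011" isn't partially matched as "3/1/2011".

```cfml
<!--- Assumption: US-style m/d/yyyy dates with a four-digit year. --->
<!--- Month 1-12, day 1-31; \b stops a match from starting in the middle of a number. --->
<cfset rangePattern = "\b(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/\d{4}\b">
<cfset sample = "Valid 12/1/2011, bad 13/1/2011 and 2/31/2011.">
<cfset matches = reMatch(rangePattern, sample)>
<!--- matches is ["12/1/2011", "2/31/2011"]: the regex can't know February has no 31st. --->
```

The February 31st that slips through is why each match still needs a real date check afterwards.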

Comment 2 by Adam Cameron posted on 8/21/2011 at 1:31 PM

Hi Michael: I can't see a way of handling the fact that 29/2/2012 is a date but 29/2/2013 is not. So I think one still needs to check each match anyhow, and there's perhaps a balance to strike between the complexity of the regex and tolerance for false positives (which are then dealt with by the date check in the loop)?

--
Adam
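Adam's leap-year example can be handled by checking each match against the actual calendar. The helper below, isRealMDY, is a hypothetical name; it assumes the m/d/yyyy order used in the post (Adam's examples are day-first) and relies on ColdFusion's daysInMonth() instead of isDate(), so the result doesn't depend on how leniently the engine parses dates.

```cfml
<cfscript>
    // Hypothetical helper. Assumption: m/d/yyyy order, as in the post.
    // Returns true only when the string names a real calendar day.
    function isRealMDY(required string candidate) {
        var parts = [];
        var m = 0;
        var d = 0;
        var y = 0;
        // Shape check first: 1-2 digit month and day, 4-digit year.
        if (reFind("^\d{1,2}/\d{1,2}/\d{4}$", arguments.candidate) EQ 0) {
            return false;
        }
        parts = listToArray(arguments.candidate, "/");
        m = val(parts[1]);
        d = val(parts[2]);
        y = val(parts[3]);
        // Reject out-of-range months, day 0, and implausible years like 0001.
        if (m LT 1 OR m GT 12 OR d LT 1 OR y LT 100) {
            return false;
        }
        // daysInMonth() accounts for leap years, so 2/29 passes only in leap years.
        return d LTE daysInMonth(createDate(y, m, 1));
    }
</cfscript>
<cfoutput>
    2/29/2012: #isRealMDY("2/29/2012")#<br/>
    2/29/2013: #isRealMDY("2/29/2013")#<br/>
</cfoutput>
```

The first call is true and the second false, since 2012 is a leap year and 2013 is not.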

Comment 3 by Tom Chiverton posted on 8/22/2011 at 1:23 PM

Plus, certainly in the past, I've had issues with ambiguous dates (such as 2/3/99) being parsed US style rather than UK style, despite the server locale.
It's a nasty area...

BTW, did you mean to accept single-digit years? \d{1,2}/\d{1,2}/\d{4}|\d{2}/ might be a better expression to capture just 'natural' style possible dates.
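One caution about the pattern in Tom's comment: without parentheses, the | alternation splits the whole expression in two, so it matches either a full date with a four-digit year or any bare "nn/". Grouping only the year, as sketched below, keeps the alternation on the year. This grouped form is my reading of the intent, not Tom's exact expression. Listing \d{4} before \d{2} matters, because the engine takes the first alternative that fits.

```cfml
<!--- Assumption: the goal is to allow either two- or four-digit years. --->
<cfset sample = "Due 12/25/11 or 12/25/2011.">
<!--- Ungrouped: "|" applies to the whole pattern, giving ["12/", "25/", "12/25/2011"]. --->
<cfset ungrouped = reMatch("\d{1,2}/\d{1,2}/\d{4}|\d{2}/", sample)>
<!--- Grouped: only the year alternates, giving ["12/25/11", "12/25/2011"]. --->
<cfset grouped = reMatch("\d{1,2}/\d{1,2}/(\d{4}|\d{2})", sample)>
```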

Comment 4 by Raymond Camden posted on 8/22/2011 at 2:23 PM

Tom, I think the US/UK issue could be ignored if you assumed all the dates mentioned in the text applied to your current locale (or the locale set by setLocale).

Good point on the {} range.
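Ray's setLocale suggestion might look something like the sketch below. It assumes you want UK day/month order and swaps isDate() for the locale-aware lsIsDate() and lsParseDateTime(); the sample string is made up, and because locale handling differs between engines and versions, it's worth testing on your own server before relying on it.

```cfml
<!--- Assumption: numeric dates in the text follow UK (day/month) order. --->
<!--- setLocale() returns the previous locale so it can be restored afterwards. --->
<cfset previousLocale = setLocale("English (UK)")>
<cfset candidate = "2/3/1999">
<cfif lsIsDate(candidate)>
    <cfset parsed = lsParseDateTime(candidate)>
    <cfoutput>#candidate# is read as #dateFormat(parsed, "d mmmm yyyy")#.<br/></cfoutput>
</cfif>
<!--- Put the original locale back so the rest of the request is unaffected. --->
<cfset setLocale(previousLocale)>
```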

Comment 5 by Ed "SteelValor" Sals posted on 8/22/2011 at 4:54 PM

Thanks Ray! I had been banging my head on this problem for two full days until I got the sense to contact a Jedi. =]

Comment 6 by Josh Curtiss posted on 9/13/2011 at 10:11 PM

For fun and out of desperation to procrastinate, I came up with this RegEx:

(\d+/\d+/\d+)|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(t|tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s*\d+(st|nd|rd|th)?,\s*\d{4})|(\d+-\w{3}-\d+)

In addition to mm/dd/yy, mm/dd/yyyy, etc., it also supports strings like "Aug 1, 2010" and "September 3rd, 2010", as well as DD-MMM-YY.

ColdFusion recognizes all of those except dates with ordinal suffixes such as "3rd" or "2nd", so a quick REReplaceNoCase takes care of that exception:

REReplaceNoCase(w,"(st|nd|rd|th)?,",",")

Ok I guess I should get back to work.
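Putting Josh's pattern and his REReplaceNoCase fix together might look like the sketch below. The sample string is made up, and using reMatchNoCase (so lowercase month names also match) is an assumption on my part, not part of his comment.

```cfml
<!--- Assumption: case-insensitive matching is wanted, hence reMatchNoCase. --->
<cfset sample = "Launch on September 3rd, 2010 or 12/1/2011 or 01-Aug-11.">
<cfset pattern = "(\d+/\d+/\d+)|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(t|tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s*\d+(st|nd|rd|th)?,\s*\d{4})|(\d+-\w{3}-\d+)">
<cfset possibilities = reMatchNoCase(pattern, sample)>
<cfset cleaned = []>
<cfloop index="w" array="#possibilities#">
    <!--- Drop ordinal suffixes ("3rd," becomes "3,") so isDate() can parse the string. --->
    <cfset fixed = reReplaceNoCase(w, "(st|nd|rd|th)?,", ",")>
    <cfset arrayAppend(cleaned, fixed)>
    <cfif isDate(fixed)>
        <cfoutput>#fixed# is a date.<br/></cfoutput>
    </cfif>
</cfloop>
```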

Comment 7 by Raymond Camden posted on 9/13/2011 at 10:27 PM

That's pretty epic. :)