Finding dates in a string using ColdFusion

This post is more than 2 years old.

A reader of mine asked an interesting question: is it possible to find all the dates in a string? In theory, you could split the string into words and attempt to turn each word into a date. You would need to check each word along with a "reasonable" number of the words after it, perhaps up to four. I decided to take an initial stab at a simpler solution: looking only for dates in the form mm/dd/yyyy. (A note for my readers outside America: the code I'm showing here would actually work fine in your locales as well.)
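The post doesn't implement the word-by-word idea, but a minimal sketch of it might look like the following. The UDF name findDatesByWords and its maxWords argument are hypothetical. It tries the longest window of words first, strips trailing punctuation such as a sentence-ending period, and relies on isDate(). isDate() is fairly lenient, so on real text expect some false positives.

```cfml
<!--- Hypothetical UDF: tests windows of up to maxWords consecutive words with isDate() --->
<cffunction name="findDatesByWords" returntype="array" output="false">
    <cfargument name="text" type="string" required="true">
    <cfargument name="maxWords" type="numeric" required="false" default="4">
    <cfset var words = listToArray(arguments.text, " #chr(9)##chr(10)##chr(13)#")>
    <cfset var found = []>
    <cfset var i = 1>
    <cfset var j = 0>
    <cfset var n = 0>
    <cfset var used = 0>
    <cfset var candidate = "">
    <cfloop condition="i LTE arrayLen(words)">
        <cfset used = 1>
        <!--- Longest window first, so a multi-word date wins over its first word --->
        <cfloop from="#min(arguments.maxWords, arrayLen(words) - i + 1)#" to="1" step="-1" index="n">
            <cfset candidate = "">
            <cfloop from="#i#" to="#i + n - 1#" index="j">
                <cfset candidate = listAppend(candidate, words[j], " ")>
            </cfloop>
            <!--- Drop trailing punctuation such as the period ending a sentence --->
            <cfset candidate = reReplace(candidate, "[.,;:!?]+$", "")>
            <cfif len(candidate) AND isDate(candidate)>
                <cfset arrayAppend(found, candidate)>
                <cfset used = n>
                <cfbreak>
            </cfif>
        </cfloop>
        <!--- Skip past the words consumed by a match --->
        <cfset i = i + used>
    </cfloop>
    <cfreturn found>
</cffunction>
```

Something like `<cfdump var="#findDatesByWords(str)#">` would show what it picks up. Checking every window is much slower than a single regex pass, which is one reason to start with the simpler mm/dd/yyyy approach.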

First, let's create a simple string.

<cfsavecontent variable="str">
This is some text. I plan on taking over the world on 12/1/2011. After I do that, I plan on establishing the Beer Empire on 1/2/2012. But on 3/3/2013 I'll take a break. But this 13/91/20 is not a valid date.
</cfsavecontent>

Next, let's use a regular expression that matches number/number/number.

<cfset possibilities = reMatch("\d+/\d+/\d+", str)>

This gives us an array of possible matches that we can loop over:

<cfloop index="w" array="#possibilities#">
    <cfif isDate(w)>
        <cfoutput>#w# is a date.<br/></cfoutput>
    </cfif>
</cfloop>

Which gives us...

12/1/2011 is a date.
1/2/2012 is a date.
3/3/2013 is a date.

Any thoughts on this technique? The entire template is below.

<cfsavecontent variable="str">
This is some text. I plan on taking over the world on 12/1/2011. After I do that, I plan on establishing the Beer Empire on 1/2/2012. But on 3/3/2013 I'll take a break. But this 13/91/20 is not a valid date.
</cfsavecontent>

<cfset possibilities = reMatch("\d+/\d+/\d+", str)>

<cfloop index="w" array="#possibilities#">
    <cfif isDate(w)>
        <cfoutput>#w# is a date.<br/></cfoutput>
    </cfif>
</cfloop>
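If you want the matches back as data instead of output, the same technique can be wrapped in a UDF. This is a sketch; the name findDates is my own, and it uses exactly the regex and isDate() check from the template above.

```cfml
<!--- Hypothetical UDF: same reMatch + isDate() technique, returning an array --->
<cffunction name="findDates" returntype="array" output="false">
    <cfargument name="text" type="string" required="true">
    <cfset var dates = []>
    <cfset var w = "">
    <!--- Candidates shaped like number/number/number --->
    <cfloop index="w" array="#reMatch('\d+/\d+/\d+', arguments.text)#">
        <!--- Keep only the candidates ColdFusion can parse as a date --->
        <cfif isDate(w)>
            <cfset arrayAppend(dates, w)>
        </cfif>
    </cfloop>
    <cfreturn dates>
</cffunction>
```

Usage: `<cfoutput>#arrayToList(findDates(str), ", ")#</cfoutput>`.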

Raymond Camden's Picture

About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos. If you like this article, please consider visiting my Amazon Wishlist or donating via PayPal to show your support. You can even buy me a coffee!

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by Michael Zock posted on 8/21/2011 at 12:31 AM

It depends on how versatile you want the whole thing to be.
The 1-12 and 1-31 limits can already be enforced inside the regular expression, but once the source contains dates in (international) formats like YYYY-MM-DD or DD.MM.YYYY, or just two-digit years, there's a lot more work left to do.
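Enforcing the 1-12 and 1-31 ranges in the pattern could look something like this. It is my own sketch and covers mm/dd/yyyy only; the `\b` anchors stop a match from starting or ending inside a longer run of digits.

```cfml
<!--- Month 1-12, day 1-31, four-digit year --->
<cfset mdyPattern = "\b(0?[1-9]|1[0-2])/(0?[1-9]|[12]\d|3[01])/\d{4}\b">
<cfset candidates = reMatch(mdyPattern, "Not 13/91/2020, but 12/31/2011.")>
<!--- The pattern alone still accepts 2/31/2011, so keep the isDate() check --->
<cfset confirmed = []>
<cfloop index="c" array="#candidates#">
    <cfif isDate(c)>
        <cfset arrayAppend(confirmed, c)>
    </cfif>
</cfloop>
```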

Comment 2 by Adam Cameron posted on 8/21/2011 at 1:31 PM

Hi Michael: I can't see a way of dealing with the fact that 29/2/2012 is a date but 29/2/2013 is not. So I think one is still going to need to check each match anyhow, so there's perhaps a balance to be reached between complexity of regex and expectations of false positives (which are then dealt with via the date check in the loop)?

--
Adam
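Adam's leap-year point can be made explicit with a small check. This is a hypothetical helper, isRealMDY, that validates an m/d/yyyy string (the post's order, not the d/m order in Adam's example) using daysInMonth(), which knows February 2012 has 29 days and February 2013 has 28. In the post, isDate() in the loop plays this role; the helper only spells the rule out.

```cfml
<!--- Hypothetical helper: true if an m/d/yyyy string names a real calendar day --->
<cffunction name="isRealMDY" returntype="boolean" output="false">
    <cfargument name="value" type="string" required="true">
    <cfset var parts = []>
    <!--- Shape check: 1-2 digit month and day, four-digit year --->
    <cfif NOT reFind("^\d{1,2}/\d{1,2}/\d{4}$", arguments.value)>
        <cfreturn false>
    </cfif>
    <cfset parts = listToArray(arguments.value, "/")>
    <cfif parts[1] LT 1 OR parts[1] GT 12 OR parts[3] LT 1>
        <cfreturn false>
    </cfif>
    <!--- Compare the day against the real length of that month in that year --->
    <cfreturn parts[2] GTE 1 AND parts[2] LTE daysInMonth(createDate(parts[3], parts[1], 1))>
</cffunction>
```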

Comment 3 by Tom Chiverton posted on 8/22/2011 at 1:23 PM

Plus, certainly in the past, I've had issues with ambiguous dates (such as 2/3/99) being parsed US style rather than UK style, despite the server locale.
It's a nasty area...

BTW, did you mean to accept single-digit years? \d{1,2}/\d{1,2}/\d{4}|\d{2}/ might be a better expression to capture just 'natural' style possible dates.
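Reading Tom's suggestion as "one or two digits for month and day, then a four- or two-digit year", a grouped version might look like this. The grouping and the `\b` anchors are my own assumptions about the intent, not Tom's exact expression.

```cfml
<!--- 1-2 digit month and day; year must be exactly 4 or exactly 2 digits --->
<cfset naturalPattern = "\b\d{1,2}/\d{1,2}/(\d{4}|\d{2})\b">
<!--- The one-digit year in 1/2/3 is rejected; 1/2/12 and 1/2/2012 are kept --->
<cfset naturalMatches = reMatch(naturalPattern, "Skip 1/2/3, keep 1/2/12 and 1/2/2012.")>
```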

Comment 4 by Raymond Camden posted on 8/22/2011 at 2:23 PM

Tom - I think US/UK could be ignored if you assumed all the dates mentioned in text applied to your current locale. (Or the current locale as set by setLocale.)

Good point on the {} range.

Comment 5 by Ed "SteelValor" Sals posted on 8/22/2011 at 4:54 PM

Thanks Ray! I had been banging my head on this problem for two full days until I got the sense to contact a Jedi. =]

Comment 6 by Josh Curtiss posted on 9/13/2011 at 10:11 PM

For fun and out of desperation to procrastinate, I came up with this RegEx:

(\d+/\d+/\d+)|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(t|tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s*\d+(st|nd|rd|th)?,\s*\d{4})|(\d+-\w{3}-\d+)

In addition to mm/dd/yy, mm/dd/yyyy, etc., it also supports strings like "Aug 1, 2010" and "September 3rd, 2010", as well as DD-MMM-YY.

ColdFusion will recognize all of those except ordinals such as "3rd" and "2nd", so a quick REReplaceNoCase takes care of that exception:

REReplaceNoCase(w,"(st|nd|rd|th)?,",",")

Ok I guess I should get back to work.
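Combining Josh's pattern and his ordinal cleanup with the isDate() loop from the post might look like this. The sample sentence and variable names are mine; reMatchNoCase is used so that capitalization differences in month names don't matter.

```cfml
<cfset datePattern = "(\d+/\d+/\d+)|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(t|tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s*\d+(st|nd|rd|th)?,\s*\d{4})|(\d+-\w{3}-\d+)">
<cfset sample = "We met on September 3rd, 2010 and again on Aug 1, 2010.">
<cfset cleaned = []>
<cfset dates = []>
<cfloop index="m" array="#reMatchNoCase(datePattern, sample)#">
    <!--- Drop the ordinal suffix before the comma: "3rd," becomes "3," --->
    <cfset c = reReplaceNoCase(m, "(st|nd|rd|th)?,", ",")>
    <cfset arrayAppend(cleaned, c)>
    <!--- Keep only what ColdFusion can parse as a date --->
    <cfif isDate(c)>
        <cfset arrayAppend(dates, c)>
    </cfif>
</cfloop>
```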

Comment 7 by Raymond Camden posted on 9/13/2011 at 10:27 PM

That's pretty epic. :)