April 4, 2011

ColdFusion Quickie: Finding all the credit card numbers in a string

coldfusion

My coworker just asked for a quick way to find all the credit card numbers in a string. Here is a script I wrote up for it. It does not apply the Luhn algorithm to the results; it simply looks for 16 digits in a row, or four groups of four digits separated by spaces. One nice thing you might like, and might not have seen before, is the use of reMatch, which finds and returns all the matches of a regex in a string. I'm sure someone is going to come along and write this 10x better, so consider it a challenge. ;)

<cfsavecontent variable="source">
This is some text with 8902 1248 2381 3821 some crap in it.
This line will have 5, not 4, but we should still match the four 9999 8902 1248 2381 3821.
And now let's just do 16 in a row 4719209812347891 and then 15 471920981234789 and then 17 - 47192098123478910.
Testing my word boundary 1719209812347891.
</cfsavecontent>

<!--- 16 digits in a row, with a word boundary on each side --->
<cfset reg = "\b[[:digit:]]{16,16}\b">
<cfdump var="#reMatch(reg, source)#">

<!--- four groups of four digits, separated by single spaces --->
<cfset reg2 = "\b[[:digit:]]{4,4} [[:digit:]]{4,4} [[:digit:]]{4,4} [[:digit:]]{4,4}\b">
<cfdump var="#reMatch(reg2, source)#">

<!--- either pattern, combined with regex alternation --->
<cfset megareg = "#reg#|#reg2#">
<cfdump var="#reMatch(megareg, source)#">

Note: the first two dumps are just for testing. If you only want the final result, you only need the last call.
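The post doesn't show what the dumps return. Here is a small Python sketch (not ColdFusion) that runs the combined pattern over the same sample text. It assumes Python 3.9+ and its re module. [[:digit:]]{n,n} is written as \d{n} because re has no POSIX bracket classes. rematch is a hypothetical helper named after ColdFusion's reMatch.

```python
import re

# The post's two patterns. [[:digit:]]{n,n} is written as \d{n} because
# Python's re module has no POSIX bracket classes such as [[:digit:]].
SIXTEEN_IN_A_ROW = r"\b\d{16}\b"
FOUR_GROUPS_OF_FOUR = r"\b\d{4} \d{4} \d{4} \d{4}\b"
# Same idea as megareg: either pattern, via alternation.
MEGAREG = SIXTEEN_IN_A_ROW + "|" + FOUR_GROUPS_OF_FOUR

SOURCE = (
    "This is some text with 8902 1248 2381 3821 some crap in it. "
    "This line will have 5, not 4, but we should still match the four "
    "9999 8902 1248 2381 3821. And now let's just do 16 in a row "
    "4719209812347891 and then 15 471920981234789 and then 17 - "
    "47192098123478910. Testing my word boundary 1719209812347891."
)


def rematch(pattern: str, text: str) -> list[str]:
    """Rough stand-in for ColdFusion's reMatch: every non-overlapping match.

    re.findall returns whole matches here because the patterns have no
    capture groups.
    """
    return re.findall(pattern, text)


print(rematch(MEGAREG, SOURCE))
```

On this text the sketch should print four matches:

- 8902 1248 2381 3821
- 9999 8902 1248 2381 (the first four groups of the five-group run)
- 4719209812347891
- 1719209812347891

The 15-digit run is too short, and the 17-digit run fails the closing \b.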


Archived Comments

Comment 1 by Lance posted on 4/4/2011 at 8:01 PM

The regex lines up pretty closely with another source over here: http://www.regular-expressi... Their version looks like this:

\b(?:\d[ -]*?){13,16}\b

The only real difference is that their version is a little less strict, to account for different card number lengths and the different ways people may choose to format the number.
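Here is a rough Python sketch of the pattern Lance quotes, to show how much looser it is. It assumes Python 3.9+ and its re module; find_loose_candidates is a hypothetical name. The lazy [ -]*? lets any mix of spaces and hyphens sit between digits, and {13,16} accepts anything from 13 to 16 digits.

```python
import re

# 13 to 16 digits; after each digit, the lazy [ -]*? allows zero or more
# spaces or hyphens before the next digit.
LOOSE_CARD = re.compile(r"\b(?:\d[ -]*?){13,16}\b")


def find_loose_candidates(text: str) -> list[str]:
    # The group is non-capturing, so findall returns whole matches.
    return LOOSE_CARD.findall(text)


print(find_loose_candidates("call 4111-1111-1111-1111 now"))
print(find_loose_candidates("old visa 4222222222222 here"))
```

That flexibility cuts both ways. Any 13- to 16-digit run with spaces or hyphens in it, such as a long reference number, will match too, so some follow-up filtering is still needed.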

Comment 2 by Peter Boughton posted on 4/4/2011 at 8:17 PM

You don't need {16,16} - just {16} works, and is simpler.
And instead of [[:digit:]], why not just use \d, which is far easier to read?

So, your expressions would end up as:

\b\d{16}\b

and:

\b\d{4} \d{4} \d{4} \d{4}\b

Except unfortunately it's not that simple. :(
Not all credit card numbers are 16 digits long.

Amex cards are 15 digits, Diners Club is 14, and Maestro varies from 13 to 19 digits.

Which might make things look like this:

\b[3-6]\d{15}\b ## standard
\b3[47]\d{13}\b ## amex
\b30[0-5]\d{11}\b ## diners
\b36\d{12}\b ## diners
\b50[123]\d{10,16}\b ## maestro
\b6\d{12,18}\b ## maestro

Without even getting into how they'd be written with spaces involved.
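To see how the per-brand patterns above behave, here is a Python sketch that runs each one over a string and reports which brand label matched. It assumes Python 3.9+. find_by_brand and the brand labels are hypothetical. The prefixes and lengths are copied from the comment as written, so treat them as the commenter's approximations rather than an authoritative list. Like the comment, the sketch ignores spaces, and the sample numbers are widely published test card numbers.

```python
import re

# Per-brand patterns as listed in the comment (digits only, no spaces).
# These are the commenter's approximations, not an authoritative prefix table.
BRAND_PATTERNS = [
    ("standard", re.compile(r"\b[3-6]\d{15}\b")),
    ("amex", re.compile(r"\b3[47]\d{13}\b")),
    ("diners", re.compile(r"\b30[0-5]\d{11}\b")),
    ("diners", re.compile(r"\b36\d{12}\b")),
    ("maestro", re.compile(r"\b50[123]\d{10,16}\b")),
    ("maestro", re.compile(r"\b6\d{12,18}\b")),
]


def find_by_brand(text: str) -> list[tuple[str, str]]:
    """Return (brand, number) pairs, one per pattern hit, in pattern order.

    A number is reported more than once if several patterns match it.
    """
    hits = []
    for brand, pattern in BRAND_PATTERNS:
        for match in pattern.finditer(text):
            hits.append((brand, match.group()))
    return hits


print(find_by_brand("a 4111111111111111 b 378282246310005 c 30569309025904"))
print(find_by_brand("6011000990139424"))
```

The "standard" pattern overlaps the second "maestro" pattern, so a 16-digit number starting with 6 is reported under both labels. Code built on these patterns would need to decide which label wins.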

I think for that I'd probably be tempted to do something like \b[\d ]{13,23}\b to extract possible candidates, then do further filtering to remove false positives.
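Peter's two-step idea can be sketched in Python: a loose candidate regex, then filtering. The filtering is an assumption, not something the comment spells out. Each candidate is stripped of spaces and kept only if it has 13 to 19 digits (the range discussed in these comments). It then has to pass the Luhn checksum, which the original post skips. find_card_numbers and luhn_valid are hypothetical names, and Python 3.9+ is assumed.

```python
import re

# Candidate pattern from the comment: 13 to 23 characters of digits and spaces.
CANDIDATE = re.compile(r"\b[\d ]{13,23}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn (mod 10) checksum over a string of digits."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Extract loose candidates, then keep only plausible card numbers."""
    results = []
    for match in CANDIDATE.finditer(text):
        # Candidates may carry leading/trailing spaces; keep digits only.
        digits = match.group().replace(" ", "")
        # Assumed filter: 13-19 digits, then the Luhn check.
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            results.append(digits)
    return results


print(find_card_numbers("Card: 4111 1111 1111 1111 thanks"))
print(find_card_numbers("order 1234 5678 9012 3456"))
```

Because [\d ] also matches spaces, a candidate can start or end with a space, or run two neighboring numbers together. The length and Luhn checks are what have to reject those.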

Although I can't actually think of a situation where I might want to extract card numbers from a string... what's your co-worker trying to do? Why does he have a bunch of credit card numbers? :S

Comment 3 by David Hammond posted on 4/4/2011 at 9:01 PM

To extend a little bit on Peter's response, I think I would come up with something like this:

\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b

This should match 16 straight digits or numbers separated by spaces or hyphens (which I think is pretty common).
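A quick Python sketch of David's pattern, assuming Python's re module. Because each [ -]? is independent, mixed separators also match.

```python
import re

# Four groups of four digits; each separator is an optional space or hyphen.
SEPARATED_16 = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

text = "a 4111-1111-1111-1111 b 4111 1111 1111 1111 c 4111111111111111 d"
print(SEPARATED_16.findall(text))
# ['4111-1111-1111-1111', '4111 1111 1111 1111', '4111111111111111']

# Each separator is optional on its own, so mixed forms match as well.
print(SEPARATED_16.findall("4111-1111 11111111"))
# ['4111-1111 11111111']
```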

Comment 4 by Raymond Camden posted on 4/4/2011 at 9:43 PM

Thanks Peter - I knew my {N,N} was off but had a total brain fart. As for reasons - the coworker is Lance above - I'll let him comment if he can share the reason, but one might imagine a basic "does this document contain sensitive material" type check.

Comment 5 by Lance posted on 4/4/2011 at 10:29 PM

@Peter- Ray pretty much summed up the reason. We needed to check a particular database column to determine if any of our users were passing along cc info as part of a more generic data entry UI. Which.... is something we don't recommend doing for all of the obvious reasons.

Not our use case, but the same regex is useful if you are doing OCR on scanned documents and need to screen for sensitive info.

Comment 6 by Peter Boughton posted on 4/4/2011 at 10:38 PM

Ah cool, that makes sense.

Ray - the mobile view doesn't appear to have a way to add a comment? Did I miss it, or is that not added yet?

Comment 7 by Raymond Camden posted on 4/4/2011 at 10:56 PM

@Peter: No, it will not be supported in the next version either. You will be able to switch to the 'real' version easier though.

Comment 8 by Ray V posted on 4/5/2011 at 5:10 PM

This works for me. Four groups of four digits with an optional space after each, or 16 consecutive digits:

\b((\d{4}\s?){4}|\d{16})\b

http://myregexp.com/ - regex plugin for eclipse and online tester.

Comment 9 by Raymond Camden posted on 4/5/2011 at 5:14 PM

In the second part of your regex, did you mean \d in front or \b?

Comment 10 by Peter Boughton posted on 4/5/2011 at 5:17 PM

Ray V, the \s will match *any* whitespace character (including tab and newline), not just a regular space. It'll work, but might bring false positives.

Ray C - he's got a | for alternation so doesn't need any \b inside, just one at each end - i.e. \b(...|...)\b

Comment 11 by Raymond Camden posted on 4/5/2011 at 5:26 PM

Ohh - ok - I read \d as another boundary, not as a digit.

Comment 12 by Ray V posted on 4/5/2011 at 7:00 PM

Good catch Peter! I substituted the \s with the Unicode space character \u0020:

\b((\d{4}\u0020?){4}|\d{16})\b
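The difference between the two versions shows up clearly in a small Python sketch. It assumes Python 3.9+ and its re module, which accepts the \u0020 escape in str patterns. find_all is a hypothetical helper.

```python
import re

# Ray V's first version: \s accepts any whitespace (space, tab, newline, ...).
ANY_WHITESPACE = re.compile(r"\b((\d{4}\s?){4}|\d{16})\b")
# Revised version: \u0020 accepts only a literal space.
SPACE_ONLY = re.compile(r"\b((\d{4}\u0020?){4}|\d{16})\b")


def find_all(pattern: re.Pattern, text: str) -> list[str]:
    # The patterns have capture groups, so findall() would return the groups;
    # finditer() + group(0) returns each whole match instead.
    return [m.group(0) for m in pattern.finditer(text)]


tabbed = "1234\t5678\t9012\t3456"
print(find_all(ANY_WHITESPACE, tabbed))  # matches across the tabs
print(find_all(SPACE_ONLY, tabbed))      # no match
print(find_all(SPACE_ONLY, "card 4111 1111 1111 1111."))
```

Two side notes. First, because each separator is optional, (\d{4}\u0020?){4} already matches 16 unbroken digits, so the |\d{16} branch is effectively redundant. Second, when the number is followed by a space and then a word, the optional separator after the last group can pull that trailing space into the match.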

My favorite reference: Regular Expressions in 10 Minutes - Ben Forta - http://www.forta.com/books/...

Comment 13 by Robert Zehnder posted on 4/5/2011 at 10:41 PM

This might not be entirely on topic, but I work on large-scale e-commerce systems, so I see some funky stuff. Older Visas can have 13 digits, so they need to be taken into account. When searching through documents that may not be as important as in your shopping cart, etc.

Comment 14 by Edward - Florida SEO posted on 4/6/2011 at 1:31 AM

Every major credit card ...

Comment 15 by Raymond Camden posted on 4/6/2011 at 1:53 AM

Wow - that regex made my nose bleed. ;)
