My coworker just asked for a quick way to find all the credit card numbers in a string. Here is a script I wrote up for it. This does not apply the Luhn algorithm to the results. It simply looks for 16 digits in a row or 4 sets of 4 digits separated by a space. One nice thing you might like - and might not have seen before - is the use of reMatch, which finds and returns all the matches of a regex in a string. I'm sure someone is going to come around and write this 10x better, so consider it a challenge. ;)
<cfsavecontent variable="source">
This is some text with 8902 1248 2381 3821 some crap in it. This
line will have 5, not 4, but we should still match the four 9999 8902 1248 2381 3821. And
now let's just do 16 in a row 4719209812347891 and then 15 471920981234789 and then 17 - 47192098123478910.
Testing my word boundary 1719209812347891.
</cfsavecontent>
<!--- 16 digits in a row --->
<cfset reg = "\b[[:digit:]]{16,16}\b">
<cfdump var="#reMatch(reg, source)#">
<!--- 4 sets of 4 digits separated by a space --->
<cfset reg2 = "\b[[:digit:]]{4,4} [[:digit:]]{4,4} [[:digit:]]{4,4} [[:digit:]]{4,4}\b">
<cfdump var="#reMatch(reg2, source)#">
<!--- either form --->
<cfset megareg = "#reg#|#reg2#">
<cfdump var="#reMatch(megareg, source)#">
Note - the first two dumps were simply for testing. If you just wanted the final result you would only run the last call.
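If you did want to weed out random 16-digit strings, a Luhn check can be bolted on after the match. Here is a minimal sketch, in Python rather than ColdFusion (re.findall plays the role of reMatch here). The names luhn_ok and find_cards are made up for illustration, and since Python's re module has no [[:digit:]] class, [0-9] stands in for it.

```python
import re

# Same two shapes as the post: 16 digits in a row, or 4 groups of 4 split by a space.
PATTERN = re.compile(r"\b[0-9]{16}\b|\b[0-9]{4} [0-9]{4} [0-9]{4} [0-9]{4}\b")


def luhn_ok(number):
    """Return True if the ASCII digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_cards(text):
    """Return regex matches in `text` that also pass the Luhn check."""
    return [m for m in PATTERN.findall(text) if luhn_ok(m)]
```

A checksum only tells you the digits are plausible, not that the card exists, so treat this as a filter for false positives and nothing more.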
Archived Comments
The regex lines up pretty closely with another source over here: http://www.regular-expressi...
Their version looks like this:
\b(?:\d[ -]*?){13,16}\b
The only real difference is that their version is a little less strict, to account for different cc number lengths and the ways people may choose to format the cc number.
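For a quick feel of how that looser pattern behaves, here is a small sketch in Python (not ColdFusion). LOOSE and find_loose are illustrative names only.

```python
import re

# 13 to 16 digits, each optionally followed by spaces or hyphens.
LOOSE = re.compile(r"\b(?:\d[ -]*?){13,16}\b")


def find_loose(text):
    """Return every loosely formatted 13-16 digit run found in `text`."""
    return LOOSE.findall(text)


print(find_loose("card 4111-1111-1111-1111 ok"))
print(find_loose("amex 3782 822463 10005."))
print(find_loose("id 123456789012 x"))  # only 12 digits, so nothing is found
```

Because any mix of spaces and hyphens is allowed between digits, this will also pick up things that are not card numbers, so some follow-up filtering is probably still needed.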
You don't need {16,16} - just {16} works, and is simpler.
And instead of [[:digit:]] why not just use \d which is far easier to read.
So, your expressions would end up as:
\b\d{16}\b
and:
\b\d{4} \d{4} \d{4} \d{4}\b
Except unfortunately it's not that simple. :(
Not all credit card numbers are 16 digits long.
Amex cards are 15, Diners Club is 14, and Maestro varies from 13 to 19 digits.
Which might make things look like this:
\b[3-6]\d{15}\b ## standard
\b3[47]\d{13}\b ## amex
\b30[0-5]\d{11}\b ## diners
\b36\d{12}\b ## diners
\b50[123]\d{10,16}\b ## maestro
\b6\d{12,18}\b ## maestro
Without even getting into how they'd be written with spaces involved.
I think for that I'd probably be tempted to do something like \b[\d ]{13,23}\b to extract possible candidates, then do further filtering to remove false positives.
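The two-step idea here (grab loose candidates, then filter) might look something like this sketch. It is in Python rather than ColdFusion; extract_candidates and its min_digits/max_digits parameters are made up for illustration, and the 13 to 19 digit range is an assumption based on the lengths listed above.

```python
import re

# Step 1: anything made of 13-23 digits and spaces, bounded by word boundaries.
CANDIDATE = re.compile(r"\b[\d ]{13,23}\b")


def extract_candidates(text, min_digits=13, max_digits=19):
    """Return digit-only strings whose length falls in the plausible range."""
    results = []
    for raw in CANDIDATE.findall(text):
        # Step 2: drop the spaces, then reject anything too short or too long.
        digits = raw.replace(" ", "")
        if min_digits <= len(digits) <= max_digits:
            results.append(digits)
    return results


sample = "ref 8902 1248 2381 3821 and 12 34 56 78 90 12 done"
print(extract_candidates(sample))
```

The second run of digits in the sample is long enough to match the loose pattern but only has 12 digits once the spaces are gone, which is the sort of false positive the filter step is for. A prefix or checksum test could be added as a further filter.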
Although I can't actually think of a situation where I might want to extract card numbers from a string... what's your co-worker trying to do? Why does he have a bunch of credit card numbers? :S
To extend a little bit on Peter's response, I think I would come up with something like this:
\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b
This should match 16 straight digits, or four groups of four digits separated by spaces or hyphens (which I think is pretty common).
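A quick way to check what that expression accepts, sketched in Python rather than ColdFusion (GROUPED, sample and matches are just illustrative names):

```python
import re

# Four groups of four digits, each gap optionally filled by one space or hyphen.
GROUPED = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

sample = (
    "a 4719209812347891 "
    "b 8902-1248-2381-3821 "
    "c 8902 1248 2381 3821 "
    "d 8902-1248 23813821"
)
matches = GROUPED.findall(sample)
for m in matches:
    print(m)
```

Note that the last entry mixes a hyphen, a space, and no separator at all, and it still matches, because each separator is optional and chosen independently.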
Thanks Peter - I knew my {N,N} was off but had a total brain fart. As for reasons - the coworker is Lance above - I'll let him comment if he can share the reason, but one might imagine a basic "does this document contain sensitive material" type check.
@Peter- Ray pretty much summed up the reason. We needed to check a particular database column to determine if any of our users were passing along cc info as part of a more generic data entry UI. Which.... is something we don't recommend doing for all of the obvious reasons.
Not our use case, but the same regex is useful if you are doing OCR on scanned documents and need to screen for sensitive info.
Ah cool, that makes sense.
Ray - the mobile view doesn't appear to have a way to add a comment? Did I miss it, or is that not added yet?
@Peter: No, it will not be supported in the next version either. You will be able to switch to the 'real' version easier though.
This works for me. Four groups of four digits with an optional space after each, or 16 consecutive digits:
\b((\d{4}\s?){4}|\d{16})\b
http://myregexp.com/ - regex plugin for eclipse and online tester.
In the second part of your regex, did you mean \d in front or \b?
Ray V, the \s will match *any* whitespace character (including tab and newline), not just a regular space. It'll work, but might bring false positives.
Ray C - he's got a | for alternation so doesn't need any \b inside, just one at each end - i.e. \b(...|...)\b
Ohh - ok - I read \d as another boundary, not as a digit.
Good catch Peter! I substituted the \s with the Unicode space character \u0020:
\b((\d{4}\u0020?){4}|\d{16})\b
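To see the difference between \s and a plain space, here is a small sketch, in Python rather than ColdFusion, that runs both versions against digits broken up by a newline and a tab. ANY_WS, SPACE_ONLY and first_match are illustrative names, and it assumes Python's re module, which accepts the \u0020 escape inside a pattern.

```python
import re

ANY_WS = re.compile(r"\b((\d{4}\s?){4}|\d{16})\b")
SPACE_ONLY = re.compile(r"\b((\d{4}\u0020?){4}|\d{16})\b")


def first_match(pattern, text):
    """Return the first full match of `pattern` in `text`, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


# Four groups of digits split by a newline, a tab, and a space.
broken_up = "1234\n5678\t9012 3456"

print(repr(first_match(ANY_WS, broken_up)))      # \s version matches across the line break
print(repr(first_match(SPACE_ONLY, broken_up)))  # space-only version finds nothing
```

Whether matching across a line break is a false positive depends on the data, for example OCR output may well wrap a number across lines.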
My favorite reference: Regular Expressions in 10 Minutes - Ben Forta - http://www.forta.com/books/...
This might not be really topical, but I work on large scale e-commerce systems so I find some funky stuff. Older Visas can have 13 digits, so they need to be taken into account. When searching through documents it may not be as important as in your shopping cart, etc.
Every major credit card ...
<pre class="code">
^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})$
</pre>
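Note that this pattern is anchored with ^ and $, so it validates one whole string rather than scanning a document. A rough sketch of how it might be used, in Python rather than ColdFusion, and assuming you strip spaces and hyphens first; MAJOR_CARDS and looks_like_card are made-up names.

```python
import re

# The pattern from the comment above, copied verbatim and split only for line length.
MAJOR_CARDS = re.compile(
    r"^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}"
    r"|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}"
    r"|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})$"
)


def looks_like_card(value):
    """Return True if `value`, minus spaces and hyphens, fits the pattern."""
    digits = re.sub(r"[ -]", "", value)
    return MAJOR_CARDS.match(digits) is not None


print(looks_like_card("4111 1111 1111 1111"))
print(looks_like_card("8902 1248 2381 3821"))
```

To use it on free text you would first pull out candidates with one of the looser expressions, then pass each candidate through a check like this.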
Wow - that regex made my nose bleed. ;)