A few days ago a user reported an issue with my blog involving the comment form. Apparently he has an email address using one of the new TLDs (top level domains) that are cropping up, specifically "directory." I decided to do some testing to see how well ColdFusion supports these new TLDs.
First off, it was a bit difficult to find out what has been added recently, but I did find a Wikipedia page with everything listed: ICANN-era generic top-level domains. I knew new TLDs were coming, but my god, I had no idea how many and how... weird some of them were. I mean, I guess it is kind of cool that "blue" is a TLD. But... ok, whatever.
I decided to write a quick test script that would use isValid against some of these new TLDs. I wasn't going to try to type them all, just a sample. Here is the script.
<cfscript>
// A sample of old and new TLDs to run through isValid("email")
tlds = "com,edu,directory,guru,gift,jobs,international,museum,name,sexy,social,tel,travel,ceo,cheap";
for(i=1; i<=listLen(tlds); i++) {
    emailToTest = "foo@foo.#listGetAt(tlds, i)#";
    writeOutput("Email: #emailToTest# isValid? #isValid('email',emailToTest)#<br>");
}
</cfscript>
As you can see, it is just a simple list of TLDs. I iterate over them, create a test email address, and run isValid against it. Here are the results:
Email: foo@foo.com isValid? YES
Email: foo@foo.edu isValid? YES
Email: foo@foo.directory isValid? NO
Email: foo@foo.guru isValid? YES
Email: foo@foo.gift isValid? YES
Email: foo@foo.jobs isValid? YES
Email: foo@foo.international isValid? NO
Email: foo@foo.museum isValid? YES
Email: foo@foo.name isValid? YES
Email: foo@foo.sexy isValid? YES
Email: foo@foo.social isValid? YES
Email: foo@foo.tel isValid? YES
Email: foo@foo.travel isValid? YES
Email: foo@foo.ceo isValid? YES
Email: foo@foo.cheap isValid? YES
So most of them passed, but two, directory and international, did not. I couldn't figure out why until I noticed that both were a bit long. Then it clicked: ColdFusion appears to simply check the length of the TLD. As a test, I tried "abcdefg" (seven characters) as a TLD and it worked. As soon as I tried "abcdefgh" (eight characters), it failed. I'm going to report this as a bug.
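A minimal sketch of that kind of length probe, assuming nothing beyond built-in functions: it builds made-up TLDs of increasing length with repeatString and prints what isValid reports for each. The 2-to-12 range is an arbitrary choice for illustration, and the results will depend on your ColdFusion version.

```cfml
<cfscript>
// Probe isValid("email") with made-up TLDs of increasing length.
// The range 2..12 is an arbitrary choice for illustration.
for (n = 2; n <= 12; n++) {
    tld = repeatString("a", n);
    emailToTest = "foo@foo.#tld#";
    writeOutput("TLD length #n#: #emailToTest# isValid? #isValid('email', emailToTest)#<br>");
}
</cfscript>
```

The first length that flips from YES to NO is the cutoff your server enforces.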
As it stands, this blog uses a UDF to check for email validity. (The code began back in ColdFusion 6, so I've got a lot of skeletons in my code closet.) The UDF relies on a regular expression whose TLD check amounts to "either 2-3 characters or in this hard-coded list." Here is the code now:
function isEmail(str) {
    // Valid when the regex matches, the local part is at most 64 characters, and the domain part is at most 255
    return (REFindNoCase("^['_a-z0-9-]+(\.['_a-z0-9-]+)*(\+['_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.(([a-z]{2,3})|(aero|asia|biz|cat|coop|info|museum|name|jobs|post|pro|tel|travel|mobi))$",arguments.str) AND len(listGetAt(arguments.str, 1, "@")) LTE 64 AND
    len(listGetAt(arguments.str, 2, "@")) LTE 255) IS 1;
}
My thinking is that I'll just modify that first clause in the TLD section to allow for 2 to 30 characters, with 30 being pretty arbitrary. I'm open to suggestions!
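Here is a rough sketch of what that change might look like. The function name isEmailRelaxed is hypothetical, and the 2-30 range is just the arbitrary number from above, not a published rule. The local-part and domain patterns are copied from isEmail; only the TLD clause differs.

```cfml
<cfscript>
// Hypothetical variant of isEmail: the name isEmailRelaxed and the 2-30
// range are illustrative assumptions, not an established rule.
function isEmailRelaxed(str) {
    // Same local-part and domain patterns as isEmail, but the TLD clause is
    // any run of 2 to 30 letters instead of "2-3 letters or a fixed list".
    var pattern = "^['_a-z0-9-]+(\.['_a-z0-9-]+)*(\+['_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,30}$";
    return reFindNoCase(pattern, arguments.str) EQ 1
        AND len(listFirst(arguments.str, "@")) LTE 64
        AND len(listLast(arguments.str, "@")) LTE 255;
}

writeOutput(isEmailRelaxed("foo@foo.directory") & "<br>");     // expected to pass
writeOutput(isEmailRelaxed("foo@foo.international") & "<br>"); // expected to pass
writeOutput(isEmailRelaxed("foo@foo") & "<br>");               // expected to fail: no TLD
</cfscript>
```

Because the clause only allows letters, this would still reject internationalized TLDs in their xn-- form, which contain hyphens and sometimes digits.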
Archived Comments
I recommend you don't invent a rule, you just follow the already-published and industry-accepted rules.
See here for testing of isValid("email"): http://cfmlblog.adamcameron...
There's a link in there to Wikipedia, which in turn links to the RFCs.
There's no rule specifically regarding TLD-length, so I recommend you don't invent one. There are, however, various rules for the domain part of an email address (composition, length, etc). All the rules are very well described, and easy to understand.
--
Adam
If you aren't doing bulk validation, you can always just look up the domain via DNS; networks and servers are a lot faster than they used to be.
So, if it is so easy, why don't we have a good one at CFLib? ;)
Seriously though - I may just remove the server-side checking completely. If someone wants to put a bad email address in, it's not like I care, and I'm also checking client side with the built-in checking used with type="email" in modern browsers (which, by the way, had no issues at all with .directory).
Zac, this wasn't for bulk testing, just for my blog. (Of course, the issue I talk about is more general than that.)
@Ray... I think it's because most CFML developers - like Adobe ColdFusion and Railo engineers - don't bother to read the RFCs, and just *guess* how to validate an address based on the subset of rules they are personally familiar with.
I'm with you... if you don't *care* about the user's email address, don't bother validating it. If you do actually care, then you need to tell them they need to give you a valid address, and then email them some sort of confirmation to respond to before they progress with [whatever]. That's the only reliable way to do it.
Of course this is overkill for a blog ;-)
--
Adam
That isn't the only problem ColdFusion has with email validation. They also don't allow a bunch of special characters in the user account portion that are perfectly valid according to the specs.
We ran into this issue a little while back when a user had an email address with an ampersand in it and our website wouldn't allow it because we were using CF's built-in validation. We eventually had to switch to our own RegEx. Off the top of my head, two characters CF can't handle are ampersand (&) and double-quote (").
I've had a bug open for isValid() since, roughly, forever : http://www.elliottsprehn.co...
Adobe don't seem to care.
I agree with the above though - if you care it's because you want to use it on behalf of the user, so validate it to the RFC and be sure to send it a confirmation link.
We quit using isValid() for validating email back in CF7. IsEmail() doesn't allow email addresses with the plus (+) symbol. Regex can get too complex & difficult to troubleshoot. I'm in the process of switching to this java library as it passes 164 unit tests and can optionally check DNS to verify whether a mail server (MX) record exists for the domain.
http://isemail.info/about
As just an FYI, in 10 at least, + is ok in the address name.
I compared isValid (BIF), isEmail (UDF) and IsEMail (java, from 2010) on CF8, 9, 10 & 11. IsEMail was consistent on all CF versions and passed more unit tests. Since the BIF has finally been fixed on CF10/11, any idea if Adobe will fix it for customers that are still on 9, where the issues were first reported when it was the only version available? (It's not 12/31/2014 EOL already, is it?)
The number of email validators that think bla@dawes.id.au is invalid is huge.
I agree, there is no need to have a hardcoded list.... not that it is hard to get hold of (http://au.godaddy.com/tlds/..... but bad idea
I've been of the opinion for a very long time now that a valid email address is one someone can send an email to and reach the intended recipient. A valid telephone number is one someone can call and reach the intended person. And a valid zip code is one that can be placed on an envelope and the envelope reaches the intended destination.
Programmers should not be attempting to "validate" these bits of information with limiting programming scripts. Only a user can be responsible for the actual validity of these bits of information, and programmers (especially US-based programmers) block or mangle valid information with their validation scripts all too often. And they often have no idea.
As an example, I've entered a valid URL for my website in the Website field above, www.aria-media.com. This address works when I use it in a browser, but I can't submit this comment because the validation script doesn't accept it. So I deleted the website from the field and try to submit again ...
I think it is overkill to check which TLDs exist and which do not, but if one really wants to do that, pulling a fresh copy of the Mozilla Public Suffix List every month might be an option:
https://publicsuffix.org
Nice of mozilla to put it in a non-standard format... not.
Either way, most email servers will not let you authenticate a user these days. The only way to do that is to send an email and wait for a bounce.
Email authentication was a great way to check a live address in the past, but too many spammers used it to set up lists, so any decent email server now will block this technique. I used to use this for email lists; now it's just easier to send an email. It gives you a lot more information about them anyway: when they click on a link to verify, they go to the web browser, where you can determine a whole bunch of further information (geo and other demographics through Google Analytics, including sex, hobbies, etc.)
---
getting back to the point, just write a regex that works... simple and test it with the entire godaddy or whatever list you want. also allow for plus addresses (new+accounts@me.com) and other punctuation like dot, dash, underscore in the account and use a domain regex for the domain.
I know this is an older post, but I just ran into this today: testing tim.garver@gmail, isValid says that it is valid. I have used this function for years and never knew it failed on such a simple test, which is mind-boggling. Missing the TLD still causes it to say yes, it's valid. Insane. We are running CF11.6; I am going to test this on a CF901 box next week. Can anybody else confirm this on CF9x?
Thanks
Tim