Quick little regex example - Youtube video from URL

I'm almost scared to post this. Every time I post a regex example I typically get about 200 comments showing sexier, smaller, faster examples, but at the same time, I like good (and practical) examples like this. This is why regex was built! So - what's the example? Given a simple URL with a YouTube video ID in it, how do you extract just the ID? Here's the URL:

http://www.youtube.com/watch?v=f89niPP64Hg

Now - you could just treat that as a list delimited by "=" and listLast it, but that breaks as soon as the URL carries any additional parameters. What we really want is the value of "v". Here is the regex I used:

.*?v=([a-z0-9\-_]+).*

And here is a complete code template:

<cfset u = "http://www.youtube.com/watch?v=f89niPP64Hg">
<cfset videoid = reReplaceNoCase(u, ".*?v=([a-z0-9\-_]+).*","\1")>
<cfoutput>#u#, id=#videoid#</cfoutput>

Note that this will not work with YouTube's short URL version: http://youtu.be/f89niPP64Hg. For that, if I found youtu.be in the URL I'd probably just listLast with / as the delimiter.
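
As a rough sketch of that fallback, assuming the short URL has no query string or trailing slash after the ID (the variable name shortUrl is just for illustration):

```cfm
<!--- Sample short URL; the ID is the same one used earlier in the post --->
<cfset shortUrl = "http://youtu.be/f89niPP64Hg">

<cfif findNoCase("youtu.be/", shortUrl)>
    <!--- Short form: treat the URL as a /-delimited list and take the last item --->
    <cfset videoid = listLast(shortUrl, "/")>
<cfelse>
    <!--- Long form: fall back to the regex from above --->
    <cfset videoid = reReplaceNoCase(shortUrl, ".*?v=([a-z0-9\-_]+).*", "\1")>
</cfif>

<cfoutput>#shortUrl#, id=#videoid#</cfoutput>
```

If the short URL ever carried extra parameters (for example a start time), listLast would return them along with the ID, so this would need another pass to strip them.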

Archived Comments

Comment 1 by Peter Boughton posted on 7/6/2011 at 2:55 AM

Well you asked for it. ;)

<cfset videoid = rereplace( u , "^[^?]+\?v=([^&##]+).*" , "\1" ) />

Doing the [^?]+ bit is faster (avoids backtracking), whilst doing the [^&##] instead of [a-z0-9\-_] means if YouTube change their IDs it'll always work.

With lookbehinds it could be done with a match instead of a replace. CFML's default regex doesn't do lookbehind (so rematch can't be used here), but in theory it could look like:
<cfset videoid = RegexMatch( "(?<=\?v=)[^&##]+" , u ) />
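
One way to get a real lookbehind, as a sketch only, is to drop down to Java's java.util.regex classes through createObject. This assumes a CFML engine running on the JVM with Java object creation enabled; pattern and matcher are illustrative variable names, and like the regex above it assumes v is the first URL parameter:

```cfm
<cfset u = "http://www.youtube.com/watch?v=f89niPP64Hg">

<!--- Compile the lookbehind pattern with Java's regex engine (## is an escaped # in CFML) --->
<cfset pattern = createObject("java", "java.util.regex.Pattern").compile("(?<=\?v=)[^&##]+")>
<cfset matcher = pattern.matcher(u)>

<cfif matcher.find()>
    <!--- group() returns only the matched text, i.e. the ID without the "?v=" prefix --->
    <cfset videoid = matcher.group()>
<cfelse>
    <!--- No match: leave the ID empty rather than returning the whole URL --->
    <cfset videoid = "">
</cfif>

<cfoutput>#u#, id=#videoid#</cfoutput>
```

Unlike the replace approach, a URL that doesn't match yields an empty string here instead of the original input.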

Of course, all this assumes you've got a youtube URL in that format... they've actually got a handful of different ways of referring to videos. :/

Comment 2 by Peter Boughton posted on 7/6/2011 at 3:27 AM

Ok, "handful" was a slight exaggeration, there are actually only three (I was thinking of http vs https stuff which isn't relevant).

This version should cater for "youtube.com/watch?v=id" and "youtube.com/v/id" and "youtube.com/embed/v/id":

<cfset videoid = rereplace( u , "^(?:[^?]+\?v=|[^v]+/v/)([^&##/]+).*", "\1" ) />

Though it does assume we never have "videos.youtube.com/v/id" (or anything else with a v before the important one), which is probably a safe bet for now, but of course not guaranteed.
( My brain has suddenly gone to sleep, otherwise I'd come up with something more sensible. :S )
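
A quick way to sanity-check that pattern against the three URL shapes is to loop over some sample URLs. This is a minimal sketch; the sample URLs simply reuse the ID from the post and the urls array is illustrative:

```cfm
<!--- One sample URL per format mentioned above --->
<cfset urls = [
    "http://www.youtube.com/watch?v=f89niPP64Hg",
    "http://www.youtube.com/v/f89niPP64Hg",
    "http://www.youtube.com/embed/v/f89niPP64Hg"
]>

<cfloop array="#urls#" index="u">
    <!--- Same regex as above: either "?v=" or "/v/" precedes the captured ID --->
    <cfset videoid = rereplace(u, "^(?:[^?]+\?v=|[^v]+/v/)([^&##/]+).*", "\1")>
    <cfoutput>#u#, id=#videoid#<br></cfoutput>
</cfloop>
```

Each of the three lines should report id=f89niPP64Hg. A URL that matches neither branch comes back unchanged, which is worth checking for before trusting the result.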

Comment 3 by Raymond Camden posted on 7/6/2011 at 3:22 PM

Good ones, Peter, thanks. :)

Comment 4 by Andres Bastidas posted on 1/6/2012 at 12:28 AM

Great example. I've added an additional check to get the ID from the new share URLs too, with youtu.be:

<cfset regex = "^(?:[^?]+\?v=|[^v]+/v/)([^&##/]+).*|http://youtu.be/">

<cfset videoid = rereplace(u, regex, "\1" ) />

Comment 5 by kodulehe posted on 2/8/2012 at 2:48 PM

Good example, thanks! I had been searching for this for some time.

Comment 6 by Simone posted on 11/20/2012 at 9:02 AM

You guys are rock stars!

Comment 7 by Dani Szwarc posted on 12/5/2013 at 1:10 AM

Glad I've found the info here and even more glad about the 200 (well, maybe less) comments showing sexier, smaller, faster examples.
I love the CF community.

Comment 8 by e-poe tegemine posted on 4/24/2014 at 1:45 AM

Thanks! It probably had one typo, but this example pointed me in the right direction!