ColdFusion Regex example - finding URLs in CSS


Todd Sharp and I were talking today about finding URLs within CSS. Specifically - things in the format of url(...). For example (and yes, I know this is a bad CSS example since it repeats the same ID, but I built it up with cut and paste):

#id{ background-image: url('foo.jpg'); }

#id{ background-image: url('goo.jpg'); }

#id{ background-image: url('doo.jpg'); }

Todd had the following regex courtesy of a blog post by Ben Nadel:

url\([^\)]+\)

And while this can return all the instances, it also includes the wrapping url(). I don't believe what Todd wanted could be done in one function call, but there are a few ways we could do it in a loop. Here is what I came up with.

<cfsavecontent variable="s">
#id{ background-image: url('foo.jpg'); }

#id{ background-image: url('goo.jpg'); }

#id{ background-image: url('doo.jpg'); }

</cfsavecontent>

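<!--- The inner ( ) pair is a subexpression; the escaped \( and \) match the literal parens of url(...) --->
<cfset matches = reFind("url\(([^\)]+)\)", s, 1, true)>
<!--- reFind returns a position of 0 when nothing more is found --->
<cfloop condition="matches.pos[1] gt 0">
    <cfset match = mid(s, matches.pos[2], matches.len[2])>
    <cfset match = rereplace(match, "['""]", "", "all")>
    <cfoutput>
    match was #match#<p>
    </cfoutput>
    <!--- Resume the search just past the end of the previous full match --->
    <cfset matches = reFind("url\(([^\)]+)\)", s, matches.pos[1]+matches.len[1], true)>
</cfloop>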

Basically I just use a conditional loop and reFind. Note the use of the fourth argument to ensure I get subexpressions back. It may be hard to see, but I modified Ben's regex to add an additional ( and ) around the 'inner' portion of the match. The result of my reFind call will contain both the entire match and the inner match. I do one more quick regex replacement to get rid of the single or double quotes (and I'm pretty sure this could be done in the original regex instead, but it's one more line so I won't be losing any sleep over it), and then I output the result. Given the sample string above, my results are:

match was foo.jpg
match was goo.jpg
match was doo.jpg
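
As a cross-check outside CFML, here is a minimal Java sketch of the same capture-group-in-a-loop idea (ColdFusion runs on the JVM, so Java's regex classes are close at hand). The class name CssUrlLoop and the method name extract are made up for illustration; this is not code from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
class CssUrlLoop {
    // Same shape as the modified regex: the inner group captures
    // whatever sits between the literal "url(" and ")".
    private static final Pattern URL = Pattern.compile("url\\(([^)]+)\\)");

    static List<String> extract(String css) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(css);
        while (m.find()) {                  // each find() resumes after the previous match
            String inner = m.group(1);      // the subexpression, without url( and )
            urls.add(inner.replaceAll("['\"]", ""));  // strip single or double quotes
        }
        return urls;
    }

    public static void main(String[] args) {
        String css = "#id{ background-image: url('foo.jpg'); }\n"
                   + "#id{ background-image: url('goo.jpg'); }\n"
                   + "#id{ background-image: url('doo.jpg'); }\n";
        for (String u : extract(css)) {
            System.out.println("match was " + u);
        }
    }
}
```

Here group(1) plays the role of pos[2]/len[2] in the reFind result, and the while loop replaces the manual restart position.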

About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos.

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by andy matthews posted on 12/30/2010 at 1:38 AM

This is what I came up with:

\((?:'|").+?(?:'|")\)

Captures everything between parens, while not matching the single, or double, quotes.

Comment 2 by Raymond Camden posted on 12/30/2010 at 1:40 AM

Does it work if no single or double quotes exist? I believe I've seen

url(foo.jpg)

before.

Comment 3 by David Hammond posted on 12/30/2010 at 8:06 AM

It is possible to handle the quotes within the regular expression:

<cfset matches = reFind("url\(('|"")?([^\)'""]+)('|"")?\)", s, 1, true)>
<cfloop condition="matches.pos[1] gt 0">
<cfset match = mid(s, matches.pos[3], matches.len[3])>
<cfoutput>
match was #match#<p>
</cfoutput>
<cfset matches = reFind("url\(('|"")?([^\)'""]+)('|"")?\)", s, matches.pos[1]+matches.len[1], true)>
</cfloop>

Comment 4 by anthony posted on 12/31/2010 at 12:03 AM

(?<=url\().*(?=\)) would work if CF could do lookbehinds. You could use some Java to do the regex for you.
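
A minimal sketch of what that might look like in plain Java, using the pattern above unchanged. The class name LookaroundUrls and the method name find are hypothetical. Two caveats to keep in mind: the match still includes any quotes, and because .* is greedy it would run through to the last ) on a line if two url() values shared that line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
class LookaroundUrls {
    // Lookbehind for "url(" and lookahead for ")": neither is part of the match.
    private static final Pattern URL_BODY = Pattern.compile("(?<=url\\().*(?=\\))");

    static List<String> find(String css) {
        List<String> found = new ArrayList<>();
        Matcher m = URL_BODY.matcher(css);
        while (m.find()) {
            found.add(m.group());   // whole match; the lookarounds consumed nothing
        }
        return found;
    }

    public static void main(String[] args) {
        String css = "#id{ background-image: url('foo.jpg'); }\n"
                   + "#id{ background-image: url(goo.jpg); }\n";
        System.out.println(find(css));
    }
}
```

With one url() per line, as in the sample CSS, this returns 'foo.jpg' (quotes included) and goo.jpg, so a quote-stripping step would still be needed.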

Comment 5 by Rhysling posted on 12/31/2010 at 2:51 AM

For something this simple there's no way you need multiple find/replace steps.

Lookaheads and lookbehinds would be nice. It'd also be nice if ColdFusion didn't assume parentheses around the entire regex pattern, and then returned an array of the actual matched strings, too, rather than that len & pos junk, like every other language I've ever used.

I have a slightly odd way of writing regex, for readability. You don't have to escape most reserved characters inside of [], so I use those instead of escaping whenever possible, since escaping muddles readability for me.

url[(]+['"]?([^'")]*)['"]?[)]+

<!--- Assumes CSS var --->
<cfset regex = "url[(]+['""]?([^'"")]*)['""]?[)]+" />
<cfset matches = reFind(regex, css, 1, true) />
<cfoutput>Dump #regex#</cfoutput><cfdump var="#matches#" />
<cfloop condition="matches.len[1] gt 1">
<cfset strMatch = mid(css, matches.pos[2], matches.len[2])>
<cfoutput>#strMatch#</cfoutput>
<cfset matches = reFind(regex, css, matches.pos[2]+matches.len[2], true) />
</cfloop>

Comment 6 by Rhysling posted on 12/31/2010 at 3:03 AM

Or url[(]+['"\s]*?([^'")]*)['"\s]*?[)]+ for being careful of weird syntax that might nevertheless work.
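
To see how the two bracket-class patterns differ, here is a small Java sketch that runs both against the same input. The patterns are copied from the two comments above; the class name BracketClassUrls, the constant names, and the first helper are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
class BracketClassUrls {
    // First pattern: at most one quote character on either side of the value.
    static final Pattern STRICT =
        Pattern.compile("url[(]+['\"]?([^'\")]*)['\"]?[)]+");
    // Second pattern: also tolerates whitespace mixed in around the quotes.
    static final Pattern LOOSE =
        Pattern.compile("url[(]+['\"\\s]*?([^'\")]*)['\"\\s]*?[)]+");

    // Returns capture group 1 of the first match, or null when nothing matches.
    static String first(Pattern p, String css) {
        Matcher m = p.matcher(css);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(first(STRICT, "url('foo.jpg')"));    // plain quoted value
        System.out.println(first(STRICT, "url( 'foo.jpg' )"));  // padded with spaces
        System.out.println(first(LOOSE, "url( 'foo.jpg' )"));   // padded with spaces
    }
}
```

On the padded input the first pattern finds nothing while the second still captures foo.jpg. One thing worth checking in real stylesheets: for an unquoted, padded value such as url( foo.jpg ), the second pattern matches but the surrounding spaces stay inside the captured group, so a trim may still be needed.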

Comment 7 by andy matthews posted on 12/31/2010 at 7:33 AM

Ray...

Change my previous example to:

\((?:'|")?.+?(?:'|")?\)

That says 0 or 1 single or double quote (but don't capture them).
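
A quick way to check the revised pattern against both quoted and unquoted values is a small Java sketch like the one below. REVISED is the pattern from the comment, unchanged. WITH_GROUP is an assumption added for this example, not part of the comment: it wraps the lazy middle part in a capture group. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
class ParenMatch {
    // Pattern from the comment above, unchanged.
    static final Pattern REVISED =
        Pattern.compile("\\((?:'|\")?.+?(?:'|\")?\\)");
    // Assumption, not from the comment: a capture group around the lazy middle part.
    static final Pattern WITH_GROUP =
        Pattern.compile("\\((?:'|\")?(.+?)(?:'|\")?\\)");

    // Whole text matched by the unchanged pattern, or null when nothing matches.
    static String wholeMatch(String css) {
        Matcher m = REVISED.matcher(css);
        return m.find() ? m.group() : null;
    }

    // Capture group 1 of the grouped variant, or null when nothing matches.
    static String inner(String css) {
        Matcher m = WITH_GROUP.matcher(css);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("url('foo.jpg')"));
        System.out.println(wholeMatch("url(foo.jpg)"));
        System.out.println(inner("url('foo.jpg')"));
        System.out.println(inner("url(foo.jpg)"));
    }
}
```

The unchanged pattern does match both forms, but the whole match still carries the parentheses and any quotes, since non-capturing groups still consume characters. The grouped variant returns just foo.jpg in both cases. Neither pattern is anchored on url, so any parenthesized text in the CSS would match as well.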