Todd Sharp and I were talking today about finding URLs within CSS. Specifically - things in the format of url(...). For example (and yes, I know this is a bad CSS example since it repeats the same ID, but I built it up with cut and paste):
#id{
background-image: url('foo.jpg');
}
#id{
background-image: url('goo.jpg');
}
#id{
background-image: url('doo.jpg');
}
Todd had the following regex courtesy of a blog post by Ben Nadel:
url\([^\)]+\)
And while this can return all the instances, each match also includes the wrapping url(). I don't believe what Todd wanted could be done in one function call, but there are a few ways we could do it in a loop. Here is what I came up with.
<cfsavecontent variable="s">
#id{
background-image: url('foo.jpg');
}
#id{
background-image: url('goo.jpg');
}
#id{
background-image: url('doo.jpg');
}
</cfsavecontent>

<!--- The inner ( ) captures the text between url( and ) as subexpression 2 --->
<cfset matches = reFind("url\(([^\)]+)\)", s, 1, true)>
<!--- reFind reports position 0 when there are no more matches --->
<cfloop condition="matches.pos[1] gt 0">
    <cfset match = mid(s, matches.pos[2], matches.len[2])>
    <cfset match = rereplace(match, "['""]", "", "all")>
    <cfoutput>
    match was #match#<p>
    </cfoutput>
    <!--- Search again, starting just past the end of the previous match --->
    <cfset matches = reFind("url\(([^\)]+)\)", s, matches.pos[1]+matches.len[1], true)>
</cfloop>
Basically, I just use a conditional loop and reFind. Note the use of the fourth argument to ensure I get subexpressions back. It may be hard to see, but I modified Ben's regex to add an additional ( and ) around the 'inner' portion of the match. The result of my reFind call holds the position and length of both the entire match (index 1) and the inner match (index 2). I do one more quick regex replacement to get rid of the single or double quotes (I'm pretty sure this could be done in the original regex instead, but it's only one more line, so I won't be losing any sleep over it), and then I output the result. Given the sample string above, my results are:
match was foo.jpg
match was goo.jpg
match was doo.jpg
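If the URLs are needed as data rather than as printed output, the same loop can be wrapped in a function. This is only a minimal sketch, not part of the original code: the name getCssUrls and its css argument are made up for illustration, and it assumes reFind reports 0 in pos[1] once nothing is left to match.

```cfml
<!--- Hypothetical helper: returns every url(...) target in a CSS string as an array --->
<cffunction name="getCssUrls" returntype="array" output="false">
    <cfargument name="css" type="string" required="true">
    <cfset var urls = arrayNew(1)>
    <cfset var regex = "url\(([^\)]+)\)">
    <cfset var matches = reFind(regex, arguments.css, 1, true)>
    <cfset var match = "">

    <!--- pos[1] is 0 once reFind runs out of matches --->
    <cfloop condition="matches.pos[1] gt 0">
        <!--- subexpression 2 is the text between the parentheses --->
        <cfset match = mid(arguments.css, matches.pos[2], matches.len[2])>
        <!--- strip quotes, then any surrounding whitespace --->
        <cfset match = trim(reReplace(match, "['""]", "", "all"))>
        <cfset arrayAppend(urls, match)>
        <!--- resume searching just past the end of the whole match --->
        <cfset matches = reFind(regex, arguments.css, matches.pos[1] + matches.len[1], true)>
    </cfloop>

    <cfreturn urls>
</cffunction>

<!--- s is the CSS string built with cfsavecontent earlier --->
<cfdump var="#getCssUrls(s)#">
```

Returning an array keeps the matching logic separate from the display, so the caller decides whether to dump, loop over, or rewrite the URLs.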
Archived Comments
This is what I came up with:
\((?:'|").+?(?:'|")\)
Captures everything between parens, while not matching the single, or double, quotes.
Does it work if no single or double quotes exist? I believe I've seen
url(foo.jpg)
before.
It is possible to handle the quotes within the regular expression:
<cfset matches = reFind("url\(('|"")?([^\)'""]+)('|"")?\)", s, 1, true)>
<cfloop condition="matches.pos[1] gt 0">
<cfset match = mid(s, matches.pos[3], matches.len[3])>
<cfoutput>
match was #match#<p>
</cfoutput>
<cfset matches = reFind("url\(('|"")?([^\)'""]+)('|"")?\)", s, matches.pos[1]+matches.len[1], true)>
</cfloop>
(?<=url\().*(?=\)) would work if CF could do lookbehinds. You could use some Java to do the regex for you.
For something this simple there's no way you need multiple find/replace steps.
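A rough sketch of that Java route, assuming the CFML engine runs on a JVM and can instantiate java.util.regex.Pattern through createObject. The pattern is a variation on the lookbehind above rather than the commenter's exact expression: it uses [^'")]+ instead of .* so that a line containing more than one closing parenthesis is not over-matched, and it folds the optional quotes into the lookarounds. The variable names are illustrative, and s is assumed to be the CSS string from the post.

```cfml
<!--- Java accepts a bounded lookbehind, so the optional quote can live inside it --->
<cfset javaRegex = "(?<=url\(['""]?)[^'"")]+(?=['""]?\))">
<cfset pattern = createObject("java", "java.util.regex.Pattern").compile(javaRegex)>
<cfset matcher = pattern.matcher(s)>

<!--- find() advances to the next match and returns false when none remain --->
<cfloop condition="matcher.find()">
    <!--- group() with no arguments is the whole match, which here is just the URL --->
    <cfoutput>match was #matcher.group()#<p></cfoutput>
</cfloop>
```

Because the lookarounds consume nothing, the whole match is already the bare file name, and no mid() or follow-up replacement is needed.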
Lookaheads and lookbehinds would be nice. It'd also be nice if ColdFusion didn't assume parentheses around the entire regex pattern, and then returned an array of the actual matched strings, too, rather than that len & pos junk, like every other language I've ever used.
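On the point about getting matched strings back as an array: a minimal sketch, assuming ColdFusion 8 or later, where reMatch is available. reMatch returns whole matches only, so the url() wrapper and any quotes still have to be stripped in a second step. The variable names are illustrative, and s is assumed to be the CSS string from the post.

```cfml
<!--- reMatch returns an array of the full matches, url( ) wrapper included --->
<cfset found = reMatch("url\([^\)]+\)", s)>

<cfloop from="1" to="#arrayLen(found)#" index="i">
    <!--- keep only what sits between url( and ), minus optional quotes --->
    <cfset found[i] = reReplace(found[i], "^url\(['""]?(.*?)['""]?\)$", "\1")>
</cfloop>

<cfdump var="#found#">
```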
I have a slightly odd way of writing regex, for readability. You don't have to escape most reserved characters inside of [], so I use those in favor of escaping whenever possible, since escaping muddles readability for me.
url[(]+['"]?([^'")]*)['"]?[)]+
<!--- Assumes CSS var --->
<cfset regex = "url[(]+['""]?([^'"")]*)['""]?[)]+" />
<cfset matches = reFind(regex, css, 1, true) />
<cfoutput>Dump #regex#</cfoutput><cfdump var="#matches#" />
<cfloop condition="matches.len[1] gt 1">
<cfset strMatch = mid(css, matches.pos[2], matches.len[2])>
<cfoutput>#strMatch#</cfoutput>
<cfset matches = reFind(regex, css, matches.pos[2]+matches.len[2], true) />
</cfloop>
Or url[(]+['"\s]*?([^'")]*)['"\s]*?[)]+ for being careful of weird syntax that might nevertheless work.
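One way to check the more tolerant pattern is to run it over a handful of syntax variants. The sketch below is illustrative only: the sample declarations and variable names are invented for the test. It wraps the captured value in trim(), because the capturing class [^'")]* does not exclude whitespace, so an unquoted, padded value can come back with its spaces attached.

```cfml
<cfset regex = "url[(]+['""\s]*?([^'"")]*)['""\s]*?[)]+">

<!--- Sample declarations, one per line: unquoted, single-quoted, double-quoted, padded --->
<cfsavecontent variable="samples">url(foo.jpg)
url('foo.jpg')
url("foo.jpg")
url( 'foo.jpg' )
url( foo.jpg )</cfsavecontent>

<!--- Split on line breaks and test each declaration on its own --->
<cfloop list="#samples#" index="sample" delimiters="#chr(10)##chr(13)#">
    <cfset found = reFind(regex, sample, 1, true)>
    <cfif found.pos[1] gt 0>
        <!--- subexpression 2 is the captured target; trim() removes leftover padding --->
        <cfset target = trim(mid(sample, found.pos[2], found.len[2]))>
        <cfoutput>#sample# gives [#target#]<br></cfoutput>
    </cfif>
</cfloop>
```

Each variant should come out as the bare file name. The square brackets in the output make any stray whitespace easy to spot if trim() is removed.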
Ray...
Change my previous example to:
\((?:'|")?.+?(?:'|")?\)
That says 0 or 1 single or double quote (but don't capture them).