Posted in ColdFusion | Posted on 07-26-2011
Last night I noticed something interesting. I had added a link to a Google+ post (I'd post the link here, but it looks like you can't edit a Google+ share setting after it is written) and noticed that the post used an image from the linked page. It wasn't a "URL Preview" (i.e., a screenshot), but rather one of the images from the page itself. I decided to dig into this a bit and figure out which image it picked and why. Here is what I've found.
I began my search and - not surprisingly - immediately found an answer for Facebook's link previews over on Stack Overflow: Facebook Post Link Image. Turns out that Facebook makes use of two possible values for their link previews:
- First, Facebook looks for an OpenGraph tag like so: <meta property="og:image" content="image url"/>. There's a whole set of these OpenGraph tags that allow for even more customization of how Facebook "sees" your page. You can read the documentation for more detail on this. (That link is about their Like feature, but it applies in general.)
- If Facebook can't find that, it then looks for a link tag: <link rel="image_src" href="image url" />. If you have both of these tags, then Facebook gives preference to the OpenGraph tag.
What's even cooler is that Facebook provides a "Lint" tool that allows you to test how it will parse your page: URL Linter. I encourage you to give this a try. It's probably worthwhile for your client sites as well.
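If you want to control the preview image for your own pages, here is a minimal sketch of emitting both tags from a ColdFusion template. The variable name previewImage is just something made up for illustration, and the URL is only a sample image; adjust both to suit your site.

```cfml
<!--- previewImage is an illustrative variable name, not part of any API --->
<cfset previewImage = "http://www.coldfusionjedi.com/images/ScreenClip145.png">

<cfoutput>
<head>
<title>Test Title</title>
<!--- Facebook gives preference to this tag --->
<meta property="og:image" content="#previewImage#" />
<!--- Facebook falls back to this tag when no og:image tag is present --->
<link rel="image_src" href="#previewImage#" />
</head>
</cfoutput>
```

Pointing both tags at the same image keeps the preview consistent whichever tag a given consumer happens to read.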
Unfortunately, neither of these worked for Google+. No amount of Googling helped. After some more testing I've been able to determine that Google+ simply uses the first image it finds. This seems odd. The first image in a web page is probably something layout related and not really "critical" to the page itself. That being said, it seems to be the logic Google uses. So consider this HTML.
<html>
<head>
<title>Test Title</title>
<meta name="description" content="A description for the page." />
<link rel="image_src" href="http://www.coldfusionjedi.com/images/meatwork.jpg" />
<meta property="og:image" content="http://www.coldfusionjedi.com/images/ScreenClip145.png"/>
</head>

<body>

<h1>A Test Title</h1>

<img src="http://www.coldfusionjedi.com/images/eyeballs/right.jpg">

<p>
This is a page.
</p>

<img src="http://www.coldfusionjedi.com/images/IMAG0235.jpg">

</body>
</html>
Given this HTML, Facebook will grab this URL for the preview: http://www.coldfusionjedi.com/images/ScreenClip145.png. Google+ will instead pick this one: http://www.coldfusionjedi.com/images/eyeballs/right.jpg. As much as I'm a Google+ fan now, I really think Facebook is making a much better choice.
Ok, given the logic above, what about writing our own code to mimic this behavior? I wrote a simple UDF that accomplishes this - I'll post it to CFLib a bit later today.
<!--- The function name is a placeholder; the opening tag of the original listing did not survive. --->
<cffunction name="getPreviewImage" output="false" returntype="string">
    <cfargument name="theurl" type="string" required="true">
    <cfargument name="defaultimageurl" type="string" required="false" default="" hint="If we can't find an image, the UDF will return this.">

    <cfset var httpResult = "">
    <cfset var html = "">
    <cfset var match = "">
    <cfset var srcmatch = "">

    <!--- grab the html; give up with an empty string if the page can't be fetched --->
    <cfhttp url="#arguments.theurl#" result="httpResult">
    <cfif httpResult.responseheader.status_code neq 200>
        <cfreturn "">
    </cfif>

    <cfset html = httpResult.fileContent>

    <!--- First look for meta/og:image --->
    <!--- Example: <meta property="og:image" content="http://www.coldfusionjedi.com/images/ScreenClip145.png"/> --->
    <cfset match = reFindNoCase("<meta[[:space:]]+property=""og:image""[[:space:]]+content=""(.+?)""[[:space:]]*/{0,1}>", html, 1, true)>

    <!--- pos[1] is the whole match, pos[2] is the captured URL --->
    <cfif match.pos[1] gt 0>
        <cfreturn mid(html, match.pos[2], match.len[2])>
    </cfif>

    <!--- Then try link rel/image_src --->
    <!--- Example: <link rel="image_src" href="http://www.coldfusionjedi.com/images/meatwork.jpg" /> --->
    <cfset match = reFindNoCase("<link[[:space:]]+rel=""image_src""[[:space:]]+href=""(.+?)""[[:space:]]*/{0,1}>", html, 1, true)>

    <cfif match.pos[1] gt 0>
        <cfreturn mid(html, match.pos[2], match.len[2])>
    </cfif>

    <!--- Finally, try ANY image --->
    <cfset match = reMatchNoCase("<img.*?>", html)>
    <cfif arrayLen(match) gte 1>
        <!--- return the source, but only if the first img tag has a double-quoted src --->
        <cfset srcmatch = reFindNoCase("src=""(.+?)""", match[1], 1, true)>
        <cfif srcmatch.pos[1] gt 0>
            <cfreturn mid(match[1], srcmatch.pos[2], srcmatch.len[2])>
        </cfif>
    </cfif>

    <cfreturn arguments.defaultimageurl>
</cffunction>
If you read slowly down the UDF you can see that it attempts to mimic Facebook's logic first and only then falls back to the "first image on page" logic. It also accepts a default image argument. Personally, I don't think the "first image on page" approach is always going to make sense. If you agree, just remove that block of code.
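For a quick test, here is a minimal usage sketch. It assumes the UDF has been defined or included on the same page under the hypothetical name getPreviewImage; that name, the page URL, and the fallback image path are all placeholders rather than anything from the original code.

```cfml
<!--- getPreviewImage is a hypothetical name for the UDF; the URL and fallback path are placeholders --->
<cfset pageUrl = "http://www.example.com/some-article.html">
<cfset fallbackImage = "/images/no-preview.png">

<!--- Returns the og:image URL, else the image_src URL, else the first img src, else the fallback --->
<cfset preview = getPreviewImage(pageUrl, fallbackImage)>

<cfoutput>
<cfif len(preview)>
    <img src="#htmlEditFormat(preview)#" alt="Link preview">
<cfelse>
    <!--- The UDF returns an empty string when the page can't be fetched --->
    <p>No preview image found for #htmlEditFormat(pageUrl)#.</p>
</cfif>
</cfoutput>
```

One thing to keep in mind: the "first image" fallback returns the src exactly as it appears in the page, so a relative path would still need to be resolved against the page's URL before you could display it.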


Thanks for this example Ray. Have you thought about extending it to optionally return an array of _n_ matched images? Or maybe _n_ matched images from the FB model?
Finally, I agree that FB's approach is much better. I imagine that Google+ will steal it soon.
Re: G+, that's my experience too. Once G+ is completely open and public, I will probably leave FB for good, or just ignore it.
The arrows will allow you to scroll through the images till you are happy with the one you want to associate with the link, or you can use the X button to not add an image.
Of course, that was for Google+. On Facebook there is something very similar underneath the description of the site.
Hope that helps.
Onward to Open Graph I guess...
If you paste that as a new link, the image preview should have the arrows that Andrew mentioned.
No image before, now with image.
https://plus.google.com/101776844630674138344/post...
like this "बांग्लादेश, पाकिस्तान भले दो देश हों, ..."
Is it a cache problem?
This link
http://www.shreshthbharat.in/2011/07/06/bangladesh...