How are Facebook and Google+ creating link previews?

Last night I noticed something interesting. I had added a link to a Google+ post (I’d post the link here, but it looks like you can’t edit a Google+ share setting after it is written) and noticed it used an image from the linked page. It wasn’t a “URL Preview” (i.e., a screenshot), but rather one of the images from the page itself. I decided to dig into this a bit and figure out which image it picked and why. Here is what I found.

I began my search and - not surprisingly - immediately found an answer for Facebook’s link previews over on Stack Overflow: Facebook Post Link Image. It turns out that Facebook checks two possible tags for its link previews:

  1. First, Facebook looks for an OpenGraph tag like so: <meta property="og:image" content="image url"/>. There's a whole set of these OpenGraph tags that allow for even more customization of how Facebook "sees" your page. You can read the documentation for more detail on this. (That link is about their Like feature, but it applies in general.)
  2. If Facebook can't find that, it then looks for a link tag: <link rel="image_src" href="image url" />. If you have both of these tags, Facebook gives preference to the OpenGraph tag.

What's even cooler is that Facebook provides a "Lint" tool that lets you test how it will parse your page: URL Linter. I encourage you to give it a try. It's probably worthwhile for your client sites as well.

Unfortunately, neither of these tags worked for Google+, and no amount of Googling helped. After some more testing I was able to determine that Google+ simply uses the first image it finds. This seems odd: the first image in a web page is probably something layout-related and not really "critical" to the page itself. That being said, it seems to be the logic Google uses. Consider this HTML:

    
    <html>
    <head>
    <title>Test Title</title>
    <meta name="description" content="A description for the page." />
    <link rel="image_src" href="http://www.raymondcamden.com/images/meatwork.jpg" />
    <meta property="og:image" content="http://www.coldfusionjedi.com/images/ScreenClip145.png"/> 
    </head>
    
    <body>
    
    <h1>A Test Title</h1>
    
    <img src="http://www.coldfusionjedi.com/images/eyeballs/right.jpg">
    
    <p>
    This is a page.
    </p>
    
    <img src="http://www.coldfusionjedi.com/images/IMAG0235.jpg">
    
    </body>
    </html>
    

Given this HTML, Facebook will grab this URL for the preview: http://www.coldfusionjedi.com/images/ScreenClip145.png. Google+ will instead pick this one: http://www.coldfusionjedi.com/images/eyeballs/right.jpg. As much as I'm a Google+ fan now, I really think Facebook is making a much better choice.

OK, given the logic above, what about writing our own code to mimic this behavior? I wrote a simple UDF that does this. I'll post it to CFLib a bit later today.

    
    <cffunction name="getURLPreview" output="false" returnType="string">
    	<cfargument name="theurl" type="string" required="true">
    	<cfargument name="defaultimageurl" type="string" required="false" default="" hint="If we can't find an image, the UDF will return this."> 
    
    	<cfset var httpResult = "">
    	<cfset var html = "">
    	<cfset var match = "">
    	<cfset var srcmatch = "">
    
    	<!--- grab the html --->
    	<cfhttp url="#arguments.theurl#" result="httpResult">
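    	<!--- val() reads the leading number of statusCode ("200 OK" gives 200) and yields 0 on a connection failure --->
    	<cfif val(httpResult.statusCode) neq 200>
    		<cfreturn arguments.defaultimageurl>
    	</cfif>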
    
    	<cfset html = httpResult.fileContent>
    
    	<!--- First look for meta/og:image --->
    	<!--- Example: <meta property="og:image" content="http://www.coldfusionjedi.com/images/ScreenClip145.png"/> --->
    	<cfset match = reFindNoCase("<meta[[:space:]]+property=""og:image""[[:space:]]+content=""(.+?)""[[:space:]]*/{0,1}>", html,1,1)>
    
    	<cfif match.pos[1] gt 0>
    		<cfreturn mid(html, match.pos[2], match.len[2])>
    	</cfif>
    
    	<!--- Then try link rel/image_src --->
    	<!--- Example: <link rel="image_src" href="http://www.coldfusionjedi.com/images/meatwork.jpg" /> --->
    	<cfset match = reFindNoCase("<link[[:space:]]+rel=""image_src""[[:space:]]+href=""(.+?)""[[:space:]]*/{0,1}>", html,1,1)>
    
    	<cfif match.pos[1] gt 0>
    		<cfreturn mid(html, match.pos[2], match.len[2])>
    	</cfif>
    
    	<!--- Finally, try ANY image --->
    	<cfset match = reMatchNoCase("<img.*?>",html)>
    	<cfif arrayLen(match) gte 1>
    		<!--- return the source, but only if the tag has a double-quoted src attribute --->
    		<cfset srcmatch = reFindNoCase("src=""(.+?)""", match[1],1,1)>
    		<cfif srcmatch.pos[1] gt 0>
    			<cfreturn mid(match[1], srcmatch.pos[2], srcmatch.len[2])>
    		</cfif>
    	</cfif>
    
    	<cfreturn arguments.defaultimageurl>
    </cffunction>
    

If you read down through the UDF, you can see it attempts to mimic Facebook's logic first and only then falls back to the 'first image on page' logic. It also accepts a default image argument. Personally, I don't think the 'first image on page' approach always makes sense. If you agree, just remove that block of code.
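
For completeness, here is a rough sketch of how the UDF might be called. It assumes getURLPreview is already defined or included on the page, and the fallback image URL is a made-up placeholder rather than a real asset.

```cfml
<!--- Hypothetical fallback image; swap in one of your own. --->
<cfset fallback = "http://www.example.com/images/default.png">

<!--- Ask the UDF for the preview image of a page. --->
<cfset preview = getURLPreview("http://www.raymondcamden.com", fallback)>

<cfoutput>
<cfif len(preview)>
	<!--- htmlEditFormat escapes the URL before it goes into the attribute. --->
	<img src="#htmlEditFormat(preview)#" alt="Link preview">
<cfelse>
	No preview image found.
</cfif>
</cfoutput>
```

One thing to keep in mind: the UDF returns the attribute value exactly as it appears in the page, so a relative src (for example /images/foo.jpg) would still need to be resolved against the page's URL before you use it elsewhere.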

About Raymond Camden

Raymond is a developer advocate. He focuses on JavaScript, serverless and enterprise cat demos.

Lafayette, LA https://www.raymondcamden.com
