For a while now I've made use of a service called Twilert. The site has one simple purpose: it lets you create Twitter search profiles and emails you a report daily (or weekly, etc). I thought it might be interesting to look at how difficult this would be to build in ColdFusion. Luckily, Twitter provides an API that is both simple to use and very powerful. Here's what I came up with - and hopefully this can be useful to others.
First - let me define what I want to build. Like the Twilert service, I'll start with a set of search terms. I'll perform my search daily via a scheduled task that runs right past midnight and then delivers the report to me via email. The Twitter API is very nicely documented. In particular, the Search API is the one we care about. Also of note are the rate limits Twitter applies. While my code won't hit that limit, it is something to keep in mind. I'd suggest spending a few minutes scanning all of the previous links to get a feel for the Twitter API and what it supports. Now that you've done that (ok, be honest, if you are like me, you probably decided to skip it and read it later), let's start to build out our report generator.
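As an aside on the scheduled task mentioned above: one way to register it is with the cfschedule tag (it can also be set up in the ColdFusion Administrator). This is only a sketch. The task name, the URL, and the 12:05 AM start time are assumptions for illustration, not values from the report code.

```cfml
<!--- Hypothetical task name and URL: point "url" at wherever the report template lives --->
<cfschedule action="update"
    task="DailyTwitterReport"
    operation="HTTPRequest"
    url="http://127.0.0.1/twitterreport.cfm"
    startDate="#dateFormat(now(), 'mm/dd/yyyy')#"
    startTime="12:05 AM"
    interval="daily"
    requestTimeOut="120">
```

Running this once creates (or updates) the task. After that, the server requests the URL every day shortly after midnight.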
First, the search term. This could be dynamic, perhaps based on the URL, which would then make it easy to set up a few scheduled tasks, each with different values. For now though I just hard coded it:
<!--- Search terms, max 140, minus date portion --->
<cfset search = "coldfusion">
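For the dynamic version mentioned above, a minimal sketch might read the term from the URL and fall back to a default. The parameter name "q" is an assumption made up for this example.

```cfml
<!--- Hypothetical: "q" is an assumed URL parameter name, e.g. report.cfm?q=coldfusion --->
<cfparam name="url.q" default="coldfusion">
<cfset search = trim(url.q)>
```

Each scheduled task could then call the same template with a different q value.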
Twitter supports basic AND/OR style searches as well, but I'll keep it simple and use just one word. Now, I mentioned the rate limits before. Another thing to note is that when you perform a search, you can only return 100 results at one time. Twitter supports a Page attribute, but they limit you to 15 pages. That's 1500 results, which seems a bit much, especially for an email. I created a variable to represent the total number of network requests, or pages, of data to get:
<!--- Max number of HTTP requests --->
<cfset maxRequests = 10>
For the most part, this is pretty arbitrary. If I got an email with 1000 results in it I doubt I'd read past the first twenty or so. Obviously this is something you can change to your liking, within the limits of Twitter's API.
<!--- current page --->
<cfset page = 1>
<!--- max results per page is 100 --->
<cfset max = 100>
The page variable just tracks the current page and max will be sent to Twitter to request the maximum amount of results possible.
<!--- Loop until we run out of results or hit maxRequests. Use a simple boolean to check both --->
<cfset done = false>
<!--- A flag to see if something went wrong. --->
<cfset errorFlag = false>
<!--- A flag to determine if we maxed out our search --->
<cfset maxFlag = false>
These three variables are just flags. I'll be using the done variable in a loop coming up. The errorFlag will notice if something goes wrong with one of the HTTP calls. The maxFlag will be used if we hit the maximum number of requests.
<!--- append yesterday's date to the search url --->
<cfset yesterday = dateAdd("d", -1, now())>
<cfset searchURL = search & " since:#dateFormat(yesterday,'yyyy-mm-dd')#">
<cfset searchURL = urlEncodedFormat(searchURL)>
Next up we add the date filter to our search terms. Remember I'm running this every day so I want to limit the results to entries from yesterday. This is done with the since operator. Twitter also supports an until operator, but as I plan on running this report right past midnight, it won't matter. (You can see a good report of all the operators here.)
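If the task ran at some other time of day, the until operator could bound the window at the other end. The following is only a sketch of how that query might be built. The variable names query and today are introduced for this example, and exactly how Twitter treats the until date (inclusive or not) is something to check against their documentation.

```cfml
<cfset search = "coldfusion">
<cfset yesterday = dateAdd("d", -1, now())>
<cfset today = now()>
<!--- Build the full query first: both operators take yyyy-mm-dd dates --->
<cfset query = search & " since:" & dateFormat(yesterday, "yyyy-mm-dd") & " until:" & dateFormat(today, "yyyy-mm-dd")>
<!--- Encode the whole query, operators included, before adding it to the URL --->
<cfset searchURL = urlEncodedFormat(query)>
```

Note that the encoding step is applied to the complete query string, so the date operators are sent along with the search term.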
<cfset results = []>
The last bit of code before we actually begin to search is to create the array that will store our results. Ok - so everything so far was setup - now let's look at the actual search:
<cfloop condition="not done">
    <cfhttp url="http://search.twitter.com/search.json?page=#page#&rpp=#max#&q=#searchURL#" result="result">
    <cfif result.responseheader.status_code is "200">
        <cfset content = result.fileContent.toString()>
        <cfset data = deserializeJSON(content)>
        <cfloop index="item" array="#data.results#">
            <cfset arrayAppend(results, item)>
        </cfloop>
        <cfif structKeyExists(data, "next_page")>
            <cfset page++>
            <cfif page gt maxRequests>
                <cfset maxFlag = true>
                <cfset done = true>
            </cfif>
        <cfelse>
            <cfset done = true>
        </cfif>
    <cfelse>
        <cfset errorFlag = true>
        <cfset done = true>
    </cfif>
</cfloop>
Ok, let me describe this line by line. The loop will continue until the done variable is true. In each iteration I use cfhttp to hit Twitter. Notice that I ask for JSON back, pass in both page and max, and pass in my search query.
If the result status is 200, it should be good. I get the content and deserialize the JSON. I loop through each result and simply append it to the global results array. If the result JSON contains a next_page value, then more data exists. I do a check first though to see that I've not made too many requests. Lastly, I've got an ELSE block for times when the status wasn't 200. I could add additional logging here, but for now I just use the simple error flag.
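As a rough idea of the additional logging mentioned above, the failure branch could write the status to a log file with cflog. This is a sketch, not the report's actual code. The log file name is made up, and the stand-in values at the top exist only so the snippet runs by itself.

```cfml
<!--- Stand-in values so the snippet runs on its own; in the report these come from the loop and cfhttp --->
<cfset page = 3>
<cfset result = { statusCode = "503 Service Unavailable" }>
<cfset result.responseheader = { status_code = "503" }>
<cfset errorFlag = false>
<cfset done = false>

<cfif result.responseheader.status_code is not "200">
    <!--- "twitterreport" is a hypothetical log name; cflog writes to twitterreport.log in the server's log directory --->
    <cflog file="twitterreport" type="error" text="Twitter search failed on page #page#: #result.statusCode#">
    <cfset errorFlag = true>
    <cfset done = true>
</cfif>
```

The flags behave exactly as before. The only difference is that a failed request leaves a trace of which page failed and with what status.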
Now that we have results, let's begin the display portion:
<!--- prepare result --->
<cfsavecontent variable="report">
<cfoutput>
<style>
h2, p, .twit_date { font-family: Verdana, Geneva, Arial, Helvetica, sans-serif; }
.twit_date { font-size: 10px; }
.twit_odd {
    padding: 10px;
}
.twit_even {
    padding: 10px;
    background-color: ##f0f0f0;
}
</style>
I've begun my display with a cfsavecontent. The reason for this is that I also considered generating a PDF report. I didn't end up doing it, but since I'll have my report in a nice variable, I'll be able to do just about anything with it. I then put on my designer hat (it has stars on it) and whipped up some simple CSS I'll use later. Please feel free to send suggestions on nicer CSS.
<h2>Twitter Search Results</h2>
<p>
The following report was generated for the search term(s): #search#.<br/>
It contains matches found from <b>#dateFormat(yesterday,"mmmm dd, yyyy")#</b> to now.<br/>
A total of <b>#arrayLen(results)#</b> result(s) were found.<br/>
<cfif maxFlag><b>Note: The maximum number of results was found. More may be available.</b><br/></cfif>
<cfif errorFlag><b>Note: An error occurred during the report.</b><br/></cfif>
</p>
Next up is a simple header. I report on the search term, the date, number of results, and on my flags.
<cfloop index="x" from="1" to="#arrayLen(results)#">
    <cfset twit = results[x]>
    <cfif x mod 2 is 0>
        <cfset class = "twit_even">
    <cfelse>
        <cfset class = "twit_odd">
    </cfif>
    <!--- massage date a bit to remove +XXXX --->
    <cfset twitdate = twit.created_at>
    <cfset twitdate = listDeleteAt(twitdate, listLen(twitdate, " "), " ")>
    <p class="#class#">
    <img src="#twit.profile_image_url#" align="left">
    <a href="http://twitter.com/#twit.from_user#">#twit.from_user#</a> #twit.text#<br/>
    <span class="twit_date">#twitdate#</span>
    <br clear="left">
    </p>
</cfloop>
Now I loop over each Twit. Twitter reports a variety of fields for each result. I decided to only care about the time, the user (and his or her profile image), and the text. Please keep in mind though that there is even more information in the results. This is what I decided was important. The display is rather simple. Profile picture to the left, name and text on top, and the formatted date below it. (FYI: Notice the x mod 2 if clause there? I actually had the ColdFusion 9 ternary clause first and it was a lot slimmer. I know I could switch it to IIF but I hate that function.)
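For comparison, here is roughly what the ColdFusion 9 ternary version of that odd/even check might look like. It is a standalone sketch (it needs ColdFusion 9 or later), and the classes array exists only to show the result.

```cfml
<cfset classes = []>
<cfloop index="x" from="1" to="4">
    <!--- Even rows get twit_even, odd rows get twit_odd --->
    <cfset class = (x mod 2 is 0) ? "twit_even" : "twit_odd">
    <cfset arrayAppend(classes, class)>
</cfloop>
<!--- Outputs: twit_odd,twit_even,twit_odd,twit_even --->
<cfoutput>#arrayToList(classes)#</cfoutput>
```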
</cfoutput>
</cfsavecontent>
<cfoutput>#report#</cfoutput>
The final bits simply close up our tags and then output to screen. So I did lie a bit - I don't actually email the report, but as you can imagine, that would take about two seconds. I'd just wrap the report result in cfmail tags. I've got a few ideas on how to make this report even slicker. That will be in the next entry. So - is this useful? I could imagine this being a great way for a business to automate monitoring of their name and products.
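A minimal sketch of that cfmail step could look like the following. The addresses are placeholders, and it assumes the search and report variables from the code above are already set and that the server has mail configured.

```cfml
<!--- Placeholder addresses; assumes "search" and "report" were set by the report code above --->
<cfmail to="me@example.com" from="reports@example.com" subject="Twitter Search Results: #search#" type="html">
#report#
</cfmail>
```

The type="html" attribute matters here, since the report variable contains the style block and HTML markup rather than plain text.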
You can download the full bits below.
Archived Comments
This is pretty much how I collect the results for http://www.tweettrail.com. One thing to watch out for is looping till done. If Twitter has an issue for some reason it will create infinite loops (yes, it happened to me). I created some max error counts to break out of the loop in case this occurs.
I believe my max requests will take care of that, won't it?
Oh yeah I should never doubt that the Jedi hasn't thought ahead :)
Because I found that about 1 in 50ish http requests fail, I continue to process (i.e. try again) when I don't get a 200 response, but if I get more than X errors per search term I stop the search and drop out, as something might be wrong at Twitter.
Yeah, mine right now says: If bad, just stop. Although technically if it got N results first those still work. I kinda figured that since it was a search report, and run daily, it didn't need to be anal about retries.
Do you think it is worthwhile maybe to say, "hey, I failed, but let's try again a few times." I could, on failure, NOT add to HTTP requests, BUT keep a counter of errors, and stop at 3 or some such.
Well I found I had to do that or I'd get results that were really inaccurate and I cache the results for 5 hours so I really didn't want bad data for such a long time. Perhaps because lots of http:// requests were coming from the same IP/account or that my VPS network caused it to error occasionally who knows. 1 in 50 was a loose guess and it occurred with terms that were really common ie jQuery and I would have to do 50-100+ http requests to get all the data.
Hmm. Well maybe at minimum I should log the errors to get a bit more info when things go wrong. I need to deploy this to my own server, but I'm waiting till part 2 (later this week).
I tested this report, it looks great. I’ve been wanting to list users tweets on my website based on a keyword for a long time.
These are some suggestions I’d like to see in future reports if possible:
1. Keywords filtering: I was wondering if there is a way we can filter out inappropriate words within the users’ tweets such as “sex”, the F-word etc. Basically, if a tweet has an inappropriate word, ignore it, do not display it.
2. Pages Navigation bar at the bottom of the page so we can page maybe 20 pages at a time
3. Would the report refresh itself automatically? Or the user has to refresh the page in order to see the new recent tweets.
4. Maybe a search field so a user will have the option to change the search criteria
I think I am asking too much, lets hope :-)
Thank you. This is a very good start..
-AJ
Items 2, 3, and 4 really only work if the report isn't schedule based. Remember the main idea here was to run this at midnight via a scheduled task. You could use it as a user-runnable page though. To handle 2, you would want to cache the results though. 3 could be done with Ajax. 4 would also work if you cached the results.
1) is something that I'm going to support in part 2 of this article - kinda. You will see.
Thank you Ray! I look forward to seeing Part-2 of this report..
On a side note, I was wondering if the data we are retrieving from Twitter can be imported into a database such as MS Access which will help me a great deal in formatting my own Reports and tackling my questions in my prior thread, and not to mention that my Reports will always work regardless if Twitter is down since I'll be pulling data from my own database, which should also speed up the retrieval process..
Sorry for all these questions, I am just throwing ideas. Hope you don’t mind :)
-AJ
You absolutely could import it into a DB, and then do trending, which will mesh well with part 2. (I'm thinking maybe to be posted Wednesday.)
By the way, how do I change my Avatar in this Blog? I didn't see an option to upload my own when I post my comments..
Thx.
-AJ
It's a Gravatar. Go to Gravatar.com and you can upload a picture there. Many services make use of it.
Thanks as always Ray
Awesome post. And a very cool idea!
Bah, this is nothing! Wait till the next one! ;) (Which I need to make time for @ lunch.)
Even more gooder...
Name your class .twit_0 and .twit_1
class="twit_#x mod 2#"
@Emmet: Yeah, I know that, I just don't like the ugly CSS names. Picky - I know. ;)
Ha ha, awesome - can't wait.
Bah... ugly. You know I have to comment on your padawan designer skills any way I can. ;)
Yeah I'm actually surprised you let me get by with that CSS snobbishness. You know my design skills well. ;)
Warning. There is a bug in this code. I lost the SINCE attribute when building up the link. I've fixed this in my new report code (blog post at lunch), but to fix it, change these lines:
<cfset yesterday = dateAdd("d", -1, now())>
<cfset searchURL = search & " since:#dateFormat(yesterday,'yyyy-mm-dd')#">
<cfset searchURL = urlEncodedFormat(searchURL)>
Specifically the 3rd line was urlEncoding search, not searchURL. This is NOT fixed in the zip, but my next blog entry will include both reports and will have fixed code in both.
Greetings -
I haven't seen any updates recently, just wondering if there are more features coming.. Thx.
-AJ
Did you not see part 2?
http://www.coldfusionjedi.c...
Part 2 involved robots, snakes, and giant armadillos. Really. Well, maybe.
Raymond is a great and wise human because he shares information. Thank you, Raymond.