I'm was looking at a page in the Yahoo Developer network when I ran across this snippet:

Documentation for Yahoo Shopping Web Services

When calling the shopping APIs, you must set the HTTP user agent to a valid web browser string. Bot and spider strings are not valid. The user agent can be set to some default value - it does not have to be changed based on the user's browser. Some examples are as follows:

  • Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) (for IE)
  • Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20060111 Firefox/ (for FireFox)

If a valid user agent is not set, no results will be returned. A 400 HTTP status code will be returned, along with an error message indicating that the user agent is not valid.

So... you build a service API meant to be called remotely via some computer program. Yet at the same time, you demand that we act like a real browser and change our user agent string. I'm trying to wrap my head around why Yahoo would do this. If they want to prevent someone from scraping all of their data - they already have a rate limit in place that would stop that. This seems like a pointless thing to ask.

On a whim I gave it a quick try in ColdFusion. The default useragent sent by ColdFusion is blocked. I then tried a few random useragents (not the ones they documented) and I was able to guess one (Opera, even Opera Baby, but not Baby Opera) quickly enough.

So again - can someone figure a reason for this?