Reader Paul pinged me with the following:
Our main site uses cfcontent to fetch files, since we've decided to store the more sensitive ones outside of the web root. This has been working famously so far, but our client use has really ballooned, as has their appetite for large files (we're talking PDFs upwards of 70 MB).
The result, as you may assume, is much longer download times. The network activity/bandwidth is not a problem for our host. The real issue seems to be the fact that it turns these downloads into open threads. During high-traffic times, it's not uncommon to see all 20 consumed, and worse, jrun overheated at 25% and hanging there.
Is there a better way to handle this kind of situation? We're going to be moving to a beefier machine which may help somewhat, but even with huge pipes, users can only pull in a file so fast, so even with the possibility of more threads, logjams could persist.
I only have one personal experience with this myself. A few years ago I did a job for a company that needed to serve up large files. Instead of using cfcontent I used symlinks (to be honest, 'symlink' may not be the proper technical term - think shortcut) to create an alias of the file under web root. These aliases were removed on a timed basis. But for a short time, in theory, the files were accessible if you knew the link. I helped reduce that risk by copying to a folder under web root that was created with a UUID.
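Here is a minimal sketch of that alias idea, written in Java since ColdFusion runs on the JVM. It is not the code from that job: the class and method names (`TempAlias`, `publish`, `retire`) and the folder layout are made up for illustration, and creating symbolic links on Windows may need extra privileges.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class TempAlias {
    /**
     * Creates webRoot/<uuid>/<filename> as a symbolic link to the protected
     * file and returns the path of the link. Nothing is copied.
     */
    public static Path publish(Path protectedFile, Path webRoot) throws IOException {
        // A random UUID folder makes the public URL hard to guess.
        Path dir = Files.createDirectories(webRoot.resolve(UUID.randomUUID().toString()));
        Path link = dir.resolve(protectedFile.getFileName());
        return Files.createSymbolicLink(link, protectedFile.toAbsolutePath());
    }

    /** Removes the alias and its UUID folder; the protected file is untouched. */
    public static void retire(Path link) throws IOException {
        Files.deleteIfExists(link);             // deletes the link, not its target
        Files.deleteIfExists(link.getParent()); // the UUID folder is empty by now
    }
}
```

A scheduled task would call something like `retire` on every alias older than the allowed window, and the web server has to be configured to follow symlinks for this to work at all.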
The other option I've heard mentioned recently is mod_xsendfile, an Apache-only mod that allows you to simply output a header that tells Apache to serve up a file outside of web root.
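To make the header idea concrete, here is a rough sketch of the response headers the application might emit. The `XSendfileHeaders` class and `forDownload` method are hypothetical, and the sketch assumes Apache already has mod_xsendfile enabled and is allowed to read the directory in question. The application only names the file; Apache does the actual sending.

```java
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class XSendfileHeaders {
    /**
     * Builds the headers for a download that Apache should serve.
     * The response body stays empty; mod_xsendfile fills it in.
     */
    public static Map<String, String> forDownload(Path file, String contentType) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Full path of the file outside web root.
        headers.put("X-Sendfile", file.toAbsolutePath().toString());
        headers.put("Content-Type", contentType);
        // Simplification: assumes the file name contains no quote characters.
        headers.put("Content-Disposition",
                "attachment; filename=\"" + file.getFileName() + "\"");
        return headers;
    }
}
```

In a real request handler, each entry would be written to the response (with cfheader in ColdFusion, for example) after the access check has passed.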
Does anyone have any better advice to offer Paul?
Archived Comments
I once worked on a project where I was building a coldfusion based cheap image resizer. Think amazon: http://ecx.images-amazon.co... vs http://ecx.images-amazon.co...
The problem was cfcontent creates tons of overhead for something that should be very fast. So I rewrote this to use native Java. Now granted, much of the overhead was with CFImage vs Java image libraries. The final bit that made it just that much faster was removing cfcontent for something using java io buffers. I don't have access to my code right now, it'd require a series of blog posts to explain.
I don't have performance numbers, but it got down to something you'd expect from native disk io (milliseconds) instead of hundreds of milliseconds. It also has the benefit of not loading the entire file in memory before flushing it. You can start with something Ben Nadel put up: http://www.bennadel.com/blo...
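For readers who want a starting point, the buffered approach might look roughly like the sketch below. This is not the commenter's code, and the `StreamFile` class name and 8 KB buffer size are arbitrary choices. It keeps memory use flat by never holding more than one buffer of the file at a time, but note that the request thread stays busy until the client has received the last byte.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamFile {
    /**
     * Copies a file to an output stream in fixed-size chunks and
     * returns the number of bytes written.
     */
    public static long copy(Path source, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        try (InputStream in = Files.newInputStream(source)) {
            int read;
            // read() returns -1 at end of file.
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        out.flush();
        return total;
    }
}
```

In a servlet container, `out` would be the response's output stream, with the content type and length headers set before the first write.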
I'm currently using ProFlashDownload on a project. I copy all of the files to be downloaded to a publicly accessible sub-directory (named with a hash) and then launch this app to allow download. (NOTE: I don't link directly to the files... I use a session-restricted CF redirection script.)
http://www.cftagstore.com/t...
I've added some additional features like allowing the customer to zip all of the files prior to downloading. The Flash widget performs a background call to a server-side API and the files are deleted after being successfully downloaded. We added a scheduled process to clean up sub-directories when all the files aren't downloaded.
Another method I've used is FTP. This takes all of the load off of the application & web servers and allows for multiple and/or throttled connections. There are some programs available that will allow you to use a database for authentication... this allows you to create accounts on-the-fly and disable access after x hours or so much bandwidth has occurred. I haven't tried this program yet, but I highly recommend their other products:
http://www.surgemail.com/su...
We are doing something similar to Ray, by using a rewriter to create "token" URLs to files outside the webroot for a certain period of time. This is particularly useful because the (large) file downloads can benefit from resumed downloads as well.
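The comment doesn't say how the tokens are built, so the following is only one possible scheme: an expiry timestamp signed with HMAC-SHA256. The `DownloadToken` class, the token layout, and the shared secret are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class DownloadToken {
    /** Hex-encoded HMAC-SHA256 over the path and the expiry time. */
    private static String sign(String path, long expiresAt, byte[] secret)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        byte[] sig = mac.doFinal((path + "\n" + expiresAt).getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : sig) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    /** Token format (hypothetical): "<expiresAt>-<hex signature>". */
    public static String issue(String path, long expiresAt, byte[] secret)
            throws GeneralSecurityException {
        return expiresAt + "-" + sign(path, expiresAt, secret);
    }

    /** True only if the token is unexpired and was issued for this exact path. */
    public static boolean verify(String path, String token, long now, byte[] secret)
            throws GeneralSecurityException {
        int dash = token.indexOf('-');
        if (dash < 1) return false;
        long expiresAt;
        try {
            expiresAt = Long.parseLong(token.substring(0, dash));
        } catch (NumberFormatException e) {
            return false;
        }
        if (now > expiresAt) return false;
        // Recompute the token and compare in constant time.
        String expected = issue(path, expiresAt, secret);
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                token.getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the expiry is part of the signed data, a client cannot extend a link by editing the timestamp. Whatever checks the token (a rewrite map program, a small gateway, or the application itself) needs access to the same secret.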
Does anyone know if Amazon S3 or Microsoft Azure can generate temp links that expire once content has been downloaded?
You can easily generate a timed link, but it can't be "goes away when downloaded." But that's a good option for sure.
Why not just alias a directory in apache and don't allow indexing?
Because once you have the URL you would always have access.
Interesting. Don't have that problem myself, but I would think that your solution of a symlink and maybe adding a cron or cf scheduled task to clean them up might be the easiest.
I used something from:
http://www.adrianwalker.org...
I serve up some 200+ MB files from a non-web-accessible directory, after I make sure users are allowed to access that file. Streaming rather than opening the file entirely seemed to work much better memory-wise on the server.
I push mimetype using setContentType, and a setHeader("Content-Disposition"..) since IE likes attachment, while other browsers seem to prefer inline.
Works for us anyway.
Try to avoid cfcontent. Stream the content instead to keep the memory impact lower, since it flushes to the request buffer as it reads the file.
Could something like this be done to help improve security on progressive download based videos? I understand you would still be able to capture the content via the symlink but it could throw a little twist into it. Maybe I'm just being hopeful! :)
The best solution I've seen is using mod_xsendfile. ColdFusion can then simply set a header (x-sendfile) that contains the full path to the download. The module would then catch the header, remove it (we don't want the client seeing paths) and then have Apache send the file instead of tying up ColdFusion.
There is a port of this available for IIS too from Helicon (APE) but it only works on the newer versions.
Other alternatives I've looked at include using URL rewrite rules based on cookie values set by CF, or using symlinks (*nix), junctions (Windows) or dynamic virtual directories (IIS) that you expire and remove with a scheduled task.
I think this is one of those problems that web servers have to solve and not leave it up to our server side technologies. Apache should bring in mod_xsendfile as an official module and IIS should support it natively. It'd save everyone a lot of time and effort.
Bad news: Helicon APE does NOT work with CF. http://www.helicontech.com/...