Node lessons learned painfully (or why my site crashed)

This post is more than 2 years old.

Last week I launched JavaScriptCookbook. The code behind the site makes use of Express and Node.js. This is the first site I've ever built using Node and frankly - I'm really unsure of... well everything. It was fairly trivial to get the site up and running (Express is just an incredible framework), but there is still quite a bit to the ... ecosystem (not sure if that is the right word) that is a bit unclear to me.

I launched the site on Thursday. After the initial announcement I got some tweets from rather important accounts that led to a big spike in traffic. According to Google Analytics the site had - at one point - 140 simultaneous users. That was awesome. But the site never crashed or slowed down.

However - when I woke up Friday, I saw some tweets saying the site was down. I confirmed this. I hopped over to AppFog, restarted the site, and everything was kosher. I then tried to figure out what went wrong. AppFog has a command line tool that provides access to crash logs. When I ran it, though, nothing came up. I saw my console messages but nothing more. I really had no idea what happened. I opened a ticket with AppFog and just carried on.

Yesterday I was testing something with the site when I realized something crucial. If your Node app hits an error it doesn't handle, it aborts. As in - it dies. This is not new to me at all. I've been playing with Node for a while. But when you run a Node app as a web site, that's pretty important to remember. I'm so used to the ColdFusion model (which also applies to PHP and other platforms): a constant server that simply parses code on a request-by-request basis, so one bad request doesn't take the whole site down with it.
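In Node there is no separate server catching errors for each request. As a rough sketch of what that means in practice, here is a hypothetical `wrapHandler` helper for a plain Node `http` handler that turns a synchronous throw into a 500 response instead of a dead process. It's an illustration, not what my site actually does, and it cannot catch errors thrown later inside async callbacks.

```javascript
// A minimal sketch: wrapHandler is a hypothetical helper (not part of Node or
// Express) that gives each request its own safety net, roughly what a
// ColdFusion or PHP server does for you.
function wrapHandler(handler) {
  return function (req, res) {
    try {
      handler(req, res);
    } catch (err) {
      // Only synchronous throws land here. An error thrown later inside an
      // async callback (a timer, a database query) escapes this try/catch.
      console.error('Request failed:', err.message);
      if (res.headersSent) {
        // Too late to change the status code; just close the response.
        res.end();
        return;
      }
      res.statusCode = 500;
      res.setHeader('Content-Type', 'text/plain');
      res.end('Internal Server Error');
    }
  };
}

// Usage with Node's built-in http module (myHandler is hypothetical):
// require('http').createServer(wrapHandler(myHandler)).listen(3000);
```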

For folks curious, my bug was rather simple. The code that handled loading a page by the SES token (what you see in the URL) was not handling cases where the token didn't match anything. I didn't notice this until I deleted an entry and reloaded the page to ensure it was gone. The second I did - bam - the app died.
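For illustration, here is a minimal sketch of the fix: check for the "no match" case before using the result. The `entries` map and the `showEntry` function are hypothetical stand-ins for the site's real data layer and route, and `res` is assumed to be any object with `statusCode` and `end()` (such as Node's `http.ServerResponse`).

```javascript
// Hypothetical in-memory store standing in for the site's real data source.
const entries = new Map([
  ['sort-an-array', { title: 'Sort an array' }],
]);

// Looks up an entry by its URL token (slug) and writes the response.
function showEntry(slug, res) {
  // Map#get returns undefined when the slug matches nothing.
  const entry = entries.get(slug);
  if (entry === undefined) {
    // Handle "no match" explicitly. Reading entry.title here would throw a
    // TypeError that, left uncaught, takes down the whole process.
    res.statusCode = 404;
    res.end('No entry found for "' + slug + '"');
    return;
  }
  res.statusCode = 200;
  res.end(entry.title);
}
```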

I need to do a bit more research into how I can handle this in the future. There is an interesting article about a tool called Forever. It attempts to keep a Node app running, well, forever. But I don't think I could use that on a hosted solution like AppFog. The other thing I want to investigate is error handling in general. In a ColdFusion app, I'd have logic to dump the exception in development and mail/log it in production. Express has ways to handle dev versus production easily enough, I just haven't actually added it to my application yet.
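Here is roughly what that dev versus production split could look like, written as a sketch of an Express-style error handler. It assumes `NODE_ENV` is set to `production` on the live server. The messages and the console logging are placeholders for real mail or log delivery.

```javascript
// Sketch of an Express-style error handler. Express treats middleware with
// four arguments (err, req, res, next) as error-handling middleware.
// Assumes NODE_ENV is set to "production" on the live server.
function errorHandler(err, req, res, next) {
  if (res.headersSent) {
    // A response is already under way; let Express's default handler close it.
    return next(err);
  }
  // Always record the full error on the server (swap in mail/log delivery).
  console.error(new Date().toISOString(), req.method, req.url, err.stack);
  const isProduction = process.env.NODE_ENV === 'production';
  res.status(500).type('text/plain');
  // Development: dump the exception. Production: show a generic message.
  res.send(isProduction ? 'Sorry, something went wrong.' : String(err.stack));
}

// Usage (register after all routes): app.use(errorHandler);
```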

Finally - I must say I'm rather disappointed by the support at AppFog. After three days, my support ticket is still waiting for someone to even look at it.

As a paying customer, this does not make me happy. Of course, I know I'm at the cheapest tier, but I'd hope for some type of response within a day at least.

Edit: Last night, I finally did get a response from AppFog. I'm not happy that it took this long, but the response made sense to me. Basically, the next time my app crashes I need to check the logs via the CLI before I restart it. Here is their response in full:

Hi Raymond, we're still working on our diagnostic tooling. When you restart the app, it clears out the crashes and crashlogs information. If this happens again, you'll want to grab those logs before restarting the app. We would also recommend hooking your app up to our LogEntries addon: https://docs.appfog.com/add-ons/logentries

About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos. If you like this article, please consider visiting my Amazon Wishlist or donating via PayPal to show your support. You can even buy me a coffee!

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by Gareth Arch posted on 6/24/2013 at 6:26 PM

I just looked over Node last week (well, I've seen hundreds/thousands of posts, but finally dove in) and saw the note about it crashing when it encounters an error due to the whole "Event cycle" thing. That seemed a rather big sticking point for web servers: you don't want them to go down completely, but to encounter an error, handle it somewhat gracefully (for that user), then keep chugging along, rather than crashing for everyone trying to access it. I like the ease of using Node (and Express, once I saw it explained at Parse), and hope that there are some good solutions to this issue. I'll be keeping an eye on your posts to see how others are handling these types of situations, especially given, as you said, how simply it is handled in CF.

Comment 2 by Aaron West posted on 6/24/2013 at 11:33 PM

Hey Ray, good to see you are enjoying Node and having some success with it. I wanted to offer some basic hints on things I've found useful with Node. First, most of my experience with Node comes from running a Web Services system (in production) for a little over a year. I struggled for a while - with my team - to get the app stable last summer, but it's now responding to more than 1 billion HTTP requests per month.

One initial thing we found helpful was writing our own logging mechanisms in Node. It doesn't have to be super fancy, but given how Node can "just die," it helps to write to log files what is happening as it happens. We write console type logs, file processing logs, and basic app-caught logs to different files on the Linux filesystem.
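A minimal sketch of that kind of home-grown logger, assuming a hypothetical `createLogger(dir)` helper and arbitrary category names that each get their own file. It uses only Node's built-in `fs` and `path` modules.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: each category ("console", "files", "app", ...) is
// written to its own <category>.log file inside dir.
function createLogger(dir) {
  fs.mkdirSync(dir, { recursive: true });
  return function log(category, message) {
    const line = new Date().toISOString() + ' ' + message + '\n';
    // appendFileSync is synchronous on purpose: the line is handed to the OS
    // before the next statement runs, so it is not lost if the app dies next.
    fs.appendFileSync(path.join(dir, category + '.log'), line);
  };
}

// Usage:
// const log = createLogger('/var/log/myapp');
// log('app', 'caught error while loading /recipe/foo');
```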

There's also a nifty tool called DTrace which can help debug your app from the command line. It doesn't run on all operating systems, so YMMV.

Finally, we use Forever too. It's a nice tool to help kick up Node processes, monitor those processes, and kill/stop various Node services you may be running. We integrated some of our logging features into Forever so there aren't separate Forever logs plus our home-grown logs. This proved pretty useful. As for restarting your app, Forever can certainly do this for you. You may want to use the "-m" option to ensure Forever doesn't restart your app, well, forever, in the event you write an infinite loop in JavaScript/Node. It's also useful to ensure Forever starts on system boot and thus starts your app. Since no real init scripts come with Forever, most people roll their own. We did this too, with shell scripts and /etc/rc.local.
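To show the idea behind capping restarts, here is a tiny supervisor written with Node's `child_process.spawnSync`. It is not Forever and not how Forever is implemented; `runWithRestarts` is a hypothetical name, and the sketch simply reruns a script that exits with an error until a restart cap is reached.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical supervisor: rerun a Node script when it exits with an error,
// but give up after maxRestarts so a script that crashes on startup cannot
// be restarted endlessly.
function runWithRestarts(args, maxRestarts, stdio = 'inherit') {
  let runs = 0;
  let result;
  do {
    runs += 1;
    // spawnSync blocks until the child exits, which is fine for a process
    // whose only job is supervising.
    result = spawnSync(process.execPath, args, { stdio });
    if (result.status === 0) break; // clean exit: no restart needed
  } while (runs <= maxRestarts);
  return { runs, status: result.status };
}

// Usage: runWithRestarts(['app.js'], 5);
```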

Comment 3 by Rob Dudley posted on 6/25/2013 at 12:55 PM

I've found similar issues with Node and to be honest it's at the core of my fear of using it in true production apps. Even with something like Express there are so many edge cases that could cause a critical crash that I'm always worried about missing something.

I suppose that this could be overcome by massive amounts of exception catching or of course by writing code that doesn't have any bugs.

That said, I believe AppFog will (should) auto-restart crashed apps up to a "flap threshold" of about 5 times in a period. And I know you can install forever-monitor as a Node module, which would be pulled in via your package.json and should require minimal work to update your app.js to support it.
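A "flap threshold" caps restarts within a time window rather than in total. Here is a minimal sketch, assuming a hypothetical `RestartLimiter` class; the defaults of 5 restarts per minute echo the comment above and are not AppFog's actual settings.

```javascript
// Hypothetical limiter: allow at most maxRestarts restarts within a sliding
// window of windowMs milliseconds.
class RestartLimiter {
  constructor(maxRestarts = 5, windowMs = 60000) {
    this.maxRestarts = maxRestarts;
    this.windowMs = windowMs;
    this.history = []; // timestamps of recent restarts
  }

  // Returns true (and records the restart) if another restart is allowed now.
  tryRestart(now = Date.now()) {
    // Forget restarts that have fallen out of the window.
    this.history = this.history.filter((t) => now - t < this.windowMs);
    if (this.history.length >= this.maxRestarts) return false;
    this.history.push(now);
    return true;
  }
}
```

Once the window slides past the older crashes, restarts are allowed again, so an app that crashes occasionally keeps coming back while one stuck in a crash loop is left down.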

Comment 4 by Raymond Camden posted on 6/25/2013 at 5:09 PM

@Aaron: Fascinating stuff, thank you for sharing that.
@Rob: So... forever can be part of my app itself - not a command line thing? Do you have an example?

Btw - I do have an update on AppFog. Since people seem to skip reading comments, I'm going to add it as an edit to the above text. In 5 minutes, give it a read.

Comment 5 by Rob Dudley posted on 6/25/2013 at 5:35 PM

Yup. There is a modularized (not sure that's a word) version of Forever:

https://github.com/nodejits...

From the above site, you'd make your server.js a wrapper around the forever-monitor module, as per the docs in the above repo. You can then move your core app code into app.js (or whatever you want to call it).

I've not used this on AppFog officially but have used it on Nodester ... which I'm pretty sure was bought out by AppFog and used the same platform.

Oh and +1 on hooking some logging in there. It's good to know what caused a failure so you can patch and over time reduce the restarts.

Finally, another quick gotcha that you may or may not run into is memory leakage. Early Node.js was terrible for it, and though it's much improved, it's worth keeping an eye on when your apps first hit production.
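One low-effort way to keep an eye on it is to sample `process.memoryUsage()` periodically. Here is a hypothetical `memorySnapshot` helper, assuming you just want numbers in your logs. A `heapUsed` value that keeps climbing under steady traffic is a hint, not proof, of a leak.

```javascript
// Hypothetical helper: converts process.memoryUsage() (bytes) into megabytes
// rounded to one decimal place.
function memorySnapshot() {
  const usage = process.memoryUsage();
  const toMB = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return {
    rssMB: toMB(usage.rss),
    heapTotalMB: toMB(usage.heapTotal),
    heapUsedMB: toMB(usage.heapUsed),
  };
}

// Usage: log a sample every minute. unref() keeps this timer from holding
// the process open on its own.
// setInterval(() => console.log('memory', memorySnapshot()), 60000).unref();
```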

There was a great post (albeit a little dated now) on Mozilla Hacks last November as part of their Node Holiday Season series: https://hacks.mozilla.org/2...

Comment 6 by Adam Tuttle posted on 6/25/2013 at 8:38 PM

Nodejitsu automatically uses Forever under the covers to keep your app up even after crashes like this one. I've also been pretty pleased with their support, though I mainly interacted with them via IRC (#nodejitsu on freenode) instead of the web console.

Comment 7 by David Salter posted on 6/26/2013 at 9:43 PM

Hi Ray, this was an interesting read as I've just started looking at AppFog. I'm now going to look at forever as that looks like a very useful tool.

On a side note, I've just managed to crash a Node.js app of mine on AppFog and the app seems to have restarted itself (at least it's running again now without manual intervention). I don't know if this is because of changes to AppFog since your original post, or because I got the log files after the crash (thanks for the tip).

Comment 8 by Raymond Camden posted on 6/26/2013 at 11:22 PM

Yeah, after posting this I heard from others that AppFog should restart your app automatically. I don't know what that means for my app since it was down for a few hours. (Technically it may have been less than that - I don't know the exact metrics.)

All in all - even my own "familiar" server, this one, doesn't have 100% uptime and I don't necessarily know 100% of what goes on here. I'm ok with being removed a bit from the nitty gritty details I guess. It bugs me - but I may have to just live with it.

Comment 9 by emaV posted on 6/27/2013 at 6:37 PM

Just found this tool... http://devo.ps/blog/2013/06...

Comment 10 by Corey Butler posted on 6/29/2013 at 12:58 AM

First off, you can monitor your app for uncaught exceptions so it won't "just die". How you handle that is really up to you... it could fire an email/SMS off, or whatever you like, but if only one part of the app is failing, it doesn't have to crash the whole thing.
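Here is a sketch of hooking uncaught exceptions with `process.on('uncaughtException', ...)`. Node's own documentation advises against resuming normal operation after this event, because the process may be in an undefined state. So this hypothetical handler notifies and then exits, leaving the restart to a supervisor such as Forever or the hosting platform.

```javascript
// Hypothetical factory: log and exit are injectable so the notifier can be
// swapped for email/SMS/file delivery (and stubbed out when testing).
function makeCrashHandler(
  log = (msg) => console.error(msg),
  exit = (code) => process.exit(code)
) {
  return function onUncaught(err) {
    log('Uncaught exception: ' + (err && err.stack ? err.stack : err));
    // Exit with a failure code so a supervisor knows to restart the app.
    exit(1);
  };
}

process.on('uncaughtException', makeCrashHandler());
```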

I have taken a liking to most of the 12-Factor app approach (http://www.12factor.net) for Node. I write my node apps as a series of processes, which makes them more robust. If something fails, I deal with the failing piece without toying with the rest of the system.

I initially used Forever to run my processes, but I was pretty frustrated with the fact it recreates a lot of what the OS does for you. A few months ago, I started creating a series of daemon utilities called node-windows, node-mac, and node-linux (http://github.com/coreybutler, MIT). These are really designed more for those running their own server though.

Comment 11 by Raymond Camden posted on 6/29/2013 at 1:02 AM

Corey: Interesting... thanks - and thank you to everyone for the suggestions.

Comment 12 by Johan posted on 7/7/2013 at 4:43 AM

Ray - further to feedback I sent on Google+ I also came across this which may be helpful:

http://shapeshed.com/uncaug...

Comment 13 by Raymond Camden posted on 7/7/2013 at 4:57 AM

Interesting. I need to read this again - maybe twice more - and think about what I'm going to do with my site. This sentence is probably the most important for people, like me, coming from a ColdFusion or PHP background: "Node.js does not separate your application from the server."

Comment 14 by Tighe Lory posted on 7/8/2013 at 6:12 PM

It blows my mind that this kind of error handling wasn't the first thing built into Node.js! Now, I have never used it, so I can't judge it, but at the same time I have a hard time understanding why it would be designed in such a flawed way.

Is there some logic to this die on error behavior that is beneficial?