Website crashes

Nope

Update 12/14 – We pushed a big update that should work around the problem. The servers should no longer crash – we’ll be monitoring it to make sure the fix worked, so cross your fingers. And thanks for your patience.

Update 12/5 – We had a workaround that kept the bug down to about once every few days. However, when the WoW servers went down for maintenance this morning and traffic piled up, the extra load stressed the bug in one of the Microsoft libraries, which causes problems when traffic is incredibly high. So the site crashed again, and again, this morning. Meanwhile, we are building a permanent workaround (writing our own library), because we don’t know how long it will take Microsoft to fix it. They are aware of the problem and say they are working on it.

Why are the crashes happening?

In the update to the new Microsoft framework that we are using, there is an inefficiency in one of the libraries that is causing the servers to lock up and restart. As a non-dev, here is an analogy I like to use, which probably makes devs shake their heads:

We use pre-built libraries in the framework. I like to think of these ‘libraries’ as kits for a model airplane. Instead of cutting out all of the pieces yourself, you get a kit, and then you just have to assemble it.

One of those libraries that handles some of the database functions is making requests line up, one at a time. What you really want is a whole bunch of people to be able to access the database at the same time. This affects sites with really high database usage, which most sites don’t have at the level we do. We did some quick workarounds to minimize the impact, and that helped for a while.
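For the developers in the audience, here is a rough sketch of what “lining up, one at a time” means. It is a toy Python simulation, not our actual code or the Microsoft library: the names `run_requests` and `max_parallel`, and the 20 ms sleep standing in for a database call, are all made up for illustration.

```python
import threading
import time


def run_requests(num_requests, max_parallel):
    """Simulate num_requests database calls, letting at most max_parallel
    of them run at the same time.

    Returns (peak number of simultaneous calls, elapsed seconds).
    """
    gate = threading.Semaphore(max_parallel)  # how many callers may be "in" at once
    counter_lock = threading.Lock()           # protects the bookkeeping below
    state = {"active": 0, "peak": 0}

    def handle_request():
        with gate:
            with counter_lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.02)  # hypothetical stand-in for the database work
            with counter_lock:
                state["active"] -= 1

    threads = [threading.Thread(target=handle_request) for _ in range(num_requests)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"], time.perf_counter() - start


# One at a time (what the buggy library effectively does to us).
serial_peak, serial_time = run_requests(10, max_parallel=1)

# Everyone at once (what we actually want).
parallel_peak, parallel_time = run_requests(10, max_parallel=10)

print(f"one at a time: peak={serial_peak}, took {serial_time:.2f}s")
print(f"all at once:   peak={parallel_peak}, took {parallel_time:.2f}s")
```

With `max_parallel=1` only one request is ever being served, so the total time is roughly the sum of every call. With `max_parallel=10` the calls overlap and the same batch finishes much sooner. Scale that up to our traffic and you can see why a one-at-a-time line backs up so badly.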

Now that Antorus is out, people are dying to get new items (ha, really, we are all dying a lot in game). And when you get new items, you ask your favorite Robot to rank them. That means traffic is higher and the crashes are happening more frequently, 2-3 times a day. It’s frustrating for us, and definitely for you.

How long until this is fixed?

Since we don’t know how fast Microsoft is going to fix this problem, we are writing our own library that doesn’t have the bug. Then we can avoid this problem altogether. We’re working very loooong hours to finish this and really appreciate your patience (and understand your frustration – we use the site for our own characters allll the time!).

This is how we feel right now:

I know this sucks. I hope it hasn’t ruined your day or made you really sad. But in an effort to boost your spirits while we work on fixing it… I thought I’d share one of my favorite annual photo contests: Comedy Wildlife.

 

1 Comment

  1. 🙂 welcome to my day job. Glad to help if needed..

    How often does it reboot? Is it regular? Anything showing in event logs?
    Windows Update set to auto-update? Failed update may cause server to reboot randomly as it attempts to update file. Windows Defender updates fairly regularly.
    Performance monitor show anything? Load exceeding thresholds? Could cause reboot to recover resources..
    Assuming AV on server, is it excluding db? Had Symantec cause lock-up on server because it was attempting to scan application temp folders.


© 2017 Mr. Robot's Blog
