Pocket-Monkey(tm) Discussion Forum

Why we were offline

Thread #1980 - Messages: 8   (some may be hidden)
 Why we were offline Message #21455    Replies: 4
posted by T.J. (T.J. Crowder) on 10/27 at 14:29
Hi folks,

I thought I'd take a moment to talk about why the site was offline for so long.

The short version: The hosting company we rent our server from turned out to be both incompetent and unresponsive. So I moved Pocket-Monkey to a different hosting company.

The long and probably boring version:

The hosting company told us they needed to physically move our server in the data center, and said our server would have to be effectively down for a seven-hour window. (If you're not familiar with web hosting, seven hours is a big outage to ask of your customers.) So I set us up for that outage and went to bed (as the work was being done overnight).

When I got the notification the next morning that they were done moving our server and it was all up-and-running again, it wasn't true. For several hours, the server would appear and disappear, and so of course I couldn't rely on it. I had to wait until I'd seen it be stable for a bit and gotten confirmation from them that they were done doing whatever it was that was making it unstable. I'm still waiting: they've never come back to me with an explanation for the instability or a confirmation that it's definitely over now. (It's been more than two full days -- an eon in hosting terms.) It seems to be over, but...

Meanwhile, during one of the times the server was up, it notified me that one of its hard drives had failed. This happens in servers; hard drives take a real beating and have to be replaced periodically (although this was the third drive failure in seven months with this company -- unusually bad). Our system has two drives in a "mirror" and can run happily with just one of them, although of course at that point if that one remaining drive has a problem, we're in trouble. (We have off-server backups, of course, but still...) A failed drive doesn't have to be a big deal. We pay a premium for a server with "hot-swap" capability, which means that the failed drive can be replaced while the server is running. So I raised a ticket asking them to swap out the drive.
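
For anyone curious what a "mirror" buys us, here is a toy sketch in Python. It is only an illustration of the idea, not how our server's storage actually works: the `Mirror` class and its method names are made up for this example, and each "drive" is just a dictionary.

```python
class Mirror:
    """Toy model of a drive mirror: every write goes to all healthy drives."""

    def __init__(self, drive_count=2):
        # Each "drive" is a plain dict here; a real mirror works on disk blocks.
        self.drives = [dict() for _ in range(drive_count)]
        self.healthy = [True] * drive_count

    def write(self, key, value):
        # Keep every healthy drive identical.
        for drive, ok in zip(self.drives, self.healthy):
            if ok:
                drive[key] = value

    def read(self, key):
        # Any healthy drive can answer, which is why one failure is survivable.
        for drive, ok in zip(self.drives, self.healthy):
            if ok:
                return drive[key]
        raise OSError("no healthy drives left")

    def fail(self, index):
        # Simulate a drive dying and taking its contents with it.
        self.drives[index] = {}
        self.healthy[index] = False

    def replace(self, index):
        # "Hot-swap": fit a blank drive and copy everything from a healthy one.
        source = next(
            (d for d, ok in zip(self.drives, self.healthy) if ok), None
        )
        if source is None:
            raise OSError("nothing left to rebuild from")
        self.drives[index] = dict(source)
        self.healthy[index] = True
```

With one drive failed, reads and writes carry on using the survivor. The risky window is the time between `fail` and `replace`, which is why a slow replacement matters.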

That was 54 hours ago. (They have a four-hour replacement guarantee. I probably get a free T-shirt or something as compensation.) And there's some non-critical (and very large) data on that drive that I do want, but for technical reasons I don't want to grab that data until the drive is replaced (just out of caution; the nature of it -- millions of little files -- would really exercise the drive), so let's just say having no kind of meaningful response on getting the drive replaced is Not Good.

I've been unhappy with this company for a while now and in fact had already identified a different company to move to, so on that first day after a few hours of complete unresponsiveness, I decided I'd better go ahead with the move. Doing that took a while, not least because of course I have my day job and clients to keep happy. :-) But the end result is that we're up and running in our new home, and if I need to do this again in a hurry (who knows, the new company could be just as bad as the old), it should be a lot faster. I decided to take the time to do it really right, without relying on pre-built packages for things since they seem to change where they put stuff every couple of months, making every new deployment a custom affair. So I now have reproducible "from source" steps for deploying Pocket-Monkey to a new server in just a couple of hours.

Some of you may be wondering if I should move Pocket-Monkey to a "cloud" provider. I may well do that, although there are reasons both for and against (not least cost control). But if I do, I won't move it to just one "cloud" provider, because it's been shown now that you can't rely on a single provider's cloud. In this modern world, there's no reason Pocket-Monkey can't be running simultaneously in clouds from two different providers, either of which can take over if the other has an outage like the one that hit Amazon's hosting stuff a while back. Adopting cloud technologies (which I do think is the way forward if I get some other stuff done and PM starts growing again) requires some significant changes to how Pocket-Monkey works under-the-hood, so it has to go on the list and be prioritized relative to other things I want to do with the site. Cloud is not the only way to have resiliency, especially if, in the event of an issue, I can move nimbly to swap us into a different data center.
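
To make the "either one can take over" idea concrete, here is a minimal sketch in Python. It is an assumption-laden illustration, not a description of anything Pocket-Monkey runs today: the `pick_active` helper and the provider names are hypothetical, and a real health check would probe the site over the network rather than call a local function.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each deployment is a (name, health_check) pair. Both are hypothetical here.
Deployment = Tuple[str, Callable[[], bool]]


def pick_active(deployments: Sequence[Deployment]) -> Optional[str]:
    """Return the name of the first deployment whose health check passes.

    Deployments are tried in priority order. A check that raises is
    treated the same as a check that returns False.
    """
    for name, is_healthy in deployments:
        try:
            if is_healthy():
                return name
        except Exception:
            continue  # an unreachable provider counts as down
    return None  # everything is down


if __name__ == "__main__":
    def provider_a_check() -> bool:
        # Simulate the first provider having an outage.
        raise ConnectionError("provider A is having an outage")

    def provider_b_check() -> bool:
        return True

    print(pick_active([("provider-a", provider_a_check),
                       ("provider-b", provider_b_check)]))
```

The hard part in practice is not this selection logic but keeping the data in both places in sync, which is where the under-the-hood changes come in.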

Anyway, long story that hopefully has a happy ending.

Enjoy your games!
--
T.J. Crowder
First Primate
Pocket-Monkey.com
 
 Re: Why we were offline Message #21457    Replies: 1
posted by niloc (Colin Palmer) on 10/27 at 16:30
Very well done for getting it up and running so fast.

You really do a fantastic job running this site.
   
 Re: Why we were offline Message #21465    Replies: 1
posted by Redbee (Lori) on 10/27 at 17:13
Amen, I 2nd that~!!!! Great job TJ
     
 Re: Why we were offline Message #21467    Replies: 0
posted by GeorgiasGrandad (Dave Laine) on 10/27 at 17:22
TJ is the greatest. Glad you're back
 
 Re: Why we were offline Message #21470    Replies: 0
posted by AM2010 (AM2010) on 10/27 at 17:49
Again- Well done! With a great sense of humor; along with a huge sense of responsibility as well!
Those who served you last- Have lost a great client indeed! :)
 
 Re: Why we were offline Message #21476    Replies: 0
posted by simply me (Janeen) on 10/27 at 21:59
You are awesome and so is the site. I would wait forever (well, slightly exaggerated) for you to do whatever is needed to be done to be back up and running. Pocket Monkey is the only site I know that actually cares about the person it produces its product for. Credit given when credit is due :)
 
 Re: Why we were offline Message #21490    Replies: 1
posted by Sesio (Bert) on 10/28 at 23:23
Really tremendous work you do, only for us to be able to play. And all that out of a hobby!

I'm a man, I love women...but I say it anyway: I LOVE YOU T.J.

;)
   
 Re: Why we were offline Message #21494    Replies: 0
posted by T.J. (T.J. Crowder) on 10/29 at 08:41
LOL Thanks!!

-- T.J. :-)

Forum software by Crowder Software
Pocket-Monkey and the Pocket-Monkey logo are trademarks of T.J. Crowder and Jock Murphy. All other trademarks are the property of their respective owners.