Announcement Unscheduled Maintenance

Discussion in 'Club Announcements & News' started by DeviateDefiant, Sunday 13th Apr, 2014.

  1. DeviateDefiant Co-Founder Staff Team

    United Kingdom Leo Northants
    9,206
    2,977
    3
    We'll be having a short period of downtime, likely a couple of hours, during the course of the day. It requires taking the board offline and stopping new content being posted. Everything should still be accessible in a "read only" capacity, but I may have to take the site down completely at points while I'm completing the work.

    Our main database and server software are being swapped out for different packages. While this shouldn't be a hard task, the new packages are quite unfamiliar to me, so we'll see just how seamlessly I can do it all.

    All for improvement in the long-run. Apologies for any inconvenience.
     
  2. DeviateDefiant Co-Founder Staff Team

    United Kingdom Leo Northants
    9,206
    2,977
    3
    We've had a chaotic couple of days. Suffice to say, we're back, and sorry about the downtime. Issues with our server, and then subsequently with our server provider's backend, led to a lot of delays getting our server rebuilt and the site back up and running correctly. We don't plan on any more downtime; we're testing the configuration now on the new server and, all being well, that's the end of it. It's a pain in the arse for everyone.

    We've lost a few posts: anything made from when the initial announcement was posted, through to when we subsequently came back online for a few hours on Sunday evening. For reasons out of our hands, that server was completely lost due to the issues mentioned.

    We're back, we're kicking.

    I will go through and list any changes in the Site Updates Thread shortly.
     
  3. Nels Moderator Staff Team

    I can imagine just how stressful the last couple of days have been for you :Rant:

    Thanks for getting HK back and running :GoodJob:

    JD
     
  4. sunbeem Club Member ★ ☆ ☆ ☆ ☆

    I can sympathise, having just had my email hacked, and an excruciatingly tricky time re-establishing it. Welcome back.

    Sunbeem.
     
  5. Zoot Valued Contributor ★ ★ ★ ☆ ☆

    Scotland Ron Leeds
    368
    210
    Many thanks for your efforts to keep the site updated for the benefit of us, the members. Great work!! It's much appreciated.
     
  6. Doc Expert Advisor ★ ★ ★ ★ ★

    Matt Peterborough
    1,157
    202
    1
    Keep up the good work @DeviateDefiant, it'll all be worth it in the end. :Thumbup:
     
  7. SpeedyGee Administrator Staff Team

    England Speedy Birmingham
    14,999
    5,595
    4
    You guys won't believe the mammoth number of hours that @DeviateDefiant puts into HK.

    Your efforts are greatly appreciated by all, DD :Hey:
     
    Ichiban likes this.
  8. Ichiban Founder Staff Team

    England CJ Leeds
    30,177
    6,406
    516
    Yup, have to concur. With me and Leo breathing down each other's necks it was an eventful outage, and that's putting it mildly.. :Baseballbat::telloff: joking aside.

    We have put in place a robust back-end solution, which isn't cheap, with a snapshots\platespin solution that gives us the ability to come back online within minutes and recover to a point easily on demand. We now have two off-site daily backups in different locations, so long-term availability and contingency planning are adequately in place for any eventuality.

    I will be testing our backups weekly on a test site to ensure they work. Not suggesting they don't work; they do, as we recovered the site today from the same backups! But we need to get testing and have a full-blown disaster recovery piece in place.
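
    As a rough illustration of what an automated first pass before a weekly restore test could look like, here is a minimal Python sketch. It is not how HK's backups are actually tested: it assumes the backup is a tar archive with a known SHA-256 checksum recorded when it was made, and the function names are made up for the example. It only checks that the archive is intact and readable; a real restore onto the test site is still the proper test.

```python
import hashlib
import tarfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_looks_restorable(archive_path, expected_sha256):
    """Hypothetical pre-check: checksum matches and the archive can be read."""
    if sha256_of(archive_path) != expected_sha256:
        return False
    try:
        with tarfile.open(archive_path) as archive:
            # Listing the members walks the archive from start to end,
            # so a truncated or corrupt file fails here.
            return any(member.isfile() for member in archive.getmembers())
    except (tarfile.TarError, OSError, EOFError):
        return False
```

    A check like this could run from a scheduled job against each off-site copy, with the full restore to the test site kept as the weekly manual step.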

    So things will improve, and hopefully we won't see this again and will have a more professional approach when things don't go to plan.

    The site now uses a content distribution network (CDN), which will improve response times.
     
  9. DeviateDefiant Co-Founder Staff Team

    United Kingdom Leo Northants
    9,206
    2,977
    3
    Thanks for the kind words guys, I'm shattered. As CJ touched upon I've been using this opportunity to revamp the entire infrastructure for the sites I help administer across multiple servers/datacentres. It's been a hell of a ride.

    Minutes might be slightly optimistic, but we have the ability now to deal with server issues by deploying a new server, configuring it with a copy of the live server while it's still running, then taking an image of the new fixed server and using it to replace the live one. It takes as long as the new server image takes to write to disk. In layman's terms, it's like an engine swap without having to screw everything in.

    In contrast, this time around we had to configure a new server, drop in the backup, make it live and wait for servers around the world to update with the new location before you guys could view it. To make matters worse, we had an outage from the server provider/datacentre going on at the same time. When it rains, eh? Anyway, we're sorted now.

    I'll have you know my backups are almost one-click replace when the right software packages are used :Laughing:

    I'll cover the full changes shortly in the updates thread.
     
    Chunkylover53 likes this.
  10. Ichiban Founder Staff Team

    England CJ Leeds
    30,177
    6,406
    516
    20 minutes tops, fella, for a snapshot recovery then merge DB?
     
  11. DeviateDefiant Co-Founder Staff Team

    United Kingdom Leo Northants
    9,206
    2,977
    3
    It can take around 20 minutes to write the server image itself at our current size. If we're replacing a running server with an image we've made already, the downtime would be 30 minutes tops.

    If the site/server was corrupted completely and we needed to regenerate from backups, you'd have to factor in a little more time to transfer in the backup data, extract a couple of large archives, then import the latest database. Probably an hour tops.

    If the server failed on a hardware level and we could no longer access the IP, and therefore had to redirect to a new server, we've got DNS propagation to deal with, which calls for waiting it out for a few hours and some people not being able to access us for up to 24 hours. However, I forgot to mention on the phone earlier that I have a "proof of concept" to run by you that might be worth investing in and would negate this completely. Basically, a new server gets allocated if our server goes bye-bye, without having to run the costs of having another server actually running at all.
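
    To put the three scenarios above side by side, here is a small Python sketch that just tabulates the worst-case figures quoted in this post. The scenario names are invented for the example, and the numbers are the rough "tops" estimates given above, not measurements.

```python
# Worst-case downtime per scenario, in minutes, taken from the rough
# estimates in the post. Scenario names are hypothetical labels.
WORST_CASE_MINUTES = {
    "swap_in_prebuilt_image": 30,        # ~20 min image write, 30 min tops
    "rebuild_from_backups": 60,          # transfer, extract archives, import DB
    "new_ip_dns_propagation": 24 * 60,   # some visitors locked out up to 24 hours
}


def format_duration(minutes):
    """Render a minute count as e.g. '30 min', '1 h' or '1 h 30 min'."""
    hours, rest = divmod(minutes, 60)
    parts = []
    if hours:
        parts.append(f"{hours} h")
    if rest or not parts:
        parts.append(f"{rest} min")
    return " ".join(parts)


def summary():
    """Return one line per scenario, shortest worst case first."""
    ordered = sorted(WORST_CASE_MINUTES.items(), key=lambda item: item[1])
    return [f"{name}: up to {format_duration(mins)}" for name, mins in ordered]


if __name__ == "__main__":
    print("\n".join(summary()))
```

    The gap between the second and third rows is the point: the rebuild itself is about an hour, and almost all of the 24-hour worst case is waiting on DNS.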

    Anyway, all this is assuming it's not while I'm asleep :Laughing:
     