SharePoint DR planning: some practical options to consider
One part of SharePoint deployments that sometimes falls by the wayside is planning and providing a viable, practical option for recovering data in the event of a server failure. When a solution architecture is proposed with SharePoint, you need to consider the following options in order to formulate a backup and recovery plan. Not only should you plan these, you should also test them and note the steps required and the time taken to do an actual recovery, so that your team knows what to expect should the need arise.
Depending on the type of solution being deployed and its potential impact on the business, you may need to add availability as part of your solution architecture. This post, however, will showcase how you can create and test a mock recovery plan for a small server farm with two web front end servers and a dedicated SQL server. As part of the recovery plan you typically need to consider content recovery (usually via the built-in Recycle Bin), site recovery (accidental site deletions by site administrators) and, the focus of this post, disaster recovery. Disaster recovery in this context is when you lose one of the content databases or all of the databases related to the SharePoint deployment.
For example, assume that your solution includes regular SQL Server backups of your content databases. You can log ship the databases to a secondary server or a network location. Log shipping is one of the practical and less complex options that you should consider as part of your deployment. TechNet has detailed documentation on how to set up log shipping on your SQL Server and what you need to consider: TechNet > Configuring Log Shipping (SQL Server 2005).
This post uses a simple and practical DR plan that can be implemented to provide basic DR capability. The emphasis is purely on recovery and not on high availability. High availability usually means complexity and higher cost. Depending on your deployment, you should present these options along with their pros and cons. My view is that simple DR is better than no DR, and there is simply no excuse for not putting such a plan in place from day one of your deployment.
The most important databases in your farm are the content databases of your SharePoint deployment. In this post I will highlight the steps needed to recover your content database(s). First of all, let’s establish the overall process of the ‘mock’ fail over scenario. Remember, this is just a simulation to ensure that all the required steps are noted, documented and followed through. In a real world scenario the steps outlined will result in a period of time during which your users will not be able to access the data. As I said before, this is not about high availability but about a solution to recover data. The idea of the ‘mock’ fail over is to establish how long this process will take and to provide a realistic estimate of the downtime. Most importantly, it prepares your system administrators to act on a tested plan.
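One way to capture how long each step of the mock fail over takes is a small timing helper. The following is only a minimal sketch in Python: the helper names (`timed_step`, `total_downtime_minutes`) and the step labels are assumptions made for illustration, and the `pass` placeholders stand in for whatever your administrators actually do at each step.

```python
import time
from contextlib import contextmanager

# Hypothetical helper (not part of SharePoint or SQL Server):
# records how long each named recovery step takes, in seconds.
step_durations = {}

@contextmanager
def timed_step(name):
    start = time.monotonic()
    try:
        yield
    finally:
        step_durations[name] = time.monotonic() - start

def total_downtime_minutes(durations):
    """Sum the recorded step durations (seconds) and convert to minutes."""
    return sum(durations.values()) / 60.0

# Example drill: the step names are placeholders, and each body would be
# replaced by the real action (or a prompt the operator confirms when done).
with timed_step("restore content database on DR SQL server"):
    pass
with timed_step("attach content database to DR web application"):
    pass

for name, seconds in step_durations.items():
    print(f"{name}: {seconds:.1f} s")
print(f"Estimated downtime: {total_downtime_minutes(step_durations):.1f} min")
```

Keeping the per-step figures, rather than only the total, makes it easier to see which part of the drill dominates the downtime.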
In this scenario I am not going to consider any of the Configuration, SSP or Search databases. Typically, when you plan for DR you should have a standby web front end server pre-configured to match your production server as closely as possible. This usually means that you will have installed WSS or MOSS and created your web applications. For fail over you would also have a SQL instance on standby to which you can attach your log shipped database(s).
Consider the following diagram.
![DR solution diagram](https://www.chandima.net/Blog/Lists/Posts/Attachments/185/DRSolution._2.png)
This is a somewhat simplistic view of what your DR plan may look like. In this scenario you have a Live (Production) SharePoint farm in Wellington (the capital of New Zealand, for those not from New Zealand). The DR farm is located in Auckland, which is situated at the top of the North Island. As mentioned previously, the focus of this post is to highlight the steps required for the recovery, not how to set up such a deployment.
For the DR plan to be effective you will need to have the following rights set up on the destination (Auckland DR) farm. Basically, the setup accounts used in your Wellington farm and the setup accounts in Auckland should be the same or, if they are different, they should have the following applied. I have previously posted about setup accounts and why they are important when deploying SharePoint.
Steps to recover content database(s) from the log ship destination and restore them to the DR database server in Auckland
Assuming that you are now operating the Auckland servers and have a copy of your database restored to the SQL server in the DR farm, you can now attach the content database to the standby web application server (DR-SPWFE). You can do this via the Central Administration Interface or the STSADM command line. Personally I prefer the STSADM command line.
Delete the existing database (which is pre-configured without any site collections):
STSADM -o deletecontentdb -url <http://WebSiteName:port> -databasename <DatabaseName>
Add the recovered database from the Wellington server:
STSADM -o addcontentdb -url <http://BackupServerName:port> -databasename <DatabaseName>
Alternatively you can follow the steps via Central Administration
Repeat Steps 4-7 for each database that has failed over
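If several databases have failed over, it can help to generate the STSADM command lines up front so they can be reviewed before anyone runs them. The sketch below is in Python and only builds the command strings. It does not execute anything. The function name, the port and the database names are hypothetical and used purely for illustration. Only the `deletecontentdb` and `addcontentdb` operations with the `-url` and `-databasename` parameters shown above are used.

```python
def failover_commands(web_app_url, placeholder_db, recovered_db):
    """Build the two STSADM command lines for one web application.

    web_app_url    -- URL of the standby web application (assumed value)
    placeholder_db -- empty, pre-configured content database to remove
    recovered_db   -- content database restored from the log ship copy
    """
    return [
        f"STSADM -o deletecontentdb -url {web_app_url} -databasename {placeholder_db}",
        f"STSADM -o addcontentdb -url {web_app_url} -databasename {recovered_db}",
    ]

# Hypothetical inventory: (web application URL, placeholder DB, recovered DB).
databases = [
    ("http://DR-SPWFE:80", "WSS_Content_Placeholder", "WSS_Content_Intranet"),
    ("http://DR-SPWFE:8080", "WSS_Content_Placeholder2", "WSS_Content_Teams"),
]

for url, placeholder, recovered in databases:
    for command in failover_commands(url, placeholder, recovered):
        print(command)
```

Printing the commands instead of running them keeps the administrator in control during the drill, and the output can be pasted straight into the DR run book.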
The diagram below outlines the reverse scenario.
![DR solution diagram, reverse scenario](https://www.chandima.net/Blog/Lists/Posts/Attachments/185/DRSolutionReverse_2.png)
The step that typically takes the most time is restoring the databases fully. The larger your content databases, the longer it will take. Ideally you would have applied quota templates to your site collections or set up multiple content databases so that you don’t end up with a DB larger than 50 GB. I’d like to hear from anyone who has actually followed through such a plan in a simulated setup to determine how long it typically takes to get back online. The best that I could do was 24 minutes from time of DR to recovery and back online for a very similar setup with 3 content databases of about 20 GB in size.