Windows Management and Scripting

A wealth of tutorials on Windows operating systems, SQL Server and Azure

Posts Tagged ‘DAG’

Run Exchange 2010 backup or skip this

Posted by Alin D on October 29, 2011

When Exchange Server 2010 RTM hit, some IT pros suggested that database availability groups in Exchange 2010 make traditional backups unnecessary.

I initially scoffed at the idea. However, Exchange 2010 has been available for a while and the idea of a “backup-less” Exchange server makes sense in some environments.

The concept behind ‘backup-less’ Exchange
A backup is nothing more than a point-in-time copy of your data. It is this deceptively simple definition that led to the idea of running Exchange 2010 without backups.

Some say running Exchange 2010 without backups is safe because of the way database availability groups (DAGs) work. A single DAG can contain up to 16 mailbox servers and an individual mailbox database can be replicated to any combination of mailbox servers within the DAG.

The argument against backing up Exchange 2010 boils down to how many copies of data you really need. If you already have 16 replicas of a mailbox database, do you really need a seventeenth copy as backup?

Important Exchange backup considerations
While the argument against backing up Exchange 2010 in environments with DAGs sounds logical, there are a number of important factors to consider before ditching your backup system.

  • DAG size
    While you can include up to 16 mailbox servers in a DAG, you can also create very small groups. Therefore, you must consider the size of your DAG before abandoning backups. Microsoft recommends that you only consider going without a backup if you have three or more mailbox servers in your DAG.
  • Transaction logs
    Typically, when you back up an Exchange mailbox server, transaction logs that have already been committed to the database are truncated as part of the backup process. If you never perform a backup, the transaction logs accumulate until the volume runs out of disk space. Because of this, organizations that do not back up Exchange 2010 must enable circular logging to prevent log file accumulation.
  • Offsite storage
    It’s easy to think of a backup-less Exchange organization in the same way as a disk-based backup solution because database contents are replicated to other servers. However, organizations that depend on disk-based backups usually adopt a disk-to-disk-to-tape solution where the disk-based backups are periodically copied to tape and stored offsite. If the data center burns down, the backups remain safe.

    If you’re considering operating Exchange without backups, it’s smart to place a few DAG members in a remote data center. That way, your data remains protected even if something happens to your primary data center.

  • Point-in-time recovery
    The biggest disadvantage to running Exchange 2010 without backups is that you lose the option of accurate point-in-time recoveries. For example, imagine that your entire company became infected with a virus. In this situation, you could restore a backup that was made prior to the infection, rather than trying to remove every infected message from your mailbox database. This is simple with a traditional backup, but isn’t practical if you go without.

    Notice that I didn’t say that it’s impossible to perform a point-in-time recovery without a backup. Microsoft does let you create lagged database copies, on which log files are not replayed immediately. That way, if you need to revert to a particular point in time, you can activate a lagged copy.

    The problem is that there’s a lot of guesswork involved in the process. You must know exactly when the problem began, so that you replay only the transaction logs that were created before the problem occurred and discard those that were created after it. Unfortunately, there isn’t an easy way to figure out which transaction logs should be used and which should be deleted.

    As you can see, it’s perfectly feasible to run Exchange 2010 without traditional backups in certain situations. That said, I advise backing up Exchange as you always have. If an unforeseen set of circumstances leads to data loss, you won’t have to explain to your boss or management that you don’t have backups.
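
The log-selection step for a lagged copy can be sketched as follows. This is a minimal illustration, not Exchange's own procedure: it assumes you already know when the problem began and that each log's last-modified time is a trustworthy indicator of when it was written. The function name, log names and timestamps are all hypothetical.

```python
from datetime import datetime


def split_logs_by_cutoff(logs, cutoff):
    """Split (name, last_modified) pairs around a cutoff time.

    Returns (replay, set_aside): logs written before the cutoff, and
    logs written at or after it, each in chronological order.
    """
    replay, set_aside = [], []
    for name, modified in sorted(logs, key=lambda item: item[1]):
        (replay if modified < cutoff else set_aside).append(name)
    return replay, set_aside


# Hypothetical log names and timestamps, for illustration only.
logs = [
    ("E0000000011.log", datetime(2011, 10, 28, 8, 15)),
    ("E0000000012.log", datetime(2011, 10, 28, 9, 40)),
    ("E0000000013.log", datetime(2011, 10, 28, 11, 5)),
]
problem_began = datetime(2011, 10, 28, 10, 0)

replay, set_aside = split_logs_by_cutoff(logs, problem_began)
print(replay)     # ['E0000000011.log', 'E0000000012.log']
print(set_aside)  # ['E0000000013.log']
```

The hard part in practice is choosing the cutoff, which is exactly the guesswork described above.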


The Exchange 2010 High Availability – Active Manager

Posted by Alin D on January 5, 2011

On each Exchange 2010 mailbox server, a process called Active Manager runs inside the Microsoft Exchange Replication Service (MSExchangeRepl). It is the high availability brain in Exchange 2010. Active Manager manages failover and switchover (collectively known as *over) on DAG members by selecting the best database copy to activate. It also frequently checks the Active Directory topology on each mailbox server for any changes.

Each DAG member runs either a Primary Active Manager (PAM) or a Standby Active Manager (SAM). Every member runs a SAM except the node that owns the quorum, which runs the PAM. One function of Active Manager is to monitor database and information store health. If a database fails on a DAG member, the SAM informs the PAM of the failure so that the PAM can take action.

The SAM also responds to queries from Client Access servers and Hub Transport servers asking which server holds the active copy of a mailbox database. When a user connects to a Client Access server (CAS), the CAS queries the Active Manager on a DAG member to find the active copy of the database that contains the user’s mailbox. The SAM responds with the server on which the database is active.

If the quorum owner fails, the Active Manager on the node that obtains the quorum becomes the Primary.

If an active mailbox database fails, and if the database is configured for replication, the PAM is notified to select the best database copy to be activated on another node. This is done as follows:

  • The Best Copy Selection (BCS) algorithm is run.
  • After selecting the best copy, the PAM notifies the selected server to become the next database master.
  • The Microsoft Exchange Replication Service on the selected server will try to copy the logs from the previous active database master through the Attempt Copy Last Logs (ACLL) process. ACLL will query other servers where there is a healthy database copy with the highest log generation number by checking the LastInspectedLogTime.
  • After completing the ACLL, the PAM notifies the selected server to mount the database. If all the logs were copied successfully, the database mounts without any data loss. Otherwise, if some of the logs could not be copied, the database is mounted only if the number of missing logs (copy queue length) does not exceed the limit set by the AutoDatabaseMountDial parameter.

    The AutoDatabaseMountDial parameter can be set to BestAvailability, GoodAvailability or Lossless. If the value is BestAvailability, the number of missing logs must be less than or equal to 12 for the database to be mounted automatically. If the value is GoodAvailability, which is the default, the number of missing logs must be less than or equal to 6 for the database to be mounted automatically. If the value is Lossless, all the logs must be copied to the selected server for the database to be mounted; in other words, the number of missing logs must be 0.

  • If the database could not be mounted on the selected server, the next candidate mailbox server obtained by the BCS process will be selected (if any). If there is no other mailbox server to be selected, the administrator has to manually mount the database and accept the data loss.
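
The mount decision in the last two steps comes down to comparing the copy queue length with a per-setting limit. Here is a minimal sketch of that comparison, using the thresholds quoted above. The dictionary and function names are illustrative and are not part of any Exchange API.

```python
# Maximum number of missing logs tolerated for an automatic mount,
# keyed by AutoDatabaseMountDial value (thresholds as quoted above).
MAX_MISSING_LOGS = {
    "BestAvailability": 12,
    "GoodAvailability": 6,
    "Lossless": 0,
}


def can_auto_mount(copy_queue_length, mount_dial):
    """Return True if a copy with this many missing logs may mount automatically."""
    return copy_queue_length <= MAX_MISSING_LOGS[mount_dial]


print(can_auto_mount(8, "GoodAvailability"))  # False
print(can_auto_mount(8, "BestAvailability"))  # True
print(can_auto_mount(0, "Lossless"))          # True
```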

There are other reasons why the mailbox database might not be mounted on the selected server. Each DAG member can be configured with a limit on the number of databases that can be active on it at the same time. If the limit has been reached, no other database copy can be activated or mounted on that server, and the PAM moves on to select the next candidate for database master. This limit is configured using the Set-MailboxServer cmdlet with the -MaximumActiveDatabases parameter.

Set-MailboxServer -Identity <MailBoxServer> -MaximumActiveDatabases <number>

The selected server also will not automatically activate and mount the database copy when automatic database activation is disabled on it, i.e. when the DatabaseCopyAutoActivationPolicy for the server is set to Blocked using the command:

Set-MailboxServer -Identity <MailBoxServer> -DatabaseCopyAutoActivationPolicy Blocked
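
The two server-level settings above act as a gate in front of any automatic activation. The sketch below models only that gate. The class and function names are hypothetical, and treating an unset MaximumActiveDatabases as "no limit" is an assumption made for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MailboxServer:
    name: str
    active_databases: int                            # databases currently active here
    maximum_active_databases: Optional[int] = None   # None is treated as "no limit"
    auto_activation_policy: str = "Unrestricted"


def can_activate_on(server):
    """Return True if the server may automatically activate one more database."""
    if server.auto_activation_policy == "Blocked":
        return False
    limit = server.maximum_active_databases
    return limit is None or server.active_databases < limit


full = MailboxServer("MBX-A", active_databases=2, maximum_active_databases=2)
blocked = MailboxServer("MBX-B", active_databases=0, auto_activation_policy="Blocked")
free = MailboxServer("MBX-C", active_databases=1, maximum_active_databases=2)

print(can_activate_on(full))     # False: limit already reached
print(can_activate_on(blocked))  # False: automatic activation is blocked
print(can_activate_on(free))     # True
```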

Now that we have described what happens after the best copy is selected, we need to know how the PAM selects that best copy. What is behind the BCS algorithm? This is what we will discuss next.

Best Copy Selection process

The BCS algorithm produces a list of database copies that are good candidates for activation. This list is then sorted based on (in order):

Primary key:

  1. The lowest Copy Queue Length (CQL), i.e. the highest LastLogInspected
  2. The lowest Replay Queue Length (RQL)
  3. Content Index (CI) status (Healthy or Crawling)

Secondary Key:

  1. The lowest Activation Preference (which you specify when adding a database copy to the mailbox database)

These selection criteria apply to both Exchange 2010 RTM and Service Pack 1. In Exchange 2010 SP1, however, if the AutoDatabaseMountDial parameter for the mailbox server is set to Lossless, the list is sorted with the Activation Preference as the primary key.

There are ten possibilities based on the criteria above. They can be summarized as:

  1. ( CQL < 10 ) and ( RQL < 50 ) and ( CI is Healthy )
  2. ( CQL < 10 ) and ( RQL < 50 ) and ( CI is Crawling )
  3. ( RQL < 50 ) and ( CI is Healthy )
  4. ( RQL < 50 ) and ( CI is Crawling )
  5. ( RQL < 50 )
  6. ( CQL < 10 ) and ( CI is Healthy )
  7. ( CQL < 10 ) and ( CI is Crawling )
  8. ( CI is Healthy )
  9. ( CI is Crawling )
  10. If none of the nine sets of criteria are met by the database copies, then the PAM will try to activate a database with a status of Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource.

CQL – Copy Queue Length
RQL – Replay Queue Length
CI – Content Index
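
One way to read the list above is that each database copy is assigned the first set of criteria it satisfies, and a lower number means a better candidate. The following sketch encodes sets 1 to 9 under that reading. The function name is illustrative, and the status-based fallback in item 10 is only signalled here, not modelled.

```python
def criteria_set(cql, rql, ci):
    """Return the number (1-9) of the first criteria set the copy meets, else 10.

    cql: copy queue length, rql: replay queue length,
    ci: content index status, e.g. "Healthy" or "Crawling".
    """
    low_cql = cql < 10
    low_rql = rql < 50
    healthy = ci == "Healthy"
    crawling = ci == "Crawling"
    checks = [
        low_cql and low_rql and healthy,   # 1
        low_cql and low_rql and crawling,  # 2
        low_rql and healthy,               # 3
        low_rql and crawling,              # 4
        low_rql,                           # 5
        low_cql and healthy,               # 6
        low_cql and crawling,              # 7
        healthy,                           # 8
        crawling,                          # 9
    ]
    for number, met in enumerate(checks, start=1):
        if met:
            return number
    return 10  # falls through to the status-based fallback


print(criteria_set(5, 10, "Healthy"))   # 1
print(criteria_set(5, 60, "Healthy"))   # 6
print(criteria_set(20, 80, "Failed"))   # 10
```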

Example

In our example we have a DAG with four members (Ex14MBx1 to Ex14MBx4). The mailbox database Main-DB01 has copies on these four members. The status of the database copies is as shown in the following figure:

Main-DB01 is currently active on mailbox server Ex14MBx1. Suppose Ex14MBx1 fails, and the content index status of the database copies on the other three mailbox servers (as reported by the Get-MailboxDatabaseCopyStatus command) is:

Also let’s assume that the AutoDatabaseMountDial for all servers is set to GoodAvailability (copy queue length must be less than or equal to 6), and server Ex14MBx2 is configured to host no more than two active mailbox databases at the same time (MaximumActiveDatabases is set to 2).

Now if server Ex14MBx1 fails, the PAM, after being notified, will start the BCS process:

  1. Because the AutoDatabaseMountDial is not set to Lossless, a list of servers ordered by lowest copy queue length is created. The list will be: Ex14MBx2, Ex14MBx4, Ex14MBx3
  2. The list will be sorted based on the above mentioned criteria.
    • Ex14MBx2: (CQL<10), (RQL<50) and (CI is Crawling) – matches criteria set 2
    • Ex14MBx3: (CQL<10), (RQL<50) and (CI is Healthy) – matches criteria set 1
    • Ex14MBx4: (CQL<10), (RQL>50) and (CI is Healthy) – matches criteria set 6

    So the resulting list is:

    1. Ex14MBx3
    2. Ex14MBx2
    3. Ex14MBx4
  3. ACLL will now try to copy the missing log files from the previous database master, Ex14MBx1, to the first server in the list, Ex14MBx3. However, Ex14MBx1 is down and cannot be contacted. Furthermore, because the AutoDatabaseMountDial is set to GoodAvailability (copy queue length must be less than or equal to 6) and the actual copy queue length on Ex14MBx3 is 8 (see the screenshot above), Ex14MBx3 will not activate Main-DB01.
  4. The next server in the list, Ex14MBx2, will be notified. Its copy queue length is 5, so the database could be mounted on this server, but Ex14MBx2 is configured to host no more than two active databases (MaximumActiveDatabases = 2) and that limit has already been reached, so the database will not be activated on this server.
  5. Server Ex14MBx4 will be notified. Its copy queue length is 5 and its content index status is Healthy. ACLL will notify the PAM that the process succeeded on server Ex14MBx4. The PAM will notify Ex14MBx4 to mount and activate the mailbox database Main-DB01.

In case none of the database copies can be mounted, then the administrator has to manually activate a database copy on one of the servers and accept the data loss.
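
The walk-through above can be replayed as a short simulation. This is a rough sketch under several assumptions: the replay queue lengths (10, 10, 60) are made up to fit "RQL<50" and "RQL>50", since the text gives no exact values; Ex14MBx2 is assumed to already host two active databases; ACLL is assumed to copy nothing because Ex14MBx1 is down; and the Activation Preference tie-breaker and the SP1 Lossless ordering are not modelled. All function and field names are illustrative.

```python
MOUNT_DIAL_LIMIT = {"BestAvailability": 12, "GoodAvailability": 6, "Lossless": 0}


def criteria_set(cql, rql, ci):
    """Return the number (1-9) of the first criteria set the copy meets, else 10."""
    low_cql, low_rql = cql < 10, rql < 50
    healthy, crawling = ci == "Healthy", ci == "Crawling"
    checks = [
        low_cql and low_rql and healthy, low_cql and low_rql and crawling,
        low_rql and healthy, low_rql and crawling, low_rql,
        low_cql and healthy, low_cql and crawling, healthy, crawling,
    ]
    return next((n for n, met in enumerate(checks, start=1) if met), 10)


def pick_server(copies, mount_dial):
    """Return the first server, in ranked order, that is allowed to mount."""
    ranked = sorted(copies, key=lambda c: criteria_set(c["cql"], c["rql"], c["ci"]))
    for copy in ranked:
        if copy["cql"] > MOUNT_DIAL_LIMIT[mount_dial]:
            continue  # too many missing logs for this mount dial setting
        limit = copy["max_active"]
        if limit is not None and copy["active"] >= limit:
            continue  # MaximumActiveDatabases already reached
        return copy["server"]
    return None  # an administrator would have to mount a copy manually


copies = [
    {"server": "Ex14MBx2", "cql": 5, "rql": 10, "ci": "Crawling", "active": 2, "max_active": 2},
    {"server": "Ex14MBx3", "cql": 8, "rql": 10, "ci": "Healthy", "active": 0, "max_active": None},
    {"server": "Ex14MBx4", "cql": 5, "rql": 60, "ci": "Healthy", "active": 0, "max_active": None},
]

print(pick_server(copies, "GoodAvailability"))  # Ex14MBx4
```

With these inputs, switching the dial to BestAvailability would let Ex14MBx3 mount despite its copy queue length of 8, while Lossless would leave no automatic candidate.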

Summary

Today we looked at the Active Manager role. Running on each Exchange 2010 mailbox server, it controls high availability and monitors Database Availability Groups. On DAG members it controls switchover and failover by selecting, based on a set of criteria, the best candidate server on which to activate a database copy. Finally, we saw an example describing how Active Manager selects the best database copy to mount and activate.


Exchange Server 2010 DAGs and VMware High Availability

Posted by Alin D on August 27, 2010

If you are planning to deploy Exchange Server 2010 Database Availability Groups, and you virtualize your Exchange environment, then it is important to understand the supported scenarios.

Microsoft makes it very clear in their system requirements for Exchange Server 2010:

Microsoft doesn’t support combining Exchange high availability solutions (database availability groups (DAGs)) with hypervisor-based clustering, high availability, or migration solutions that will move or automatically failover mailbox servers that are members of a DAG between clustered root servers. DAGs are supported in hardware virtualization environments provided that the virtualization environment doesn’t employ clustered root servers, or the clustered root servers have been configured to never failover or automatically move mailbox servers that are members of a DAG to another root server.

In short, DAG members should have any virtualization high availability options disabled.

This is in slight contrast to the advice from VMware themselves:

While the use of database availability groups on top of hypervisor based clustering is not a formerly (sic) supported configuration, internal VMware tests have shown that the two technologies can co-exist and can be a viable solution to ensure maximum recoverability in the case of a host failure.

To paraphrase, it isn’t supported but we think you’ll be fine.

You might get some pushback from customers or managers who have been sold on the idea of VMware HA for everything, or who take the line from VMware as implied support for the configuration. But in the real world I prefer to go with what is supported over what is possible.
