
Author: Chuck Arconi

High Disk Queuing in Exchange Server using Qlogic 4010 HBA and NetApp SAN with CISCO 4507r

Recently I was experiencing severe performance issues with Exchange Server. The Exchange environment was as follows:

We had installed an email archiving product called Mimosa NearPoint. Many IT people were pointing the finger at Mimosa for the performance issues, but after detailed analysis of the network we found that the HBA was the culprit.

The main symptom was very high disk queuing.

Another symptom was the "Exchange is trying to retrieve mail" pop-up that you get in Outlook. On its own that is not a big issue, but everyone was getting it, and in some cases (such as during a NetApp snapshot) mail would become unusable: Outlook would basically lock up. That was a BIG issue.

Finally, after looking in all the wrong places (Exchange Server, the NetApp filer and the OS), we looked at the actual iSCSI connection to the SAN (Storage Area Network) and found that the 4010 HBA was sending lots of "bad" packets to the switch, as shown in one of the captures below. This was easy to test: we set up monitoring and then started a large copy of a folder containing small and large files from the internal drive (C) to the SAN-attached drive (on our server it was the "H" drive).
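The copy test described above can be scripted so that it is repeatable. The following is only a minimal sketch in Python, not the procedure we actually ran: the file counts, file sizes, function names and the example paths in the closing comment are all assumptions chosen for illustration. It builds a folder with a mix of small and large files and times a copy of that folder to another location, which you would point at the SAN-attached drive while the monitoring tools are running.

```python
import os
import shutil
import time
from pathlib import Path


def make_test_files(folder, small_count=200, small_size=4 * 1024,
                    large_count=2, large_size=64 * 1024 * 1024):
    """Create a folder holding many small files and a few large ones.

    All counts and sizes are illustrative defaults, not values from the article.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    for i in range(small_count):
        (folder / f"small_{i:04d}.bin").write_bytes(os.urandom(small_size))
    for i in range(large_count):
        (folder / f"large_{i:02d}.bin").write_bytes(os.urandom(large_size))
    return folder


def timed_copy(src, dst):
    """Copy the folder src to dst (dst must not exist yet).

    Returns (bytes_copied, elapsed_seconds).
    """
    start = time.perf_counter()
    shutil.copytree(src, dst)
    elapsed = time.perf_counter() - start
    copied = sum(p.stat().st_size for p in Path(dst).rglob("*") if p.is_file())
    return copied, elapsed


# Hypothetical usage on the Exchange server (paths are assumptions):
#   src = make_test_files(r"C:\copytest")
#   copied, seconds = timed_copy(src, r"H:\copytest")
#   print(f"{copied / 1e6:.1f} MB in {seconds:.1f} s")
```

The elapsed time on its own says little; the point of the copy is to generate sustained iSCSI traffic so that the switch counters and the disk queue can be watched while it runs.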




We also monitored with SolarWinds and gathered the data below.


We contacted Qlogic and opened a ticket. They asked us to run some utilities to gather information for them, and then they pretty much never got back to us. So I brought our NetApp rep into it and got NetApp to work directly with Qlogic. As it turned out, NetApp had many customers with the same issue, and they all had the same two things in common: Qlogic 4010 HBAs and Cisco 4507r switches. Qlogic finally admitted there was a bug in their firmware, but we never received a patch. So we pulled the HBAs out of all the servers using them and moved to the internal NIC. It was very easy and straightforward.

As it turned out, there were performance issues on all of the servers that had the Qlogic HBAs in them, and removing the HBAs and running with the MS Initiator and a NIC solved the issue for us. In fact, it was like we had new servers. I have posted most of the performance statistics gathered after the change-out below.

After performing the HBA swap-out on "Exchange Server A" and "Exchange Server B" we have been monitoring performance at the switch level, the system level and in the end-user experience. As far as the switch goes, below is a graph of the network data for the iSCSI connection to the SAN. This covers the entire month of May to date.


Compare this against just one day's capture before the swap-out:


Next we see the output from the connection to "Exchange Server B", again covering the month of May to date.



The data collected from Spotlight shows how much the disk queuing has decreased since the change from HBA to internal NIC. The average disk queue is now 2.25, with 1 or 2 peaks of 60 throughout the day.




Before the swap-out we were seeing the same RPC operations but significantly higher disk queue numbers. Our average was above 4, and spikes above 250 were frequent throughout the day.
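Our averages and peaks came straight from Spotlight, but the same kind of summary can be computed from any exported list of disk queue samples (for example a Windows Performance Monitor log of the "Avg. Disk Queue Length" counter, which is an assumption here and not the tool we used). The sketch below is illustrative only: the function name and the sample values in the closing comment are made up, and the spike threshold is whatever you decide counts as a spike.

```python
from statistics import mean


def summarize_queue(samples, spike_threshold):
    """Return the average, the peak, and the number of samples above the threshold."""
    if not samples:
        raise ValueError("no samples to summarize")
    spikes = [s for s in samples if s > spike_threshold]
    return {
        "average": mean(samples),
        "peak": max(samples),
        "spikes": len(spikes),
    }


# Hypothetical usage with invented sample values (not measurements from our servers):
#   summary = summarize_queue([1, 2, 3, 60, 2], spike_threshold=50)
#   print(summary["average"], summary["peak"], summary["spikes"])
```

Running the same summary over a day of samples taken before and after a change gives a like-for-like comparison instead of eyeballing two graphs.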



The delivery time for messages from one server to the other ("Exchange Server B" to "Exchange Server A") has also improved significantly. Last month, before the change, we saw an average of 1.5 to 2 seconds with spikes as high as 3 seconds. We also had many alerts because delivery ran over the 40-second threshold.



Now, after the change, we see an average of 1 second or below with spikes of 2 seconds, and the number of alerts breaking the 40-second threshold has diminished.




Overall, the performance of our MS Exchange environment has improved greatly, both at the server level and in user experience. Mimosa was running during all of this data gathering. Last month Mimosa was experiencing data transfer issues, performance issues and database issues, and a full shadow copy with extraction took 4 to 5 days to complete for an entire server ("Exchange Server A"). After the change, the same task takes 1.5 days.
The other Mimosa performance issues have disappeared completely.

©2006 ArconiSoftTools