Author: Chuck Arconi

Email Archiving Solutions

What is an email archiving solution?

Basically, it’s a standalone server product that keeps a real-time copy of your organization’s email for as long as you set it to. But it really does much more, as the two examples below show.

Example 1: Your databases are growing fast and furious and you can’t put really strict limits on your end users. Your company culture just won’t allow it. So you implement the archiving solution and set its email aging retention to 2 months. Now, as the system churns through your email servers, every message older than 2 months is moved off to your archive storage and a stub file is put in its place. This can cut your information stores by half their current size, and possibly by as much as 70%. The end user still sees the email listed in their mailbox, but really it’s only a marker. When they click on the email, the system retrieves it from the archive storage and restores it to their mailbox, and the end user will most likely never be the wiser about the transaction.
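To make the stub idea concrete, here is a minimal sketch in Python. It is not how any particular product does it; the `Message` class, the in-memory `archive` dictionary and the 60-day cutoff standing in for "2 months" are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Message:
    msg_id: str
    received: datetime
    body: Optional[str]  # None means only a stub is left in the mailbox


def stub_old_messages(mailbox: List[Message], archive: Dict[str, str],
                      now: datetime,
                      max_age: timedelta = timedelta(days=60)) -> int:
    """Move bodies older than max_age into the archive and leave stubs behind."""
    moved = 0
    for msg in mailbox:
        if msg.body is not None and now - msg.received > max_age:
            archive[msg.msg_id] = msg.body  # body now lives on archive storage
            msg.body = None                 # mailbox keeps only the marker
            moved += 1
    return moved


def open_message(msg: Message, archive: Dict[str, str]) -> str:
    """Return the body, pulling it back from the archive if msg is a stub."""
    if msg.body is None:
        msg.body = archive[msg.msg_id]  # restored to the mailbox on access
    return msg.body
```

The user-facing call is always `open_message`, which is why the end user never notices whether the body came from the mailbox or from the archive.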

Example 2: Let’s say you implement an archiving solution on January 1st and set its overall retention time to 6 months. On March 5th you need to find an email that was deleted back in February. No problem: just go to your archive system and (if it’s a good product) you should need only a few clicks to retrieve that email or even a whole folder. (Some products will allow you to restore an entire information store.)


There is much more to these systems than I could possibly cover in 2 paragraphs, but you should now understand the basic function when someone asks you “what’s an email archiving system?”

So let’s move on to my experience with a few different email archiving solutions and why I chose the product I currently use.

I started by talking with industry friends about what they were using and also “googling” (is that a word?) to see what was out there. I found many products: Zantaz, Mimosa, Symantec, GFI, IBM and more that I haven’t listed here. I narrowed my search using criteria based on what I needed the product to do, its cost and how it would fit into my current environment.

  1. I needed to reduce my information stores (email storage) and move the email off to “low cost” storage (which is really a fallacy; more on this later).

  2. I needed easy and quick access to the archived email.

  3. Legal would need to search the archived email for “legal discoveries” pertaining to litigation.

  4. HR would need to search the archive for their usual nefarious reasons.

  5. I had an existing NetApp SAN (Storage Area Network) that it would need to work with.

  6. It needed to be easy enough to manage that my counterparts could take over without hours of training if I wasn’t around.

  7. The product could not be invasive to MS Exchange. (It could not install drivers or software, or make lots of registry changes to my email servers; my reasons for this later.)

  8. It needed to be accessible through OWA (Outlook Web Access).

  9. Also a “nice to have” would be no Outlook client agent.

I brought in many vendors for onsite and web presentations of their products, and quickly narrowed the crowd to 3 products. I won’t name all 3, only the one I finally chose. I would like to tell you, but in truth they were all good products (for the most part), just not all a good fit, so I don’t want to needlessly shame anyone.

I found that 2 of the 3 products were very complicated to use and manage. As I investigated, I learned that 2 of the products were derivatives of another archiving solution (one of the first on the market years ago), and once I got past the flashy presentations and started using them in the lab I found them very archaic in design and very complicated to manage. Setting them up could not be done without the vendor, and troubleshooting them could not be done without the vendor. Two of them were invasive to Exchange (not disclosed to me initially), and one of them was VERY invasive to Exchange: many, many registry edits done by hand! (That’s crazy.)

The Lab
So it’s very important to test whatever products you’re considering in a “Lab Environment”. I want to go into detail on my lab environment so you can get an idea of how and what you should test. Many vendors will give you a demonstration that works flawlessly, and you could be impressed enough to say “let’s buy it!” You wouldn’t do that? Maybe not, but I have seen this happen more than once.

My lab consisted of:
1 = Active Directory controller
• I promoted a server to be a domain controller in my production AD, then disconnected it from the network and moved it into my lab (a completely separate network with no ties to the production network). I then performed a cleanup of the production network to remove any traces of the promoted controller. On the disconnected AD controller I seized the FSMO roles.

1 = Exchange server
• Built from scratch using the same “build SOP” that I used on all of the other Exchange servers.

1 = Archive server (or whatever the vendor’s spec requires)
• I found that some of the vendors had specific needs, like a certain version of SQL or other 3rd party software. In those cases I let them know that if they wanted to be in the “Lab Test” they would need to provide that software. They all gladly complied.

1 = Network Appliance 3020 filer and a one terabyte fiber-channel shelf, configured as SAN storage. (Not common for a lab environment, but the IT department director was very aware of the importance of real world testing.)

Testing
For each test I used Microsoft’s LoadSim. This allowed me to create a large number of users and simulate mail transfer so I could properly load test the system. I rebuilt the Exchange server for each product test to make sure my “control” was clean and the test would be “fair”.
The lab testing quickly narrowed the field even more. This is why lab testing is so important. I found several bugs with one of the products; in the demonstration the product had been very slick and trouble free. In the lab, after installation (even there it needed 2 servers for the test, and I was told it would require 4 in production), the vendor engineers could not get the product to function correctly and I could only test about 40% of the product’s capabilities. The vendor could not figure out the issue but assured me it would work in “production”. Yeah, sure it would ;-)

Of the final 2 products, one again was very slick in the demo, but after it was installed in the lab and I got a chance to play with it I found that the actual management interface (which was not shown in the demo) looked like something from the NT 3.51 days. Really it did! I even asked the sales engineer about it and he said that it was actually the same underlying code and interface from their NT & Exchange 5.5 product. He then said they were working on an interface update that would be out soon. Almost a year later another engineer friend told me he had just finished looking at the same product and the interface was the same! But the product, even though archaic, worked as promised. Remember this if nothing else: “what you see is what you get”. Never make your decision based on a promise of future releases; they may never happen.

The third product was excellent and was what I eventually settled on. The only problems I had were not technical in nature: the company suffered from growing pains, so the sales engineers had communication problems with the sales staff and their corporate engineers. I did see this improve as we continued working together. They also didn’t do a proper evaluation of our current infrastructure, which led to integration problems (VSS versions, to be exact). I did get to speak with upper management at Mimosa and they shared with us their plan to solve these issues, and from what I have seen since it looks as though they followed through on their promises. But they had the best interface (read “most intuitive”) of the 3 tested by far, and they also had no “ties” to, or software installed on, the Exchange server or Outlook clients. They worked with OWA and they had very aggressive pricing. And last but not least, they had the most impressive performance in the lab tests.

So the winner was Mimosa Nearpoint.

So on to my implementation experience.

So let’s go over what I needed to implement this. First, my environment consisted of 2 back-end Exchange servers and 2 front-end servers (load balanced). The Exchange environment contained a total of 650 mailboxes and 425 gigs of email.

Mimosa needs at least 2 servers: the Nearpoint server and a SQL server (SQL 2005, to be exact). They have a formula that determines your growth trend for a set number of years into the future, and this will guide your storage choice. We used SAN-attached storage, a SATA drive shelf attached to the 3020 filer. I mentioned earlier that I would talk about "low cost storage". We went with the SATA shelf precisely to use low-cost storage, but it really isn’t that low cost; it only seems cheap in comparison to "fiber-channel" drives.

Well, we ran into problems almost immediately. The SATA shelf could not handle the huge throughput needs during the initial copy or "shadow" of the Exchange servers and started to choke, so every other attachment to the SAN suffered. At one point we were asked to "compress" the IOR (Indexed Object Repository), which is the main storage facility of Mimosa, to give us some breathing room. In doing this we overheated or overworked the SATA shelf and it went offline, taking down the entire SAN! No joke: even with NetApp’s help we couldn’t get it back online until we turned it off for a day. The next day (sort of) we turned it back on and it worked?!

So if you want really good performance you may have to use higher performance storage than SATA, or you should think about storing the Nearpoint data entirely on its own storage device, or at least on a SAN that is not "mission critical". Once the system is up and running, the load on the storage system is minimal. We also ran into some performance issues with the Mimosa database running on SQL. These were addressed with a patch and some work by Mimosa's engineers.

NOTE: I have to say that Mimosa did not turn their nose up at us once, and they really were dedicated to working on any issues we had until they were completely resolved. In fact, we were having a performance issue that everyone (on my team) blamed on Mimosa, but as it turned out it was due to a conflict with a QLogic card. When confronted with the performance issue, Mimosa put an entire team of their best people on it to try and resolve it. And when we found out that it wasn’t their fault, they didn’t ask for payment or even squawk a little. I was very impressed with their willingness and dedication to make the system perfect in our environment no matter what.

The Mimosa/Nearpoint server was configured as follows:
• Dell 2850
• Dual Proc
• 2 logical drives on 800 gigs of internal RAID5 storage.
• 1 SAN attached drive
• 4 Gigs of RAM

The SQL server was configured as follows:
• Dell 2850
• Dual Proc
• 2 logical drives on 200 gigs of internal RAID1 storage.
• 1 SAN attached drive
• 4 Gigs of RAM

Mimosa’s Nearpoint server uses 3 drives to make up its storage architecture. They are:

1. Shadow
 This is where the Exchange information stores get copied during the initial “shadow copy”. This can be scheduled for off hours, and users can still access email during the operation, although the email system will be less responsive.

2. IOR (Indexed Object Repository)
 This is where the broken-apart messages are stored: headers, metadata and attachments.

3. Difference
 This is where the changed data that will be pushed back to the Exchange server resides.


The Shadow and Diff are not critical, and in fact we chose not to perform any kind of backup on these drives. These drives were separated from the IOR because of the intense reads and writes between them during “smart extraction”.

NOTE: Our first attempt at moving to production failed because of performance issues related to having all of the data/drives located on the SAN. I moved the Shadow and Diff to the internal storage and left the IOR on the SAN.

The IOR is critical and lived on the SAN, so a snapshot of it was taken every four hours, along with one of the corresponding database on the SQL server.

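The final layout was as follows: the Shadow and Diff drives on the Nearpoint server's internal storage, and the IOR on the SAN-attached drive.
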
We were able to perform the initial copies in about 2 days. That entailed shadowing the Exchange server stores (copying them to the Shadow drive) and running through smart extraction (breaking each message into its 3 parts, indexing it and storing it in the IOR) for 2 Exchange servers and about 425 gigs of email. Scheduling the initial copies on a weekend is important because there can be a noticeable performance change for end users during the shadow copy. The smart extraction occurs only on the Nearpoint server and does not affect the Exchange servers in any way.
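For a feel of what that means for your storage, here is a back-of-the-envelope calculation of the average rate. It assumes "about 2 days" means 48 hours and 1 gig means 1024 MB, and it lumps the shadow copy and smart extraction together, so burst rates during the shadow copy itself would have been much higher than this average.

```python
def average_throughput_mb_per_s(total_gb: float, hours: float) -> float:
    """Average sustained rate needed to move total_gb in the given hours."""
    return total_gb * 1024 / (hours * 3600)


# 425 gigs of email processed in roughly 2 days (assumed to be 48 hours)
rate = average_throughput_mb_per_s(425, 48)
print(f"{rate:.1f} MB/s")  # prints "2.5 MB/s"
```

The function is handy for planning: plug in your own store size and the length of your weekend window to see the minimum sustained rate the storage has to deliver.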

The nice thing about this system was the restore features. Some of the other products I tested did not have any way to restore folders or storage groups, and in those that listed the feature it was very limited in "real world" usage. With Mimosa I was able to restore an email, a folder, a mailbox, a storage group or an entire server with just a few clicks. Literally; I tested it and it was pretty impressive. Now that it's in production, this is a feature used regularly by the Helpdesk.

E-searching

The search feature, which they call "eDiscovery", was such a valuable part of the product (at least to the company I deployed it at) that legal justified the purchase of the system almost solely on that capability. Once it is in place you can give any user you want the ability to search the entire archive for any email based on search criteria they define. My main user of this part of the system, a paralegal in the legal dept, was using several systems to search network data and email for ongoing litigation. She told me that the interface for this one was the best and that she could use it almost right away with very little training.

If you're trying to justify the purchase of an email archiving system, don't overlook the value of legal searches and discoveries. This could save your company hundreds of thousands of dollars in lost time, and the ROI on this part alone could pay for the entire system.

Backups

Once this system was in place we were able to reduce our snapshot schedule by about 90%. We were snapshotting every 2 hours before the Mimosa installation. Now we snapshot once or twice a day, and that will probably go away once there is more confidence in the Mimosa system (read "it's been around for a while with no problems").
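If you want to check a figure like that against your own schedule, the arithmetic is simple. This small sketch assumes "every 2 hours" means 12 snapshots a day around the clock:

```python
def reduction_pct(before_per_day: float, after_per_day: float) -> float:
    """Percentage drop in the number of snapshots taken per day."""
    return 100 * (before_per_day - after_per_day) / before_per_day


before = 24 / 2  # one snapshot every 2 hours -> 12 per day
for after in (1, 2):
    print(f"{after}/day: {reduction_pct(before, after):.0f}% fewer snapshots")
# prints "1/day: 92% fewer snapshots" then "2/day: 83% fewer snapshots"
```

So once or twice a day works out to somewhere between 83% and 92% fewer snapshots, which brackets the rough 90% quoted above.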

Compliance

One of the features that all of the archiving products tout is "compliance": legal, SOX, corporate, government standards, etc. So this is something I investigated in all of the products I had in the lab, and there is an important point I want to make. All of the other vendors really pushed the fact that they used "journaling" to record all messages and that Mimosa didn't, and that journaling was the accepted standard, so if I wanted to stay compliant for legal searches I needed it. But the truth is that Mimosa's method was in fact better and more secure than journaling. Look, journaling, when turned on in the Exchange server, keeps a record of all messages. It doesn't keep the actual message, just a record of its travel through the system. All of those products then scan the information stores at scheduled intervals (usually at night) to perform archiving based on the rules they have in place. Journaling also puts a heavy hit on your Exchange servers.

But Mimosa uses "log shipping", which is by far a better method. This will be a standard feature built into Exchange 2007. Mimosa copies your Exchange store to the "Shadow" volume, and then every time a new log file is written (when it reaches 5 MB) it is shipped (copied) over to the Mimosa server, where it is played back into the copies of the information stores on the Shadow volume. This completely bypasses the end user and gives them no chance to delete a message. They may delete it from their inbox but not from the archive. This is fundamentally different from all the other products. In the other products I tested, the user could receive or send a message and then delete it before the archive product could grab it. So you would have a journal that shows the message existed but not the message itself. This is worse in a legal search because you have "smoke with no gun", and it looks like you're trying to cover something up or you're incompetent. With Mimosa there's no chance of this happening and there's no increased load on your Exchange server.
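The difference between the two approaches is easier to see in a toy model. The sketch below is only an illustration in Python, not Mimosa's or Exchange's actual mechanics: the `LogRecord` class, the function names and the idea that a log is a simple list of "deliver" and "delete" records are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogRecord:
    op: str          # "deliver" or "delete" (simplified, hypothetical record types)
    msg_id: str
    body: str = ""


def apply_to_mailbox(mailbox: Dict[str, str], log: List[LogRecord]) -> None:
    """What the user's mailbox looks like after the day's activity."""
    for rec in log:
        if rec.op == "deliver":
            mailbox[rec.msg_id] = rec.body
        elif rec.op == "delete":
            mailbox.pop(rec.msg_id, None)


def archive_from_shipped_logs(archive: Dict[str, str], log: List[LogRecord]) -> None:
    """Toy log-shipping capture: each delivery is archived as its record is
    replayed, and a later delete does not remove it (assumption of this model)."""
    for rec in log:
        if rec.op == "deliver":
            archive[rec.msg_id] = rec.body


def archive_from_scheduled_scan(archive: Dict[str, str], mailbox: Dict[str, str]) -> None:
    """Toy nightly scan: only sees what is still in the mailbox at scan time."""
    archive.update(mailbox)


# One day of activity: m1 arrives and is deleted before the nightly scan runs.
day_log = [
    LogRecord("deliver", "m1", "quarterly numbers"),
    LogRecord("delete", "m1"),
    LogRecord("deliver", "m2", "lunch?"),
]
mailbox: Dict[str, str] = {}
shipped: Dict[str, str] = {}
scanned: Dict[str, str] = {}
apply_to_mailbox(mailbox, day_log)
archive_from_shipped_logs(shipped, day_log)
archive_from_scheduled_scan(scanned, mailbox)
print(sorted(shipped))  # ['m1', 'm2']
print(sorted(scanned))  # ['m2']
```

In this model the scan-based archive never sees m1, which is exactly the "smoke with no gun" gap: a record that the message existed, but no message.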

I actually confronted the other vendors (and these were old, big players in the field) with this data and they couldn't answer me. I finally got one of them to admit that the "journaling" method did have holes they couldn't address with their product.

Finally

So that's all I have for now. If you have questions about this article please feel free to write me, and as always, if I come across any new information that's valuable I will add it.


©2006 ArconiSoftTools