Author: Chuck Arconi
Email Archiving Solutions
What is an email archiving solution?
Basically it’s a standalone server product that keeps a real-time copy of
your organization’s email for as long as you set it to. But it really does
much more, as shown in the two examples below.
Example 1: Your databases are growing fast and furious and
you can’t put really strict limits on your end users. Your company culture
just won’t allow it. So you implement the archiving solution and you set its
email aging retention to 2 months. Now, as the system churns through your
email servers, every message item it finds that is older than 2 months
will be moved off to your archive storage and a stub file will be put in its
place. This can cut your information stores by as much as half
their current size (possibly even 70%). The end user will still see the
email listed in their mailbox, but really it’s only a marker. When they click
on the email the system will go and retrieve it from the archive storage and
restore it to their mailbox, and the end user will most likely never be the
wiser about the transaction.
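The stubbing behaviour in Example 1 can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular product does it: the `Message` class, the in-memory `archive` dictionary and the 60-day cutoff (standing in for "2 months") are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional


@dataclass
class Message:
    subject: str
    received: date
    body: Optional[str]      # None once the body has been moved to the archive
    is_stub: bool = False


def archive_old_messages(mailbox: Dict[str, Message], archive: Dict[str, str],
                         today: date, max_age_days: int = 60) -> None:
    """Move bodies older than max_age_days into the archive, leaving stubs behind."""
    cutoff = today - timedelta(days=max_age_days)
    for key, msg in mailbox.items():
        if not msg.is_stub and msg.received < cutoff:
            archive[key] = msg.body   # full content goes to archive storage
            msg.body = None           # the mailbox keeps only a marker
            msg.is_stub = True


def open_message(mailbox: Dict[str, Message], archive: Dict[str, str], key: str) -> str:
    """Return the body, restoring it from the archive first if the item is a stub."""
    msg = mailbox[key]
    if msg.is_stub:
        msg.body = archive[key]
        msg.is_stub = False
    return msg.body
```

A real product keeps the archived content on separate storage and does the retrieval behind the scenes in the mail client; the point here is only that the mailbox holds a small marker while the content lives elsewhere.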
Example 2: So let’s say you implement an archiving solution
on January 1st and you set its overall retention time to 6 months. On
March 5th you need to find an email that was deleted back in February. No
problem, just go to your archive system and (if it’s a good product) you
should need only a few clicks to retrieve that email or even a folder. (Some
products will allow you to restore an entire information store.)
There is much more to these systems than I could possibly cover in 2
paragraphs, but you should be able to explain the basic function now when
someone asks you “what’s an email archiving system?”
So let’s move on to my experience with a few different email archiving
solutions and why I chose the product I use currently.
I started by talking with industry friends about what they were using and
also “googling” (is that a word?) to see what was out there. I found many
products: Zantaz, Mimosa, Symantec, GFI, IBM and more that I don’t have
listed here. I narrowed my search using criteria based on what I needed the
product to do, its cost and how it would fit into my current environment.
- I needed to reduce my information stores (email storage) and move the data off to “low cost” storage (which is really a fallacy; more on this later).
- I needed easy and quick access to the archived email.
- Legal would need to search the archived email for “legal discoveries” pertaining to litigation.
- HR would need to search the archive for their usual nefarious reasons.
- I had an existing NetApp SAN (Storage Area Network) that it would need to work with.
- It needed to be easy enough to manage that my counterparts could take over without hours of training if I wasn’t around.
- The product could not be invasive to MS Exchange. (It could not install drivers or software or make lots of registry changes on my email servers; my reasons for this later.)
- It needed to be accessible through OWA (Outlook Web Access).
- Also, a “nice to have” would be no Outlook client agent.
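One way to keep a shortlist honest is to write the criteria down as a checklist and run every candidate through it. The sketch below is just that idea in Python; the feature names are labels invented here for the bullets above, and the candidates in the usage note are hypothetical, not the products discussed in this article.

```python
from dataclasses import dataclass, field

# Hard requirements, one label per bullet above (labels are invented for this sketch).
MUST_HAVE = {
    "reduces_store_size",
    "fast_archive_access",
    "legal_search",
    "hr_search",
    "works_with_netapp_san",
    "easy_to_manage",
    "non_invasive_to_exchange",
    "owa_access",
}
NICE_TO_HAVE = {"no_outlook_agent"}


@dataclass
class Candidate:
    name: str
    features: set = field(default_factory=set)


def evaluate(candidate: Candidate) -> dict:
    """Report whether a candidate meets every hard requirement."""
    missing = MUST_HAVE - candidate.features
    extras = NICE_TO_HAVE & candidate.features
    return {
        "name": candidate.name,
        "qualifies": not missing,
        "missing": sorted(missing),
        "extras": sorted(extras),
    }
```

For example, `evaluate(Candidate("Product B", MUST_HAVE - {"non_invasive_to_exchange"}))` reports that the product does not qualify and lists the one requirement it misses. A hidden miss like that is exactly what a lab test is meant to surface.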
I brought in many vendors for onsite and web presentations
of their products. I quickly narrowed the crowd to 3 products. I won’t name
all 3, only the one I finally chose. I would like to tell you, but
in truth they were all good products (for the most part), just not all a
good fit, so I don’t want to needlessly shame anyone.
I found that 2 of the 3 products were very complicated to use and manage. As
I investigated I found that 2 of the products were derivatives of another
archiving solution (one of the first on the market years ago), and after I
got past the flashy presentations and started to use them in the lab I found
they were very archaic in design and very complicated to manage. Setting
them up could not be done without the vendor, and troubleshooting them could
not be done without the vendor. Two of them were invasive to Exchange (not
disclosed to me initially), one of them VERY invasive to Exchange: many, many
registry edits done by hand! (That’s crazy.)
The Lab
So it’s very important to test whatever products you’re considering in a
“Lab Environment”. I want to go into detail on my lab environment so you can
get an idea of how and what you should test. Many vendors will give you a
demonstration that works flawlessly, and you could be impressed enough to say
“let’s buy it!” You wouldn’t do that? Maybe not, but I have seen this happen
more than once.
My lab consisted of:
1 = Active Directory controller
• I promoted a server to be a domain controller in my production AD, then
disconnected it from the network and moved it into my lab (a completely
separate network with no ties to the production network). I then performed a
cleanup of the production network to remove any traces of the promoted
controller. On the disconnected AD controller I seized the FSMO roles.
1 = Exchange server
• Built from scratch using the same “build SOP” that I used on all of the
other Exchange servers.
1 = Archive server (or whatever the vendor’s spec requires)
• I found that some of the vendors had specific needs like a certain version
of SQL or other 3rd party software. In this case I let them know that if they
wanted to be in the “Lab Test” they would need to provide that software.
They all gladly complied.
1 = Network Appliance 3020 filer and a one terabyte fiber-channel shelf. (Not
common for a lab environment, but the IT department director was very aware
of the importance of real world testing.) This was configured as SAN
storage.
Testing
For each test I used Microsoft’s LoadSim. This allowed me to create a large
number of users and simulate mail transfer so I could properly load test the
system. I rebuilt the Exchange server for each product test to make sure my
“control” was clean and the test would be “fair”.
The lab testing quickly narrowed the field even more. This is why lab
testing is so important. I found several bugs with one of the products; in
the demonstration the product was very slick and trouble free. In the lab,
after installing it (even in the lab it needed 2 servers for the test, and
I was told it would require 4 in production), the vendor engineers could not
get the product to function correctly and I could only test about 40% of the
product’s capabilities. The vendor could not figure out the issue but assured
me it would work in “production”. Yeah, sure it would ;-)
Of the final 2 products, one again was very slick in the demo, but after it
was installed in the lab and I got a chance to play with it I found that the
actual management interface (which was not shown in the demo) looked like
something from the NT 3.51 days. Really it did! I even asked the sales engineer
about it and he said that it was actually the same underlying code and
interface from their NT & Exchange 5.5 product. He then said they were
working on an interface update that would be out soon. Almost a year later
another engineer friend told me he had just finished looking at the same
product and the interface was the same! But the product, even though
archaic, worked as promised. Remember this if nothing
else: “what you see is what you get”. Never make your decision
based on a promise of future releases; they may never happen.
The third product was excellent and was what I eventually settled on. The
only problems I had were not technical in nature: the company suffered
from growing pains, so the sales engineers had communication problems with
the sales staff and their corporate engineers. I did see this improve as we
continued working together. They also didn’t do a proper evaluation of our
current infrastructure, which led to integration problems (VSS versions, to
be exact). I did get to speak with upper management at Mimosa and they shared
with us their plan to solve these issues, and from what I have seen since, it
looks as though they followed through on their promises. But they had the
best interface (read “most intuitive”) of the 3 tested by far, and they also
had no “ties” to, or software installed on, the Exchange server or Outlook
clients. They worked with OWA and they had very aggressive pricing. And last
but not least, they had the most impressive performance in the lab tests.
So the winner was Mimosa Nearpoint.
So on to my implementation experience.
So let’s go over what I needed to implement this. First, my environment
consisted of 2 back-end Exchange servers and 2 front-end servers (load
balanced). The Exchange environment contained a total of 650 mailboxes and 425
gigs of email.
Mimosa needs at least 2 servers: the Nearpoint server and a SQL server (SQL
2005 to be exact). They have a formula that projects your growth trend a
set number of years into the future, and this will guide your storage choice.
We used SAN-attached storage, a SATA drive shelf attached to the 3020 filer.
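Mimosa's actual sizing formula isn't reproduced in this article, so the following is only a generic stand-in: a compound-growth projection of the kind many storage sizing exercises start from. The function name and the 30% annual growth rate in the usage note are made-up assumptions; only the 425 gig starting point comes from the environment described above.

```python
def project_storage_gb(current_gb: float, annual_growth_rate: float, years: int) -> float:
    """Project mail volume assuming a constant compound annual growth rate.

    This is a generic assumption, not the vendor's formula.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return current_gb * (1 + annual_growth_rate) ** years


def projection_table(current_gb: float, annual_growth_rate: float, years: int) -> list:
    """Return (year, projected GB) pairs for year 0 through `years`."""
    return [
        (year, round(project_storage_gb(current_gb, annual_growth_rate, year), 1))
        for year in range(years + 1)
    ]
```

Calling `projection_table(425, 0.30, 3)` gives a year-by-year view you can compare against the capacity of the shelf you plan to buy. Whatever formula the vendor uses, plug in your own measured growth rate rather than a guess.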
Earlier I said I would talk about "low cost storage". We
decided to go with the SATA shelf to indeed use low-cost storage. It really
isn’t that low cost; only in comparison to "fiber-channel" drives does it
seem cheap. Well, we ran into problems almost immediately. The SATA shelf
could not handle the huge throughput needs during the initial copy or
"shadow" of the Exchange servers and started to choke, so every other
attachment to the SAN suffered. At one point we were asked to "compress" the
IOR (Indexed Object Repository), which is the main storage facility of Mimosa,
to give us some breathing room. In doing this we overheated or overworked
the SATA shelf and it went offline, taking down the entire SAN! No joke: even
with NetApp’s help we couldn’t get it back online until we turned it off for
a day. The next day (sort of) we turned it back on and it worked?!
So if you want really good performance you may have to use higher
performance storage than SATA, or you should think about storing
the Nearpoint data entirely on its own storage device, or at least on a SAN
that is not "mission critical". Once the system is up and running, the load
on the storage system is minimal. We also ran into some performance
issues with the Mimosa database running on SQL. These were addressed with a
patch and some work by Mimosa's engineers.
NOTE: I have to say that Mimosa did not turn their nose up
at us once, and they really were dedicated to working on any issues we had
until they were completely resolved. In fact we were having a performance
issue that everyone (in my team) blamed on Mimosa, but as it turned out it
was due to a conflict with a Qlogic card. When confronted with the performance
issue, Mimosa put an entire team of their best people on it to try to resolve
it. And when we found out that it wasn’t their fault, they didn’t ask for
payment or even squawk a little. I was very impressed with their willingness
and dedication to make the system perfect in our environment no matter what.
The Mimosa/Nearpoint server was configured as follows:
• Dell 2850
• Dual Proc
• 2 logical drives on 800 gigs of internal RAID5 storage.
• 1 SAN attached drive
• 4 Gigs of RAM
The SQL server was configured as follows:
• Dell 2850
• Dual Proc
• 2 logical drives on 200 gigs of internal RAID1 storage.
• 1 SAN attached drive
• 4 Gigs of RAM
Mimosa’s Nearpoint server uses 3 drives to make up its storage architecture.
They are:
1. Shadow
This is where the Exchange information stores get copied on the initial
“shadow copy”. This can be scheduled for off hours, and users can still access
email during this operation, although the email system will be less
responsive.
2. IOR (Indexed Object Repository)
This is where the broken-apart messages, headers, metadata and attachments
are stored.
3. Difference
This is where the changed data that will be pushed back to the Exchange server
resides.
The Shadow and Diff are not critical and in fact we chose not to perform any
kind of backup on these drives. These drives were separated from the IOR
because of the intense reads and writes between them during “smart extraction”.
NOTE: Our first attempt at moving to production failed because of
performance issues related to having all of the data/drives located on the SAN.
I moved the Shadow and Diff to the internal storage and left the IOR on the
SAN.
The IOR is critical and was on the SAN, so a snapshot was taken
every four hours along with the corresponding database on the SQL server.
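The final layout was as follows: the Shadow and Diff drives on the Nearpoint server’s internal storage, and the IOR on the SAN-attached drive.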
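Based on the description in the text, the layout can also be written down as a small data structure, which makes it easy to see at a glance which volumes need protecting. This is only a sketch: the `Volume` class and its field names are invented for illustration and have nothing to do with how Nearpoint itself is configured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Volume:
    name: str
    purpose: str
    location: str                        # "internal" or "san"
    snapshot_every_hours: Optional[int]  # None means the volume is not backed up


# Values taken from the text: Shadow and Diff on internal storage with no
# backup, IOR on the SAN with a snapshot every four hours.
LAYOUT = [
    Volume("Shadow", "copy of the Exchange information stores", "internal", None),
    Volume("Diff", "changed data to push back to Exchange", "internal", None),
    Volume("IOR", "messages, headers, metadata and attachments", "san", 4),
]


def volumes_needing_backup(layout: List[Volume]) -> List[str]:
    """Names of the volumes that have a snapshot schedule."""
    return [v.name for v in layout if v.snapshot_every_hours is not None]


def volumes_on(layout: List[Volume], location: str) -> List[str]:
    """Names of the volumes placed on the given kind of storage."""
    return [v.name for v in layout if v.location == location]
```

Keeping the two busy scratch volumes off the shared SAN is the design choice that mattered here: only the IOR, the one volume that cannot be rebuilt cheaply, sits on snapshotted storage.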

We were able to perform the initial copies in about 2 days; that's 2 Exchange servers and about 425 gigs of email. The initial copy entails shadowing the Exchange server stores (copying them to the shadow drive) and then running them through smart extraction (breaking each message into its 3 parts, indexing it and storing it in the IOR). Scheduling the initial copies on a weekend is important because there can be a noticeable performance change for end users during the shadow copy. Smart extraction occurs only on the Nearpoint server and does not affect the Exchange servers in any way.
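To plan your own initial copy it helps to turn those numbers into an average rate. The small calculation below treats "about 2 days" as 48 hours, which is an assumption, and the result is an end-to-end average covering both the shadow copy and smart extraction, not the speed of either step on its own.

```python
def average_throughput(total_gb: float, hours: float) -> tuple:
    """Return the average rate as (GB per hour, MB per second)."""
    gb_per_hour = total_gb / hours
    mb_per_second = total_gb * 1024 / (hours * 3600)
    return gb_per_hour, mb_per_second


def hours_needed(total_gb: float, gb_per_hour: float) -> float:
    """Estimate how long a copy of total_gb takes at a given average rate."""
    return total_gb / gb_per_hour


# Figures from this deployment: about 425 gigs in about 2 days (assumed 48 hours).
gb_per_hour, mb_per_second = average_throughput(425, 48)
print(f"{gb_per_hour:.2f} GB/hour, {mb_per_second:.2f} MB/second on average")
```

At that average rate, `hours_needed(1000, gb_per_hour)` suggests that a terabyte of mail would not fit in a single weekend on similar hardware, which is worth knowing before you schedule the cutover.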
The nice thing about this system was the restore features. Some of the other products I tested did not have any way to restore folders or storage groups, and in those that listed the feature it was very limited in "real world" usage. With Mimosa I was able to restore an email, a folder, a mailbox, a storage group or an entire server with just a few clicks. Literally; I tested it and it was pretty impressive. Now that it's in production this is a feature used regularly by the Helpdesk.
E-searching
The search feature, which they call "eDiscovery", was such a valuable part of the product (at least to the company I deployed it at) that legal justified the purchase of the system almost solely on that capability. Once it is in place you can give any user you want the ability to search the entire archive for any email based on search criteria they define. My main user of this part of the system, a paralegal in the legal dept, was using several systems to search network data and email for ongoing litigation. She told me that the interface for this one was the best and she could use it almost right away with very little training.
If you're trying to justify the purchase of an email archiving system, don't overlook the value of legal searches and discoveries. This could save your company hundreds of thousands of dollars in lost time, and the ROI on this part alone could pay for the entire system.
Backups
Once this system was in place we were able to reduce our snapshot schedule by about 90%. We were snapshotting every 2 hours before the Mimosa installation. Now we snapshot once or twice a day, and that will probably go away once there is more confidence in the Mimosa system (read "it's been around for a while with no problems").
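The reduction figure is easy to check. A snapshot every 2 hours is 12 a day, so the quick calculation below (a sketch; the function name is made up) shows what dropping to one or two a day works out to.

```python
def snapshot_reduction(old_interval_hours: float, new_per_day: float) -> float:
    """Fraction by which the daily snapshot count drops."""
    old_per_day = 24 / old_interval_hours
    return 1 - new_per_day / old_per_day


for new_per_day in (1, 2):
    pct = snapshot_reduction(2, new_per_day) * 100
    print(f"{new_per_day} snapshot(s) a day: {pct:.1f}% fewer than one every 2 hours")
```

So the 90% figure corresponds to the once-a-day end of the range; at two a day the reduction is a little lower.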
Compliance
So one of the features that all of the archiving products tout is "compliance": legal, SOX, corporate, government standards, etc. This is something I investigated in all of the products I had in the lab, and there is an important point I want to make. All of the other vendors really pushed the fact that they used "journaling" to record all messages and that Mimosa didn't, and that journaling was the accepted standard, so if I wanted to stay compliant for legal searches I needed it. But the truth is that Mimosa's method was in fact better and more secure than journaling. Look, journaling, when turned on in the Exchange server, keeps a record of all messages. It doesn't keep the actual message, just a record of its travel through the system. And all of those products scan the information stores at scheduled intervals (usually at night) to perform archiving based on the rules they have in place. Journaling also puts a heavy load on your Exchange servers.
But Mimosa uses "log shipping", which is by far a better method. This will be a standard feature built into Exchange 2007. Mimosa copies your Exchange store to the "Shadow" volume, and then every time a new log file is written (when it reaches 5 MB) it is shipped (copied) over to the Mimosa server, where it is played back into the copies of the information stores on the shadow volume. This completely bypasses the end user and gives them no chance to delete a message. They may delete it from their inbox but not from the archive. This is fundamentally different from all the other products. In the other products I tested, the user could receive or send a message and then delete it before the archive product could grab it. So you would have a journal that shows the message existed but not the message itself. This is worse in a legal search because you have "smoke with no gun" and it looks like you're trying to cover something up or you're incompetent. With Mimosa there's no chance of this happening and there's no increased load on your Exchange server.
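The gap described here can be shown with a toy model. The sketch below is not Exchange's or Mimosa's actual mechanism; the event tuples, function names and hour-based times are all invented for illustration. It compares an archiver that only sees what is in the mailbox at a scheduled scan with one that replays every logged delivery.

```python
def scheduled_scan_archive(events, scan_times):
    """Archive whatever is sitting in the mailbox at each scheduled scan."""
    mailbox, archive = set(), set()
    pending = sorted(scan_times)
    for time, action, msg_id in sorted(events):
        # Run any scans that were due before this event happened.
        while pending and pending[0] <= time:
            archive |= mailbox
            pending.pop(0)
        if action == "deliver":
            mailbox.add(msg_id)
        elif action == "delete":
            mailbox.discard(msg_id)
    if pending:
        # Scans scheduled after the last event see the final mailbox state.
        archive |= mailbox
    return archive


def log_replay_archive(events):
    """Keep every delivered message; user deletes never reach the archive copy."""
    archive = set()
    for _time, action, msg_id in sorted(events):
        if action == "deliver":
            archive.add(msg_id)
    return archive


# Message "b" arrives at 10:00 and is deleted at 11:00, long before the 23:00 scan.
events = [(9, "deliver", "a"), (10, "deliver", "b"), (11, "delete", "b")]
print(sorted(scheduled_scan_archive(events, [23])))
print(sorted(log_replay_archive(events)))
```

In this model the scheduled scan never sees message "b", while the log-replay archive keeps it. That difference is the whole argument, reduced to two sets.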
I actually confronted the other vendors (and these were old, big players in the field) with this data and they couldn't answer me. I finally got one of them to admit that the "journaling" method did have holes they couldn't address with their product.
Finally
So that's all I have for now. If you have questions about this article please feel free to write me, and as always, if I garner any new information that's valuable I will add it.


