Open Letter To Our Email Service Resellers

Dear Resellers,

I am writing today to speak to you directly about what happened this week with Cluster A of our Email Service. This letter will not go into specific elements of the outage; there are other venues for that. The things I most want to communicate are my deep sorrow, why it won’t happen again and what we will do for you.

More than anything, one thought keeps going through my mind as I think about this: the future determines the past. I will return to this thought.

First, and most importantly, we are sorry. I am sorry. I have been in this business a long time and do not know if I have ever been more sad about what we have done to you, to your customers and to how people think about us. An email outage in 1995 was different from one in 2000 and even more different from one in 2008. I know what this does to your reputations, to your customers and to your staff - and I and so many people here are just sad about that.

While it seems trite right now, we really define ourselves by how we make it easier for you in your businesses and with your customers and in our deep understanding of those relationships. That means the pain here is that much greater and believe me I know our pain here does not matter, yours does. Just know we are grieving.

Second, what will we do about it and why will this never happen again? I know for some of you that doesn’t matter, you are done with us, but I want to express this for the rest of you. Let me start here with things that were not the problem: old equipment, people, capacity or redundancy. The equipment is new, the people are great, we have plenty of capacity and redundancy. What this will mean for us is clearly the need to take the other elements of the service to a completely new level. Here I mean monitoring, change management, emergency protocols and procedures and operating efficiencies.

We had decided long before this that the most important part of email was reliability, not features, not groupware, not web 2.0 integration but reliability and deliverability. I have been at this a long time and really believe that these people and this service can be the best in the world, better than Google, Yahoo or Microsoft and most importantly the best partner for service providers. We owe you this and will deliver it.

Lastly, what will we do for you as a result of this? Let me start here by saying two things: we will certainly be doing something, and there is nothing we can do that will make up for your loss of reputation in your customers’ eyes. We know that. The people who will participate in that decision are fried right now, as I know even in your anger you can well imagine. I will ask your indulgence that you give us this week to make our plan in this regard.

There is one thing that I can offer now. I would like to make myself personally available to any of you who would like me to either reach out to your customers, or to any specific customer, with a letter, an email or a phone call. I know this will not often matter but perhaps in a few cases it might. My message here would be simple: this was our fault, not yours, and while you are responsible for the suppliers you pick, you had good reason to pick us and it was us who let you down. This offer stands whether you are leaving or staying.

In closing, the future determines the past. If we move forward and run the most reliable, service-provider focused email service the world has ever seen, this will be remembered as the few days that turned it around, as being a very important event in forging our mutual future. If we have no change in reliability or in service levels this will barely be remembered. It will just be a point on a mediocre line. I will do everything in my power to make it the former not the latter.

Regards,

Elliot Noss


7 Responses to “Open Letter To Our Email Service Resellers”

  1. gsyoungblood Says:

    I have started an OpenSRS Resellers and Users group on LinkedIn.

    http://www.linkedin.com/e/gis/1012737

    This is an independent place where we resellers can discuss matters of interest to us. After this latest outage, I thought it would be a good idea for us to have a truly independent forum for us to discuss things.

    This includes what we are doing to recover from outages, keep our customers, and/or improve our businesses. It also includes items related to our types of business where Tucows/OpenSRS provides some or all of the services we rely on. Topics do not have to include Tucows or OpenSRS specifically.

    Joining the group is easy, especially if you are already on LinkedIn, using the link: http://www.linkedin.com/e/gis/1012737

  2. gsyoungblood Says:

    I am not familiar with Dovecot, nor do I use NetApps. That said, I have come to the conclusion that you have core architectural deficiencies in at least one area of your email design.

    Both in August and last week we were told “rebuilding indexes” or “reindexing” were the cause for the last 2-3 days of recovery during the outage. That means if the reindexing problem could be addressed service would have been restored and available 2-3 days EARLIER.

    I don’t know your architecture and, as I said above, I don’t know the software you are using. That said, it seems as though it should be possible to get around this bottleneck, or at least delay the impact somehow.

    In reading about Dovecot it seems that it supports standard maildir mode as well as its modified internal folder model. Perhaps there is a way to go to (the potentially) slower standard maildir model while rebuilding indexes in a more controlled manner allowing service to be restored and indexes rebuilt at the same time. Yes, at completion, the indexes wouldn’t be 100%, but based on comments in the video perhaps they would be clean and could be updated at the time of login with minimal impact, assuming of course that dovecot is smart enough to update and not rebuild in that scenario.

    The other suggestion is that you find a way to move affected users in a down cluster to another cluster for immediate restoration of email functionality. Users could then send and receive current email while the failed cluster is repaired. They would not have access to old messages. This would be “degraded” service but not a full outage. It would still be bad, but this would take the sting out.

    Lastly, if you implement bounces (preferably timeouts during delivery so messages stay in queues and don’t get lost assuming you can deliver them within 5-7 days), please make it adjustable (if possible) with a setting in MAC or reseller interface. I can certainly understand why some would want bounce notifications, but probably not everyone will. I am on several lists that auto unsubscribe you on bounces, so for myself personally I prefer not to have them bounced. Of course, my users often feel differently. :)

    I suspect that if you have a mail server in-line with the rest of your system as the front line, it can return a timeout delivery warning message and hold the message in the queue until it can be delivered. It sounds like you are already doing a lot of that right now, short of the bounce message. I believe typical configurations are 4 hours for delivery timeout warning message and 5 days for delivery failure. I’d probably bump up the 5 days to 15 to allow for crazy problems, but you get the idea.

    These are just a few thoughts and observations.

    I look forward to hearing about what changes you make.

  3. enoss Says:

    we have talked about something down this road (the indexing point), and the fundamental point is a good one and will definitely be part of the discussion.

  4. Mark Says:

    In my line of work, everything depends on trust. My clients have to trust me and I have to trust my suppliers. I regret the two outages in two months’ time. Sadly, this is more than enough to hurt my trust (also because of the lack of information during the outages). So, I had to move my mail accounts to Google Apps. I will be your customer regarding domain names but your mail solution is something I can’t trust anymore.

    good luck,

    Mark

  5. enoss Says:

    I agree completely with what you say about trust.

    I have my own view about whether you should EVER use a service that is free (even if you pay, you are still paying for a service tuned for “free”) for a number of reasons, but I am in no position to judge right now :-(

  6. mirrorboy Says:

    honesty may be the best policy but is too often the most sour medicine.

    fix it. tell us how you fixed it. tell us what was actually wrong. tell us what will happen if it happens again before ‘10. live by it. simple things that are hard to do. push up your quarterly to address this, and to get the bad news out soon, and where it will serve you best.

    “my isp sucks less, should not be our goal.”

  7. Cassiopeia Says:

    I agree that using a FREE service for clients is NEVER the way to go. On the other hand Tucows is only a step up from FREE, isn’t it?
    For a small Reseller like me it’s this or nothing. There is no way I could afford the whopping monthly fees of other SPs, nor would the route of in-house mail server be insurance against disaster. Quite the contrary I’m sure.
    As to trust, there isn’t really any left.
    In spring 2007 there was so much unpredictability with the email service, it caused me to lose a good customer. I have not really pushed email as much as I could have due to the fact that I don’t really trust your service 100% - and that is a shame, really.
    Thank you for the apology, though.
    Now I’d like to know what exactly happened.
    The more we are kept in the loop the better. Since we are literally your front lines, it would be good to know what we are ‘fighting’ against.