Starting in January of this year, a customer with Exchange 2003 on a Small Business Server system started having chronic outbound mail-delivery problems to some providers, and the story of its resolution may help others understand a bit more about what Exchange is doing under the hood and how SMTP can appear to be outright broken. I'm also hoping to use the right keywords so Google will pick this up.
The problem started with a new Canon ImageRunner 5570 copy/scan/fax device's scan-to-email feature (fast forward: there was never anything wrong with the Canon) while relaying through Exchange. Though mail to internal users worked fine, some external recipients were consistently unreachable: delivery timed out over multiple attempts and then bounced.
This was completely repeatable: sending to Yahoo! or GMail (for instance) would always fail, and putting a Wireshark sniffer on the outbound connection showed why: the SMTP protocol transaction was broken.
During the DATA phase, when the message headers and body are transmitted, each line ends in <CR><LF>, and the end of the data is marked with a dot on a line by itself. This means that the receiving mailserver should see:
<CR><LF> (dot) <CR><LF>
to end the DATA phase. Since the first CR/LF is considered part of the mail message itself — as required by the RFC — this suggests that there is no way to directly send a message that doesn't end in CR/LF (one uses attachments or encodings if binary data needs to be sent).
But the file being sent ended in just a <LF>, so the end of the DATA phase looked like:
%%EOF <LF> (dot) <CR> <LF>
The %%EOF <LF> was the tail end of a PDF file.
But since the magic <CR><LF> (dot) <CR><LF> pattern was not seen by the receiving mailserver, it assumed this was all part of regular data, so it waited for more, expecting the final proper dot to come later. When it didn't come, because the sender was waiting for the acknowledgement of the end of the DATA phase, the whole transaction timed out with failure.
Some mailservers would apparently accept this clearly-broken SMTP transaction, but most would not.
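The difference between the two endings can be sketched in a few lines of Python. This is only an illustration of the terminator check described above, not code from any real mailserver, and the function name data_phase_complete is made up for the example:

```python
# The end-of-data marker a strict SMTP receiver looks for.
CRLF_DOT_CRLF = b"\r\n.\r\n"


def data_phase_complete(received: bytes) -> bool:
    """Return True if a strict receiver would treat the DATA phase as finished.

    'received' is everything sent after the DATA command was accepted.
    """
    # An empty message is just the dot line on its own.
    if received.startswith(b".\r\n"):
        return True
    # Otherwise the dot line must be preceded by a full CR/LF pair.
    return CRLF_DOT_CRLF in received


# Tail of the PDF as the Canon sent it: every line ends in CR/LF.
good_tail = b"%%EOF\r\n.\r\n"
# Tail as Exchange sent it back out: the last line ends in a bare LF.
bad_tail = b"%%EOF\n.\r\n"

print(data_phase_complete(good_tail))
print(data_phase_complete(bad_tail))
```

With the bare LF, the strict check never fires, so the receiver keeps waiting for more data while the sender waits for its acknowledgement.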
I briefly thought that the Canon must somehow be contributing to this problem, but sniffing the SMTP going into Exchange showed a different transaction entirely: the Canon was doing full and proper encoding of the data, including a CR/LF pair at the end of every line, including the dot to end the DATA phase.
For some reason, Exchange was mucking with the form of the message, and this seemed really odd. For a mature product like Exchange 2003, it was very, very difficult to believe I had actually found a fundamental protocol bug.
At around the same time, emails from the customer's line-of-business software to their own customers started bouncing, and watching the SMTP transaction showed the same problem: Exchange wasn't properly terminating the end of the DATA phase, so it was timing out.
These new failed messages had nothing to do with the Canon, and at this point I was completely sure that Exchange couldn't just be innately broken: something else was going on. Unfortunately, it took weeks to track down.
Exchange and SMTP
It will surprise many to learn that Exchange 2003 and prior do not have an SMTP engine. Instead, this function is performed by IIS (Internet Information Services), which obviously does more than just webservice.
It's this IIS SMTP engine that allows a Server 2003 system to function as a real mailserver even without Exchange (though with far less functionality).
The SMTP engine has provisions for hooks — known as "event sinks" — to allow outside products (such as antivirus scanners or Exchange itself) to add their own custom special processing at various stages through the mail flow.
I had a sneaking suspicion that one of these hooks was involved, but at the time it was all a black hole to me, with no obvious way to look around, or even to ask coherent questions about it.
As a side note, I understand that Exchange 2007 has its own standalone SMTP engine, separate from that provided by IIS.
Looking in the \Exchsrvr\Mailroot\vsi 1\Queue\ folder, where mail files are kept pending delivery, it was apparent that the message body had in fact been rewritten from the form in which it arrived during SMTP: this put the onus on the SMTP receive process for mucking with the message body, not the SMTP send.
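For anyone wanting to check a queued message file for this kind of damage, here is a small Python sketch that reports every LF not preceded by a CR. The function names are hypothetical, and it assumes the queue file can simply be read as raw bytes:

```python
def find_bare_lf(data: bytes) -> list:
    """Return the byte offset of every LF that is not preceded by a CR."""
    offsets = []
    for i, byte in enumerate(data):
        if byte == 0x0A and (i == 0 or data[i - 1] != 0x0D):
            offsets.append(i)
    return offsets


def check_message_file(path: str) -> None:
    """Print a short report on the line endings of one message file."""
    with open(path, "rb") as handle:
        data = handle.read()
    bare = find_bare_lf(data)
    print(f"{path}: {len(data)} bytes, {len(bare)} bare LF(s)")
    if not data.endswith(b"\r\n"):
        print("  warning: file does not end in CR/LF")
    for offset in bare[:10]:
        # Show a little context around each of the first few offenders.
        print(f"  offset {offset}: {data[max(0, offset - 20):offset + 1]!r}")
```

Calling check_message_file() on a copy of a queued file should show whether the body was rewritten with bare LFs before the send side ever touched it.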
Stuff that didn't help
To rule out the notion that Exchange 2003 was broken in general, I created a small perl script that injected the questionable message via SMTP, and was able to re-run tests consistently. Running it at another customer with the same version of SBS/Exchange worked fine, so it was clearly something in our own environment responsible for all this fun.
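The original test script was Perl and isn't reproduced here; the following is a rough Python stand-in for the same idea, using the standard smtplib module. The addresses, subject line, and function names are placeholders invented for the example:

```python
import smtplib


def build_test_message(sender, recipient, body_lines):
    """Build a raw message in which every line ends in CR/LF."""
    headers = [
        f"From: {sender}",
        f"To: {recipient}",
        "Subject: SMTP line-ending test",
        "",  # blank line separating headers from body
    ]
    text = "\r\n".join(headers + list(body_lines)) + "\r\n"
    return text.encode("ascii")


def inject(host, sender, recipient, message, port=25, timeout=30):
    """Send one raw message through the given SMTP server."""
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.set_debuglevel(1)  # echo the SMTP dialogue to stderr
        return smtp.sendmail(sender, [recipient], message)


# Example use (hypothetical host and addresses):
#   msg = build_test_message("test@customer.example", "someone@example.com",
#                            ["first line", "%%EOF"])
#   inject("exchange.customer.example", "test@customer.example",
#          "someone@example.com", msg)
```

Note that smtplib does its own dot-stuffing and adds the final dot line, so this injects a well-formed transaction, much as the Canon did; any breakage seen on the outbound side then comes from the server under test.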
An obvious change was to fool with the 7bit/8bit encoding used by the SMTP transport, and that didn't help at all.
I don't know what changed that made things break starting in ~January, and I don't believe we had made any changes in the Exchange configuration. But it didn't seem out of the question that antivirus could be responsible here: these products use content-filtering hooks of one kind or another, either via the event sinks or the Virus Scanning API, so at least a mechanism was in place to explain a possible avenue.
But uninstalling / reinstalling the antivirus didn't have any apparent effect, nor did upgrading to the newest version, and combing through vendor release notes / known issues didn't yield any hints of others having this same problem.
What also didn't help was reinstalling Exchange; this is a long process, complicated somewhat by the presence of Service Pack 2, but I managed to get through it. No joy.
At this point I was feeling quite daunted: we had rounded up all the usual suspects without success, and the black hole was getting darker.
Event Sinks
With no other real avenue, I decided to dig into this event-sink business, and the first foray was into the registry, following a mind-numbing chain of GUIDs. This looked like a morass from which I'd never escape, until I found a Microsoft tool for dealing with exactly this: smtpreg.vbs
smtpreg.vbs Event Management Script
This little script can query and modify the registry stuff required, and one always starts with enumerating the existing event sinks:
C> cscript smtpreg.vbs /enum
Source {1B3C0666-E470-11D0-AA67-80C04FA345F6} {
  DisplayName = smtpsvc 1
  OnArrival Sinks {
    Binding {31653B8C-688E-4AA7-94AF-2D8B214BF2DB} {
      DisplayName = XRecipientList
      SinkClass = XRecipientList.AddXHeaders
      Status = Enabled
      SourceProperties {
        Priority = 28010
      }
      SinkProperties {
      }
    }
    Binding {66F4180A-627E-4073-8D31-51F628AA8366} {
      DisplayName = POP3 Connector Event Sink
      SinkClass = Imbdlvres.IMBDeliveryEventSink
      Status = Enabled
      SourceProperties {
        rule = RCPT TO=*mspop3connector.*
        priority = 0
      }
      SinkProperties {
      }
    }
    Binding {7FD4C849-5506-46F2-B2F9-5DFBBAAFB22B} {
      DisplayName = Exchange Transport XEXCH50 Submission sink
      SinkClass = peexch50.submit
      Status = Enabled
      SourceProperties {
        priority = 100
      }
      SinkProperties {
      }
    }
    Binding {92DE29D2-A8AA-4A82-B8AA-657CF32A67C8} {
      DisplayName = Exchange Transport AntiVirus API
      SinkClass = Exchange.TransportAVAPI
      Status = Enabled
      SourceProperties {
        priority = 28000
      }
      SinkProperties {
      }
    }
    Binding {A6F75DBA-217C-11D2-9A57-00C04FA32883} {
      DisplayName = ISM SMTP Transport
      SinkClass = CDO2EventSink.IsmSink1
      Status = Enabled
      SourceProperties {
        Rule = RCPT TO=_IsmService@e15ed15c-...._msdcs.customer.lan
        priority = 8192
      }
      SinkProperties {
      }
    }
  }
}
Here we see OnArrival sinks, which are used by SMTP, and some of them appeared familiar to me: if I could disable them, one at a time, perhaps I'd find the culprit?
First on the chopping block was the third sink, Exchange Transport XEXCH50 Submission sink, because we had long ago disabled the XEXCH50 verb in SMTP. This verb allows two Exchange servers which are part of a common organization to start a more efficient secret handshake.
But since Small Business Server installations are essentially never part of larger Exchange organizations, this does nothing but cause trouble: I'd previously written a Tech Tip on how to disable this keyword:
Unixwiz.net Tech Tip: Disabling XEXCH50 in Exchange 2003
Disabling the use of the XEXCH50 verb is different from disabling the associated OnArrival event sink, and since the verb had been disabled years ago, it was hard to see how this could have caused a problem in the last few months.
But no harm in trying:
C> cscript smtpreg.vbs /disable 1 Onarrival "Exchange Transport XEXCH50 Submission sink"
Alas, no luck: incoming mail messages were still being rewritten into a form that broke when sent to strict mailservers, so I moved on to the "XRecipientList" sink.
This is a third-party addon that adds an X-Recipient: header to represent the original recipient of the message, something that's very useful for sites that have many distribution lists. Though often the list-name is visible in the headers, if the message was sent via Bcc: you'd have no way of looking to know which of the many lists routed the message to your inbox. XRecipientList solves this problem, and it's been very helpful at one customer who had a zillion distro lists.
Disabling this sink:
C> cscript smtpreg.vbs /disable 1 Onarrival "XRecipientList"
and then sending another test message: Bingo!
Apparently, XRecipientList is modifying the message in a way incompatible with sending back out again, and removing it from the system made the problem go away.
Lingering questions
The first question is: why did this suddenly break with two unrelated systems? The answer appears to be that it was a coincidence.
The Canon device was new to the scene, and the line-of-business software had apparently made a change in email format, from one valid format to another, and both just happened to trip across this message-formatting thing. That's one dormant landmine!
The deeper question is why the SMTP component can be so badly fooled by a malformed message file: as if it can't tell what's at the end of the file it's sending such that it does proper CR/LF termination. If nothing else, being able to report (say, in the event log) that something is broken would be better than failing mysteriously.
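To make the point concrete, here is a Python sketch of the kind of framing a sender could do before transmitting a message body. It is not what the IIS SMTP engine actually does; it is simply one plausible approach (similar in spirit to what Python's own smtplib does), and the function name frame_data_payload is invented for the example:

```python
import re


def frame_data_payload(body: bytes) -> bytes:
    """Return the bytes to send during DATA, ending in a proper dot line."""
    # Turn every line ending (CR/LF, bare CR, or bare LF) into CR/LF.
    body = re.sub(rb"\r\n|\r|\n", b"\r\n", body)
    # Make sure the last line is terminated before the dot is added.
    if body and not body.endswith(b"\r\n"):
        body += b"\r\n"
    # Dot-stuffing: a line starting with a dot gets an extra leading dot
    # so it can't be mistaken for the end-of-data marker.
    body = re.sub(rb"(?m)^\.", b"..", body)
    return body + b".\r\n"


# The PDF tail from this story, as it sat in the queue file.
print(frame_data_payload(b"%%EOF\n"))
```

Rewriting line endings does alter the message, which a relay may not want to do silently; a more conservative sender could instead detect the bare LF and log an error, which is the kind of reporting wished for above.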
I've communicated the details of this to the author of the free XRecipientList tool, but as it appears to be a dormant project, I think it's unlikely that anything will be done about it.
But the problem is fixed, and I learned a lot.
I hope you did too :-)




