Friday, April 11, 2008

Duplicate e-mail detection by a mail client

This is the topic I have been working on for the last 2 days, and I still do not have a definite answer. Basically, I was trying to find out how a mail client like Outlook, or maybe an Exchange server, can detect a duplicate message.
The scenario I considered is the following:
I send a mail from a@a.com to d@d.com. There can be 2 paths from the SMTP server for a.com to the SMTP server for d.com: a.com->b.com->d.com and a.com->c.com->d.com. Now, what happens if a.com relays the message to both b.com and c.com? This could happen because of a bug, or maybe because of some implementation logic. In such a case d.com would receive the same mail twice, and the question is whether the SMTP server at d.com, or a mail client for d@d.com, would be able to detect that the 2 messages are copies of the same mail.
After looking at the headers of various mails in our inboxes, my colleagues and I thought that the Message-Id header field should help detect and delete/merge such duplicates.
So I did a simple POC by sending 2 messages with the same Message-Id (even the same headers) to my inbox. And I got 2 messages in my inbox as well :(.
Now the question is: is it an Exchange/Outlook bug or a feature? I still do not have an answer. I'll be trying it with my Gmail account soon.
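For reference, here is a minimal sketch of the check we expected to happen, assuming a hypothetical client-side rule that keeps only the first message seen for each Message-Id. `build_message` and `dedupe_by_message_id` are illustrative names of my own, not Outlook or Exchange APIs. The messages are only built with Python's standard `email` package, never sent. The last two lines also show why this rule is fragile: if a server on one path rewrote the Message-Id, the copies no longer share a key and both survive.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_message(msg_id: str, body: str) -> EmailMessage:
    """Hypothetical helper: build a test message with a fixed Message-ID."""
    msg = EmailMessage()
    msg["From"] = "a@a.com"
    msg["To"] = "d@d.com"
    msg["Subject"] = "Duplicate test"
    msg["Message-ID"] = msg_id
    msg.set_content(body)
    return msg


def dedupe_by_message_id(messages):
    """Keep the first message seen for each Message-ID.

    Messages without a Message-ID cannot be matched, so they are always kept.
    """
    seen = set()
    unique = []
    for msg in messages:
        key = msg.get("Message-ID")
        if key is None:
            unique.append(msg)
            continue
        key = str(key).strip()
        if key in seen:
            continue  # same ID as an earlier message: treat as a duplicate
        seen.add(key)
        unique.append(msg)
    return unique


# Two copies of the same mail, one via b.com and one via c.com.
msg_id = make_msgid(domain="a.com")
copy_via_b = build_message(msg_id, "Hello")
copy_via_c = build_message(msg_id, "Hello")
print(len(dedupe_by_message_id([copy_via_b, copy_via_c])))  # 1

# A server on one path rewrites the Message-ID: the key no longer matches.
tampered = build_message(make_msgid(domain="c.com"), "Hello")
print(len(dedupe_by_message_id([copy_via_b, tampered])))  # 2
```

The rule is deliberately conservative: a message without a Message-Id is never dropped, since there is nothing to compare it on.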
In the meantime I found the SMTP spec by Jonathan Postel at http://www.ietf.org/rfc/rfc0821.txt, and a cursory look at it leads me to conclude 2 things:
1) An ideal SMTP server would never do something as crazy as what I am thinking of. The spec very clearly describes the protocol for relaying a message from one SMTP server to another, and my impression is that a server hands the message to only 1 SMTP server and moves on to the next one only when it gets an error code. I have no clue what happens if the other SMTP server goes down after receiving the message but before replying OK.
2) Message Id is something we can't rely on as a primary key for an e-mail message, since it is quite possible that malicious SMTP servers change the headers while a mail passes through them.
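
The relaying behaviour I described in point 1 can be sketched as a toy simulation. This is not real SMTP: `relay_message`, `RelayError` and the two `send` callbacks are hypothetical names standing in for "hand the message to a relay and wait for its reply". The sketch assumes the sender tries one relay at a time and moves on only on failure. The second scenario models the case I have no clue about: b.com stores the message but the OK never reaches a.com, so a.com retries via c.com and d.com ends up with two copies.

```python
from typing import Callable, List, Sequence, Tuple


class RelayError(Exception):
    """Raised by the simulated transport when a relay does not reply OK."""


def relay_message(message: str, relays: Sequence[str],
                  send: Callable[[str, str], None]) -> str:
    """Hand the message to one relay at a time, in order.

    The next relay is tried only if the current one fails; the first
    success ends the loop. Returns the host that accepted the message.
    """
    for host in relays:
        try:
            send(host, message)
        except RelayError:
            continue  # error reply (or lost connection): try the next relay
        return host  # OK reply: no other relay gets this message
    raise RelayError("no relay accepted the message")


received: List[Tuple[str, str]] = []


def rejecting_b(host: str, message: str) -> None:
    """b.com refuses the message before storing it."""
    if host == "b.com":
        raise RelayError("b.com returned an error code")
    received.append((host, message))


def lost_ok_from_b(host: str, message: str) -> None:
    """b.com stores the message, but its OK never reaches the sender."""
    received.append((host, message))
    if host == "b.com":
        raise RelayError("connection dropped before OK")


# Scenario 1: clean failover, exactly one copy is delivered.
print(relay_message("hello", ["b.com", "c.com"], rejecting_b))  # c.com
print(len(received))  # 1

# Scenario 2: lost OK, the sender retries and two copies exist.
received.clear()
print(relay_message("hello", ["b.com", "c.com"], lost_ok_from_b))  # c.com
print(len(received))  # 2
```

So even a sender that follows the one-relay-at-a-time rule can produce a duplicate when the failure happens after the relay has already stored the message.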

So on the whole, my current feeling is that ideally duplicate e-mails won't be generated, and if they are, there's no foolproof way to identify them and merge them at the destination server/mail client.


~Abhishek

PS: I am off for the triple thrill of climbing, Burma bridge and rappelling this weekend. I will write a post on that once I am back, along with some more details on the above topic :)
