Content here is by:
Michael Still
mikal@stillhq.com

Sat, 22 Mar 2008



Normalising mail server package names

posted at: 11:37 | path: /research/smtp/survey


Thu, 20 Mar 2008



Announcing early results of my survey of SMTP servers

posted at: 01:54 | path: /research/smtp/survey


Sat, 15 Mar 2008



Mikal, tell something I didn't know about SMTP servers on the Internet

    As part of my survey of SMTP servers on the Internet (a graphical representation of the results from that post is here), I need to find SMTP servers to survey. One way I have been doing that is by performing large numbers of DNS Mail eXchanger (MX) lookups and then probing the SMTP servers those lookups identify. I haven't been able to perform these lookups on every registered domain, because not all registrars make their zone files available to researchers. If you're interested, I have a compendium online of what I've learnt about zone file access agreements.

    Specifically, I performed the following lookups:

    Zone    Number of lookups
    .arpa                   5
    .asia               9,044
    .com           72,529,657
    .mobi             819,849
    .net           10,734,157
    .root                 281
    Total          84,092,993
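
    The last figure in the table is the sum of the per-zone counts. A quick check in Python (the dictionary simply restates the table) confirms that the figures add up and shows how heavily .com dominates the lookups:

```python
# Per-zone MX lookup counts, restated from the table above.
lookups = {
    ".arpa": 5,
    ".asia": 9_044,
    ".com": 72_529_657,
    ".mobi": 819_849,
    ".net": 10_734_157,
    ".root": 281,
}

total = sum(lookups.values())
print(f"{total:,}")  # 84,092,993

# Share of all lookups contributed by each zone, largest first.
for zone, count in sorted(lookups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{zone:6} {count:>12,} {100 * count / total:6.2f}%")
```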


    For each of these domains, a DNS MX record lookup was performed using around 100 machines, and the results were stored in a series of sharded tables in a MySQL database.
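
    The post does not say how the work was split across the machines or how the tables were sharded. One common approach, sketched here purely as an assumption, is to hash each domain name so that the same domain always maps to the same worker and the same table. NUM_WORKERS, NUM_SHARDS and the mx_results_NN table names are hypothetical.

```python
import hashlib

# Hypothetical parameters: the post only says "around 100 machines" and
# "a series of sharded tables"; the shard count and table names are invented.
NUM_WORKERS = 100
NUM_SHARDS = 16


def _stable_hash(domain: str) -> int:
    """Hash a domain deterministically (Python's built-in hash() is salted per process)."""
    digest = hashlib.blake2b(domain.lower().encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def worker_for(domain: str) -> int:
    """Pick which lookup machine should resolve this domain's MX records."""
    return _stable_hash(domain) % NUM_WORKERS


def shard_table_for(domain: str) -> str:
    """Pick the (hypothetical) MySQL table that stores this domain's results."""
    return f"mx_results_{_stable_hash(domain) % NUM_SHARDS:02d}"


if __name__ == "__main__":
    for d in ["example.com", "example.net", "rastegarenterprises.net"]:
        print(d, worker_for(d), shard_table_for(d))
```

    Because the mapping depends only on the domain name, a lookup can be retried on any machine and its result will still land in the same table.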

    In aggregate, the results look like this:

    Total (IP, domain) tuples: 72,863,506
    Total unique IPs:          2,136,511
    Total unique domains:      46,993,011
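
    Each row pairs an MX server IP with a domain, so a domain with several MX hosts contributes several tuples, and one IP can serve many domains; that is why the three totals differ. A small sketch of how such totals could be derived, assuming the rows have already been pulled out of the database as (IP, domain) pairs and that identical pairs are counted once (the sample rows are made up):

```python
def summarise(rows):
    """Return (distinct tuples, unique IPs, unique domains) for (IP, domain) rows."""
    tuples = set(rows)  # identical (IP, domain) pairs are counted once
    ips = {ip for ip, _domain in tuples}
    domains = {domain for _ip, domain in tuples}
    return len(tuples), len(ips), len(domains)


# Made-up sample rows standing in for the survey's MX results.
rows = [
    ("64.202.166.11", "example.com"),
    ("64.202.166.12", "example.com"),  # one domain, two MX IPs: two tuples
    ("64.202.166.11", "example.net"),
    ("192.0.2.25", "example.org"),
    ("192.0.2.25", "example.org"),     # duplicate row, counted once
]

print(summarise(rows))  # (4, 3, 3)
```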


    There are some interesting things to be found in the MX record data. For example, only 55.8% of the domains I scanned (46,993,011 of 84,092,993) have an MX record at all. That might seem a bit counter-intuitive, but when you take into account that a lot of domain names are unused or used simply for a web site, I guess it's not that surprising. I would like to spend some more time verifying that this isn't a bug in my survey code, but I haven't gotten around to doing that yet.

    Another interesting fact is that GoDaddy appears to host a very large number of domains. Specifically, I found 12,105,590 domains whose MX record pointed at one of just two IP addresses owned by GoDaddy. That's 25.76% of all the domains with an MX record in my results. This means that GoDaddy's domain hosting business is massive, certainly much larger than I had previously realized.
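
    Counting the GoDaddy-hosted domains amounts to collecting the distinct domains that have at least one (IP, domain) row whose IP is one of the two GoDaddy addresses given below, then dividing by the number of distinct domains with any MX record. A sketch with made-up sample rows and a hypothetical domains_using() helper; the final line reproduces the 25.76% from the published counts:

```python
GODADDY_MX_IPS = {"64.202.166.11", "64.202.166.12"}


def domains_using(rows, ips):
    """Distinct domains with at least one MX record resolving to any IP in `ips`."""
    return {domain for ip, domain in rows if ip in ips}


# Made-up sample rows; the survey ran this kind of filter over all of its tuples.
rows = [
    ("64.202.166.11", "example.com"),
    ("64.202.166.12", "example.com"),  # same domain via both IPs, counted once
    ("64.202.166.12", "example.net"),
    ("192.0.2.25", "example.org"),
]

hits = domains_using(rows, GODADDY_MX_IPS)
all_domains = {domain for _ip, domain in rows}
print(len(hits), len(all_domains))  # 2 3

# The same ratio using the counts reported in this post.
print(f"{100 * 12_105_590 / 46_993_011:.2f}%")  # 25.76%
```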

    The IP addresses in question are 64.202.166.11 and 64.202.166.12. Some detail:

    IP               Reverse DNS
    64.202.166.11    mailstore1.secureserver.net
    64.202.166.12    smtp.secureserver.net


    secureserver.net is a domain registered to "Wild West Domains, Inc.", which appears to be part of the GoDaddy family (according to this GoDaddy help page, secureserver.net is used for GoDaddy's DNS servers, among other things). To determine how many of these domains are parked, I started download jobs to fetch the top-level page of each domain. At the moment, 1,087,885 of those downloads are complete.

    Domains parked with GoDaddy respond to a request for their top-level page with an HTTP 302 redirect to a URL consisting of the domain name followed by a short identifier. For example, rastegarenterprises.net redirects (302) to rastegarenterprises.net/?bdb1d640, which is a page displaying advertising. Of the sites I have tested so far, 714,455 are parked in this manner.
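
    A heuristic for spotting this redirect might look like the following. It assumes, from the single example above, that the identifier is a short run of lowercase hex digits in the query string; the 6 to 12 character range and the looks_parked() helper are hypothetical, not the check actually used for these numbers.

```python
import re
from urllib.parse import urljoin, urlsplit

# Assumption: the parking identifier is a short lowercase hex string in the
# query, as in the one example "/?bdb1d640". The length range is a guess.
PARKING_ID = re.compile(r"[0-9a-f]{6,12}")


def looks_parked(domain: str, status: int, location: str) -> bool:
    """Heuristic: does a top-level-page response match the parking redirect?"""
    if status != 302 or not location:
        return False
    # Resolve relative Location headers such as "/?bdb1d640" against the domain.
    target = urlsplit(urljoin(f"http://{domain}/", location))
    return (
        (target.hostname or "").lower() == domain.lower()
        and target.path in ("", "/")
        and PARKING_ID.fullmatch(target.query) is not None
    )
```

    The status and Location values would need to come from a fetch that does not follow redirects. http.client.HTTPConnection behaves that way, whereas urllib.request follows 302 responses by default.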

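    Extrapolating that ratio (714,455 parked out of 1,087,885 downloaded, or about 65.7%) to all 12,105,590 domains using GoDaddy's two mail servers, GoDaddy currently has approximately 7,950,196 domains parked. That's around 9.4% of all the domains I have scanned!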
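
    The parked estimate is a straight extrapolation: the parked fraction among the completed downloads is applied to every domain using GoDaddy's two mail servers. Worked through in Python with the figures from this post:

```python
godaddy_domains = 12_105_590  # domains whose MX is one of the two GoDaddy IPs
downloaded = 1_087_885        # top-level pages fetched so far
parked = 714_455              # of those, pages showing the parking redirect

parked_fraction = parked / downloaded
estimate = round(godaddy_domains * parked_fraction)

print(f"{parked_fraction:.1%}")  # 65.7%
print(f"{estimate:,}")           # 7,950,196
```

    The estimate assumes the completed downloads are a representative sample of all GoDaddy-hosted domains.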

    Looking at IPs that serve as MX for an unusual number of domains, the only other immediately obvious entry is that 184,213 domains have MX records that resolve to 127.0.0.1, the loopback address. That seems a little bit odd to me.
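
    Finding IPs like these amounts to counting distinct domains per MX address and sorting. The sketch below does that over made-up rows and uses Python's ipaddress module to flag addresses, such as 127.0.0.1, that are not reachable from the public Internet. busiest_mx_ips() and unreachable_mx() are hypothetical helpers, not the survey's code.

```python
import ipaddress
from collections import Counter


def busiest_mx_ips(rows, top_n=10):
    """Return the top_n MX IPs ranked by how many distinct domains use them."""
    per_ip = Counter(ip for ip, _domain in set(rows))
    return per_ip.most_common(top_n)


def unreachable_mx(ip: str) -> bool:
    """True for loopback, unspecified or private addresses (per Python's ipaddress)."""
    addr = ipaddress.ip_address(ip)
    return addr.is_loopback or addr.is_unspecified or addr.is_private


# Made-up rows standing in for the survey's (IP, domain) tuples.
rows = [
    ("64.202.166.11", "one.example"),
    ("64.202.166.11", "two.example"),
    ("64.202.166.11", "three.example"),
    ("127.0.0.1", "four.example"),
    ("127.0.0.1", "five.example"),
    ("64.202.166.12", "six.example"),
]

for ip, count in busiest_mx_ips(rows, top_n=2):
    flag = " (unreachable)" if unreachable_mx(ip) else ""
    print(f"{ip}: {count} domains{flag}")
```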

    I'm sure there is other interesting information in this MX data, but I think I'll leave it here for now.

    Tags for this post: research smtp

posted at: 13:14 | path: /research/smtp


Fri, 07 Dec 2007



Initial SMTP survey poster results in a pie chart

posted at: 10:31 | path: /research/smtp/survey


Sun, 02 Dec 2007



Microsoft Exchange the most popular SMTP server on the Internet?

    Eric McCreath from the Department of Computer Science at the Australian National University and I presented a poster entitled "Inferring Relative Popularity of SMTP Servers" at USENIX LISA 2007. This blog post is a brief discussion of the content of the poster, and also serves as a landing page for the paper version of the poster and the PDF of the actual poster. For more detail on the measurement techniques used, please check out the complete paper.

    We conducted this research because there is little data on the relative popularity of the various available SMTP server implementations. This data is of interest because it aids the development of systems which interact with these servers. For example, a potential DDoS protection system should be tested with the most common SMTP servers, as these are the ones that it is most likely to encounter in everyday use.

    Many businesses rely on email of some form for their day-to-day operation. This is especially true for product support organisations, which are largely unable to perform their role in the company if their in-boxes are unavailable. Allman, in "Spam, Spam, Spam, Spam, Spam, the FTC, and Spam", states that Nucleus Research studies estimate that spam costs US businesses $87 billion a year. It seems reasonable to assume that if a low-level attack is costing that much, then a complete outage would impose an even greater burden on an enterprise.

    There has been little research into the current state of SMTP servers on the Internet, perhaps because this area has not been particularly fashionable compared with the HTTP metrics which are commonly collected. It is nonetheless an important area of research, given that the volume of traffic handled by these systems has been growing for years. Barracuda Networks cite Radicati research which predicts that 228 billion emails will be sent per day in 2009, the vast majority of them spam (see Barracuda's site for more details). Afergan and Beverly, in "The state of the email address", evaluate the state of email servers in an attempt to determine how SMTP servers are coping with the growth in traffic. Their approach involved sending probe emails to a variety of domains. Each email was deliberately addressed to an invalid address so that it was almost certain to bounce, and the authors then monitored the bounce traffic. They concluded that corporate SMTP servers are under surprising levels of strain and do not bounce undeliverable emails in a predictable manner.

    We have therefore started to undertake research into SMTP servers as they appear on the Internet, with our first study being a simple survey of which SMTP implementations are most commonly deployed. Our poster discussed the current state of that survey and provided some early results.

    Determining the popularity of the various SMTP server implementations is challenging for two reasons. Firstly, not all of the SMTP servers which interact with the Internet can be probed from the public Internet (for example, SMTP routers which route email that came from the Internet but are not themselves reachable from it). Secondly, there is the sheer number of SMTP servers connected to the network. We have therefore used both passive and active measurements to survey these servers; both techniques are described in detail in the paper.
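
    One common way to take an active measurement is to connect to TCP port 25 and read the server's 220 greeting banner, which many implementations fill in with their own name by default. The sketch below assumes that approach; the signature list, the classify_banner() and probe_banner() names, and the 10 second timeout are illustrative assumptions, not the survey's actual prober.

```python
import re
import smtplib

# Illustrative signatures for common default greetings; not the survey's list.
BANNER_SIGNATURES = [
    ("Microsoft Exchange", re.compile(r"Microsoft ESMTP MAIL Service", re.I)),
    ("Postfix", re.compile(r"\bESMTP Postfix\b", re.I)),
    ("Sendmail", re.compile(r"\bSendmail\b", re.I)),
]


def classify_banner(banner: str) -> str:
    """Map an SMTP greeting to an implementation name, or 'unknown'."""
    for name, pattern in BANNER_SIGNATURES:
        if pattern.search(banner):
            return name
    return "unknown"


def probe_banner(host: str, timeout: float = 10.0) -> str:
    """Connect to port 25 on `host` and return the greeting text (needs network)."""
    client = smtplib.SMTP(timeout=timeout)
    try:
        _code, message = client.connect(host, 25)  # message is bytes, without the 220 code
        return message.decode("utf-8", errors="replace")
    finally:
        client.close()
```

    Administrators can customise banners, so banner matching on its own will leave some servers classified as unknown.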

    Bearing in mind that our survey is quite new, and that only 34.6 million IP addresses have been probed so far, the initial results are quite interesting.



    You can see from the graph that the most popular SMTP server in our dataset is Microsoft Exchange, followed by Postfix and then Sendmail.

    Additional analysis of our existing data, as well as further development of the email parser, will improve the accuracy of our survey and also increase the number of machines it includes. The survey also needs a wider set of inputs for possible IP addresses to probe; one example of another possible source of probable SMTP servers is MX records for registered domain names. The distributed probing system needs further development to handle the scale of probing required to include a large number of SMTP servers in the survey, and the reliability of the central server also needs to improve.
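
    The post does not describe how the email parser works. Assuming the passive side reads the Received trace headers of collected email, where many mail servers name themselves in the "by" clause, a minimal sketch might look like this; the patterns, the sample message and the relays_in() helper are hypothetical.

```python
import re
from email import message_from_string

# Illustrative "by ... with ..." signatures; assumptions, not the survey's parser.
RECEIVED_SIGNATURES = [
    ("Microsoft Exchange", re.compile(r"with Microsoft (SMTPSVC|SMTP Server)", re.I)),
    ("Postfix", re.compile(r"\(Postfix\)")),
    ("Sendmail", re.compile(r"\(\d+\.\d+\.\d+/\d+\.\d+\.\d+")),
]
BY_HOST = re.compile(r"\bby\s+([^\s;()]+)")


def relays_in(raw_message: str):
    """Return (receiving host, implementation) for each Received header, newest first."""
    msg = message_from_string(raw_message)
    results = []
    for header in msg.get_all("Received", []):
        text = " ".join(str(header).split())  # unfold continuation lines
        by = BY_HOST.search(text)
        host = by.group(1) if by else "unknown"
        name = next((n for n, p in RECEIVED_SIGNATURES if p.search(text)), "unknown")
        results.append((host, name))
    return results


SAMPLE = (
    "Received: from mx.example.org (mx.example.org [192.0.2.10])\n"
    "\tby mail.example.com (Postfix) with ESMTP id 3A1B2C3D4E\n"
    "\tfor <user@example.com>; Thu, 20 Mar 2008 01:54:00 +1100\n"
    "Received: from EXCH01 ([10.0.0.5]) by mx.example.org with Microsoft SMTPSVC(6.0.3790.3959);\n"
    "\t Wed, 19 Mar 2008 14:53:58 +0000\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

print(relays_in(SAMPLE))
```

    Unlike an active probe, this can identify internal relays that never accept connections from the Internet, provided their Received headers survive in the delivered message.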

    This SMTP survey is in its early stages, and there is much work still to do. However, research of this nature is likely to produce results of interest to the research community as well as to software developers and systems administrators. So far a small dataset has been analysed, and in the process a reasonably robust distributed probing system has been constructed. Work on the survey will continue, with updated results published from time to time.

    Tags for this post: research smtp survey

posted at: 09:27 | path: /research/smtp/survey