- pydns: used for DNS lookups
- IPy: used for IP address manipulation
- tlslite: a lightweight SSL implementation used for SMTP TLS checks
I've been meaning for a while to release the RemoteWorker stuff I use for my SMTP server as open source. I finally got around to it.
You can find the source code here. Here is what the README file in that tarball has to say for itself:
RemoteWorker is licensed under the terms of the GNU GPL v2 and is Copyright (C) Michael Still (firstname.lastname@example.org) 2007 and 2008. Note that several of the dependencies have their own licenses, as described at the end of this file.
This is version v70 of RemoteWorker, a system intended for measuring the Internet by running on many machines at once. Setup of RemoteWorker is not simple, because everyone's use of the system will be different. You should expect to have to write your own bootstrap script, your own SSH babysitter, as well as probably needing to implement more commands in RemoteWorker itself to meet your measurement needs.
RemoteWorker is very simple. It takes a list of filenames on the command line, and executes the commands from each of these files in the order they are listed on the command line. The commands currently implemented are:
- START: Denotes the start of a command file. Not required.
- END: Denotes the end of a command file. Not required.
- COMMAND-FETCH: Uses HTTP to fetch a new command file. The new commands are stored in a file named todo.new in the current working directory.
- SLEEP: Sleep for the named number of seconds.
- SERVICE-THREAD: Run a service thread which can be connected to on the port specified by the command. If you then telnet to this port, you'll be told the currently running command, and the amount of memory in use by RemoteWorker.
- HEARTBEAT-THREAD: Run a heartbeat thread which will regularly log that the RemoteWorker process is still running.
- SMTP-PROBE: Probe a remote SMTP server on TCP port 25. This will perform a traceroute (and return traceroute results), and a SMTP probe if the remote server is not hosted on, or routed via, an AS which has asked to not receive probe traffic.
- DEST-AS-CHECK: Check if named IP is hosted by an AS which doesn't like probe traffic.
- TRACEROUTE: Perform a traceroute to the IP, including AS information.
- MX-LOOKUP: Perform a DNS MX lookup for a named domain.
- DNS-LOOKUP: Perform a forward DNS lookup.
- DNS-REVERSE: Perform a reverse DNS lookup.
- IPv6-DNS-LOOKUP: Perform an IPv6 DNS lookup.
- HTTP-FETCH: Fetch the named URL, and log the HTML of the page. The HTML will be truncated if it is too long.
A sample command file might look like this:
SMTP-PROBE 126.96.36.199
DNS-REVERSE www.stillhq.com
HTTP-FETCH http://www.stillhq.com/research/http_test.txt
Note that RemoteWorker needs to be run as root for the traceroute implementation to work. Run that command file to see what happens!
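To make the command file format concrete, here is a minimal sketch of how such a file could be split into commands and arguments. It is not RemoteWorker's actual parser: the function name parse_command_lines and the KNOWN_COMMANDS set are hypothetical, and it assumes one command per line with the argument separated by a space.

```python
# Hypothetical sketch only; RemoteWorker's real parser may work differently.
# Assumes one command per line, with the argument after the first space.
KNOWN_COMMANDS = {
    "START", "END", "COMMAND-FETCH", "SLEEP", "SERVICE-THREAD",
    "HEARTBEAT-THREAD", "SMTP-PROBE", "DEST-AS-CHECK", "TRACEROUTE",
    "MX-LOOKUP", "DNS-LOOKUP", "DNS-REVERSE", "IPv6-DNS-LOOKUP",
    "HTTP-FETCH",
}


def parse_command_lines(lines):
    """Yield (command, argument) pairs, skipping blank lines."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split on the first space only, so URLs and hostnames stay intact.
        command, _, argument = line.partition(" ")
        if command not in KNOWN_COMMANDS:
            raise ValueError("unknown command: %r" % command)
        yield command, argument.strip()


sample = """SMTP-PROBE 126.96.36.199
DNS-REVERSE www.stillhq.com
HTTP-FETCH http://www.stillhq.com/research/http_test.txt
"""
parsed = list(parse_command_lines(sample.splitlines()))
```

Rejecting unknown commands up front is a design choice of this sketch; a worker that should survive bad input might log and skip them instead.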
There are a few other things to note:
You will need to install the gflags module from http://code.google.com/p/google-gflags/ before some of the helper scripts will work. The worker nodes do not however need gflags.
Command and control
My workers run a simple shell script which sits in a loop (sitter.sh). This script runs a series of command files. One of these command files uses the COMMAND-FETCH command to contact a CGI script and download more work. Your RemoteWorkers will need to do something similar, but there is nothing forcing you to deliver the work via HTTP. You could for example just write a simple script which calculates what work to perform in some manner, and then runs that. If you specify a file named "-" on the command line, then RemoteWorker will run in interactive mode and read its commands from stdin.
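The "files in order, with - meaning stdin" behaviour described above can be sketched roughly as follows. The function name read_commands and its stdin parameter are made up for illustration and are not taken from the RemoteWorker source.

```python
import io
import sys


def read_commands(filenames, stdin=None):
    """Yield command lines from each named file in order.

    A filename of "-" means "read commands from stdin", mirroring the
    interactive mode described in the README. This is a hypothetical
    helper, not RemoteWorker's own code.
    """
    stdin = sys.stdin if stdin is None else stdin
    for name in filenames:
        if name == "-":
            for line in stdin:
                yield line.rstrip("\n")
        else:
            with open(name) as handle:
                for line in handle:
                    yield line.rstrip("\n")


# Usage: a fake stdin stands in for an interactive session.
fake_stdin = io.StringIO("SLEEP 1\nEND\n")
lines = list(read_commands(["-"], stdin=fake_stdin))
```

Because the helper is a generator, a sitter script could feed it a mix of on-disk command files and stdin without loading everything into memory first.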
Not all PlanetLab nodes are created equal, and I therefore need a bootstrap script which checks the capabilities of a node. This script then writes out the command file which fetches new commands. The capability checks are in the command files named sample and sample-smtpprobes.
RemoteWorker doesn't return its results in real time; it just writes them to stdout. My sitter script then writes these to log files on disk, and I use an SSH-based babysitter script to download and process them. This is useful because it means that the server handing out work doesn't need to be up when the commands are executed; the results just sit around on disk waiting for me to collect them later.
SMTP probes, traceroutes, and the need for AS maps
If you intend to use the SMTP probing or tracerouting functionality, then you need to have a recent mapping of IP ranges to AS numbers. I don't however include this map in this download because it is quite large and becomes out of date rapidly. You can create your own mapping trivially, by running these commands:
This will download an AS map from the U Washington Computer Science department, and then preprocess it into the correct format for the workers. However, you should only do the download once, and then push the processed file to the workers, as processing the file into the correct format is quite expensive.
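To illustrate why the workers need an IP-to-AS map, here is a rough sketch of a longest-prefix lookup followed by an opt-out check on the destination AS. The in-memory format, the function names, and the example prefixes and AS numbers (taken from the ranges reserved for documentation) are all assumptions; the processed map file RemoteWorker actually uses is not shown here, and this sketch checks only the destination, not the ASes routed via.

```python
import ipaddress


def build_as_map(entries):
    """Convert (cidr_prefix, as_number) pairs into a lookup list.

    The list is sorted with the most specific (longest) prefixes first,
    so the first match found is the longest-prefix match.
    """
    networks = [(ipaddress.ip_network(prefix), asn) for prefix, asn in entries]
    networks.sort(key=lambda item: item[0].prefixlen, reverse=True)
    return networks


def lookup_as(as_map, ip):
    """Return the AS number covering ip, or None if no prefix matches."""
    address = ipaddress.ip_address(ip)
    for network, asn in as_map:
        if address in network:
            return asn
    return None


def probe_allowed(as_map, ip, opted_out):
    """True unless the destination AS is in the opted_out set.

    Assumption: addresses with no known AS are treated as allowed.
    """
    return lookup_as(as_map, ip) not in opted_out


# Example data uses documentation-only prefixes and AS numbers.
as_map = build_as_map([
    ("192.0.2.0/24", 64496),
    ("192.0.2.128/25", 64497),
    ("198.51.100.0/24", 64498),
])
opted_out = {64497}
```

A linear scan like this is fine for a sketch but would be slow against a full routing table; a radix tree or sorted-interval search would be the usual replacement.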
RemoteWorker is intended to be installed on worker nodes with a minimal Python installation, without needing other things to be installed on the system itself. There are therefore several dependencies which are shipped with RemoteWorker, but weren't written by me. I take no credit for these dependencies, and it should be noted that they each have their own license and usage rules. Please check that you are obeying the licenses for these dependencies.
Dependencies currently shipped with RemoteWorker are pydns, IPy and tlslite.
So there you are.
posted at: 14:03 | path: /research | permanent link to this entry