<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
<article>
<title>UDP shell scripts with inetd</title>

<sect1><title>About the author</title>
<para>
Michael has been working in the image processing field for several years, including a couple of years managing and developing large image databases for an Australian government department. He currently works for TOWER Software, who manufacture a world-leading EDMS and Records Management package named TRIM. Michael is also the developer of Panda, an open source PDF generation API, as well as a bunch of other open source code.
</para>

<para>
Michael has a web site at http://www.stillhq.com.
</para>
</sect1>

<sect1><title>Introduction</title>
<para>
This conference paper is about two things. The main focus of the paper is to discuss how to write useful UDP servers in a common scripting language such as bash. The other, more minor, focus of the paper is to give a brief tutorial on the differences between disconnected and connected sockets.
</para>

<para>
In this paper I assume that you have a working knowledge of both bash scripting and C programming. If you don't, then hopefully you'll still get something out of the paper, but you might want to trust me about what the code snippets actually do.
</para>
</sect1>

<sect1><title>UDP</title>
<para>
It strikes me as logical to start with an extremely brief introduction to UDP. At some points it makes sense to compare this with the alternative, TCP.
</para>

<para>
Both UDP and TCP sit on top of the Internet Protocol (IP), which handles all the plumbing of actually getting the data from the client machine to the server. This is where the similarity ends. UDP is the <emphasis>User Datagram Protocol</emphasis>, and is unreliable. TCP, on the other hand, is the <emphasis>Transmission Control Protocol</emphasis>, and is reliable.
</para>

<para>
What is reliability? Well, TCP will hold your hand and ensure that every packet you send is received by the other machine, retransmitting where necessary. It will also ensure that the data is delivered to the application in the right order. There are also some games played with the choice of initial sequence number to make it harder for an attacker to spoof or hijack a connection.
</para>

<para>
UDP is unreliable, which means it does none of this for you. It is the programmer's responsibility to check that every packet sent actually arrived, and that the packets are processed in the right order.
</para>

<para>
So why would you ever use UDP? This reliability in TCP comes at a price. That price is performance. Before a TCP connection is established, the following protocol sequence occurs:
</para>

<execute><cmd>eqimg</cmd><args>tcp-handshake.png</args></execute>

<para>
So before the two machines can even exchange data, they've spent a round trip, and processing time, just setting up the connection. (The final ACK can be sent at the same time as the first data packet, as we don't have to wait for an ACK of the ACK to come back.)
</para>

<para>
UDP, on the other hand, does none of this. A single packet is sent, and it either arrives or it doesn't. Normally the client application will note that a reply was never received after a given timeout, and retransmit the packet.
</para>
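
<para>
As a rough illustration of that retransmit-on-timeout pattern, here is a minimal C sketch. The function names <emphasis>udp_request</emphasis> and <emphasis>udp_request_selftest</emphasis>, the fixed timeout, and the fixed retry count are all assumptions made for this example; real clients often back off between attempts and check that a reply actually matches the request.
</para>

<programlisting><![CDATA[
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send one request datagram and wait for one reply, retransmitting whenever
 * the wait times out. Returns the reply length, or -1 if every attempt timed
 * out or a system call failed. (Hypothetical helper for illustration.) */
ssize_t udp_request(int fd, const struct sockaddr *to, socklen_t tolen,
                    const void *req, size_t reqlen,
                    void *reply, size_t replylen,
                    int timeout_ms, int max_tries)
{
    for (int attempt = 0; attempt < max_tries; attempt++) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready;

        if (sendto(fd, req, reqlen, 0, to, tolen) < 0)
            return -1;

        ready = poll(&pfd, 1, timeout_ms);
        if (ready < 0)
            return -1;
        if (ready > 0)
            return recvfrom(fd, reply, replylen, 0, NULL, NULL);
        /* ready == 0: nothing arrived in time, so go round and resend */
    }
    return -1;
}

/* Self-check: a socket bound to the loopback address sends to itself, so the
 * "reply" is just the request coming back. Returns 0 on success. */
int udp_request_selftest(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    char reply[16];
    ssize_t n = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel chooses */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
        n = udp_request(fd, (struct sockaddr *)&addr, len,
                        "ping", 4, reply, sizeof reply, 200, 3);
    close(fd);
    return (n == 4 && memcmp(reply, "ping", 4) == 0) ? 0 : -1;
}
```
]]></programlisting>

<para>
Calling <emphasis>udp_request_selftest</emphasis> from a small <emphasis>main</emphasis> exercises the happy path. If the destination never answers, <emphasis>udp_request</emphasis> gives up after <emphasis>max_tries</emphasis> transmissions and returns -1.
</para>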

<para>
A good example of a common network protocol that uses UDP is the Domain Name System (DNS). UDP is well suited here because a typical query and its response each fit in a single datagram, so we don't have to worry about our packets arriving out of order. We also want DNS to be as fast as possible.
</para>

<sidebar><title>DNS and UDP</title>
<para>
It should be noted that RFC 1034 <quote>Domain Names - Concepts and Facilities</quote> does specify that TCP can also be used for DNS. In fact, because DNS messages carried over UDP are limited to 512 bytes, TCP has to be used for larger exchanges such as zone transfers. However, RFC 1035 <quote>Domain Names - Implementation and Specification</quote> does recommend that normal queries occur via UDP.
</para>
</sidebar>

<para>
See the further reading section at the end of this paper for recommendations of places to read more about UDP, IP, and DNS.
</para>

<sect2><title>Disconnected sockets</title>
<para>
By default when a UDP socket is created, it is disconnected. What's a disconnected socket? The short answer is that it isn't associated with any given remote machine. This means that each time we fetch data from the socket we have to use the <emphasis>recvfrom</emphasis>(2) function call:
</para>

<programlisting>
ssize_t recvfrom(int s, void *buf,
                 size_t len, int flags,
                 struct sockaddr *from,
                 socklen_t *fromlen);
</programlisting>

<para>
The <emphasis>struct sockaddr</emphasis> argument in this function call is populated with enough information for the program to be able to determine where to send the reply packet. The response needs to be sent with the <emphasis>sendto</emphasis>(2) call:
</para>

<programlisting>
ssize_t sendto(int s,
               const void *msg,
               size_t len,
               int flags,
               const struct sockaddr *to,
               socklen_t tolen);
</programlisting>

<para>
The <emphasis>sendto</emphasis> function call takes another <emphasis>struct sockaddr</emphasis> argument which specifies where the packet should be sent.
</para>
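
<para>
To make the pairing of the two calls concrete, here is a minimal sketch that handles exactly one datagram. The names <emphasis>echo_once</emphasis>, <emphasis>loopback_socket</emphasis> and <emphasis>echo_once_selftest</emphasis> are hypothetical, and the 1024 byte buffer is an arbitrary choice for the example.
</para>

<programlisting><![CDATA[
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Receive one datagram on an unconnected UDP socket and send it straight
 * back to whoever sent it. Returns the number of bytes echoed, or -1. */
ssize_t echo_once(int fd)
{
    char buf[1024];
    struct sockaddr_storage from;       /* big enough for IPv4 or IPv6 */
    socklen_t fromlen = sizeof from;

    /* recvfrom fills in `from` with the sender's address */
    ssize_t n = recvfrom(fd, buf, sizeof buf, 0,
                         (struct sockaddr *)&from, &fromlen);
    if (n < 0)
        return -1;

    /* sendto uses that same address as the destination of the reply */
    return sendto(fd, buf, (size_t)n, 0,
                  (struct sockaddr *)&from, fromlen);
}

/* Helper for the self-check: a UDP socket bound to a kernel-chosen port on
 * the loopback address. The chosen address is written back into *addr. */
static int loopback_socket(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, len) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Self-check: a client sends "hello" and expects to get it back.
 * Returns 0 on success. */
int echo_once_selftest(void)
{
    struct sockaddr_in saddr, caddr;
    char buf[16];
    int server = loopback_socket(&saddr);
    int client = loopback_socket(&caddr);
    int ok = server >= 0 && client >= 0
          && sendto(client, "hello", 5, 0,
                    (struct sockaddr *)&saddr, sizeof saddr) == 5
          && echo_once(server) == 5
          && recv(client, buf, sizeof buf, 0) == 5
          && memcmp(buf, "hello", 5) == 0;

    if (server >= 0)
        close(server);
    if (client >= 0)
        close(client);
    return ok ? 0 : -1;
}
```
]]></programlisting>

<para>
Note that nothing in <emphasis>echo_once</emphasis> remembers the client between calls. The address travels with each datagram, which is exactly what makes the socket <quote>disconnected</quote>.
</para>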

<para>
The code listing below is an example of how to use the <emphasis>recvfrom</emphasis> and <emphasis>sendto</emphasis> functions. This example is a simple UDP echo server. The program waits on a given port, and whenever it receives data, sends it straight back.
</para>

<execute><cmd>code2db</cmd><input>/home/opensource/shdns/sockets/disconnected.c</input></execute>

<para>
Let's have a look at this program running. As we can see from the source code, the program listens on UDP port 1234 (very imaginative). We can use <command>netcat</command> to test the program. First, we need to start the server in a different terminal, which is as simple as running it.
</para>

<sidebar><title>netcat rocks</title>
<para>
<command>netcat</command> rocks. It's a little application which lets you push data to arbitrary port numbers using both UDP and TCP. It can also be used to create very simple servers, because <command>netcat</command> can do all the listening for you. For more information, and download details, check out http://www.atstake.com/research/tools/network_utilities/.
</para>
</sidebar>

<para>
A client interaction with the server looks like this:
</para>

<programlisting>
<execute><cmd>cat</cmd><input>netcat-client.txt</input></execute>
</programlisting>

<para>
Note that the <quote>punt!</quote> is me hitting Control-C on <command>netcat</command>. The server output for this session is:
</para>

<programlisting>
<execute><cmd>cat</cmd><input>server.txt</input></execute>
</programlisting>

<para>
There are a couple of things worth noting here. Firstly, the UDP server can handle multiple clients at once in a single process. This is because each packet is responded to as it comes in, and no assumption is made that successive packets come from the same machine. This model works well for the single packet communication paradigm. Secondly, since the server simply waits for a packet, responds, and then starts waiting again, it survives disconnects from the client: it simply waits until a new packet comes along.
</para>
</sect2>

<sect2><title>Disconnected sockets and scripting</title>
<para>
This paper is really about writing useful UDP servers using scripting languages, specifically <emphasis>bash</emphasis>. In this instance, the disconnected nature of these default UDP sockets causes great pain, because there is no trivial way in bash to call <emphasis>recvfrom</emphasis> and <emphasis>sendto</emphasis>. Shell scripts want to be able to use the standard <emphasis>read</emphasis> and <emphasis>write</emphasis> calls, because these languages are intended for file and pipe input and output, as opposed to socket traffic.
</para>

<para>
We can demonstrate this by writing a very simple, and probably quite impractical <command>inetd</command> server.
</para>

<sidebar><title>What is inetd?</title>
<para>
<command>inetd</command>, and its fairly common <command>xinetd</command> alternative, are super daemons. Their role in the networking food chain is to run programs to process network traffic as demand requires. For example, many <command>rsync</command> servers operate from <command>inetd</command>. What this means is that <command>inetd</command> runs in the background as a daemon waiting for connections from clients to the <command>rsync</command> port. It then starts a new copy of <command>rsync</command> for each of these connections, and <command>rsync</command> processes the traffic (and perhaps responds).
</para>

<para>
Many developers write network servers which are intended to be run by <command>inetd</command> because it simplifies development of the application. <command>inetd</command> handles most of the plumbing for the application.
</para>

<para>
Please bear in mind that <command>inetd</command> and <command>xinetd</command> are very configurable, and this is only a general description. There is a brief discussion on how to configure <command>inetd</command> and <command>xinetd</command> later in this paper. You should refer to the relevant documentation for more information.
</para>
</sidebar>

<para>
This is a particularly useful example, because our shell scripts are eventually going to run from <command>inetd</command>, so having an understanding of how it works is opportune.
</para>

<execute><cmd>code2db</cmd><input>/home/opensource/shdns/sockets/disconnexec.c</input></execute>

<para>
All this program does is wait on port 1234 until a packet comes in, fork, and have the new child process start executing the server program (in this case <command>cat</command>). This technique relies on the fact that the child process inherits the open file descriptors of the parent process, including the socket. The socket descriptor is duplicated onto stdin and stdout before the server program starts executing, so that network input goes into stdin, and server output is sent back over the network.
</para>
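
<para>
The fork, duplicate, and exec steps can be sketched on their own as follows. This is not the listing above, just a minimal illustration under stated assumptions: <emphasis>spawn_on_fd</emphasis> and <emphasis>spawn_on_fd_selftest</emphasis> are hypothetical names, and the self-check uses a Unix-domain stream socket pair as a stand-in for a network socket so that it can run without a second machine.
</para>

<programlisting><![CDATA[
```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child whose stdin and stdout are both the given descriptor, then
 * replace the child with the program at `path`. Returns the child's pid in
 * the parent, or -1 if fork() fails. */
pid_t spawn_on_fd(int fd, const char *path, const char *argv0)
{
    pid_t pid = fork();

    if (pid != 0)
        return pid;                 /* parent, or fork() failed (-1) */

    /* child: the descriptor was inherited, so copy it onto 0 and 1 */
    if (dup2(fd, STDIN_FILENO) < 0 || dup2(fd, STDOUT_FILENO) < 0)
        _exit(126);
    if (fd > STDERR_FILENO)
        close(fd);
    execl(path, argv0, (char *)NULL);
    _exit(127);                     /* only reached if execl() failed */
}

/* Self-check: run /bin/cat on one end of a socket pair and talk to it
 * through the other end. Returns 0 if the bytes come back unchanged. */
int spawn_on_fd_selftest(void)
{
    int sv[2];
    char buf[8];
    pid_t pid;
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid = spawn_on_fd(sv[1], "/bin/cat", "cat");
    close(sv[1]);                   /* the child keeps its own copy */
    if (pid < 0) {
        close(sv[0]);
        return -1;
    }
    ok = send(sv[0], "hi", 2, MSG_NOSIGNAL) == 2
      && read(sv[0], buf, sizeof buf) == 2
      && memcmp(buf, "hi", 2) == 0;
    shutdown(sv[0], SHUT_WR);       /* end of input makes cat exit */
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return ok ? 0 : -1;
}
```
]]></programlisting>

<para>
The socket pair in the self-check is already connected, which is why <command>cat</command> can reply there. The program started this way knows nothing about sockets at all; it simply reads stdin and writes stdout.
</para>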

<para>
This of course doesn't work. Because the socket is disconnected, <command>cat</command>'s <emphasis>read</emphasis> call will work, but its <emphasis>write</emphasis> call fails because the socket layer doesn't know where to send the response.
</para>
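
<para>
The failure is easy to reproduce in isolation. The sketch below (with the hypothetical name <emphasis>write_on_unconnected_udp</emphasis>) creates a UDP socket, never connects it, and attempts a plain <emphasis>write</emphasis>. On Linux I would expect the error to be <emphasis>EDESTADDRREQ</emphasis>, but treat the exact error code as an assumption about your platform.
</para>

<programlisting><![CDATA[
```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try a plain write() on a freshly created, unconnected UDP socket.
 * Returns 0 if the write unexpectedly succeeds, otherwise the errno value
 * reported by the failing call. */
int write_on_unconnected_udp(void)
{
    int err = 0;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return errno;

    /* No destination has been recorded, so there is nowhere to send this */
    if (write(fd, "hello", 5) < 0)
        err = errno;

    close(fd);
    return err;
}
```
]]></programlisting>

<para>
Passing the return value to <emphasis>strerror</emphasis>(3) prints the platform's description of the problem.
</para>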
</sect2>

<sect2><title>Connected sockets</title>
<para>
The answer to our need to use <emphasis>read</emphasis> and <emphasis>write</emphasis> for our shell scripts is a connected socket. A connected socket is simply a socket which has had the <emphasis>connect</emphasis>(2) function called upon it.
</para>

<para>
When the <emphasis>connect</emphasis> function is called on a UDP socket, no packets are exchanged. The call simply records the remote address it is given, in our case the machine the network packet came from, so that the socket layer knows where to send the reply packet when it is output with the <emphasis>write</emphasis> function call.
</para>

<para>
Here's a simple example of a connected socket version of the disconnected echo server above. Note that we now use <emphasis>read</emphasis> and <emphasis>write</emphasis> function calls for all the network input and output.
</para>

<execute><cmd>code2db</cmd><input>/home/opensource/shdns/sockets/connected.c</input></execute>

<para>
We do cheat a little in this example: there is one <emphasis>recvfrom</emphasis> call. This is used to <quote>peek</quote> at the data which is waiting on the socket to determine the address to which we will connect. The <emphasis>MSG_PEEK</emphasis> flag means that the data is not actually removed from the queue of data to be processed by this call.
</para>
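
<para>
Pulled out on its own, the peek-then-connect trick might look like the sketch below. The names <emphasis>connect_to_sender</emphasis>, <emphasis>loopback_socket</emphasis> and <emphasis>connect_to_sender_selftest</emphasis> are hypothetical, and the one byte peek buffer is my choice for the example; the full listing may size its buffer differently.
</para>

<programlisting><![CDATA[
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Peek at the next queued datagram to learn the sender's address, then
 * connect() the socket to that address. The datagram stays queued, so a
 * later read() still returns it. Returns 0 on success, -1 on error. */
int connect_to_sender(int fd)
{
    char probe;                     /* we only want the address, not the data */
    struct sockaddr_storage from;
    socklen_t fromlen = sizeof from;

    if (recvfrom(fd, &probe, sizeof probe, MSG_PEEK,
                 (struct sockaddr *)&from, &fromlen) < 0)
        return -1;

    return connect(fd, (struct sockaddr *)&from, fromlen);
}

/* Helper for the self-check: a UDP socket bound to a kernel-chosen port on
 * the loopback address. The chosen address is written back into *addr. */
static int loopback_socket(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, len) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Self-check: after connect_to_sender(), plain read() and write() are
 * enough to receive the request and answer it. Returns 0 on success. */
int connect_to_sender_selftest(void)
{
    struct sockaddr_in saddr, caddr;
    char buf[16];
    int server = loopback_socket(&saddr);
    int client = loopback_socket(&caddr);
    int ok = server >= 0 && client >= 0
          && sendto(client, "ping", 4, 0,
                    (struct sockaddr *)&saddr, sizeof saddr) == 4
          && connect_to_sender(server) == 0
          && read(server, buf, sizeof buf) == 4   /* peeked data still queued */
          && write(server, "pong", 4) == 4        /* destination now known */
          && recv(client, buf, sizeof buf, 0) == 4
          && memcmp(buf, "pong", 4) == 0;

    if (server >= 0)
        close(server);
    if (client >= 0)
        close(client);
    return ok ? 0 : -1;
}
```
]]></programlisting>

<para>
One consequence worth keeping in mind: once connected, the socket only talks to that one peer, so a server built this way needs a fresh socket (or a fresh process) for each client.
</para>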

<para>
I won't include an example of what this program looks like when it runs, because it looks exactly the same as the disconnected echo server above.
</para>
</sect2>

<sect2><title>Back to our simple inetd server</title>
<para>
You'll recall that a couple of examples ago I showed you the code for a simple <command>inetd</command> server. The example below expands on that so that it uses a connected socket. It will behave just like the hard coded echo servers in the first and third examples in this paper (except that the server debugging output doesn't happen any more).
</para>

<execute><cmd>code2db</cmd><input>/home/opensource/shdns/sockets/connexec.c</input></execute>
</sect2>

<sect2><title>inetd and xinetd configuration</title>
<para>
The two standard inetd implementations that most people use are called <command>inetd</command> and <command>xinetd</command>. <command>inetd</command> doesn't connect the socket for the server before starting it, which means that it is effectively impossible to write a standard shell script which processes UDP network traffic using a stock <command>inetd</command>. <command>xinetd</command>, however, does connect the socket before calling the script.
</para>

<para>
It seems logical to me to include a patch to make <command>inetd</command> behave in the manner that I would like it to. After all, if our goal is as noble as implementing a server in shell script, then surely the operating system should be modified to accommodate us?
</para>

<sect3><title>inetd patch</title>
<para>
The package which contains <command>inetd</command> as run on Debian is called <emphasis>netkit</emphasis>. The following is a patch to use connected sockets for UDP and TCP servers.
</para>

<execute><cmd>code2db</cmd><input>/home/opensource/shdns/netkit.patch</input></execute>
</sect3>

<sect3><title>Configuring inetd</title>
<para>
A simple example of how to configure <command>inetd</command> to handle our UDP echo server is shown below. The configuration file this line is added to is /etc/inetd.conf. It should be noted that <emphasis>myecho</emphasis> is the name of a service I had to add to /etc/services, because <command>inetd</command> insists on all services being listed there.
</para>

<programlisting>
<execute><cmd>cat</cmd><input>inetd.txt</input></execute>
</programlisting>

<para>
This service has the following characteristics:
</para>

<itemizedlist>
<listitem><para>The service name is <emphasis>myecho</emphasis>. The listing in /etc/services is:</para>
<programlisting>
<execute><cmd>cat</cmd><input>services.txt</input></execute>
</programlisting>

<sidebar><title>Why not echo?</title>
<para>
Note that we can't use the name <emphasis>echo</emphasis> for our service for a couple of reasons: it is already defined in /etc/services; it is an internal service which <command>inetd</command> already offers using the normal port number; and we wanted to define a different port number for the service, without breaking things which might use the old /etc/services entry.
</para>
</sidebar>
</listitem>

<listitem><para>The socket type is <emphasis>dgram</emphasis>, which is short for datagram, the type of packet that the UDP protocol uses.</para></listitem>

<listitem><para>The protocol is UDP.</para></listitem>

<listitem><para><emphasis>nowait</emphasis> is used to indicate that one server should be started per packet, instead of waiting for the previous server to exit. This argument should always be nowait for TCP servers.</para></listitem>

<listitem><para>The server runs as root.</para></listitem>

<listitem><para>The server is located at /bin/cat</para></listitem>

<listitem><para>The server should be started with these arguments. Note that the name of the executable is always the first argument. Check out <emphasis>execve</emphasis>(2) for more details.</para></listitem>
</itemizedlist>
</sect3>

<sect3><title>Configuring xinetd</title>
<para>
The same echo service configured with <command>xinetd</command> looks like this:
</para>

<programlisting>
<execute><cmd>cat</cmd><input>xinetd-config.txt</input></execute>
</programlisting>

<para>
Again, the service is a dgram using the UDP protocol. <command>xinetd</command> will wait for previous requests to be processed before moving on to the next request. The echo service also runs as root, using <command>cat</command>. Note the difference in the way the server arguments are configured.
</para>

<para>
You should also note that on Red Hat systems, services are often configured within their own configuration files, located in /etc/xinetd.d &mdash; this is done to make it easier to add new services during package installation.
</para>
</sect3>
</sect2>
</sect1>

<sect1><title>shdns</title>
<para>
This paper wouldn't be complete without an example of a UDP shell script to run from <command>inetd</command> or <command>xinetd</command>. Something non-trivial is required here, so I present for your enjoyment <command>shdns</command>, a DNS server implemented in bash shell script.
</para>

<sect2><title>The code</title>
<para>
The code for <command>shdns</command> is broken into two components. The first is a simple script which redirects the input packet to a file, so that <command>shdns</command> can move backwards and forwards within the input packet to grab the bits it needs as required.
</para>

<execute><cmd>code2db</cmd><input>/home/opensource/shdns/shdns-server</input></execute>

<para>
The other part of the code is the script which actually parses the input DNS UDP packet and responds. Note that this is merely a proof of concept at the moment. It doesn't attempt to implement the entire DNS protocol; there is only enough here to perform a name lookup.
</para>
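
<para>
For readers who haven't looked inside a DNS packet before, the following C sketch shows the kind of parsing involved. It is not how <command>shdns</command> does it (that is bash, and appears in the listing). It is simply my reading of the RFC 1035 wire format: a 12 byte header followed by the query name as a series of length-prefixed labels. The name <emphasis>dns_question_name</emphasis> is hypothetical.
</para>

<programlisting><![CDATA[
```c
#include <stddef.h>
#include <string.h>

#define DNS_HEADER_LEN 12   /* fixed-size header defined by RFC 1035 */

/* Extract the first question's name from a raw DNS query as dotted text,
 * for example "example.com". Compression pointers are rejected rather than
 * followed. Returns 0 on success, -1 on a malformed or truncated packet or
 * if `out` is too small. */
int dns_question_name(const unsigned char *pkt, size_t pktlen,
                      char *out, size_t outlen)
{
    size_t pos = DNS_HEADER_LEN;    /* the question follows the header */
    size_t used = 0;

    if (pktlen < DNS_HEADER_LEN || outlen == 0)
        return -1;

    /* QDCOUNT, the number of questions, is in bytes 4 and 5 (big-endian) */
    if (((pkt[4] << 8) | pkt[5]) < 1)
        return -1;

    for (;;) {
        size_t label;

        if (pos >= pktlen)
            return -1;
        label = pkt[pos++];
        if (label == 0)
            break;                      /* zero-length label ends the name */
        if (label > 63 || pos + label > pktlen)
            return -1;                  /* pointer, or runs off the end */
        if (used + label + 2 > outlen)  /* room for '.', label and '\0' */
            return -1;
        if (used > 0)
            out[used++] = '.';
        memcpy(out + used, pkt + pos, label);
        used += label;
        pos += label;
    }
    out[used] = '\0';
    return 0;
}
```
]]></programlisting>

<para>
The two byte query type and two byte query class follow immediately after the terminating zero label, which is where a fuller parser would carry on.
</para>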

<execute><cmd>code2db</cmd><input>/home/opensource/shdns/shdns</input></execute>

<para>
The <command>shdns</command> implementation strategy is summarized by the following diagram.
</para>

<execute><cmd>eqimg</cmd><args>shdns-design.png</args></execute>

<para>
You'll note that the reply to the DNS request is constructed in a temporary file and then sent back to the client. This is because each <emphasis>write</emphasis> to the socket that <command>inetd</command> hands the script is sent as its own packet. <command>shdns</command> therefore needs to force all of the reply packet to be written at one time.
</para>
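
<para>
The one-write-per-packet behaviour can be seen with a few lines of C. This sketch uses a Unix-domain datagram socket pair as a stand-in for a connected UDP socket, on the assumption that both preserve message boundaries in the same way; the name <emphasis>first_datagram_size</emphasis> and the <quote>HEADER</quote> and <quote>BODY</quote> strings are made up for the example.
</para>

<programlisting><![CDATA[
```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send "HEADER" and "BODY" over a datagram socket pair and return the size
 * of the first datagram the peer receives, or -1 on error. With assemble
 * non-zero the two pieces are copied into one buffer and sent with a single
 * write(); otherwise each piece gets its own write(). */
ssize_t first_datagram_size(int assemble)
{
    static const char header[] = "HEADER", body[] = "BODY";
    const size_t hlen = sizeof header - 1, blen = sizeof body - 1;
    char reply[32], received[32];
    ssize_t n = -1;
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    if (assemble) {
        /* build the whole reply first, then hand it over in one go */
        memcpy(reply, header, hlen);
        memcpy(reply + hlen, body, blen);
        if (write(sv[0], reply, hlen + blen) < 0)
            goto done;
    } else {
        /* two writes means two separate datagrams */
        if (write(sv[0], header, hlen) < 0 || write(sv[0], body, blen) < 0)
            goto done;
    }
    n = read(sv[1], received, sizeof received);

done:
    close(sv[0]);
    close(sv[1]);
    return n;
}
```
]]></programlisting>

<para>
With separate writes the peer's first read sees only the 6 bytes of the header; with the assembled reply it sees all 10 bytes at once. A DNS client expects the whole response in one packet, hence the temporary file.
</para>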
</sect2>

<sect2><title>Setting up shdns</title>
<para>
The only configuration required to run <command>shdns</command>, apart from configuring your <command>inetd</command> or <command>xinetd</command> daemon, is to put a file named lookup in the path pointed at by the variable <emphasis>execpath</emphasis> in the <command>shdns</command> source file.
</para>

<para>
Here's a sample lookup file:
</para>

<programlisting>
<execute><cmd>cat</cmd><input>lookup.txt</input></execute>
</programlisting>

<para>
You also need a <command>usleep</command> on your path. I couldn't find a Debian package containing this, so I wrote my own. If you need to install a <command>usleep</command>, then just compile usleep.c with <emphasis>gcc -o usleep usleep.c</emphasis>, and copy the result somewhere on your path.
</para>
</sect2>
</sect1>

<sect1><title>Further reading</title>
<para>
The following is a list of reading I found useful whilst working on this paper. If you're interested in this topic area and want to learn more, then starting with this list might be helpful.
</para>

<itemizedlist>
<listitem><para>A list of DNS related Internet RFCs -- http://www.faqs.org/rfcs/dns-rfcs.html</para></listitem>
<listitem><para>An excellent introduction to DNS, as well as many other protocols -- TCP/IP Illustrated, Volume One: The protocols, by W Richard Stevens, published by Addison-Wesley</para></listitem>
<listitem><para>The Freshmeat page for NetKit -- http://freshmeat.net/projects/netkit/?topic_id=150</para></listitem>
<listitem><para>The xinetd home page -- http://www.xinetd.org/</para></listitem>
<listitem><para>The source code for shdns -- http://www.stillhq.com/extracted/shdns/</para></listitem>
</itemizedlist>

</article>
