
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
<article>
<title>trivsql: A trivial SQL engine for your application</title>

<sect1><title>Introduction</title>
<para>
<command>trivsql</command> is a simple SQL engine, implemented on the <command>tdb</command> database engine. The intention of <command>trivsql</command> is that it is embedded into your application, allowing the use of familiar SQL constructs for accessing data stored within your application.
</para>

<para>
<command>trivsql</command> is licensed under the terms of the GNU GPL, as is the <command>tdb</command> engine upon which it depends.
</para>
</sect1>

<sect1><title>About the author</title>
<para>
Michael has been working in the image processing field for several years, including a couple of years managing and developing large image databases for an Australian government department. He currently works for TOWER Software, who manufacture a world leading EDMS and Records Management package named TRIM. Michael is also the developer of Panda, an open source PDF generation API, as well as being the maintainer of the comp.text.pdf USENET frequently asked questions document.
</para>

<para>
Michael has a website at http://www.stillhq.com.
</para>
</sect1>

<sect1>
<title>An introduction to tdb</title>
<para>
<command>tdb</command> is a (name, value) pair database similar to <command>gdbm</command>. It was originally implemented for the Samba project, as a way of avoiding having to implement a large number of in memory linked lists for storing usage and access information. The code for <command>tdb</command> is available from the Samba CVS server.
</para>

<para>
To quote from the Samba source code: <quote>This is a simple database API. 
It was inspired by the realisation that
in Samba we have several ad-hoc bits of code that essentially
implement small databases for sharing structures between parts of
Samba. As I was about to add another I realised that a generic
database module was called for to replace all the ad-hoc bits.

I based the interface on gdbm. I couldn't use gdbm as we need to be
able to have multiple writers to the databases at one time.</quote>
</para>

<para>
To make use of a <command>tdb</command> database at the most trivial level, you open the database with a <command>tdb_open</command> function call, you can then add pairs to the database using the <command>tdb_store</command> function, and retrieve pairs from the database with the <command>tdb_fetch</command> function. The database is closed when you are finished with the via the <command>tdb_close</command> function.

<footnote>
<para>
There are also functions which allow you to do slightly more interesting operations, such as enumerating all of the keys defined in the database.
</para>
</footnote>

</para>
</sect1>

<sect1><title>Why trivsql?</title>
<para>
The functionality offered by <command>tdb</command> is satisfactory for simple databases where all you want to store is small nuggets of data. For instance, when you want to store that the user Daniel is logged on, then a piece of code such as:
</para>

<programlisting>
  TDB_CONTEXT db;
  TDB_DATA dbkey, dbdata;

  // Open the database
  document->db = tdb_open ("foo.tdb", 0,
                           TDB_CLEAR_IF_FIRST, O_RDWR | O_CREAT | O_TRUNC,
                           0600);

  // Store the value
  dbkey.dptr = "Daniel";
  dbkey.dsize = strlen (key) + 1;
  dbdata.dptr = "LoggedOn";
  dbdata.dsize = strlen (value) + 1;

  if (tdb_store (db, dbkey, dbdata, TDB_REPLACE) != 0)
    {
      fprintf(stderr, "Could not store value");
    }
</programlisting>

<para>
And to retrieve the value:
</para>

<programlisting>
  TDB_CONTEXT db;
  TDB_DATA dbkey, dbdata;

  // Open the database
  document->db = tdb_open ("foo.tdb", 0,
                           TDB_CLEAR_IF_FIRST, O_RDWR | O_CREAT | O_TRUNC,
                           0600);

  // Fetch the value
  dbkey.dptr = "Daniel";
  dbkey.dsize = strlen (key) + 1;

  dbdata = tdb_fetch (document->db, dbkey);
</programlisting>

<para>
This doesn't work so well when the data you want to store is more complex. For example, let's imagine that we want to store some more information about Daniel -- for instance how many bytes he has transfered this session and a list of his favourite users. I can think of four ways that we can store this information in a <command>tdb</command> database:

<orderedlist>
<listitem><para>We can define a structure that contains these elements, and then store the pointer to this structure in the <command>tdb</command>. The problem with this is that it doesn't persist across restarts of the program, because the pointer will no longer be valid.</para></listitem>

<listitem><para>We can use a series of interestingly named keys, each storing a piece of information. For example:
     <orderedlist>
     <listitem><para>daniel-bytestransfered</para></listitem>
     <listitem><para>daniel-friends</para></listitem>
     <listitem><para>et cetera</para></listitem>
     </orderedlist>

The problem with this is that it produces horribly verbose code, because the code to read a single key is repeated all over the place.
</para></listitem>

<listitem><para>We can push all the information into one string, and then store that in the <command>tdb</command>, but this means that we have to handle parsing the information again when you need it back. Since writing <command>trivsql</command> the Samba team has dealt with exactly this problem in this manner. To ease the implementation, Dr Andrew Tridgell has written <command>genstruct</command>, which takes c structures (including pointers) and serializes them into strings. <command>genstruct</command> is available from http://www.samba.org.</para></listitem>

<listitem><para>You can use <command>trivsql</command>, which looks after all the ugliness for you, and allows you to perform select and insert statements in a manner familiar to many programmers.</para></listitem>
</orderedlist>
</para>
</sect1>

<sect1><title>Why SQL?</title>
<para>
SQL was selected as a logical layer over <command>tdb</command> because there are many developers in the world which are already familiar with it. It is also relatively simple to parse and process the commands.
</para>

<sect2><title>Parsing SQL</title>
<para>
The SQL parser for <command>trivsql</command> is implemented in <command>flex</command> and <command>bison</command>, the listings for which are included at the end of this article. These files define a simple grammar for the supported SQL commands.
</para>

<para>
The following SQL commands are currently supported:
</para>

<itemizedlist>
<listitem><para><command>CREATE</command>: used to create a table, for example <emphasis>CREATE TABLE newtable (cola, colb, colc);</emphasis>. Inside the brackets is a comma separated list of column names. Note that <command>trivsql</command> doesn't currently support datatypes for the columns of tables, so it is up to the user to make sure that they parse the string values in each column correctly -- this will change in a later version of <command>trivsql</command>.</para></listitem>

<listitem><para><command>INSERT</command>: for example <emphasis>INSERT INTO newtable (cola, colc) VALUES ('123', 'banana');</emphasis>. Here we are inserting the given values into the table <emphasis>newtable</emphasis> as specified in the column list. <command>trivsql</command> supports NULL values, so the list of columns here does not need to include every column in the table.</para></listitem>

<listitem><para><command>SELECT</command>: this is the most compilicated of the SQL commands supported by <command>trivsql</command>. An example is <emphasis>SELECT * FROM newtable WHERE cola = '123';</emphasis>. You are not required to have a conditional applied to the select statement -- so <emphasis>SELECT * FROM newtable</emphasis> will return all the rows in the table. If the select has returned a column value which has a NULL value (i.e. data has never been inserted there), then you'll recieve a (char *) NULL. The available conditions are discussed below.</para></listitem>

<listitem><para><command>ALTER</command>: <emphasis>ALTER newtable ADD COLUMN cold;</emphasis> allows you to add columns to a table. At the moment you can only add columns to a table with the <emphasis>ALTER</emphasis> command.</para></listitem>

<listitem><para><command>UPDATE</command>: allows commands of the form <emphasis>UPDATE newtable SET two = 'g' WHERE two = 'f';</emphasis>. This is useful for changing existing values in a row without having to remove the entire row and then readd it.</para></listitem>
</itemizedlist>
</sect2>

<sect2><title>Available conditional statements</title>
<para>
The following selectors are currently available:
</para>

<itemizedlist>
<listitem><para><command>Equality</command>: for instance 'foo' = 7</para></listitem>
<listitem><para><command>Like</command>: although the % syntax is not implemented</para></listitem>
<listitem><para><command>And / or</command>: so that you can have more than one criteria (including using brackets to apply precidence)</para></listitem>
</itemizedlist>
</sect2>

<sect2><title>Execution strategy for conditional statements</title>
<para>
The hardest bit about implementing conditional statements is that they can be non-trivial in form. For instance, it is important that the database be able to handle queries with conditions such as:
</para>

<programlisting>
SELECT * FROM foobar WHERE ((cola = "tree" OR cola = "bush") AND colb = "dead");
</programlisting>

<para>
<command>trivsql</command> handles this scenario by first parsing the list of conditions, and then building a tree of the conditions that must be evaluated. This tree is then traversed for each row in the table, and those which match the tree are appended to the recordset to be returned. The code which implements this is at the end of this article -- in the file named internal.c.
</para>
</sect2>
</sect1>

<sect1><title>The current state of play</title>
<para>
As shown by the list of SQL commands supported, <command>trivsql</command> is by no means a complete SQL implementation at this time. It is however quite useful, and stable. Because it is open source, it is also possible for users to add support for their own SQL commands if they are needed.
</para>

<para>
<command>trivsql</command> is still being actively developed -- future releases will include greater functionality, so users should keep track of what has recently been released by occassionally visiting http://www.stillhq.com
</para>
</sect1>

<sect1><title>Interface to trivsql</title>
<para>
The interface to <command>trivsql</command> has been modelled on the behaviour of the ADO interfaces under win32. This is again on the basis that this is probably the behaviour that most developers out there are familiar with already.


TODO
</para>
</sect1>









<sect1><title>An example usage</title>
<para>
<command>trivsql</command> is really easy to use. The code snippet below is from the <command>trivsql</command> distribution. It simply reads commands from standard input, and executes them with <command>trivsql</command>. If the command returned a recordset (which is how all data is returned to the caller), then the recordset is displayed.
</para>

<execute><cmd>code2db</cmd><input>~mikal/opensource/trivsql/example.c</input></execute>

<sect2><title>Input</title>
<para>
The following input:
</para>

<execute><cmd>code2db</cmd><input>~mikal/opensource/trivsql/example.sql</input></execute>

<para>
Produces the following output:
</para>

<execute><cmd>~mikal/opensource/trivsql/sample</cmd><args>article.tdb</args><input>~mikal/opensource/trivsql/example.sql</input></execute>
</sect2>
</sect1>







<sect1><title>Genstruct as an alternative</title>
<para>
Given that the Samba team chose to use <command>genstruct</command> to solve this same problem, it is worthwhile to briefly discuss their solution.
</para>
</sect1>

<execute><cmd>builddb</cmd><input>../short-genstruct/article.sgml</input></execute>

<sect1><title>Genstruct pros and cons</title>
<para>
Advantages of the Samba solution to this problem are:
</para>

<orderedlist>
<listitem><para>Fewer database accesses</para></listitem>
<listitem><para>MORE?</para></listitem>
</orderedlist>

<para>
However, the disadvantages of this solution are:
</para>

<orderedlist>
<listitem><para>Requiring pre processing of source code with a perl script</para></listitem>
<listitem><para>The computational effort expended in data serialization</para></listitem>
<listitem><para>MORE?</para></listitem>
</orderedlist>
</sect1>


<sect1><title>Conclusion</title>
<para>
This paper has described <command>trivsql</command>, a SQL engine which can be embedded into your code. This code is still a work in progress, and updates on it's development can be found at http://www.stillhq.com
</para>
</sect1>






<sect1><title>Code listings</title>
<para>
All the code included here is available for download from http://www.stillhq.com.
</para>

<sect2><title>SQL lexer</title>
<para>
The lexer for my SQL implementation is:
</para>

<execute><cmd>code2db</cmd><input>~mikal/opensource/trivsql/lexer.l</input></execute>

<para>
I am not completely happy with this implementation. My main objection is the way I have implemented case insensitivity, although I am not aware of a better method.
</para>
</sect2>

<sect2><title>SQL grammar</title>
<para>
The grammar for my SQL implementation is:
</para>

<execute><cmd>code2db</cmd><input>~mikal/opensource/trivsql/parser.y</input></execute>
</sect2>

<sect2><title>Automake Makefile.am</title>
<para>
The following file is used to automatically generate a Makefile, in conjunction with <command>autoconf</command>.
</para>

<execute><cmd>code2db</cmd><input>~mikal/opensource/trivsql/Makefile.am</input></execute>
</sect2>


<sect2><title>Autoconf configure.in</title>
<para>
The following file is used to automatically generate a configure script, which will build a Makefile appropriate to the build host.
</para>

<execute><cmd>code2db</cmd><input>~mikal/opensource/trivsql/configure.in</input></execute>
</sect2>


<sect2><title>C source code</title>
<para>
The following files implement <command>trivsql</command> and <command>tdb</command>...
</para>

<execute><cmd>code2db</cmd><input>~mikal/opensource/trivsql/trivsql.h</input></execute>
<execute><cmd>code2db</cmd><input>~mikal/opensource/trivsql/interface.c</input></execute>
<execute><cmd>code2db</cmd><input>~mikal/opensource/trivsql/internal.c</input></execute>
<execute><cmd>code2db</cmd><input>~mikal/opensource/trivsql/selectors.c</input></execute>
<execute><cmd>code2db</cmd><input>~mikal/opensource/trivsql/example.c</input></execute>

<para>
I haven't included the TDB code, as it is not strictly part of <command>trivsql</command>, and may be downloaded from http://www.samba.org
</para>
</sect2>
</sect1>
</article>
