trivsql: A trivial SQL engine for your application Introduction trivsql is a simple SQL engine, implemented on the tdb database engine. The intention of trivsql is that it is embedded into your application, allowing the use of familiar SQL constructs for accessing data stored within your application. trivsql is licensed under the terms of the GNU GPL, as is the tdb engine upon which it depends. About the author Michael has been working in the image processing field for several years, including a couple of years managing and developing large image databases for an Australian government department. He currently works for TOWER Software, who manufacture a world leading EDMS and Records Management package named TRIM. Michael is also the developer of Panda, an open source PDF generation API, as well as being the maintainer of the comp.text.pdf USENET frequently asked questions document. Michael has a website at http://www.stillhq.com. An introduction to tdb tdb is a (name, value) pair database similar to gdbm. It was originally implemented for the Samba project, as a way of avoiding having to implement a large number of in memory linked lists for storing usage and access information. The code for tdb is available from the Samba CVS server. To quote from the Samba source code: This is a simple database API. It was inspired by the realisation that in Samba we have several ad-hoc bits of code that essentially implement small databases for sharing structures between parts of Samba. As I was about to add another I realised that a generic database module was called for to replace all the ad-hoc bits. I based the interface on gdbm. I couldn't use gdbm as we need to be able to have multiple writers to the databases at one time. To make use of a tdb database at the most trivial level, you open the database with a tdb_open function call, you can then add pairs to the database using the tdb_store function, and retrieve pairs from the database with the tdb_fetch function. The database is closed when you are finished with the via the tdb_close function. There are also functions which allow you to do slightly more interesting operations, such as enumerating all of the keys defined in the database. Why trivsql? The functionality offered by tdb is satisfactory for simple databases where all you want to store is small nuggets of data. For instance, when you want to store that the user Daniel is logged on, then a piece of code such as: TDB_CONTEXT db; TDB_DATA dbkey, dbdata; // Open the database document->db = tdb_open ("foo.tdb", 0, TDB_CLEAR_IF_FIRST, O_RDWR | O_CREAT | O_TRUNC, 0600); // Store the value dbkey.dptr = "Daniel"; dbkey.dsize = strlen (key) + 1; dbdata.dptr = "LoggedOn"; dbdata.dsize = strlen (value) + 1; if (tdb_store (db, dbkey, dbdata, TDB_REPLACE) != 0) { fprintf(stderr, "Could not store value"); } And to retrieve the value: TDB_CONTEXT db; TDB_DATA dbkey, dbdata; // Open the database document->db = tdb_open ("foo.tdb", 0, TDB_CLEAR_IF_FIRST, O_RDWR | O_CREAT | O_TRUNC, 0600); // Fetch the value dbkey.dptr = "Daniel"; dbkey.dsize = strlen (key) + 1; dbdata = tdb_fetch (document->db, dbkey); This doesn't work so well when the data you want to store is more complex. For example, let's imagine that we want to store some more information about Daniel -- for instance how many bytes he has transfered this session and a list of his favourite users. I can think of four ways that we can store this information in a tdb database: We can define a structure that contains these elements, and then store the pointer to this structure in the tdb. The problem with this is that it doesn't persist across restarts of the program, because the pointer will no longer be valid. We can use a series of interestingly named keys, each storing a piece of information. For example: daniel-bytestransfered daniel-friends et cetera The problem with this is that it produces horribly verbose code, because the code to read a single key is repeated all over the place. We can push all the information into one string, and then store that in the tdb, but this means that we have to handle parsing the information again when you need it back. Since writing trivsql the Samba team has dealt with exactly this problem in this manner. To ease the implementation, Dr Andrew Tridgell has written genstruct, which takes c structures (including pointers) and serializes them into strings. genstruct is available from http://www.samba.org. You can use trivsql, which looks after all the ugliness for you, and allows you to perform select and insert statements in a manner familiar to many programmers. Why SQL? SQL was selected as a logical layer over tdb because there are many developers in the world which are already familiar with it. It is also relatively simple to parse and process the commands. Parsing SQL The SQL parser for trivsql is implemented in flex and bison, the listings for which are included at the end of this article. These files define a simple grammar for the supported SQL commands. The following SQL commands are currently supported: CREATE: used to create a table, for example CREATE TABLE newtable (cola, colb, colc);. Inside the brackets is a comma separated list of column names. Note that trivsql doesn't currently support datatypes for the columns of tables, so it is up to the user to make sure that they parse the string values in each column correctly -- this will change in a later version of trivsql. INSERT: for example INSERT INTO newtable (cola, colc) VALUES ('123', 'banana');. Here we are inserting the given values into the table newtable as specified in the column list. trivsql supports NULL values, so the list of columns here does not need to include every column in the table. SELECT: this is the most compilicated of the SQL commands supported by trivsql. An example is SELECT * FROM newtable WHERE cola = '123';. You are not required to have a conditional applied to the select statement -- so SELECT * FROM newtable will return all the rows in the table. If the select has returned a column value which has a NULL value (i.e. data has never been inserted there), then you'll recieve a (char *) NULL. The available conditions are discussed below. ALTER: ALTER newtable ADD COLUMN cold; allows you to add columns to a table. At the moment you can only add columns to a table with the ALTER command. UPDATE: allows commands of the form UPDATE newtable SET two = 'g' WHERE two = 'f';. This is useful for changing existing values in a row without having to remove the entire row and then readd it. Available conditional statements The following selectors are currently available: Equality: for instance 'foo' = 7 Like: although the % syntax is not implemented And / or: so that you can have more than one criteria (including using brackets to apply precidence) Execution strategy for conditional statements The hardest bit about implementing conditional statements is that they can be non-trivial in form. For instance, it is important that the database be able to handle queries with conditions such as: SELECT * FROM foobar WHERE ((cola = "tree" OR cola = "bush") AND colb = "dead"); trivsql handles this scenario by first parsing the list of conditions, and then building a tree of the conditions that must be evaluated. This tree is then traversed for each row in the table, and those which match the tree are appended to the recordset to be returned. The code which implements this is at the end of this article -- in the file named internal.c. The current state of play As shown by the list of SQL commands supported, trivsql is by no means a complete SQL implementation at this time. It is however quite useful, and stable. Because it is open source, it is also possible for users to add support for their own SQL commands if they are needed. trivsql is still being actively developed -- future releases will include greater functionality, so users should keep track of what has recently been released by occassionally visiting http://www.stillhq.com Interface to trivsql The interface to trivsql has been modelled on the behaviour of the ADO interfaces under win32. This is again on the basis that this is probably the behaviour that most developers out there are familiar with already. TODO An example usage trivsql is really easy to use. The code snippet below is from the trivsql distribution. It simply reads commands from standard input, and executes them with trivsql. If the command returned a recordset (which is how all data is returned to the caller), then the recordset is displayed. code2db~mikal/opensource/trivsql/example.c Input The following input: code2db~mikal/opensource/trivsql/example.sql Produces the following output: ~mikal/opensource/trivsql/samplearticle.tdb~mikal/opensource/trivsql/example.sql Genstruct as an alternative Given that the Samba team chose to use genstruct to solve this same problem, it is worthwhile to briefly discuss their solution. builddb../short-genstruct/article.sgml Genstruct pros and cons Advantages of the Samba solution to this problem are: Fewer database accesses MORE? However, the disadvantages of this solution are: Requiring pre processing of source code with a perl script The computational effort expended in data serialization MORE? Conclusion This paper has described trivsql, a SQL engine which can be embedded into your code. This code is still a work in progress, and updates on it's development can be found at http://www.stillhq.com Code listings All the code included here is available for download from http://www.stillhq.com. SQL lexer The lexer for my SQL implementation is: code2db~mikal/opensource/trivsql/lexer.l I am not completely happy with this implementation. My main objection is the way I have implemented case insensitivity, although I am not aware of a better method. SQL grammar The grammar for my SQL implementation is: code2db~mikal/opensource/trivsql/parser.y Automake Makefile.am The following file is used to automatically generate a Makefile, in conjunction with autoconf. code2db~mikal/opensource/trivsql/Makefile.am Autoconf configure.in The following file is used to automatically generate a configure script, which will build a Makefile appropriate to the build host. code2db~mikal/opensource/trivsql/configure.in C source code The following files implement trivsql and tdb... code2db~mikal/opensource/trivsql/trivsql.h code2db~mikal/opensource/trivsql/interface.c code2db~mikal/opensource/trivsql/internal.c code2db~mikal/opensource/trivsql/selectors.c code2db~mikal/opensource/trivsql/example.c I haven't included the TDB code, as it is not strictly part of trivsql, and may be downloaded from http://www.samba.org