<sect1><title>Interpolation</title>

<sect2><title>Introduction</title>
  <para>One problem with GPS datasets is that there are often gaps in the data, which
    may skew data results and prevent transformation into the frequency
    domain. Another problem is that the time scales in GPS data sets
    are often non-linear.  Interpolation solves these problems by filling the gaps in the data.
    A large number of interpolation methods exists, all having their advantages and disadvantages for
    different applications. As a result, the <command>GDMS</command> offers six different interpolation methods:-
  </para>

  <orderedlist>
  <listitem><para>Nearest Neighbour</para></listitem>
  <listitem><para>Linear</para></listitem>
  <listitem><para>Cubic Spline</para></listitem>
  <listitem><para>Natural Spline</para></listitem>
  <listitem><para>Newton Divided Difference</para></listitem>
  <listitem><para>Least Squares</para></listitem>
  </orderedlist>
</sect2> <!--Introduction-->

<sect2><title>Theory</title>
  <para>
  Theoretically, each of the datasets that we analyse are sampled once a day at
  exactly 12:00. This is, however, not always possible for reasons such as
  such as equipment failure, extreme weather and in some case local politics.
  The Fourier transform algorithms available to us require that the dataset be regular.
  This means that if a dataset is missing a point or has an irregular sample rate then
  it can not be transferred to the frequency domain.  The dataset therefore, must be made regular
  either by removing points or interpolating new points.
  </para>
  <para>
  This irregular sampling rate has been compounded by having the datasets stored on a
  non-linear time scale.  The data that we have been provided has been sampled 
  once a day at noon as timed by an atomic clock, thereby allowing an accurate sample rate.
  The sample date/time, however, has then been stored in decimal year format to eight significant
  figures, and this creates a problem.  During a regular
  year one day is 0.00273972, except one in every four years is a leap year and the
  length of a day then becomes 0.00273224.  Also at the start of each year a
  rounding occurs so that the first reading of the year occurs at XXXX.0014.  This
  rounding creates the illusion of a different time distance between the last
  day of one year and the first day of the next. Due to the nature of this problem, it is necessary to
  transform the data scale to be independent of calender years before carrying out interpolation.
  </para>
  <para>
  The new time scale that we convert to is what is known as a truncated Julian day.
  The Julian day system uses an integer day count since the first of January 4714 BC.
  This, however, produces extremely large numbers which could potentially cause
  loss of accuracy due to machine limitations.  To prevent this we have used the first
  of January 1901 as our start date.  The existing decimal dates then converted to
  truncated Julian day and rounded to the nearest whole number.  Once the timescale
  is linear, interpolation can be carried out on the data without fear of loss of data or returning inaccurately
  interpolated points.
  </para>
  <para>
  Different interpolation methods have different strengths and weaknesses.  On an
  arbitrary dataset it is impossible to tell which interpolation method will be
  the most accurate, and as such six different interpolation methods have been
  provided. Each will be outlined below, including details on its operation, strengths and weaknesses.
  </para>

  <sect3><title>Nearest Neighbour Interpolation</title>
  <para>
    Nearest neighbour interpolation is the simplest method provided. This method
    sets the value of any new point to the value of the nearest point on the
    original dataset.  The advantage of this is that any added points will be of
    the same approximate value as nearby points. There are a number of disadvantages
    with nearest neighbour approximation.  Firstly, any new points will not follow
    any linear or frequency trends in the data, which could lead to inaccuracy of
    models both in the time and frequency domains. In addition, if new points lie close to
    outliers then the points generated can be very inaccurate.
  </para>
  </sect3>

  <sect3><title>Linear Interpolation</title>
  <para>
    Linear interpolation is widely used as it is simple, yet
    generally produces better results than nearest neighbour.  In this interpolation method,
    each new point is placed on a line between the two adjacent points.
    The Equation for adding a new point is:
  </para>
  <equation>
    <title>Linear Interpolation Between Points</title>
    <execute><cmd>eqimg</cmd><args>interp_linear.bmp</args></execute>
  </equation>

  <para>
    Linear interpolation has several advantages.  Firstly, each added point has a
    value that is approximately the same as nearby value thus obtaining values significantly out of range
    are very rare. Secondly, the added point will follow any local
    linear trends between the two points. The disadvantages of linear interpolation stem from the interpolation method only working
    with the two adjacent points. If the line between these two points does not follow
    the overall trend of the data then loss of accuracy occurs.
  </para>
  </sect3>

   <sect3><title>Cubic Spline Interpolation</title>
  <para>
    The <command>GDMS</command> includes one of the many implementations of cubic spline interpolation.
    This specific implementation, has no particular name and is given the generic title of cubic spline
    interpolation. The only
    difference between this method and a natural spline is the choice of
    second derivatives used at the end points. With this implementation the values of the
    second derivative for the end points are set the same as the second derivatives of the
    second to end points.
  </para>
  <para>
    To produce a spine for a dataset of <command>n+1</command> points you have to produce <command>n</command> separate
    cubics.  Each of these cubic should have each end match up exactly with the
    points to either side.  And on the same point the two adjacent cubic should
    have shared first and second derivatives. Each of these cubics can be described by the equation:
  </para>
  <equation>
    <title>Equation of a cubic (Gerald and Wheatley, 1999)</title>
    <execute><cmd>eqimg</cmd><args>interp_cubic.bmp</args></execute>
  </equation>
  <para>
    Because each of these cubics begins at a any existing point we can say:
    <command>d<subscript>i</subscript> = y<subscript>i</subscript></command>
  </para>
  <para>
    where <command>h<subscript>i</subscript>=(x<subscript>i+1</subscript> -
    x<subscript>i</subscript>)</command>, is the width of the the interval.
  </para>
  <para>
    Also if we make <command>S</command> the set of second derivatives; then through algebraic simplification
    we can get:
  </para>
  <equation>
    <title>Elements of a cubic (Gerald and Wheatley, 1999)</title>
    <execute><cmd>eqimg</cmd><args>interp_spline_abc.bmp</args></execute>
  </equation>
  <para>
    With cubic splines, solving for <command>S</command> is achieved by solving:
  </para>
  <equation>
    <title>Solving for S (Gerald and Wheatley, 1999)</title>
    <execute><cmd>eqimg</cmd><args>interp_S_cubic.bmp</args></execute>
  </equation>
  <para>
    The two end values, <command>S<subscript>0</subscript></command> and <command>S<subscript>n</subscript></command>, are then
    set to equal <command>S<subscript>1</subscript></command> and <command>S<subscript>n</subscript></command> respectively.
  </para>
  <para>
    A full mathematical derivation can be found in Gerald &amp; Wheatley (1999), page 238.
  </para>
  <para>
   Splines can provide extremely accurate results when the
    original sample rate is notable greater than the frequency of fluctuation in the data.
    However if the sample rate is less than half the frequency of change (including signal
    noise) in the data then the results can be erratic.  Splines also have a problem
    when it comes to large gaps in the dataset. Because the gap between two points is
    represented by a cubic, large gaps result in peaks or troughs in the dataset.  The
    final problem with spline is that the end sections are inaccurate due to arbitrary
    methods used to assign the derivative of the end points. In the case of cubic splines, the end points
    tend to be too curved and it is for this reason
    that two cubic spline methods have been provided for this application.
  </para>
  </sect3>


  <sect3><title>Natural Spline Interpolation</title>
  <para>
    Natural splines are a type of cubic spline, originating in ancient  times, that is before the birth of Christ, when 
    drafts-men and builders
    would create a smooth curve by pegging a flexible piece of wood between a number of points.
    As with all cubic spline implementations, the interpolation is achieved by inserting a cubic between
    each two adjacent points. Equalising the derivatives has the effect of making the resulting
    interpolation appear smooth and visually pleasing.  For this reason splines
    are often used in graphics.
  </para>
  <para>
    In the mathematical process of building splines it is impossible to infer the
    derivatives of the two end points of a dataset.  It is the values assigned to
    these end points that determine the type of spine.  In a natural spline the
    end points are set to 0.  This gives a result similar to the ancient wooden
    splines used by builders.  Mathematically this produces a result where the
    end segments can be too straight.
  </para>
  <para>
    To produce a spine for a dataset of <command>n+1</command> points you have to produce n separate
    cubics.  Each of these cubic should have each end match up exactly with the
    points to either side.  And on the same point the two adjacent cubic should
    have shared first and second derivatives.
  </para>
  <para>
    Each of these cubics can be described by the equation:
  </para>
  <equation>
    <title>Equation of a cubic (Gerald and Wheatley, 1999)</title>
    <execute><cmd>eqimg</cmd><args>interp_cubic.bmp</args></execute>
  </equation>
  <para>
    Because each of these cubics begins at a point we can say:
    <command>d<subscript>i</subscript> = y<subscript>i</subscript></command>
  </para>
  <para>
    Where <command>
            h<subscript>i</subscript>=(x<subscript>i+1</subscript> - x<subscript>i</subscript>)
	  </command>,
    is the width of the the interval.
  </para>
  <para>
    Also if we make <command>S</command> the set of second derivatives; then through algebraic simplification
    we can get:
  </para>
  <equation>
    <title>Elements of a cubic (Gerald and Wheatley, 1999)</title>
    <execute><cmd>eqimg</cmd><args>interp_spline_abc.bmp</args></execute>
  </equation>
  <para>
    With natural splines solving for <command>S</command> is done by solving:
  </para>
  <equation>
    <title>Solving for S (Gerald and Wheatley, 1999)</title>
    <execute><cmd>eqimg</cmd><args>interp_S_natural.bmp</args></execute>
  </equation>
  <para>
    The two end values, <command>S<subscript>0</subscript></command> and <command>S<subscript>n</subscript></command>, are then
    set to zero.
  </para>
  <para>
    A full mathematical derivation can be found in Gerald &amp; Wheatley (1999), page 238.
  </para>
  </sect3>

  <sect3><title>Newton Divided Difference Interpolation</title>
  <para>
    Mathematicians have long know that a nth order difference table can be used to accurately
    recreate a polynomial of degree n.  Newton divided differences uses a similar method to
    approximate new points within an tributary dataset.
  </para>
  <para>
    The Newton divided differences method is based on building a table of divided differences.  The
    divided difference between two points in a dataset is defined as:
  </para>
  <equation>
    <title>Standard divided difference (Gerald and Wheatley, 1999)</title>
    <execute><cmd>eqimg</cmd><args>interp_DD1.bmp</args></execute>
  </equation>
  <para>
    Likewise higher order divided differences can be defined as:
  </para>
  <equation>
    <title>Divided difference of differences (Gerald and Wheatley, 1999)</title>
    <execute><cmd>eqimg</cmd><args>interp_DD2.bmp</args></execute>
  </equation>
  <para>
    These divided differences are built into a table as follows:
  </para>
  <equation>
    <title>Divided difference table (Gerald and Wheatley, 1999)</title>
    <execute><cmd>eqimg</cmd><args>interp_DD3.bmp</args></execute>
  </equation>
  <para>
    This table can then be used to generate an equation:
  </para>
    <equation>
    <title>Newton divided differences (Gerald and Wheatley, 1999)</title>
    <execute><cmd>eqimg</cmd><args>interp_DD4.bmp</args></execute>
  </equation>
  <para>
    A full mathematical derivation can be found in Gerald &amp; Wheatley (1999), page 229.
  </para>
  <para>
    Because divided differences estimates a curve by using polynomial, using a polynomial
    of too low or high a degree can result in errors. The accuracy available by using the
    double precision floating point format also creates a limit.
  </para>
  <para>
    To detect and limit the errors caused by these factors the error estimates can be calculated.
    As the divided difference table grows it calculates the rough error for
    a table of size one less.  If the error estimate has shrunk then the program continues
    building the table to a higher order.  If the error has grown then the program stops
    building the table.
  </para>
  <para>
    The error for a divided difference of a given order is approximately equal to the change
    that will be incurred by increasing the order of the divided difference table by one.
    So for an order n newton divided difference interpolation the mathematical definition
    of the error is:
  </para>
  <para>
    A full mathematical derivation can be found in Gerald &amp; Wheatley (1999), page 229.
  </para>
  </sect3>
</sect2> <!--Theory-->

<sect2><title>Conclusion</title>
  <para>
    Interpolation methods provide a useful toolkit that can be used to remove defects
    within the dataset.  They can be used to fill holes in the time
    series, allowing the general trend of these missing points to be seen as well Fourier transforms to be 
    applied to the given data system. No one interpolation method, however, is guaranteed to correctly estimate the missing data 
    points for every data set, due to the nature of the individual characteristics of the gaps. It is for this reason that six different
    interpolation methods have been provided.  
  </para>
</sect2> <!--Conclusion-->


</sect1> <!--Interpolation-->

