<sect1><title>Time Domain Analysis</title>

<sect2><title>Introduction</title>
  <para>
   Analysis of discrete signals in the time domain is essential to the understanding of geodetic data produced
  by GPS, VLBI and SLR. By applying a linear least squares regression model to the system underlying data trends can be ascertained,
  such as, the average rate of continental drift. Linear regression analysis also allows such phenomenon as Random 
  Walk and white noise, both of which which are known to affect each type of geodetic data to differing degrees,
  to be detected and removed. This, in turn, enables the geodetic community to gain a greater understanding of the elements that may 
  affect the accuracy of a given data set. In addition, an analysis of residuals resulting from a linear regression not only 
  give a measure of how accurately the model approximates the data set it can also aid in the detection and deletion of 
  erroneous points.        
  </para>
</sect2> <!--Introduction-->


<sect2><title>Theory</title>
  <sect3><title>Linear Least Squares Regression Modelling</title>
    <para>
    One of the major problems in discrete signal analysis is that these systems are almost always inconsistent, that is, 
    there is no exact solution to the equation of the line 
    
    <equation>
      <title>Equation of the Line</title>
      <execute><cmd>eqimg</cmd><args>time-line.bmp</args></execute>
    </equation>

    
    where <command>m</command> is the slope and <command>c</command> is the y-intercept. This is usually due to the fact that there
    are more data points in the system than those required to solve this equation. Linear least squares regression modelling is a 
    method used to calculate <command>m</command> and <command>c</command>, such that, it is the closest approximation
    <command>y</command> for the given set of observations in the system, also known as a line of best fit. In the analysis of
    geodetic data the line of best fit is given as the solution to the matrix equation
    
    <equation>
      <title>Linear Least Squares Regression Equation 1a</title>
      <execute><cmd>eqimg</cmd><args>time-lsreg.bmp</args></execute>
    </equation>
    
    where <command>X</command> is the corrections to the apriori estimates
    
    <equation>
      <title>Corrections to Apriori Matrix</title>
      <execute><cmd>eqimg</cmd><args>time-corrections.bmp</args></execute>
    </equation>
    
    and <command>n</command> is the number of observations in the system. The <command>A</command> matrix is the know as the design
    matrix
    
    <equation>
      <title>Design Matrix</title>
      <execute><cmd>eqimg</cmd><args>time-design.bmp</args></execute>
    </equation>
    
    <command>P</command> is the <command>n</command> x <command>n</command> weighting matrix and <command>L</command> is the apriori
    matrix 
    
    <equation>
      <title>Apriori Matrix 1a</title>
      <execute><cmd>eqimg</cmd><args>time-apriorimatrix.bmp</args></execute>
    </equation>
    
    </para>
    
    <para>When carrying out least squares regression analysis on geodetic data it is common to set the initial apriori estimates of
    the slope and y-intercept to zero, thereby making all value of <command>L<subscript>c</subscript></command> = 0. By applying
    this initial constraint the apriori matrix becomes
    
    <equation>
      <title>Apriori Matrix 1b</title>
      <execute><cmd>eqimg</cmd><args>time-apriorimatrix-2.bmp</args></execute>
    </equation>
    
    and hence Equation 2 becomes
    
    <equation>
      <title>Linear Least Squares Regression Equation 1b</title>
      <execute><cmd>eqimg</cmd><args>time-lsreg-2.bmp</args></execute>
    </equation>

    </para>
        
    <para>Another common constraint use in this type of analysis is to set y-intercept (<command>c</command>) to the point of the
    first time observation in the given system. This is desirable because of the time format used in geodetic
    data systems, that is, a decimal number where the numerator corresponds to the year and number of days into the year is 
    denominator. For example, the date value 2002.0014 corresponds to the Gregorian calender date of 12:00, 1 January 2002. If 
    x-axis was not constrained, the y-intercept would always be given at <command>x</command> = 0, which corresponds to a date of 
    4000 BC (approximately 500 years before the Bronze age), thus 
    rendering this value virtually meaningless. Therefore, the x-axis is constrained by setting 
    <command>x</command> = <command>t - t<subscript>0</subscript></command> where <command>t<subscript>0</subscript></command> 
    is the first time observation in the given system. This has the effect of making the <command>c</command> value more 
    meaningful and simplifying any graph of this data. 
    </para>
  </sect3><!--Linear Least Squares Regression Modeling-->
  
  <sect3><title>Weighting Matrices - Variance Co-variance and Random Walk</title>
    <para>It is not uncommon in geodetic data, especially data taken over a long period of time (i.e. a number of years) for some
    of the value to be less reliable than others. This can be due to the failure or replacement of data gathering 
    equipment or natural phenomena such as earthquakes. In addition, random or white noise is known to
    affect different types of geodetic data to varying degrees. This type of signal distortion is truly random and has a zero mean
    (<command>&mgr</command>) and a variance (<command>&sgr</command>), so that, over a sufficently large data set its
    effects are cancelled out. The problem occors when smaller systems are used and the white noise is not cancelled 
    out, this can distort the mean value of the given system. It is important, therefore, to have some mechanism of ensuring 
    that this potentially erroneous data does not influence the linear regression analysis in an adverse manner. 
    In order to do this what is known as a Variance Co-variance (VCV) weighting matrix is employed. 
    </para>
    
    <para>A VCV weighting matrix is a strictly diagonal matrix where correlation is not admitted, that is 
    <command>P<subscript>ij</subscript></command> = 0 where <command>i</command> &ne; <command>j</command> 
    which models a standard Gaussian distribution (also known as a bell curve).
    
    <equation>
      <title>Standard Gaussian Distribution</title>
      <execute><cmd>eqimg</cmd><args>time-gaussian.bmp</args></execute>
    </equation>
    
    <figure>
      <title>Diagram of Standard Gaussian Distribution</title>
      <execute><cmd>eqimg</cmd><args>TD-Gaussian.gif</args></execute>
    </figure>
     
    This method works by pre-multiplying the observation values with a number that represents its 
    "correctness" (see equation 2-2). Therefore the weighting value specified on the diagonal of the VCV matrix should be a 
    percentage value ranging from one to zero, whereby a value of one means the given data point fully participates 
    in the calculation and a value of zero indicates that the data point was erroneous and is discarded. 
    </para>
    
    <para>Due to the nature the of method of weighting data values, great care must be taken so that the linear regression model
    calculated is not distorted. This is due to the fact that, by pre-multiplying the observation data with a weighting matrix
    their values are essentially being changed. If too few erroneous data points are un-weighted, the least squares line calculated 
    will not accurately represent the general data trend. Similarly, weighting out too may data points will cause the
    same error to occur. For example, one method of weighting an inconsistent system would be to assign a weighting value 
    according to standard deviation, that is, for all points within standard deviation above or below the mean would be given a 
    weight of 1. Similarly, points within two standard deviations, but greater than one standard deviation would be weighted 0.75 
    and any point two standard deviations above or below the mean would be given a weight of 0.5. Finally, all other points would 
    be weighted to zero. This model, however, has been known to unnaturally distort the data when applied to geodetic systems. 
    This is due to the fact that the data maybe skewed, that is where the third moment is 
    
    <equation>
      <title>Skewness</title>
      <execute><cmd>eqimg</cmd><args>time-skew.bmp</args></execute>
    </equation>
    
    <figure><title>Box Plot of a Skewed Data Set</title>
        <execute><cmd>eqimg</cmd><args>TD-boxskew.gif</args></execute>
    </figure>

    where <command>s</command> &ne; 0 and/or affected by kurtosis where the forth moment is
    
    <equation>
      <title>Kurtosis</title>
      <execute><cmd>eqimg</cmd><args>time-kur.bmp</args></execute>
    </equation>
    
    <figure><title>Box Plot of a Data Set affected by Kurtosis</title>
        <execute><cmd>eqimg</cmd><args>TD-boxkur.gif</args></execute>
    </figure>

    for <command>k</command> &ne; 3.
    By applying this model, these anomalies will only be accentuated in the resulting linear regression model and either too many
    data points will be weighted less than one or too few points will be weighted out. A far safer method of assigning a weighting 
    matrix is to simply weight any data point greater 
    than three standard deviations above or below the mean to zero and all other values as one. This ensures the resulting model
    will be calculated using a standard Gaussian distribution, and thus will be unaffected by these phenomenon.
    </para>
    
    <para>Weighting matrices will also remove any underlying noise that may be affecting the system, one such
    trend is known as Random Walk, which is a phenomenon that the geodetic community believes occurs, particularly in GPS data.
    Detection of this trend has become possible in recent times because it is believed that Random Walk can only be accurately 
    seen in systems that span at least six years, or preferably 10 years and data sets of this length have not become available 
    until now. If it is assumed that the data follows a standard Gaussian distribution (see figure 2-1) that is, 
    the observation values are evenly distributed then the expected Random Walk values are 
    </para>    

    <table frame='all' shortentry='0' toCEntry='1' >
    <title>Table Of Random Walk Values</title>
    <TITLEABBREV>Table of Random Walk Values</TITLEABBREV>
    <tgroup cols='5' align='left' colsep='1' rowsep='1'>
    <thead>
      <row>
        <entry>Time</entry>
        <entry>1</entry>
        <entry>2</entry>
        <entry>...</entry>
        <entry>n</entry>
      </row>
    </thead>
    <tbody>
    <row>
      <entry><command>RW</command></entry>
      <entry>R<subscript>1</subscript></entry>
      <entry>R<subscript>1</subscript> + R<subscript>2</subscript></entry>
      <entry>...</entry>
      <entry>R<subscript>1</subscript> + R<subscript>2</subscript> + ... + R<subscript>n</subscript></entry>
    </row>
    </tbody>
    </tgroup>
    </table>

    <para>
    and thus the weighting matrix is
    </para>

    <equation>
      <title>Random Walk Weighting Matrix</title>
      <execute><cmd>eqimg</cmd><args>time-rwweight.bmp</args></execute>
    </equation>
    
    <para>
    It is anticipated that using such a weighting matrix will remove any underlying Random Walk trend that may be affecting the data. 
    At this time, however, this is some debate as to which equation should be used in order to determine the random walk co-efficients 
    and therefore the optimal weighting matrix. As a result, there is limited support for this type of calculation within the scope of 
    the project.
    </para> 
  </sect3><!--Weighting Matrices - Variance Co-variance and Random Walk-->
  
  <sect3><title>Residuals and Determining VCV Weighting Models</title>
    <para>Residual space is another important aspect of Time Domain analysis. In essence, a residual is vector that represents the
    distance between the observed value and the least squares regression line of best fit. The residual vectors are therefore
    calculated by the matrix equation
    
    <equation>
      <title>Residual Equation 1a</title>
      <execute><cmd>eqimg</cmd><args>time-residual.bmp</args></execute>
    </equation>
    
    Where <command>L</command> is specified by equation 2-5. If the same constraint is applied to the apriori matrix as was used in
    the linear least squares model, that is, it is assumed that the initial model has a slope and y-intercept of 0, the residual
    equation becomes
    
    <equation>
      <title>Residual Equation 1b</title>
      <execute><cmd>eqimg</cmd><args>time-rest-residual.bmp</args></execute>
    </equation>
    
    </para>
    
    <para>At the very least, the residual matrix shows how well the regression line approximates the given system, that is, the 
    smaller the values of <command>V</command>, the better the fit of the model. In addition, if the residuals are transformed into 
    the Frequency Domain via a Fast Fourier Transform, an optimal model would give a totally flat frequency response. 
    Another property of residuals is that their sum should always be equal to zero. This is a very useful 
    property for testing the validity of any residual calculation.
    </para>
    
    <para>Calculating residuals are also essential for determining the VCV weighting model that should be applied to the given 
    system. As it has been seen, calculating an optimal VCV weighting model relies heavily on determining the standard 
    deviation of a system according to a Gaussian distribution. One method achieving this is to calculate the new weighting 
    matrix as
    
    <equation>
      <title>Weighting Matrix Equation 1a</title>
      <execute><cmd>eqimg</cmd><args>time-weighting.bmp</args></execute>
    </equation>
    
    where <command>N</command> is given by
    
    <equation>
      <title>Weighting Matrix Equation 1b</title>
      <execute><cmd>eqimg</cmd><args>time-weighting-2.bmp</args></execute>
    </equation>
    
    This method leads to the new weighting matrix <command>&Sgr</command> that is not strictly diagonal, this is
    undesirable becuase it is computationally much slower to calculate. For example, if the matrix equation
    <command>A<superscript>T</superscript>P</command> was to be caulculated  
    where <command>A</command> was a <command>n</command> x 2 matrix and <command>P</command> was a
    <command>n</command> x <command>n</command> weighting matrix. If the weighting matrix was strictly
    diagonal then then it would take  2 x <command>n</command> multiplication operations to compute. 
    Conversely, if <command>P</command> was a non-diagonal matrix then it would take 2 x
    <command>n</command><superscript>2</superscript> and 2 x
    <command>n</command><superscript>2</superscript> addition operations to compute the same equation.   
    </para>
    
    <para>Because of the slowness of using a weighting matrix that is not diagonal, another method of calculating a weighting
    matrix was sought. This method works entirely in the residual space and ensures that the weighting matrix is always diagonal.
    In this calculation, a line of best fit is calculated using equation 2-7 and the residuals are calculated using equation 2-13. 
    The residual matrix is then sorted in ascending order. From there the median and interquartile range can be calculated. The
    threshold of three standard deviations above or below the mean, assuming a standard Gaussian distribution, can be calculated by
    adding 1.5 x the interquartile range to the 75% limit and subtracting 1.5 x the interquartile range from the 25% limit, 
    resulting in a box and whiskers plot. 
    
    <figure><title>Box Plot of Data Set Re-weighting Using Resdiuals</title>
        <execute><cmd>eqimg</cmd><args>TD-boxplot.gif</args></execute>
    </figure>

    Therefore, any residual with a value greater than three standard deviations above the mean is considered to be erroneous and 
    will be weighted out.
    
    </para>
    
     
  </sect3><!--Residuals and Detetermining VCV Weighting Models-->
</sect2> <!--Theory-->

<sect2><title>Conclusion</title>
  <para>Time domain analysis of discrete geodetic signals is a powerful tool in determining data trends. Linear Regression Modelling
  can be employed to establish any general patterns in the data and give an indication of any data anomalies. It has also been seen 
  that by using a weighting matrix, erroneous data can be removed, thereby allowing the most accurate regression model to be fitted. 
  Weighting matrices can also be employed to detect and remove any underlying data trends, such as Random Walk that may be adversly
  affecting the system. Time Domain analysis allows a given system to be modelled in the residual space. This is important for 
  may reasons, firstly, the residuals give an indication of how well the regression model actually fits the observation data in both 
  the time and frequency domains. In addition, the residual space allows an optimal VCV weighting matrix to be calculated, such 
  that it is always guaranteed to be strictly diagonal. This vastly reduces the computational time required to calculate a 
  least squares solution.     
  </para>
</sect2> <!--Conclusion-->

</sect1> <!--TDA-->
