Position log

Remco Bloemen

The logging app I’m using is GPS Logger. I’m storing the logs as GPX and KML files. Their formats are as follows:

Keyhole Markup Language (KML). The format is very simple, but also doesn’t offer much. It looks like:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"
    <!-- Then many repetitions looking like  -->
<gx:coord>2.662407 49.23818433 147.0</gx:coord>
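
Each `gx:coord` value holds longitude, latitude and altitude separated by spaces, in that order (so the line above is roughly 2.66° E, 49.24° N, 147 m). A minimal Python sketch of reading one such line follows; the helper name `parse_gx_coord` is made up for illustration, and it matches the text of a single line instead of doing a full XML parse:

```python
import re

# Hypothetical helper pattern: three whitespace-separated numbers inside <gx:coord>.
GX_COORD = re.compile(
    r"<gx:coord>\s*([^\s<]+)\s+([^\s<]+)\s+([^\s<]+)\s*</gx:coord>"
)

def parse_gx_coord(line):
    """Return (longitude, latitude, altitude) as floats, or None if no match."""
    match = GX_COORD.search(line)
    if match is None:
        return None
    lon, lat, alt = (float(group) for group in match.groups())
    return lon, lat, alt

print(parse_gx_coord("<gx:coord>2.662407 49.23818433 147.0</gx:coord>"))
```

Note that the order is longitude first, which is the opposite of the `lat`, `lon` attribute order used by GPX.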


GPS Exchange Format (GPX). The format looks like:

<?xml version="1.0" encoding="UTF-8" ?>
<gpx version="1.0" creator="GPSLogger - http://gpslogger.mendhak.com/"

    <!-- It then proceeds with repetitions of -->

<trkpt lat="49.24454858" lon="2.67159307">

    <!-- Mixed with repetitions of -->

<trkpt lat="52.4060613" lon="6.9069997">


After navigating the spec (combining version 1.0 with the more extensively documented 1.1), the following can be found:

Element Documentation
trkpt A Track Point holds the coordinates, elevation, timestamp, and metadata for a single point in a track.
lat The latitude of the point. Decimal degrees, WGS84 datum.
lon The longitude of the point. Decimal degrees, WGS84 datum.
ele Elevation (in meters) of the point.
time Creation/modification timestamp for element. Date and time are in Universal Coordinated Time (UTC), not local time! Conforms to ISO 8601 specification for date/time representation. Fractional seconds are allowed for millisecond timing in tracklogs.
course Instantaneous course at the point. (degrees, true) (only in 1.0)
speed Instantaneous speed at the point. (meters per second) (only in 1.0)
src Source of data. Included to give user some idea of reliability and accuracy of data. “Garmin eTrex”, “USGS quad Boston North”, e.g.
sat Number of satellites used to calculate the GPX fix.
hdop Horizontal dilution of precision.
vdop Vertical dilution of precision.
pdop Position dilution of precision.
geoidheight Height (in meters) of geoid (mean sea level) above WGS84 earth ellipsoid. As defined in NMEA GGA message.
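
To make the table concrete, here is a rough Python sketch that pulls a few of these fields out of every `trkpt`. It is only an illustration under assumptions: the function names and the sample document are made up (only the `lat`/`lon` values are copied from the excerpt above), and the tag names are matched with the XML namespace stripped so that it does not matter whether the file declares GPX 1.0 or 1.1.

```python
import xml.etree.ElementTree as ET

# Child elements to copy from each <trkpt>; lat and lon are attributes instead.
CHILD_FIELDS = ("ele", "time", "hdop", "vdop")

def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]

def read_track_points(gpx_text):
    """Yield one dict per <trkpt>; missing child elements come back as None."""
    root = ET.fromstring(gpx_text)
    for elem in root.iter():
        if local_name(elem.tag) != "trkpt":
            continue
        point = {"lat": float(elem.get("lat")), "lon": float(elem.get("lon"))}
        children = {local_name(child.tag): child.text for child in elem}
        for name in CHILD_FIELDS:
            point[name] = children.get(name)
        yield point

# Made-up sample document, not taken from the real log files.
SAMPLE = """<gpx version="1.0" xmlns="http://www.topografix.com/GPX/1/0">
  <trk><trkseg>
    <trkpt lat="49.24454858" lon="2.67159307">
      <ele>147.0</ele><time>2014-01-01T12:00:00Z</time>
      <hdop>1.2</hdop><vdop>1.9</vdop>
    </trkpt>
  </trkseg></trk>
</gpx>"""

for point in read_track_points(SAMPLE):
    print(point)
```

The child values are kept as text here; converting `ele`, `hdop` and `vdop` to floats and `time` to a datetime would be the obvious next step.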


Dilution of precision

Measurements are never exact and always have an error. This is modelled as either an interval or a probability distribution around the true value. The hdop, vdop and pdop values provide an error estimate. According to Wikipedia they are defined as:

$$
\begin{aligned} \text{hdop} &= \sqrt{\sigma_{xx} + \sigma_{yy}} \\ \text{vdop} &= \sqrt{\sigma_{zz}} \\ \text{pdop} &= \sqrt{\sigma_{xx} + \sigma_{yy} + \sigma_{zz}} \end{aligned}
$$

The $\sigma$’s are diagonal elements of the covariance matrix of the four space-time coordinates:

$$
\Sigma = \operatorname{cov} \begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix} = \begin{bmatrix} \sigma_{xx} & \sigma_{yx} & \sigma_{zx} & \sigma_{tx} \\ \sigma_{xy} & \sigma_{yy} & \sigma_{zy} & \sigma_{ty} \\ \sigma_{xz} & \sigma_{yz} & \sigma_{zz} & \sigma_{tz} \\ \sigma_{xt} & \sigma_{yt} & \sigma_{zt} & \sigma_{tt} \end{bmatrix} = \Sigma^T
$$

Since $\text{pdop}^2 = \text{hdop}^2 + \text{vdop}^2$, we really only know two independent values out of the ten in the symmetric covariance matrix. And we are still missing a description of the distribution of the errors. Bbzippo wrote about the error distribution and mentions that the simple assumption of independent, normally distributed coordinates agrees well with measurements. Let’s run with this assumption.

$$
\begin{aligned} x &\sim N(\bar x, \sigma_{xx}) & y &\sim N(\bar y, \sigma_{yy}) & z &\sim N(\bar z, \sigma_{zz}) & t &\sim N(\bar t, \sigma_{tt}) \end{aligned}
$$

For the elevation, $z$, we can directly use the distribution, with $\sigma_{zz} = \text{vdop}^2$. For the time we have no error estimate whatsoever. The horizontal error is a bit more involved: we know $\sqrt{\sigma_{xx} + \sigma_{yy}}$ but not the individual values of $\sigma_{xx}$ and $\sigma_{yy}$. If we assume $\sigma_{xx} = \sigma_{yy} = \sigma$, then $\text{hdop}^2 = 2\sigma$ and we can set $\sigma = \tfrac{1}{2}\,\text{hdop}^2$. We can move further and calculate the horizontal error distribution:

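$$
\left\vert \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} \bar x \\ \bar y \end{bmatrix} \right\vert = \sqrt{ (x - \bar x)^2 + (y - \bar y)^2 } \sim \operatorname{Rayleigh}\left(\sqrt{\sigma}\right)
$$

Because $\sigma$ denotes a variance here, the Rayleigh scale parameter is the per-axis standard deviation $\sqrt{\sigma} = \text{hdop} / \sqrt{2}$.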
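
Under the assumptions above (independent normal errors with equal variance $\sigma = \tfrac{1}{2}\,\text{hdop}^2$ on both horizontal axes), the Rayleigh scale is $\sqrt{\sigma} = \text{hdop}/\sqrt{2}$, and the standard Rayleigh CDF $1 - e^{-r^2 / (2 s^2)}$ with $s^2 = \sigma$ simplifies to $1 - e^{-r^2/\text{hdop}^2}$. The Python sketch below turns that into numbers; the function names and the example hdop value are made up, and hdop is treated as a length, as in the text.

```python
import math

def rayleigh_scale(hdop):
    """Per-axis standard deviation sqrt(sigma), with sigma = hdop**2 / 2."""
    return hdop / math.sqrt(2.0)

def horizontal_error_cdf(r, hdop):
    """P(horizontal error <= r) for a Rayleigh distribution with that scale."""
    s = rayleigh_scale(hdop)
    return 1.0 - math.exp(-r * r / (2.0 * s * s))

def horizontal_error_quantile(p, hdop):
    """Radius that contains a fraction p of the points (inverse of the CDF)."""
    s = rayleigh_scale(hdop)
    return s * math.sqrt(-2.0 * math.log(1.0 - p))

hdop = 5.0  # made-up example value, not from the log
print("median radius:", horizontal_error_quantile(0.5, hdop))
print("95% radius:   ", horizontal_error_quantile(0.95, hdop))
```

The quantile function gives a radius that can be drawn around a logged point as a rough confidence circle.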


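# Merge all CSV files and sort the rows (lexically, i.e. by the first column).
cat *.csv | sort > full.csv
# Write a header, then keep every 100th row as a smaller sample.
echo "time,longitude,latitude,elevation,hdop,vdop" > short.csv
awk 'NR % 100 == 0' full.csv >> short.csv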