
Time Traveling Traffic

Throughout this week, I have continued working on the traffic speed visualization program from before. In my last blog post, I had a visualization that showed traffic speeds on streets across Austin. Since then, I have added a few new features to the program, including:
  • Live display of traffic incident locations, such as crashes or roadwork, and highlighting links (roads) which are affected.
  • Live display of the driving trajectories of people using the Metropia app.
  • Option to move forward and back in time to view incident, user trajectory, and traffic speed data from different times of day.



In order to make my visualization tool easy to use, I have been working on developing a user interface so anyone who wants to use the visualization can just input the city and times they want to view. Because the data is visualized using QGIS, the GIS tool Metropia uses, I am making the UI by developing a QGIS plugin. QGIS offers plugins for its users to provide additional tools in the software, such as generating contour lines on maps or performing CAD-like functions, and it allows users to develop their own plugins using C++ or Python. Between these two, Python is much more commonly used (especially by beginners), so there is a lot more information online on developing Python plugins than C++ plugins. So, as a plugin beginner, I decided to go with Python. Unfortunately, the last time I coded in Python was with Udacity's "Intro to Computer Science" course about three years ago, when I first started programming. So I barely know the language, making the process of developing a tool with it a bit more challenging. Thankfully, Python is just English, which is a language I'm fairly familiar with.

In order to simplify the process of developing a plugin, I used a plugin for developing plugins called Plugin Builder, which created a plugin template for me to just fill in. Initially I created the UI using QtDesigner, which was amazing because all I had to do was drag and drop pictures of text boxes and input fields. However, my joy was short lived, because once the possible input options are set in QtDesigner, the plugin can no longer change them at runtime. Metropia is constantly updating the traffic data, so new traffic predictions are constantly being made available, and thus the times to select from need to be updated automatically. So, I had to hard-code it :(

As a result, my current UI is horrifically ugly, but it does successfully accomplish its basic purpose of communicating user inputs to my program that processes the traffic information for the visualization.
The text box is there because I forgot to delete it.


At this point, my tool works in three parts:
  1. A QGIS plugin that has a UI that takes in user input.
  2. A Java program that pulls the user specified data (from the QGIS plugin UI) from the Metropia server, then processes the information for visualization.
  3. A QGIS script that re-loads the visualization every time there is an update in the data or user specifications.
Clearly this is an awful design, so my goal is to continue developing the tool by re-writing most of my Java code in Python so everything can be part of the QGIS plugin. Then, when a user opens up the QGIS plugin and enters their specifications, the plugin will do all the processing and visualization updates on its own, without the help of outside programs.


Accelerating Accelerometer Analysis

With the driving scores project, we have begun the first round of collecting data samples of different driving events. In order to create a program that identifies and assesses driving events, we first need to create a large dataset of examples to teach our classifier (an algorithm or statistical model) to recognize different events such as turns or texting on the phone. Once we have a method of classifying events, we can calculate a driving score by penalizing phone usage or rewarding smooth turns.
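As a rough illustration of where this is heading, here is a tiny Python sketch of turning a list of classified events into a score. The event labels, weights, and starting value are placeholders I made up for this example, not Metropia's actual scoring model.

```python
# Hypothetical event weights -- illustrative values only.
EVENT_WEIGHTS = {
    "phone_use": -10,   # penalize picking up / texting on the phone
    "hard_turn": -5,    # penalize aggressive turns
    "smooth_turn": +2,  # reward smooth turns
}

def drive_score(events, start=100):
    """Sum event weights onto a starting score, clamped to [0, 100]."""
    score = start + sum(EVENT_WEIGHTS.get(e, 0) for e in events)
    return max(0, min(100, score))
```

Once the classifier can label events reliably, a scheme like this reduces a whole trip to a single number.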

To gather event samples, two of my co-workers and I drove to the U of A, and while doing so completed a list of actions such as pretending to text, taking a call, or checking our phones. We recorded accelerometer and gyroscope data during the whole trip using an app called SensorLog, so that we would have data on what these actions look like inside a moving car.


Analyzing Our Data

From our drive, we collected six sets of data: gyroscope X, gyroscope Y, gyroscope Z, accelerometer X, accelerometer Y, and accelerometer Z. Most of the data was highly obscured by the motions of the car. On the bumpy, pothole-ridden roads of Tucson, cars are constantly bouncing up and down even when just driving straight.
Gross.
In order to analyze the data, I have been using a statistical programming language called R. Because R is mostly used by people who need to perform statistical analysis, rather than software developers, it is a relatively easy language to learn and has libraries specifically tailored for data analysis. As a first step, I wanted to write a program that would identify the driving events for me - i.e. to identify when I picked up my phone, or when the car turned, based off the data I collected from my phone. To do this, I decided to focus on the gyroscope Z data. Because car motions mostly cause the phone to move around on the horizontal plane, the phone does not sense much rotation around the z-axis unless the car is turning or the phone is being picked up. Thus, the data from the z-axis gyroscope has very little noise, as can be seen below:

With little noise, we can see each driving event extremely clearly in the time series graph. The relatively flat areas are areas when the car was simply driving. The tall spikes are times when I picked up my phone to text or make a call.  Right in between the 0 and 5000 points, there are two times when I picked up my phone, held it up for a while (resulting in more shaking), and then put it back down. At the 5000 mark, there is a time when I quickly picked up my phone and put it back down. The small dips, such as right after 5000 and before 1000, are turns. Because there is so much data, it is still quite hard to examine the events. So, I wrote a program in R which identifies areas where something is happening, and zooms into them to paint a clearer picture of what the data points look like. 
The first time I picked up my phone, right in between the first and 5000th point.
The first bump in the data, most likely produced when the car made a turn.
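The idea behind that zooming program (which I wrote in R) can be sketched in a few lines of Python: flag samples where the gyroscope-Z reading strays from its resting level, then merge nearby flagged points into windows to zoom into. The threshold and gap values here are made-up examples, not the ones I actually used.

```python
def find_event_windows(z, threshold=0.5, max_gap=10):
    """Return (start, end) index pairs where |z| exceeds the threshold,
    merging flagged samples separated by at most max_gap points."""
    flagged = [i for i, v in enumerate(z) if abs(v) > threshold]
    windows = []
    for i in flagged:
        if windows and i - windows[-1][1] <= max_gap:
            windows[-1][1] = i          # extend the current window
        else:
            windows.append([i, i])      # start a new window
    return [tuple(w) for w in windows]
```

Each returned window can then be plotted on its own to get a clearer picture of the event.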

While simple events such as texting versus turning can be easily distinguished with only gyroscope data, more complex events, such as texting while making a sharp turn versus simply turning, require more information to differentiate. In addition, even when driving events can be recognized, they cannot be assessed based on turn radius or acceleration from gyroscope data alone. Later on in the project, when we will be scoring driving based off accelerometer and gyroscope data, we will need more information about speeds and accelerations. So, we still need to take the accelerometer data into account.


How Do We Understand Noisy Data?

When working with time series, the goal is to identify trends or patterns in the data. However, there are often random fluctuations that obscure these patterns and trends. Instead of smooth, straight lines, we have constant fluctuations.


As a result, the challenge with time series data is often figuring out how to distinguish the noise from the actual trends we are looking for. From a computer science perspective, this makes the problem of analyzing the data much more complex, because we don't get to look for exact patterns unless we somehow transform the noisy data.
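As a simple example of such a transform, a moving average smooths out random jitter while keeping the slower trend. Here is a minimal Python version; the window size is an arbitrary example value.

```python
def moving_average(xs, window=5):
    """Average each sample with its neighbors in a trailing window,
    shrinking the window at the start of the series."""
    out = []
    for i in range(len(xs)):
        lo = max(0, i - window + 1)
        out.append(sum(xs[lo:i + 1]) / (i + 1 - lo))
    return out
```

A constant series passes through unchanged, while random spikes get averaged down toward the local trend.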

There are many ways to transform time series data, such as with Fast Fourier Transforms, but the most common way is called segmentation. Segmenting time series data means approximating the data with a series of straight lines. Segmentation is especially important in identifying driving events, because the events need to be accurately parsed out of the data, and we need to simplify the data as much as we can to efficiently classify the event types. Because time series are almost always extremely messy, there has been a large amount of research into developing segmentation algorithms for noisy data. The three main algorithms used are the Sliding Window Algorithm, the Top-Down Algorithm, and the Bottom-Up Algorithm.

1. Sliding Window - Start at the beginning of the time series, and keep adding points to the current segment until some error bound is exceeded, then start a new segment.
2. Top-Down - The time series is recursively divided ("divide and conquer") to try out every way of splitting the data into sections that are each approximated with one straight line. The combination of straight lines with the least total error is selected as the best.
3. Bottom-Up - The time series is divided into a bunch of tiny segments, and adjacent segments are combined whenever one line can approximate both of them while staying under the specified error bound (a backwards way to do Top-Down).
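Since the Bottom-Up Algorithm is the one I am most interested in, here is a simplified Python sketch of it. The error measure (the least-squares residual of a straight-line fit) and the greedy merge order follow the description above; the error bound is an illustrative value, and a real implementation would cache costs rather than recompute them.

```python
def fit_error(ys, lo, hi):
    """Sum of squared residuals of a least-squares line over ys[lo:hi+1]."""
    n = hi - lo + 1
    mean_x = (n - 1) / 2
    mean_y = sum(ys[lo:hi + 1]) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (ys[lo + x] - mean_y) for x in range(n))
    slope = sxy / sxx if sxx else 0.0
    return sum((ys[lo + x] - (mean_y + slope * (x - mean_x))) ** 2
               for x in range(n))

def bottom_up(ys, max_error=0.1):
    """Greedily merge adjacent segments; return (start, end) index pairs."""
    # Start from tiny two-point segments.
    segs = [[i, min(i + 1, len(ys) - 1)] for i in range(0, len(ys), 2)]
    while len(segs) > 1:
        # Cost of merging each adjacent pair into one straight line.
        costs = [fit_error(ys, segs[i][0], segs[i + 1][1])
                 for i in range(len(segs) - 1)]
        i = costs.index(min(costs))
        if costs[i] > max_error:
            break  # every remaining merge would exceed the error bound
        segs[i][1] = segs[i + 1][1]
        del segs[i + 1]
    return [tuple(s) for s in segs]
```

On perfectly linear data every merge is free, so the whole series collapses into a single segment; noisy data stops merging once the bound is hit.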


Each of these algorithms is optimal for different situations. For example, the Sliding Window Algorithm is better for segmenting noisier data, or data which needs to be segmented even while more data is being inputted. I am most interested in using the Bottom-Up Algorithm, because it runs in O(n) time (meaning the time it takes for the algorithm to run is a linear function of the size of the input) and performs best when more subtle events need to be identified. Throughout next week, I hope to read more about time series data in order to find the best method to segment our data, and try applying various segmentation algorithms to better understand and simplify the accelerometer data.


Citations
1. Keogh, Eamonn, et al. "Segmenting time series: A survey and novel approach." Data Mining in Time Series Databases 57 (2004): 1-22.
2. Lovrić, Miodrag, Marina Milanović, and Milan Stamenković. "Algorithmic Methods for Segmentation of Time Series: An Overview."

    Visualizing Vehicles

    I started off my week with a deviation from my main project: creating a visualization of traffic speeds using Metropia's traffic data for Austin, Texas.

    In order to tell its users the fastest route, Metropia determines and predicts the speed of traffic on every road. This information allows the Metropia app to provide users with the optimal route, taking into account current and future traffic conditions. However, occasionally there have been issues with the routes provided, often caused by incorrect estimations of traffic speed resulting from GPS error. In order to allow Metropia's developers to visualize these errors, I was asked to create a map showing the traffic speeds on every street. With a visualization showing different speeds as different colors, abnormalities such as a sudden 90mph point on an average 30mph road can be more easily detected by developers while debugging.

    Using QGIS, an open source geographic information system, I started off by creating a visual representation of traffic speeds based off old traffic data, shown in Figure 1. I then wrote a Java program that automatically writes every new traffic data update from the Metropia server to an Excel file, and a Python script that gets QGIS to re-render the image whenever the data changes, so that the image updates live alongside Metropia's updates.

    Figure 1: Map of traffic speeds in Austin, Texas. The blue shows lower
    speeds, while the red shows higher speeds. 
    Throughout next week, I will be working on allowing configuration of the time of day for the traffic speeds shown in order to view predictions for future traffic speed levels as well as past data. In addition, I will also be looking at how to make my program more easily usable for Metropia's developers, perhaps by making a UI (if time permits).

    Accelerometer Basics

    As a research intern at Metropia, my main task is to help develop a program to assess driving through the motions of a smartphone located in the car. Thus, my first week at Metropia has been primarily focused on gaining familiarity with phone accelerometers and reading up on current research in the area of understanding accelerometer data. 

    Accelerometers are small devices which detect proper acceleration. Conceptually, an accelerometer works somewhat like a mass on a spring: acceleration displaces the attached mass, and that displacement is detected.


    Figure 1: (a) Side view of the blue seismic mass which moves to push against the green capacitors. (b) The motion against the capacitors causes a change in capacitance which is detected.


    Phones have a three dimensional accelerometer, meaning there are three separate accelerometers measuring acceleration in the x, y, and z directions. Whenever your phone’s screen rotates when you rotate your phone, or when you play games by moving your phone, the accelerometers and gyroscopes in your phone are being used to recognize the motion.
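One common first step when working with the three axes is to combine them into a single magnitude. At rest, the combined reading should be roughly one g (about 9.81 m/s²) no matter how the phone is oriented, since gravity is always being measured along some mix of the axes:

```python
import math

def total_acceleration(ax, ay, az):
    """Magnitude of a 3-axis accelerometer reading (m/s^2)."""
    return math.sqrt(ax**2 + ay**2 + az**2)
```

Deviations of this magnitude from one g are a quick orientation-independent hint that the phone (or the car around it) is accelerating.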

    Figure 2: Three accelerometers of the phone.  


    Already, there have been a number of studies which have looked at using phone accelerometers to identify driving motions. Mobile Phone Based Drunk Driving Detection took a relatively simple approach: tracking the longitudinal and lateral acceleration of a car through a smartphone placed inside the vehicle, and then judging whether the acceleration patterns indicated that the driver was drunk.

    Figure 3: Image from Mobile Phone Based Drunk Driving Detection which shows different examples indicating drunk driving, such as wide turn radius and inability to remain in one lane. 



    Driving Behavior Analysis with Smartphones took the most sophisticated approach. The researchers parsed accelerometer data by identifying driving "events", periods where the simple moving average of the acceleration exceeded a certain threshold, meaning the car was doing something other than just driving straight. The researchers classified different “events” such as U-turns, left turns, rapid acceleration, etc., and then made 40 different templates of aggressive and non-aggressive examples of each event. When an event was identified in the data, the signal it produced would be compared to the templates by seeing how similar they were when aligned using the Dynamic Time Warping (DTW) algorithm. The method was very solid, with 97% of aggressive events successfully identified.
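To make the template comparison concrete, here is a minimal Python sketch of the classic DTW distance: each sample in one signal may align with several samples in the other, and dynamic programming finds the alignment with the smallest total point-to-point cost. (This is a bare-bones illustration, not the paper's implementation.)

```python
def dtw_distance(a, b):
    """Classic DTW with absolute-difference cost, O(len(a)*len(b))."""
    INF = float("inf")
    # dp[i][j] = best cost of aligning a[:i] with b[:j]
    dp = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    dp[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # a[i-1] aligns again
                                  dp[i][j - 1],      # b[j-1] aligns again
                                  dp[i - 1][j - 1])  # advance both signals
    return dp[len(a)][len(b)]
```

Because the warping path can stretch one signal against the other, a slowed-down copy of a signal still scores a distance of zero, which is exactly why DTW suits event templates recorded at slightly different speeds.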

    In addition to identifying driving events, accelerometers have also been widely studied as a method of identifying human activities. Activity Recognition from User-Annotated Acceleration Data studied accelerometer data to identify activities ranging from washing dishes to kung fu movements. The researchers used a variety of machine learning techniques (techniques to identify patterns in data), including decision tables, decision trees, the nearest neighbor algorithm, and naïve Bayes classifiers, and then compared and contrasted their results. Classification of Motor Activities through Derivative Dynamic Time Warping Applied on Accelerometer Data was another study looking at identifying human activity, but instead used the DTW algorithm like Driving Behavior Analysis with Smartphones to match recorded templates of activities with observed activities. In addition, the study also looked at an improved version of the DTW algorithm, the Derivative Dynamic Time Warping (DDTW) algorithm. The classic DTW algorithm fails to properly match two similar signals if there is too much variation in the y-axis between them, so the DDTW algorithm fixes this issue, and as a result is much more successful at identifying similarities between signals.
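As I understand it from the paper, DDTW runs the same alignment but on an estimated derivative of each signal rather than on the raw values, so two signals with the same shape but different vertical offsets still match. The standard DDTW derivative estimate averages the left slope and the centered slope at each interior point:

```python
def ddtw_transform(xs):
    """DDTW derivative estimate, defined for interior points only:
    average of the left slope and the centered slope at each point."""
    return [((xs[i] - xs[i - 1]) + (xs[i + 1] - xs[i - 1]) / 2) / 2
            for i in range(1, len(xs) - 1)]
```

Running ordinary DTW on these derivative series instead of the raw signals is what makes the matching insensitive to y-axis offsets: shifting a whole signal up or down leaves its derivatives unchanged.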


    Figure 4: Images of the matchings made by the DTW and DDTW. DDTW aligns the signals as one would expect, while DTW shows erratic matchings when the signals' y axes vary.  

    So far, I am the most interested in using the DDTW algorithm to identify driving events such as aggressive turns or texting. The DTW algorithm has been shown to be able to successfully identify both driving events and human activities, so it will be one of the main algorithms I hope to use in processing accelerometer data.