While collecting data for my dissertation experiment, I amassed roughly 200 gameplay sessions, each about 5 minutes long. Each session's log file has millisecond sampling of player position, camera pitch and yaw, and the player's health. This gives me millions of samples I can use to analyse my game level in a variety of ways.
However, managing that much data is tricky -- especially when the software development was (is) ongoing and there are inevitably bugs in the system that require re-processing of the raw log data.
To ensure data integrity and to separate erroneous or invalid session logs from valid ones, I needed a dedicated utility application -- enter LogImporter.
It's not the sexiest piece of software I've ever seen, but it has proven IMMENSELY helpful in verifying my data and trusting that everything that should be in the database has actually reached the database successfully.
While developing LogImporter, I wrote a general-purpose HistogramWidget (in Java/QtJambi). The widget takes either a query string (e.g. "SELECT rowcount as valueField FROM tableFoo") or an existing TreeMap of data, and renders a histogram of the values in that result set. The dark blue bar represents the bucket containing the mean, the darker purple box represents 1 standard deviation, the lighter blue represents 2 standard deviations, and the red bar indicates the currently selected log file (in the list on the left).
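To make the idea concrete, here is a rough sketch of the arithmetic behind such a histogram: bucketing the values, then finding the mean and standard deviation that the coloured bands are drawn from. This is not the actual HistogramWidget code. The class and method names, the fixed bucket width, the map keyed by log file name, and the choice of population (rather than sample) standard deviation are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of LogImporter: every name here is illustrative.
class HistogramSketch {

    // Lower bound of the fixed-width bucket that a value falls into.
    static long bucketOf(double value, long bucketWidth) {
        return (long) Math.floor(value / bucketWidth) * bucketWidth;
    }

    // Counts how many log files land in each bucket, keyed by the bucket's lower bound.
    static TreeMap<Long, Integer> bucketize(Map<String, Long> valueByLog, long bucketWidth) {
        TreeMap<Long, Integer> buckets = new TreeMap<>();
        for (long value : valueByLog.values()) {
            buckets.merge(bucketOf(value, bucketWidth), 1, Integer::sum);
        }
        return buckets;
    }

    // Arithmetic mean of all values (assumes a non-empty map).
    static double mean(Map<String, Long> valueByLog) {
        double sum = 0;
        for (long value : valueByLog.values()) {
            sum += value;
        }
        return sum / valueByLog.size();
    }

    // Population standard deviation (assumption: the widget may use the sample form instead).
    static double stdDev(Map<String, Long> valueByLog) {
        double mu = mean(valueByLog);
        double squaredDiffs = 0;
        for (long value : valueByLog.values()) {
            double diff = value - mu;
            squaredDiffs += diff * diff;
        }
        return Math.sqrt(squaredDiffs / valueByLog.size());
    }
}
```

With these pieces, the bucket holding the mean is bucketOf(mean(data), width), and the 1 and 2 standard deviation bands span mean ± stdDev and mean ± 2 × stdDev respectively.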
This means that when you select an existing log file, the widget shows you where that session falls in the distribution compared to all the other sessions in the database. This is extremely handy when you're not sure whether the session you're looking at is vastly outside the norm compared to its peers. While I am currently only using it to show the rowcount for each log file, new histograms can be added extremely easily (if there's some other thing you'd like to compare across log files).
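The "is this session outside the norm?" check can also be expressed numerically. The sketch below is a hypothetical illustration, not code from LogImporter: it classifies a selected session's value by how many standard deviations it sits from the mean, mirroring the 1 and 2 standard deviation bands the widget draws. The enum, the method name, and the handling of a zero standard deviation are my own assumptions.

```java
// Hypothetical helper, not part of LogImporter: names and thresholds are illustrative.
class SessionPlacement {

    enum Band { WITHIN_ONE_SD, WITHIN_TWO_SD, OUTSIDE_TWO_SD }

    // Classifies a value by its distance from the mean, measured in standard deviations.
    static Band classify(double value, double mean, double stdDev) {
        if (stdDev == 0.0) {
            // Assumption: with no spread at all, anything off the mean counts as an outlier.
            return value == mean ? Band.WITHIN_ONE_SD : Band.OUTSIDE_TWO_SD;
        }
        double distance = Math.abs(value - mean) / stdDev;
        if (distance <= 1.0) {
            return Band.WITHIN_ONE_SD;
        }
        if (distance <= 2.0) {
            return Band.WITHIN_TWO_SD;
        }
        return Band.OUTSIDE_TWO_SD;
    }
}
```

A session that comes back as OUTSIDE_TWO_SD on rowcount would be a natural candidate for a closer look, for example a log that was cut short or written twice.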


