This is a user guide for Hindsight. It covers the basics of installing and running the tool and interpreting the final report. It also describes several Chrome artifacts and explains, at a high level, what Hindsight extracts from them.
A PDF version of this guide is available on the Hindsight Google Code site.
Hindsight is a free tool for analyzing the browsing history of the Google Chrome web browser. It can collect a number of different types of Chrome artifacts, including URLs, the text content of some viewed pages, download history, bookmarks, autofill records, HTTP cookies, and Local Storage records (HTML5 cookies). Once the data is extracted from each file, it is correlated with data from other history files and placed in a SQLite database. Hindsight can add data from multiple Chrome installations to the same database.
After the data is extracted, Hindsight runs a number of plugins against it to further interpret what it has found. A plugin is a separate Perl file that performs a specific action, such as parsing a particular URL or cookie. Plugins can rely only on local resources (for example, when parsing Google Analytics tracking cookies), or they can connect to and use remote resources (for example, when looking up visited URLs to flag those associated with malware or phishing). Users can choose which plugins to run, and they are welcome to submit ideas for new ones or to create their own.
The last piece of Hindsight is reporting. Once the data has been collected and the plugins have run, Hindsight creates an .xlsx spreadsheet with the results. The .xlsx format was chosen for several reasons, including its support for advanced filtering and the fact that most end users are already familiar with it. Whenever possible, Hindsight groups similar types of data from different browser artifacts into one column. This lets the reader scan the data and understand it quickly.
Hindsight is an open source tool, which means anyone who is interested can view how it works and even modify it. It is written in Perl and runs on any Windows, Linux, or Mac system with Perl and the required Perl modules installed.
Installation and Prerequisites
Hindsight is written in Perl and requires the Perl interpreter to be installed on the analysis workstation. Perl is available in a number of places online, such as ActiveState. Hindsight requires a number of additional Perl modules not included in the default interpreter installation, and a number of Hindsight plugins also require extra modules. These modules are available via CPAN (Comprehensive Perl Archive Network). CPAN also has a guide for installing modules on various operating systems. The list of all required modules for Hindsight and current plugins is as follows:
The next step, preparing Hindsight itself, is very simple. Download Hindsight (http://code.google.com/p/hindsight-internet-history/downloads/list) and extract the contents of the .zip file into the directory Hindsight will run from. After extraction, the directory should contain a hindsight.pl file and a subdirectory called plugins. Hindsight is now ready to run. An optional final step is to add the directory containing Hindsight to the analysis system’s PATH variable. Otherwise, the user must navigate to the Hindsight folder before running it.
Hindsight is a script that runs from a command line interface. On Windows, the command line is typically accessed through the Command Prompt (cmd.exe); Linux and Mac users can use the Terminal application. Command line programs can be intimidating to users unfamiliar with them, but they are actually fairly easy to use. Many are run by entering the program’s name at the command prompt, followed by options that tell the program what to do.
Running Hindsight from the command line is fairly straightforward, as it has only four options. The first specifies the directory to process. This could be the directory of the local Chrome installation on the system, or an evidence directory containing files collected from another computer. Specify the location that contains the Chrome data files using the -i (input) option, followed by the path to the directory.
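For readers who script repeated runs, the Python sketch below guesses a typical default Chrome profile directory and builds the corresponding command line. It is illustrative only:

- The profile paths are common defaults for recent Chrome releases. This is an assumption; older versions, additional profiles, and evidence copies live elsewhere.
- Invoking the script as `perl hindsight.pl` assumes Perl is on the PATH.
- `default_chrome_profile` and `hindsight_command` are hypothetical helper names.

```python
import os
import platform
from pathlib import Path, PurePosixPath, PureWindowsPath


def default_chrome_profile(system=None, home=None, local_app_data=None):
    """Return a typical default Chrome profile directory for the given OS.

    Assumption: these are common defaults for recent Chrome releases;
    older versions, other profiles, and evidence copies may differ.
    """
    system = system or platform.system()
    if system == "Windows":
        base = local_app_data or os.environ.get("LOCALAPPDATA")
        if base is None:
            base = PureWindowsPath(home or Path.home()) / "AppData" / "Local"
        return PureWindowsPath(base) / "Google" / "Chrome" / "User Data" / "Default"
    home = PurePosixPath(home or Path.home())
    if system == "Darwin":  # macOS
        return home / "Library" / "Application Support" / "Google" / "Chrome" / "Default"
    # Linux and other Unix-like systems
    return home / ".config" / "google-chrome" / "Default"


def hindsight_command(input_dir):
    """Build the argument list for running hindsight.pl on one directory with -i."""
    return ["perl", "hindsight.pl", "-i", str(input_dir)]
```

For example, `subprocess.run(hindsight_command(default_chrome_profile()), check=True)` would launch Hindsight against the local profile, provided it is run from the directory containing hindsight.pl.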
By default, Hindsight creates a new SQLite database and a new .xlsx spreadsheet, each with the base name “Hindsight Internet History Analysis (yyyy-mm-dd hh-mm-ss)”, where the timestamp is the time Hindsight started executing. To use a more descriptive name, specify the base name with the -o (output) option.
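The default base name can be reproduced with a single format string. This Python sketch is only an illustration. It assumes hh is a 24-hour clock in local time, which the guide does not state, and `default_base_name` is a hypothetical helper.

```python
from datetime import datetime


def default_base_name(start_time=None):
    """Return a name in the style "Hindsight Internet History Analysis (yyyy-mm-dd hh-mm-ss)".

    Assumption: hh is a 24-hour clock and the time is local.
    """
    start_time = start_time or datetime.now()
    return start_time.strftime("Hindsight Internet History Analysis (%Y-%m-%d %H-%M-%S)")
```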
To rerun analysis on an existing Hindsight database, or to add more data to it, use the -o option with the base name of the existing SQLite database (with no extension). Hindsight then displays a prompt stating that the specified database file already exists and asking what action to take.
Choosing “Add to it” adds the new data to the existing database, so data from multiple Chrome installations can be viewed in one report. “Overwrite it” directs Hindsight to delete the database file and create a new one with the same name. “Rerun Plugins” runs all plugin files against the data already collected in the database. This skips the often time-consuming step of parsing the Chrome data files, which makes it especially useful when a user has a new plugin and would like to run it against old cases.
The last two command line options for Hindsight are -h and -r. The -h option displays a simple help message; this message is also displayed when Hindsight fails to understand the options the user entered. The -r option allows the use of remote resources. By default, Hindsight does not connect to any remote resources (such as websites or APIs), because in some cases investigators do not want any data from their cases sent to a third party. However, Hindsight can perform a more complete analysis of the collected files if it is allowed to query other services. For example, the Safe Browsing API Lookup plugin checks each URL against Google’s Safe Browsing service to identify known malware or phishing sites.
After the command line options are set and the program is launched, Hindsight does its best to keep the user apprised of the program’s progress. The first major task Hindsight undertakes is processing Chrome’s numerous artifacts. An entry is displayed for each artifact type, along with Hindsight’s progress in processing it. Hindsight will also note if a particular artifact is not found, as some are not always present.
After all the artifacts are parsed, analysis with plugins begins. Hindsight loads all plugins located in the plugins folder. Each plugin carries configuration information describing it, including its name, version, description, the artifacts it applies to, and whether it uses remote resources. Hindsight checks this information to determine whether the plugin should be allowed to run. For example, a plugin that uses remote resources will not run unless the -r option is set. Hindsight lists each plugin and the number of records it analyzed (if available).
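The gating logic described above can be sketched in Python as follows. The `PluginInfo` fields mirror the configuration information listed in this paragraph. Several details are assumptions for illustration, not Hindsight’s actual plugin format:

- the field names;
- the artifact labels;
- the extra rule that skips plugins whose artifacts were not found.

```python
from dataclasses import dataclass, field


@dataclass
class PluginInfo:
    """Hypothetical stand-in for a plugin's configuration information."""
    name: str
    version: str
    description: str
    artifacts: list = field(default_factory=list)  # artifact types the plugin applies to
    remote: bool = False  # True if the plugin contacts remote resources


def runnable_plugins(plugins, allow_remote, available_artifacts):
    """Return the plugins that should run, in their original order."""
    available = set(available_artifacts)
    selected = []
    for plugin in plugins:
        if plugin.remote and not allow_remote:
            continue  # remote plugins only run when -r is set
        if not available.intersection(plugin.artifacts):
            continue  # assumption: skip plugins with nothing to analyze
        selected.append(plugin)
    return selected
```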
Hindsight’s last phase generates the report detailing its findings. It shows the number of rows in the final report and its progress as it writes them to disk. Once Hindsight has finished creating the report, the finish time is displayed on the screen and the program exits. The .xlsx report can be found in the same directory as hindsight.pl and can be viewed with a number of common spreadsheet applications.
Reading the Report
The main view in a Hindsight report is the ‘Activity’ tab. This tab is a timeline of all the records Hindsight was able to pull from the Chrome history files, sorted from oldest to newest. The record types are color-coded to make the timeline easier to digest at a glance.
The first three columns, Type, Timestamp, and URL, are self-explanatory. Every row in the timeline has Type and Timestamp values, and most have a URL as well. The next two columns, Title / Name / Status and Data / Value / Path, are a little more complicated, because their contents depend on the type of record.

In general, the Title / Name / Status field describes the data in the Data / Value / Path field:

- For Local Storage and Cookie records, Title / Name / Status holds the name of the cookie and Data / Value / Path holds its content.
- Autofill records are similar: Title / Name / Status holds the name of the input field and Data / Value / Path holds the entered value.
- URL records have the page’s title in Title / Name / Status and any indexed text from that page in Data / Value / Path.

Collapsing these different fields into two columns makes the report easier to work with. An investigator can scan down the timeline and see all the relevant information in one stream, rather than scrolling across a dozen columns or switching to a different tab to view a different type of record.
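The idea of collapsing every record type into the same few columns and sorting them into one timeline can be sketched in Python. The record dictionaries, their keys, and the type labels other than url and url (archived) are hypothetical. The point is the mapping onto Type, Timestamp, URL, Title / Name / Status, and Data / Value / Path.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TimelineRow:
    type: str
    timestamp: datetime
    url: str
    title_name_status: str
    data_value_path: str


def to_row(record):
    """Map one hypothetical record dict onto the shared Activity columns."""
    kind = record["type"]
    if kind in ("url", "url (archived)"):
        # Page title, then any indexed page text
        return TimelineRow(kind, record["visit_time"], record["url"],
                           record["title"], record.get("indexed_text", ""))
    if kind in ("cookie", "local storage"):
        # Cookie name, then cookie content
        return TimelineRow(kind, record["time"], record.get("host", ""),
                           record["name"], record["value"])
    if kind == "autofill":
        # Input field name, then the entered value (autofill has no URL)
        return TimelineRow(kind, record["time"], "",
                           record["field_name"], record["value"])
    raise ValueError(f"unknown record type: {kind}")


def build_timeline(records):
    """Normalize all records and sort them oldest to newest."""
    return sorted((to_row(r) for r in records), key=lambda row: row.timestamp)
```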
The next column, Interpretation, is one of the key features of Hindsight. It is the primary place where plugins display their output. Each plugin processes a specific type of record and decodes its content to make it easier to understand. Plugins range from complex to very simple, but regardless of their complexity, each has the potential to save an investigator time by automating previously manual tasks.
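As an example of the kind of decoding a plugin performs, this Python sketch turns a Google Analytics `__utma` cookie value into an Interpretation string. It assumes the classic six-field `__utma` layout, in this order:

- domain hash;
- visitor ID;
- first, previous, and current visit times, in Unix seconds;
- session count.

Times are shown in UTC. This is not Hindsight’s own plugin, and `interpret_utma` is a hypothetical name.

```python
from datetime import datetime, timezone


def interpret_utma(value):
    """Decode a __utma cookie value; return None if it does not fit the assumed layout."""
    parts = value.split(".")
    if len(parts) != 6 or not all(p.isdigit() for p in parts[2:5]):
        return None
    domain_hash, visitor_id, first, previous, current, sessions = parts

    def fmt(seconds):
        # Visit times are Unix epoch seconds; render them in UTC
        return datetime.fromtimestamp(int(seconds), tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

    return (f"Domain Hash: {domain_hash} | Unique Visitor ID: {visitor_id} | "
            f"First Visit: {fmt(first)} | Previous Visit: {fmt(previous)} | "
            f"Last Visit: {fmt(current)} | Number of Sessions: {sessions}")
```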
The remaining columns apply only to URL records:

- Safe? shows how Google’s Safe Browsing service classifies the URL (as malware, phishing, or clean). This column is blank if remote lookups are not allowed.
- Visit Count gives the cumulative number of times the page was visited.
- Typed Count shows the number of times the user typed in the page’s address, rather than clicking on a link.
- URL Hidden indicates (with a 0 or 1) whether Chrome hides the URL from its normal history views, as it typically does for pages loaded only in subframes.
- Transition shows how the user arrived at the page (link, typed, start page, etc.).
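Chrome stores the transition as a single integer. The Python sketch below decodes it, assuming the value follows Chromium’s page transition layout: the low byte holds the core type and the higher bits hold qualifier flags. Some caveats:

- Only some of the qualifiers are listed.
- Core type 6 is labeled “start page”, as in older Chrome releases. This is an assumption; newer releases reuse the value under a different name.
- `decode_transition` is a hypothetical helper.

```python
# Core transition types (low byte), assuming Chromium's page transition layout
CORE_TYPES = {
    0: "link", 1: "typed", 2: "auto bookmark", 3: "auto subframe",
    4: "manual subframe", 5: "generated", 6: "start page",
    7: "form submit", 8: "reload", 9: "keyword", 10: "keyword generated",
}
# A subset of the qualifier bit flags stored above the core type
QUALIFIERS = {
    0x01000000: "forward/back",
    0x02000000: "from address bar",
    0x04000000: "home page",
    0x10000000: "chain start",
    0x20000000: "chain end",
    0x40000000: "client redirect",
    0x80000000: "server redirect",
}
CORE_MASK = 0xFF


def decode_transition(value):
    """Split a stored transition value into (core type, list of qualifiers)."""
    value &= 0xFFFFFFFF  # treat values stored as signed 32-bit integers as unsigned
    core = value & CORE_MASK
    name = CORE_TYPES.get(core, f"unknown ({core})")
    flags = [label for bit, label in QUALIFIERS.items() if value & bit]
    return name, flags
```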
This last section provides a brief overview of the different types of Chrome artifacts that Hindsight can extract information from, as well as the artifacts’ locations on disk and their file formats.
|Location: <Chrome Dir>/History; File Format: SQLite||The ‘History’ file in the Chrome directory is where Chrome stores the core of its browsing records. This SQLite database has a number of tables, but the two that together provide most of the information about visited websites are ‘urls’ and ‘visits’. Hindsight extracts a number of fields for each website visited, including the URL, the page title, the visit count, the time the page was visited, and transition information.|
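The join between ‘urls’ and ‘visits’ can be sketched with Python’s built-in sqlite3 module. The sketch makes two assumptions:

- Only the handful of columns named in the query are used. Real schemas carry more columns and vary by Chrome version.
- Times are microseconds since 1601-01-01 UTC, as Chrome stores them.

`read_visits` is a hypothetical helper. Running it against a copy of the evidence file, rather than the original, is the safer habit.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Chrome history times count microseconds since 1601-01-01 00:00:00 UTC
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

VISITS_QUERY = """
SELECT urls.url, urls.title, urls.visit_count, urls.typed_count, urls.hidden,
       visits.visit_time, visits.transition
FROM visits
JOIN urls ON visits.url = urls.id
ORDER BY visits.visit_time
"""


def webkit_to_datetime(microseconds):
    """Convert a Chrome (WebKit-epoch) timestamp to an aware UTC datetime."""
    return WEBKIT_EPOCH + timedelta(microseconds=microseconds)


def read_visits(con):
    """Return one dict per visit, joining the 'visits' and 'urls' tables."""
    columns = ("url", "title", "visit_count", "typed_count", "hidden", "visit_time", "transition")
    rows = []
    for values in con.execute(VISITS_QUERY):
        row = dict(zip(columns, values))
        row["visit_time"] = webkit_to_datetime(row["visit_time"])
        rows.append(row)
    return rows
```

For example, `read_visits(sqlite3.connect("file:History?mode=ro", uri=True))` would read a copied History file in the current directory without writing to it.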
|Location: <Chrome Dir>/Archived History; File Format: SQLite||The ‘Archived History’ file is similar to the ‘History’ file, but contains records that are over three months old. It has fewer tables than ‘History’, but the key ‘urls’ and ‘visits’ tables are still present and have the same structure. Hindsight extracts the same fields for these older records and places them on the ‘Activity’ timeline as url (archived).|
|Location: <Chrome Dir>/History Index yyyy-mm; File Format: SQLite||The ‘History Index’ files are another very useful artifact. For some sites, Chrome records the text on the web page and saves it in one of these index files. There are four files, each covering one month, named ‘History Index yyyy-mm’ (where yyyy-mm is the year and month, e.g. 2012-04). Along with the text, Chrome records the page title, URL, and timestamp. Hindsight processes the index data and adds it to the existing url and url (archived) records on the ‘Activity’ timeline, in the Data / Value / Path column. Because the index information is timestamped, it can be possible to view the text of a web page at multiple points in time.|
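Picking out the monthly index files by name is straightforward. This Python sketch assumes the ‘History Index yyyy-mm’ naming described here and ignores everything else in the directory, such as SQLite journal files. `parse_index_names` and `find_history_indexes` are hypothetical names.

```python
import re
from pathlib import Path

# Matches names such as "History Index 2012-04" (whole name only, so journals are excluded)
INDEX_NAME = re.compile(r"History Index (\d{4})-(\d{2})")


def parse_index_names(names):
    """Return sorted (year, month, name) tuples for names that are index files."""
    found = []
    for name in names:
        match = INDEX_NAME.fullmatch(name)
        if match:
            found.append((int(match.group(1)), int(match.group(2)), name))
    return sorted(found)


def find_history_indexes(chrome_dir):
    """List the index files in a Chrome directory, oldest month first."""
    chrome_dir = Path(chrome_dir)
    names = [p.name for p in chrome_dir.iterdir() if p.is_file()]
    return [(year, month, chrome_dir / name) for year, month, name in parse_index_names(names)]
```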
|Location: <Chrome Dir>/Bookmarks; File Format: JSON||Chrome stores its bookmark information in a JSON file in the root of the Chrome directory. From this file, Hindsight extracts the name of each bookmark, the bookmark’s URL, the folder(s) the bookmark was saved in, and the date it was added. The tool also extracts when each bookmark folder was created, and adds both record types to the ‘Activity’ timeline.|
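A recursive walk over the Bookmarks file can be sketched in Python. It assumes a top-level "roots" object whose entries are folder nodes. Each node carries "type", "name", "children", and "date_added", where "date_added" is microseconds since 1601-01-01, stored as a string. `read_bookmarks` and `walk_bookmarks` are hypothetical names.

```python
import json
from datetime import datetime, timedelta, timezone

# Chrome bookmark times count microseconds since 1601-01-01 00:00:00 UTC
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def walk_bookmarks(node, folders=()):
    """Yield (kind, name, url, folder path, date added) for a node and its children."""
    added = WEBKIT_EPOCH + timedelta(microseconds=int(node.get("date_added", "0")))
    name = node.get("name", "")
    if node.get("type") == "url":
        yield ("bookmark", name, node.get("url", ""), "/".join(folders), added)
    elif node.get("type") == "folder":
        path = folders + (name,)
        yield ("bookmark folder", name, "", "/".join(path), added)
        for child in node.get("children", []):
            yield from walk_bookmarks(child, path)


def read_bookmarks(text):
    """Parse the Bookmarks JSON text and return every bookmark and folder record."""
    data = json.loads(text)
    records = []
    for root in data.get("roots", {}).values():
        if isinstance(root, dict):  # skip any non-folder entries under "roots"
            records.extend(walk_bookmarks(root))
    return records
```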
|Location: <Chrome Dir>/Web Data; File Format: SQLite||The Chrome ‘Web Data’ file is a SQLite database with a wealth of valuable information. It has a number of interesting tables, but some of the most useful to an investigator are those relating to autofill data. Autofill is a Chrome feature intended to help users by remembering data they entered into forms. When a user visits the same website again (or a different website with a similarly named input field), Chrome automatically fills out the form with the user’s previous answers. Hindsight extracts the name of the input field and the saved value, as well as the time the value was used. No domain information is stored about which website the autofill data was used on; however, ordering the autofill and URL data by time makes it easy to see which website the autofill data is likely associated with.|
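The time-ordering approach for attributing autofill entries can be sketched as a binary search for the most recent URL visit at or before each autofill timestamp. This Python sketch is a heuristic, not proof of association. The tuple layouts and `likely_sites` are hypothetical, and all timestamps are assumed to be in the same time zone.

```python
from bisect import bisect_right


def likely_sites(autofill_entries, url_visits):
    """Pair each autofill entry with the latest URL visited at or before it.

    autofill_entries: iterable of (time, field_name, value) tuples
    url_visits: iterable of (time, url) tuples, in any order
    """
    visits = sorted(url_visits)
    times = [when for when, _ in visits]
    paired = []
    for when, field_name, value in autofill_entries:
        i = bisect_right(times, when)  # number of visits at or before this time
        site = visits[i - 1][1] if i else None  # None: no earlier visit recorded
        paired.append((when, field_name, value, site))
    return paired
```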
|Location: <Chrome Dir>/History; File Format: SQLite||Chrome stores records of the files a user has downloaded in the ‘downloads’ table in the ‘History’ file. Hindsight extracts the URL the file was downloaded from, the full local path it was saved to, and the number of bytes received vs. the total file size, and places this record on the ‘Activity’ timeline at the time the download started.|
|Location: <Chrome Dir>/Local
|Local Storage is a common name for part of HTML5 Web Storage. It can be thought of as the newest version of cookies, and it serves the same purpose as “normal” cookies: enabling websites to store persistent data locally. This new iteration improves on cookies in many ways, including the amount of data each site can store (from around 4KB with old HTTP cookies to about 10MB with HTML5 Local Storage). Chrome implements Local Storage by creating a .localstorage file in the ‘Local Storage’ directory for each website that elects to use it. Each .localstorage file is a SQLite database that holds all of that site’s key/value pairs. Because no temporal information is stored in the database, Hindsight uses the last accessed time of the .localstorage file itself to place the Local Storage records on the ‘Activity’ timeline. If a website’s .localstorage file holds multiple key/value pairs, Hindsight creates a separate timeline entry for each one, all with the last accessed time of the file.|
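Reading the key/value pairs and applying the file’s last accessed time can be sketched in Python. It assumes each .localstorage database holds its pairs in a table named ItemTable with key and value columns, and that values are stored as UTF-16LE blobs. `read_local_storage` is a hypothetical helper. Last accessed times are only as reliable as the file system and collection method that preserved them.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def read_local_storage(storage_dir):
    """Return (origin, key, value, last accessed) tuples, one per key/value pair."""
    records = []
    for path in sorted(Path(storage_dir).glob("*.localstorage")):
        # Read the last accessed time before opening the file, since reading it can update atime
        accessed = datetime.fromtimestamp(path.stat().st_atime, tz=timezone.utc)
        # Open read-only so the evidence file is not modified
        con = sqlite3.connect(path.resolve().as_uri() + "?mode=ro", uri=True)
        try:
            for key, value in con.execute("SELECT key, value FROM ItemTable"):
                if isinstance(value, bytes):  # assumption: values are UTF-16LE blobs
                    value = value.decode("utf-16-le", errors="replace")
                records.append((path.stem, key, value, accessed))
        finally:
            con.close()
    return records
```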