The OCR: Specimen Box

source: creativeapplications.net

Following their incisive examinations of online advertising ecosystems and of ad profiling based on browser activity, the Office for Creative Research recently completed an interactive, touchscreen-based work that visualizes (and sonifies) botnet activity using data collected by Microsoft’s Digital Crimes Unit. Botnets are nebulous, distributed entities that span thousands of PCs, so to shed some light on these abstract networks, the tool represents their aggregate geographic and temporal patterns. In their project writeup, OCR frames Specimen Box as an…

…exploratory tool that allows DCU’s investigators to examine the unique profiles of various botnets, focusing on the geographic and time-based communication patterns of millions of infected machines. Specimen Box enables investigators to study a botnet the way a naturalist might examine a specimen collected in the wild: What are its unique characteristics? How does it behave? How does it propagate itself? How is it adapting to a changing environment?
source: o-c-r.org

What is your computer doing when you’re not watching?

You might not personally be in the business of identity theft, spam delivery, or distributed hacking, but there’s a decent chance that your computer is. “Botnets” are criminal networks of computers that, unbeknownst to their owners, are being put to use for any number of nefarious purposes. Across the globe, millions of PCs have been infected with software that conscripts them into one of these networks, silently transforming these machines into accomplices in illegal activities and putting their users’ information at risk.

Microsoft’s Digital Crimes Unit has been tracking and neutralizing these threats for several years. In January, DCU asked The Office for Creative Research to explore novel ways to visualize botnet activity. The result is Specimen Box, a prototype exploratory tool that allows DCU’s investigators to examine the unique profiles of various botnets, focusing on the geographic and time-based communication patterns of millions of infected machines.

Specimen Box enables investigators to study a botnet the way a naturalist might examine a specimen collected in the wild: What are its unique characteristics? How does it behave? How does it propagate itself? How is it adapting to a changing environment?

Specimen Box combines visualization and sonification capabilities in a large-screen, touch-based application. Investigators can see and hear both live activity and historical ‘imprints’ of daily patterns across a set of 15 botnets. Because every botnet has its own unique properties, the visual and sonic portraits generated by the tool offer insight into the character of each individual network.

Specimen Box offers three ‘views’ which can be used to analyze botnet data, each offering a different perspective.

In the first, ‘Board View’, all of the botnets are displayed as spheres, and every incoming message from an infected computer (more than 2,000 messages per second) is visible as a colored dot moving into the sphere. Each botnet is also assigned its own audible ‘signature’ pulse, so that the balance of activity across the botnets is represented in the balance of the overall soundscape. Additionally, geographic activity is conveyed through spoken city names, letting the user experience the diversity of locations from which the infected machines are ‘calling home’.
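The writeup does not describe how the soundscape is computed. As a minimal sketch under assumed inputs, the Python below shows one way the balance of activity could drive per-botnet pulse volume: each botnet’s gain is its message count in a time window relative to the busiest botnet, with a small floor so quiet networks stay faintly audible. The function `pulse_gains`, its `min_gain` parameter, and the input format (one botnet identifier per message) are all hypothetical.

```python
from collections import Counter


def pulse_gains(messages, min_gain=0.05):
    """Return a gain in [min_gain, 1.0] for each botnet seen in the window.

    `messages` is an iterable of botnet identifiers, one per incoming message
    (hypothetical input format). Gains scale with message count, normalized
    so the busiest botnet plays at full gain.
    """
    counts = Counter(messages)
    if not counts:
        return {}
    busiest = max(counts.values())
    # Floor at min_gain so a nearly idle botnet is quiet rather than silent.
    return {botnet: max(min_gain, count / busiest) for botnet, count in counts.items()}
```

For example, a window with 1,200 messages from one botnet, 600 from a second, and 12 from a third yields gains of 1.0, 0.5, and the 0.05 floor.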

The second view, ‘Portrait View’, allows for a deep analysis of an individual botnet’s activity. Circular ‘retina plots’ show the complete activity of a botnet over a default time period of 24 hours, displaying up to 500 million messages in a single visualization. The IP addresses of the infected computers are arranged in a circular array. Using an innovative radial selection interface, investigators can select and zoom into any section of this data, allowing a quick drill-down to the IP level.
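The writeup does not say exactly how messages are placed within a retina plot. Purely as an assumption for illustration, the sketch below gives each IP an angular slot around the circle and maps time within the 24-hour window to radius, then shows how a radial (angular) selection could pick out a range of IPs for drill-down. `retina_position`, `radial_selection`, and the radius parameters are hypothetical.

```python
import math


def retina_position(ip_index, n_ips, seconds_since_start,
                    window_seconds=24 * 3600, inner_r=0.2, outer_r=1.0):
    """Map one message to (x, y) in a hypothetical retina-plot layout.

    Assumption: each IP owns an angular slot, and time runs outward from
    inner_r (start of the window) to outer_r (end of the window).
    """
    theta = 2 * math.pi * (ip_index + 0.5) / n_ips  # center of this IP's slot
    t = min(max(seconds_since_start / window_seconds, 0.0), 1.0)  # clamp to window
    r = inner_r + t * (outer_r - inner_r)
    return r * math.cos(theta), r * math.sin(theta)


def radial_selection(n_ips, start_angle, end_angle):
    """Return indices of IPs whose slot center lies in [start_angle, end_angle) radians."""
    selected = []
    for i in range(n_ips):
        theta = 2 * math.pi * (i + 0.5) / n_ips
        if start_angle <= theta < end_angle:
            selected.append(i)
    return selected
```

A selection that crosses the 0/2π seam would need to be split into two ranges; this sketch does not handle that case.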

By default, the IP addresses are sorted around the circle by the level of communication activity. The huge data set has been optimized to allow researchers to instantly re-sort the IPs by longitude or by similarity. “Longitude Sort Mode” arranges the IPs geographically from east to west, while “Similarity Sort Mode” groups together IPs that have similar activity patterns over time, allowing analysts to see which groups of machines within the botnet are behaving the same way. These similarity clusters may represent botnet control groups, research activity from universities or other institutions, or machines with unique temporal patterns such as printers.
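OCR does not describe how similarity is computed. The Python sketch below shows one plausible way to produce the three orderings, assuming each IP carries a longitude and an hourly activity profile. The activity and longitude sorts follow directly from the description; for similarity it uses a simple stand-in, a greedy nearest-neighbor chain over cosine similarity of the hourly profiles, which tends to place IPs with matching daily rhythms next to each other. `IpRecord` and all function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class IpRecord:
    ip: str
    longitude: float     # degrees, east positive
    hourly: list[float]  # message counts per time bin (hypothetical profile)

    @property
    def total(self) -> float:
        return sum(self.hourly)


def sort_by_activity(records):
    # Default mode: most active IPs first.
    return sorted(records, key=lambda r: r.total, reverse=True)


def sort_by_longitude(records):
    # East to west: larger (more easterly) longitudes first.
    return sorted(records, key=lambda r: r.longitude, reverse=True)


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sort_by_similarity(records):
    """Greedy chain: each next IP is the remaining one whose profile is most
    similar (cosine) to the previous IP's profile."""
    remaining = sort_by_activity(records)  # start from the busiest IP
    if not remaining:
        return []
    ordered = [remaining.pop(0)]
    while remaining:
        last = ordered[-1].hourly
        best = max(range(len(remaining)),
                   key=lambda i: _cosine(last, remaining[i].hourly))
        ordered.append(remaining.pop(best))
    return ordered
```

A greedy chain is quadratic in the number of IPs, so at the scale described (millions of machines) a real tool would likely precompute orderings or cluster first; the sketch only illustrates the idea.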

One of the key features of Portrait View is the sonification of botnet activity. The density of the data within these plots means that it can be hard for our eyes to detect patterns or interesting anomalies. However, our ears have a much higher temporal resolution than our eyes, meaning that we’re able to hear things that we might not otherwise notice.

A visible “playhead” continually sweeps the selected data. The user can control the direction of the sweep, allowing it to move either according to time or IP address. Through an exploratory process of selecting, zooming, and altering the characteristics of the sonification playhead, users can discover behavioral and geographic patterns within the data, aiding their efforts to fully understand the nature of each botnet.
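As a rough illustration of the sweep, not OCR’s implementation, the Python sketch below treats the selected data as a matrix of message counts with one row per IP and one column per time bin. Sweeping according to time steps through columns; sweeping according to IP address steps through rows. Each step yields a normalized level that a synthesizer could map to volume or pitch. The `sweep` function and its `axis` parameter are hypothetical.

```python
def sweep(matrix, axis="time"):
    """Yield (step, level) pairs as a hypothetical playhead sweeps `matrix`.

    matrix[i][t] is the message count for IP i in time bin t.
    axis="time": each step is one time bin, summed over the selected IPs.
    axis="ip":   each step is one IP, summed over the selected time range.
    Levels are normalized to [0, 1] against the loudest step.
    """
    if axis == "time":
        levels = [sum(col) for col in zip(*matrix)]
    elif axis == "ip":
        levels = [sum(row) for row in matrix]
    else:
        raise ValueError("axis must be 'time' or 'ip'")
    peak = max(levels, default=0)
    for step, level in enumerate(levels):
        yield step, (level / peak if peak else 0.0)
```

For a selection `[[0, 4, 0], [0, 4, 2]]`, a time sweep produces levels 0.0, 1.0, and 0.25: the burst in the middle bin stands out as a loud beat.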

Botnets’ activity sometimes changes dramatically from day to day, or from month to month, so analysts will want to compare a botnet’s activity on different days, or compare two botnets on the same day. To facilitate this, we developed a feature we call ‘cleaving’, which allows the user to split the portrait view at any time using a simple multi-touch gesture. The split screens can be merged again by reversing the gesture.

In the third view, ‘Graph View’, users can examine the character of individual botnets by plotting them on graphs with one, two, or three axes. Again, users can cleave these visualizations to facilitate comparison. An intuitive interface makes it simple to add and remove axes and variables, so the data can be compared across various dimensions.

With Specimen Box, The OCR set out to visualize a vast and dynamic set of data in a way that engaged with the character of the system from which it came: that of botnets. We wanted to allow for interaction with this data at greatly different scales, using multi-touch gestures and advanced computation techniques to facilitate a responsive, seamless experience. We also focused on bridging two very different data representation techniques, sonification and visualization, allowing each to do what it does best. Ultimately, the core goal for the tool is to facilitate question farming: we want to give the DCU an instrument that will allow them to ask critical questions about botnets, the people who build them, and the people who are affected by them.