The Internet of Things (IoT) is one of the fastest growing fields in the tech industry. New devices are constantly being invented and introduced into the market. According to Statista, there are expected to be “30.73 billion connected IoT devices in the world by 2020.” With so many new devices being introduced, it is inevitable that some of them will have vulnerabilities at some point in their product lifecycle. Many of these devices pose a risk to users and to the networks to which they are connected. As many of Viasat’s customers will be adopting these IoT devices, Viasat had two intern teams working together on different aspects of IoT security. My intern team identifies the devices on a network so that traffic can be associated with them, giving our customers better visibility and protection. Our partner team is focused on detecting anomalies in network traffic behavior from these devices. This article summarizes both our teams’ efforts and results.
Sample of IoT devices in our lab
Please note that no new vulnerabilities were discovered or sought during the course of this project (although we would have conformed to responsible disclosure practices). To the extent that any device exhibited anomalous behavior, this was due to the deliberate injection of anomalous traffic for the purpose of verification of detection algorithms.
A Quick Note about Privacy
Viasat takes the privacy of our subscribers very seriously. To that end, it’s important to stress that this project does not look at any personal information, and we are not associating any traffic coming from IoT devices with any specific customer. We’re analyzing device behavior and overall network anomaly effects so that we are better able to protect our networks and the customers who use them.
Device Identification for Network Contextualization
Being able to contextualize network traffic is extremely important. Network contextualization enables Viasat to understand the type and amount of traffic to expect from the network. It also enables Viasat to better protect the network from potential vulnerabilities. The best way to gain this network understanding is by sorting out network traffic by device.
With this in mind, our team is tracking and classifying devices as they come on to the network. We are then feeding this information to a partner team which is detecting anomalous behavior on a per-device basis.
Our team monitors DHCP traffic and detects when new IoT devices request an IP address to join the network. Once we identify a new device on the network, we observe the network traffic it generates to determine what kind of device it is. We noticed a few distinct characteristics as we began investigating how to approach device classification. Usually, when a device first connects, it will “phone home” in order to configure itself and check for system updates. Often, this “phone home” will reveal the company or manufacturer that made the device. For example, an Amazon Alexa might reach out to an Amazon domain, or a Fitbit to a Fitbit domain. Additionally, we check who registered the domains that these devices reach out to in order to gain more company information. Further, many devices have pre-configured hostnames such as “iPhone 8” or “Echo” that can reveal the device identity. These lookups give us some context about each device we identify. At the least, we are able to identify the IP address and MAC address of a device so that the anomaly detection team can group traffic by device.
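The lookups described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the `Device` record, the two lookup tables, and the function names are all hypothetical, and the sketch assumes the hostname and contacted domains have already been extracted from DHCP and DNS traffic.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical lookup tables for illustration only. A real deployment would
# need much larger, maintained lists, plus registrant lookups for unknown domains.
DOMAIN_VENDORS = {
    "amazon.com": "Amazon",
    "fitbit.com": "Fitbit",
}
HOSTNAME_HINTS = {
    "echo": "Amazon",
    "iphone": "Apple",
}


@dataclass
class Device:
    mac: str
    ip: str
    hostname: str = ""
    domains: list = field(default_factory=list)  # domains contacted after joining


def vendor_from_domain(domain: str) -> Optional[str]:
    """Match a contacted domain against known vendor domains by suffix."""
    domain = domain.lower().rstrip(".")
    for suffix, vendor in DOMAIN_VENDORS.items():
        if domain == suffix or domain.endswith("." + suffix):
            return vendor
    return None


def guess_vendor(device: Device) -> Optional[str]:
    """Check "phone home" domains first, then fall back to the hostname."""
    for domain in device.domains:
        vendor = vendor_from_domain(domain)
        if vendor:
            return vendor
    name = device.hostname.lower()
    for hint, vendor in HOSTNAME_HINTS.items():
        if hint in name:
            return vendor
    return None  # unknown vendor: the device is still trackable by MAC and IP
```

Matching on a full domain suffix (rather than a substring) avoids attributing a lookalike domain to the wrong vendor. When nothing matches, the MAC and IP addresses are still enough to group traffic by device.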
The initial device classification displayed on our web app.
Once devices have been identified, we want to be able to classify them as specifically as possible. We achieved 95% accuracy with a random forest classifier for specific device typing. This figure may reflect overfitting, since we did not have access to as much data as we would have liked. For this reason, we designed a process to crowdsource data collection. When our program is installed on a local network, it performs the device identification and then sends the device’s features to the cloud to be typed. In the cloud, we will continue to tweak the random forest model so that it performs optimally. Once the device has been classified, we can store this data so that Viasat can monitor its traffic. We also store the data so that when another customer has the same device, we can classify it quickly.
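To make the device typing step concrete, here is a minimal sketch of training a random forest with scikit-learn. The data is synthetic and the three features (mean packet size, flows per minute, distinct domains contacted) are assumptions made up for illustration; they are not the features or the data our team used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: two made-up device types, each described by three
# hypothetical features (mean packet size, flows per minute, distinct domains).
n = 200
cameras = rng.normal(loc=[900.0, 30.0, 3.0], scale=[50.0, 5.0, 1.0], size=(n, 3))
speakers = rng.normal(loc=[300.0, 5.0, 10.0], scale=[50.0, 2.0, 2.0], size=(n, 3))
X = np.vstack([cameras, speakers])
y = np.array(["camera"] * n + ["speaker"] * n)

# Hold out a test split so accuracy is not measured on the training data,
# which is one basic guard against an overly optimistic accuracy figure.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

With a small dataset, a single held-out split can still be misleading, which is part of why collecting data from many networks matters.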
The anomaly detection team is taking an unsupervised machine learning approach to detecting anomalous network activity. We ingested network flow datasets from a 1999 intrusion detection competition. We also leveraged a more recent 2017 dataset provided by the University of New Brunswick, which features a variety of attacks injected at random intervals. For even more data, we injected anomalous traffic into some of the IoT devices in our lab (shown above) and used the resulting data for training. Next, we evaluated several different models.
Our k-means approach focused on the flow features with the largest behavioral variance, which we determined using Principal Component Analysis (PCA). This allowed us to group specific flow patterns and cluster the traffic flows based on their most important behaviors. DDoS flows mostly fell very close to the cluster centroids, whereas port scan and botnet flows fell at varying distances from the centroids. In summary, k-means allowed us to cluster DDoS traffic effectively, but other malicious activity such as port scans and botnets didn’t group as clearly.
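A rough sketch of this PCA plus k-means pipeline, using scikit-learn, is shown below. The flow data is random noise standing in for real flow records, and the feature count, the two retained components, and the three clusters are arbitrary assumptions for illustration rather than the settings we used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for flow records: 300 flows with 6 made-up numeric
# features (for example duration, bytes, packets). Real features would differ.
flows = rng.normal(size=(300, 6))

# Standardize so no single feature dominates the variance, then keep the
# two principal components that explain the most variance.
scaled = StandardScaler().fit_transform(flows)
reduced = PCA(n_components=2, random_state=0).fit_transform(scaled)

# Cluster the flows in the reduced space (the cluster count is arbitrary here).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(reduced)

# transform() gives each flow's distance to every centroid; the minimum is
# the distance to the nearest centroid, i.e. the flow's own cluster.
distances = kmeans.transform(reduced).min(axis=1)
print(distances[:5])
```

Inspecting `distances` per traffic class is one way to see the pattern described above, with some classes sitting tightly around a centroid and others spread out.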
We also evaluated several other models, including Isolation Forest (iForest), Hierarchical Temporal Memory, and autoencoders. The best of these was iForest combined with an autoencoder, which produced a 90% true positive rate and a 25% false positive rate. That was a relatively good benchmark for true positives, but the false positive rate was much too high for our use case. With the insight gained from these models, we were able to design a much better prototype.
The confusion matrix for our iForest / autoencoder model.
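For readers less familiar with these metrics, the snippet below shows how a true positive rate and a false positive rate are computed from confusion matrix counts. The counts are hypothetical, chosen only so that they reproduce 90% and 25%; they are not the values from our matrix.

```python
def rates(tp: int, fn: int, fp: int, tn: int) -> tuple:
    """Return (true positive rate, false positive rate) from confusion matrix counts."""
    tpr = tp / (tp + fn)  # share of truly anomalous flows that were flagged
    fpr = fp / (fp + tn)  # share of benign flows that were wrongly flagged
    return tpr, fpr


# Hypothetical counts: 100 anomalous flows and 1,000 benign flows.
tpr, fpr = rates(tp=90, fn=10, fp=250, tn=750)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")  # prints "TPR=0.90, FPR=0.25"
```

In this made-up example, 250 of the 340 flagged flows are false alarms. When benign traffic far outnumbers malicious traffic, even a moderate false positive rate swamps the real detections.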
Our final prototype is an ensemble of a Gaussian mixture model, hierarchical density-based clustering, and a deep learning autoencoder. The autoencoder is similar to PCA in that it helped us find patterns in the data that weren’t obvious at first. After rigorous testing, we selected the Gaussian mixture model. It gave us a benchmark of a 98% true positive rate and only a 3% false positive rate.
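As a rough illustration of how a Gaussian mixture can flag anomalies, the sketch below fits scikit-learn’s `GaussianMixture` to synthetic “normal” traffic and flags points with unusually low likelihood. The two-dimensional features, the two components, and the 1% cutoff are all assumptions for illustration; this is not our prototype or its configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic "normal" traffic: two made-up behavioral modes in a 2-D feature space.
normal = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(500, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(500, 2)),
])

# Fit the mixture without any attack labels (unsupervised).
gmm = GaussianMixture(n_components=2, random_state=0).fit(normal)

# score_samples() returns each point's log-likelihood under the mixture.
# Flag anything less likely than the lowest 1% of training points
# (the 1% cutoff is an arbitrary choice for this sketch).
threshold = np.percentile(gmm.score_samples(normal), 1)


def is_anomalous(points: np.ndarray) -> np.ndarray:
    """Return a boolean array: True where a point falls below the threshold."""
    return gmm.score_samples(points) < threshold


samples = np.array([[0.1, -0.2], [2.5, 2.5], [20.0, -20.0]])
print(is_anomalous(samples))
```

The first sample sits inside one of the learned modes, while the other two lie far from both. The threshold controls the trade-off between catching more anomalies and raising more false alarms.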
Graphs of the amount of anomalous activity over time, by device.
Putting it all Together
Both teams achieved satisfying individual results. However, the intersection of the work is where the project becomes much more meaningful. Combining these projects in the future will give a very clear picture of the network and its safety. The device identification team can determine which devices are connected, which lets the anomaly detection team monitor those devices for anomalous behavior. From there, we can make decisions to block or re-route certain traffic from a device to prevent malicious flows. This keeps customers safe and helps Viasat secure the network and preserve bandwidth. Further, we can generalize certain patterns across networks: if a certain device always seems to be compromised, we can secure that device across all networks, further improving network efficiency and safety.
To conclude, the intersection of these projects provides an extremely valuable service not only for the customer but also for Viasat, especially in the face of the ever-growing IoT landscape. The webapp prototype below gives an example of data that we have collected and can eventually learn from to protect our customers and our networks.
Our dashboard displaying devices and the corresponding number of flagged flows for each.
Our work provides the groundwork Viasat needs to offer improved network security to its diverse set of customers. This sets the stage for exciting future development, possibly even by our next round of interns! As discussed in the previous section, combining anomaly detection with device typing has clear benefits, and it is one key area of continued research to make our solution even more robust. Ideally, this would be built into customer-facing portals that give customers a user-friendly overview of which devices on a network may be compromised. Viasat employees would get the same information along with the technical details, in order to continually monitor and improve security. Eventually, with the data our program yields, we will be able to detect patterns among devices and use these insights to develop security solutions. These solutions will benefit customers by protecting their devices and personal data, and they will benefit Viasat by keeping customer networks safe.
Acknowledgements & Internship Experience
I couldn’t give my internship experience at Viasat a high enough rating. From the project I got to work on, to the people I was working with, and all the events Viasat had for us outside of work, it was an amazing summer. Specifically, my fellow interns were smart, supportive, and very friendly. I had two awesome project managers who guided us through the project while also giving us the freedom to explore. Many others invested their time in us and I was very grateful to collaborate with so many bright, motivated people.
If you are interested in a Viasat internship, check out: https://careers.viasat.com/careers/Interns