Updated September 7, 2017
A snapshot of the tutorial slides is here. The R script (in a Word document) is here. The data file for the R script is here. The R packages to install are listed below under Requirements.
Abstract
Applying analytics methods to data collected from communication networks yields valuable information about the network state and helps detect and predict anomalous behavior in the network. Large volumes of operational data, including textual and log files, are collected from communication networks. The inherent value of this data is recognized by academics, practitioners, and network operators alike, who need easy-to-use and robust methods to detect and analyze anomalies in network data. These anomalies are indicators of vulnerabilities in the network. From an operations perspective, it is important to detect anomalies and, once the root cause is known, correct the underlying problem in a timely manner. The goal of the tutorial is to deliver a well-balanced mix of theory and hands-on practice.
The first part of the tutorial introduces analytics methods for network anomaly detection. Next, a real-world case study is presented that applies non-parametric machine learning techniques to detect anomalies, and uses neural-network-based Kohonen Self-Organizing Maps (SOMs) and visual analytics to explore anomalous behavior in wireless networks. Data from a 4G network will be used for the analyses. The case study is significant because communications traffic on wireless networks continuously generates large volumes of log metadata, with hundreds of fields including error codes, across the various servers involved in a communication session. The second half of the tutorial (an approximate split that may be adjusted) will be a hands-on session in which attendees are guided through the analysis of real log data using the techniques described above, in particular Kohonen SOMs. The hands-on session will focus on exploratory data analysis and modeling approaches using the provided datasets, and will be conducted using the R software environment, the RStudio user interface for R, and various R packages.
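To give a feel for how a Kohonen SOM can surface unusual records, here is a minimal sketch using the kohonen package. It is an illustration under stated assumptions, not the tutorial's actual script: the data are synthetic stand-ins for numeric log features (not the 4G data), and the grid size, number of training iterations, and 97.5% threshold are arbitrary choices made for the example. Each row is scored by its distance to its best-matching map unit, and rows with unusually large distances are flagged for inspection.

```r
library(kohonen)

set.seed(42)  # reproducible synthetic data and SOM initialisation

# Synthetic stand-in for numeric log features: 200 "typical" rows plus
# 5 rows shifted away from the bulk. This is NOT the tutorial's 4G data.
typical <- matrix(rnorm(200 * 4), ncol = 4)
shifted <- matrix(rnorm(5 * 4, mean = 6), ncol = 4)
X <- scale(rbind(typical, shifted))

# Train a small hexagonal SOM; grid size and rlen are arbitrary choices.
grid  <- somgrid(xdim = 5, ydim = 5, topo = "hexagonal")
model <- som(X, grid = grid, rlen = 200)

# Distance from each row to its best-matching unit. A large distance
# means the map represents that row poorly.
score <- model$distances

# Flag rows whose distance exceeds an (assumed) 97.5% quantile cut-off.
threshold <- quantile(score, 0.975)
flagged   <- which(score > threshold)

print(flagged)
```

Calling `plot(model, type = "counts")` or `plot(model, type = "dist.neighbours")` afterwards gives a visual view of how observations are distributed over the map. Because a SOM may devote a unit to a tight cluster of unusual rows, distance to the best-matching unit is only one possible score; sparsely populated units are also worth inspecting.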
The target audience for this tutorial is novice to moderately skilled users who have an interest in anomaly detection, machine learning, and/or visual analytics and who want to learn to use R for these applications.
Requirements
For the hands-on portion of the tutorial, attendees must install the following software on their laptops: R (https://www.r-project.org/), RStudio (https://www.rstudio.com/), and the following R packages (including dependencies):
- rmarkdown
- knitr
- dplyr
- kohonen
- dummies
- ggplot2
- sp
- reshape2
- RColorBrewer
- magrittr
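One way to install the packages listed above from the R console is sketched below. It assumes a CRAN mirror is already configured (RStudio sets one by default) and that every package is still available from that mirror, which may not hold for older packages. The variable names `required` and `to_install` are just illustrative.

```r
# Packages listed in the Requirements section.
required <- c("rmarkdown", "knitr", "dplyr", "kohonen", "dummies",
              "ggplot2", "sp", "reshape2", "RColorBrewer", "magrittr")

# Keep only the packages that are not already installed.
to_install <- setdiff(required, rownames(installed.packages()))

# Install the missing packages together with their dependencies.
# The interactive() guard avoids triggering downloads when the script
# is run non-interactively (for example with Rscript).
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, dependencies = TRUE)
}

# Report anything still missing so it can be handled by hand.
print(setdiff(required, rownames(installed.packages())))
```

If a package is reported as unavailable, it may have been archived since the tutorial was written and would need to be installed from an archive or replaced.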
About the instructor
Dr Veena B. Mendiratta (veena.mendiratta@gmail.com)
Veena Mendiratta is the research lead for network reliability and analytics at Bell Labs, Nokia, based in Naperville, Illinois, USA. Her research interests include telecom data analytics, system and network dependability analysis, software reliability engineering, and the resiliency of programmable networks (SDN). Her current research focuses on network reliability and analytics: architecting and modeling the reliability of next-generation programmable networks, and developing analytics-based anomaly detection algorithms for improving network performance and reliability. She is a member of the SIAM Visiting Lecturer Program, a Life Member of SIAM, a Senior Member of IEEE, a Member of INFORMS, a Fulbright Specialist, and a TPC member for several IEEE conferences. She holds a B.Tech in engineering from the Indian Institute of Technology, India, and a Ph.D. in operations research from Northwestern University, USA.