Monitoring networks and services is crucial for quality and security, which helps explain the rise of Distributed Measurement Systems (DMS): devices deployed across networks that embed monitoring applications, periodically test networks and services, and retrieve measurements that are then used for dashboards, alerting, etc. Examples of DMS range from private infrastructures deployed by ISPs to measure end users' QoE (e.g. SamKnows, IpLabel, home-made solutions) to large-scale public infrastructures (e.g. RIPE Atlas, CAIDA Ark) that can be used for Internet tomography studies.
Designing a DMS is challenging, as it must scale and provide reliable measurements. In particular, when many applications are collocated on the same machines, one has to make sure they do not compete for resources while executing, so as not to bias the collected measurements. To this end, we propose NMaaS, an open-source, publicly available platform for deploying and managing containerized measurement applications on a pool of physical machines.
An NMaaS instance is accessible via an online application that allows users to: choose monitoring applications from a predefined catalog and deploy them on machines in the network; visualize and manage their pool of machines, as well as the monitoring applications deployed on them; and examine the measurement results and the alerts raised. The first catalog contains: an IP spoofing detection app; a web (resp. streaming) QoS measurement app; and a web (resp. streaming) cartography app. The goal is then to encourage users to propose new apps to be added to the catalog. Furthermore, we propose and develop a scheduler for NMaaS that ensures collocated monitoring applications do not compete for resources while executing, so that the collected measurements are not biased.
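As a rough illustration of the scheduling goal described above, the following sketch places measurement tasks on machines so that no two tasks run at the same time on the same machine. It is not the NMaaS scheduler: the `Task` class, the `schedule` function, the task durations, the machine names and the greedy placement strategy are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A hypothetical measurement task; duration is in seconds."""
    name: str
    duration: int


def schedule(tasks, machines, period):
    """Greedy placement: each task goes to the machine that frees up first.

    Tasks on the same machine run one after another, so they never overlap.
    Raises ValueError if a task cannot finish within the measurement period.
    """
    next_free = {machine: 0 for machine in machines}
    plan = []
    # Placing the longest tasks first tends to balance the load better.
    for task in sorted(tasks, key=lambda t: t.duration, reverse=True):
        machine = min(machines, key=lambda m: next_free[m])
        start = next_free[machine]
        end = start + task.duration
        if end > period:
            raise ValueError(f"{task.name} does not fit in a {period}s period")
        plan.append((task.name, machine, start, end))
        next_free[machine] = end
    return plan


# Illustrative values only: three apps, two machines, a 60-second period.
tasks = [Task("ip-spoofing", 30), Task("web-qos", 20), Task("streaming-qos", 40)]
plan = schedule(tasks, ["probe-1", "probe-2"], period=60)
for name, machine, start, end in plan:
    print(f"{name}: {machine} from t={start}s to t={end}s")
```

A real scheduler would also have to account for CPU, memory and bandwidth contention between machines sharing a link, which this sketch ignores.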
In terms of implementation, the applications run as Docker containers in a Kubernetes cluster, which orchestrates their deployment across the nodes following a master-worker pattern. The cluster is set up automatically by Ansible so that nodes can be added easily; it is monitored by Prometheus by means of node exporters and visualized in Grafana to give an overview of the platform. Additionally, Alertmanager sends notifications whenever a specified metric reaches a threshold.
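To make the container-based deployment more concrete, here is a minimal sketch that builds a Kubernetes Deployment manifest for one measurement application and prints it as JSON, a format `kubectl apply -f` accepts. The function name, the application name, the image URL and the resource values are hypothetical; the abstract does not describe the manifests NMaaS actually uses.

```python
import json


def measurement_deployment(name, image, cpu="250m", memory="128Mi"):
    """Build a Deployment manifest (apps/v1) for a single measurement app.

    Requests and limits are set to the same values so the container gets a
    fixed share of CPU and memory; the values themselves are illustrative.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "resources": {
                                "requests": {"cpu": cpu, "memory": memory},
                                "limits": {"cpu": cpu, "memory": memory},
                            },
                        }
                    ]
                },
            },
        },
    }


# Hypothetical app name and registry path, for illustration only.
manifest = measurement_deployment(
    "web-qos", "registry.example.org/nmaas/web-qos:latest"
)
print(json.dumps(manifest, indent=2))
```

Declaring explicit resource requests and limits is one common way to keep collocated containers from starving each other, which is in the spirit of the bias concern raised above.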
In conclusion, NMaaS rationalizes network and service monitoring while scaling and ensuring accurate measurements. The gains are numerous: it is open source and publicly available for the community to use and extend; deployment and use are automated (there is no need to go on site to deploy new measurement applications, which makes the solution lockdown-friendly); applications are easy and quick to integrate into the platform thanks to the catalog system and the container-based architecture; and finally, the ACS-based scheduler enables proper resource allocation. So far, only the platform itself is available; the scheduler and the app catalog are to be released soon.
: Research Engineer at Orange Labs. I received my PhD in 2009, on Internet cartography and BGP routing improvement. My research topics include network monitoring, anomaly detection and automation.
Raquel Rugani Lage
: Research engineer and PhD student at Orange Labs and Telecom SudParis, working on Internet cartography and on the allocation and scheduling of measurement applications to improve IP network security.
Bryan To Van Trang
: DevOps engineering student at Orange Labs, working on IP network monitoring and automation.