In my last post, I talked about the fundamentals of the Simple Network Management Protocol (SNMP). In this piece, I would like to expand on that understanding and outline some design principles that will help you piece everything together, along with some real-world context.
I am a huge fan of Linux, for two reasons. The first is that I come from a network administration background where I never had a very decent IT budget and therefore had to make the most of the hardware and software around me; with Linux, you could boot it up on hardware with 32MB of RAM and it would flawlessly handle what SolarWinds requires 4GB of RAM to do. The second reason is the sheer abundance of free, enterprise-grade tools like Observium, Nagios, Ganglia and Zabbix that not only offer a great degree of customization, but also have a vibrant ecosystem of boffins constantly improving them. However, when it comes to rolling out a network monitoring solution, whichever solution you prefer, the following are some important questions to ask yourself:
What do you want to monitor? In my case, I wanted to monitor my entire network infrastructure, i.e. routers, switches, access points, surveillance, end hosts and servers; basically everything. However, the issue with most monitoring solutions is this: the more intelligent a device is, the more objects there are available to monitor, and as you add more clients to your monitoring, your resource requirements (memory, processing power and bandwidth) begin to increase exponentially. For example, a managed switch will most likely expose MIBs that report port statistics, memory and CPU utilization, port status, temperature, etc. If you have 10 such switches, you already have over 50 data streams. Add 50 more clients, and you've most likely gone up to 2,500 streams. Add your servers and other network equipment, and you're most likely sampling over 3,000 Object Identifiers (OIDs). Ideally this is not a problem if you have the processing power to handle it. But as I mentioned in my last post, each time a Network Management System polls an OID, a GET/GET-NEXT/SET request is sent out, and in return a RESPONSE is received from the client. Requests to the same host can be bundled into a single packet if they are issued at the same time. A single SNMP GET request could be as small as 40 bytes before encapsulation, or larger than 1,400 bytes where multiple OIDs are requested. Encapsulation then adds roughly 42 extra bytes of UDP, IP and Ethernet headers to each packet, which can push a large request up against the standard 1,500-byte MTU. In simple terms, if you are polling 50 OIDs on a device every 10 seconds, you are generating several kilobits per second of continuous traffic for that one device alone. This might sound insignificant, but note that my description above is a mere example.
In busy networks, administrators can be sampling up to 100 objects per device, and with hundreds of devices this is as good as copying several MP3 files across the network every 10 seconds, which at peak network utilization could make your network sluggish.
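To get a feel for the numbers, here is a rough back-of-the-envelope calculator. This is only a sketch: the per-exchange wire size is my assumption, built from the ~40-byte GET mentioned above plus the ~42 bytes of UDP/IP/Ethernet headers, doubled for the RESPONSE.

```python
# Rough SNMP polling traffic estimator (illustrative assumptions only).

# Assumed wire size per poll: a ~40-byte single-OID GET PDU plus
# ~42 bytes of UDP/IP/Ethernet headers, and a similar-sized RESPONSE.
BYTES_PER_EXCHANGE = (40 + 42) * 2  # request + response on the wire


def polling_kbps(oids_per_device, devices, interval_s):
    """Steady-state polling traffic in kilobits per second."""
    bytes_per_cycle = oids_per_device * devices * BYTES_PER_EXCHANGE
    return bytes_per_cycle * 8 / interval_s / 1000


# One device, 50 OIDs, polled every 10 seconds: a few kbps.
print(round(polling_kbps(50, 1, 10), 1))

# A busy network: 100 OIDs on each of 300 devices every 10 seconds,
# which climbs into the megabit range.
print(round(polling_kbps(100, 300, 10), 1))
```

The exact figures will vary with PDU bundling and OID sizes, but the shape of the growth is the point: traffic scales linearly with OIDs and devices, and inversely with the polling interval.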
At what rate do you want to monitor each device? Based on the above description, you can probably see how easy it is to congest a network with SNMP traffic. Therefore, after determining what you want to monitor, it is important to define the rate at which you want to poll your devices. Generally, I've learnt that core infrastructure can be polled every 10 to 60 seconds, depending on how critical it is to maintain uptime, while hosts can be polled every 5 minutes or even less often.
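One way to keep such a policy honest is to write it down and total up the polling load it implies. The device classes and intervals below are just my examples from above, not something any particular NMS mandates:

```python
# Example polling tiers: critical core gear gets tight intervals,
# end hosts are polled far less often.
POLL_INTERVALS_S = {
    "core-router": 10,
    "distribution-switch": 30,
    "access-switch": 60,
    "server": 120,
    "end-host": 300,  # every 5 minutes, or even less often
}


def polls_per_minute(inventory):
    """Total poll cycles per minute for a {device_class: count} inventory."""
    return sum(count * 60 / POLL_INTERVALS_S[cls]
               for cls, count in inventory.items())


# A small site: 2 core routers, 10 access switches, 50 end hosts.
print(polls_per_minute({"core-router": 2,
                        "access-switch": 10,
                        "end-host": 50}))
```

Doubling an interval halves that class's contribution, which is why relaxing the host tier is usually the cheapest win.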
At what point should an interesting value become a concern? In a stable network, the majority of SNMP traffic is quite uninteresting; however, there are values, like an overheating cabinet, increasing bandwidth utilization on an uplink port, or a slow network backup, that can be cause for concern. By setting thresholds, your NMS will notify you of odd behaviour on a particular object by triggering an alarm or notification. I usually like to set multiple alarms to represent HIGH and CRITICAL. Typically, I set HIGH to 60% and CRITICAL to 80% for ports, and 40% and 50% for processor utilization. Unlike memory, port utilization and processor utilization should never remain high over extended periods of time. For bandwidth, bursts when utilization is already high could result in network latency or even traffic drops, while for processing, high CPU could cause a critical process to fail, resulting in a crash.
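The two-level scheme I describe can be sketched in a few lines. The 60/80 and 40/50 cut-offs are just the values I quoted above; in practice your NMS provides its own alarm mechanism and you would only be filling in the numbers:

```python
# Two-level threshold check: HIGH and CRITICAL alarms per metric type.
THRESHOLDS = {
    "port_utilization": {"HIGH": 60, "CRITICAL": 80},  # percent
    "cpu_utilization":  {"HIGH": 40, "CRITICAL": 50},  # percent
}


def alarm_level(metric, value_pct):
    """Return 'CRITICAL', 'HIGH' or 'OK' for a sampled percentage."""
    levels = THRESHOLDS[metric]
    if value_pct >= levels["CRITICAL"]:
        return "CRITICAL"
    if value_pct >= levels["HIGH"]:
        return "HIGH"
    return "OK"


print(alarm_level("port_utilization", 72))  # a HIGH warning
print(alarm_level("cpu_utilization", 55))   # a CRITICAL alarm
```

The point of the HIGH tier is to buy you reaction time before the CRITICAL one fires.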
SNMPv1/v2c or SNMPv3? If you’re new to the world of SNMP, I’d recommend starting off with SNMPv1/v2c, as it is easier to set up, and if you mirror your server port, you can sniff the traffic using a tool like tcpdump or Wireshark. Traffic is not encrypted, and if you’re using the defaults, you already have the “public” community name defined; on most devices, all you have to do is enable it and you can get right to monitoring. If you’re a more seasoned user and have a production network where you want better security, SNMPv3 is the better option, as communication between the server and client can be authenticated and encrypted, and only someone with the right credentials can see what you’re polling. However, it does require a little background knowledge to set up.
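As an illustration of the difference in setup effort, with the net-snmp agent on Linux the gap between the two comes down to a few lines of snmpd.conf. The community string and user name here are placeholders, and you would obviously pick your own passphrases:

```
# SNMPv2c: read-only access with the classic "public" community string.
rocommunity public

# SNMPv3: an authenticated (SHA) and encrypted (AES) read-only user.
# createUser is processed once at agent startup and then persisted.
createUser nmsuser SHA "auth-passphrase" AES "priv-passphrase"
rouser nmsuser priv
```

With v2c that single rocommunity line is enough to start polling; the v3 user additionally needs the matching credentials configured on the NMS side, which is where the extra background knowledge comes in.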
Linux or Windows? It all comes down to preference. On the open-source market, there is more choice in the Linux world, and if you don’t have a server to spare, a regular computer with an 80GB HDD, a dual-core processor, 2 to 4GB of RAM and a 1Gbps NIC should work just fine with one of the tools I listed above. If you’re a Windows person, Spiceworks is an excellent tool. It not only has the simplest interface, it is pretty much a click-and-go tool; plus, you also get an Inventory Manager and a Helpdesk Support tool. The downside is, it doesn’t have support for private MIBs, which is not much of an issue, as MIB-II covers most of the functionality everyone uses. In my opinion, if you really want to become an expert at SNMP, which you should, having a tool that allows you to tinker with the very heart of the protocol is essential.
With the foundation I’ve outlined in these two pieces, you can confidently pick up the user guide of any tool and delve right into configuration. All the best!