Stats

It’s really a much bigger story, but the bottom line is that I started thinking about how I could graph the utilization of our various major network segments at the company where I worked. I had long known of MRTG, so I tried it out. I had everything working within two hours. My coworkers and my boss thought it was pretty cool.

Tangentially, one of my coworkers had long been watching server loads with a separate instance of “xload” running for each server on a second monitor on his workstation. He wanted a tool for doing this that didn’t require logging into the servers and manually running the program. I read some more about MRTG, and even discovered some scripts that would make it graph other things besides network bandwidth. However, that was forcing it into a use for which it wasn’t designed.

Enter Cacti, which is a complete replacement for MRTG. In fact, it’s really a generic front end to rrdtool. I got it up and running within a few hours, but it actually took a few days to get it configured to monitor everything we wanted. As a tool, it drops into place very nicely beside Nagios, which is a monitoring/alerting application. We had Nagios configured to check about 50 different services across all platforms, and to shoot us an email if something broke or went outside its expected range.

All of these tools are, of course, open-source. They performed tasks for us for which one other group was paying $800 (for a Nagios workalike), and for which another was paying something like $10,000 (for a product that does the work of both at once). So, this was a no-brainer with my management. All of the back ends for these things were running on an old, spare computer that had been collecting dust. I just whacked SuSE 8.2 on it, which was my distro of choice at the time. On top of this, I scrounged up another old machine that was collecting dust (we had literally dozens of both servers and workstations lying around), stuffed three video cards in it, and produced a monitoring “head,” pictured here. Nagios’ summary screen is on the left, a view of server network utilizations is in the middle, and a view of server loads is on the right.

3-headed Monitoring Computer

After setting all of this up, we noticed a sustained spike in the load on our main server one Friday at noon. I wanted to dig deeper into what was causing the problem. I started to sift through the archived “sar” logs, but that was going to take too long. So, I started to pull the logs into Excel and make some graphs, but I saw that that too was going to take too long. So… I wrote a Perl script to parse the logs, throw the data into an rrdtool database, and crank out some graphs of that data. This approach has the following benefits:

  • Automated
  • Don’t have to write database or graphing code
  • Could easily be used as part of a web site
  • Could be extended to store more data over time

In essence, I thought that this would be useful when we wanted to “zoom in” on a part of the data shown by Cacti, without needing to set up complicated customized graphs and collect even more data. Besides which, I found adding and changing graph templates in Cacti to be, uh, prone to error.

If you have any suggestions on how to make this better, or if you make modifications, please email me. You’ll have to rename the script to have a .pl at the end. I can’t post it with that extension; my hosting provider wants to interpret anything ending in .pl as something to run on the server side.

parse.pl

Here’s a sample of the output. It might be interesting to run this every day and show the results. Obviously, the later graphs still need some tweaking in order to be useful. Nota bene: Since putting this up, I’ve realized that the SuSE version of “sar” had a flag I relied on to sanitize the output for easier parsing. For the script to be useful now, it would have to be rewritten to accommodate sar’s usual output, which would be very difficult to parse.

  • #1 written by David  8 years ago

    New page! I’m looking for some feedback on this one.
