There are two major products that come out of Berkeley: LSD and UNIX. We don't believe this to be a coincidence. | ||
| --Jeremy S. Anderson | ||
Ganglia provides a complete real-time monitoring and execution environment that is in use by hundreds of universities, private and government laboratories and commercial cluster implementors around the world. Ganglia is as simple to install and use on a 16-node cluster as it is to use on a 512-node cluster as has been proven by its use on multiple 500+ node clusters.
Ganglia was initially developed at the University of California, Berkeley Computer Science Division as way to link clusters across the Berkeley campus together in a logical way. Since it was developed at a university, it is completely open-source and has no proprietary components. All data is exchanged in well-defined XML and XDR to ensure maximum extensibility and portability.
The monitoring core allows you to monitor any number of host metrics in real-time. At present, the monitoring core runs on Linux, FreeBSD, Solaris, AIX, Tru64 and IRIX. There is a Windows port as well which is in beta.
Ganglia is not just a way to link nodes in a cluster together in a logical way but also a way to link clusters to other clusters. Ganglia blurs the line between clustering and distributed computing by providing for Cluster to Cluster (C2C) data exchanges which link disparate cluster resources together into a single logical framework.
Last Updated: $Date: 2002/09/16 20:12:49 $
Gmond is a multi-threaded daemon which runs on each cluster node you want to monitor. Installation is easy. You don't have to have a common NFS filesystem or a database, install special accounts, maintain configuration files or other annoying hassles. Gmond is its own redundant, distributed database.
Gmond has four main responsibilities: monitor changes in host state, multicast relevant changes, listen to the state of all other ganglia nodes via a multicast channel and answer requests for an XML description of the cluster state.
Gmond has threads which listen to the multicast channel and write the data collected to a fast, in-memory hash table. All metric data for each cluster node is processed and saved. You might think this would require large amounts of memory but a gmond can maintain the in-memory data in just 16+ 136*n_nodes +364*n_metrics bytes. For example, if you have a massive 1024 node cluster and you are monitoring 25 metrics on each machine then gmond will only use 16 + 136*1024 + 364*25 = 148380 bytes or just 144 kilobytes of memory! The hash table is also completely multi-threaded with read/write locks implemented by POSIX thread mutexes and condition variables for each individual host in the cluster. That means that multiple threads can simultaneously read and write to the hash without interfering with each other. Very scalable. Ganglia is used on many 500+ node clusters.
Each gmond transmits in two different ways: multicasting host state in external data representation (XDR) or sending XML over a TCP connection.
Gmond only multicasts a metric that it is monitoring for two reasons: a change in the value of the metric exceeds the value threshold or when gmond hasn't multicast the metric longer than the time threshold. The value threshold ensures that gmond only multicast when it really needs to. If there is no change in state, then no data is multicast. Of course things are always changing and the time threshold ensures that small changes haven't grow into large changes too slowly to notice. These thresholds dramatically reduces chatter on the multicast channel. For example, the number of CPUs on a host is only sent once an hour since it is a constant whereas 1-minute load could be sent every 15 seconds depending on the change in value.
Since all gmonds in your cluster are storing your cluster state, it is important that all nodes have the same cluster image. The gmond self-organizes and knows how to make sure that all gmonds are on the same page.
Anytime that gmond gets a multicast packet from a new host, it expires the time threshold for all its metrics. This means that all its metric data will get sent to the new gmond even if it wasn't previously scheduled to be sent. For example, the number of CPUs is only multicast once an hour. If the local gmond did not expire its time threshold then the new gmond wouldn't know the number of CPUs on that machine for up to an hour.
Gmond is also good at handling failures. It's inevitable that nodes in your cluster will go down occasionally and (heaven-forbit!) gmond may even fail. Recovering gracefully from those temporary failures is important. The payload for a heartbeat message is the timestamp of when gmond was started on a node. If ever that timestamp changes then gmond is alerted of a gmond restart on another node. This alert will trigger gmond to expire the time threshold on all its metrics in order to quickly update the new gmond of it's state.
Each gmond processes its own multicast data locally via loopback. That means that it saves in-memory the very data that it's sending to remote gmonds.
When asked, gmond will write out the complete cluster state in XML including its DTD. However, gmond will only share that data with hosts in its in-memory cluster hash OR a host specified with the trusted_hosts list in its configuration file (/etc/gmond.conf by default).
You can send custom metrics on the ganglia multicast channel as well. Gmond monitors dozens of metrics right out of the box but gmond doesn't assume that these are the only only metrics that you want to monitor. To expand the list of metrics you are monitoring, use the gmetric tool.

The grey box represents the Ganglia Monitoring Daemon (gmond) with all its components inside: the metric scheduler thread, multicast listening threads, fast in-memory hash and the XML output threads.
The metric scheduler thread checks the state of the host that gmond is running on and multicasts any relevant changes. Gmond decides what is relevant by using value and time thresholds. The metric scheduler remembers what value it last multicast and when it multicast it. If the difference between the last multicast value and the new value is greater than the value threshold, then the metric scheduler will multicast the value. Also, if the time elapsed since the metric value has been multicast is greater than the time threshold it will be sent regardless of its value. The value and time thresholds are metric-specific and set based on the metric characteristics.
The multicast listening threads listen on the ganglia multicast channel for incoming messages including messages from itself (via loopback). All data is stored in the fast in-memory hash. This hash holds the data for all hosts sending data on the cluster multicast channel via gmond or gmetric.
The XML output threads are responsible for processing incoming connection requests. When I request for XML is made the output threads checks if the client is 127.0.0.1, searches the hash of known hosts on the ganglia multicast channel and then the trusted_hosts list (in /etc/gmond.conf). If the host is not found, it closes the connection; otherwise, a complete XML description of the state of all hosts multicasting on its local multicast channel. You use telnet if you want to see this description...
telnet localhost 8649 |
Port 8649 (U*N*I*X on a phone keypad :) ) is the default port for the XML threads to listen on but it can be changed at run time.
Last Updated: $Date: 2002/10/16 17:55:13 $
This graphical representation show how the Ganglia Meta Daemon (gmetad) works with Ganglia Monitoring Daemons (gmond) to allow you to easily monitor cluster over unicast routes. While gmond uses multicast channels in a peer-to-peer way, gmetad pulls the XML description from ganglia data sources (either gmond or another gmetad) via XML over unicast routes.

Configuring Gmetad is simple. Gmetad behavior is controlled by a single configuration file (/etc/gmetad.conf) by default. This configuration file contains instructions on its use.
Gmetad is the backend for the ganglia web frontend. Gmetad stores historical information to Round-Robin databases and exports summary XML which the web frontend uses to present useful snapshots and trends for all hosts monitored by ganglia.
Last Updated: $Date: 2002/09/10 20:01:17 $
The Ganglia web frontend provides a view of the gathered information via real-time dynamic web pages. Most importantly, it displays Ganglia data in a meaningful way for system administrators and computer users. Although the web frontend to ganglia started as a simple HTML view of the XML tree, it has evolved into a system that keeps a colorful history of all collected data.
The Ganglia web frontend caters to system administrators and users. For example, one can view the CPU utilization over the past hour, day, week, month, or year. The web frontend shows similar graphs for Memory usage, disk usage, network statistics, number of running processes, and all other Ganglia metrics.
The web frontend depends on the existence of the Gmetad which provides it with data from several Ganglia sources. Specifically, the web frontend will open the local port 8651 (by default) and expects to receive a Ganglia XML tree. The web pages themselves are highly dynamic; any change to the Ganglia data appears immediately on the site. This behavior leads to a very responsive site, but requires that the full XML tree be parsed on every page access. Therefore, the Ganglia web frontend should run on a fairly powerful, dedicated machine if it presents a large amount of data. Currently, a Ganglia Grid containing ~15 clusters with over 500 hosts and 900 CPUs is running succesfully on a dual 600Mhz Pentium III Linux server, with all RRD graph files on a ramdisk. See this page at meta.rocksclusters.org.
The Ganglia web frontend is written in the PHP scripting language, and uses graphs generated by Gmetad to display history information. It has been tested on many flavours of Unix (primarily Linux) with the Apache webserver and the PHP 4.1 module.