4. Commandline Tools

Last Updated: $Date: 2002/10/16 18:35:13 $

4.1. Ganglia Metric Tool (gmetric)

The Ganglia Metric Tool (gmetric) lets you monitor any host metric you like, expanding on the core metrics that gmond measures by default.

If you want help with the gmetric syntax, simply use the "help" commandline option:

prompt> gmetric --help
gmetric 2.5.0

Purpose:
  The Ganglia Metric Client (gmetric) announces a metric
  value to all Ganglia Monitoring Daemons (gmonds) that are listening
  on the cluster multicast channel.

Usage: ganglia-monitor-core [OPTIONS]...
   -h         --help                  Print help and exit
   -V         --version               Print version and exit
   -nSTRING   --name=STRING           Name of the metric
   -vSTRING   --value=STRING          Value of the metric
   -tSTRING   --type=STRING           Either string|int8|uint8|int16|uint16|int32|uint32|float|double
   -uSTRING   --units=STRING          Unit of measure for the value e.g. Kilobytes, Celcius
   -sSTRING   --slope=STRING          Either zero|positive|negative|both (default='both')
   -xINT      --tmax=INT              The maximum time in seconds between gmetric calls (default=60)
   -cSTRING   --mcast_channel=STRING  Multicast channel to send/receive on (default='239.2.11.71')
   -pINT      --mcast_port=INT        Multicast port to send/receive on (default=8649)
   -iSTRING   --mcast_if=STRING       Network interface to multicast on e.g. 'eth1' (default='kernel decides')
   -lINT      --mcast_ttl=INT         Multicast Time-To-Live (TTL) (default=1)

The gmetric tool formats a special multicast message and sends it to all gmonds that are listening.

All metrics in Ganglia have a name, a value, a type and, optionally, units. For example, say I wanted to measure the temperature of my CPU (something gmond doesn't do). I could multicast this metric with name="temperature", value="63", type="int16" and units="Celsius".

Assume I have a program called cputemp that prints the temperature of the CPU as text:

prompt> cputemp
63

I could easily send this data to all listening gmonds by running:

prompt> gmetric --name temperature --value `cputemp` --type int16 \
--units Celsius

Check the exit value of gmetric to see whether it sent the data successfully: 0 on success and -1 on failure (a shell reports an exit value of -1 as status 255).
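The exit-value check can be wrapped in a small shell function. This is only a sketch: send_cpu_temp is a hypothetical helper name, cputemp is the example program from the text, and the sketch assumes cputemp prints a single integer.

```shell
# Sketch: publish the CPU temperature and report whether gmetric succeeded.
# send_cpu_temp is a hypothetical name; cputemp is the example program above.
send_cpu_temp() {
    temp=$(cputemp) || { echo "cputemp failed" >&2; return 1; }

    # The metric type is int16, so accept only an optional minus sign
    # followed by digits.
    digits=${temp#-}
    case "$digits" in
        ''|*[!0-9]*) echo "unexpected cputemp output: $temp" >&2; return 1 ;;
    esac

    if gmetric --name temperature --value "$temp" --type int16 --units Celsius
    then
        echo "sent temperature=$temp"
    else
        # Inside the else branch, $? still holds gmetric's exit status.
        echo "gmetric failed with status $?" >&2
        return 1
    fi
}
```

Calling send_cpu_temp from a script then gives a single place to log or retry when the multicast message could not be sent.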

To sample this temperature metric continuously, you just need to add the gmetric command above to your cron table.
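One way to add the cron table entry is sketched below. The one-minute schedule is an assumption chosen to line up with gmetric's default tmax of 60 seconds, and install_temperature_cron and CRON_LINE are hypothetical names. cron often runs with a minimal PATH, so a real entry may need absolute paths to gmetric and cputemp.

```shell
# Hypothetical crontab entry: run once a minute, in line with the default
# tmax of 60 seconds. Absolute paths may be needed under cron's PATH.
CRON_LINE='* * * * * gmetric --name temperature --value "$(cputemp)" --type int16 --units Celsius'

# Append CRON_LINE to the current user's crontab unless it is already there.
install_temperature_cron() {
    # An empty or missing crontab is treated as no existing entries.
    current=$(crontab -l 2>/dev/null) || current=""

    if printf '%s\n' "$current" | grep -Fqx -- "$CRON_LINE"; then
        echo "already installed"
    else
        {
            [ -n "$current" ] && printf '%s\n' "$current"
            printf '%s\n' "$CRON_LINE"
        } | crontab -
        echo "installed"
    fi
}
```

Running install_temperature_cron twice leaves a single entry, because the second call finds the line already present.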

4.2. Ganglia Cluster Status Tool (gstat)

The Ganglia Cluster Status Tool (gstat) is a commandline utility that gives you a status report for your cluster. With time, it will become a more flexible way to query a gmond running locally or remotely.

Commandline Options

To get the commandline options simply run...

prompt> gstat --help
gstat 2.5.0

Purpose:
  The Ganglia Status Client (gstat) connects with a
  Ganglia Monitoring Daemon (gmond) and output a load-balanced list
  of cluster hosts

Usage: gstat [OPTIONS]...
   -h         --help             Print help and exit
   -V         --version          Print version and exit
   -a         --all              List all hosts.  Not just hosts running gexec (default=off)
   -d         --dead             Print only the hosts which are dead (default=off)
   -m         --mpifile          Print a load-balanced mpifile (default=off)
   -1         --single_line      Print host and information all on one line (default=off)
   -l         --list             Print ONLY the host list (default=off)
   -iSTRING   --gmond_ip=STRING  Specify the ip address of the gmond to query (default='127.0.0.1')
   -pINT      --gmond_port=INT   Specify the gmond port to query (default=8649)

Running gstat without any parameters causes it to print a load-balanced list (least-loaded host first) of all the hosts running gmond, along with their process, load, and CPU information. If you want to see which hosts are down in your cluster, use the --dead option. You can also have gstat produce a dynamic load-balanced mpimachine file with the --mpifile option.
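The options can be combined, for instance to ask a particular gmond which hosts are down. The sketch below uses only flags from the help text; dead_hosts is a hypothetical helper name, and its defaults simply repeat gstat's own defaults.

```shell
# Hypothetical helper: list the dead hosts as seen by a given gmond.
# Usage: dead_hosts [gmond_ip] [gmond_port]
dead_hosts() {
    ip=${1:-127.0.0.1}      # same default as gstat --gmond_ip
    port=${2:-8649}         # same default as gstat --gmond_port
    gstat --dead --gmond_ip="$ip" --gmond_port="$port"
}
```

For example, dead_hosts 192.0.2.10 would query a gmond at that (placeholder) address on the default port.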

Gstat Examples

Get a load-balanced list of hosts that are up...

prompt> gstat
CLUSTER INFORMATION
       Name: unspecified
      Hosts: 97
Gexec Hosts: 73
 Dead Hosts: 0
  Localtime: Mon Apr 22 16:58:43 2002

CLUSTER HOSTS
Hostname                     LOAD                       CPU              Gexec
 CPUs (Procs/Total) [     1,     5, 15min] [  User,  Nice, System, Idle]

mm92.millennium.berkeley.edu
    4 (    1/   97) [  1.10,  1.19,  0.99] [   5.9,   0.0,   0.5, 100.0] ON
mm98.Millennium.Berkeley.EDU
    4 (    0/   80) [  1.16,  1.67,  1.25] [   4.1,   0.0,   0.2,  98.5] ON
mm91.Millennium.Berkeley.EDU
    4 (    1/   87) [  1.67,  1.78,  1.69] [  25.0,   0.0,   0.7,  74.9] ON
mm75.millennium.berkeley.edu
    4 (    3/  103) [  1.85,  2.54,  1.83] [  72.6,   0.0,   0.2,  50.3] ON
mm67.millennium.Berkeley.EDU
    4 (    4/  112) [  1.89,  2.08,  1.38] [  81.4,   0.0,   0.1,  38.5] ON
mm87.millennium.berkeley.edu
    4 (    4/  112) [  1.95,  1.67,  1.27] [   3.2,   0.0,   0.4,  96.4] ON
mm83.millennium.Berkeley.EDU
    4 (    1/  120) [  2.00,  2.59,  2.24] [  25.0,   0.0,   0.0,  75.0] ON
mm10.millennium.Berkeley.EDU
    2 (    0/   77) [  0.00,  0.06,  0.07] [   0.2,   0.0,   0.0,  99.9] ON
...

To create a dynamic load-balanced mpifile list...

prompt> gstat --mpifile
mm56.Millennium.Berkeley.EDU:4
mm44.Millennium.Berkeley.EDU:4
mm31.Millennium.Berkeley.EDU:2
mm43.Millennium.Berkeley.EDU:4
mm15.Millennium.Berkeley.EDU:2
...
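
Because each mpifile line has the form hostname:cpus, the output is easy to post-process in the shell. The helpers below are a sketch with hypothetical names, and they assume the mpifile is ordered least-loaded host first, like the default gstat listing.

```shell
# Hypothetical helpers built on the "hostname:cpus" lines of gstat --mpifile.

# Print the first host name in the list, without its CPU count.
least_loaded_host() {
    gstat --mpifile | head -n 1 | cut -d: -f1
}

# Sum the CPU counts across every host in the list.
total_cpus() {
    gstat --mpifile | awk -F: '{ sum += $2 } END { print sum + 0 }'
}
```

A job launcher could call least_loaded_host to pick a single target, or total_cpus to decide how many processes to start.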