注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

PostgreSQL research

公益是一辈子的事, I'm 德哥@Digoal, Just Do it!

 
 
 

日志

 
 

gmond's XML introduction  

2014-09-12 11:55:38|  分类: NOC |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |
本文讲一下从gmond监听端口导出的XML的信息, 便于理解 : 
监听端口如果你没改的话, 默认是8649

[root@db-172-16-3-221 ganglia-web]# netstat -anp|grep gmond
tcp        0      0 0.0.0.0:8649                0.0.0.0:*                   LISTEN      4462/gmond          
udp        0      0 239.2.11.71:8649            0.0.0.0:*                               4462/gmond          
udp        0      0 172.16.3.221:28390          239.2.11.71:8649            ESTABLISHED 4462/gmond 

使用telnet导出xml的信息

[root@db-172-16-3-221 ganglia-web]# telnet 127.0.0.1 8649
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
# 文档结构
<!DOCTYPE GANGLIA_XML [
   <!ELEMENT GANGLIA_XML (GRID|CLUSTER|HOST)*>
      <!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED>
      <!ATTLIST GANGLIA_XML SOURCE CDATA #REQUIRED>
   <!ELEMENT GRID (CLUSTER | GRID | HOSTS | METRICS)*>
      <!ATTLIST GRID NAME CDATA #REQUIRED>
      <!ATTLIST GRID AUTHORITY CDATA #REQUIRED>
      <!ATTLIST GRID LOCALTIME CDATA #IMPLIED>
   <!ELEMENT CLUSTER (HOST | HOSTS | METRICS)*>
      <!ATTLIST CLUSTER NAME CDATA #REQUIRED>
      <!ATTLIST CLUSTER OWNER CDATA #IMPLIED>
      <!ATTLIST CLUSTER LATLONG CDATA #IMPLIED>
      <!ATTLIST CLUSTER URL CDATA #IMPLIED>
      <!ATTLIST CLUSTER LOCALTIME CDATA #REQUIRED>
   <!ELEMENT HOST (METRIC)*>
      <!ATTLIST HOST NAME CDATA #REQUIRED>
      <!ATTLIST HOST IP CDATA #REQUIRED>
      <!ATTLIST HOST LOCATION CDATA #IMPLIED>
      <!ATTLIST HOST TAGS CDATA #IMPLIED>
      <!ATTLIST HOST REPORTED CDATA #REQUIRED>
      <!ATTLIST HOST TN CDATA #IMPLIED>
      <!ATTLIST HOST TMAX CDATA #IMPLIED>
      <!ATTLIST HOST DMAX CDATA #IMPLIED>
      <!ATTLIST HOST GMOND_STARTED CDATA #IMPLIED>
   <!ELEMENT METRIC (EXTRA_DATA*)>
      <!ATTLIST METRIC NAME CDATA #REQUIRED>
      <!ATTLIST METRIC VAL CDATA #REQUIRED>
      <!ATTLIST METRIC TYPE (string | int8 | uint8 | int16 | uint16 | int32 | uint32 | float | double | timestamp) #REQUIRED>
      <!ATTLIST METRIC UNITS CDATA #IMPLIED>
      <!ATTLIST METRIC TN CDATA #IMPLIED>
      <!ATTLIST METRIC TMAX CDATA #IMPLIED>
      <!ATTLIST METRIC DMAX CDATA #IMPLIED>
      <!ATTLIST METRIC SLOPE (zero | positive | negative | both | unspecified) #IMPLIED>
      <!ATTLIST METRIC SOURCE (gmond) 'gmond'>
   <!ELEMENT EXTRA_DATA (EXTRA_ELEMENT*)>
   <!ELEMENT EXTRA_ELEMENT EMPTY>
      <!ATTLIST EXTRA_ELEMENT NAME CDATA #REQUIRED>
      <!ATTLIST EXTRA_ELEMENT VAL CDATA #REQUIRED>
   <!ELEMENT HOSTS EMPTY>
      <!ATTLIST HOSTS UP CDATA #REQUIRED>
      <!ATTLIST HOSTS DOWN CDATA #REQUIRED>
      <!ATTLIST HOSTS SOURCE (gmond | gmetad) #REQUIRED>
   <!ELEMENT METRICS (EXTRA_DATA*)>
      <!ATTLIST METRICS NAME CDATA #REQUIRED>
      <!ATTLIST METRICS SUM CDATA #REQUIRED>
      <!ATTLIST METRICS NUM CDATA #REQUIRED>
      <!ATTLIST METRICS TYPE (string | int8 | uint8 | int16 | uint16 | int32 | uint32 | float | double | timestamp) #REQUIRED>
      <!ATTLIST METRICS UNITS CDATA #IMPLIED>
      <!ATTLIST METRICS SLOPE (zero | positive | negative | both | unspecified) #IMPLIED>
      <!ATTLIST METRICS SOURCE (gmond) 'gmond'>
]>
<GANGLIA_XML VERSION="3.6.0" SOURCE="gmond">

# 集群信息, 对应gmond.conf里面的cluster章节的配置.
<CLUSTER NAME="test" LOCALTIME="1410489748" OWNER="digoal" LATLONG="111 122" URL="http://dba.sky-mobi.com">

# 主机信息, 对应gmond.conf里面的host和globals章节的配置. 
# 解释一下TN, TMAX, DMAX
# TN是一个变量, 等于最后一次head time距离现在的时间(秒). 参见本文末尾代码的解释. 所以每次telnet得到的TN值是变的.
# TMAX对应配置host_tmax, 
# DMAX对应配置host_dmax, 
# 含义如下 : 
# 取自man gmond.conf
       The host_dmax value is an integer with units in seconds.  When set to zero (0), gmond will never delete a host
       from its list even when a remote host has stopped reporting.  If host_dmax is set to a positive number then
       gmond will flush a host after it has not heard from it for host_dmax seconds.  By the way, dmax means "delete
       max".

       The host_tmax value is an integer with units in seconds. This value represents the maximum amount of time that
       gmond should wait between updates from a host. As messages may get lost in the network, gmond will consider the
       host as being down if it has not received any messages from it after 4 times this value. For example, if
       host_tmax is set to 20, the host will appear as down after 80 seconds with no messages from it. By the way,
       tmax means "timeout max".
# 当tn大于dmax时, 将触发delete, 在delete前, 会等待一些时间, 这个配置就是cleanup_threshold.
       The cleanup_threshold is the minimum amount of time before gmond will cleanup any hosts or metrics where tn >
       dmax a.k.a. expired data.

<HOST NAME="db-172-16-3-221.localdomain" IP="172.16.3.221" TAGS="" REPORTED="1410489745" TN="2" TMAX="20" DMAX="86400" LOCATION="1,2,3" GMOND_STARTED="1410332246">

# 接下来开始的是gmond.conf里面对应的collection_group章节的配置. 包含所有的metric.
# metric的属性有: name, value, type, 单位, tn, tmax, dmax, slope.
# 首先看看gmetric的数据结构
lib/gm_protocol.h
struct Ganglia_25metric {
        int key;
        char *name;
        int tmax;
        Ganglia_value_types type;
        char *units;
        char *slope;
        char *fmt;
        int msg_size;
        char *desc;
        int *metadata;
};
# 简单介绍一下tn, tmax, dmax, slope的含义.
# 注意这里的tmax, dmax和HOST章节的host_tmax, host_dmax不一样.
# TN同样是一个变量, 等于最后一次head time距离现在的时间(秒). 参见本文末尾代码的解释. 所以每次telnet得到的TN值是变的.
# TMAX, 发生metric对应组内的所有metric到send channel的时间间隔, 对应collection_group中time_threshold配置.
# DMAX, metric数据结构里没有这个值, 应该是XML结构的问题.  可以认为metric里的DMAX是没有一样的
# slope的解释, 表示采样值存在 不增不减, 只增, 只减, 即增即减的几种风格.
# 参考man gmetric
       -s, --slope=STRING
              Either zero|positive|negative|both  (default=‘both’)
Specifies the slope/derivative type of the metric being submitted. The
possible values are zerofor a zero slope metric, positivefor an
increment-only metric, negativefor a decrement-only metric, and
bothfor an arbitrarily changing metric. The default value is both.
Using the value positivefor the slope of a new metric will cause
the corresponding RRD file to be generated as a COUNTER, with delta
values being displayed instead of the actual metric values.
# 其实还有3个重要配置
# collection_group里面
  collect_once = yes  # 表示这个组内的metric只采集一次
  collect_every = 20  # 表示这个组内的metric间隔20秒采集一次.
# metric内的配置
    value_threshold = "1.0"   # 表示当前采样值和前一次的采样值出现1.0的偏差的时候, 将组内的所有metric到send channel
# 还包含分组, 描述, 抬头.
<METRIC NAME="mem_shared" VAL="0" TYPE="float" UNITS="KB" TN="2" TMAX="180" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="memory"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Amount of shared memory"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Shared Memory"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="cpu_wio" VAL="0.0" TYPE="float" UNITS="%" TN="40" TMAX="90" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="cpu"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="CPU wio"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="machine_type" VAL="x86_64" TYPE="string" UNITS="" TN="300" TMAX="1200" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="system"/>
<EXTRA_ELEMENT NAME="DESC" VAL="System architecture"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Machine Type"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="proc_total" VAL="1106" TYPE="uint32" UNITS=" " TN="282" TMAX="950" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="process"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Total number of processes"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Total Processes"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="cpu_num" VAL="48" TYPE="uint16" UNITS="CPUs" TN="300" TMAX="1200" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="cpu"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Total number of CPUs"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="CPU Count"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="cpu_speed" VAL="1995" TYPE="uint32" UNITS="MHz" TN="300" TMAX="1200" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="cpu"/>
<EXTRA_ELEMENT NAME="DESC" VAL="CPU Speed in terms of MHz"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="CPU Speed"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="pkts_out" VAL="0.47" TYPE="float" UNITS="packets/sec" TN="162" TMAX="300" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="network"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Packets out per second"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Packets Sent"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="swap_free" VAL="1048568" TYPE="float" UNITS="KB" TN="2" TMAX="180" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="memory"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Amount of available swap memory"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Free Swap Space"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="cpu_steal" VAL="0.0" TYPE="float" UNITS="%" TN="40" TMAX="90" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="cpu"/>
<EXTRA_ELEMENT NAME="DESC" VAL="cpu_steal"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="CPU steal"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="load_one" VAL="0.00" TYPE="float" UNITS=" " TN="83" TMAX="70" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="load"/>
<EXTRA_ELEMENT NAME="DESC" VAL="One minute load average"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="One Minute Load Average"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="mem_total" VAL="99045568" TYPE="float" UNITS="KB" TN="300" TMAX="1200" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="memory"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Total amount of memory displayed in KBs"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Memory Total"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="os_release" VAL="2.6.32-431.el6.x86_64" TYPE="string" UNITS="" TN="300" TMAX="1200" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="system"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Operating system release date"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Operating System Release"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="proc_run" VAL="0" TYPE="uint32" UNITS=" " TN="282" TMAX="950" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="process"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Total number of running processes"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Total Running Processes"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="load_five" VAL="0.00" TYPE="float" UNITS=" " TN="83" TMAX="325" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="load"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Five minute load average"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Five Minute Load Average"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="gexec" VAL="OFF" TYPE="string" UNITS="" TN="298" TMAX="300" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="core"/>
<EXTRA_ELEMENT NAME="DESC" VAL="gexec available"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Gexec Status"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="disk_free" VAL="3208.387" TYPE="double" UNITS="GB" TN="174" TMAX="180" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="disk"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Total free disk space"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Disk Space Available"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="mem_cached" VAL="1362116" TYPE="float" UNITS="KB" TN="2" TMAX="180" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="memory"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Amount of cached memory"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Cached Memory"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="pkts_in" VAL="3.72" TYPE="float" UNITS="packets/sec" TN="162" TMAX="300" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="network"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Packets in per second"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Packets Received"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="bytes_in" VAL="314.24" TYPE="float" UNITS="bytes/sec" TN="162" TMAX="300" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="network"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Number of bytes in per second"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Bytes Received"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="bytes_out" VAL="45.29" TYPE="float" UNITS="bytes/sec" TN="162" TMAX="300" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="network"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Number of bytes out per second"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Bytes Sent"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="swap_total" VAL="1048568" TYPE="float" UNITS="KB" TN="300" TMAX="1200" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="memory"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Total amount of swap space displayed in KBs"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Swap Space Total"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="mem_free" VAL="95977456" TYPE="float" UNITS="KB" TN="2" TMAX="180" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="memory"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Amount of available memory"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Free Memory"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="load_fifteen" VAL="0.00" TYPE="float" UNITS=" " TN="83" TMAX="950" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="load"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Fifteen minute load average"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Fifteen Minute Load Average"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="os_name" VAL="Linux" TYPE="string" UNITS="" TN="300" TMAX="1200" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="system"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Operating system name"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Operating System"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="boottime" VAL="1410314161" TYPE="uint32" UNITS="s" TN="300" TMAX="1200" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="system"/>
<EXTRA_ELEMENT NAME="DESC" VAL="The last time that the system was started"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Last Boot Time"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="cpu_idle" VAL="100.0" TYPE="float" UNITS="%" TN="40" TMAX="90" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="cpu"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="CPU Idle"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="cpu_user" VAL="0.0" TYPE="float" UNITS="%" TN="40" TMAX="90" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="cpu"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Percentage of CPU utilization that occurred while executing at the user level"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="CPU User"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="cpu_nice" VAL="0.0" TYPE="float" UNITS="%" TN="40" TMAX="90" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="cpu"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Percentage of CPU utilization that occurred while executing at the user level with nice priority"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="CPU Nice"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="cpu_aidle" VAL="0.0" TYPE="float" UNITS="%" TN="40" TMAX="3800" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="cpu"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Percent of time since boot idle CPU"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="CPU aidle"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="mem_buffers" VAL="94408" TYPE="float" UNITS="KB" TN="2" TMAX="180" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="memory"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Amount of buffered memory"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Memory Buffers"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="cpu_system" VAL="0.0" TYPE="float" UNITS="%" TN="40" TMAX="90" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="cpu"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Percentage of CPU utilization that occurred while executing at the system level"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="CPU System"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="part_max_used" VAL="30.7" TYPE="float" UNITS="%" TN="174" TMAX="180" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="disk"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Maximum percent used for all partitions"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Maximum Disk Space Used"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="disk_total" VAL="3862.720" TYPE="double" UNITS="GB" TN="2701" TMAX="1200" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="disk"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Total available disk space"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="Total Disk Space"/>
</EXTRA_DATA>
</METRIC>
</HOST>
</CLUSTER>
</GANGLIA_XML>
Connection closed by foreign host.


以上XML dump对应的配置文件 : 

[root@db-172-16-3-221 ganglia-web]# cat /opt/ganglia-core-3.6.0/etc/gmond.conf 
/* This configuration is as close to 2.5.x default behavior as possible
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  # By default gmond will use reverse DNS resolution when displaying your hostname
  # Uncommeting following value will override that value.
  # override_hostname = "mywebserver.domain.com"
  # If you are not using multicast this value should be set to something other than 0.
  # Otherwise if you restart aggregator gmond you will get empty graphs. 60 seconds is reasonable
  send_metadata_interval = 0 /*secs */

}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "test"
  owner = "digoal"
  latlong = "111 122"
  url = "http://dba.sky-mobi.com"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "1,2,3"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname.  Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 10
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  buffer = 10485760
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
  # If you want to gzip XML output
  gzip_output = no
}

/* Channel to receive sFlow datagrams */
#udp_recv_channel {
#  port = 6343
#}

/* Optional sFlow settings */
#sflow {
# udp_port = 6343
# accept_vm_metrics = yes
# accept_jvm_metrics = yes
# multiple_jvm_instances = no
# accept_http_metrics = yes
# multiple_http_instances = no
# accept_memcache_metrics = yes
# multiple_memcache_instances = no
#}

/* Each metrics module that is referenced by gmond must be specified and
   loaded. If the module has been statically linked with gmond, it does
   not require a load path. However all dynamically loadable modules must
   include a load path. */
modules {
  module {
    name = "core_metrics"
  }
  module {
    name = "cpu_module"
    path = "modcpu.so"
  }
  module {
    name = "disk_module"
    path = "moddisk.so"
  }
  module {
    name = "load_module"
    path = "modload.so"
  }
  module {
    name = "mem_module"
    path = "modmem.so"
  }
  module {
    name = "net_module"
    path = "modnet.so"
  }
  module {
    name = "proc_module"
    path = "modproc.so"
  }
  module {
    name = "sys_module"
    path = "modsys.so"
  }
}

/* The old internal 2.5.x metric array has been replaced by the following
   collection_group directives.  What follows is the default behavior for
   collecting and sending metrics that is as close to 2.5.x behavior as
   possible. */

/* This collection group will cause a heartbeat (or beacon) to be sent every
   20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses
   the age of the running gmond. */
collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
    name = "heartbeat"
  }
}

/* This collection group will send general info about this host every
   1200 secs.
   This information doesn't change between reboots and is only collected
   once. */
collection_group {
  collect_once = yes
  time_threshold = 1200
  metric {
    name = "cpu_num"
    title = "CPU Count"
  }
  metric {
    name = "cpu_speed"
    title = "CPU Speed"
  }
  metric {
    name = "mem_total"
    title = "Memory Total"
  }
  /* Should this be here? Swap can be added/removed between reboots. */
  metric {
    name = "swap_total"
    title = "Swap Space Total"
  }
  metric {
    name = "boottime"
    title = "Last Boot Time"
  }
  metric {
    name = "machine_type"
    title = "Machine Type"
  }
  metric {
    name = "os_name"
    title = "Operating System"
  }
  metric {
    name = "os_release"
    title = "Operating System Release"
  }
  metric {
    name = "location"
    title = "Location"
  }
}

/* This collection group will send the status of gexecd for this host
   every 300 secs.*/
/* Unlike 2.5.x the default behavior is to report gexecd OFF. */
collection_group {
  collect_once = yes
  time_threshold = 300
  metric {
    name = "gexec"
    title = "Gexec Status"
  }
}

/* This collection group will collect the CPU status info every 20 secs.
   The time threshold is set to 90 seconds.  In honesty, this
   time_threshold could be set significantly higher to reduce
   unneccessary  network chatter. */
collection_group {
  collect_every = 20
  time_threshold = 90
  /* CPU status */
  metric {
    name = "cpu_user"
    value_threshold = "1.0"
    title = "CPU User"
  }
  metric {
    name = "cpu_system"
    value_threshold = "1.0"
    title = "CPU System"
  }
  metric {
    name = "cpu_idle"
    value_threshold = "5.0"
    title = "CPU Idle"
  }
  metric {
    name = "cpu_nice"
    value_threshold = "1.0"
    title = "CPU Nice"
  }
  metric {
    name = "cpu_aidle"
    value_threshold = "5.0"
    title = "CPU aidle"
  }
  metric {
    name = "cpu_wio"
    value_threshold = "1.0"
    title = "CPU wio"
  }
  metric {
    name = "cpu_steal"
    value_threshold = "1.0"
    title = "CPU steal"
  }
  /* The next two metrics are optional if you want more detail...
     ... since they are accounted for in cpu_system.
  metric {
    name = "cpu_intr"
    value_threshold = "1.0"
    title = "CPU intr"
  }
  metric {
    name = "cpu_sintr"
    value_threshold = "1.0"
    title = "CPU sintr"
  }
  */
}

collection_group {
  collect_every = 20
  time_threshold = 90
  /* Load Averages */
  metric {
    name = "load_one"
    value_threshold = "1.0"
    title = "One Minute Load Average"
  }
  metric {
    name = "load_five"
    value_threshold = "1.0"
    title = "Five Minute Load Average"
  }
  metric {
    name = "load_fifteen"
    value_threshold = "1.0"
    title = "Fifteen Minute Load Average"
  }
}

/* This group collects the number of running and total processes */
collection_group {
  collect_every = 80
  time_threshold = 950
  metric {
    name = "proc_run"
    value_threshold = "1.0"
    title = "Total Running Processes"
  }
  metric {
    name = "proc_total"
    value_threshold = "1.0"
    title = "Total Processes"
  }
}

/* This collection group grabs the volatile memory metrics every 40 secs and
   sends them at least every 180 secs.  This time_threshold can be increased
   significantly to reduce unneeded network traffic. */
collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "mem_free"
    value_threshold = "1024.0"
    title = "Free Memory"
  }
  metric {
    name = "mem_shared"
    value_threshold = "1024.0"
    title = "Shared Memory"
  }
  metric {
    name = "mem_buffers"
    value_threshold = "1024.0"
    title = "Memory Buffers"
  }
  metric {
    name = "mem_cached"
    value_threshold = "1024.0"
    title = "Cached Memory"
  }
  metric {
    name = "swap_free"
    value_threshold = "1024.0"
    title = "Free Swap Space"
  }
}

collection_group {
  collect_every = 40
  time_threshold = 300
  metric {
    name = "bytes_out"
    value_threshold = 4096
    title = "Bytes Sent"
  }
  metric {
    name = "bytes_in"
    value_threshold = 4096
    title = "Bytes Received"
  }
  metric {
    name = "pkts_in"
    value_threshold = 256
    title = "Packets Received"
  }
  metric {
    name = "pkts_out"
    value_threshold = 256
    title = "Packets Sent"
  }
}

/* Different than 2.5.x default since the old config made no sense */
collection_group {
  collect_every = 1800
  time_threshold = 3600
  metric {
    name = "disk_total"
    value_threshold = 1.0
    title = "Total Disk Space"
  }
}

collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "disk_free"
    value_threshold = 1.0
    title = "Disk Space Available"
  }
  metric {
    name = "part_max_used"
    value_threshold = 1.0
    title = "Maximum Disk Space Used"
  }
}

include ("/opt/ganglia-core-3.6.0/etc/conf.d/*.conf")


关于主机 TN的代码如下 : 

gmond.c
print_host_start( apr_socket_t *client, Ganglia_host *hostinfo)
{
  apr_size_t len;
  char hostxml[1024]; /* for <HOST></HOST> */
  apr_time_t now = apr_time_now();
  int tn = (now - hostinfo->last_heard_from) / APR_USEC_PER_SEC;

  len = apr_snprintf(hostxml, 1024, 
           "<HOST NAME=\"%s\" IP=\"%s\" TAGS=\"%s\" REPORTED=\"%d\" TN=\"%d\" TMAX=\"%d\" DMAX=\"%d\" LOCATION=\"%s\" GMOND_STARTED=\"%d\">\n",
                     hostinfo->hostname, 
                     hostinfo->ip,
                     tags ? tags : "",
                     (int)(hostinfo->last_heard_from / APR_USEC_PER_SEC),
                     tn,
                     host_tmax,
                     host_dmax,
                     hostinfo->location? hostinfo->location: "unspecified", 
                     hostinfo->gmond_started);

  return socket_send(client, hostxml, &len);
}

这里面引用到的APR_USEC_PER_SEC常量 : 

Apache Portable Runtime
apr_time.h
00060 /** number of microseconds per second */
00061 #define APR_USEC_PER_SEC APR_TIME_C(1000000)


注意extra_element和extra_data里面包含了组信息, 描述信息, 抬头信息.

<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="cpu"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Percentage of CPU utilization that occurred while executing at the user level"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="CPU User"/>
</EXTRA_DATA>

使用allow_extra_data attribute可以关闭EXTRA_ELEMENT 和 EXTRA_DATA 的输出.

man gmond.conf
       The allow_extra_data attribute is a boolean.  When false, gmond will not send out the EXTRA_ELEMENT and
       EXTRA_DATA parts of the XML.  This might be useful if you are using your own frontend to the metric data and
       will like to save some bandwith.

       The host_dmax value is an integer with units in seconds.  When set to zero (0), gmond will never delete a host
       from its list even when a remote host has stopped reporting.  If host_dmax is set to a positive number then
       gmond will flush a host after it has not heard from it for host_dmax seconds.  By the way, dmax means "delete
       max".

       The host_tmax value is an integer with units in seconds. This value represents the maximum amount of time that
       gmond should wait between updates from a host. As messages may get lost in the network, gmond will consider the
       host as being down if it has not received any messages from it after 4 times this value. For example, if
       host_tmax is set to 20, the host will appear as down after 80 seconds with no messages from it. By the way,
       tmax means "timeout max".

       The cleanup_threshold is the minimum amount of time before gmond will cleanup any hosts or metrics where tn >
       dmax a.k.a. expired data.


[参考]
1. gmond.c
2. man gmond.conf
3. man gmetric

Flag Counter
  评论这张
 
阅读(852)| 评论(0)
推荐 转载

历史上的今天

在LOFTER的更多文章

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2016