Installing Cloud Insight: notes and lessons learned


#1

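Cloud Insight has been through quite a few releases by now. This post summarizes what I learned while installing Cloud Insight on cloud servers and configuring its alert policies.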

Cloud Insight can monitor a wide range of platforms and middleware. Here I focus on the scripts I tuned for a tomcat cluster.

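Not every cloud server has outbound Internet access, so the idea is to use nginx as a forwarder. And because there are many tomcat instances, often several on a single server, I wrote a shell script that sets the jmxport automatically.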

First check http://yum.oneapm.com for the latest rpm package and download it to the NAS.
Beforehand, put your own appkey into oneapm-ci-agent.conf and zip the file.
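The preparation step can be scripted. This is a minimal sketch, assuming the conf file contains a `license_key:` line like the one in the listing further down. The function name `set_license_key` and the appkey value are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: write the appkey into the license_key line of an agent conf file.
# Usage: set_license_key <conf-file> <appkey>
# Assumes the appkey contains no "|" or "&" characters (they are special to sed here).
set_license_key() {
    local conf=$1 key=$2
    # Replace the whole line, whether or not it is currently commented out
    sed -i "s|^#\{0,1\} *license_key:.*|license_key: $key|" "$conf"
}

# Example (NAS path taken from the install steps, appkey is a placeholder):
#   set_license_key oneapm-ci-agent.conf YOUR_APPKEY
#   zip -j /data/client/devops-kit/OneAPM/ci/oneapm-ci-agent.conf.zip oneapm-ci-agent.conf
```

Keeping the key out of the rpm and in a separately zipped conf means the same package can be reused when the appkey changes.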

Procedure
sh /data/client/devops-kit/OneAPM/ci/install.sh

mkdir -p /etc/oneapm-ci-agent/
unzip -o /data/client/devops-kit/OneAPM/ci/oneapm-ci-agent.conf.zip -d /etc/oneapm-ci-agent/
#cp -a /data/client/devops-kit/OneAPM/ci/oneapm-ci-agent.conf /etc/oneapm-ci-agent/
rpm -Uvh /data/client/devops-kit/OneAPM/ci/oneapm-ci-agent-4.4.0-1.x86_64.rpm
#rm -rf /etc/oneapm-ci-agent/conf.d/nginx.yaml
#cp -a /data/client/devops-kit/OneAPM/ci/nginx.yaml /etc/oneapm-ci-agent/conf.d/nginx.yaml
#cp -a /data/client/devops-kit/OneAPM/ci/nginx_status.conf /mall/web/nginx/conf/vhost/
#cp -a /etc/oneapm-ci-agent/conf.d/tomcat.yaml.example /etc/oneapm-ci-agent/conf.d/tomcat.yaml
/etc/init.d/oneapm-ci-agent restart

cat oneapm-ci-agent.conf
[Main]
# The host of the OneAPM data collector server to send Agent data to
#ci_url: https://dc-cloud.oneapm.com
ci_url:http://172.18.10.63:8087
# If you need a proxy to connect to the Internet, provide the settings here
# proxy_host: my-proxy.com
# proxy_port: 3128
# proxy_user: user
# proxy_password: password
# To be used with some proxys that return a 302 which make curl switch from POST to GET
# proxy_forbid_method_switch: no
# If you run the agent behind haproxy, you might want to set this to yes
skip_ssl_validation: yes
# The OneAPM license key to associate your Agent's data with your organization.
license_key: your appkey
# Force the hostname to whatever you want.
#hostname: mymachine.mydomain
# ========================================================================== #
# Logging
# ========================================================================== #
# log_level: INFO
# collector_log_file: /var/log/oneapm-ci-agent/collector.log
# forwarder_log_file: /var/log/oneapm-ci-agent/forwarder.log
# onestatsd_log_file: /var/log/oneapm-ci-agent/onestatsd.log
# if syslog is enabled but a host and port are not set, a local domain socket
# connection will be attempted
#
# log_to_syslog: yes
# syslog_host:
# syslog_port:

The nginx configuration:

[root@DEAL05 vhost]# cat 8087.conf
server {
	server_name	172.18.10.63 127.0.0.1;
	listen	8087;
	access_log logs/8087.log main;
	# Dedicated to OneAPM
	location / {
		proxy_set_header X-Real-IP        $remote_addr;
		proxy_set_header X-Forwarded-For  $proxy_add_x_forwarded_for;
		proxy_pass https://tpm.oneapm.com:443/;
		proxy_redirect  default;
	}
	location /infrastructure/ {
		proxy_set_header X-Real-IP        $remote_addr;
		proxy_set_header X-Forwarded-For  $proxy_add_x_forwarded_for;
		proxy_pass https://dc-cloud.oneapm.com/infrastructure/;
		proxy_redirect  default;
	}
}

It is advisable to run nginx as a cluster behind a virtual IP, so that the forwarder is not a single point of failure.

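Forwarding through nginx has other benefits as well. When there are many interfaces to external platforms, routing all of them out through one nginx forwarder helps a lot at the next disaster-recovery exercise or data-center move, including but not limited to firewall policies and IP-based authentication.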
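Before restarting agents on many hosts, it can help to confirm that the forwarder configured in `ci_url` actually answers. This is a rough sketch. The function names are hypothetical, and the conf path assumes the file was unzipped into /etc/oneapm-ci-agent/ as in the install steps.

```shell
#!/bin/bash
# Hypothetical helper: print the active (uncommented) ci_url value from an agent conf file.
read_ci_url() {
    sed -n 's/^ci_url:[[:space:]]*//p' "$1" | head -n 1
}

# Hypothetical helper: print the HTTP status code obtained through the forwarder.
# 000 means the forwarder itself could not be reached; 502 or 504 usually means
# nginx is up but cannot reach the upstream.
probe_forwarder() {
    curl -s -o /dev/null -m 10 -w '%{http_code}\n' "$(read_ci_url "$1")/infrastructure/"
}

# Example: probe_forwarder /etc/oneapm-ci-agent/oneapm-ci-agent.conf
```

Which status code the upstream returns for a bare request is not something this sketch relies on; it only distinguishes "forwarder unreachable" from "forwarder responding".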

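With the above in place, the platform basically shows up in CI. But that is far from enough: there is still a long way to go before all the middleware is added.
I use tomcat as the example here.
A look at tomcat.yaml shows that the JMX port has to be configured.
Earlier, when we set up Ai, we built a customized tomcat package: http://club.oneapm.com/t/ai-ci/583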

For CI we can likewise plant an easter egg in the customized tomcat.
Insert the following into tomcat's /bin/catalina.sh:

CATALINA_OPTS="-Dcom.sun.management.jmxremote  -Dcom.sun.management.jmxremote.port=65537  -Dcom.sun.management.jmxremote.ssl=false  -Dcom.sun.management.jmxremote.authenticate=false"

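and insert the following into /bin/start.sh, which replaces the 65537 placeholder with the HTTP port minus 1000 the first time it runs: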

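# Replace the 65537 placeholder in catalina.sh exactly once.
# Assumes this runs from the tomcat bin directory.
jmxremoteport_num_65537=$(grep -o 65537 ./catalina.sh | wc -l)
if [ "$jmxremoteport_num_65537" -eq 1 ]; then
	# Strip XML comments so commented-out Connectors are ignored
	sed 's/<!--/&\n/;s/-->/\n&/;' ../conf/server.xml | sed '/<!--/,/-->/d' > se.xml
	# Assumes port is the first attribute of the HTTP Connector
	pport=$(grep Connector se.xml | grep HTTP | awk -F'"' '{print $2}')
	if [ -n "$pport" ]; then
		jmxport=$((pport - 1000))
		sed -i "s/65537/$jmxport/g" catalina.sh
	fi
	rm -f se.xml
fi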
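The snippet above depends on `port` being the first attribute of the HTTP Connector. As an alternative sketch (not what the packaged tomcat uses), the port can be extracted independently of attribute order. The function names are hypothetical, and GNU sed is assumed for `\n` in replacements.

```shell
#!/bin/bash
# Hypothetical helper: print the port of the first HTTP Connector in a server.xml,
# ignoring XML comments and not depending on attribute order.
http_port_of() {
    # Join the file into one line so multi-line elements and comments are handled,
    # drop comments, then put each element on its own line again.
    tr '\n' ' ' < "$1" \
        | sed 's/<!--/\n<!--/g; s/-->/-->\n/g' \
        | sed '/^<!--/d' \
        | sed 's/</\n</g' \
        | grep '^<Connector' | grep 'protocol="HTTP' \
        | sed -n 's/.*[[:space:]]port="\([0-9]*\)".*/\1/p' | head -n 1
}

# The convention used in this post: JMX port = HTTP port - 1000
jmx_port_of() {
    local http_port
    http_port=$(http_port_of "$1")
    [ -n "$http_port" ] && echo $((http_port - 1000))
}

# Example: jmx_port_of ../conf/server.xml
```

Note that the minus-1000 convention only yields a usable port when the HTTP port is well above 1000, and the result can still collide with another service, so it is worth checking before relying on it.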

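With the files used by Ai added, our tomcat package is complete. To deploy a new tomcat, just mkdir the target directory first,
then, from inside it, run this alias: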

alias installtomcat='unzip /data/client/devops-kit/tomcat/apache-tomcat-7.0.67.zip -d $PWD && chmod -R 775 *'

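That makes deployment quick. There is also a script for installing Ai, which I integrated into the restart script; I won't post it here.
This way the JMX port of a single tomcat is generated automatically. But a server hosts many tomcats, so another script is needed to walk through them all: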

#!/bin/bash
tomcatprogram()
{
	dir2=`dirname $program`
	dir1=`dirname $dir2`
	basename1=`basename $dir1`
	# Prefer the bonded interface when it exists, otherwise fall back to eth0
	ifconf_num=`/sbin/ifconfig bond0 2>/dev/null|wc -l`
	if [[ $ifconf_num -eq 0 ]];then
		ipaddr=`/sbin/ifconfig eth0|sed -n "2"p|awk '{print $2}'|awk -F: '{print $2}'`
	else
		ipaddr=`/sbin/ifconfig bond0|sed -n "2"p|awk '{print $2}'|awk -F: '{print $2}'`
	fi
	# Skip directories that do not look like a tomcat installation.
	# return (not continue) is used because we are inside a function.
	fin=`find $dir1 -maxdepth 2 -name catalina.jar && find $dir1 -maxdepth 2 -name servlet-api.jar&& find $dir1 -maxdepth 2 -name server.xml && find $dir1 -maxdepth 1 -name work -type d `
	[ -z "$fin" ] && return

# Strip XML comments from server.xml before parsing it
sed 's/<!--/&\n/;s/-->/\n&/;' $dir1/conf/server.xml|sed '/<!--/,/-->/d'>se.xml || return
# Assumes port is the first attribute of the HTTP Connector
pport=`grep Connector se.xml|grep HTTP|awk -F"\"" '{print $2}'`
# Assumes docBase is the fourth attribute on its line
programdir1=`grep docBase se.xml|awk -F"\"" '{print $8}'`
# Take only the digits after jmxremote.port= (tr -d deletes a character set, not a string)
jmxport=`grep -o 'jmxremote\.port=[0-9]*' $dir1/bin/catalina.sh|head -n 1|cut -d= -f2`
# Fall back to the default webapps directory when no docBase is configured
programdir=${programdir1:-$dir1/webapps}

if [[ -n  "$dir1" ]] && [[ -n "$pport" ]] ; then
	echo  -e '\033[1;32mprogram at: '$dir1'\033[0m'
	echo  -e "\e[42mprogram http port is: '$pport'\e[0m"
	echo  -e "\e[42mprogram jmxremote port is '$jmxport'\e[0m"
            echo  -e '\033[1;32mprogram process at: '$programdir'\033[0m'

	#echo "program at: $dir1"
	#echo "program http port is: $pport"
	#echo "program jmxremote port is $jmxport"
	#echo "program process at: $programdir"
	echo "$basename1	$pport	$jmxport">>/backup/monitor_shell/health_check/conf/$ipaddr
	# Is the process running?
	rund=` netstat -nap|grep $pport  &&  ps -ef|grep $dir1  `
	[ -z "$rund"  ] && echo -e "$dir1 isn't running \n" 
	[ -n "$rund"  ] && echo -e "$dir1 running \n" 
fi
}
###########find tomcat installations########
fprocess=`locate catalina.sh|grep catalina.sh$`
total=`echo $fprocess | awk -F" " '{print NF}'` 
# Exit if the system has no tomcat
[ $total -eq 0 ] && echo "your system doesn't have tomcat" && exit 1
# ifconf_num has to be computed here too: the function that sets it has not run yet
ifconf_num=`/sbin/ifconfig bond0 2>/dev/null|wc -l`
if [[ $ifconf_num -eq 0 ]];then
	ipaddr=`/sbin/ifconfig eth0|sed -n "2"p|awk '{print $2}'|awk -F: '{print $2}'`
else
	ipaddr=`/sbin/ifconfig bond0|sed -n "2"p|awk '{print $2}'|awk -F: '{print $2}'`
fi
rm -rf /backup/monitor_shell/health_check/conf/$ipaddr
for (( i=1;i<=$total;i++ )) 
 do 
	program=`echo $fprocess|awk '{print $('$i')}'`
	tomcatprogram
done 
cat /backup/monitor_shell/health_check/conf/$ipaddr
rm -f se.xml

Remember to run updatedb to refresh the locate index before using this script.
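If keeping the locate index fresh is inconvenient, a `find`-based scan is one alternative. This is only a sketch: the function name is hypothetical and the search root in the example is an assumption about where tomcats are installed.

```shell
#!/bin/bash
# Hypothetical helper: list tomcat home directories below a root by looking for
# bin/catalina.sh, without relying on the locate database.
list_tomcat_homes() {
    find "$1" -type f -path '*/bin/catalina.sh' 2>/dev/null \
        | sed 's|/bin/catalina\.sh$||' | sort
}

# Example (search root is an assumption): list_tomcat_homes /data
```

The trade-off is speed: `find` walks the directory tree on every run, while locate answers from its index but misses anything created since the last updatedb.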

This reports the service port and JMX port of every tomcat already on the server, which is also very useful when deploying a new tomcat.
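One way to use that port list when deploying a new tomcat is to check a candidate port against it. The sketch below assumes the inventory file written by the script (one line per tomcat: name, HTTP port, JMX port) and the minus-1000 JMX convention from this post. The function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the port already appears as an HTTP or JMX port
# in the inventory file (columns: name, http port, jmx port).
port_in_use() {
    awk -v p="$2" '$2 == p || $3 == p { found = 1 } END { exit !found }' "$1"
}

# Hypothetical helper: succeed if a new tomcat on HTTP port $2, with JMX on
# port $2 - 1000, would not collide with anything in the inventory.
port_is_free() {
    ! port_in_use "$1" "$2" && ! port_in_use "$1" "$(( $2 - 1000 ))"
}

# Example:
#   port_is_free /backup/monitor_shell/health_check/conf/<ip> 8280 && echo "8280 is usable"
```

This only knows about ports recorded in the inventory; other services listening on the host would need a separate check.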

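When editing tomcat.yaml the format matters a lot. There is no official tool for checking the format, so I found one myself: http://www.json2yaml.com/ can convert it into the correct format. After setting everything up in bulk, restart the CI service.
Of course, you can also set up a crontab job that detects newly added tomcats every day and adds them to CI.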
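The bulk step could be generated from the port list instead of typed by hand. The following is a sketch under a loud assumption: that tomcat.yaml takes an `instances:` list with `host`, `port` and `name` keys. Compare it with the tomcat.yaml.example shipped with your agent before using the output. The function name and the cron script path are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print an "instances:" section from the inventory file
# (columns: name, http port, jmx port). Rows without a numeric JMX port are skipped.
# ASSUMPTION: the key names host/port/name match your tomcat.yaml.example.
gen_tomcat_instances() {
    echo 'instances:'
    awk '$3 ~ /^[0-9]+$/ { printf "  - host: localhost\n    port: %s\n    name: %s\n", $3, $1 }' "$1"
}

# Example:
#   gen_tomcat_instances /backup/monitor_shell/health_check/conf/<ip>
# A daily crontab entry could then look like this (script path is hypothetical):
#   0 3 * * * updatedb && /path/to/refresh_tomcat_yaml.sh
```

Generating only the `instances:` section leaves the rest of tomcat.yaml, such as the metric definitions, to be kept from the example file.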

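For alert policies, so far I have only set up a notification for redisblock: when block in the redis cluster is greater than or equal to 1, an alert email is sent. For now it is informational only.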
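To see the same condition locally, the threshold check can be sketched in shell. This assumes that "block" corresponds to the `blocked_clients` field of the redis `INFO clients` output, which is my reading rather than something the CI console confirms. Both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `redis-cli info clients` output on stdin and print
# the blocked_clients value (INFO lines end in \r\n, so strip the \r first).
blocked_clients() {
    tr -d '\r' | sed -n 's/^blocked_clients:\([0-9]*\)$/\1/p'
}

# Hypothetical helper: succeed (meaning "alert") when the count in $1 is at least
# the threshold in $2, which defaults to 1 as in the policy described above.
should_alert() {
    [ "${1:-0}" -ge "${2:-1}" ]
}

# Example:
#   should_alert "$(redis-cli info clients | blocked_clients)" && echo "redis has blocked clients"
```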

For OneAlert I use zabbix. Custom scripts are not adapted yet, though the daily joke I set up earlier with crontab was quite fun.


#2

Thanks for sharing your configuration experience :stuck_out_tongue_winking_eye: