[toc]
Review the old to learn the new
Messaging Layer: heartbeat v1, v2, v3; corosync v1, v2 (votequorum); OpenAIS

CRM: pacemaker, which needs a configuration interface. Configuration interfaces:

- crmsh (developed by SUSE), backed by the pssh management tool
- pcs (agent, client/server model), backed by pcsd
- conga (ricci, a daemon running on each node / luci, which sends commands to the nodes)

group, constraint (constraint-based)
rgmanager (cman): resource group, failover domain

RA (resource agent) classes:

- LSB: /etc/rc.d/init.d
- systemd: /etc/systemd/system/multi-user.target.wants/ (units enabled at boot)
- OCF: [provider] heartbeat, pacemaker, linbit (kernel-level cross-node block device, i.e. DRBD)
- service
- stonith

Available HA cluster stacks:

- heartbeat v1
- heartbeat v2
- heartbeat v3 + pacemaker X
- corosync + pacemaker
- cman + rgmanager
- corosync + cman + pacemaker
- keepalived
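crmsh is installed later in these notes; once it is available, the RA classes listed above can be inspected straight from the shell. A quick sketch (IPaddr2 is just one example agent):

```
crm ra classes                        # list RA classes: lsb, ocf, service, stonith, systemd
crm ra list ocf heartbeat             # OCF agents shipped by the heartbeat provider
crm ra info ocf:heartbeat:IPaddr2     # show an agent's parameters and default operations
```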
Linux-HA-Cluster 2
How heartbeats are transmitted between cluster nodes
1. Serial cable
2. Ethernet
3. Unicast: uses udpu; generally used on networks that do not support multicast
4. Multicast: uses udp
5. Broadcast: consumes too much network bandwidth
1. Multicast addresses
A multicast address identifies an IP multicast domain. IANA reserves the class D range for multicast: 224.0.0.0-239.255.255.255

- Permanent multicast addresses: 224.0.0.0-224.0.0.255
- Temporary multicast addresses: 224.0.1.0-238.255.255.255 (recommended)
- Locally scoped multicast addresses: 239.0.0.0-239.255.255.255
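Before committing to a multicast address, it is worth confirming that the network actually forwards multicast between the nodes. A minimal sketch using omping, which ships in most EL repositories; the address is the one used later in these notes, and the hostnames are assumed:

```
# run the same command on every node at roughly the same time;
# each node should report packets received from the others
omping -m 239.185.1.31 -p 5405 node1.ssjinyao.com node2.ssjinyao.com
```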
2. Configure HA cluster communication over multicast
```
# run on both Node1 and Node2
# yum -y install corosync pacemaker
# rpm -ql corosync      # list the files installed by the corosync package
/etc/corosync
/etc/corosync/corosync.conf.example
/etc/corosync/corosync.conf.example.udpu
/etc/corosync/corosync.xml.example
/etc/corosync/uidgid.d
/etc/dbus-1/system.d/corosync-signals.conf
/etc/logrotate.d/corosync
/etc/sysconfig/corosync
/etc/sysconfig/corosync-notifyd
/usr/bin/corosync-blackbox
/usr/bin/corosync-xmlproc
/usr/lib/systemd/system/corosync-notifyd.service
/usr/lib/systemd/system/corosync.service
/usr/sbin/corosync
/usr/sbin/corosync-cfgtool
/usr/sbin/corosync-cmapctl
/usr/sbin/corosync-cpgtool
/usr/sbin/corosync-keygen
/usr/sbin/corosync-notifyd
/usr/sbin/corosync-quorumtool
/usr/share/corosync
/usr/share/corosync/corosync
/usr/share/corosync/corosync-notifyd
/usr/share/corosync/xml2conf.xsl
/usr/share/doc/corosync-2.4.0
/usr/share/doc/corosync-2.4.0/LICENSE
/usr/share/doc/corosync-2.4.0/SECURITY
/usr/share/man/man5/corosync.conf.5.gz
/usr/share/man/man5/corosync.xml.5.gz
/usr/share/man/man5/votequorum.5.gz
/usr/share/man/man8/cmap_keys.8.gz
/usr/share/man/man8/corosync-blackbox.8.gz
/usr/share/man/man8/corosync-cfgtool.8.gz
/usr/share/man/man8/corosync-cmapctl.8.gz
/usr/share/man/man8/corosync-cpgtool.8.gz
/usr/share/man/man8/corosync-keygen.8.gz
/usr/share/man/man8/corosync-notifyd.8.gz
/usr/share/man/man8/corosync-quorumtool.8.gz
/usr/share/man/man8/corosync-xmlproc.8.gz
/usr/share/man/man8/corosync.8.gz
/usr/share/man/man8/corosync_overview.8.gz
/usr/share/snmp/mibs/COROSYNC-MIB.txt
/var/lib/corosync
/var/log/cluster

# rpm -ql pacemaker     # list the files installed by the pacemaker package
/etc/sysconfig/pacemaker
/usr/lib/ocf/resource.d/.isolation
/usr/lib/ocf/resource.d/.isolation/docker-wrapper
/usr/lib/ocf/resource.d/pacemaker/controld
/usr/lib/ocf/resource.d/pacemaker/remote
/usr/lib/systemd/system/pacemaker.service
/usr/libexec/pacemaker/attrd
/usr/libexec/pacemaker/cib
/usr/libexec/pacemaker/cibmon
/usr/libexec/pacemaker/crmd
/usr/libexec/pacemaker/lrmd
/usr/libexec/pacemaker/lrmd_internal_ctl
/usr/libexec/pacemaker/pengine
/usr/libexec/pacemaker/stonith-test
/usr/libexec/pacemaker/stonithd
/usr/sbin/crm_attribute
/usr/sbin/crm_master
/usr/sbin/crm_node
/usr/sbin/pacemakerd
/usr/sbin/stonith_admin
/usr/share/doc/pacemaker-1.1.16
/usr/share/doc/pacemaker-1.1.16/COPYING
/usr/share/doc/pacemaker-1.1.16/ChangeLog
/usr/share/licenses/pacemaker-1.1.16
/usr/share/licenses/pacemaker-1.1.16/GPLv2
/usr/share/man/man7/crmd.7.gz
/usr/share/man/man7/ocf_pacemaker_controld.7.gz
/usr/share/man/man7/ocf_pacemaker_remote.7.gz
/usr/share/man/man7/pengine.7.gz
/usr/share/man/man7/stonithd.7.gz
/usr/share/man/man8/crm_attribute.8.gz
/usr/share/man/man8/crm_master.8.gz
/usr/share/man/man8/crm_node.8.gz
/usr/share/man/man8/pacemakerd.8.gz
/usr/share/man/man8/stonith_admin.8.gz
/usr/share/pacemaker/alerts
/usr/share/pacemaker/alerts/alert_file.sh.sample
/usr/share/pacemaker/alerts/alert_smtp.sh.sample
/usr/share/pacemaker/alerts/alert_snmp.sh.sample
/var/lib/pacemaker/cib
/var/lib/pacemaker/pengine
```
3. Configure corosync
```
# cd /etc/corosync/
# cp corosync.conf.example corosync.conf
```

Some environments may not support multicast; in that case Corosync should be configured to use unicast. Below is part of a unicast Corosync configuration file:

```
totem {
    #...
    interface {
        ringnumber: 0
        bindnetaddr: 10.180.22.0
        broadcast: yes            (1)
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 10.180.22.0
        broadcast: yes
        mcastport: 5405
    }
    transport: udpu               (2)
}
nodelist {                        (3)
    node {
        ring0_addr: 10.180.22.166
        ring1_addr: 10.180.55.1
        nodeid: 1
    }
    node {
        ring0_addr: 10.180.22.167
        ring1_addr: 10.180.55.2
        nodeid: 2
    }
}
```

(1) If broadcast is set to yes, the cluster heartbeat is carried over broadcast. When this parameter is set, mcastaddr must not be set.
(2) The transport directive determines how the cluster communicates. To disable multicast entirely, configure the unicast transport udpu. This requires all node addresses to be listed in nodelist, i.e. the cluster membership has to be determined before the HA cluster is configured. The default is udp; udpu and iba are also supported.
(3) Under nodelist, options specific to a single node can be set. These options may only appear inside a node block, should only be defined for servers that belong to the cluster, and should only include parameters that differ from the defaults. Every node must have ring0_addr configured.

The configuration used in this experiment is as follows:

```
# Please read the corosync.conf.5 manual page
totem {
    version: 2

    # crypto_cipher and crypto_hash: Used for mutual node authentication.
    # If you choose to enable this, then do remember to create a shared
    # secret with "corosync-keygen".
    # Enabling crypto_cipher requires also enabling crypto_hash.
    crypto_cipher: aes128
    crypto_hash: sha1
    secauth: on

    # interface: define at least one interface to communicate
    # over. If you define more than one interface stanza, you must
    # also set rrp_mode.
    interface {
        # Rings must be consecutively numbered, starting at 0.
        ringnumber: 0
        # This is normally the *network* address of the
        # interface to bind to. This ensures that you can use
        # identical instances of this configuration file
        # across all your cluster nodes, without having to
        # modify this option.
        bindnetaddr: 10.180.0.0
        # However, if you have multiple physical network
        # interfaces configured for the same subnet, then the
        # network address alone is not sufficient to identify
        # the interface Corosync should bind to. In that case,
        # configure the *host* address of the interface
        # instead:
        # bindnetaddr: 192.168.1.1
        # When selecting a multicast address, consider RFC
        # 2365 (which, among other things, specifies that
        # 239.255.x.x addresses are left to the discretion of
        # the network administrator). Do not reuse multicast
        # addresses across multiple Corosync clusters sharing
        # the same network.
        mcastaddr: 239.185.1.31
        # Corosync uses the port you specify here for UDP
        # messaging, and also the immediately preceding
        # port. Thus if you set this to 5405, Corosync sends
        # messages over UDP ports 5405 and 5404.
        mcastport: 5405
        # Time-to-live for cluster communication packets. The
        # number of hops (routers) that this ring will allow
        # itself to pass. Note that multicast routing must be
        # specifically enabled on most network routers.
        ttl: 1
    }
}

nodelist {
    node {
        ring0_addr: 10.180.22.166
        nodeid: 1
    }
    node {
        ring0_addr: 10.180.22.167
        nodeid: 2
    }
    node {
        ring0_addr: 10.180.22.168
        nodeid: 3
    }
}

logging {
    # Log the source file and line where messages are being
    # generated. When in doubt, leave off. Potentially useful for
    # debugging.
    fileline: off
    # Log to standard error. When in doubt, set to no. Useful when
    # running in the foreground (when invoking "corosync -f")
    to_stderr: no
    # Log to a log file. When set to "no", the "logfile" option
    # must not be set.
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    # Log to the system log daemon. When in doubt, set to yes.
    to_syslog: no
    # Log debug messages (very verbose). When in doubt, leave off.
    debug: off
    # Log messages with time stamps. When in doubt, set to on
    # (unless you are only logging to syslog, where double
    # timestamps can be annoying).
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    # Enable and configure quorum subsystem (default: off)
    # see also corosync.conf.5 and votequorum.5
    provider: corosync_votequorum
}
```

```
# generate the authentication key and copy it, with the config, to the other node
# corosync-keygen
# scp -p authkey corosync.conf root@node2.ssjinyao.com:/etc/corosync
# With the votequorum provider enabled, the cluster (three nodes are defined here)
# must reach quorum by default, so starting corosync on a single node fails:
# systemctl start corosync.service     # fails at this point
# systemctl status corosync.service    # inspect corosync's status
# Another node therefore has to be brought online:
# yum -y install corosync pacemaker
# run on Node1:
# scp -p /etc/corosync/corosync.conf /etc/corosync/authkey root@node2.ssjinyao.com:/etc/corosync
```
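If firewalld is active on the nodes, the corosync ports have to be opened before the ring can form. A minimal sketch, assuming firewalld; the port pair follows from mcastport 5405 above, which makes corosync use UDP 5404 and 5405:

```
# run on every node
firewall-cmd --permanent --add-port=5404-5405/udp
firewall-cmd --reload
```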
4. Verify the nodes
```
# corosync-cfgtool -s      # verify this node's ring status
# corosync-cmapctl         # inspect the runtime configuration and membership information
# grep -v '^[[:space:]]*#' /etc/corosync/corosync.conf    # strip comments to keep a concise working example
```
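Once all nodes are up, the vote and quorum state can be checked as well; corosync-quorumtool ships with corosync, as the file listing earlier shows:

```
# show vote counts, quorum status and the member list
corosync-quorumtool -s
```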
5. Configure pacemaker
```
# vim /etc/sysconfig/pacemaker
PCMK_logfile=/var/log/pacemaker.log
# pacemaker has to be started on every node:
# systemctl start pacemaker.service
# crm_mon    # verify the cluster status
```
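To survive reboots, both daemons can be enabled on every node, and crm_mon can print the status once instead of running interactively; a small convenience sketch, not part of the original notes:

```
systemctl enable corosync.service pacemaker.service    # start the stack at boot
crm_mon -1                                             # one-shot status output
```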
6. A brief guide to the crm tool
```
# crm_node -n       # show the name of the current node
# crm_node -l       # list all nodes in the cluster
# crm_verify -L -V  # check the live configuration and show any errors
# To install and use the crm shell: if no rpm package is available in your
# repositories, suitable packages can be found online
# yum -y install crmsh pssh python-pssh
```
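Besides the interactive shell, crmsh subcommands can also be run directly from the command line; a few examples, assuming crmsh is installed as above:

```
crm status            # one-shot overview of nodes and resources
crm configure show    # dump the current CIB configuration
crm help              # top-level help for the shell
```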
HA Web Service
Resources:

- vip: 10.180.xx.xxx, ocf:heartbeat:IPaddr
- httpd: systemd
- nfs shared storage: ocf:heartbeat:Filesystem

HA cluster working models:

- A/P: two-node cluster, active/passive; no-quorum-policy=(stop|ignore|suicide|freeze)
- A/A:
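In a two-node A/P cluster, losing one node means losing quorum, so the policy above matters. A sketch of how it would be set with crmsh; ignore is a typical lab choice, not a production recommendation:

```
# keep resources running on the surviving node when quorum is lost (lab use)
crm configure property no-quorum-policy=ignore
```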
```
# node0
# mkdir -pv /www/htdocs
# echo "<h1> Test Page on NFS Server</h1>" > /www/htdocs/index.html
# vim /etc/exports
/www/htdocs 10.180.xx.xxx/16(rw)
# iptables -L -n    # check whether any firewall rules are in the way
# chkconfig nfs on
# systemctl start nfs
```
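After editing /etc/exports, the export can be re-read and checked without restarting the nfs service, using standard nfs-utils commands:

```
exportfs -ra              # re-export everything listed in /etc/exports
exportfs -v               # show the active exports and their options
showmount -e localhost    # confirm what clients will see
```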
```
# node 1
# showmount -e 10.180.xx.xxx
# mount -t nfs 10.180.xx.xxx:/www/htdocs /var/www/html    # mount the NFS share
# systemctl start httpd
# systemctl enable httpd
# Now check whether the test page above can be reached
```
```
# node 2
# systemctl start httpd
# systemctl enable httpd
```
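Once both web nodes have been verified, the manually mounted share on node 1 should be released so that the cluster's Filesystem resource, not the administrator, controls it; this cleanup step is implied by the workflow rather than spelled out in the original notes:

```
# on node 1
umount /var/www/html
```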
```
# crm configure
crm(live)configure# show
crm(live)configure# property stonith-enabled=false
crm(live)configure# verify
crm(live)configure# show
crm(live)configure# commit
crm(live)configure# primitive webip ocf:heartbeat:IPaddr2 params ip="10.180.xx.xxx" op monitor interval=30s timeout=20s
crm(live)configure# show
crm(live)configure# commit
crm(live)configure# edit      # opens the configuration in vim
crm(live)configure# verify
crm(live)configure# primitive webstore ocf:heartbeat:Filesystem params device="10.180.xx.xxx:/www/htdocs" directory="/var/www/html" fstype="nfs" op start timeout=60s op stop timeout=60s op monitor interval=20s timeout=40s
crm(live)configure# verify
crm(live)configure# colocation webserver_with_webstore_and_webip inf: webserver ( webip webstore )
# note: webserver is the httpd resource; see the sketch after this session
crm(live)configure# show xml      # show the configuration in XML format
crm(live)configure# order webstore_after_webip Mandatory: webip webstore
crm(live)configure# order webserver_after_webstore Mandatory: webstore webserver
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# location webservice_pref_node1 webip 100: node1.ssjinyao.com
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# property default-resource-stickiness=50

# systemctl stop httpd.service
# systemctl enable httpd.service
# crm_verify -V -L
# crm node standby
# crm node online
```
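The constraints above reference a webserver resource that is never defined in the transcript. A sketch of the missing primitive, assuming the systemd httpd unit from the resource list earlier; the timeout values are illustrative, not from the original notes:

```
crm(live)configure# primitive webserver systemd:httpd op monitor interval=30s timeout=100s
crm(live)configure# verify
crm(live)configure# commit
```

Defining and committing it before the colocation and order constraints keeps verify from flagging an unknown resource.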