[toc]
Review the old and learn the new:
Messaging Layer:
    heartbeat v1, v2, v3
    corosync v1, v2 (votequorum)
    OpenAIS
CRM:
    pacemaker needs a configuration interface
    Configuration interfaces: crmsh (developed by SUSE), works together with the pssh management tool
        pcs (agent, C/S model), served by pcsd
        conga (composed of ricci, a process running on every node, and luci, which sends commands to the nodes)
    group, constraint (constraint-based)
    rgmanager (cman)
        resource group:
            failover domain
RA:
    LSB: /etc/rc.d/init.d
    systemd: /etc/systemd/system/multi-user.target.wants
        units appear here once a service is enabled at boot;
    OCF: [provider]
        heartbeat
        pacemaker
        linbit (DRBD, a kernel-based cross-node block device)
    service
    stonith
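Once crmsh is installed (see the installation step further below), the available RA classes and agents can be listed directly; a quick sketch (output varies with the installed packages):
# crm ra classes                        # lsb / ocf (heartbeat, pacemaker) / service / stonith / systemd
# crm ra list ocf heartbeat             # agents shipped by the heartbeat provider
# crm ra info ocf:heartbeat:IPaddr      # parameters and default timeouts of a single agent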
Workable HA cluster stacks:
    heartbeat v1
    heartbeat v2
    heartbeat v3 + pacemaker X
    corosync + pacemaker
    cman + rgmanager
    corosync + cman + pacemaker
    keepalived
Linux-HA-Cluster 2
How a Heartbeat cluster passes heartbeats between nodes:
1. Over a serial cable
2. Over Ethernet
3. Unicast, using udpu; generally used on networks that do not support multicast
4. Multicast, using udp
5. Broadcast; consumes excessive network resources
1. Multicast addresses:
    A multicast address identifies an IP multicast domain. IANA reserves the class D range for multicast: 224.0.0.0-239.255.255.255
    Permanent multicast addresses: 224.0.0.0-224.0.0.255
    Transient multicast addresses: 224.0.1.0-238.255.255.255 (transient addresses are the recommended choice)
    Locally scoped (administrative) multicast addresses: 239.0.0.0-239.255.255.255
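Before settling on a multicast address, it is worth verifying that multicast actually works between the nodes. A sketch using the omping utility (assuming it is installed; the address and host names follow this lab's conventions):
# yum -y install omping    # on every node
# omping -m 239.185.1.31 -p 5405 node1.ssjinyao.com node2.ssjinyao.com
# every node should see multicast (mcast) replies from its peers; unicast-only replies mean the group is not being forwarded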
2. Configure multicast-based communication for the HA cluster
# Run on both Node1 and Node2
# yum -y install corosync pacemaker
# rpm -ql corosync    # list the files installed by the corosync package;
/etc/corosync
/etc/corosync/corosync.conf.example
/etc/corosync/corosync.conf.example.udpu
/etc/corosync/corosync.xml.example
/etc/corosync/uidgid.d
/etc/dbus-1/system.d/corosync-signals.conf
/etc/logrotate.d/corosync
/etc/sysconfig/corosync
/etc/sysconfig/corosync-notifyd
/usr/bin/corosync-blackbox
/usr/bin/corosync-xmlproc
/usr/lib/systemd/system/corosync-notifyd.service
/usr/lib/systemd/system/corosync.service
/usr/sbin/corosync
/usr/sbin/corosync-cfgtool
/usr/sbin/corosync-cmapctl
/usr/sbin/corosync-cpgtool
/usr/sbin/corosync-keygen
/usr/sbin/corosync-notifyd
/usr/sbin/corosync-quorumtool
/usr/share/corosync
/usr/share/corosync/corosync
/usr/share/corosync/corosync-notifyd
/usr/share/corosync/xml2conf.xsl
/usr/share/doc/corosync-2.4.0
/usr/share/doc/corosync-2.4.0/LICENSE
/usr/share/doc/corosync-2.4.0/SECURITY
/usr/share/man/man5/corosync.conf.5.gz
/usr/share/man/man5/corosync.xml.5.gz
/usr/share/man/man5/votequorum.5.gz
/usr/share/man/man8/cmap_keys.8.gz
/usr/share/man/man8/corosync-blackbox.8.gz
/usr/share/man/man8/corosync-cfgtool.8.gz
/usr/share/man/man8/corosync-cmapctl.8.gz
/usr/share/man/man8/corosync-cpgtool.8.gz
/usr/share/man/man8/corosync-keygen.8.gz
/usr/share/man/man8/corosync-notifyd.8.gz
/usr/share/man/man8/corosync-quorumtool.8.gz
/usr/share/man/man8/corosync-xmlproc.8.gz
/usr/share/man/man8/corosync.8.gz
/usr/share/man/man8/corosync_overview.8.gz
/usr/share/snmp/mibs/COROSYNC-MIB.txt
/var/lib/corosync
/var/log/cluster
# rpm -ql pacemaker    # list the files installed by the pacemaker package;
/etc/sysconfig/pacemaker
/usr/lib/ocf/resource.d/.isolation
/usr/lib/ocf/resource.d/.isolation/docker-wrapper
/usr/lib/ocf/resource.d/pacemaker/controld
/usr/lib/ocf/resource.d/pacemaker/remote
/usr/lib/systemd/system/pacemaker.service
/usr/libexec/pacemaker/attrd
/usr/libexec/pacemaker/cib
/usr/libexec/pacemaker/cibmon
/usr/libexec/pacemaker/crmd
/usr/libexec/pacemaker/lrmd
/usr/libexec/pacemaker/lrmd_internal_ctl
/usr/libexec/pacemaker/pengine
/usr/libexec/pacemaker/stonith-test
/usr/libexec/pacemaker/stonithd
/usr/sbin/crm_attribute
/usr/sbin/crm_master
/usr/sbin/crm_node
/usr/sbin/pacemakerd
/usr/sbin/stonith_admin
/usr/share/doc/pacemaker-1.1.16
/usr/share/doc/pacemaker-1.1.16/COPYING
/usr/share/doc/pacemaker-1.1.16/ChangeLog
/usr/share/licenses/pacemaker-1.1.16
/usr/share/licenses/pacemaker-1.1.16/GPLv2
/usr/share/man/man7/crmd.7.gz
/usr/share/man/man7/ocf_pacemaker_controld.7.gz
/usr/share/man/man7/ocf_pacemaker_remote.7.gz
/usr/share/man/man7/pengine.7.gz
/usr/share/man/man7/stonithd.7.gz
/usr/share/man/man8/crm_attribute.8.gz
/usr/share/man/man8/crm_master.8.gz
/usr/share/man/man8/crm_node.8.gz
/usr/share/man/man8/pacemakerd.8.gz
/usr/share/man/man8/stonith_admin.8.gz
/usr/share/pacemaker/alerts
/usr/share/pacemaker/alerts/alert_file.sh.sample
/usr/share/pacemaker/alerts/alert_smtp.sh.sample
/usr/share/pacemaker/alerts/alert_snmp.sh.sample
/var/lib/pacemaker/cib
/var/lib/pacemaker/pengine
3. Configure corosync
# cd /etc/corosync/
# cp corosync.conf.example corosync.conf
# Some environments may not support multicast; in that case Corosync should be configured to use unicast.
# Below is an excerpt of a Corosync configuration file that uses unicast.
totem {
    #...
    interface {
        ringnumber: 0
        bindnetaddr: 10.180.22.0
        broadcast: yes (1)
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 10.180.55.0
        broadcast: yes
        mcastport: 5405
    }
    transport: udpu (2)
}
nodelist { (3)
    node {
        ring0_addr: 10.180.22.166
        ring1_addr: 10.180.55.1
        nodeid: 1
    }
    node {
        ring0_addr: 10.180.22.167
        ring1_addr: 10.180.55.2
        nodeid: 2
    }
}
# If broadcast is set to yes, the cluster heartbeat is sent via broadcast. When this option is set, mcastaddr must not be set;
# the transport directive selects the cluster communication method. To disable multicast entirely, configure the unicast transport udpu;
# this requires listing all node servers in nodelist,
# i.e. the cluster membership must be decided before the HA cluster is configured. The default is udp; udpu and iba are also supported;
# under nodelist you can set options that apply only to a single node. These settings may only appear inside a node block,
# i.e. only node servers that belong to the cluster may be configured here, and only parameters that differ from the defaults should be included;
# ring0_addr must be configured on every node server;
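The two-ring excerpt above defines two interface stanzas; as the example file's own comments point out (see the totem section below), rrp_mode must then also be set in totem. A minimal sketch of the addition:
totem {
    #...
    rrp_mode: passive    # required with more than one interface stanza; 'active' is the alternative
}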
# The configuration used in this lab is shown below
# Please read the corosync.conf.5 manual page
totem {
    version: 2

    # crypto_cipher and crypto_hash: Used for mutual node authentication.
    # If you choose to enable this, then do remember to create a shared
    # secret with "corosync-keygen".
    # enabling crypto_cipher, requires also enabling of crypto_hash.
    crypto_cipher: aes128
    crypto_hash: sha1
    secauth: on

    # interface: define at least one interface to communicate
    # over. If you define more than one interface stanza, you must
    # also set rrp_mode.
    interface {
        # Rings must be consecutively numbered, starting at 0.
        ringnumber: 0
        # This is normally the *network* address of the
        # interface to bind to. This ensures that you can use
        # identical instances of this configuration file
        # across all your cluster nodes, without having to
        # modify this option.
        bindnetaddr: 10.180.0.0
        # However, if you have multiple physical network
        # interfaces configured for the same subnet, then the
        # network address alone is not sufficient to identify
        # the interface Corosync should bind to. In that case,
        # configure the *host* address of the interface
        # instead:
        # bindnetaddr: 192.168.1.1
        # When selecting a multicast address, consider RFC
        # 2365 (which, among other things, specifies that
        # 239.255.x.x addresses are left to the discretion of
        # the network administrator). Do not reuse multicast
        # addresses across multiple Corosync clusters sharing
        # the same network.
        mcastaddr: 239.185.1.31
        # Corosync uses the port you specify here for UDP
        # messaging, and also the immediately preceding
        # port. Thus if you set this to 5405, Corosync sends
        # messages over UDP ports 5405 and 5404.
        mcastport: 5405
        # Time-to-live for cluster communication packets. The
        # number of hops (routers) that this ring will allow
        # itself to pass. Note that multicast routing must be
        # specifically enabled on most network routers.
        ttl: 1
    }
}
nodelist {
    node {
        ring0_addr: 10.180.22.166
        nodeid: 1
    }
    node {
        ring0_addr: 10.180.22.167
        nodeid: 2
    }
    node {
        ring0_addr: 10.180.22.168
        nodeid: 3
    }
}
logging {
    # Log the source file and line where messages are being
    # generated. When in doubt, leave off. Potentially useful for
    # debugging.
    fileline: off
    # Log to standard error. When in doubt, set to no. Useful when
    # running in the foreground (when invoking "corosync -f")
    to_stderr: no
    # Log to a log file. When set to "no", the "logfile" option
    # must not be set.
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    # Log to the system log daemon. When in doubt, set to yes.
    to_syslog: no
    # Log debug messages (very verbose). When in doubt, leave off.
    debug: off
    # Log messages with time stamps. When in doubt, set to on
    # (unless you are only logging to syslog, where double
    # timestamps can be annoying).
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {
    # Enable and configure quorum subsystem (default: off)
    # see also corosync.conf.5 and votequorum.5
    provider: corosync_votequorum
}
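The nodes must also be able to reach each other on the mcastport and the port immediately below it (see the mcastport comment above). On firewalld-based systems, a sketch using the predefined high-availability service:
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload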
# Generate the authentication key
# corosync-keygen
# scp -p authkey corosync.conf root@node2.ssjinyao.com:/etc/corosync
# corosync's voting system (votequorum) needs a majority of the three configured nodes before the cluster is operational;
# systemctl start corosync.service     # starting the service at this point fails;
# systemctl status corosync.service    # check corosync's status information;
# at this point the other nodes need to be brought up
# yum -y install corosync pacemaker
# Run on Node1
# scp -p /etc/corosync/corosync.conf /etc/corosync/authkey root@node2.ssjinyao.com:/etc/corosync
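The key must be identical on every node and readable only by root (corosync-keygen creates it with mode 0400); a quick sanity check:
# ls -l /etc/corosync/authkey     # expect: -r-------- 1 root root ...
# md5sum /etc/corosync/authkey    # compare the checksum on all nodes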
4. Verify the nodes
# corosync-cfgtool -s    # verify this node's ring status
# corosync-cmapctl       # inspect the runtime information shared between the nodes;
# grep -v '^[[:space:]]*#' /etc/corosync/corosync.conf    # show only the effective (non-comment) configuration
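With a healthy ring, corosync-cfgtool -s prints output of roughly the following shape (node ID and address here follow this lab's configuration; treat it as a sketch):
Printing ring status.
Local node ID 1
RING ID 0
        id      = 10.180.22.166
        status  = ring 0 active with no faults
# the current membership can also be checked with:
# corosync-cmapctl | grep members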
5. Configure pacemaker
# vim /etc/sysconfig/pacemaker
PCMK_logfile=/var/log/pacemaker.log
# pacemaker must be started on every node
# systemctl start pacemaker.service
# crm_mon    # verify the cluster status
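crm_mon runs as an interactive, continuously refreshing monitor by default; for a quick scripted check there is a one-shot mode:
# crm_mon -1    # print the cluster status once and exit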
6. A quick guide to the crm tools
# crm_node -n         # show the current node's name
# crm_node -l         # list all nodes in the cluster
# crm_verify -L -V    # check the live configuration for errors
# installing and using the crm shell: if it is not in your repos, related rpm packages can be found online
# yum -y install crmsh pssh python-pssh
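With crmsh installed, the usual entry points are (a sketch):
# crm status              # nodes, quorum and resource state at a glance
# crm configure show      # current CIB configuration
# crm                     # interactive shell with tab completion; 'help' lists subcommands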
HA Web Service
    vip: 10.180.xx.xxx, ocf:heartbeat:IPaddr
    httpd: systemd
    nfs shared storage: ocf:heartbeat:Filesystem
HA cluster working models:
    A/P: two-node cluster; active/passive;
        no-quorum-policy=(stop|ignore|suicide|freeze) (see the example right after this list)
    A/A:
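For the A/P two-node model, a single surviving node can never hold the majority of votes, so the policy above usually has to be relaxed; a sketch of setting it (ignore is a common choice in two-node labs):
# crm configure property no-quorum-policy=ignore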
# node0
# mkdir -pv /www/htdocs
# echo "<h1> Test Page on NFS Server</h1>" > /www/htdocs/index.html
# vim /etc/exports
/www/htdocs 10.180.xx.xxx/16(rw)
# iptables -L -n    # check whether any firewall rules are in the way
# systemctl enable nfs
# systemctl start nfs
# node 1
# showmount -e 10.180.xx.xxx
# mount -t nfs 10.180.xx.xxx:/www/htdocs /var/www/html    # mount the shared NFS storage
# systemctl start httpd
# systemctl enable httpd
# now check whether the test page above can be reached
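The Filesystem resource defined later will perform this mount itself, so once the test page has been verified, the manual mount should be removed before the cluster takes over (an assumption of this walkthrough):
# umount /var/www/html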
# node 2
# systemctl start httpd
# systemctl enable httpd
# crm configure
crm(live)configure# show
crm(live)configure# property stonith-enabled=false
crm(live)configure# verify
crm(live)configure# show
crm(live)configure# commit
crm(live)configure# primitive webip ocf:heartbeat:IPaddr2 params ip="10.180.xx.xxx" op monitor interval=30s timeout=20s
crm(live)configure# show
crm(live)configure# commit
crm(live)configure# edit    # opens the configuration in vim
crm(live)configure# verify
crm(live)configure# primitive webstore ocf:heartbeat:Filesystem params device="10.180.xx.xxx:/www/htdocs" directory="/var/www/html" fstype="nfs" op start timeout=60s op stop timeout=60s op monitor interval=20s timeout=40s
crm(live)configure# verify
crm(live)configure# primitive webserver systemd:httpd op monitor interval=30s    # the httpd resource (systemd class, per the service list above) referenced by the constraints below
crm(live)configure# colocation webserver_with_webstore_and_webip inf: webserver ( webip webstore )
crm(live)configure# show xml    # view the configuration in XML format
crm(live)configure# order webstore_after_webip Mandatory: webip webstore
crm(live)configure# order webserver_after_webstore Mandatory: webstore webserver
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# location webservice_pref_node1 webip 100: node1.ssjinyao.com
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# rsc_defaults resource-stickiness=50
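Instead of the colocation and order constraints above, the three resources could also be bundled into a group, which implies both colocation and ordering (a sketch; 'webservice' is an illustrative name):
crm(live)configure# group webservice webip webstore webserver
crm(live)configure# verify
crm(live)configure# commit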
# systemctl stop httpd.service
# systemctl enable httpd.service
# crm_verify -V -L
# crm node standby
# crm node online
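Failover can also be exercised per resource instead of per node; a sketch using crm resource migration (the target node name follows this lab's naming):
# crm resource migrate webip node2.ssjinyao.com    # the colocation/order constraints pull webstore and webserver along
# crm resource unmigrate webip                     # drop the temporary location constraint created by migrate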