Environment

Three machines are used to simulate a PG HA architecture. To keep the test environment simple, etcd and the proxy each run as a single node.

Component   IP              Port    Role
etcd        192.168.5.220   2379
postgres0   192.168.5.200   5432    master
postgres1   192.168.5.220   5432    standby
proxy       192.168.5.220   25432
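
Every stolon command in the steps below repeats the same three store flags. As a minimal sketch, a small helper can keep them in one place. The names `STOLON_CLUSTER`, `STOLON_ENDPOINTS`, and `stolon_flags` are invented for this example and are not part of stolon.

```shell
# Hypothetical helper: these names are made up for this walkthrough.
STOLON_CLUSTER="stolon-cluster"
STOLON_ENDPOINTS="http://192.168.5.220:2379"

# Print the store flags shared by stolonctl, stolon-sentinel,
# stolon-keeper and stolon-proxy.
stolon_flags() {
    printf '%s %s %s\n' \
        "--cluster-name $STOLON_CLUSTER" \
        "--store-backend=etcdv3" \
        "--store-endpoints=$STOLON_ENDPOINTS"
}

# Usage (needs stolon installed and etcd running):
#   stolonctl $(stolon_flags) init
#   stolonctl $(stolon_flags) status
```

Note that the endpoint URL follows `--store-endpoints=` with no space in between; a space there would leave the flag empty.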

Initialization steps

1. Initialize the cluster

stolonctl --cluster-name stolon-cluster --store-backend=etcdv3 --store-endpoints=http://192.168.5.220:2379 init

2. Start a sentinel

stolon-sentinel --cluster-name stolon-cluster --store-backend=etcdv3 --store-endpoints=http://192.168.5.220:2379

>> sentinel id id=66613766
>> Trying to acquire sentinels leadership
>> sentinel leadership acquired

3. Start a keeper (master, postgres0); the first keeper to start becomes the master

This starts a keeper process with uid postgres0. It listens on 192.168.5.200:5432 and initializes a PG instance under data/postgres0/postgres/.

stolon-keeper --cluster-name stolon-cluster --store-backend=etcdv3 --store-endpoints=http://192.168.5.220:2379 --uid postgres0 --data-dir data/postgres0 --pg-su-password=supassword --pg-repl-username=repluser --pg-repl-password=replpassword --pg-listen-address=192.168.5.200

>> exclusive lock on data dir taken
>> keeper uid uid=postgres0
>> stopping database
>> our keeper data is not available, waiting for it to appear
>> our keeper data is not available, waiting for it to appear
>> current db UID different than cluster data db UID db= cdDB=2a87ea79
>> initializing the database cluster
>> cannot get configured pg parameters error=dial tcp 127.0.0.1:5432: getsockopt: connection refused
>> starting database
>> error getting pg state error=pq: password authentication failed for user "repluser"
>> setting roles
>> setting superuser password
>> superuser password set
>> creating replication role
>> replication role created role=repluser
>> stopping database
>> our db requested role is master
>> starting database
>> already master
>> postgres parameters changed, reloading postgres instance
>> reloading database configuration
.......

At this point the sentinel elects this instance as the master:

>> trying to find initial master
>> initializing cluster keeper=postgres0
>> received db state for unexpected db uid receivedDB= db=2a87ea79
>> waiting for db db=2a87ea79 keeper=postgres0
>> waiting for db db=2a87ea79 keeper=postgres0
>> waiting for db db=2a87ea79 keeper=postgres0
>> db initialized db=2a87ea79 keeper=postgres0

4. Start a second keeper (standby, postgres1)

This instance sets up replication from the master and becomes the standby.

stolon-keeper --cluster-name stolon-cluster --store-backend=etcdv3 --store-endpoints=http://192.168.5.220:2379 --uid postgres1 --data-dir data/postgres1 --pg-su-password=supassword --pg-repl-username=repluser --pg-repl-password=replpassword --pg-listen-address=192.168.5.220

>> exclusive lock on data dir taken
>> keeper uid uid=postgres1
>> stopping database
>> our keeper data is not available, waiting for it to appear
>> our keeper data is not available, waiting for it to appear
>> current db UID different than cluster data db UID db= cdDB=63343433
>> database cluster not initialized
>> our db requested role is standby followedDB=2a87ea79
>> running pg_basebackup
>> sync succeeded
>> starting database
>> postgres parameters changed, reloading postgres instance
>> reloading database configuration
>> our db requested role is standby followedDB=2a87ea79
>> already standby

Component overview

Keeper

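The keeper is essentially a wrapper around PG. It embeds the logic for initializing master and standby instances, switching roles, and handling failover. It bases all of these actions on the Cluster Data that the sentinel generates and stores in etcd, and it polls the cluster state continuously.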
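
To see what the keeper is polling, the keys stolon keeps in etcd can be listed directly. This is a sketch that assumes stolon runs with its default `--store-prefix` (`stolon/cluster`) and that `etcdctl` is installed; the function name `cluster_keys` is invented for the example.

```shell
# Assumption: default store prefix "stolon/cluster"; adjust if
# --store-prefix was set when starting the stolon components.

# List the etcd keys of cluster $2 stored at endpoint $1.
cluster_keys() {
    ETCDCTL_API=3 etcdctl --endpoints="$1" get --prefix "stolon/cluster/$2" --keys-only
}

# Usage:
#   cluster_keys http://192.168.5.220:2379 stolon-cluster
```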

Sentinel

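The sentinel elects the master and monitors the health of every PG instance. When the master becomes unavailable, it picks the most suitable standby (judged by its PG XLog, i.e. WAL, position) and promotes it to be the new master.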
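
The idea of picking the standby by WAL position can be illustrated with a few lines of shell. This is only a sketch of the comparison, not stolon's actual election code: it assumes each standby's position is available as a PostgreSQL LSN string (for example from `SELECT pg_last_wal_receive_lsn()` on PG 10 and later), and the function names are made up.

```shell
# Convert a PostgreSQL LSN such as "0/3000060" into a decimal byte position.
lsn_to_int() {
    hi=${1%%/*}
    lo=${1##*/}
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Read "uid lsn" pairs from stdin and print the uid with the highest LSN,
# i.e. the standby that has received the most WAL.
pick_best_standby() {
    best_uid=""
    best_pos=-1
    while read -r uid lsn; do
        pos=$(lsn_to_int "$lsn")
        if [ "$pos" -gt "$best_pos" ]; then
            best_uid=$uid
            best_pos=$pos
        fi
    done
    echo "$best_uid"
}

# Usage:
#   printf 'postgres1 0/3000060\npostgres2 0/3000148\n' | pick_best_standby
```

The standby that is furthest ahead loses the least data when promoted, which is why the WAL position is a natural ranking key.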

Proxy

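Together, Sentinel + Proxy play a role similar to Redis Sentinel. The proxy is the entry point for client traffic and forwards all of it to the master. When a failover happens, the proxy also closes the connections to the old master. Like the keeper, the proxy learns which node is the master from the Cluster Data generated by the sentinel. The proxy is built on a purpose-built network library that forwards traffic and can set TCP keepalive parameters on the socket FD. It handles requests with goroutines rather than threads, which greatly reduces resource overhead.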
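
The steps above do not start a proxy, so here is a minimal sketch of how it could be launched and used with the address and port from the environment table. The wrapper names `start_proxy` and `connect_via_proxy` are invented for this example; the database name `postgres` and the client credentials are assumptions.

```shell
# Start stolon-proxy in the foreground (run this on the proxy host).
start_proxy() {
    stolon-proxy --cluster-name stolon-cluster \
        --store-backend=etcdv3 \
        --store-endpoints=http://192.168.5.220:2379 \
        --listen-address 192.168.5.220 --port 25432
}

# Connect through the proxy; extra psql arguments are passed along.
# The client only knows the proxy address, never the current master.
connect_via_proxy() {
    psql -h 192.168.5.220 -p 25432 -d postgres "$@"
}

# Usage:
#   start_proxy
#   connect_via_proxy -c 'SELECT pg_is_in_recovery()'
```

Because clients only ever see the proxy address, a failover needs no change on the client side beyond reconnecting.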

Step-by-step details

1. stolonctl init: initialize the cluster

Stores the cluster metadata, the ClusterSpec, and related information in etcd.

2. stolon-sentinel: start a sentinel

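1) The earliest sentinel becomes the leader, using etcd's leader election mechanism.

2) It stores its own sentinel information in etcd.

3) It pulls the keeper information from etcd.

4) If the current sentinel is not the leader, it returns immediately.

5) It monitors the keepers (health checks). If it finds that the cluster has no master yet, it sets the first keeper it discovered as the master.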
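
Steps 4 and 5 above can be sketched as one pass of a decision function. This is a hypothetical illustration of the logic described in the list, not stolon's code; the function name and its argument layout are made up.

```shell
# Hypothetical sketch of one sentinel pass.
# $1 = "yes" if this sentinel holds leadership
# $2 = uid of the current master ("" if there is none)
# remaining args = keeper uids in the order they were discovered
sentinel_tick() {
    is_leader=$1
    master=$2
    shift 2
    # Step 4: a non-leader sentinel returns without doing anything.
    [ "$is_leader" = "yes" ] || return 0
    # Step 5: with no master yet, the first discovered keeper becomes master.
    if [ -z "$master" ] && [ "$#" -gt 0 ]; then
        master=$1
    fi
    [ -n "$master" ] && echo "master=$master"
    return 0
}

# Usage:
#   sentinel_tick yes "" postgres0 postgres1
#   sentinel_tick no  "" postgres0
```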

3. stolon-keeper: start a keeper

1) Run initdb to initialize the database.

2) Run postgres -D /dir to start the database.

3) Wait until the postmaster.pid file appears, then check the PID in it; if the process is alive, startup succeeded.
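
Done by hand, the three steps above look roughly like the sketch below. It relies on the PID being the first line of postmaster.pid; `wait_for_postmaster` and `PGDATA` are names chosen for this example, and the retry count is arbitrary.

```shell
# Wait until $1/postmaster.pid exists and the PID on its first line
# belongs to a live process. $2 = number of one-second retries (default 30).
wait_for_postmaster() {
    datadir=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        if [ -f "$datadir/postmaster.pid" ]; then
            pid=$(head -n 1 "$datadir/postmaster.pid")
            # kill -0 sends no signal; it only checks that the process exists.
            kill -0 "$pid" 2>/dev/null && return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage (PGDATA is a hypothetical data directory):
#   initdb -D "$PGDATA"
#   postgres -D "$PGDATA" &
#   wait_for_postmaster "$PGDATA" && echo "postgres is up"
```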

4. Start a second sentinel and a second keeper

The keeper runs pg_basebackup to sync the data from the master and writes the standby's replication settings to the recovery.conf file.

The second sentinel is not the leader, so it does no work.
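
For comparison, a manual version of what the keeper does for the standby might look like the sketch below. It assumes a PostgreSQL version older than 12 (where recovery.conf is still used) and reuses the replication user from the keeper commands above. The settings stolon actually writes may differ; `write_recovery_conf` and `PGDATA` are names invented for this example.

```shell
# Write a minimal recovery.conf into data dir $1 that follows the
# master at host $2, port $3. Assumes PostgreSQL < 12.
write_recovery_conf() {
    cat > "$1/recovery.conf" <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$2 port=$3 user=repluser password=replpassword'
EOF
}

# Usage (PGDATA is a hypothetical, empty data directory on the standby):
#   pg_basebackup -h 192.168.5.200 -p 5432 -U repluser -D "$PGDATA" -X stream
#   write_recovery_conf "$PGDATA" 192.168.5.200 5432
#   postgres -D "$PGDATA"
```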