Pgpool-II集群的高可用

集群准备

role ip PostgreSQL Pgpool-II port
master 192.168.1.3 10.9 4.0.5 5432
slave 192.168.1.2 10.9 4.0.5 5432
vip 192.168.1.100

其中watchdog端口9000pcp端口9898psql服务连接端口9999

查看集群节点状态 show pool_nodes;

1
2
3
4
5
6
7
8
9
10
11
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.

postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | primary | 5 | true
1 | 192.168.1.2 | 5432 | up | 0.500000 | standby | 0 | false
(2 rows)

Pgpool服务挂了

vip目前在master端,先测试非vip端pgpool服务不可用,再测试vip端pgpool服务不可用。

非vip端pgpool不可用

停止pgpool服务

1
$ systemctl stop pgpool.service

查看节点状态,服务未中断

1
2
3
4
5
6
7
8
9
10
11
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.

postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | primary | 5 | false
1 | 192.168.1.2 | 5432 | up | 0.500000 | standby | 0 | true
(2 rows)
vip端pgpool不可用

停止pgpool服务

1
$ systemctl stop pgpool.service

此时 非vip端的pgpool 会主动接管vip继续对外提供服务

1
2
3
4
$ ifconfig eth0:0
eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.100 netmask 255.255.255.0 broadcast 192.168.1.255
ether fa:f2:7d:42:75:00 txqueuelen 1000 (Ethernet)

查看节点状态,服务被接管后,继续对外提供服务

1
2
3
4
5
6
7
8
9
10
11
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.

postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | primary | 5 | true
1 | 192.168.1.2 | 5432 | up | 0.500000 | standby | 0 | false
(2 rows)

PostgreSQL挂了

standby不可用

停止postgresql服务

1
2
3
$ pg_ctl stop
waiting for server to shut down.... done
server stopped

此时集群节点状态会发生变化,standby节点变为down,此时集群仍然可以继续对外提供服务。

1
2
3
4
5
6
7
8
9
10
11
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.

postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | primary | 5 | true
1 | 192.168.1.2 | 5432 | down | 0.500000 | standby | 0 | false
(2 rows)

重新启动 standby 节点,模拟节点修复,重新将该节点加入集群

1
2
3
4
5
6
7
8
$ pg_ctl start
... ... ...
done
server started

$ pcp_attach_node -h 192.168.1.100 -p 9898 -U postgres -n 1
Password:
pcp_attach_node -- Command Successful

standby 重新加入集群后,状态变为up,继续对外服务。

1
2
3
4
5
6
7
8
9
10
11
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.

postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | primary | 5 | true
1 | 192.168.1.2 | 5432 | up | 0.500000 | standby | 0 | false
(2 rows)
primary不可用

当前 192.168.1.2primary

1
2
3
4
5
6
7
8
9
10
11
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.

postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | standby | 5 | true
1 | 192.168.1.2 | 5432 | up | 0.500000 | primary | 0 | false
(2 rows)

停止192.168.1.2 上面的 postgresql服务

1
2
3
$ pg_ctl stop
waiting for server to shut down.... done
server stopped

此时查看节点状态,集群检查到 192.168.1.2 服务不可用后,自动切换 192.168.1.3primary,继续对外提供服务,此时集群可读写。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.

postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | primary | 5 | true
1 | 192.168.1.2 | 5432 | down | 0.500000 | standby | 0 | false
(2 rows)

postgres=# create database kong;
CREATE DATABASE
postgres=# drop database kong;
DROP DATABASE

修复节点 192.168.1.2 的服务,然后重新加入集群作为 standby 继续对外提供服务。如果主备时间线不同步,需要用 pg_rewind 命令修复时间线,然后再启动服务。

1
2
3
4
5
$ mv recovery.done recovery.conf
$ pg_ctl start
$ pcp_attach_node -h 192.168.1.100 -p 9898 -U postgres -n 1
Password:
pcp_attach_node -- Command Successful

节点成功加入集群,其状态为 up

1
2
3
4
5
6
7
8
9
10
11
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.

postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | primary | 5 | true
1 | 192.168.1.2 | 5432 | up | 0.500000 | standby | 0 | false
(2 rows)
----------------本文结束 感谢阅读----------------