pg流复制_同步
同步流复制和异步流复制的搭建方法基本相同,但需要配置一些其它的参数来控制流复制。
-
参数:
-
synchronous_commit
当synchronous_standby_names为空时,它为单实时状态
off:表示提交时不需要等待wal Buffer写入到WAL日志磁盘后才向客户端返回成功
off外的其它值:等于on,均需要wal Buffer写入到磁盘后才返回成功
当synchronous_standby_names不为空时,为流复制状态
on:主库事务提交时,需要等待备库接收到wal日志并将内容写到磁盘后,主库才返回成功,
remote_write:主库事务提交时,需要等待备库接收到wal日志并将内容写到操作系统缓存后,主库才返回成功,与on的区别是不要求落盘
remote_apply:表示主库事务提交时,需要等待备库接收到wal日志并写到磁盘,并且已经完成了apply重做
-
配置同步复制:
相比异步流复制,同步流复制新增以下点:
--备库postgresql.conf(12之前为recovery.conf)中primary_conninfo参数配置中添加application_name
primary_conninfo = 'host=192.168.150.130 port=6000 user=repuser application_name=pg93std'
重启生效后在主库即可查询到该application_name
postgres@postgres:select pid,usename,application_name,client_addr from pg_stat_replication;
pid | usename | application_name | client_addr
------+---------+------------------+-----------------
4506 | repuser | pg93std | 192.168.150.132
(1 row)
--主库配置参数
synchronous_commit = on
synchronous_standby_names = 'pg93std'
--reload生效
[postgres@pg93 pgdata]$ pg_ctl reload -D /pgdata
server signaled
postgres@postgres:select name,setting from pg_settings where name like 'synchronous%';
name | setting
---------------------------+---------
synchronous_commit | on
synchronous_standby_names | pg93std
(2 rows)
--查询同步状态
postgres@postgres:select pid,usename,application_name,client_addr,sync_state,state from pg_stat_replication;
pid | usename | application_name | client_addr | sync_state | state
------+---------+------------------+-----------------+------------+-----------
4506 | repuser | pg93std | 192.168.150.132 | sync | streaming
(1 row)
--可见已经是同步状态了
-
同步流复制测试
--同步测试
--主
postgres@findb:select count(*) from tbl_json;
count
-------
5
(1 row)
postgres@findb:truncate table tbl_json;
TRUNCATE TABLE
--备
[postgres@pg93s pgdata]$ psql findb
psql (12.3)
Type "help" for help.
findb=# select count(*) from tbl_json;
count
-------
0
(1 row)
--备库中断
--停备
[postgres@pg93s pgdata]$ pg_ctl stop -D /pgdata/
waiting for server to shut down.... done
server stopped
--主库
postgres@postgres:select pid,usename,application_name,client_addr,sync_state,state from pg_stat_replication;
pid | usename | application_name | client_addr | sync_state | state
-----+---------+------------------+-------------+------------+-------
(0 rows)
postgres@findb:drop table t1;
--卡住了......
--启动备库
[postgres@pg93s pgdata]$ pg_ctl start -D /pgdata
waiting for server to start....2020-10-06 20:26:04.072 CST [6944] LOG: starting PostgreSQL 12.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
2020-10-06 20:26:04.073 CST [6944] LOG: listening on IPv4 address "192.168.150.132", port 6000
2020-10-06 20:26:04.088 CST [6944] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.6000"
2020-10-06 20:26:04.158 CST [6944] LOG: redirecting log output to logging collector process
2020-10-06 20:26:04.158 CST [6944] HINT: Future log output will appear in directory "log".
done
--源库的操作恢复正常完成,日志也提示恢复同步
postgres@findb:drop table t1;
DROP TABLE
2020-12-20 12:16:55.401 CST [4588] LOG: standby "pg93std" is now a synchronous standby with priority 1
--修改主库参数 synchronous_commit为local
postgres@postgres:select pid,usename,application_name,client_addr,sync_state,state from pg_stat_replication;
pid | usename | application_name | client_addr | sync_state | state
------+---------+------------------+-----------------+------------+-----------
4588 | repuser | pg93std | 192.168.150.132 | sync | streaming
(1 row)
postgres@postgres:select name,setting from pg_settings where name like 'synchronous%';
name | setting
---------------------------+---------
synchronous_commit | local
synchronous_standby_names | pg93std
--再停止备库,主库是可以正常执行操作的
[postgres@pg93s pgdata]$ pg_ctl stop -D /pgdata
waiting for server to shut down.... done
--主
postgres@findb:drop table t2 ;
DROP TABLE
(2 rows)
--此时主查询不到同步
postgres@findb:select pid,usename,application_name,client_addr,sync_state,state from pg_stat_replication;
pid | usename | application_name | client_addr | sync_state | state
-----+---------+------------------+-------------+------------+-------
(0 rows)
--再启动备
[postgres@pg93s pgdata]$ pg_ctl start -D /pgdata
waiting for server to start....2020-10-06 20:32:39.214 CST [6966] LOG: starting PostgreSQL 12.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
2020-10-06 20:32:39.214 CST [6966] LOG: listening on IPv4 address "192.168.150.132", port 6000
2020-10-06 20:32:39.226 CST [6966] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.6000"
2020-10-06 20:32:39.295 CST [6966] LOG: redirecting log output to logging collector process
2020-10-06 20:32:39.295 CST [6966] HINT: Future log output will appear in directory "log".
done
server started
--再次查询主的同步状态
postgres@findb:select pid,usename,application_name,client_addr,sync_state,state from pg_stat_replication;
pid | usename | application_name | client_addr | sync_state | state
------+---------+------------------+-----------------+------------+-----------
4621 | repuser | pg93std | 192.168.150.132 | sync | streaming
(1 row)
--此时可见又恢复到了sync状态
--这种配置和所谓“可退化的同步复制”吗?待研究
--生产上通常需要有两个或以上的备库时,才会选择使用同步复制方式,这样当有备库故障时,至少有一个备库可以继续接收主发送过来的日志
-
流复制监控
--监控主库pg_stat_replication上的状态和延迟(10或以后版本&主挂了就查不了了)
postgres@postgres:select pid,usename,client_addr,application_name,state,sync_state,write_lag,flush_lag,replay_lag,reply_time from pg_stat_replication;
pid | usename | client_addr | application_name | state | sync_state | write_lag | flush_lag | replay_lag | reply_time
------+---------+-----------------+------------------+-----------+------------+-----------------+-----------------+-----------------+-------------------------------
4621 | repuser | 192.168.150.132 | pg93std | streaming | sync | 00:00:00.017919 | 00:00:00.019003 | 00:00:00.021647 | 2020-10-06 21:24:43.725341+08
(1 row)
--在备上对比当前时间和最后应用时间(10以前版本可用)
postgres=# select now(),pg_last_xact_replay_timestamp();
now | pg_last_xact_replay_timestamp
-------------------------------+-------------------------------
2020-10-06 21:30:12.983656+08 | 2020-12-20 13:20:24.564295+08
(1 row)
postgres=# select now()-pg_last_xact_replay_timestamp();
?column?
---------------------------
-74 days -15:49:18.659168
(1 row)
--可见由于两边时间不同步导致此方式无效,所以此方法需要两边的时间是同步的(使用ntp),另外一个问题是如果主库没有事务,则备端最近应用的时间就会隔得比较长,导致显示的差异时间比较大而误报
--对比日志延迟量(主挂则不可用)
postgres@postgres:select pid,usename,client_addr,state,application_name,
pg_wal_lsn_diff(pg_current_wal_lsn(),write_lsn) write_delay,
pg_wal_lsn_diff(pg_current_wal_lsn(),flush_lsn) flush_delay,
pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn) replay_delay from pg_stat_replication;
pid | usename | client_addr | state | application_name | write_delay | flush_delay | replay_delay
------+---------+-----------------+-----------+------------------+-------------+-------------+--------------
4621 | repuser | 192.168.150.132 | streaming | pg93std | 6684672 | 6684672 | 6684720
(1 row)
--
--还有一个方法是创建一张,每隔一段时间(如1秒)往里面插入一条数据,包含当前时间戳,lsn位置等,通过此方法并结合备的时间,也能大概知道主备相差多少日志
--扩展一下还可以将主的当前lsn和时间戳同步到第三方如redis里面(如0.1秒每次),备通过查询redis也能在主宕机的情况下大概知道和主的lsn差距
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid | 6972
status | streaming --和pg_stat_replication的state一样
receive_start_lsn | 1/9A000000 --启动时的lsn
receive_start_tli | 1
received_lsn | 1/FF2CB870
received_tli | 1
last_msg_send_time | 2020-12-20 16:09:16.100045+08 --收到的主wal sender最后一次发送消息的时间(主时间)
last_msg_receipt_time | 2020-10-07 00:18:13.558129+08 --收到主wal sender最后一次发送的消息的时间(备时间)
latest_end_lsn | 1/FF2CB870 --主发送的消息中lsn的最后位置
latest_end_time | 2020-12-20 13:35:58.721361+08 --主发送的消息中的lsn最后位置的时间
slot_name |
sender_host | 192.168.150.130
sender_port | 6000
conninfo | user=repuser passfile=/home/postgres/.pgpass dbname=replication host=192.168.150.130 port=6000 application_name=pg93std fallback_application_name=walreceiver sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any
-
相关函数
1、判断数据库是否处于恢复模式(常用于判断主库还是备库)
--主
postgres@postgres:select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
--备
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
2、查看当前在写的WAL lsn
--主
postgres@postgres:select pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
2/10000A0
(1 row)
--备
postgres=# select pg_current_wal_lsn();
ERROR: recovery is in progress
HINT: WAL control functions cannot be executed during recovery.
3、查看备最近接收到的WAL日志位置
--主
postgres@postgres:select pg_last_wal_receive_lsn();
pg_last_wal_receive_lsn
-------------------------
(1 row)
--备
postgres=# select pg_last_wal_receive_lsn();
pg_last_wal_receive_lsn
-------------------------
2/10000D8
(1 row)
4、查看备最近应用的WAL日志位置
--主
postgres@postgres:select pg_last_wal_replay_lsn();
pg_last_wal_replay_lsn
------------------------
(1 row)
--备
postgres=# select pg_last_wal_replay_lsn();
pg_last_wal_replay_lsn
------------------------
2/10001C0
(1 row)
5、其它函数
查看备最近事务应用时间:
postgres=# select pg_last_xact_replay_timestamp();
pg_last_xact_replay_timestamp
-------------------------------
2020-12-20 13:34:46.657621+08
(1 row)
查看LSN之间的偏移量差异
postgres=# select pg_wal_lsn_diff('2/10001C0','2/10000D8');
pg_wal_lsn_diff
-----------------
232
(1 row)