数据库fillfactor

postgresql-fillfactor

postgresql-fillfactor
- fillfactor
  - table（默认100）
  - index（fillfactor默认90）
- 验证
- 更新数据
- 更新效率
- 更新后索引大小
- Heap-Only Tuples(hot)
- 总结

fillfactor

fillfactor是在创建表的时候指定的参数，该参数是限制数据插入一页时预留的空闲空间比例，对于数据库表的默认值是100，索引默认值是90

table（默认100）

一个表的填充因子是一个10-100质检的百分数。100（完全填满）是默认值。设置较小的填充因子，insert操作会把表页面只填满到指定的百分比，剩余的空间留给页面上行的更新。这就让update有机会把一行的已更新版本放到在与原始版本相同的页面上，这比把它放在一个不同的页面上效率更高。对于不经常更新的表来说，设置为100是最好的选择，如果更新频繁设置较小的值更合适。这个参数对toast表不生效。

index（fillfactor默认90）

索引的填充因子是一个百分数，它决定索引方法将尝试填充索引页面的充满程度。对于B-tree，在初始的索引构建过程中，叶子页面会被填充至该百分数，B-tree默认的填充因子是90，可以设置为10-100的任何整数值。如果表是静态的，那么填充因子100是最好的，这样索引占用空间最小。对于更新频繁的表，设置较小的值有利于最小化页面分裂。

验证

--创建表test_fill_1设置fillfactor=100
abase=# create table test_fill_1(n_id int,c_xm varchar(300)) with (fillfactor=100);
CREATE TABLE
--创建表test_fill_2设置fillfactor=80
abase=# create table test_fill_2(n_id int,c_xm varchar(300)) with (fillfactor=80);
CREATE TABLE
--添加主键
abase=#  alter table test_fill_1 add primary key(n_id);
ALTER TABLE

abase=# alter table test_fill_2 add primary key(n_id);
ALTER TABLE

--初始化数据
abase=# insert into test_fill_1 select generate_series(1,1000000),'zhangsan'||generate_series(1,1000000);
INSERT 0 1000000
Time: 7067.047 ms
abase=# insert into test_fill_2 select generate_series(1,1000000),'zhangsan'||generate_series(1,1000000);
INSERT 0 1000000
Time: 6849.234 ms

--表分析
postgres=# vacuum analyze test_fill_1;
VACUUM
postgres=# vacuum analyze test_fill_2;
--查看表的页数
abase=# select relpages,reltuples from pg_class where relname = 'test_fill_1';
 relpages | reltuples 
----------+-----------
     6369 |     1e+06
(1 row)

abase=#  select relpages,reltuples from pg_class where relname = 'test_fill_2';
 relpages | reltuples 
----------+-----------
     7999 |     1e+06
(1 row)


--查看表结构
abase=# \d+ test_fill_1;
                             Table "public.test_fill_1"
 Column |          Type          | Modifiers | Storage  | Stats target | Description 
--------+------------------------+-----------+----------+--------------+-------------
 n_id   | integer                |           | plain    |              | 
 c_xm   | character varying(300) |           | extended |              | 
Options: fillfactor=100

abase=# \d+ test_fill_2;
                             Table "public.test_fill_2"
 Column |          Type          | Modifiers | Storage  | Stats target | Description 
--------+------------------------+-----------+----------+--------------+-------------
 n_id   | integer                |           | plain    |              | 
 c_xm   | character varying(300) |           | extended |              | 
Options: fillfactor=80

--查看表大小，fillfactor越大占用的空间越小
abase=#  select pg_size_pretty(pg_relation_size('test_fill_1'));
 pg_size_pretty 
----------------
 50 MB
(1 row)

Time: 1.251 ms
abase=# select pg_size_pretty(pg_relation_size('test_fill_2'));
 pg_size_pretty 
----------------
 62 MB
(1 row)

Time: 0.995 ms


--查看主键大小
abase=# select pg_size_pretty(pg_relation_size('test_fill_1_pkey'));
 pg_size_pretty 
----------------
 21 MB
(1 row)

abase=# select pg_size_pretty(pg_relation_size('test_fill_2_pkey'));
 pg_size_pretty 
----------------
 21 MB
(1 row)

初始化数据耗时差别不大，fillfactor=80略快。设置了fillfactor=80的表占用空间更大。索引方面占用空间一样。

更新数据

--1.更新test_fill_1
abase=# select ctid,* from test_fill_1 where n_id =1;
 ctid  | n_id |   c_xm    
-------+------+-----------
 (0,1) |    1 | zhangsan1
(1 row)

abase=# update test_fill_1 set c_xm='李四' where n_id = 1;
UPDATE 1

--更新test_fill_1 fillfactor为100，更新后，数据插入到了最后一页ctid为(6368,74)
abase=# select ctid,* from test_fill_1 where n_id =1;
   ctid    | n_id | c_xm 
-----------+------+------
 (6368,74) |    1 | 李四
(1 row)

--2.更新test_fill_2
abase=# select ctid,* from test_fill_2 where n_id =1;
 ctid  | n_id |   c_xm    
-------+------+-----------
 (0,1) |    1 | zhangsan1
(1 row)

Time: 1.392 ms
abase=#  update test_fill_2 set c_xm='李四' where n_id = 1;;
UPDATE 1

--test_fill_2表的fillfactor为80，还剩余20%的空间可以利用，更新后数据是在第一页插入了这条数据，ctid为 (0,149)还在第一页
abase=# select ctid,* from test_fill_2 where n_id =1;
  ctid   | n_id | c_xm 
---------+------+------
 (0,149) |    1 | 李四
(1 row)

设置了fillfactor=100后，更新数据会在最后一页插入一条数据

而设置fillfactor=80，数据会在当前页插入一条数据

更新效率

--为了看出明显的效果，先将autovacuum关闭掉
--更新test_fill_1的所有数据‘
更新的效率来看较小的fillfactor更新更快
abase=# update test_fill_1 set c_xm = c_xm||'x';
UPDATE 1000000
Time: 13035.901 ms
--更新test_fill_2的所有数据
abase=#  update test_fill_2 set c_xm = c_xm||'x';
UPDATE 1000000
Time: 10162.411 ms

--再次全部更新
abase=#  update test_fill_1 set c_xm = c_xm||'y';
UPDATE 1000000
Time: 11741.058 ms
abase=# update test_fill_2 set c_xm = c_xm||'y';
UPDATE 1000000
Time: 10838.738 ms

abase=# update test_fill_1 set c_xm = c_xm||'z';
UPDATE 1000000
Time: 14763.844 ms
abase=# update test_fill_2 set c_xm = c_xm||'z';
UPDATE 1000000
Time: 9392.977 ms
--经过三次全部更新来看，设置fillfactor=80时，更新的速度在10s左右。

--多次更新后，可以看到fillfactor=80的页在更新时，还是会使用前面的页的旧行。
abase=# select ctid,*from test_fill_1 limit 100;
    ctid    |  n_id  |         c_xm          
------------+--------+-----------------------
 (6369,155) |      1 | zhangsan1xyzxxy
 (6369,156) |      2 | zhangsan2xyzxxy
 (6369,157) |      3 | zhangsan3xyzxxy
 (6369,158) |      4 | zhangsan4xyzxxy
 (6369,159) |      5 | zhangsan5xyzxxy
 (6369,160) |      6 | zhangsan6xyzxxy
 (6369,161) |      7 | zhangsan7xyzxxy
 (6369,162) |      8 | zhangsan8xyzxxy
 (6369,163) |      9 | zhangsan9xyzxxy
 (6369,164) |     10 | zhangsan10xyzxxy
 (6369,165) |     11 | zhangsan11xyzxxy
 (6369,166) |     12 | zhangsan12xyzxxy
 (6369,167) |     13 | zhangsan13xyzxxy
abase=# select ctid,*from test_fill_2 limit 100;
  ctid   | n_id |        c_xm         
---------+------+---------------------
 (0,158) |    1 | zhangsan1xyzxxy
 (0,159) |    2 | zhangsan2xyzxxy
 (0,160) |    3 | zhangsan3xyzxxy
 (0,161) |    4 | zhangsan4xyzxxy
 (0,162) |    5 | zhangsan5xyzxxy
 (0,163) |    6 | zhangsan6xyzxxy
 (0,164) |    7 | zhangsan7xyzxxy
 (0,165) |    8 | zhangsan8xyzxxy
 (0,166) |    9 | zhangsan9xyzxxy
 (0,167) |   10 | zhangsan10xyzxxy
 (0,168) |   11 | zhangsan11xyzxxy
 (0,169) |   12 | zhangsan12xyzxxy
 (0,170) |   13 | zhangsan13xyzxxy
 (0,171) |   14 | zhangsan14xyzxxy
 (0,172) |   15 | zhangsan15xyzxxy

--更新小范围数据，test_fill_2效果很明显
abase=# update test_fill_1 set c_xm = c_xm||'xx' where n_id>1000 and n_id <2000;
UPDATE 999
Time: 28.306 ms
abase=# update test_fill_2 set c_xm = c_xm||'xx' where n_id>1000 and n_id <2000;
UPDATE 999
Time: 13.577 ms

在update（全量）的时候fillfactor=80的效率更高，少量数据更新的时候低fillfactor效果更明显。

再次update以后，可以看到fillfactor=80的页在更新时，还是会使用更新到前面的页的旧行。（没有autovacuum的情况下仍然更新可以使用标记为删除的行）

更新后索引大小

--test_fill_1表的所以更大
postgres=#  select pg_size_pretty(pg_relation_size('test_fill_1_pkey'));
 pg_size_pretty 
----------------
 66 MB
(1 row)

Time: 0.870 ms
postgres=#  select pg_size_pretty(pg_relation_size('test_fill_2_pkey'));
 pg_size_pretty 
----------------
 43 MB
(1 row)

Time: 0.782 ms

可以看到设置了fillfactor=80表的索引比fillfactor=100的索引要小，而fillfactor=100膨胀的更快，这是为什么呢？请往下看

Heap-Only Tuples(hot)

在PG中因为多版本功能的原因，当更新一行时，实际上旧行并未被删除，只是插入一条新行。如果这个表上有索引，而更新的字段不是索引的键值时，由于新行的物理位置发生了变化，因此仍然需要更新索引，这将导致性能下降。为了解决这个问题PostgreSQL自8.3版本后，引入了一个名为Heap-Only Tuple的新技术，简称HOT。使用HOT技术之后，如果更新后的新行与旧行在同一个数据块内，旧行会有一个指针，指向新行，这样就不必更新索引了，当从索引访问到数据行时，会根据这个指针找到新行。

HOT详细说明见下图，图中表上有一个索引，其中“索引项n”指向数据块的第3行。

图片1.png

更新第三行后，因为有用HOT技术，所以索引项仍然指向原先的旧数据（第3行），而第3行旧数据中有一个指针指向新数据（第6行），如下图：

图片2.png

注意，如果在原先的数据块中无法放下新行，就不能使用HOT技术了，即HOT技术中的行间指针只能在一个数据块内，不能跨数据块。所以为了使用HOT技术，应该在数据块中留出较多的空闲空间，方法是把表的填充因子（fillfactor）设置为一个合适的值。
Fillfactor参数的意思是插入数据时，块数据的空间占用率达到这个比率后，就不在插入数据了，默认值为100，表示块中不留存空间，数据全部填满数据块。

当有空闲空间的时候，更新可能会走hot更新，数据是在一页内变动的，索引不会变化。所以前面fillfactor=80在更新后索引要比fillfactor=100小。

总结

1.初始化数据耗时差别不大，设置了fillfactor=80的表占用空间更大，初始化时索引方面占用空间一样。

2.设置了fillfactor=100后，更新数据会在最后一页插入一条数据，而设置fillfactor=80，数据会在当前页空闲空间插入一条数据

3.在更新较频繁的表设置合适的fillfactor可以提高更新效率，因为有hot技术，减小索引膨胀

4.在update（全量）的时候fillfactor=80的效率更高，少量数据更新的时候低fillfactor效果更明显。

5.多次update以后，可以看到fillfactor=80时，还是会使用前面的页的旧行。（如果autovacuum没有及时清理的情况下更新可以使用标记为删除的行）

参考资料：https://my.oschina.net/207miner/blog/2994857

PostgreSQL

数据库fillfactor

postgresql-fillfactor

fillfactor

table（默认100）

index（fillfactor默认90）

验证

更新数据

更新效率

更新后索引大小

Heap-Only Tuples(hot)

总结

相关

【Gerrit】Gerrit+Postgresql安装方式的数据备份

PostgreSQL数据库常用SQL命令示例

使用pgcli工具连接postgreSQL数据库服务

postgresql-shared_buffers

前沿分享｜阿里云资深技术专家魏闯先：AnalyticDB PostgreSQL年度新版本发布

Centos7系统-postgresql+etcd+patroni+haproxy+keepalived高可用集群部署

Centos7系统-postgresql+etcd+patroni+haproxy+keepalived高可用集群部署

【TPC-DS】trino+S3+hive+postgresql性能测试----生成简单数据验证环境（四）

【TPC-DS】trino+S3+hive+postgresql性能测试----生成简单数据验证环境（四）

PostgreSql查看当前表的主外键关系

postgresql-用户登录数据库

harbor-db 启动时报initdb: directory "/var/lib/postgresql/data" exists but is not

标签