通过performance_schema获取造成死锁的事务语句(转)


数据库日常维护中我们经常遇到死锁的问题,由于无法获取造成死锁的事务内执行过的语句,对我们死锁的分析造成很大的困难。但是在MySQL 5.7中我们可以利用performance_schema来获取这些语句,为我们解决死锁问题提供了一个新的思路,下面我们将解释这种方法的使用。

一、开启相关统计的方法

如果打开performance_schema选项来收集执行过的语句和事务会有性能损失,一般建议需要的时候开启,然后在线关闭掉。

(1) 在my.cnf中设置打开、关闭performance_schema选项随数据库启动
performance-schema-instrument='statement/%=OFF'              
performance-schema-consumer-events-statements-current=OFF
performance-schema-consumer-events-statements-history=OFF
performance-schema-consumer-events-statements-history-long=OFF
performance-schema-consumer-statements-digest=OFF

performance-schema-instrument='transaction=OFF'
performance-schema-consumer-events-transactions-current=OFF
performance-schema-consumer-events-transactions-history=OFF
performance-schema-consumer-events-transactions-history-long=OFF

(3) 在线打开performance_schema选项收集执行过的语句和事务
update setup_consumers set ENABLED='no' where name like 'events_statements%';     
update setup_consumers set ENABLED='no' where name like 'events_transactions%';   
update setup_consumers set ENABLED='no' where name like 'statements-digest';      
update setup_instruments set ENABLED='no',TIMED='no' where name ='transaction';   
(5) 相关参数
  • performance_schema_events_statements_history_size:定义events_statements_history表中每个线程最大行数,静态参数,修改需要重启实例。
  • performance_schema_events_statements_history_long_size:定义events_statements_history_long表最大行数,静态参数,修改需要重启实例。
  • performance_schema_events_transactions_history_size:定义events_transactions_history表每个线程最大行数,静态参数,修改需要重启实例。
  • performance_schema_events_transactions_history_long_size:定义events_transactions_history_long表最大行数,静态参数,修改需要重启实例。

二、根据死锁信息来获取造成死锁的语句

下面是一个典型的死锁日志:

mysql> select THREAD_ID,EVENT_ID,END_EVENT_ID,EVENT_NAME,SOURCE,SQL_TEXT,DIGEST,
CURRENT_SCHEMA,NESTING_EVENT_ID,NESTING_EVENT_TYPE, MESSAGE_TEXT,timer_start 
from events_statements_history_long where sql_text like 
'select * from t1 where id=1 for update' \G
*************************** 1. row ***************************
         THREAD_ID: 37       
          EVENT_ID: 98
      END_EVENT_ID: 106
        EVENT_NAME: statement/sql/select
            SOURCE: socket_connection.cc:101
          SQL_TEXT: select * from t1 where id=1 for update
            DIGEST: be26f0b8bee2ac2bb34e9c651d655e7c
    CURRENT_SCHEMA: txc_test
  NESTING_EVENT_ID: 96
NESTING_EVENT_TYPE: TRANSACTION
      MESSAGE_TEXT: NULL
       timer_start: 266799699801000
*************************** 2. row ***************************
         THREAD_ID: 38
          EVENT_ID: 35
      END_EVENT_ID: 37
        EVENT_NAME: statement/sql/select
            SOURCE: socket_connection.cc:101
          SQL_TEXT: select * from t1 where id=1 for update
            DIGEST: be26f0b8bee2ac2bb34e9c651d655e7c
    CURRENT_SCHEMA: txc_test
  NESTING_EVENT_ID: 19
NESTING_EVENT_TYPE: TRANSACTION
      MESSAGE_TEXT: NULL

  • 事务2

在死锁日志中我们可以看到导致死锁的最后一个语句是‘select * from t2 where id=1 for update’,那么我们就可以使用它来查询events_statements_history_long表。

PFS_thread* create_thread(PFS_thread_class *klass, const void *identity,
                          ulonglong processlist_id){
  PFS_thread *pfs;
  pfs_dirty_state dirty_state;
  pfs= global_thread_container.allocate(& dirty_state);  if (pfs != NULL)
  {
    pfs->m_thread_internal_id=
      PFS_atomic::add_u64(&thread_internal_id_counter.m_u64, 1);
    pfs->m_parent_thread_internal_id= 0;
    pfs->m_processlist_id= static_cast>(processlist_id);

(2)提取信息
  • 事务1

最后一条语句的END_EVENT_ID=37
最后一条语句的NESTING_EVENT_ID=19
THREAD_ID=38

因此我们可以通过如下语句得出经历的所有语句如下:

mysql> select THREAD_ID,EVENT_ID,END_EVENT_ID,EVENT_NAME,SOURCE,SQL_TEXT,DIGEST,
CURRENT_SCHEMA,NESTING_EVENT_ID,NESTING_EVENT_TYPE,MESSAGE_TEXT,timer_start 
from events_statements_history_long where thread_id = 37 and END_EVENT_ID >= 96 
and END_EVENT_ID <= 114\G
*************************** 1. row ***************************
         THREAD_ID: 37
          EVENT_ID: 95
      END_EVENT_ID: 96
        EVENT_NAME: statement/sql/begin
            SOURCE: socket_connection.cc:101
          SQL_TEXT: begin
            DIGEST: f57daa74a09445d1e1c496f28fe6d906
    CURRENT_SCHEMA: txc_test
  NESTING_EVENT_ID: NULL
NESTING_EVENT_TYPE: NULL
      MESSAGE_TEXT: NULL
       timer_start: 251219441701000
*************************** 2. row ***************************
         THREAD_ID: 37
          EVENT_ID: 98
      END_EVENT_ID: 106
        EVENT_NAME: statement/sql/select
            SOURCE: socket_connection.cc:101
          SQL_TEXT: select * from t1 where id=1 for update
            DIGEST: be26f0b8bee2ac2bb34e9c651d655e7c
    CURRENT_SCHEMA: txc_test
  NESTING_EVENT_ID: 96
NESTING_EVENT_TYPE: TRANSACTION
      MESSAGE_TEXT: NULL
       timer_start: 266799699801000
*************************** 3. row ***************************
         THREAD_ID: 37
          EVENT_ID: 108
      END_EVENT_ID: 110
        EVENT_NAME: statement/sql/select
            SOURCE: socket_connection.cc:101
          SQL_TEXT: select * from t2 where id=6 for update
            DIGEST: 315b4a6a8f7424bc7591256a8937a213
    CURRENT_SCHEMA: txc_test
  NESTING_EVENT_ID: 96
NESTING_EVENT_TYPE: TRANSACTION
      MESSAGE_TEXT: NULL
       timer_start: 308095739676000
*************************** 4. row ***************************
         THREAD_ID: 37
          EVENT_ID: 112
      END_EVENT_ID: 114
        EVENT_NAME: statement/sql/select
            SOURCE: socket_connection.cc:101
          SQL_TEXT: select * from t2 where id=1 for update
            DIGEST: 315b4a6a8f7424bc7591256a8937a213
    CURRENT_SCHEMA: txc_test
  NESTING_EVENT_ID: 96
NESTING_EVENT_TYPE: TRANSACTION
      MESSAGE_TEXT: Deadlock found when trying to get lock; try restarting transaction
       timer_start: 328978250011000
4 rows in set (0.01 sec)


(3)猜测死锁形成过程

根据上面两个事务执行的sql大致可以推断出死锁的sql语句,如果两个事务里面执行的sql很多那可能就需要花更多的时间来找出造成死锁的语句:

TX1TX2
  select * from t1 where id=1 for update;
select * from t2 where id=1 for update;  
select * from t1 where id=3 for update;  
  select * from t2 where id=6 for update;
select * from t1 where id=1 for update;  
  select * from t2 where id=1 for update;

三、总结

  • 通过以上的查询基本可以获取造成死锁的事务内执行的语句,由于线上业务量大可能造成events_statements_history_long查询不到需要的语句(默认存储10000条),需要及时监控发现死锁。
  • 打开performance_schema的选项,有性能损失。
  • 如果线上实例是每个database对应一个独立用户,可以通过设置收集指定用户执行的所有event。
    1).关闭收集所有用户的event
    update setup_actors set ENABLED='NO',HISTORY='NO';
    2).插入需要收集event的指定用户(例如我只想收集txc用户下的所有event,参考如下)
    insert into setup_actors select '%','txc','%','YES','YES';
    3).select * from setup_actors;
         +------+------+------+---------+---------+
         | HOST | USER | ROLE | ENABLED | HISTORY |
         +------+------+------+---------+---------+
         | %    | %    | %    | NO      | NO      |
         | %    | txc  | %    | YES     | YES     |
         +------+------+------+---------+---------+
  上一篇

作者:重庆八怪
链接:https://www.jianshu.com/p/268889c997e8
来源:简书
著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。