Mr. Wen's Blog. Looking for a job, based in Shenzhen. (wenfh2020@126.com)

[Multithreading] Dissecting Nested Deadlocks

2020-03-08

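In a multithreaded model, locks are tricky, and even veterans slip up now and then.

Locks come in pairs: whatever is locked must be unlocked. Forgetting to unlock causes a deadlock, but that kind of basic mistake is usually easy to avoid. In a complex business system, however, nested deadlocks tend to appear, and they can be buried deep.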


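1. Understanding nested deadlocks

Nested deadlock: the system contains several (non-reentrant) locks, and threads acquire them in a nested fashion across one another, so each thread holds one lock while waiting for a lock held by another thread.

The figure below shows two threads running at the same time that are very likely to end up in a nested deadlock.

Diagram source: "嵌套式死锁原理" (Nested Deadlock Principle)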


2. Test

2.1. Demo

Based on the analysis above, I wrote a test demo.

// g++ -g -O0 -std=c++11 test.cpp -lpthread -o t && ./t
#include <chrono>
#include <iostream>
#include <memory>
#include <mutex>
#include <thread>

std::mutex g_mtx1;
std::mutex g_mtx2;

int main() {
    std::thread t1([]() {
        std::lock_guard<std::mutex> lck(g_mtx1);  // thread 1 takes mtx1 first
        std::cout << "thread: 1, locked by mtx1\n";
        std::this_thread::sleep_for(std::chrono::seconds(1));  // let thread 2 take mtx2

        std::cout << "thread: 1, waiting to lock mtx2\n";
        std::lock_guard<std::mutex> lck2(g_mtx2);  // test.cpp:18, blocks forever
        std::cout << "thread: 1, locked by mtx2\n";

        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << "thread: 1, done!!!\n";
    });

    std::thread t2([]() {
        std::lock_guard<std::mutex> lck(g_mtx2);  // thread 2 takes mtx2 first
        std::cout << "thread: 2, locked by mtx2\n";
        std::this_thread::sleep_for(std::chrono::seconds(1));

        std::cout << "thread: 2, waiting to lock mtx1\n";
        std::lock_guard<std::mutex> lck2(g_mtx1);  // test.cpp:31, blocks forever
        std::cout << "thread: 2, locked by mtx1\n";

        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << "thread: 2, done!!!\n";
    });

    t1.join();  // never returns, because both threads are stuck
    t2.join();
    std::cout << "finished!" << std::endl;
    return 0;
}

// Output (the program then hangs):
// thread: 1, locked by mtx1
// thread: 2, locked by mtx2
// thread: 1, waiting to lock mtx2
// thread: 2, waiting to lock mtx1

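2.2. GDB analysis

Attach GDB to the deadlocked process and inspect the call stack of every thread. The two threads are blocked at:

  • test.cpp : 18 (thread 1, constructing the lock_guard on g_mtx2)
  • test.cpp : 31 (thread 2, constructing the lock_guard on g_mtx1)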
(gdb) thread apply all bt

Thread 3 (Thread 0x7f8d9c793700 (LWP 2605)):
#0  __lll_lock_wait ()
#1  0x00007f8d9d38be9b in _L_lock_883 ()
#2  0x00007f8d9d38bd68 in __GI___pthread_mutex_lock ()
#3  0x00000000004010ad in __gthread_mutex_lock ()
#4  0x0000000000403284 in std::mutex::lock (this=0x607320 <g_mtx2>)
#5  0x0000000000403456 in std::lock_guard<std::mutex>::lock_guard ()
#6  0x00000000004011d5 in __lambda0::operator() at test.cpp:18
...

Thread 2 (Thread 0x7f8d9bf92700 (LWP 2606)):
#0  __lll_lock_wait ()
#1  0x00007f8d9d38be9b in _L_lock_883 ()
#2  0x00007f8d9d38bd68 in __GI___pthread_mutex_lock ()
#3  0x00000000004010ad in __gthread_mutex_lock ()
#4  0x0000000000403284 in std::mutex::lock ()
#5  0x0000000000403456 in std::lock_guard<std::mutex>::lock_guard ()
#6  0x00000000004012c5 in __lambda1::operator() at test.cpp:31
...

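3. Summary

  • Inside a unit of code that already holds a lock, try not to take another lock or call other functions that lock.
  • Keep the lock granularity (the locked region) as small as possible; a lock protects data, not logic. Locking at function entry and unlocking at function exit looks convenient, but it hides many problems. If the lock covers the logic of a database insert statement and the database happens to stall, the whole multithreaded system may freeze.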
