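In multithreaded code, locks are tricky, and even experienced developers occasionally get them wrong.
Locks come in pairs: whatever is locked has to be unlocked, and a forgotten unlock leads to deadlock. That kind of basic mistake is usually easy to avoid. In a complex business system, however, nested deadlocks tend to appear, and they can be buried deep.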
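The pairing rule can be illustrated with a minimal sketch. The names `g_mtx`, `update_manual`, and `update_raii` are hypothetical and exist only for this example. The manual version skips `unlock()` on an early return, while `std::lock_guard` releases the mutex on every exit path.

```cpp
#include <mutex>

std::mutex g_mtx;  // hypothetical mutex guarding some shared state

// Manual lock/unlock: the early return skips unlock(), so g_mtx stays
// locked and the next thread that calls lock() blocks forever.
bool update_manual(int value) {
    g_mtx.lock();
    if (value < 0) {
        return false;  // BUG: g_mtx.unlock() is missing on this path
    }
    g_mtx.unlock();
    return true;
}

// RAII: the lock_guard destructor unlocks on every exit path,
// including early returns and exceptions.
bool update_raii(int value) {
    std::lock_guard<std::mutex> lck(g_mtx);
    if (value < 0) {
        return false;
    }
    return true;
}
```

After `update_raii(-1)` returns, the mutex is free again; after `update_manual(-1)` it is not.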
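1. Understanding nested deadlock
Nested deadlock: the system contains several (non-reentrant) locks, and threads acquire them nested inside one another in conflicting orders. For example, thread 1 holds lock A while requesting lock B, and thread 2 holds lock B while requesting lock A.
The figure below shows two threads running concurrently that are very likely to end up in a nested deadlock.
Diagram source: 《嵌套式死锁原理》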
2. Test
2.1. Demo
Based on the analysis above, I wrote a test demo.
// g++ -g -O0 -std=c++11 test.cpp -lpthread -o t && ./t
#include <chrono>
#include <iostream>
#include <memory>
#include <mutex>
#include <thread>

std::mutex g_mtx1;
std::mutex g_mtx2;

int main() {
    // Thread 1 takes mtx1 first, then mtx2.
    std::thread t1([]() {
        std::lock_guard<std::mutex> lck(g_mtx1);
        std::cout << "thread: 1, locked by mtx1\n";
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << "thread: 1, waiting to lock mtx2\n";
        std::lock_guard<std::mutex> lck2(g_mtx2);
        std::cout << "thread: 1, locked by mtx2\n";
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << "thread: 1, done!!!\n";
    });
    // Thread 2 takes the same locks in the opposite order: mtx2, then mtx1.
    std::thread t2([]() {
        std::lock_guard<std::mutex> lck(g_mtx2);
        std::cout << "thread: 2, locked by mtx2\n";
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << "thread: 2, waiting to lock mtx1\n";
        std::lock_guard<std::mutex> lck2(g_mtx1);
        std::cout << "thread: 2, locked by mtx1\n";
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << "thread: 2, done!!!\n";
    });
    t1.join();
    t2.join();
    std::cout << "finished!" << std::endl;
    return 0;
}
// Output (the program then hangs):
// thread: 1, locked by mtx1
// thread: 2, locked by mtx2
// thread: 1, waiting to lock mtx2
// thread: 2, waiting to lock mtx1
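2.2. Gdb analysis
Attach gdb to the deadlocked process and dump the call stacks of all threads. The two threads are blocked at the following lines, each inside the constructor of its second std::lock_guard:
- test.cpp : 18
- test.cpp : 31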
(gdb) thread apply all bt
Thread 3 (Thread 0x7f8d9c793700 (LWP 2605)):
#0 __lll_lock_wait ()
#1 0x00007f8d9d38be9b in _L_lock_883 ()
#2 0x00007f8d9d38bd68 in __GI___pthread_mutex_lock ()
#3 0x00000000004010ad in __gthread_mutex_lock ()
#4 0x0000000000403284 in std::mutex::lock (this=0x607320 <g_mtx2>)
#5 0x0000000000403456 in std::lock_guard<std::mutex>::lock_guard ()
#6 0x00000000004011d5 in __lambda0::operator() at test.cpp:18
...
Thread 2 (Thread 0x7f8d9bf92700 (LWP 2606)):
#0 __lll_lock_wait ()
#1 0x00007f8d9d38be9b in _L_lock_883 ()
#2 0x00007f8d9d38bd68 in __GI___pthread_mutex_lock ()
#3 0x00000000004010ad in __gthread_mutex_lock ()
#4 0x0000000000403284 in std::mutex::lock ()
#5 0x0000000000403456 in std::lock_guard<std::mutex>::lock_guard ()
#6 0x00000000004012c5 in __lambda1::operator() at test.cpp:31
...
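3. Solution
C++17's std::scoped_lock can lock several mutexes in one step. It acquires them with the same deadlock-avoidance algorithm as std::lock rather than in a fixed order, so threads cannot end up each holding one mutex while waiting for the other, regardless of the order in which the mutexes are listed.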
// g++ -g -O0 -std=c++17 test.cpp -lpthread -o test && ./test
#include <chrono>
#include <iostream>
#include <memory>
#include <mutex>
#include <thread>

std::mutex g_mtx1;
std::mutex g_mtx2;

int main() {
    std::thread t1([]() {
        std::scoped_lock lck(g_mtx1, g_mtx2);
        std::cout << "thread: 1, locked by mtx1 and mtx2\n";
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << "thread: 1, done!!!\n";
    });
    std::thread t2([]() {
        std::scoped_lock lck(g_mtx1, g_mtx2);
        std::cout << "thread: 2, locked by mtx1 and mtx2\n";
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << "thread: 2, done!!!\n";
    });
    t1.join();
    t2.join();
    std::cout << "finished!\n";
    return 0;
}
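For codebases that are still on C++11 (the standard the deadlock demo was compiled with), a similar effect can be sketched with std::lock plus std::adopt_lock. This is an illustrative variant, not part of the original test: `g_counter`, `add_with_both_locks`, and `run_opposite_order` are hypothetical names. The two threads deliberately name the mutexes in opposite order to show that the acquisition order no longer matters.

```cpp
#include <functional>
#include <mutex>
#include <thread>

std::mutex g_mtx1;
std::mutex g_mtx2;
int g_counter = 0;  // hypothetical shared data guarded by both mutexes

// Acquires both mutexes through std::lock, then hands ownership to
// lock_guards so that both are released automatically at scope exit.
void add_with_both_locks(std::mutex& first, std::mutex& second, int times) {
    for (int i = 0; i < times; ++i) {
        std::lock(first, second);  // deadlock-avoiding acquisition of both
        std::lock_guard<std::mutex> lck1(first, std::adopt_lock);
        std::lock_guard<std::mutex> lck2(second, std::adopt_lock);
        ++g_counter;
    }
}

// Runs two threads that pass the mutexes in opposite order and
// returns the final counter value once both have finished.
int run_opposite_order(int times) {
    g_counter = 0;
    std::thread t1(add_with_both_locks, std::ref(g_mtx1), std::ref(g_mtx2), times);
    std::thread t2(add_with_both_locks, std::ref(g_mtx2), std::ref(g_mtx1), times);
    t1.join();
    t2.join();
    return g_counter;
}
```

With the plain nested std::lock_guard pattern from the demo, the same opposite-order calls could deadlock; here both threads are expected to finish.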
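4. Summary
- Inside a unit of work that already holds a lock, try not to take another lock or call other functions that lock.
- Keep lock granularity (the critical section) as small as possible.
- A lock protects data, not logic. Locking at function entry and unlocking at function exit looks convenient but hides many problems. If the lock covers a database insert and the database happens to stall, the whole multithreaded system can hang.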
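The last two points can be sketched as follows. Everything here is hypothetical and only serves the illustration: `g_pending` stands for the shared data, `slow_db_insert` stands in for a database insert that may block, and `g_written` fakes the database. The lock is held just long enough to take the pending rows; the slow work runs without it, so a stalled database would not block threads that call `enqueue`.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

std::mutex g_pending_mtx;            // guards g_pending only
std::vector<std::string> g_pending;  // hypothetical shared buffer of rows
std::vector<std::string> g_written;  // stand-in for the database; this sketch
                                     // assumes a single flushing thread

// Hypothetical stand-in for a slow, possibly blocking database insert.
void slow_db_insert(const std::string& row) {
    g_written.push_back(row);
}

// Producers hold the lock only while touching the shared buffer.
void enqueue(std::string row) {
    std::lock_guard<std::mutex> lck(g_pending_mtx);
    g_pending.push_back(std::move(row));
}

// Takes the pending rows under the lock, then does the slow work unlocked.
std::size_t flush() {
    std::vector<std::string> batch;
    {
        std::lock_guard<std::mutex> lck(g_pending_mtx);
        batch.swap(g_pending);  // critical section: one cheap swap
    }
    for (const auto& row : batch) {
        slow_db_insert(row);  // no lock held here
    }
    return batch.size();
}
```

Locking the whole of `flush()` at function entry would also be correct, but every `enqueue` caller would then wait for the database.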