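This article summarizes what I learned from 100-gdb-tips. Some of the code differs from the original, and I have added problems I ran into while debugging, together with their fixes.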

Debugging an already-running program

thread.c

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
void *thread_func(void *p_arg)
{
        while (1)
        {
                printf("%s\n", (char*)p_arg);
                sleep(10);
        }
}
int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, thread_func, "Thread 1");
        pthread_create(&t2, NULL, thread_func, "Thread 2");

        sleep(1000);
        return 0;
}

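gcc -g -o thread thread.c -lpthread
Then run ./thread & in the current shell session,
or run ./thread in the foreground and open a separate shell session.
There are two ways to debug an already-running process. The first is to look up the process ID with ps, for example ps aux | grep thread, and then run gdb thread <PID>.
The second is to start gdb first and then use the attach command to attach to the process.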
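
The two attach methods can be sketched as a pair of shell helpers. This is only an illustration: the function names attach_cmdline and attach_inside and the GDB override variable are assumptions made for this example, not part of gdb itself.

```shell
# Hypothetical helpers; set GDB to use a different gdb binary.

# Method 1: pass the program and the PID on the gdb command line.
attach_cmdline() {
    "${GDB:-gdb}" -q "$1" "$2"
}

# Method 2: start gdb on the program, then run its attach command.
attach_inside() {
    "${GDB:-gdb}" -q "$1" -ex "attach $2"
}
```

For example, attach_cmdline ./thread 10210 attaches to process 10210, assuming that is the PID reported by ps.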

When you are done debugging, use the "detach" command to detach from the process:
(gdb) detach
Detaching from program: /data/nan/a, process 10210
(gdb) bt
No stack.
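If looking up the process ID every time is tedious, you can use the following script:

#!/bin/sh
# Save as xgdb.sh and make it executable (chmod +x xgdb.sh)
# Usage: xgdb.sh program
prog_bin=$1
running_name=$(basename "$prog_bin")
# -s returns a single PID even if several instances are running
pid=$(pidof -s "$running_name")
# Attach with -p; "gdb attach $pid" would treat "attach" as a program file name
gdb "$prog_bin" -p "$pid"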

For example, running ./xgdb.sh thread attaches gdb to the running process.

Debugging a child process

Code:

#include <stdio.h>
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) 
{
    pid_t pid;

    pid = fork();
    if (pid == 0)
    {
        printf("child\n");
        exit(1);
    }
    else if (pid > 0)
    {
        printf("parent\n");
        exit(0);
    }
    else
    {
        printf("error\n");
    }
    printf("hello world\n");
    return 0;
}

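When debugging a multi-process program, gdb follows the parent process by default. After you run set follow-fork-mode child, it follows the child process instead.

This command is currently supported on Linux but not on many other operating systems, so keep that in mind when using it.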
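
As a sketch, the setting can be placed in a gdb command file so that it is applied before the program starts. The file name follow_child.gdb and the helper debug_child are hypothetical names chosen for this example, and the program is assumed to be built with -g.

```shell
# Write the gdb commands to a file (hypothetical name).
cat > follow_child.gdb <<'EOF'
# Follow the child instead of the parent after fork().
set follow-fork-mode child
break main
run
EOF

# Start gdb with the command file, e.g.: debug_child ./a.out
debug_child() {
    gdb -q -x follow_child.gdb "$1"
}
```

Once gdb stops at main, stepping past the fork() call with next continues in the child process rather than the parent.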

Debugging the parent and child processes at the same time
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void) 
{
    pid_t pid;

    pid = fork();
    if (pid < 0)
    {
        exit(1);
    }
    else if (pid > 0)
    {
        printf("Parent\n");
        exit(0);
    }
    printf("Child\n");
    return 0;
}

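When debugging a multi-process program, gdb by default follows only the parent process; the child runs independently, outside gdb's control. For example: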
Starting program: /root/100gdb/a.out

Breakpoint 1, main () at child.c:8
8 pid = fork();
(gdb) n
child
9 if (pid == 0)
(gdb) n
14 else if (pid > 0)
(gdb)
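As the session shows, stepping over line 8 makes the program print "child", which proves that the child process has already started running on its own.
To debug both the parent and the child, use the "set detach-on-fork off" command (detach-on-fork defaults to on). gdb then keeps both processes under its control, and while you debug one of them the other stays suspended.

After running "set detach-on-fork off" on the program above, use "i inferiors" (i is short for info) to check the process state. Both the parent and the child are listed as being debugged by gdb, and the one marked with "*" is the process currently selected. Once the parent has exited, switch to the child with "inferior infno", where infno is the number shown in the listing.
This command is currently supported on Linux but not on many other operating systems, so keep that in mind when using it.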
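
A possible command sequence for this workflow is sketched below as a gdb command file. The file name both_procs.gdb and the helper debug_both are hypothetical, and the script assumes that the breakpoint at main stops on the fork() line, as it does in the sessions in this article.

```shell
# Write the gdb commands to a file (hypothetical name).
cat > both_procs.gdb <<'EOF'
# Keep the child under gdb's control after fork().
set detach-on-fork off
break main
run
# Step over fork(); parent and child are now both inferiors.
next
info inferiors
EOF

# Start gdb with the command file, e.g.: debug_both ./a.out
debug_both() {
    gdb -q -x both_procs.gdb "$1"
}
```

After the script finishes, gdb stays at its prompt. The inferior command followed by a number from the info inferiors listing (for instance inferior 2) switches to the other process.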
In addition, if you want the parent and the child to run at the same time, use the "set schedule-multiple on" command (off by default). Again using the code above:

Reading symbols from parent...
(gdb) set schedule-multiple on
(gdb) b main
Breakpoint 1 at 0x115d: file parent.c, line 8.
(gdb) r
Starting program: /root/workspace/gdb/parent 

Breakpoint 1, main () at parent.c:8
8       pid = fork();
(gdb) n
[Detaching after fork from child process 28513]
Child
9       if (pid < 0)
(gdb) n
13      else if (pid > 0)
(gdb) p pid
$1 = 28513
(gdb) n
15          printf("Parent\n");
(gdb) n
Parent
16          exit(0);
(gdb) q

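The output shows "Child" being printed, so the child process is running as well. Note, however, that the "[Detaching after fork from child process 28513]" line means gdb released the child here because detach-on-fork was still on. schedule-multiple only affects processes that gdb is still controlling, so it takes effect when combined with "set detach-on-fork off".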
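
Because schedule-multiple decides whether an execution command resumes the threads of every process gdb is debugging or only those of the current one, it is usually paired with detach-on-fork off. A minimal sketch of that combination follows; the file name run_both.gdb and the helper debug_run_both are hypothetical, and the breakpoint at main is assumed to stop on the fork() line.

```shell
# Write the gdb commands to a file (hypothetical name).
cat > run_both.gdb <<'EOF'
# Keep both processes attached after fork() ...
set detach-on-fork off
# ... and let execution commands resume all of them.
set schedule-multiple on
break main
run
# Step over fork(); the child is resumed together with the parent.
next
info inferiors
EOF

# Start gdb with the command file, e.g.: debug_run_both ./parent
debug_run_both() {
    gdb -q -x run_both.gdb "$1"
}
```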

Viewing thread information

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
void *thread_func(void *p_arg)
{
        while (1)
        {
                printf("%s\n", (char*)p_arg);
                sleep(10);
        }
}
int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, thread_func, "Thread 1");
        pthread_create(&t2, NULL, thread_func, "Thread 2");

        sleep(1000);
        return 0;
}

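When debugging a multithreaded program with gdb, use the "i threads" command (i is short for info) to list information about all threads. Taking the program above as an example (platform: Linux, CPU: x86_64):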

(gdb) b main
Breakpoint 1 at 0x400722: file google.c, line 15.
(gdb) r
Starting program: /root/100gdb/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, main () at google.c:15
15  {
(gdb) n
18      pthread_create(&t1, NULL, thread_func, "Thread 1");
(gdb) 
[New Thread 0x7ffff77ef700 (LWP 21849)]
19      pthread_create(&t2, NULL, thread_func, "Thread 2");
(gdb) 
Thread 1
[New Thread 0x7ffff6fee700 (LWP 21856)]
21      sleep(1000);
(gdb) i thread
  Id   Target Id         Frame 
* 1    Thread 0x7ffff7fdd700 (LWP 21836) "a.out" main () at google.c:21
  2    Thread 0x7ffff77ef700 (LWP 21849) "a.out" 0x00007ffff78bc30d in nanosleep ()
    at ../sysdeps/unix/syscall-template.S:84
  3    Thread 0x7ffff6fee700 (LWP 21856) "a.out" clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81
(gdb) 

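Column 1 (Id): the unique ID that gdb assigns to each thread: 1, 2, and so on.
Column 2 (Target Id): the ID that the underlying platform uses to identify each thread; its format can differ between platforms. On this Linux system it looks like: Thread 0x7ffff77ef700 (LWP 21849).
Column 3 (Frame): the function the thread is currently executing.
A leading "*" marks the "current thread", which you can think of as the default thread that gdb commands act on when debugging a multithreaded program.
Use "i threads [Id...]" to print information about specific threads only, for example: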

(gdb) info threads 1
  Id   Target Id                                   Frame 
* 1    Thread 0x7ffff7da7740 (LWP 28527) "process" main () at process.c:20
(gdb) info threads 1 2
  Id   Target Id                                   Frame 
* 1    Thread 0x7ffff7da7740 (LWP 28527) "process" main () at process.c:20
  2    Thread 0x7ffff7da6700 (LWP 28533) "process" 0x00007ffff7e8b670 in __GI___nanosleep (
    requested_time=requested_time@entry=0x7ffff7da5ea0, remaining=remaining@entry=0x7ffff7da5ea0)
    at ../sysdeps/unix/sysv/linux/nanosleep.c:28
(gdb) info threads 1 
  Id   Target Id                                   Frame 
* 1    Thread 0x7ffff7da7740 (LWP 28527) "process" main () at process.c:20

Using the $_thread variable

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

int a = 0;
int b = 0;

void *thread1_func(void *p_arg)
{
    while (1)
    {
        a++;
        sleep(1);
    }
}

void *thread2_func(void *p_arg)
{
    while (1)
    {
        b++;
        sleep(1);
    }
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, thread1_func, "Thread 1");
    pthread_create(&t2, NULL, thread2_func, "Thread 2");

    sleep(1000);
    return 0;
}

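gdb 7.2 introduced the $_thread "convenience variable", which holds the number of the thread currently being debugged. It is handy when writing breakpoint commands or command scripts.
First, set a watchpoint with "wa a" (wa is short for watch), so that the program pauses whenever the value of a changes; then print the thread number in the commands block.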


After the program stops, gdb prints the thread number: thread id = 2
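
The watchpoint and its commands block can be sketched as a gdb command file. The file name watch_thread.gdb and the helper debug_watch are hypothetical; the printf format is chosen to match the "thread id = 2" output mentioned above.

```shell
# Write the gdb commands to a file (hypothetical name).
# The quoted 'EOF' keeps the shell from expanding $_thread.
cat > watch_thread.gdb <<'EOF'
break main
run
# Stop whenever the global variable a changes.
watch a
# These commands run each time the watchpoint triggers.
commands
printf "thread id = %d\n", $_thread
end
continue
EOF

# Start gdb with the command file, e.g.: debug_watch ./a.out
debug_watch() {
    gdb -q -x watch_thread.gdb "$1"
}
```

Each time the watchpoint triggers, gdb prints the number of the thread that changed a.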

Printing the backtraces of all threads

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

void *thread_func(void *p_arg)
{
        while (1)
        {
                printf("%s\n", (char*)p_arg);
                sleep(10);
        }
}
int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, thread_func, "Thread 1");
        pthread_create(&t2, NULL, thread_func, "Thread 2");

        sleep(1000);
        return 0;
}
Breakpoint 1, main () at process.c:14
14  {
(gdb) n
17          pthread_create(&t1, NULL, thread_func, "Thread 1");
(gdb) n
[New Thread 0x7ffff7da6700 (LWP 28549)]
Thread 1
18          pthread_create(&t2, NULL, thread_func, "Thread 2");
(gdb) n
[New Thread 0x7ffff75a5700 (LWP 28550)]
Thread 2
20          sleep(1000);
(gdb) thread apply all bt

Thread 3 (Thread 0x7ffff75a5700 (LWP 28550)):
#0  0x00007ffff7e8b670 in __GI___nanosleep (requested_time=requested_time@entry=0x7ffff75a4ea0, remaining=remaining@entry=0x7ffff75a4ea0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1  0x00007ffff7e8b57a in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#2  0x0000555555555187 in thread_func (p_arg=0x55555555600d) at process.c:10
#3  0x00007ffff7f9e182 in start_thread (arg=<optimized out>) at pthread_create.c:486
#4  0x00007ffff7ec7b1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7ffff7da6700 (LWP 28549)):
#0  0x00007ffff7e8b670 in __GI___nanosleep (requested_time=requested_time@entry=0x7ffff7da5ea0, remaining=remaining@entry=0x7ffff7da5ea0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1  0x00007ffff7e8b57a in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#2  0x0000555555555187 in thread_func (p_arg=0x555555556004) at process.c:10
#3  0x00007ffff7f9e182 in start_thread (arg=<optimized out>) at pthread_create.c:486
#4  0x00007ffff7ec7b1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7ffff7da7740 (LWP 28545)):
#0  main () at process.c:20
(gdb) thread apply 1 bt

Thread 1 (Thread 0x7ffff7da7740 (LWP 28545)):
#0  main () at process.c:20
(gdb) thread apply 1-2  bt

Thread 1 (Thread 0x7ffff7da7740 (LWP 28545)):
#0  main () at process.c:20

Thread 2 (Thread 0x7ffff7da6700 (LWP 28549)):
#0  0x00007ffff7e8b670 in __GI___nanosleep (requested_time=requested_time@entry=0x7ffff7da5ea0, remaining=remaining@entry=0x7ffff7da5ea0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1  0x00007ffff7e8b57a in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#2  0x0000555555555187 in thread_func (p_arg=0x555555556004) at process.c:10
#3  0x00007ffff7f9e182 in start_thread (arg=<optimized out>) at pthread_create.c:486
#4  0x00007ffff7ec7b1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) 

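Use the command thread apply all bt to print the backtrace of every thread. Give a thread ID list instead of all (for example thread apply 1 bt or thread apply 1-2 bt) to print the backtraces of only some threads.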
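
The same command also works non-interactively, which is convenient for grabbing the stacks of a process that is already running. The helper name dump_stacks and the GDB override variable below are assumptions made for this sketch.

```shell
# Hypothetical helper: print a backtrace of every thread in a
# running process, then exit gdb. Set GDB to use another gdb binary.
dump_stacks() {
    "${GDB:-gdb}" -q -batch -p "$1" -ex 'thread apply all bt'
}
```

For example, dump_stacks "$(pidof -s thread)" attaches to the running thread program, prints all backtraces, and leaves gdb again.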

Allowing only one thread to run

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

int a = 0;
int b = 0;
void *thread1_func(void *p_arg)
{
        while (1)
        {
                a++;
                sleep(1);
        }
}

void *thread2_func(void *p_arg)
{
        while (1)
        {
                b++;
                sleep(1);
        }
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, thread1_func, "Thread 1");
        pthread_create(&t2, NULL, thread2_func, "Thread 2");

        sleep(1000);
        return 0;
}
Reading symbols from thread...
(gdb) b thread.c:9
Breakpoint 1 at 0x1161: file thread.c, line 11.
(gdb) r
Starting program: /root/workspace/gdb/thread 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7da6700 (LWP 28631)]
[New Thread 0x7ffff75a5700 (LWP 28632)]
[Switching to Thread 0x7ffff7da6700 (LWP 28631)]

Thread 2 "thread" hit Breakpoint 1, thread1_func (
    p_arg=0x555555556004) at thread.c:11
11                  a++;
(gdb) p b
$1 = 1
(gdb) n
12                  sleep(1);
(gdb) 
11                  a++;
(gdb) 

Thread 2 "thread" hit Breakpoint 1, thread1_func (
    p_arg=0x555555556004) at thread.c:11
11                  a++;
(gdb) p b
$2 = 5
(gdb) 

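thread1_func updates the global variable a, and thread2_func updates the global variable b. I set a breakpoint on the a++ statement in thread1_func. When the breakpoint is first hit, b is 1; after stepping through thread1_func a few times, b has become 5, which proves that thread2_func keeps running while thread1_func is being single-stepped.
To pause the other threads while debugging one thread, use the "set scheduler-locking on" command: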

Reading symbols from thread...
(gdb) b thread
thread.c      thread1_func  thread2_func  
(gdb) b thread.c:9
Breakpoint 1 at 0x1161: file thread.c, line 11.
(gdb) r
Starting program: /root/workspace/gdb/thread 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7da6700 (LWP 28646)]
[New Thread 0x7ffff75a5700 (LWP 28647)]
[Switching to Thread 0x7ffff7da6700 (LWP 28646)]

Thread 2 "thread" hit Breakpoint 1, thread1_func (p_arg=0x555555556004) at thread.c:11
11                  a++;
(gdb) set scheduler-locking on
(gdb) p b
$1 = 1
(gdb) n
12                  sleep(1);
(gdb) n
11                  a++;
(gdb) n

Thread 2 "thread" hit Breakpoint 1, thread1_func (p_arg=0x555555556004) at thread.c:11
11                  a++;
(gdb) n
12                  sleep(1);
(gdb) n
11                  a++;
(gdb) n

Thread 2 "thread" hit Breakpoint 1, thread1_func (p_arg=0x555555556004) at thread.c:11
11                  a++;
(gdb) n
12                  sleep(1);
(gdb) p a
$2 = 3
(gdb) p b
$3 = 1

After stepping through thread1_func several times, b is still 1, which proves that thread2_func did not run while thread1_func was being single-stepped.
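
A compact version of this experiment is sketched below as a gdb command file. The file name lock_sched.gdb and the helper debug_locked are hypothetical. The setting is applied only after the program has started, and it is switched off again at the end, because with scheduler-locking on a later continue would also resume only the current thread.

```shell
# Write the gdb commands to a file (hypothetical name).
cat > lock_sched.gdb <<'EOF'
break thread1_func
run
# From here on, only the current thread may run.
set scheduler-locking on
next
next
next
print a
print b
# Let all threads run again.
set scheduler-locking off
EOF

# Start gdb with the command file, e.g.: debug_locked ./thread
debug_locked() {
    gdb -q -x lock_sched.gdb "$1"
}
```

gdb also accepts "set scheduler-locking step", which locks the other threads only while single-stepping and lets them run on continue.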