How ptrace works in Linux

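How does ptrace get notified when a child process makes a system call? In short, the system call handler code inside the kernel checks a flag associated with the current process's task_struct that records whether the process is being ptraced. When the flag is set, the process is stopped and its parent (the tracer) is notified with a signal.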



Q. When the attached child process invokes a system call, the ptracing parent process can be notified. But how exactly does that happen?



A. The link is set up in one of two ways: the tracer calls ptrace with PTRACE_ATTACH on an existing process, or the child itself calls ptrace with PTRACE_TRACEME. Either request connects the two processes by filling in fields of the child's task_struct (kernel/ptrace.c: sys_ptrace): the child gets the PT_PTRACED flag in the ptrace field of its struct task_struct, the tracer is recorded as its parent, and the child is linked into the ptrace_entry list (__ptrace_link).
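The PTRACE_TRACEME half of this setup can be sketched from user space as follows. This is a minimal illustration, not code from strace or the kernel; the function name traceme_stop_signal is made up for the example, and it assumes a Linux system where the process is allowed to use ptrace.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that asks to be traced by its parent and then stops itself.
 * Returns the signal that stopped the child (SIGSTOP is expected),
 * or -1 on failure. The function name is illustrative only.
 */
int traceme_stop_signal(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        /* Tracee side: mark this process as traced by its parent. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);         /* stop so the parent can observe us */
        _exit(0);
    }

    /* Tracer side: the stop is reported through waitpid(). */
    int status = 0;
    if (waitpid(child, &status, 0) == -1)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;              /* child exited instead of stopping */

    int sig = WSTOPSIG(status);

    /* Clean up: kill the stopped child and reap it. */
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return sig;
}
```

Once the child is traced, signals that would normally be delivered to it are first reported to the parent through waitpid(), which is why the SIGSTOP shows up in the parent here.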


Then strace calls ptrace with the PTRACE_SYSCALL request to register itself as a syscall debugger. This sets the thread flag TIF_SYSCALL_TRACE in the child process's struct thread_info (set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);). The flag is defined in arch/x86/include/asm/thread_info.h:


    /*
     * thread information flags
     * - these are process state flags that various assembly files
     *   may need to access   ...*/

    #define TIF_SYSCALL_TRACE       0       /* syscall trace active */
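Note that TIF_SYSCALL_TRACE is a bit number, not a mask. The following user-space model shows how such a bit number is set and tested in a flags word. It is only a sketch: struct model_thread_info and the model_* helpers are hypothetical stand-ins, not the kernel's struct thread_info or its set_tsk_thread_flag/test_thread_flag implementations.

```c
#include <stdbool.h>

#define TIF_SYSCALL_TRACE 0     /* bit number, as in the header above */

/* Hypothetical stand-in for the kernel's struct thread_info. */
struct model_thread_info {
    unsigned long flags;
};

/* Set one flag bit, in the spirit of set_tsk_thread_flag(). */
static inline void model_set_flag(struct model_thread_info *ti, int bit)
{
    ti->flags |= 1UL << bit;
}

/* Clear one flag bit. */
static inline void model_clear_flag(struct model_thread_info *ti, int bit)
{
    ti->flags &= ~(1UL << bit);
}

/* Test one flag bit, in the spirit of test_thread_flag(). */
static inline bool model_test_flag(const struct model_thread_info *ti, int bit)
{
    return (ti->flags >> bit) & 1UL;
}
```

Keeping the flags as bits of a single word is what lets the assembly entry code test them cheaply on every system call.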

On every syscall entry or exit, the architecture-specific syscall entry code checks this flag. The check is made directly in the assembler implementation of the syscall path, for example on x86 in arch/x86/kernel/entry_32.S: jnz syscall_trace_entry in ENTRY(system_call), with similar code in syscall_exit_work. If the flag is set, the child is temporarily stopped and the tracer is notified of the stop, which is reported with the signal SIGTRAP. This is usually done in syscall_trace_enter and syscall_trace_leave:


    long syscall_trace_enter(struct pt_regs *regs)
    ...
            if ((ret || test_thread_flag(TIF_SYSCALL_TRACE)) &&
                tracehook_report_syscall_entry(regs))
                    ret = -1L;
    ...

    void syscall_trace_leave(struct pt_regs *regs)
    ...
            if (step || test_thread_flag(TIF_SYSCALL_TRACE))
                    tracehook_report_syscall_exit(regs, step);
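Seen from the tracer's side, each of these entry and exit reports arrives as a stop collected with waitpid(). The sketch below counts them for a small child process. It is an illustration under assumptions, not how strace is written: the function name count_syscall_stops is made up, the child is a fork of the same program, and the exact count depends on which system calls the C library issues, so only a lower bound is meaningful.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Trace a forked child with PTRACE_SYSCALL and count the SIGTRAP stops
 * reported on syscall entry and exit. Returns the count, or -1 on failure.
 */
long count_syscall_stops(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);         /* wait for the tracer to take over */
        syscall(SYS_getpid);    /* at least one traced system call */
        _exit(0);
    }

    int status = 0;
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    long stops = 0;
    long inject = 0;            /* signal to deliver on resume, 0 = none */
    for (;;) {
        /* Resume until the next syscall entry or exit (or a signal). */
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)inject) == -1) {
            kill(child, SIGKILL);
            waitpid(child, &status, 0);
            return -1;
        }
        if (waitpid(child, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;              /* child is gone, stop tracing */

        inject = 0;
        if (WIFSTOPPED(status)) {
            if (WSTOPSIG(status) == SIGTRAP)
                stops++;        /* syscall entry or exit report */
            else
                inject = WSTOPSIG(status);  /* pass other signals on */
        }
    }
    return stops;
}
```

The getpid call alone accounts for two stops, one on entry and one on exit, so the result should be at least 2.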

The tracehook_report_syscall_* functions are the actual workers here; they call ptrace_report_syscall. From include/linux/tracehook.h:


    /**
     * tracehook_report_syscall_entry - task is about to attempt a system call
     * @regs:               user register state of current task
     *
     * This will be called if %TIF_SYSCALL_TRACE has been set, when the
     * current task has just entered the kernel for a system call.
     * Full user register state is available here.  Changing the values
     * in @regs can affect the system call number and arguments to be tried.
     * It is safe to block here, preventing the system call from beginning.
     *
     * Returns zero normally, or nonzero if the calling arch code should abort
     * the system call.  That must prevent normal entry so no system call is
     * made.  If @task ever returns to user mode after this, its register state
     * is unspecified, but should be something harmless like an %ENOSYS error
     * return.  It should preserve enough information so that syscall_rollback()
     * can work (see asm-generic/syscall.h).
     *
     * Called without locks, just after entering kernel mode.
     */
    static inline __must_check int tracehook_report_syscall_entry(
            struct pt_regs *regs)
    {
            return ptrace_report_syscall(regs);
    }

    /**
     * tracehook_report_syscall_exit - task has just finished a system call
     * @regs:               user register state of current task
     * @step:               nonzero if simulating single-step or block-step
     *
     * This will be called if %TIF_SYSCALL_TRACE has been set, when the
     * current task has just finished an attempted system call.  Full
     * user register state is available here.  It is safe to block here,
     * preventing signals from being processed.
     *
     * If @step is nonzero, this report is also in lieu of the normal
     * trap that would follow the system call instruction because
     * user_enable_block_step() or user_enable_single_step() was used.
     * In this case, %TIF_SYSCALL_TRACE might not be set.
     *
     * Called without locks, just before checking for pending signals.
     */
    static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step)
    {
    ...

            ptrace_report_syscall(regs);
    }

Finally, ptrace_report_syscall generates the SIGTRAP for the debugger or strace:


    /*
     * ptrace report for syscall entry and exit looks identical.
     */
    static inline int ptrace_report_syscall(struct pt_regs *regs)
    {
            int ptrace = current->ptrace;

            if (!(ptrace & PT_PTRACED))
                    return 0;

            ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0));

            /*
             * this isn't the same as continuing with a signal, but it will do
             * for normal use.  strace only continues with a signal if the
             * stopping signal is not SIGTRAP.  -brl
             */
            if (current->exit_code) {
                    send_sig(current->exit_code, current, 1);
                    current->exit_code = 0;
            }

            return fatal_signal_pending(current);
    }
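
The 0x80 bit in the ptrace_notify call matters to the tracer. When PT_TRACESYSGOOD is set (a tracer requests it with the PTRACE_O_TRACESYSGOOD option of PTRACE_SETOPTIONS), a syscall stop is reported as SIGTRAP | 0x80 instead of plain SIGTRAP, so it can be told apart from other SIGTRAP stops. A minimal sketch of the tracer-side check, with helper names invented for this example:

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/* True if a waitpid() status is a syscall stop marked with bit 0x80. */
static inline bool is_marked_syscall_stop(int status)
{
    return WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80);
}

/*
 * True if the status is a stop with plain SIGTRAP. Without the 0x80 mark
 * this may be a syscall stop or some other SIGTRAP stop.
 */
static inline bool is_plain_sigtrap_stop(int status)
{
    return WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP;
}
```

A tracer that sets the option can then treat every status for which is_marked_syscall_stop() returns true as a syscall entry or exit report, and handle plain SIGTRAP stops separately.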