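Recently my blog has frequently stopped responding. At first I suspected the network, since the site is hosted on Alibaba Cloud (Aliyun). Eventually I ran out of patience and looked into it:
- First, while the browser got no response, curl could still fetch the site normally
- Next, a tcpdump capture showed that the requests did reach the server, but the server never replied
- Running strace on the server processes showed many of them trying to flock a file; lsof confirmed that they were all trying to lock my session file
- So some process must have locked the session file and then gone off to do something slow that never finished
- I kept running strace on every process that had the session file open. One of them had to have the file open without being stuck in the flock system call, and sure enough there was one: process 15822
- pstack 15822 gave the following: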
    #0 0x00007f08b8813248 in poll () from /lib64/libc.so.6
    #1 0x00007f08ac57a8f6 in Curl_socket_ready () from /usr/lib64/libcurl.so.4
    #2 0x00007f08ac571c11 in ?? () from /usr/lib64/libcurl.so.4
    #3 0x00007f08ac571ee1 in Curl_connecthost () from /usr/lib64/libcurl.so.4
    #4 0x00007f08ac566710 in Curl_connect () from /usr/lib64/libcurl.so.4
    #5 0x00007f08ac56e8c0 in Curl_perform () from /usr/lib64/libcurl.so.4
    #6 0x00007f08ac79c414 in ?? () from /usr/lib64/php/modules/curl.so
    #7 0x00007f08af712769 in dtrace_execute_internal () from /etc/httpd/modules/libphp7.so
    #8 0x00007f08af798bd2 in ?? () from /etc/httpd/modules/libphp7.so
    #9 0x00007f08af760960 in execute_ex () from /etc/httpd/modules/libphp7.so
    #10 0x00007f08af7128ae in dtrace_execute_ex () from /etc/httpd/modules/libphp7.so
    #11 0x00007f08af798a4a in ?? () from /etc/httpd/modules/libphp7.so
    #12 0x00007f08af760960 in execute_ex () from /etc/httpd/modules/libphp7.so
    #13 0x00007f08af7128ae in dtrace_execute_ex () from /etc/httpd/modules/libphp7.so
    #14 0x00007f08af798a4a in ?? () from /etc/httpd/modules/libphp7.so
    #15 0x00007f08af760960 in execute_ex () from /etc/httpd/modules/libphp7.so
    #16 0x00007f08af7128ae in dtrace_execute_ex () from /etc/httpd/modules/libphp7.so
    #17 0x00007f08af798a4a in ?? () from /etc/httpd/modules/libphp7.so
    #18 0x00007f08af760960 in execute_ex () from /etc/httpd/modules/libphp7.so
    #19 0x00007f08af7128ae in dtrace_execute_ex () from /etc/httpd/modules/libphp7.so
    #20 0x00007f08af798a4a in ?? () from /etc/httpd/modules/libphp7.so
    #21 0x00007f08af760960 in execute_ex () from /etc/httpd/modules/libphp7.so
    #22 0x00007f08af7128ae in dtrace_execute_ex () from /etc/httpd/modules/libphp7.so
    #23 0x00007f08af798a4a in ?? () from /etc/httpd/modules/libphp7.so
    #24 0x00007f08af760960 in execute_ex () from /etc/httpd/modules/libphp7.so
    #25 0x00007f08af7128ae in dtrace_execute_ex () from /etc/httpd/modules/libphp7.so
    #26 0x00007f08af798a4a in ?? () from /etc/httpd/modules/libphp7.so
    #27 0x00007f08af760960 in execute_ex () from /etc/httpd/modules/libphp7.so
    #28 0x00007f08af7128ae in dtrace_execute_ex () from /etc/httpd/modules/libphp7.so
    #29 0x00007f08af7b3f4b in zend_execute () from /etc/httpd/modules/libphp7.so
    #30 0x00007f08af721233 in zend_execute_scripts () from /etc/httpd/modules/libphp7.so
    #31 0x00007f08af6c28c0 in php_execute_script () from /etc/httpd/modules/libphp7.so
    #32 0x00007f08af7b7e3d in ?? () from /etc/httpd/modules/libphp7.so
    #33 0x00007f08ba225fc0 in ap_run_handler ()
    #34 0x00007f08ba22987e in ap_invoke_handler ()
    #35 0x00007f08ba234fb0 in ap_process_request ()
    #36 0x00007f08ba231df8 in ?? ()
    #37 0x00007f08ba22dac8 in ap_run_process_connection ()
    #38 0x00007f08ba239d57 in ?? ()
    #39 0x00007f08ba23a079 in ?? ()
    #40 0x00007f08ba23acfc in ap_mpm_run ()
    #41 0x00007f08ba211aa0 in main ()
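PID 15822 was found by hand, running strace against one process after another. As a rough sketch of how that search could be scripted, the Python below lists every process that has a given file open by scanning /proc/<pid>/fd, and prints each one's kernel wait channel from /proc/<pid>/wchan. The function names and the temp file standing in for the session file are hypothetical. The wchan value is kernel-dependent and may read 0 without sufficient privileges, and other users' fd directories are only readable as root, so treat the output as a hint for where to point strace, not as proof.

```python
import os
import tempfile


def pids_with_file_open(path):
    """Return PIDs of processes that have `path` open, by scanning /proc/<pid>/fd."""
    target = os.path.realpath(path)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were looking
            if link == target:
                pids.append(int(entry))
                break
    return sorted(pids)


def wait_channel(pid):
    """Return the kernel symbol the process is sleeping in, or '' if unreadable."""
    try:
        with open(f"/proc/{pid}/wchan") as f:
            return f.read().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    # Stand-in for a session file: a temp file that this process keeps open.
    with tempfile.NamedTemporaryFile(prefix="sess_") as session:
        for pid in pids_with_file_open(session.name):
            print(pid, wait_channel(pid) or "(unknown)")
```

On a real server the argument would be the path of the contended session file; the process whose wait channel does not look lock-related is the one worth a pstack.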
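- Clearly, this process was trying to reach an external resource through curl and presumably could not connect; the lsof output for the process showed the outbound connection and its remote IP
- Everything was as expected, but the information so far still did not tell me which piece of code was making the request
- The .gdbinit shipped with the PHP source provides zbacktrace, which would show exactly that, but by the time I had set it up the process was gone. Next time, then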
- N days later the same problem came back, this time with process ID 826
- I went straight to gdb -p 826 and then ran zbacktrace
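- The culprit turned out to be a plugin: /data1/www/htdocs/phpor.net/blog/wp-content/plugins/google-analytics-dashboard/ga-lib.php, which contacts www.googleapis.com; resolving that domain did indeed match the IP seen earlier
- The fix: simply disable the plugin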
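Why one stuck plugin call froze every other request on the same session comes down to flock semantics: an exclusive lock stays held for as long as the holder keeps the file open, whatever else it is doing. The following is a minimal Python sketch of that behaviour on Linux, not the PHP session handler itself. The names try_lock and demo are hypothetical, and a temp file stands in for the session file.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive flock on `path` without blocking.

    Returns the open file object on success (keep it open to hold the lock),
    or None if another open file description already holds the lock.
    """
    f = open(path, "a+")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def demo():
    fd, path = tempfile.mkstemp(prefix="sess_")
    os.close(fd)
    try:
        holder = try_lock(path)   # the slow request: takes the lock and keeps it
        waiter = try_lock(path)   # a second request on the same session
        blocked_while_held = waiter is None
        holder.close()            # closing the file releases the flock
        retry = try_lock(path)
        acquired_after_release = retry is not None
        if retry is not None:
            retry.close()
        return blocked_while_held, acquired_after_release
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

Here the second attempt uses LOCK_NB so the sketch can report the conflict instead of hanging; the server processes in the strace output made a blocking flock call, which is why they simply waited until the holder went away.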