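Background:
Some server software (for example tokyotyrant and memcached) supports persistent connections; put simply, the server almost never closes a connection on its own. The problem described here also exists without persistent connections, it is just more visible with them. If a client misbehaves (or is malicious) and walks away after opening a connection without closing it, the server is left holding a large number of stale connections. The most direct symptom is that netstat -anpt becomes very slow.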
A real case:
The exact trigger is unknown, so let's start with a command:
sar -n SOCK :
    11:00:01 AM    totsck    tcpsck    udpsck    rawsck   ip-frag
    ...
    12:20:01 PM       291       198         6         0         0
    12:30:01 PM       275       182         6         0         0
    12:40:01 PM       284       194         6         0         0
    12:50:02 PM       285       192         6         0         0
    01:00:01 PM       300       207         7         0         0
    01:10:01 PM       299       207         6         0         0
    01:30:03 PM    434614    374694         6         0         0
    01:50:03 PM    871925    668143         6         0         0
    02:00:01 PM    830446    639373         6         0         0
    02:10:01 PM    442958    442852         7         0         0
    02:20:01 PM    442929    442826         6         0         0
    02:30:01 PM    442895    442792         6         0         0
    02:40:01 PM    442888    442785         6         0         0
    02:50:01 PM    442903    442797         6         0         0
    03:00:01 PM    442889    442785         6         0         0
    03:10:01 PM    442892    442787         6         0         0
    03:20:01 PM    440122    440017         6         0         0
    03:30:01 PM    439761    439659         7         0         0
    03:40:01 PM    161380    161281         6         0         0
    03:50:01 PM       216       118         6         0         0
    04:00:01 PM       218       118         6         0         0
    04:10:01 PM       222       124         6         0         0
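At 1:30 PM the number of TCP connections suddenly jumped from about 200 to roughly 400,000. Nobody intervened, yet at 3:40 PM the count started to fall, and by 3:50 PM it was back to normal. The kernel log for the same period shows: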
    # cat /var/log/messages | grep "port 6101"
    Nov 26 13:28:43 localhost kernel: [4771814.322915] possible SYN flooding on port 6101. Sending cookies.
    Nov 26 13:32:02 localhost kernel: [4772012.800915] possible SYN flooding on port 6101. Sending cookies.
Cause:
    # /sbin/sysctl -a | grep keep
    net.ipv4.tcp_keepalive_intvl = 75
    net.ipv4.tcp_keepalive_probes = 9
    net.ipv4.tcp_keepalive_time = 7200
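Once a connection had been idle for 2 hours, the keepalive mechanism probed the peer, found that it was gone, and closed the connection. A look at the tokyotyrant and memcached sources shows that both enable SO_KEEPALIVE: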
    ./ttutil.c: setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, (char *)&optint, sizeof(optint));

    ./memcached.c: setsockopt(sfd, SOL_SOCKET, SO_KEEPALIVE, (void *)&flags, sizeof(flags));
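Neither of them sets the idle time, however, so both fall back to the system-wide settings.
Lesson:
1. Without a keepalive mechanism, a system that runs long enough will eventually hit this problem.
Study notes:
Some of the keep-alive parameters, excerpted from man tcp :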
tcp_keepalive_intvl (integer; default: 75)
The number of seconds between TCP keep-alive probes.
If a probe gets no response, it is resent after tcp_keepalive_intvl seconds, so that a single lost packet does not lead to a wrong verdict.
tcp_keepalive_probes (integer; default: 9)
The maximum number of TCP keep-alive probes to send before giving up and killing the connection if no response is obtained from the other end.
The peer is considered dead only after tcp_keepalive_probes probes have gone unanswered, not after a single failed probe.
tcp_keepalive_time (integer; default: 7200)
The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes. Keep-alives are only sent when the SO_KEEPALIVE socket option is enabled. The default value is 7200 seconds (2 hours). An idle connection is terminated after approximately an additional 11 minutes (9 probes an interval of 75 seconds apart) when keep-alive is enabled.
Note that underlying connection tracking mechanisms and application timeouts may be much shorter.
Probing starts tcp_keepalive_time seconds after the connection goes idle, which is how a peer that exited abnormally gets detected.
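To tie these three parameters back to the incident, here is a minimal sketch of the arithmetic. The function name keepalive_timeout_secs is made up for illustration, and the formula assumes the peer never answers a single probe.

```c
/*
 * Hypothetical helper (name is illustrative, not from any library).
 * Seconds from the last activity on a connection until the kernel gives up
 * on a silent peer: the idle period, followed by `probes` unanswered probes
 * spaced `intvl` seconds apart.
 */
long keepalive_timeout_secs(long idle_time, long intvl, long probes)
{
    return idle_time + intvl * probes;
}
```

With the system defaults shown above, keepalive_timeout_secs(7200, 75, 9) is 7875 seconds, that is 2 hours 11 minutes 15 seconds. That lines up with the sar output: the connections piled up around 1:30 PM and were reaped around 3:40 PM.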
Implementation:
Use the setsockopt(…) function:
    TCP_KEEPALIVE : set idle time in milliseconds
    TCP_KEEPIDLE  : set idle time in seconds
    TCP_KEEPINTVL : set keep alive time interval
    TCP_KEEPCNT   : set number of keep alive probes.
    /* Header names assume a POSIX-style sockets API. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    #define SERV_HOST_ADDR "192.168.1.40"
    #define SERV_HOST_PORT 8000
    #define BUFFER_LENGTH  1024   /* receive buffer size, chosen for this example */

    void receiver(void)
    {
        int socket_fd, ret;
        struct sockaddr_in serv_addr;
        int num_recv_bytes = 0;
        int val = 1;
        int keepalive_idle_time = 5000; /* number of milliseconds */
        char buffer[] = "hello";        /* placeholder payload */
        char rcv_buffer[BUFFER_LENGTH];

        if ((socket_fd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
        {
            return; // error
        }

        memset(&serv_addr, 0, sizeof(serv_addr));
        serv_addr.sin_family = AF_INET;
        serv_addr.sin_addr.s_addr = inet_addr(SERV_HOST_ADDR);
        serv_addr.sin_port = htons(SERV_HOST_PORT);

        /* Set the idle time after which tcp keepalive probes should start.
           TCP_KEEPALIVE with a millisecond value follows the referenced
           document; Linux does not define TCP_KEEPALIVE, use TCP_KEEPIDLE
           with a value in seconds there. */
        if (setsockopt(socket_fd, IPPROTO_TCP, TCP_KEEPALIVE,
                       (void *)&keepalive_idle_time, sizeof(keepalive_idle_time)) == -1)
        {
            printf("setsockopt TCP_KEEPALIVE failed\n");
            close(socket_fd);
            return;
        }

        if (setsockopt(socket_fd, SOL_SOCKET, SO_KEEPALIVE, (void *)&val, sizeof(val)) == -1)
        {
            printf("setsockopt SO_KEEPALIVE failed\n");
            close(socket_fd);
            return;
        }

        /* connect() returns 0 on success; only then talk to the server */
        if (connect(socket_fd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) == 0)
        {
            ret = send(socket_fd, buffer, strlen(buffer), 0);
            if (ret < 0)
            {
                printf("Receiver: send failed\n");
            }

            // Receive the same string, if idle then break
            while (ret >= 0)
            {
                num_recv_bytes = recv(socket_fd, rcv_buffer, BUFFER_LENGTH, 0);
                if (num_recv_bytes < 0)
                {
                    /* On Linux a keep-alive timeout is reported here,
                       with errno set to ETIMEDOUT. */
                    printf("Receiver: failed exiting\n");
                    break;
                }
                else if (num_recv_bytes == 0)
                {
                    /* The connection is gone. In the referenced document this
                       is where the keep-alive timer has terminated an idle
                       connection; we break out of the loop and close the socket. */
                    printf("Connection IDLE Keep-Alive timer terminated connection\n");
                    break;
                }
            }
        }

        // close the socket on every path
        close(socket_fd);
    }
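The example above uses TCP_KEEPALIVE with a millisecond value, as in the referenced document. On Linux that option is not defined, and the per-socket equivalents of the three sysctls are TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT. The following is a minimal sketch for Linux only; enable_keepalive is a hypothetical helper name, not something taken from tokyotyrant or memcached.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: turn keepalive on for `fd` and override the
 * system-wide defaults for this socket only.
 *   idle_secs  - idle seconds before the first probe (cf. tcp_keepalive_time)
 *   intvl_secs - seconds between probes              (cf. tcp_keepalive_intvl)
 *   probes     - unanswered probes before giving up  (cf. tcp_keepalive_probes)
 * Returns 0 on success, -1 as soon as any setsockopt() call fails.
 */
int enable_keepalive(int fd, int idle_secs, int intvl_secs, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof(idle_secs)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_secs, sizeof(intvl_secs)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) == -1)
        return -1;
    return 0;
}
```

A server could call something like enable_keepalive(client_fd, 60, 10, 3) on each accepted socket. With those illustrative values a vanished client would be dropped about 60 + 10 * 3 = 90 seconds after its last activity, instead of after more than two hours.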
References: http://ez.analog.com/docs/DOC-1862