strace 解决 springboot 服务 时通时不通故障


现象

通过http访问springboot服务时,时而正常,时而超时

image-20200902125227774

背景

springboot webflux 开发

centos7

原因分析

1、排除网络问题

1
tcpdump -n -i eth0 '((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0' and host 172.16.160.12 and host 172.16.247.230 and port 18092  -XX

使用tcpdump抓包,发现 12:18:14.286806 收到请求,直到 12:40:33.668631 才将结果返回。证明网络链路是通的,但在近20分钟的时间里,服务究竟在干嘛?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
 12:18:14.286806 IP 172.16.160.12.50966 > 172.16.247.230.18092: Flags [P.], seq 200172037:200172579, ack 1033672370, win 16652, options [nop,nop,TS val 15683831 ecr 93358138], length 542
0x0000: 0022 d914 1734 1051 72f1 3f48 0800 4500 ."...4.Qr.?H..E.
0x0010: 0252 431b 4000 7b06 ca76 ac10 a00c ac10 .RC.@.{..v......
0x0020: f7e6 c716 46ac 0bee 6205 3d9c 96b2 8018 ....F...b.=.....
0x0030: 410c 8422 0000 0101 080a 00ef 50f7 0590 A.."........P...
0x0040: 883a 504f 5354 202f 5765 6253 6572 7669 .:POST./WebServi
0x0050: 6365 5072 6f78 792f 5072 6f78 7953 6572 ceProxy/ProxySer
0x0060: 7669 6365 2048 5454 502f 312e 310d 0a55 vice.HTTP/1.1..U
0x0070: 7365 722d 4167 656e 743a 2050 6f73 746d ser-Agent:.Postm
0x0080: 616e 5275 6e74 696d 652f 372e 3236 2e33 anRuntime/7.26.3
0x0090: 0d0a 4163 6365 7074 3a20 2a2f 2a0d 0a43 ..Accept:.*/*..C
0x00a0: 6163 6865 2d43 6f6e 7472 6f6c 3a20 6e6f ache-Control:.no
0x00b0: 2d63 6163 6865 0d0a 506f 7374 6d61 6e2d -cache..Postman-
0x00c0: 546f 6b65 6e3a 2062 3965 3130 3561 322d Token:.b9e105a2-
0x00d0: 3432 3632 2d34 3430 392d 3935 3964 2d37 4262-4409-959d-7
0x00e0: 3937 3431 3038 6133 6363 620d 0a48 6f73 974108a3ccb..Hos
0x00f0: 743a 2031 3732 2e31 362e 3234 372e 3233 t:.172.16.247.23
0x0100: 303a 3138 3039 320d 0a41 6363 6570 742d 0:18092..Accept-
0x0110: 456e 636f 6469 6e67 3a20 677a 6970 2c20 Encoding:.gzip,.
0x0120: 6465 666c 6174 652c 2062 720d 0a43 6f6e deflate,.br..Con
0x0130: 7465 6e74 2d54 7970 653a 2061 7070 6c69 tent-Type:.appli
0x0140: 6361 7469 6f6e 2f78 2d77 7777 2d66 6f72 cation/x-www-for
0x0150: 6d2d 7572 6c65 6e63 6f64 6564 0d0a 436f m-urlencoded..Co
0x0160: 6e74 656e 742d 4c65 6e67 7468 3a20 3233 ntent-Length:.23
0x0170: 350d 0a0d 0a74 6172 6765 743d 4b45 5353 5....target=KESS
0x0180: 266d 6574 686f 644e 616d 653d 7175 6572 &methodName=quer
0x0190: 7943 7573 7442 6173 6963 496e 666f 4c69 yCustBasicInfoLi
0x01a0: 7374 2669 6e50 6172 616d 733d 2537 4225 st&inParams=%7B%
0x01b0: 3232 7265 7175 6573 7425 3232 2533 4125 22request%22%3A%
0x01c0: 3742 2532 3261 7574 6825 3232 2533 4125 7B%22auth%22%3A%
0x01d0: 3742 2532 326f 7065 7261 746f 7225 3232 7B%22operator%22
0x01e0: 2533 4138 3939 3030 3334 2537 4425 3243 %3A8990034%7D%2C
0x01f0: 2532 3264 6174 6125 3232 2533 4125 3742 %22data%22%3A%7B
0x0200: 2532 3255 5345 525f 524f 4c45 2532 3225 %22USER_ROLE%22%
0x0210: 3341 2532 3225 3232 2532 4325 3232 5553 3A%22%22%2C%22US
0x0220: 4552 5f43 4f44 4525 3232 2533 4125 3232 ER_CODE%22%3A%22
0x0230: 3831 3130 3635 3030 3036 3925 3232 2532 81106500069%22%2
0x0240: 4325 3232 4f50 5f54 5950 4525 3232 2533 C%22OP_TYPE%22%3
0x0250: 4125 3232 2532 3225 3744 2537 4425 3744 A%22%22%7D%7D%7D
12:40:33.668631 IP 172.16.247.230.18092 > 172.16.160.12.50966: Flags [P.], seq 1:1661, ack 542, win 235, options [nop,nop,TS val 94697532 ecr 15683831], length 1660
0x0000: 1051 72f1 3f48 0022 d914 1734 0800 4500 .Qr.?H."...4..E.
0x0010: 06b0 1e8d 4000 4006 25a7 ac10 f7e6 ac10 ....@.@.%.......
0x0020: a00c 46ac c716 3d9c 96b2 0bee 6423 8018 ..F...=.....d#..
0x0030: 00eb f6b6 0000 0101 080a 05a4 f83c 00ef .............<..
0x0040: 50f7 4854 5450 2f31 2e31 2032 3030 204f P.HTTP/1.1.200.O
0x0050: 4b0d 0a43 6f6e 7465 6e74 2d54 7970 653a K..Content-Type:
0x0060: 2061 7070 6c69 6361 7469 6f6e 2f6a 736f .application/jso
0x0070: 6e3b 6368 6172 7365 743d 5554 462d 380d n;charset=UTF-8.
0x0080: 0a43 6f6e 7465 6e74 2d4c 656e 6774 683a .Content-Length:
0x0090: 2031 3537 330d 0a0d 0a7b 2272 6573 756c .1573....{"resul
0x00a0: 7422 3a7b 2266 6c61 6722 3a22 3122 2c22 t":{"flag":"1","
0x00b0: 7072 6f6d 7074 223a 2273 7563 6365 7373 prompt":"success
0x00c0: 6675 6c22 2c22 6c65 6e67 7468 223a 2231 ful","length":"1
0x00d0: 227d 2c22 7265 636f 7264 223a 7b22 726f "},"record":{"ro
0x00e0: 7722 3a7b 2255 5345 525f 434f 4445 223a w":{"USER_CODE":
0x00f0: 2238 3131 3036 3530 3030 3639 222c 2255 "81106500069","U
0x0100: 5345 525f 524f 4c45 5322 3a22 3122 2c22 SER_ROLES":"1","
0x0110: 5553 4552 5f4e 414d 4522 3a22 e891 a3e5 USER_NAME":"....
0x0120: bbba e5b9 b322 2c22 5553 4552 5f54 5950 .....","USER_TYP
0x0130: 4522 3a22 3022 2c22 494e 545f 4f52 4722 E":"0","INT_ORG"
0x0140: 3a22 3831 3122 2c22 4944 5f54 5950 4522 :"811","ID_TYPE"
0x0150: 3a22 3030 222c 2249 445f 434f 4445 223a :"00","ID_CODE":
0x0160: 2236 3330 3130 3331 3936 3530 3831 3930 "630103196508190
0x0170: 3431 3522 2c22 4f50 454e 5f44 4154 4522 415","OPEN_DATE"
0x0180: 3a22 3230 3036 3132 3031 222c 2243 4c4f :"20061201","CLO
0x0190: 5345 5f44 4154 4522 3a22 3022 2c22 5553 SE_DATE":"0","US
0x01a0: 4552 5f46 4e41 4d45 223a 22e8 91a3 e5bb ER_FNAME":".....
0x01b0: bae5 b9b3 222c 2249 445f 4953 535f 4147 ....","ID_ISS_AG
0x01c0: 4359 223a 22e4 b8ad e59b bd22 2c22 4944 CY":"......","ID
0x01d0: 5f45 5850 5f44 4154 4522 3a22 3230 3231 _EXP_DATE":"2021
0x01e0: 3036 3037 222c 2249 445f 5a49 505f 434f 0607","ID_ZIP_CO
0x01f0: 4445 223a 2238 3130 3030 3122 2c22 4944 DE":"810001","ID
0x0200: 5f41 4444 5222 3a22 e4b9 9de5 b79e e8af _ADDR":"........
0x0210: 81e5 88b8 2022 2c22 5a49 505f 434f 4445 .....","ZIP_CODE
0x0220: 223a 2238 3130 3030 3122 2c22 4144 4452 ":"810001","ADDR
0x0230: 4553 5322 3a22 e4b9 9de5 b79e e8af 81e5 ESS":"..........
0x0240: 88b8 e4bb b0e5 b1b1 222c 2254 454c 223a ........","TEL":
0x0250: 5b5d 2c22 4641 5822 3a5b 5d2c 2245 4d41 [],"FAX":[],"EMA
0x0260: 494c 223a 5b5d 2c22 4d4f 4249 4c45 5f54 IL":[],"MOBILE_T
0x0270: 454c 223a 2231 3334 3131 3131 3232 3232 EL":"13411112222
0x0280: 222c 2243 4954 495a 454e 5348 4950 223a ","CITIZENSHIP":
0x0290: 2243 484e 222c 224e 4154 494f 4e41 4c49 "CHN","NATIONALI
0x02a0: 5459 223a 2230 3022 2c22 4544 5543 4154 TY":"00","EDUCAT
0x02b0: 494f 4e22 3a22 3322 2c22 4e41 5449 5645 ION":"3","NATIVE
0x02c0: 5f50 4c41 4345 223a 2230 222c 2253 4558 _PLACE":"0","SEX
0x02d0: 223a 2230 222c 2242 4952 5448 4441 5922 ":"0","BIRTHDAY"
0x02e0: 3a22 3139 3635 3038 3139 222c 2252 454d :"19650819","REM
0x02f0: 4152 4b22 3a5b 5d2c 224d 4152 5259 223a ARK":[],"MARRY":
0x0300: 2231 222c 2241 564f 4341 5449 4f4e 223a "1","AVOCATION":
0x0310: 5b5d 2c22 5645 4849 434c 4522 3a5b 5d2c [],"VEHICLE":[],
0x0320: 2248 4f55 5345 5f4f 574e 4552 223a 5b5d "HOUSE_OWNER":[]
0x0330: 2c22 4f46 4649 4345 5f54 454c 223a 5b5d ,"OFFICE_TEL":[]
0x0340: 2c22 5745 4c4c 5f54 454c 223a 5b5d 2c22 ,"WELL_TEL":[],"
0x0350: 4c49 4e4b 5445 4c5f 4f52 4445 5222 3a22 LINKTEL_ORDER":"
0x0360: 3122 2c22 4f46 4649 4345 5f41 4444 5222 1","OFFICE_ADDR"
0x0370: 3a5b 5d2c 2243 4f52 505f 4144 4452 223a :[],"CORP_ADDR":
0x0380: 22e4 b99d e5b7 9ee8 af81 e588 b8e4 bbb0 "...............
0x0390: e5b1 b122 2c22 4944 5f42 4547 5f44 4154 ...","ID_BEG_DAT
0x03a0: 4522 3a22 3230 3031 3036 3037 222c 224c E":"20010607","L
0x03b0: 494e 4b41 4444 525f 4f52 4445 5222 3a22 INKADDR_ORDER":"
0x03c0: 3122 2c22 4f50 454e 5f53 4f55 5243 4522 1","OPEN_SOURCE"
0x03d0: 3a22 3022 2c22 5445 4c5f 4348 4b5f 464c :"0","TEL_CHK_FL
0x03e0: 4147 223a 2230 222c 2245 4d41 494c 5f43 AG":"0","EMAIL_C
0x03f0: 484b 5f46 4c41 4722 3a5b 5d2c 2246 494e HK_FLAG":[],"FIN
0x0400: 5f45 4455 5f46 4c41 4722 3a5b 5d2c 2243 _EDU_FLAG":[],"C
0x0410: 5553 545f 5354 4154 5553 223a 2230 222c UST_STATUS":"0",
0x0420: 2253 5a4f 5247 5459 5045 223a 5b5d 2c22 "SZORGTYPE":[],"
0x0430: 4e41 5449 4f4e 414c 5f41 5454 5222 3a5b NATIONAL_ATTR":[
0x0440: 5d2c 224c 4953 5445 445f 4154 5452 223a ],"LISTED_ATTR":
0x0450: 5b5d 2c22 454e 5445 5250 5249 5345 5f4c [],"ENTERPRISE_L
0x0460: 4556 454c 223a 5b5d 2c22 4355 5354 5f54 EVEL":[],"CUST_T
0x0470: 5950 4522 3a22 3022 2c22 4348 414e 4e45 YPE":"0","CHANNE
0x0480: 4c53 223a 2233 3234 3039 4775 7a22 2c22 LS":"32409Guz","
0x0490: 4355 5354 5f43 4c53 223a 2261 222c 2243 CUST_CLS":"a","C
0x04a0: 5553 545f 4558 545f 4154 5452 223a 2230 UST_EXT_ATTR":"0
0x04b0: 222c 2253 5045 4349 414c 5f53 5441 5455 ","SPECIAL_STATU
0x04c0: 5322 3a22 3022 2c22 494e 4f55 5453 4944 S":"0","INOUTSID
0x04d0: 455f 4944 454e 5449 5459 223a 2230 222c E_IDENTITY":"0",
0x04e0: 2243 5249 5445 5249 4f4e 223a 227a 222c "CRITERION":"z",
0x04f0: 2241 474d 545f 5249 534b 5f46 4143 544f "AGMT_RISK_FACTO
0x0500: 5222 3a22 4d50 222c 224f 5045 4e5f 4143 R":"MP","OPEN_AC
0x0510: 4354 5f4d 4f4e 5448 223a 2230 222c 2249 CT_MONTH":"0","I
0x0520: 445f 4558 5022 3a22 3122 2c22 494e 434f D_EXP":"1","INCO
0x0530: 4d45 223a 2230 222c 2254 5241 4445 223a ME":"0","TRADE":
0x0540: 2232 222c 224f 4343 555f 5459 5045 223a "2","OCCU_TYPE":
0x0550: 2239 3922 2c22 4649 4e5f 4d41 4e41 4745 "99","FIN_MANAGE
0x0560: 525f 464c 4147 223a 5b5d 2c22 4649 4e5f R_FLAG":[],"FIN_
0x0570: 574f 524b 5f54 494d 4522 3a5b 5d2c 2246 WORK_TIME":[],"F
0x0580: 494e 5f51 5541 4c5f 4345 5254 223a 5b5d IN_QUAL_CERT":[]
0x0590: 2c22 414e 4e55 414c 5f49 4e43 4f4d 4522 ,"ANNUAL_INCOME"
0x05a0: 3a5b 5d2c 2241 4e4e 5541 4c5f 494e 434f :[],"ANNUAL_INCO
0x05b0: 4d45 5f43 484b 223a 5b5d 2c22 414e 5449 ME_CHK":[],"ANTI
0x05c0: 5f4d 4f4e 4559 4c41 554e 4452 595f 5459 _MONEYLAUNDRY_TY
0x05d0: 5045 223a 5b5d 2c22 564f 4341 5449 4f4e PE":[],"VOCATION
0x05e0: 223a 2242 222c 2246 4953 4c5f 454d 4149 ":"B","FISL_EMAI
0x05f0: 4c22 3a5b 5d2c 224f 5054 5f45 4d41 494c L":[],"OPT_EMAIL
0x0600: 223a 5b5d 2c22 574f 524b 504c 4143 4522 ":[],"WORKPLACE"
0x0610: 3a5b 5d2c 2250 5241 4354 4943 455f 4441 :[],"PRACTICE_DA
0x0620: 5445 223a 2230 2020 2020 2020 2022 2c22 TE":"0.......","
0x0630: 494e 4455 535f 5459 5045 223a 2239 3922 INDUS_TYPE":"99"
0x0640: 2c22 4253 425f 5553 4552 5f46 4e41 4d45 ,"BSB_USER_FNAME
0x0650: 223a 5b5d 2c22 4253 425f 4944 5f54 5950 ":[],"BSB_ID_TYP
0x0660: 4522 3a5b 5d2c 2242 5342 5f49 445f 434f E":[],"BSB_ID_CO
0x0670: 4445 223a 5b5d 2c22 4253 425f 4944 5f45 DE":[],"BSB_ID_E
0x0680: 5850 5f44 4154 4522 3a22 3022 2c22 4944 XP_DATE":"0","ID
0x0690: 5f54 5950 4532 223a 5b5d 2c22 4944 5f43 _TYPE2":[],"ID_C
0x06a0: 4f44 4532 223a 5b5d 2c22 4944 5f45 5850 ODE2":[],"ID_EXP
0x06b0: 5f44 4154 4532 223a 2230 227d 7d7d _DATE2":"0"}}}
12:40:33.678422 IP 172.16.247.230.18092 > 172.16.160.12.50966: Flags [P.], seq 1449:1661, ack 542, win 235, options [nop,nop,TS val 94697542 ecr 15683831], length 212
0x0000: 1051 72f1 3f48 0022 d914 1734 0800 4500 .Qr.?H."...4..E.
0x0010: 0108 1e8f 4000 4006 2b4d ac10 f7e6 ac10 ....@.@.+M......
0x0020: a00c 46ac c716 3d9c 9c5a 0bee 6423 8018 ..F...=..Z..d#..
0x0030: 00eb f10e 0000 0101 080a 05a4 f846 00ef .............F..
0x0040: 50f7 4c5f 454d 4149 4c22 3a5b 5d2c 224f P.L_EMAIL":[],"O
0x0050: 5054 5f45 4d41 494c 223a 5b5d 2c22 574f PT_EMAIL":[],"WO
0x0060: 524b 504c 4143 4522 3a5b 5d2c 2250 5241 RKPLACE":[],"PRA
0x0070: 4354 4943 455f 4441 5445 223a 2230 2020 CTICE_DATE":"0..
0x0080: 2020 2020 2022 2c22 494e 4455 535f 5459 .....","INDUS_TY
0x0090: 5045 223a 2239 3922 2c22 4253 425f 5553 PE":"99","BSB_US
0x00a0: 4552 5f46 4e41 4d45 223a 5b5d 2c22 4253 ER_FNAME":[],"BS
0x00b0: 425f 4944 5f54 5950 4522 3a5b 5d2c 2242 B_ID_TYPE":[],"B
0x00c0: 5342 5f49 445f 434f 4445 223a 5b5d 2c22 SB_ID_CODE":[],"
0x00d0: 4253 425f 4944 5f45 5850 5f44 4154 4522 BSB_ID_EXP_DATE"
0x00e0: 3a22 3022 2c22 4944 5f54 5950 4532 223a :"0","ID_TYPE2":
0x00f0: 5b5d 2c22 4944 5f43 4f44 4532 223a 5b5d [],"ID_CODE2":[]
0x0100: 2c22 4944 5f45 5850 5f44 4154 4532 223a ,"ID_EXP_DATE2":
0x0110: 2230 227d 7d7d "0"}}}

2、排除ipv4和ipv6双栈问题

只抓ipv4包

1
tcpdump -nv -i eth0 '((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0' and host 172.16.160.12 and host 172.16.247.230 and port 18092  -XX

只抓ipv6包

2、 strace查看系统调用

1
2
3
4
5
6
7
8
9
$ jps -ml
23649 kess-hr-proxy-test-1.0.jar
23586 kess-hr-proxy-dev-1.0.jar
86133 sun.tools.jps.Jps -ml
23126 kess-hr-proxy-uat-1.0.jar
23695 kess-hr-proxy-sim-1.0.jar

$ strace -t -fp 23126 > aaa.log 2>&1
$ vi aaa.log

正常访问如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
275 [pid 23135] 13:02:24 futex(0x7f87240f0854, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=96308, tv_nsec=293387689}, 0xffffffff <unfinished ...>
276 [pid 23128] 13:02:24 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
277 [pid 23128] 13:02:24 futex(0x7f8724094e28, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
278 [pid 23147] 13:02:24 <... epoll_wait resumed>[{EPOLLIN, {u32=34, u64=34}}], 4096, -1) = 1
279 [pid 23128] 13:02:24 <... futex resumed>) = 0
280 [pid 23147] 13:02:24 accept4(34, <unfinished ...>
281 [pid 23128] 13:02:24 futex(0x7f8724094e54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=96309, tv_nsec=287104147}, 0xffffffff <unfinished ...>
282 [pid 23147] 13:02:24 <... accept4 resumed>{sa_family=AF_INET6, sin6_port=htons(55220), inet_pton(AF_INET6, "::ffff:172.16.160.12", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [128->28], SOCK_CLOEXEC|SOCK_NONBLOCK) = 203
283 [pid 23147] 13:02:24 getsockname(203, {sa_family=AF_INET6, sin6_port=htons(18092), inet_pton(AF_INET6, "::ffff:172.16.247.230", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [1 28->28]) = 0
284 [pid 23147] 13:02:24 getsockname(203, {sa_family=AF_INET6, sin6_port=htons(18092), inet_pton(AF_INET6, "::ffff:172.16.247.230", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [1 28->28]) = 0
285 [pid 23147] 13:02:24 setsockopt(203, SOL_TCP, TCP_NODELAY, [1], 4) = 0
286 [pid 23147] 13:02:24 getsockopt(203, SOL_SOCKET, SO_SNDBUF, [87040], [4]) = 0
287 [pid 23147] 13:02:24 getsockopt(203, SOL_SOCKET, SO_SNDBUF, [87040], [4]) = 0
288 [pid 23147] 13:02:24 setsockopt(203, SOL_TCP, TCP_NODELAY, [1], 4) = 0
289 [pid 23147] 13:02:24 write(29, "\1\0\0\0\0\0\0\0", 8 <unfinished ...>
290 [pid 23167] 13:02:24 <... epoll_wait resumed>[{EPOLLIN, {u32=29, u64=29}}], 4096, -1) = 1
291 [pid 23147] 13:02:24 <... write resumed>) = 8
292 [pid 23167] 13:02:24 epoll_ctl(28, EPOLL_CTL_ADD, 203, {EPOLLRDHUP|EPOLLET, {u32=203, u64=203}} <unfinished ...>
293 [pid 23147] 13:02:24 accept4(34, <unfinished ...>
294 [pid 23167] 13:02:24 <... epoll_ctl resumed>) = 0
295 [pid 23147] 13:02:24 <... accept4 resumed>0x7f8714bfa220, [128], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
296 [pid 23167] 13:02:24 epoll_ctl(28, EPOLL_CTL_MOD, 203, {EPOLLIN|EPOLLRDHUP|EPOLLET, {u32=203, u64=203}} <unfinished ...>
297 [pid 23147] 13:02:24 epoll_wait(21, <unfinished ...>
298 [pid 23167] 13:02:24 <... epoll_ctl resumed>) = 0
299 [pid 23167] 13:02:24 epoll_wait(28, <unfinished ...>
300 [pid 23135] 13:02:24 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
301 [pid 23135] 13:02:24 futex(0x7f87240f0828, FUTEX_WAKE_PRIVATE, 1) = 0
302 [pid 23135] 13:02:24 futex(0x7f87240f0854, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=96308, tv_nsec=343817671}, 0xffffffff <unfinished ...>
303 [pid 23167] 13:02:24 <... epoll_wait resumed>[{EPOLLIN, {u32=203, u64=203}}], 4096, -1) = 1
304 [pid 23167] 13:02:24 read(203, "POST /WebServiceProxy/ProxyServi"..., 1024) = 542
305 [pid 23167] 13:02:24 write(1, "2020-09-02 13:02:24.464 INFO 23"..., 288) = 288
306 [pid 23167] 13:02:24 write(7, "2020-09-02 13:02:24.464 INFO 23"..., 288) = 288
307 [pid 23167] 13:02:24 read(12, "+\245\216\376\374/\306\30\214 \335\250\332\217Q\206\22\v9\30c\366\212(+\322]\311\322\0224\236", 32) = 32
308 [pid 23167] 13:02:24 socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 204
309 [pid 23167] 13:02:24 setsockopt(204, SOL_IPV6, IPV6_V6ONLY, [0], 4) = 0
310 [pid 23167] 13:02:24 fcntl(204, F_GETFL) = 0x2 (flags O_RDWR)
311 [pid 23167] 13:02:24 fcntl(204, F_SETFL, O_RDWR|O_NONBLOCK) = 0
312 [pid 23167] 13:02:24 connect(204, {sa_family=AF_INET6, sin6_port=htons(8080), inet_pton(AF_INET6, "::ffff:10.198.197.71", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 EINPROGRESS (Operation now in progress)
313 [pid 23167] 13:02:24 poll([{fd=204, events=POLLOUT}], 1, 600000 <unfinished ...>
314 [pid 23135] 13:02:24 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
315 [pid 23135] 13:02:24 futex(0x7f87240f0828, FUTEX_WAKE_PRIVATE, 1) = 0
316 [pid 23135] 13:02:24 futex(0x7f87240f0854, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=96308, tv_nsec=394348286}, 0xffffffff <unfinished ...>
317 [pid 23167] 13:02:24 <... poll resumed>) = 1 ([{fd=204, revents=POLLOUT}])
318 [pid 23167] 13:02:24 getsockopt(204, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
319 [pid 23167] 13:02:24 fcntl(204, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
320 [pid 23167] 13:02:24 fcntl(204, F_SETFL, O_RDWR) = 0
321 [pid 23167] 13:02:24 getsockname(204, {sa_family=AF_INET6, sin6_port=htons(33814), inet_pton(AF_INET6, "::ffff:172.16.247.230", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [2 8]) = 0
322 [pid 23167] 13:02:24 sendto(204, "POST /kess/services/KessService?"..., 1104, 0, NULL, 0) = 1104
323 [pid 23167] 13:02:24 poll([{fd=204, events=POLLIN|POLLERR}], 1, 600000 <unfinished ...>
324 [pid 23135] 13:02:24 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
325 [pid 23135] 13:02:24 futex(0x7f87240f0828, FUTEX_WAKE_PRIVATE, 1) = 0
326 [pid 23135] 13:02:24 futex(0x7f87240f0854, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=96308, tv_nsec=444789879}, 0xffffffff <unfinished ...>
327 [pid 23167] 13:02:24 <... poll resumed>) = 1 ([{fd=204, revents=POLLIN}])
328 [pid 23167] 13:02:24 recvfrom(204, "HTTP/1.1 200 OK\r\nServer: Apache-"..., 8192, 0, NULL, NULL) = 1368
329 [pid 23167] 13:02:24 ioctl(204, FIONREAD, [0]) = 0
330 [pid 23167] 13:02:24 poll([{fd=204, events=POLLIN|POLLERR}], 1, 600000) = 1 ([{fd=204, revents=POLLIN}])
331 [pid 23167] 13:02:24 recvfrom(204, "0001&lt;/ID_ZIP_CODE&gt;&lt;ID_A"..., 8192, 0, NULL, NULL) = 1368
332 [pid 23167] 13:02:24 ioctl(204, FIONREAD, [0]) = 0
333 [pid 23167] 13:02:24 poll([{fd=204, events=POLLIN|POLLERR}], 1, 600000) = 1 ([{fd=204, revents=POLLIN}])
334 [pid 23167] 13:02:24 recvfrom(204, ";&lt;ENTERPRISE_LEVEL&gt;&lt;/EN"..., 8192, 0, NULL, NULL) = 1368
335 [pid 23167] 13:02:24 ioctl(204, FIONREAD, [139]) = 0
336 [pid 23167] 13:02:24 poll([{fd=204, events=POLLIN|POLLERR}], 1, 600000) = 1 ([{fd=204, revents=POLLIN}])
337 [pid 23167] 13:02:24 recvfrom(204, ";/row&gt;&lt;/record&gt;&lt;/res"..., 8192, 0, NULL, NULL) = 139
338 [pid 23167] 13:02:24 ioctl(204, FIONREAD, [0]) = 0
339 [pid 23167] 13:02:24 poll([{fd=204, events=POLLIN|POLLERR}], 1, 600000) = 1 ([{fd=204, revents=POLLIN}])
340 [pid 23167] 13:02:24 recvfrom(204, "", 8192, 0, NULL, NULL) = 0
341 [pid 23167] 13:02:24 dup2(36, 204) = 204
342 [pid 23167] 13:02:24 close(204) = 0
343 [pid 23167] 13:02:24 write(1, "2020-09-02 13:02:24.571 INFO 23"..., 127) = 127
344 [pid 23167] 13:02:24 write(7, "2020-09-02 13:02:24.571 INFO 23"..., 127) = 127
345 [pid 23167] 13:02:24 write(1, "2020-09-02 13:02:24.574 INFO 23"..., 1743) = 1743
346 [pid 23167] 13:02:24 write(7, "2020-09-02 13:02:24.574 INFO 23"..., 1743) = 1743
347 [pid 23167] 13:02:24 writev(203, [{iov_base="HTTP/1.1 200 OK\r\nContent-Type: a"..., iov_len=87}, {iov_base="{\"result\":{\"flag\":\"1\",\"prompt\":\""..., iov_len=1573}], 2) = 1660
348 [pid 23167] 13:02:24 epoll_wait(28, <unfinished ...>
349 [pid 23135] 13:02:24 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
350 [pid 23135] 13:02:24 futex(0x7f87240f0828, FUTEX_WAKE_PRIVATE, 1) = 0
351 [pid 23135] 13:02:24 futex(0x7f87240f0854, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=96308, tv_nsec=495260383}, 0xffffffff <unfinished ...>
352 [pid 23167] 13:02:24 <... epoll_wait resumed>[{EPOLLIN|EPOLLRDHUP, {u32=203, u64=203}}], 4096, -1) = 1
353 [pid 23167] 13:02:24 read(203, "", 1024) = 0
354 [pid 23167] 13:02:24 getsockopt(203, SOL_SOCKET, SO_LINGER, {l_onoff=0, l_linger=0}, [8]) = 0
355 [pid 23167] 13:02:24 epoll_ctl(28, EPOLL_CTL_DEL, 203, 0x7f86ea5eac90) = 0
356 [pid 23167] 13:02:24 close(203) = 0
357 [pid 23167] 13:02:24 epoll_wait(28, <unfinished ...>
358 [pid 23135] 13:02:24 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
359 [pid 23135] 13:02:24 futex(0x7f87240f0828, FUTEX_WAKE_PRIVATE, 1) = 0
360 [pid 23135] 13:02:24 futex(0x7f87240f0854, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=96308, tv_nsec=545701725}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
361 [pid 23135] 13:02:24 futex(0x7f87240f0828, FUTEX_WAKE_PRIVATE, 1) = 0

访问超时调用如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
817 [pid 23147] 13:02:32 <... epoll_wait resumed>[{EPOLLIN, {u32=34, u64=34}}], 4096, -1) = 1
818 [pid 23147] 13:02:32 accept4(34, {sa_family=AF_INET6, sin6_port=htons(55235), inet_pton(AF_INET6, "::ffff:172.16.160.12", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [128->28 ], SOCK_CLOEXEC|SOCK_NONBLOCK) = 147
819 [pid 23147] 13:02:32 getsockname(147, {sa_family=AF_INET6, sin6_port=htons(18092), inet_pton(AF_INET6, "::ffff:172.16.247.230", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [1 28->28]) = 0
820 [pid 23147] 13:02:32 getsockname(147, {sa_family=AF_INET6, sin6_port=htons(18092), inet_pton(AF_INET6, "::ffff:172.16.247.230", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [1 28->28]) = 0
821 [pid 23147] 13:02:32 setsockopt(147, SOL_TCP, TCP_NODELAY, [1], 4) = 0
822 [pid 23147] 13:02:32 getsockopt(147, SOL_SOCKET, SO_SNDBUF, [87040], [4]) = 0
823 [pid 23147] 13:02:32 getsockopt(147, SOL_SOCKET, SO_SNDBUF, [87040], [4]) = 0
824 [pid 23147] 13:02:32 setsockopt(147, SOL_TCP, TCP_NODELAY, [1], 4) = 0
825 [pid 23147] 13:02:32 accept4(34, 0x7f8714bfa220, [128], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
826 [pid 23147] 13:02:32 epoll_wait(21, <unfinished ...>
827 [pid 23135] 13:02:32 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
828 [pid 23135] 13:02:32 futex(0x7f87240f0828, FUTEX_WAKE_PRIVATE, 1) = 0
829 [pid 23135] 13:02:32 futex(0x7f87240f0854, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=96316, tv_nsec=820377335}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)

涉及线程

23135、23147、23167

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
$ printf '%x %x %x \n' 23135 23147 23167
5a5f 5a6b 5a7f
$ jstack 23126 | grep -A 50 5a5f
"VM Periodic Task Thread" os_prio=0 tid=0x00007f87240ef800 nid=0x5a5f waiting on condition

JNI global references: 1555

$ jstack 23126 | grep -A 50 5a6b
"reactor-http-epoll-1" #11 daemon prio=5 os_prio=0 tid=0x00007f87251a0800 nid=0x5a6b runnable [0x00007f8714bfa000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.epoll.Native.epollWait(Native Method)
at io.netty.channel.epoll.Native.epollWait(Native.java:148)
at io.netty.channel.epoll.Native.epollWait(Native.java:141)
at io.netty.channel.epoll.EpollEventLoop.epollWaitNoTimerChange(EpollEventLoop.java:290)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:347)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)

$ jstack 23126 | grep -A 50 5a7f
"reactor-http-epoll-3" #16 daemon prio=5 os_prio=0 tid=0x00007f86e4555800 nid=0x5a7f runnable [0x00007f86ea5eb000]
java.lang.Thread.State: RUNNABLE
at io.netty.channel.epoll.Native.epollWait(Native Method)
at io.netty.channel.epoll.Native.epollWait(Native.java:148)
at io.netty.channel.epoll.Native.epollWait(Native.java:141)
at io.netty.channel.epoll.EpollEventLoop.epollWaitNoTimerChange(EpollEventLoop.java:290)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:347)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)

EAGAIN (Resource temporarily unavailable)

1
accept4(34, 0x7f8714bfa220, [128], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)

在linux进行非阻塞的socket接收数据时经常出现Resource temporarily unavailable,errno代码为11(EAGAIN),这是什么意思?

这表明你在非阻塞模式下调用了阻塞操作,在该操作没有完成就返回这个错误,这个错误不会破坏socket的同步,不用管它,下次循环接着recv就可以。对非阻塞socket而言,EAGAIN不是一种错误。在VxWorks和Windows上,EAGAIN的名字叫做EWOULDBLOCK。

另外,如果出现EINTR即errno为4,错误描述Interrupted system call,操作也应该继续。

最后,如果recv的返回值为0,那表明连接已经断开,我们的接收操作也应该结束。

1
2
strace -t -fp 23126 2>&1 | egrep 'accept|poll|read|write|connect|send|recv|ctl|sock'
strace -t -fp 23126 2>&1 | grep -v 'futex'

正常

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
[root@appweb-docker-002 ~]# strace -t -fp 23126 2>&1 | grep -v 'futex'
strace: Process 23126 attached with 28 threads
[pid 87892] 10:37:34 epoll_wait(77, <unfinished ...>
[pid 80941] 10:37:34 restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 80912] 10:37:34 epoll_wait(84, <unfinished ...>
[pid 80911] 10:37:34 epoll_wait(71, <unfinished ...>
[pid 23167] 10:37:34 epoll_wait(28, <unfinished ...>
[pid 23164] 10:37:34 epoll_wait(25, <unfinished ...>
[pid 23147] 10:37:34 epoll_wait(21, <unfinished ...>
[pid 23169] 10:37:34 epoll_wait(31, <unfinished ...>
[pid 80928] 10:37:34 epoll_wait(74, <unfinished ...>
[pid 79173] 10:37:34 accept(40, <unfinished ...>
[pid 23135] 10:37:34 <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 23128] 10:37:35 <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 23132] 10:37:36 <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 23133] 10:37:36 <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 23147] 10:37:47 <... epoll_wait resumed>[{EPOLLIN, {u32=34, u64=34}}], 4096, -1) = 1
[pid 23147] 10:37:47 accept4(34, {sa_family=AF_INET6, sin6_port=htons(58662), inet_pton(AF_INET6, "::ffff:172.16.208.224", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [128->28], SOCK_CLOEXEC|SOCK_NONBLOCK) = 39
[pid 23147] 10:37:47 getsockname(39, {sa_family=AF_INET6, sin6_port=htons(18092), inet_pton(AF_INET6, "::ffff:172.16.247.230", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [128->28]) = 0
[pid 23147] 10:37:47 getsockname(39, {sa_family=AF_INET6, sin6_port=htons(18092), inet_pton(AF_INET6, "::ffff:172.16.247.230", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [128->28]) = 0
[pid 23147] 10:37:47 setsockopt(39, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 23147] 10:37:47 getsockopt(39, SOL_SOCKET, SO_SNDBUF, [87040], [4]) = 0
[pid 23147] 10:37:47 getsockopt(39, SOL_SOCKET, SO_SNDBUF, [87040], [4]) = 0
[pid 23147] 10:37:47 setsockopt(39, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 23147] 10:37:47 write(26, "\1\0\0\0\0\0\0\0", 8 <unfinished ...>
[pid 23164] 10:37:47 <... epoll_wait resumed>[{EPOLLIN, {u32=26, u64=26}}], 4096, -1) = 1
[pid 23147] 10:37:47 <... write resumed>) = 8
[pid 23164] 10:37:47 epoll_ctl(25, EPOLL_CTL_ADD, 39, {EPOLLRDHUP|EPOLLET, {u32=39, u64=39}} <unfinished ...>
[pid 23147] 10:37:47 accept4(34, <unfinished ...>
[pid 23164] 10:37:47 <... epoll_ctl resumed>) = 0
[pid 23147] 10:37:47 <... accept4 resumed>0x7f8714bfa220, [128], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
[pid 23164] 10:37:47 epoll_ctl(25, EPOLL_CTL_MOD, 39, {EPOLLIN|EPOLLRDHUP|EPOLLET, {u32=39, u64=39}} <unfinished ...>
[pid 23147] 10:37:47 epoll_wait(21, <unfinished ...>
[pid 23164] 10:37:47 <... epoll_ctl resumed>) = 0
[pid 23164] 10:37:47 epoll_wait(25, [{EPOLLIN, {u32=39, u64=39}}], 4096, -1) = 1
[pid 23164] 10:37:47 read(39, "POST /WebServiceProxy/ProxyServi"..., 1024) = 542
[pid 23164] 10:37:47 write(1, "2020-09-03 10:37:47.572 INFO 23"..., 288) = 288
[pid 23164] 10:37:47 write(7, "2020-09-03 10:37:47.572 INFO 23"..., 288) = 288
[pid 23164] 10:37:47 read(12, "\216\245\275\331\315\27g8\327BW\17\202;\365E\20X`9\17\261<\270\227\347\0167\2736\222a", 32) = 32
[pid 23164] 10:37:47 socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 41
[pid 23164] 10:37:47 setsockopt(41, SOL_IPV6, IPV6_V6ONLY, [0], 4) = 0
[pid 23164] 10:37:47 fcntl(41, F_GETFL) = 0x2 (flags O_RDWR)
[pid 23164] 10:37:47 fcntl(41, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 23164] 10:37:47 connect(41, {sa_family=AF_INET6, sin6_port=htons(8080), inet_pton(AF_INET6, "::ffff:10.198.197.71", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 EINPROGRESS (Operation now in progress)
[pid 23164] 10:37:47 poll([{fd=41, events=POLLOUT}], 1, 600000 <unfinished ...>
[pid 23164] 10:37:47 <... poll resumed>) = 1 ([{fd=41, revents=POLLOUT}])
[pid 23164] 10:37:47 getsockopt(41, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 23164] 10:37:47 fcntl(41, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
[pid 23164] 10:37:47 fcntl(41, F_SETFL, O_RDWR) = 0
[pid 23164] 10:37:47 getsockname(41, {sa_family=AF_INET6, sin6_port=htons(35318), inet_pton(AF_INET6, "::ffff:172.16.247.230", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, [28]) = 0
[pid 23164] 10:37:47 sendto(41, "POST /kess/services/KessService?"..., 1104, 0, NULL, 0) = 1104
[pid 23164] 10:37:47 poll([{fd=41, events=POLLIN|POLLERR}], 1, 600000 <unfinished ...>
[pid 23164] 10:37:47 <... poll resumed>) = 1 ([{fd=41, revents=POLLIN}])
[pid 23164] 10:37:47 recvfrom(41, <unfinished ...>
[pid 23164] 10:37:47 <... recvfrom resumed>"HTTP/1.1 200 OK\r\nServer: Apache-"..., 8192, 0, NULL, NULL) = 1368
[pid 23164] 10:37:47 read(12, "\320\22c_\334-Q\222\237\214\202\267\300\2314\315\0_f\211\273\231\340\2571\362?\23\312T\214\323", 32) = 32
[pid 23164] 10:37:47 ioctl(41, FIONREAD, [1368]) = 0
[pid 23164] 10:37:47 poll([{fd=41, events=POLLIN|POLLERR}], 1, 600000) = 1 ([{fd=41, revents=POLLIN}])
[pid 23164] 10:37:47 recvfrom(41, "0001&lt;/ID_ZIP_CODE&gt;&lt;ID_A"..., 8192, 0, NULL, NULL) = 1368
[pid 23164] 10:37:47 ioctl(41, FIONREAD, [1507]) = 0
[pid 23164] 10:37:47 poll([{fd=41, events=POLLIN|POLLERR}], 1, 600000) = 1 ([{fd=41, revents=POLLIN}])
[pid 23164] 10:37:47 recvfrom(41, ";&lt;ENTERPRISE_LEVEL&gt;&lt;/EN"..., 8192, 0, NULL, NULL) = 1507
[pid 23164] 10:37:47 ioctl(41, FIONREAD, [0]) = 0
[pid 23164] 10:37:47 poll([{fd=41, events=POLLIN|POLLERR}], 1, 600000) = 1 ([{fd=41, revents=POLLIN}])
[pid 23164] 10:37:47 recvfrom(41, "", 8192, 0, NULL, NULL) = 0
[pid 23164] 10:37:47 dup2(36, 41) = 41
[pid 23164] 10:37:47 close(41) = 0
[pid 23164] 10:37:47 write(1, "2020-09-03 10:37:47.686 INFO 23"..., 127) = 127
[pid 23164] 10:37:47 write(7, "2020-09-03 10:37:47.686 INFO 23"..., 127) = 127
[pid 23164] 10:37:47 write(1, "2020-09-03 10:37:47.688 INFO 23"..., 1743) = 1743
[pid 23164] 10:37:47 write(7, "2020-09-03 10:37:47.688 INFO 23"..., 1743) = 1743
[pid 23164] 10:37:47 writev(39, [{iov_base="HTTP/1.1 200 OK\r\nContent-Type: a"..., iov_len=87}, {iov_base="{\"result\":{\"flag\":\"1\",\"prompt\":\""..., iov_len=1573}], 2) = 1660
[pid 23164] 10:37:47 epoll_wait(25, [{EPOLLIN|EPOLLRDHUP, {u32=39, u64=39}}], 4096, -1) = 1
[pid 23164] 10:37:47 read(39, "", 1024) = 0
[pid 23164] 10:37:47 getsockopt(39, SOL_SOCKET, SO_LINGER, {l_onoff=0, l_linger=0}, [8]) = 0
[pid 23164] 10:37:47 epoll_ctl(25, EPOLL_CTL_DEL, 39, 0x7f86ebaecd10) = 0
[pid 23164] 10:37:47 close(39) = 0
[pid 23164] 10:37:47 epoll_wait(25, <unfinished ...>
[pid 23128] 10:37:48 mprotect(0x7f872bd70000, 4096, PROT_READ) = 0
[pid 23128] 10:37:48 mprotect(0x7f872bd70000, 4096, PROT_READ|PROT_WRITE) = 0
[pid 23128] 10:37:48 mprotect(0x7f872bd71000, 4096, PROT_NONE) = 0
[pid 23128] 10:37:48 mprotect(0x7f872bd71000, 4096, PROT_READ) = 0
[pid 80913] 10:38:21 <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 80914] 10:38:21 <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out)

基础知识扫盲

strace命令

strace命令是一个集诊断、调试、统计与一体的工具,我们可以使用strace对应用的系统调用和信号传递的跟踪结果来对应用进行分析,以达到解决问题或者是了解应用工作过程的目的。当然strace与专业的调试工具比如说gdb之类的是没法相比的,因为它不是一个专业的调试器。

strace的最简单的用法就是执行一个指定的命令,在指定的命令结束之后它也就退出了。在命令执行的过程中,strace会记录和解析命令进程的所有系统调用以及这个进程所接收到的所有的信号值。

语法

1
2
3
4
5
6
7
strace  [  -dffhiqrtttTvxx  ] [ -acolumn ] [ -eexpr ] ...
[ -ofile ] [-ppid ] ... [ -sstrsize ] [ -uusername ]
[ -Evar=val ] ... [ -Evar ]...
[ command [ arg ... ] ]

strace -c [ -eexpr ] ... [ -Ooverhead ] [ -Ssortby ]
[ command [ arg... ] ]

选项

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-c 统计每一系统调用的所执行的时间,次数和出错的次数等.
-d 输出strace关于标准错误的调试信息.
-f 跟踪由fork调用所产生的子进程.
-ff 如果提供-o filename,则所有进程的跟踪结果输出到相应的filename.pid中,pid是各进程的进程号.
-F 尝试跟踪vfork调用.在-f时,vfork不被跟踪.
-h 输出简要的帮助信息.
-i 输出系统调用的入口指针.
-q 禁止输出关于脱离的消息.
-r 打印出相对时间关于,,每一个系统调用.
-t 在输出中的每一行前加上时间信息.
-tt 在输出中的每一行前加上时间信息,微秒级.
-ttt 微秒级输出,以秒了表示时间.
-T 显示每一调用所耗的时间.
-v 输出所有的系统调用.一些调用关于环境变量,状态,输入输出等调用由于使用频繁,默认不输出.
-V 输出strace的版本信息.
-x 以十六进制形式输出非标准字符串
-xx 所有字符串以十六进制形式输出.
-a column 设置返回值的输出位置.默认 为40.
-e expr 指定一个表达式,用来控制如何跟踪.格式:[qualifier=][!]value1[,value2]...
qualifier只能是 trace,abbrev,verbose,raw,signal,read,write其中之一.value是用来限定的符号或数字.默认的 qualifier是 trace.感叹号是否定符号.例如:-eopen等价于 -e trace=open,表示只跟踪open调用.而-etrace!=open 表示跟踪除了open以外的其他调用.有两个特殊的符号 all 和 none. 注意有些shell使用!来执行历史记录里的命令,所以要使用\\.
-e trace=set 只跟踪指定的系统 调用.例如:-e trace=open,close,rean,write表示只跟踪这四个系统调用.默认的为set=all.
-e trace=file 只跟踪有关文件操作的系统调用.
-e trace=process 只跟踪有关进程控制的系统调用.
-e trace=network 跟踪与网络有关的所有系统调用.
-e strace=signal 跟踪所有与系统信号有关的 系统调用
-e trace=ipc 跟踪所有与进程通讯有关的系统调用
-e abbrev=set 设定strace输出的系统调用的结果集.-v 等与 abbrev=none.默认为abbrev=all.
-e raw=set 将指定的系统调用的参数以十六进制显示.
-e signal=set 指定跟踪的系统信号.默认为all.如 signal=!SIGIO(或者signal=!io),表示不跟踪SIGIO信号.
-e read=set 输出从指定文件中读出 的数据.例如: -e read=3,5
-e write=set 输出写入到指定文件中的数据.
-o filename 将strace的输出写入文件filename
-p pid 跟踪指定的进程pid.
-s strsize 指定输出的字符串的最大长度.默认为32.文件名一直全部输出.
-u username 以username的UID和GID执行被跟踪的命令

epoll使用详解

EPOLL 的API用来执行类似poll()的任务。能够用于检测在多个文件描述符中任何IO可用的情况。Epoll API可以用于边缘触发(edge-triggered)和水平触发(level-triggered), 同时epoll可以检测更多的文件描述符。以下的系统调用函数提供了创建和管理epoll实例:

  • epoll_create() 可以创建一个epoll实例并返回相应的文件描述符(epoll_create1() 扩展了epoll_create() 的功能)。
  • 注册相关的文件描述符使用epoll_ctl()
  • epoll_wait() 可以用于等待IO事件。如果当前没有可用的事件,这个函数会阻塞调用线程。

epoll_create 创建epoll

1
2
int epoll_create(int size);
int epoll_create1(int flags);

创建一个epoll的句柄,size用来告诉内核这个监听的数目一共有多大。这个参数不同于select()中的第一个参数,给出最大监听的fd+1的值。需要注意的是,当创建好epoll句柄后,它就是会占用一个fd值,在linux下如果查看/proc/进程id/fd/,是能够看到这个fd的,所以在使用完epoll后,必须调用close()关闭,否则可能导致fd被耗尽。

epoll_ctl 设置epoll事件

1
2
3
#include <sys/epoll.h>

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

这个系统调用能够控制给定的文件描述符epfd\指向的epoll实例,op\是添加事件的类型,fd\是目标文件描述符。

有效的op值有以下几种:

  • EPOLL_CTL_ADD 在epfd\中注册指定的fd文件描述符并能把event\fd\关联起来。
  • EPOLL_CTL_MOD 改变* fd*和*evetn*之间的联系。
  • EPOLL_CTL_DEL 从指定的epfd\中删除fd\文件描述符。在这种模式中event\是被忽略的,并且为可以等于NULL。

event\这个参数是用于关联制定的fd\文件描述符的。它的定义如下:

1
2
3
4
5
6
7
8
9
10
11
typedef union epoll_data {
void *ptr;
int fd;
__uint32_t u32;
__uint64_t u64;
} epoll_data_t;

struct epoll_event {
__uint32_t events; /* Epoll events */
epoll_data_t data; /* User data variable */
};

events\这个参数是一个字节的掩码构成的。下面是可以用的事件:

  • EPOLLIN - 当关联的文件可以执行 read ()操作时。
  • EPOLLOUT - 当关联的文件可以执行 write ()操作时。
  • EPOLLRDHUP - (从 linux 2.6.17 开始)当socket关闭的时候,或者半关闭写段的(当使用边缘触发的时候,这个标识在写一些测试代码去检测关闭的时候特别好用)
  • EPOLLPRI - 当 read ()能够读取紧急数据的时候。
  • EPOLLERR - 当关联的文件发生错误的时候,epoll_wait() 总是会等待这个事件,并不是需要必须设置的标识。
  • EPOLLHUP - 当指定的文件描述符被挂起的时候。epoll_wait() 总是会等待这个事件,并不是需要必须设置的标识。当socket从某一个地方读取数据的时候(管道或者socket),这个事件只是标识出这个已经读取到最后了(EOF)。所有的有效数据已经被读取完毕了,之后任何的读取都会返回0(EOF)。
  • EPOLLET - 设置指定的文件描述符模式为边缘触发,默认的模式是水平触发。
  • EPOLLONESHOT - (从 linux 2.6.17 开始)设置指定文件描述符为单次模式。这意味着,在设置后只会有一次从epoll_wait() 中捕获到事件,之后你必须要重新调用 epoll_ctl() 重新设置。

返回值:\如果成功,返回0。如果失败,会返回-1, errno\将会被设置

有以下几种错误:

  • EBADF - epfd\ 或者 fd\ 是无效的文件描述符。
  • EEXIST - op\是EPOLL_CTL_ADD,同时 fd\ 在之前,已经被注册到epoll中了。
  • EINVAL - epfd\不是一个epoll描述符。或者fd\epfd\相同,或者op\参数非法。
  • ENOENT - op\是EPOLL_CTL_MOD或者EPOLL_CTL_DEL,但是fd\还没有被注册到epoll上。
  • ENOMEM - 内存不足。
  • EPERM - 目标的fd\不支持epoll。

epoll_wait 等待epoll事件

1
2
3
4
5
6
7
8
#include <sys/epoll.h>

int epoll_wait(int epfd, struct epoll_event *events,
int maxevents, int timeout);

int epoll_pwait(int epfd, struct epoll_event *events,
int maxevents, int timeout,
const sigset_t *sigmask);

epoll_wait 这个系统调用是用来等待epfd\中的事件。events\指向调用者可以使用的事件的内存区域。maxevents\告知内核有多少个events,必须要大于0.

timeout\这个参数是用来制定epoll_wait 会阻塞多少毫秒,会一直阻塞到下面几种情况:

  1. 一个文件描述符触发了事件。
  2. 被一个信号处理函数打断,或者timeout超时。

timeout\等于-1的时候这个函数会无限期的阻塞下去,当timeout\等于0的时候,就算没有任何事件,也会立刻返回。

struct epoll_event 如下定义:

1
2
3
4
5
6
7
8
9
10
11
typedef union epoll_data {
void *ptr;
int fd;
uint32_t u32;
uint64_t u64;
} epoll_data_t;

struct epoll_event {
uint32_t events; /* Epoll events */
epoll_data_t data; /* User data variable */
};

每次epoll_wait() 返回的时候,会包含用户在epoll_ctl中设置的events。

还有一个系统调用epoll_pwait ()。epoll_pwait()和epoll_wait ()的关系就像select()和 pselect()的关系。和pselect()一样,epoll_pwait()可以让应用程序安全的等待知道某一个文件描述符就绪或者捕捉到信号。

下面的 epoll_pwait () 调用:

1
ready = epoll_pwait(epfd, &events, maxevents, timeout, &sigmask);

在内部等同于:

1
2
3
pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
ready = epoll_wait(epfd, &events, maxevents, timeout);
pthread_sigmask(SIG_SETMASK, &origmask, NULL);

如果 sigmask\为NULL, epoll_pwait()等同于epoll_wait()。

返回值:\有多少个IO事件已经准备就绪。如果返回0说明没有IO事件就绪,而是timeout超时。遇到错误的时候,会返回-1,并设置 errno。

有以下几种错误:

  • EBADF - epfd\是无效的文件描述符
  • EFAULT - 指针events\指向的内存没有访问权限
  • EINTR - 这个调用被信号打断。
  • EINVAL - epfd\不是一个epoll的文件描述符,或者maxevents\小于等于0

关于ET、LT两种工作模式

边缘触发(edge-triggered 简称ET)和水平触发(level-triggered 简称LT):

epoll的事件派发接口可以运行在两种模式下:边缘触发(edge-triggered)和水平触发(level-triggered),两种模式的区别请看下面,我们先假设下面的情况:

  1. 一个代表管道读取的文件描述符已经注册到epoll实例上了。
  2. 在管道的写入端写入了2kb的数据。
  3. epoll_wait 返回一个可用的rfd文件描述符。
  4. 从管道读取了1kb的数据。
  5. 调用epoll_wait 完成。

如果rfd被设置了ET,在调用完第五步的epool_wait 后会被挂起,尽管在缓冲区还有可以读取的数据,同时另外一段的管道还在等待发送完毕的反馈。这是因为ET模式下只有文件描述符发生改变的时候,才会派发事件。所以第五步操作,可能会去等待已经存在缓冲区的数据。在上面的例子中,一个事件在第二步被创建,再第三步中被消耗,由于第四步中没有读取完缓冲区,第五步中的epoll_wait可能会一直被阻塞下去。

下面情况下推荐使用ET模式:

  1. 使用非阻塞的IO。
  2. epoll_wait() 只需要在read或者write返回的时候。

相比之下,当我们使用LT的时候(默认),epoll会比poll更简单更快速,而且我们可以使用在任何一个地方。