【原创】packetbeat 之“ERR Failed to read integer reply: Expected digit"”问题

问题描述

错误信息如下:

➜  packetbeat git:(master) ✗ ./packetbeat -c ./packetbeat.yml -e -I redis_xg-bjdev-rediscluster-1_prot-7101_20161222110711_20161222110721.pcap -E packetbeat.protocols.redis.ports=7101 -t
2017/01/11 09:42:37.662148 logp.go:219: INFO Metrics logging every 30s
2017/01/11 09:42:37.661865 beat.go:267: INFO Home path: [/Users/sunfei/workspace/GIT/IdeaProjects/src/github.com/moooofly/beats/packetbeat] Config path: [/Users/sunfei/workspace/GIT/IdeaProjects/src/github.com/moooofly/beats/packetbeat] Data path: [/Users/sunfei/workspace/GIT/IdeaProjects/src/github.com/moooofly/beats/packetbeat/data] Logs path: [/Users/sunfei/workspace/GIT/IdeaProjects/src/github.com/moooofly/beats/packetbeat/logs]
2017/01/11 09:42:37.662756 beat.go:177: INFO Setup Beat: packetbeat; Version: 6.0.0-alpha1
2017/01/11 09:42:37.663073 file.go:45: INFO File output path set to: ./logs
2017/01/11 09:42:37.663096 file.go:46: INFO File output base filename set to: packetbeat
2017/01/11 09:42:37.663105 file.go:49: INFO Rotate every bytes set to: 10240000
2017/01/11 09:42:37.663110 file.go:53: INFO Number of files set to: 7
2017/01/11 09:42:37.663140 outputs.go:106: INFO Activated file as output plugin.
2017/01/11 09:42:37.663409 publish.go:291: INFO Publisher name: sunfeideMacBook-Pro.local
2017/01/11 09:42:37.663797 async.go:63: INFO Flush Interval set to: -1s
2017/01/11 09:42:37.663817 async.go:64: INFO Max Bulk Size set to: -1
2017/01/11 09:42:37.664065 procs.go:79: INFO Process matching disabled
2017/01/11 09:42:37.664290 protos.go:89: INFO registered protocol plugin: amqp
2017/01/11 09:42:37.664309 protos.go:89: INFO registered protocol plugin: http
2017/01/11 09:42:37.664315 protos.go:89: INFO registered protocol plugin: MysqL
2017/01/11 09:42:37.664320 protos.go:89: INFO registered protocol plugin: redis
2017/01/11 09:42:37.665784 beat.go:207: INFO packetbeat start running.
2017/01/11 09:47:45.218365 redis_parse.go:306: ERR Failed to read integer reply: Expected digit
value should be Int,but we get -> // 后面的内容输出为自己增加的调试信息
"{\"id\":153,\"email\":\"xiaoxxx@xxxxxxx\",\"work_code\":\"Exxxxxxx\",\"mobile\":186xxxx0925,\"name\":\"谢xxx\",\"walle_id\":77287,\"status\":6,\"pinyin_name\":\"xxj\",\"sex\":1,\"security_level\":60,\"certificate_type\":0,\"certificate_number\":\"420683198908113733\",\"created_at\":1431550029000,\"updated_at\":1449228237000,\"nchr_id\":\"0001A910000000002EQP\"}}"

2017/01/11 09:42:38.430211 sniffer.go:384: INFO Input finish. Processed 40644 packets. Have a nice day!
2017/01/11 09:42:38.430657 util.go:48: INFO flows worker loop stopped
2017/01/11 09:42:38.430709 logp.go:245: INFO Total non-zero values:  libbeat.publisher.published_events=8080 tcp.dropped_because_of_gaps=15 redis.unmatched_responses=15
2017/01/11 09:42:38.430722 logp.go:246: INFO Uptime: 909.957024ms
2017/01/11 09:42:38.430728 beat.go:211: INFO packetbeat stopped.
➜  packetbeat git:(master) ✗

能够看到错误信息为

2017/01/11 09:47:45.218365 redis_parse.go:306: ERR Failed to read integer reply: Expected digit
value should be Int,but we get ->
"{\"id\":153,\"nchr_id\":\"0001A910000000002EQP\"}}"

问题原因

调用 HMGET 进行多个数据查询时,且应答包含的数据量比较大时,则会遇到 TCP 协议层面的分包回复;若在应答回复的过程中,出现丢包,则会导致数据解析出错;

结论增强:经过深入研究发现,应答内容数据量大并不是充分条件,而是必要条件;导致数据解析出错的根本原因和 TCP 分包时的位置有关,因为发现有些情况下的分包数据并不会导致解析错误的发生);

具体分析如下:

37642 号包:发起 HMGET 请求,查询 key 为 hr-e0acd6e0-4c21-4917-a676-c4fd8094f2aa:user 下的 8 个 filed 的值;

*10
$5
HMGET
$44
hr-e0acd6e0-4c21-4917-a676-c4fd8094f2aa:user
$5
18370
$5
52708
$1
0
$5
18370
$4
1117
$2
13
$3
153
$3
147

37642 号包提示 [TCP PrevIoUs segment not captured] 信息,参考 HMGET 请求的内容,结合当前包的数据内容可以看出,该包为 HMGET 应答的最后两个 field 的部分内容(并且截断位置也应该是无规律的);

:{"id":153,"email":"xiaoxxxxx@xxxx","work_code":"Exxxxx","mobile":186xxxx0925,"name":".........","walle_id":77287,"status":6,"pinyin_name":"xxj","sex":1,"security_level":60,"certificate_type":0,"certificate_number":"420683198908113733","created_at":1431550029000,"updated_at":1449228237000,"nchr_id":"0001A910000000002EQP"}}
$519
{"userId":147,"userBuList":[3175],"tagsList":[],"userBuRoleDto":[{"id":81524,"bu_id":3175,"bu_name":"............BU","role_id":859,"role_name":".........","user_id":147,"user_name":"......"}],"user":{"id":147,"email":"xinxxxxx@xxxx","work_code":"Exxxx","mobile":186xxxx626,"name":"......","walle_id":56063,"pinyin_name":"jx","security_level":70,"certificate_number":"420106198511242510","created_at":1431550030000,"updated_at":1449228115000,"nchr_id":"0001A910000000002ERE"}}

37648 号包提示 [TCP Fast Retransmission] 信息,结合数据包内容可以看出,重传数据即为之前判定丢失的数据;

*8
$532
{"userId":18370,"userBuList":[4594],"userBuRoleDto":[{"id":120993,"bu_id":4594,"bu_name":"..................","role_id":924,"role_name":"......","user_id":18370,"user":{"id":18370,"email":"huixxxxx@xxxx","mobile":137xxxxx8281,"walle_id":23156752,"pinyin_name":"yh","security_level":20,"certificate_number":"130984199001023033","created_at":1438657893000,"updated_at":1449228019000,"nchr_id":"0001A910000000013PM5"}}
$536
{"userId":52708,"userBuList":[4704],"userBuRoleDto":[{"id":109320,"bu_id":4704,"bu_name":".....................","role_id":867,"user_id":52708,"user_name":"Neil"}],"user":{"id":52708,"email":"lianxxxxx@xxxxx","mobile":186xxxx6367,"name":"Neil","walle_id":123225472,"pinyin_name":"Neil","certificate_number":"370181198109040350","created_at":1476093008000,"updated_at":1477916894000,"nchr_id":"0001B61000000010WI0Q"}}
$-1
$532
{"userId":18370,"email":"hui.xxxx@xxxx","mobile":137xxxx8281,"nchr_id":"0001A910000000013PM5"}}
$533
{"userId":1117,"userBuList":[4596],"userBuRoleDto":[{"id":89811,"bu_id":4596,"role_id":923,"user_id":1117,"user_name":"........."}],"user":{"id":1117,"email":"xuxxxxxx@xxxxx","mobile":185xxxxx2164,"walle_id":1682514,"pinyin_name":"hxh","certificate_number":"640102199404171817","created_at":1431550294000,"updated_at":1463403636000,"nchr_id":"0001A910000000002GCC"}}
$512
{"userId":13,"userBuList":[104],"userBuRoleDto":[{"id":769,"bu_id":104,"bu_name":".........","role_id":865,"role_name":"CEO","user_id":13,"user":{"id":13,"email":"markxxxx@xxx","mobile":134xxxx0180,"walle_id":869105,"pinyin_name":"zxh","security_level":90,"certificate_number":"310103198504094033","created_at":1431550018000,"nchr_id":"0001A910000000002EGU"}}
$526
{"userId":153,"userBuList":[3174],"userBuRoleDto":[{"id":53306,"bu_id":3174,"bu_name":"............","role_id":922,"user_id":153,"user"

解决办法

这个是 packetbeat 在 redis 协议解析时的 bug ,目前尚未解决

其他

在官网论坛上的讨论

遗留问题:

  • 抓到的数据包显示:每个报文都会被重传一次,原因何在?
  • 触发快速重传的机制?
  • 导致丢包的原因?

相关文章

程序目录结构 简单实现,用户登录后返回一个jwt的token,下次请求带上token请求用户信息接口并返回信息...
本篇博客的主要内容是用go写一个简单的Proof-of-Work共识机制,不涉及到网络通信环节,只是一个本地的简...
简介 默克尔树(MerkleTree)是一种典型的二叉树结构,其主要特点为: 最下面的叶节点包含存储数据或其...
接下来学习并发编程, 并发编程是go语言最有特色的地方, go对并发编程是原生支持. goroutine是go中最近本...
先普及一下, 什么是广度优先搜索 广度优先搜索类似于树的层次遍历。从图中的某一顶点出发,遍历每一个顶...
第一天: 接口的定义和实现 第二天: 一. go语言是面向接口编程. 在学习继承的时候说过, go语言只有封装,...