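Background
I previously wrote a post on sending and receiving protobuf data with scripts (《使用脚本收发 protobuf 协议数据》), which showed that the pbjs command can convert protobuf binary data to JSON: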
> pbjs msg.proto --decode ProbeIpv6Response < response.bin { "selfAddr": { "addrV6": "2409:8900:7900:8f0d:ecd9:4aee:aa3:7ad", "portV6": 46066 }, "brosAddr": [ { "addrV6": "2409:8a34:4405:6624:5250:9d04:cf77:d", "portV6": 18720 }, { "addrV6": "2409:8a34:401a:4151:59e6:69b4:37ad:dea2", "portV6": 18679 }, { "addrV6": "2409:8a20:2a02:20c0:7d11:9a6b:6b51:a9bb", "portV6": 18824 }, { "addrV6": "2409:8a20:e0d:7773:50d4:93b0:680a:b555", "portV6": 18968 }, { "addrV6": "2409:8a44:5b20:edf2:7c09:a5e1:cdbf:69c6", "portV6": 18008 } ] }
Going the other way, encoding the JSON back into binary data, works too:
> pbjs msg.proto --encode ProbeIpv6Response < response.json > response2.bin > xxd response2.bin 00000000: 122b 0a25 3234 3039 3a38 3930 303a 3739 .+.%2409:8900:79 00000010: 3030 3a38 6630 643a 6563 6439 3a34 6165 00:8f0d:ecd9:4ae 00000020: 653a 6161 333a 3761 6410 f2e7 021a 2a0a e:aa3:7ad.....*. 00000030: 2432 3430 393a 3861 3334 3a34 3430 353a $2409:8a34:4405: 00000040: 3636 3234 3a35 3235 303a 3964 3034 3a63 6624:5250:9d04:c 00000050: 6637 373a 6410 a092 011a 2d0a 2732 3430 f77:d.....-.'240 00000060: 393a 3861 3334 3a34 3031 613a 3431 3531 9:8a34:401a:4151 00000070: 3a35 3965 363a 3639 6234 3a33 3761 643a :59e6:69b4:37ad: 00000080: 6465 6132 10f7 9101 1a2d 0a27 3234 3039 dea2.....-.'2409 00000090: 3a38 6132 303a 3261 3032 3a32 3063 303a :8a20:2a02:20c0: 000000a0: 3764 3131 3a39 6136 623a 3662 3531 3a61 7d11:9a6b:6b51:a 000000b0: 3962 6210 8893 011a 2c0a 2632 3430 393a 9bb.....,.&2409: 000000c0: 3861 3230 3a65 3064 3a37 3737 333a 3530 8a20:e0d:7773:50 000000d0: 6434 3a39 3362 303a 3638 3061 3a62 3535 d4:93b0:680a:b55 000000e0: 3510 9894 011a 2d0a 2732 3430 393a 3861 5.....-.'2409:8a 000000f0: 3434 3a35 6232 303a 6564 6632 3a37 6330 44:5b20:edf2:7c0 00000100: 393a 6135 6531 3a63 6462 663a 3639 6336 9:a5e1:cdbf:69c6 00000110: 10d8 8c01
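The re-encoded response2.bin is identical to the original response.bin.
Later, however, when I encoded a different message format, the regenerated bin file differed greatly from the original, so pbjs could not be used to turn that JSON back into binary data.
Symptoms
To explain the problem, start with the message definition: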
message common {
    required uint32 mem1 = 1;
    required uint32 mem2 = 2;
    required bytes mem3 = 3;
    required uint32 mem4 = 4;
    required uint64 mem5 = 5;
    optional uint32 mem6 = 6;
    optional bytes mem7 = 7;
    optional uint32 mem8 = 8;
    optional uint64 mem9 = 9;
}
message query_md5 {
    required common mema = 1;
    required uint32 memb = 2;
    required bytes memc = 3;
    required uint32 memd = 4;
    required uint64 meme = 5;
    repeated bytes memf = 6;
}
To keep the protocol confidential, all field names here are replaced with memXX placeholders. Below is the raw data for this proto message:
> xxd tmp/resp.bin 0000000: 0a37 0802 10c3 8040 1a10 ba38 ba93 af7a .7.....@...8...z 0000010: dae8 1967 2b89 ddd2 6b5c 200b 28b4 baba ...g+...k .(... 0000020: a8b6 0130 003a 0a32 2e32 2e31 3031 2e32 ...0.:.2.2.101.2 0000030: 3740 0348 f0db 8883 0910 001a 1067 c607 7@.H.........g.. 0000040: 215e 47ae 8925 272d 6da0 f602 2d20 0028 !^G..%'-m...- .( 0000050: a0cd c90a 3210 d15b f326 4708 bfc7 01e0 ....2..[.&G..... 0000060: 4b3d c624 38a3 3210 3195 44f3 2f32 1b96 K=.$8.2.1.D./2.. 0000070: 7865 6b82 fdb8 9560 3210 9a75 1735 fcca xek....`2..u.5.. 0000080: e66f 7486 e9fa dc6a 9fab 3210 284c ebbf .ot....j..2.(L.. 0000090: 36e0 1d57 5ca6 93de 391b 7a7d 3210 3e0b 6..W...9.z}2.>. 00000a0: 439c 62a5 a401 c3ff cf00 3299 bc7e 3210 C.b.......2..~2. 00000b0: f6b9 9746 9ce6 9555 52d3 f50b 6ca3 8eb1 ...F...UR...l... 00000c0: 3210 9852 e7f1 2530 cb6b 7aa0 5569 fbcd 2..R..%0.kz.Ui.. 00000d0: 0a5c 3210 d333 33b1 d516 d868 3938 f307 .2..33....h98.. 00000e0: bffe d4c0 3210 a646 0cdf 2874 486a 0bc0 ....2..F..(tHj.. 00000f0: edf1 6f51 b59e 3210 1eee e679 5bf1 0832 ..oQ..2....y[..2 0000100: d5a7 fc4f 60cf 48ab 3210 c446 9663 f6a4 ...O`.H.2..F.c.. 0000110: 87cd fc3f d560 285c 0ea4 ...?.`(..
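The first few bytes of this dump can be decoded by hand, which makes the later dumps easier to read. The following is a minimal sketch in Node.js and is not part of pbjs; the helper names readVarint and readTag are made up for illustration. It relies only on the protobuf wire format, where a tag is a varint holding (field number << 3) | wire type and wire type 2 marks a length-delimited value.

```javascript
// Minimal protobuf wire-format helpers (illustrative names, not a pbjs API).

// Read a base-128 varint starting at `pos`; returns the value and the next offset.
function readVarint(buf, pos) {
  let value = 0n;
  let shift = 0n;
  for (;;) {
    const byte = buf[pos++];
    value |= BigInt(byte & 0x7f) << shift;
    if ((byte & 0x80) === 0) break; // high bit clear: last byte of the varint
    shift += 7n;
  }
  return { value, next: pos };
}

// A tag is a varint holding (fieldNumber << 3) | wireType.
function readTag(buf, pos) {
  const { value, next } = readVarint(buf, pos);
  return { field: Number(value >> 3n), wireType: Number(value & 7n), next };
}

// First 12 bytes of tmp/resp.bin, copied from the xxd dump.
const head = Buffer.from('0a37080210c380401a10ba38', 'hex');

const outer = readTag(head, 0);                 // 0a: mema
const outerLen = readVarint(head, outer.next);  // 37: length of mema
const t1 = readTag(head, outerLen.next);        // 08: mem1
const v1 = readVarint(head, t1.next);           // 02
const t2 = readTag(head, v1.next);              // 10: mem2
const v2 = readVarint(head, t2.next);           // c3 80 40
const t3 = readTag(head, v2.next);              // 1a: mem3
const l3 = readVarint(head, t3.next);           // 10: length of mem3

console.log(outer.field, outer.wireType, outerLen.value); // 1 2 55n
console.log(t1.field, v1.value);                          // 1 2n
console.log(t2.field, v2.value);                          // 2 1048643n
console.log(t3.field, t3.wireType, l3.value);             // 3 2 16n
```

The byte 0x1a is field 3 with wire type 2, followed by the length 0x10, so mem3 occupies the next 16 bytes. Both bytes and string fields are stored with wire type 2.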
Decoding it with pbjs yields the following JSON:
> pbjs query_md5.proto --decode query_md5 < tmp/resp.bin > resp.json > jq -c '.' resp.json {"mema":{"mem1":2,"mem2":1048643,"mem3":{"type":"Buffer","data":[186,56,186,147,175,122,218,232,25,103,43,137,221,210,107,92]},"mem4":11,"mem5":{"low":1695456564,"high":11,"unsigned":true},"mem6":0,"mem7":{"type":"Buffer","data":[50,46,50,46,49,48,49,46,50,55]},"mem8":3,"mem9":{"low":-1872613904,"high":0,"unsigned":true}},"memb":0,"memc":{"type":"Buffer","data":[103,198,7,33,94,71,174,137,37,39,45,109,160,246,2,45]},"memd":0,"meme":{"low":22177440,"high":0,"unsigned":true},"memf":[{"type":"Buffer","data":[209,91,243,38,71,8,191,199,1,224,75,61,198,36,56,163]},{"type":"Buffer","data":[49,149,68,243,47,50,27,150,120,101,107,130,253,184,149,96]},{"type":"Buffer","data":[154,117,23,53,252,202,230,111,116,134,233,250,220,106,159,171]},{"type":"Buffer","data":[40,76,235,191,54,224,29,87,92,166,147,222,57,27,122,125]},{"type":"Buffer","data":[62,11,67,156,98,165,164,1,195,255,207,0,50,153,188,126]},{"type":"Buffer","data":[246,185,151,70,156,230,149,85,82,211,245,11,108,163,142,177]},{"type":"Buffer","data":[152,82,231,241,37,48,203,107,122,160,85,105,251,205,10,92]},{"type":"Buffer","data":[211,51,51,177,213,22,216,104,57,56,243,7,191,254,212,192]},{"type":"Buffer","data":[166,70,12,223,40,116,72,106,11,192,237,241,111,81,181,158]},{"type":"Buffer","data":[30,238,230,121,91,241,8,50,213,167,252,79,96,207,72,171]},{"type":"Buffer","data":[196,70,150,99,246,164,135,205,252,63,213,96,40,92,14,164]}]}
There is a lot of content, so I used jq -c to print it on one line. Encoding the JSON again produces a bin file with the following content:
> pbjs query_md5.proto --encode query_md5 < resp.json > resp.bin > xxd resp.bin 0000000: 0a08 0802 10c3 8040 1a00 1000 1a00 .......@......
The length alone shows that it is clearly different from the original.
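When reading the decoded JSON, note that bytes fields appear as {"type":"Buffer","data":[...]} objects and 64-bit integers as {"low","high","unsigned"} triples. The sketch below shows one way to turn those shapes back into plain values in Node.js; longToBigInt and toBuffer are hypothetical helper names, and the sample values are copied from the JSON above.

```javascript
// Hypothetical helpers for reading the JSON shapes printed by pbjs.

// {"low":..,"high":..,"unsigned":..} -> BigInt built from two 32-bit halves.
function longToBigInt(l) {
  const value = (BigInt(l.high >>> 0) << 32n) | BigInt(l.low >>> 0);
  return l.unsigned ? value : BigInt.asIntN(64, value);
}

// {"type":"Buffer","data":[...]} -> Node.js Buffer holding the raw bytes.
function toBuffer(b) {
  return Buffer.from(b.data);
}

const mem5 = longToBigInt({ low: 1695456564, high: 11, unsigned: true });
const mem9 = longToBigInt({ low: -1872613904, high: 0, unsigned: true });
const mem7 = toBuffer({ type: 'Buffer', data: [50, 46, 50, 46, 49, 48, 49, 46, 50, 55] });

console.log(mem5.toString());         // 48940096820
console.log(mem9.toString());         // 2422353392
console.log(mem7.toString('latin1')); // 2.2.101.27
```

The negative low word of mem9 is only an artifact of signed 32-bit storage; reinterpreting it as unsigned gives the real value.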
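Initial analysis
Since pbjs restored the binary data successfully before, the tool itself is probably mostly fine. Let's review the format of the first message: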
> cat msg.proto
message ProbeIpv6Request {
    string xxxxx = 1;
    string xxxx = 2;
    string xxxxxxxx = 3;
    string xxxxxxx = 4;
}
message V6AddrType {
    string addrV6 = 1;
    uint32 portV6 = 2;
}
message ProbeIpv6Response {
    string xxxxx = 1;
    V6AddrType selfAddr = 2;
    repeated V6AddrType brosAddr = 3;
}
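The main difference from the failing message is that the former uses string while the latter uses bytes.
bytes vs string
Could the bytes type be the problem? I tried replacing bytes with string in the second message: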
message common {
    required uint32 mem1 = 1;
    required uint32 mem2 = 2;
    required string mem3 = 3;
    required uint32 mem4 = 4;
    required uint64 mem5 = 5;
    optional uint32 mem6 = 6;
    optional string mem7 = 7;
    optional uint32 mem8 = 8;
    optional uint64 mem9 = 9;
}
message query_md5 {
    required common mema = 1;
    required uint32 memb = 2;
    required string memc = 3;
    required uint32 memd = 4;
    required uint64 meme = 5;
    repeated string memf = 6;
}
Hoping that pbjs handles the two types compatibly, I decoded the binary data directly with the string-typed definition:
> pbjs query_md5.proto --decode query_md5 < tmp/resp.bin > resp.json > cat resp.json { "mema": { "mem1": 2, "mem2": 1048643, "mem3": "�8���z��u0019g+���k\", "mem4": 11, "mem5": { "low": 1695456564, "high": 11, "unsigned": true }, "mem6": 0, "mem7": "2.2.101.27", "mem8": 3, "mem9": { "low": -1872613904, "high": 0, "unsigned": true } }, "memb": 0, "memc": "g�u0007!^G��%'-m��u0002-", "memd": 0, "meme": { "low": 22177440, "high": 0, "unsigned": true }, "memf": [ "�[�&Gb��u0001�K=�$8�", "1�D�/2u001b�xek����`", "�uu00175���ot����j��", "(L��6�u001dW\���9u001bz}", ">u000bC�b��u0001���u00002��~", "���F���UR��u000bl���", "�R��%0�kz�Ui��n\", "�33��u0016�h98�u0007����", "�Ff�(tHju000b���oQ��", "u001e��y[�b2է�O`�H�", "�F�c�����?�`(\u000e�" ] }
Ha, it actually decoded, although the bytes fields came out garbled. If I encode it back unchanged, it should be fine, right?
> pbjs query_md5.proto --encode query_md5 < resp.json > resp.bin > xxd resp.bin 0000000: 0a49 0802 10c3 8040 1a22 efbf bd38 efbf .I.....@."...8.. 0000010: bdef bfbd efbf bd7a efbf bdef bfbd 1967 .......z.......g 0000020: 2bef bfbd efbf bdef bfbd 6b5c 200b 28b4 +.........k .(. 0000030: baba a8b6 0130 003a 0a32 2e32 2e31 3031 .....0.:.2.2.101 0000040: 2e32 3740 0348 f0db 8883 0910 001a 1a67 .27@.H.........g 0000050: efbf bd07 215e 47ef bfbd efbf bd25 272d ....!^G......%'- 0000060: 6def bfbd efbf bd02 2d20 0028 a0cd c90a m.......- .(.... 0000070: 321e efbf bd5b efbf bd26 4708 efbf bdef 2....[...&G..... 0000080: bfbd 01ef bfbd 4b3d efbf bd24 38ef bfbd ......K=...$8... 0000090: 321e 31ef bfbd 44ef bfbd 2f32 1bef bfbd 2.1...D.../2.... 00000a0: 7865 6bef bfbd efbf bdef bfbd efbf bd60 xek............` 00000b0: 3224 efbf bd75 1735 efbf bdef bfbd efbf 2$...u.5........ 00000c0: bd6f 74ef bfbd efbf bdef bfbd efbf bd6a .ot............j 00000d0: efbf bdef bfbd 321c 284c efbf bdef bfbd ......2.(L...... 00000e0: 36ef bfbd 1d57 5cef bfbd efbf bdef bfbd 6....W......... 00000f0: 391b 7a7d 3220 3e0b 43ef bfbd 62ef bfbd 9.z}2 >.C...b... 0000100: efbf bd01 efbf bdef bfbd efbf bd00 32ef ..............2. 0000110: bfbd efbf bd7e 3226 efbf bdef bfbd efbf .....~2&........ 0000120: bd46 efbf bdef bfbd efbf bd55 52ef bfbd .F.........UR... 0000130: efbf bd0b 6cef bfbd efbf bdef bfbd 321e ....l.........2. 0000140: efbf bd52 efbf bdef bfbd 2530 efbf bd6b ...R......%0...k 0000150: 7aef bfbd 5569 efbf bdef bfbd 0a5c 3222 z...Ui.......2" 0000160: efbf bd33 33ef bfbd efbf bd16 efbf bd68 ...33..........h 0000170: 3938 efbf bd07 efbf bdef bfbd efbf bdef 98.............. 0000180: bfbd 321e efbf bd46 0cef bfbd 2874 486a ..2....F....(tHj 0000190: 0bef bfbd efbf bdef bfbd 6f51 efbf bdef ..........oQ.... 00001a0: bfbd 321c 1eef bfbd efbf bd79 5bef bfbd ..2........y[... 00001b0: 0832 d5a7 efbf bd4f 60ef bfbd 48ef bfbd .2.....O`...H... 00001c0: 3222 efbf bd46 efbf bd63 efbf bdef bfbd 2"...F...c...... 
00001d0: efbf bdef bfbd efbf bd3f efbf bd60 285c .........?...`( 00001e0: 0eef bfbd ....
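It does encode, but the result still differs greatly from the original data:
This time there is a lot of extra content, which poured cold water on my enthusiasm. On the off chance that it might work anyway, I sent this binary data to the server, and sure enough it returned an error: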
{"error_code":196608,"error_msg":"fgid not find","request_id":3933672364}
It looks like the server failed while parsing the bytes fields.
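The extra ef bf bd sequences in the dump are the UTF-8 encoding of U+FFFD, the replacement character substituted for bytes that are not valid UTF-8. The effect can be reproduced with a plain Node.js Buffer, independent of pbjs. This is a minimal sketch using the first six bytes of mem3:

```javascript
// First six bytes of mem3 from the original message.
const raw = Buffer.from([0xba, 0x38, 0xba, 0x93, 0xaf, 0x7a]);

// Decoding as UTF-8 replaces every invalid byte with U+FFFD.
const asString = raw.toString('utf8');

// Re-encoding writes each U+FFFD as the three bytes ef bf bd,
// so the original byte values cannot be recovered.
const roundTrip = Buffer.from(asString, 'utf8');

console.log(raw.toString('hex'));       // ba38ba93af7a
console.log(roundTrip.toString('hex')); // efbfbd38efbfbdefbfbdefbfbd7a
console.log(raw.equals(roundTrip));     // false
```

The second line of output matches the start of mem3 in the re-encoded dump, which suggests the information is already lost at decode time rather than at encode time.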
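In my scenario, pbjs is mainly used to generate the protobuf request data from JSON and send it to the server to obtain a protobuf response. The response is then decoded by pbjs into JSON and finally fed to jq to extract whatever information is needed.
If this step does not work, everything after it is blocked, even though the data can be converted back and forth locally using the string type.
json unicode
At first I suspected that some characters in the string fields were not being converted back into the corresponding binary data. Take the memc field from the example above: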
"memc":{"type":"Buffer","data":[103,198,7,33,94,71,174,137,37,39,45,109,160,246,2,45]}
After conversion it becomes:
"memc": "g�u0007!^G��%'-m��u0002-",
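Some of the garbled characters look suspicious. How can the binary form of a character be expressed in JSON? Searching turned up the JSON Unicode escape \u, which must be followed by exactly four hex digits, so I converted the field as follows:
"memc": "\u0067\u00c6\u0007\u0021\u005e\u0047\u00ae\u0089\u0025\u0027\u002d\u006d\u00a0\u00f6\u0002\u002d",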
The other string fields were converted the same way:
{
  "mema": {
    "mem1": 2,
    "mem2": 1048643,
    "mem3": "\u00ba\u0038\u00ba\u0093\u00af\u007a\u00da\u00e8\u0019\u0067\u002b\u0089\u00dd\u00d2\u006b\u005c",
    "mem4": 11,
    "mem5": { "low": 1695456564, "high": 11, "unsigned": true },
    "mem6": 0,
    "mem7": "2.2.101.27",
    "mem8": 3,
    "mem9": { "low": -1872613904, "high": 0, "unsigned": true }
  },
  "memb": 0,
  "memc": "\u0067\u00c6\u0007\u0021\u005e\u0047\u00ae\u0089\u0025\u0027\u002d\u006d\u00a0\u00f6\u0002\u002d",
  "memd": 0,
  "meme": { "low": 22177440, "high": 0, "unsigned": true },
  "memf": [
    "\u00d1\u005b\u00f3\u0026\u0047\u0008\u00bf\u00c7\u0001\u00e0\u004b\u003d\u00c6\u0024\u0038\u00a3",
    "\u0031\u0095\u0044\u00f3\u002f\u0032\u001b\u0096\u0078\u0065\u006b\u0082\u00fd\u00b8\u0095\u0060",
    "\u009a\u0075\u0017\u0035\u00fc\u00ca\u00e6\u006f\u0074\u0086\u00e9\u00fa\u00dc\u006a\u009f\u00ab",
    "\u0028\u004c\u00eb\u00bf\u0036\u00e0\u001d\u0057\u005c\u00a6\u0093\u00de\u0039\u001b\u007a\u007d",
    "\u003e\u000b\u0043\u009c\u0062\u00a5\u00a4\u0001\u00c3\u00ff\u00cf\u0000\u0032\u0099\u00bc\u007e",
    "\u00f6\u00b9\u0097\u0046\u009c\u00e6\u0095\u0055\u0052\u00d3\u00f5\u000b\u006c\u00a3\u008e\u00b1",
    "\u0098\u0052\u00e7\u00f1\u0025\u0030\u00cb\u006b\u007a\u00a0\u0055\u0069\u00fb\u00cd\u000a\u005c",
    "\u00d3\u0033\u0033\u00b1\u00d5\u0016\u00d8\u0068\u0039\u0038\u00f3\u0007\u00bf\u00fe\u00d4\u00c0",
    "\u00a6\u0046\u000c\u00df\u0028\u0074\u0048\u006a\u000b\u00c0\u00ed\u00f1\u006f\u0051\u00b5\u009e",
    "\u001e\u00ee\u00e6\u0079\u005b\u00f1\u0008\u0032\u00d5\u00a7\u00fc\u004f\u0060\u00cf\u0048\u00ab",
    "\u00c4\u0046\u0096\u0063\u00f6\u00a4\u0087\u00cd\u00fc\u003f\u00d5\u0060\u0028\u005c\u000e\u00a4"
  ]
}
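Writing these escapes by hand is tedious, so the conversion can be scripted. The following is a minimal sketch, assuming the byte values are taken from the "data" arrays in the earlier pbjs output; bytesToUnicodeEscapes is a hypothetical helper name.

```javascript
// Turn each byte into a four-digit JSON Unicode escape (hypothetical helper).
function bytesToUnicodeEscapes(bytes) {
  return Array.from(bytes, (b) => '\\u' + b.toString(16).padStart(4, '0')).join('');
}

// Byte values of memc, copied from the "data" array in the decoded JSON.
const memc = [103, 198, 7, 33, 94, 71, 174, 137, 37, 39, 45, 109, 160, 246, 2, 45];

const escaped = bytesToUnicodeEscapes(memc);
console.log(escaped);

// Parsing the escapes as a JSON string yields one character per original byte.
const parsed = JSON.parse('"' + escaped + '"');
console.log(parsed.length); // 16
```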
Then I tried encoding the new JSON file with pbjs:
> pbjs query_md5.proto --encode query_md5 < resp.uni.json > resp.uni.bin > xxd resp.uni.bin 0000000: 0a40 0802 10c3 8040 1a19 c2ba 38c2 bac2 .@.....@....8... 0000010: 93c2 af7a c39a c3a8 1967 2bc2 89c3 9dc3 ...z.....g+..... 0000020: 926b 5c20 0b28 b4ba baa8 b601 3000 3a0a .k .(......0.:. 0000030: 322e 322e 3130 312e 3237 4003 48f0 db88 2.2.101.27@.H... 0000040: 8309 1000 1a15 67c3 8607 215e 47c2 aec2 ......g...!^G... 0000050: 8925 272d 6dc2 a0c3 b602 2d20 0028 a0cd .%'-m.....- .(.. 0000060: c90a 3217 c391 5bc3 b326 4708 c2bf c387 ..2...[..&G..... 0000070: 01c3 a04b 3dc3 8624 38c2 a332 1731 c295 ...K=..$8..2.1.. 0000080: 44c3 b32f 321b c296 7865 6bc2 82c3 bdc2 D../2...xek..... 0000090: b8c2 9560 321a c29a 7517 35c3 bcc3 8ac3 ...`2...u.5..... 00000a0: a66f 74c2 86c3 a9c3 bac3 9c6a c29f c2ab .ot........j.... 00000b0: 3216 284c c3ab c2bf 36c3 a01d 575c c2a6 2.(L....6...W.. 00000c0: c293 c39e 391b 7a7d 3218 3e0b 43c2 9c62 ....9.z}2.>.C..b 00000d0: c2a5 c2a4 01c3 83c3 bfc3 8f00 32c2 99c2 ............2... 00000e0: bc7e 321b c3b6 c2b9 c297 46c2 9cc3 a6c2 .~2.......F..... 00000f0: 9555 52c3 93c3 b50b 6cc2 a3c2 8ec2 b132 .UR.....l......2 0000100: 17c2 9852 c3a7 c3b1 2530 c38b 6b7a c2a0 ...R....%0..kz.. 0000110: 5569 c3bb c38d 0a5c 3219 c393 3333 c2b1 Ui.....2...33.. 0000120: c395 16c3 9868 3938 c3b3 07c2 bfc3 bec3 .....h98........ 0000130: 94c3 8032 17c2 a646 0cc3 9f28 7448 6a0b ...2...F...(tHj. 0000140: c380 c3ad c3b1 6f51 c2b5 c29e 3218 1ec3 ......oQ....2... 0000150: aec3 a679 5bc3 b108 32c3 95c2 a7c3 bc4f ...y[...2......O 0000160: 60c3 8f48 c2ab 3219 c384 46c2 9663 c3b6 `..H..2...F..c.. 0000170: c2a4 c287 c38d c3bc 3fc3 9560 285c 0ec2 ........?..`(.. 0000180: a4 .
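The new version shows some change compared with the previous one:
It is somewhat shorter, but the server still reports the same error.
This proves the approach does not work; replacing the bytes type with string is a dead end.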
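The c2 and c3 prefixes in the dump explain why the escapes did not help: \u00ba names the code point U+00BA, not the byte 0xba, and a string field is written as UTF-8, where every code point from U+0080 upward takes at least two bytes. A minimal Node.js sketch of the difference, independent of pbjs:

```javascript
// "\u00ba" is the code point U+00BA, not the raw byte 0xba.
const text = JSON.parse('"\\u00ba\\u0038\\u00c6"');

// UTF-8 encoding: code points >= U+0080 expand to two bytes here.
const asUtf8 = Buffer.from(text, 'utf8');

// latin1 encoding keeps one byte per code point, which is what the
// original bytes field actually contained.
const asLatin1 = Buffer.from(text, 'latin1');

console.log(asUtf8.toString('hex'));   // c2ba38c386
console.log(asLatin1.toString('hex')); // ba38c6
```

The latin1 line is shown only for contrast. Nothing in the pbjs help output above suggests an option for choosing the string encoding, so this does not rescue the string-based approach.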
Solutions
Since the bytes type is required and pbjs has problems with it, are there other conversion tools?
protobufjs
The usual pbjs help output looks like this:
> pbjs Usage: pbjs [options] <schema_path> Options: -V, --version output the version number --es5 <js_path> Generate ES5 JavaScript code --es6 <js_path> Generate ES6 JavaScript code --ts <ts_path> Generate TypeScript code --decode <msg_type> Decode standard input to JSON --encode <msg_type> Encode standard input to JSON -h, --help output usage information
By accident, my pbjs once printed the following instead:
> pbjs protobuf.js v1.1.2 CLI for JavaScript Translates between file formats and generates static code. -t, --target Specifies the target format. Also accepts a path to require a custom target. json JSON representation json-module JSON representation as a module proto2 Protocol Buffers, Version 2 proto3 Protocol Buffers, Version 3 static Static code without reflection (non-functional on its own) static-module Static code without reflection as a module -p, --path Adds a directory to the include path. --filter Set up a filter to configure only those messages you need and their dependencies to compile, this will effectively reduce the final file size Set A json file path, Example of file content: {"messageNames":["mypackage.messageName1", "messageName2"] } -o, --out Saves to a file instead of writing to stdout. --sparse Exports only those types referenced from a main file (experimental). Module targets only: -w, --wrap Specifies the wrapper to use. Also accepts a path to require a custom wrapper. default Default wrapper supporting both CommonJS and AMD commonjs CommonJS wrapper amd AMD wrapper es6 ES6 wrapper (implies --es6) closure A closure adding to protobuf.roots where protobuf is a global --dependency Specifies which version of protobuf to require. Accepts any valid module id -r, --root Specifies an alternative protobuf.roots name. -l, --lint Linter configuration. Defaults to protobuf.js-compatible rules: eslint-disable block-scoped-var, id-length, no-control-regex, no-magic-numbers, no-prototype-builtins, no-redeclare, no-shadow, no-var, sort-vars --es6 Enables ES6 syntax (const/let instead of var) Proto sources only: --keep-case Keeps field casing instead of converting to camel case. --alt-comment Turns on an alternate comment parsing mode that preserves more comments. Static targets only: --no-create Does not generate create functions used for reflection compatibility. --no-encode Does not generate encode functions. 
--no-decode Does not generate decode functions. --no-verify Does not generate verify functions. --no-convert Does not generate convert functions like from/toObject --no-delimited Does not generate delimited encode/decode functions. --no-typeurl Does not generate getTypeUrl function. --no-beautify Does not beautify generated code. --no-comments Does not output any JSDoc comments. --no-service Does not output service classes. --force-long Enforces the use of 'Long' for s-/u-/int64 and s-/fixed64 fields. --force-number Enforces the use of 'number' for s-/u-/int64 and s-/fixed64 fields. --force-message Enforces the use of message instances instead of plain objects. --null-defaults Default value for optional fields is null instead of zero value. usage: pbjs [options] file1.proto file2.json ... (or pipe) other | pbjs [options] -
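It turns out there are two different pbjs commands: one comes from npm install pbjs, the other from npm install protobufjs[-cli]. The latter is used to generate JavaScript code for handling protobuf data.
If one of them is already installed, installing the other fails: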
$ sudo npm install protobufjs -g npm ERR! code EEXIST npm ERR! path /usr/local/bin/pbjs npm ERR! EEXIST: file already exists npm ERR! File exists: /usr/local/bin/pbjs npm ERR! Remove the existing file and try again, or run npm npm ERR! with --force to overwrite files recklessly. npm ERR! A complete log of this run can be found in: npm ERR! /root/.npm/_logs/2023-09-24T03_19_13_647Z-debug-0.log
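The previously installed one has to be uninstalled first. Searching the web for "pbjs" turns up articles about the first tool as well as the second; which one is meant depends on the package that was installed, so be careful not to confuse the two.
One way to keep both is to install the other one locally: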
> npm install protobufjs-cli added 84 packages in 2m > ls node_modules/ acorn brace-expansion entities esutils inherits lodash minimatch protobufjs strip-json-comments underscore acorn-jsx catharsis escape-string-regexp fast-levenshtein js2xmlparser long minimist @protobufjs supports-color word-wrap ansi-styles chalk escodegen fs.realpath jsdoc lru-cache mkdirp protobufjs-cli tmp wrappy argparse color-convert eslint-visitor-keys glob @jsdoc markdown-it once requizzle type-check xmlcreate @babel color-name espree graceful-fs klaw markdown-it-anchor optionator rimraf @types yallist balanced-match concat-map esprima has-flag levn marked path-is-absolute semver uc.micro bluebird deep-is estraverse inflight linkify-it mdurl prelude-ls source-map uglify-js > find . -type f -name "pbjs" ./node_modules/protobufjs-cli/bin/pbjs > ./node_modules/protobufjs-cli/bin/pbjs protobuf.js v1.1.2 CLI for JavaScript Translates between file formats and generates static code. ...... usage: pbjs [options] file1.proto file2.json ... (or pipe) other | pbjs [options] -
The drawback is that it can only be invoked by its path:
> ./node_modules/protobufjs-cli/bin/pbjs
For protobufjs, the feature of interest is converting proto messages into a JSON description that JS code can use directly:
> ./node_modules/protobufjs-cli/bin/pbjs -t json query_md5.proto > query_md5.json > cat query_md5.json {{ "nested": { "common": { "fields": { "mem1": { "rule": "required", "type": "uint32", "id": 1 }, "mem2": { "rule": "required", "type": "uint32", "id": 2 }, "mem3": { "rule": "required", "type": "bytes", "id": 3 }, "mem4": { "rule": "required", "type": "uint32", "id": 4 }, "mem5": { "rule": "required", "type": "uint64", "id": 5 }, "mem6": { "type": "uint32", "id": 6 }, "mem7": { "type": "bytes", "id": 7 }, "mem8": { "type": "uint32", "id": 8 }, "mem9": { "type": "uint64", "id": 9 } } }, "query_md5": { "fields": { "mema": { "rule": "required", "type": "common", "id": 1 }, "memb": { "rule": "required", "type": "uint32", "id": 2 }, "memc": { "rule": "required", "type": "bytes", "id": 3 }, "memd": { "rule": "required", "type": "uint32", "id": 4 }, "meme": { "rule": "required", "type": "uint64", "id": 5 }, "memf": { "rule": "repeated", "type": "bytes", "id": 6 } } } }
It will be used shortly.
javascript
Both protobufjs and pbjs can generate JavaScript code from a proto file. Recall the pbjs help output:
> pbjs Usage: pbjs [options] <schema_path> Options: -V, --version output the version number --es5 <js_path> Generate ES5 JavaScript code --es6 <js_path> Generate ES6 JavaScript code --ts <ts_path> Generate TypeScript code --decode <msg_type> Decode standard input to JSON --encode <msg_type> Encode standard input to JSON -h, --help output usage information
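This is done mainly through the --es5/--es6 options; protobufjs has similar options. For convenience, pbjs is used for the description here.
Running JS code to convert binary data to JSON is also a workable solution. Based on posts found online, I arrived at the following JS code: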
let pbroot = require("protobufjs").Root;
let json = require("./query_md5.json");
let root = pbroot.fromJSON(json);
// console.log (root);
var fs = require('fs');
fs.readFile('./tmp/resp.bin', function (err, data) {
    if (err) {
        console.log(err);
    } else {
        console.log(data);
        console.log(data.length + ' bytes');
        let Message = root.lookupType("query_md5");
        try {
            let message = Message.decode(data);
            console.log(message);
        } catch (e) {
            console.log(e);
        }
    }
});
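Note that the query_md5.json file on line 2 is the one generated with protobufjs in the previous section. A brief explanation of the code above:
- Load the proto types (query_md5) defined in query_md5.json
- Read the binary data (tmp/resp.bin) and decode it
- Print the decoded result
Running the JS code produces the following output: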
> node index.js <Buffer 0a 37 08 02 10 c3 80 40 1a 10 ba 38 ba 93 af 7a da e8 19 67 2b 89 dd d2 6b 5c 20 0b 28 b4 ba ba a8 b6 01 30 00 3a 0a 32 2e 32 2e 31 30 31 2e 32 37 40 ... 232 more bytes> 282 bytes query_md5 { memf: [ <Buffer d1 5b f3 26 47 08 bf c7 01 e0 4b 3d c6 24 38 a3>, <Buffer 31 95 44 f3 2f 32 1b 96 78 65 6b 82 fd b8 95 60>, <Buffer 9a 75 17 35 fc ca e6 6f 74 86 e9 fa dc 6a 9f ab>, <Buffer 28 4c eb bf 36 e0 1d 57 5c a6 93 de 39 1b 7a 7d>, <Buffer 3e 0b 43 9c 62 a5 a4 01 c3 ff cf 00 32 99 bc 7e>, <Buffer f6 b9 97 46 9c e6 95 55 52 d3 f5 0b 6c a3 8e b1>, <Buffer 98 52 e7 f1 25 30 cb 6b 7a a0 55 69 fb cd 0a 5c>, <Buffer d3 33 33 b1 d5 16 d8 68 39 38 f3 07 bf fe d4 c0>, <Buffer a6 46 0c df 28 74 48 6a 0b c0 ed f1 6f 51 b5 9e>, <Buffer 1e ee e6 79 5b f1 08 32 d5 a7 fc 4f 60 cf 48 ab>, <Buffer c4 46 96 63 f6 a4 87 cd fc 3f d5 60 28 5c 0e a4> ], mema: common { mem1: 2, mem2: 1048643, mem3: <Buffer ba 38 ba 93 af 7a da e8 19 67 2b 89 dd d2 6b 5c>, mem4: 11, mem5: Long { low: 1695456564, high: 11, unsigned: true }, mem6: 0, mem7: <Buffer 32 2e 32 2e 31 30 31 2e 32 37>, mem8: 3, mem9: Long { low: -1872613904, high: 0, unsigned: true } }, memb: 0, memc: <Buffer 67 c6 07 21 5e 47 ae 89 25 27 2d 6d a0 f6 02 2d>, memd: 0, meme: Long { low: 22177440, high: 0, unsigned: true } } <Buffer 0a 37 08 02 10 c3 80 40 1a 10 ba 38 ba 93 af 7a da e8 19 67 2b 89 dd d2 6b 5c 20 0b 28 b4 ba ba a8 b6 01 30 00 3a 0a 32 2e 32 2e 31 30 31 2e 32 37 40 ... 232 more bytes>
The binary data is decoded correctly. Now modify the code slightly:
...
let buffer = Message.encode(Message.create(message)).finish();
console.log(buffer);
fs.writeFile('./resp.bin', buffer, function (err) {
    if (err) {
        console.log(err);
    } else {
        console.log('success');
    }
});
...
This re-encodes the decoded data (message) into binary (buffer) and writes it to a file (resp.bin):
... <Buffer 0a 37 08 02 10 c3 80 40 1a 10 ba 38 ba 93 af 7a da e8 19 67 2b 89 dd d2 6b 5c 20 0b 28 b4 ba ba a8 b6 01 30 00 3a 0a 32 2e 32 2e 31 30 31 2e 32 37 40 ... 52 more bytes> success > xxd resp.bin 0000000: 0a37 0802 10c3 8040 1a10 ba38 ba93 af7a .7.....@...8...z 0000010: dae8 1967 2b89 ddd2 6b5c 200b 28b4 baba ...g+...k .(... 0000020: a8b6 0130 003a 0a32 2e32 2e31 3031 2e32 ...0.:.2.2.101.2 0000030: 3740 0348 f0db 8883 0910 001a 1067 c607 7@.H.........g.. 0000040: 215e 47ae 8925 272d 6da0 f602 2d20 0028 !^G..%'-m...- .( 0000050: a0cd c90a 3210 d15b f326 4708 bfc7 01e0 ....2..[.&G..... 0000060: 4b3d c624 38a3 3210 3195 44f3 2f32 1b96 K=.$8.2.1.D./2.. 0000070: 7865 6b82 fdb8 9560 3210 9a75 1735 fcca xek....`2..u.5.. 0000080: e66f 7486 e9fa dc6a 9fab 3210 284c ebbf .ot....j..2.(L.. 0000090: 36e0 1d57 5ca6 93de 391b 7a7d 3210 3e0b 6..W...9.z}2.>. 00000a0: 439c 62a5 a401 c3ff cf00 3299 bc7e 3210 C.b.......2..~2. 00000b0: f6b9 9746 9ce6 9555 52d3 f50b 6ca3 8eb1 ...F...UR...l... 00000c0: 3210 9852 e7f1 2530 cb6b 7aa0 5569 fbcd 2..R..%0.kz.Ui.. 00000d0: 0a5c 3210 d333 33b1 d516 d868 3938 f307 .2..33....h98.. 00000e0: bffe d4c0 3210 a646 0cdf 2874 486a 0bc0 ....2..F..(tHj.. 00000f0: edf1 6f51 b59e 3210 1eee e679 5bf1 0832 ..oQ..2....y[..2 0000100: d5a7 fc4f 60cf 48ab 3210 c446 9663 f6a4 ...O`.H.2..F.c.. 0000110: 87cd fc3f d560 285c 0ea4 ...?.`(..
Compare it with the original data:
Identical! This approach appears to work, though it is a bit cumbersome.
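Comparing two hex dumps by eye is error-prone, so the check can also be scripted. The following is a minimal sketch using only Node.js built-ins; firstDifference and compareFiles are hypothetical helper names, and the demo runs on short in-memory samples taken from the dumps earlier in this post rather than on the real files.

```javascript
const fs = require('fs');

// Return -1 when both buffers are byte-identical,
// otherwise the offset of the first differing byte.
function firstDifference(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : n;
}

// Same check for two files on disk.
function compareFiles(pathA, pathB) {
  return firstDifference(fs.readFileSync(pathA), fs.readFileSync(pathB));
}

// Demo: first 8 bytes of the original message and of the failed pbjs re-encoding.
const original = Buffer.from('0a37080210c38040', 'hex');
const reencoded = Buffer.from('0a08080210c38040', 'hex');

console.log(firstDifference(original, original));  // -1
console.log(firstDifference(original, reencoded)); // 1
```

With the files from this post, a call such as compareFiles('tmp/resp.bin', 'resp.bin') should return -1 when the round trip is lossless.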
protoc
Speaking of encoding and decoding binary data from a proto file, shouldn't the best tool for the job be protoc, which ships with protobuf itself?
$ ./protoc --help Usage: ./protoc [OPTION] PROTO_FILES Parse PROTO_FILES and generate output based on the options given: -IPATH, --proto_path=PATH Specify the directory in which to search for imports. May be specified multiple times; directories will be searched in order. If not given, the current working directory is used. --version Show version info and exit. -h, --help Show this text and exit. --encode=MESSAGE_TYPE Read a text-format message of the given type from standard input and write it in binary to standard output. The message type must be defined in PROTO_FILES or their imports. --decode=MESSAGE_TYPE Read a binary message of the given type from standard input and write it in text format to standard output. The message type must be defined in PROTO_FILES or their imports. --decode_raw Read an arbitrary protocol message from standard input and write the raw tag/value pairs in text format to standard output. No PROTO_FILES should be given when using this flag. -oFILE, Writes a FileDescriptorSet (a protocol buffer, --descriptor_set_out=FILE defined in descriptor.proto) containing all of the input files to FILE. --include_imports When using --descriptor_set_out, also include all dependencies of the input files in the set, so that the set is self-contained. --include_source_info When using --descriptor_set_out, do not strip SourceCodeInfo from the FileDescriptorProto. This results in vastly larger descriptors that include information about the original location of each decl in the source file as well as surrounding comments. --dependency_out=FILE Write a dependency output file in the format expected by make. This writes the transitive set of input file paths to FILE --error_format=FORMAT Set the format in which to print errors. FORMAT may be 'gcc' (the default) or 'msvs' (Microsoft Visual Studio format). --print_free_field_numbers Print the free field numbers of the messages defined in the given proto files. 
Groups share the same field number space with the parent message. Extension ranges are counted as occupied fields numbers. --plugin=EXECUTABLE Specifies a plugin executable to use. Normally, protoc searches the PATH for plugins, but you may specify additional executables not in the path using this flag. Additionally, EXECUTABLE may be of the form NAME=PATH, in which case the given plugin name is mapped to the given executable even if the executable's own name differs. --cpp_out=OUT_DIR Generate C++ header and source. --csharp_out=OUT_DIR Generate C# source file. --java_out=OUT_DIR Generate Java source file. --javanano_out=OUT_DIR Generate Java Nano source file. --js_out=OUT_DIR Generate JavaScript source. --objc_out=OUT_DIR Generate Objective C header and source. --php_out=OUT_DIR Generate PHP source file. --python_out=OUT_DIR Generate Python source file. --ruby_out=OUT_DIR Generate Ruby source file.
Let's try it:
> ./protoc --decode=query_md5 query_md5.proto < tmp/resp.bin > resp.pb [libprotobuf WARNING ../../src/google/protobuf/compiler/parser.cc:546] No syntax specified for the proto file: query_md5.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.) > cat resp.pb mema { mem1: 2 mem2: 1048643 mem3: "2728272223257z332350 31g+211335322k\" mem4: 11 mem5: 48940096820 mem6: 0 mem7: "2.2.101.27" mem8: 3 mem9: 2422353392 } memb: 0 memc: "g306 07!^G256211%'-m240366 02-" memd: 0 meme: 22177440 memf: "321[363&G 10277307 01340K=306$8243" memf: "1225D363/2 33226xek202375270225`" memf: "232u 275374312346ot206351372334j237253" memf: "(L3532776340 35W\2462233369 33z}" memf: ">