Overview

When doing TCP socket development, you need to handle sticky packets and split packets. This article explains in detail the steps to solve the problem, using Python. The solution is in fact very simple: define a protocol at the application layer of the form message header + message length + message body.

What are sticky packets and split packets?

About split packets and sticky packets

Sticky packet: the sender sends two strings, "hello" and "world", but the receiver gets "helloworld" in a single read.

Split packet: the sender sends the string "helloworld", but the receiver gets two strings, "hello" and "world".
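The two cases above can be observed with a small experiment. This is a minimal sketch, assuming a local pair of connected stream sockets from `socket.socketpair()` instead of a real network connection. How the bytes are grouped into reads depends on timing and on the operating system, so the chunk list is printed without an expected value.

```python
import socket

# socketpair() returns two already-connected stream sockets. They stand in
# for a real TCP connection here (an assumption made for this sketch).
sender, receiver = socket.socketpair()

sender.sendall(b"hello")
sender.sendall(b"world")
sender.close()  # closing lets the receiver see end-of-stream

chunks = []
while True:
    chunk = receiver.recv(1024)
    if not chunk:  # b"" means the peer has closed the connection
        break
    chunks.append(chunk)
receiver.close()

# How the bytes are grouped into chunks is not guaranteed.
print(chunks)

# Their order and content are guaranteed.
received = b"".join(chunks)
print(received)  # b'helloworld'
```

Whatever the grouping turns out to be, joining the chunks always yields the bytes in the order they were sent.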

Although sockets exhibit the problems above, TCP does guarantee two things:

  • The order is preserved. For example, if the sender sends hello, the receiver also receives h, e, l, l, o in that order. TCP guarantees this, and it is the key to solving the split and sticky packet problem.
  • No other data is inserted in the middle of a split packet.

So if you want to communicate over sockets, you must define a protocol yourself. At present the most commonly used layout is: message header + message length + message body.

Why does TCP split packets?

TCP sends data in segments. Once a TCP connection is established, it has a maximum segment size (MSS). If an application-layer packet is larger than the MSS, it is split into two or more segments for sending. The receiving application must then splice the pieces back together to handle the data correctly.

Routers along the path have an MTU (maximum transmission unit), usually 1500 bytes. Subtracting the 20-byte IP header and the 20-byte TCP header leaves MTU - 40 bytes for the TCP payload, so the typical TCP MSS is 1500 - 40 = 1460 bytes.

When application-layer data exceeds 1460 bytes, it is therefore sent in multiple packets.
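The arithmetic above can be checked in a few lines. This is only an illustrative sketch: it assumes a 1500-byte MTU and 20-byte IP and TCP headers without options, and `segment_count` is a hypothetical helper name, not part of any library.

```python
MTU = 1500        # typical Ethernet MTU, in bytes
IP_HEADER = 20    # IPv4 header without options
TCP_HEADER = 20   # TCP header without options
MSS = MTU - IP_HEADER - TCP_HEADER


def segment_count(payload_size, mss=MSS):
    """Minimum number of segments needed for payload_size bytes."""
    return -(-payload_size // mss)  # ceiling division


print(MSS)  # 1460
for size in (100, 1460, 1461, 4000):
    print(size, segment_count(size))
```

A real TCP stack may split data differently, so this gives a lower bound rather than a prediction.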

Further reading on TCP: the RFCs define the default MSS as 536. This is because RFC 791 says that every IP device must be able to receive datagrams of at least 576 bytes (576 is in fact the MTU of dial-up networks), and 576 minus the 20-byte IP header and the 20-byte TCP header is 536.

Why does TCP stick packets together?

Sometimes, to improve network utilization, TCP uses an algorithm called Nagle's algorithm. With this algorithm, the sender may delay sending even when it has data to send. If the application layer hands data to TCP very quickly, two application-level packets are stuck together, and TCP finally sends them to the receiver as a single TCP packet.
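For reference, Nagle's algorithm can be switched off per socket with the `TCP_NODELAY` option. The sketch below only sets and reads back the option on an unconnected socket. Turning Nagle off does not remove the need for a framing protocol, because TCP is still a byte stream and the receiver can still read several messages in one `recv` call.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's algorithm on this socket.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back; a non-zero value means Nagle is disabled.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print(bool(nodelay))  # True

sock.close()
```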

  • Python version: 3.5.1
  • Operating system: Windows 10 x64

Message header (including message length)

The message header does not have to be a single byte such as 0xAA. It can also carry the protocol version number and a command, and the message length can be placed in the header as well. The only requirement is that the header has a fixed length, while the body is variable length. The following is a custom header:

Version (ver) | Message length (bodySize) | Command (cmd)

The version number, message length, and command are all unsigned 32-bit integers, so the header length is fixed at 4 x 3 = 12 bytes. Python has no fixed-width type declarations, so the struct module is generally used to build the header. Example:

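    import struct
    import json

    ver = 1
    body = json.dumps(dict(hello="world"))
    print(body)  # {"hello": "world"}
    cmd = 101
    # len(body) counts characters; that equals the byte count here
    # because the JSON text is pure ASCII.
    header = [ver, len(body), cmd]
    # "!" means network byte order, "3I" means three unsigned ints
    headPack = struct.pack("!3I", *header)
    print(headPack)  # b'\x00\x00\x00\x01\x00\x00\x00\x12\x00\x00\x00e'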
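The receiving side reverses this step with `struct.unpack`. The following is a small sketch of the round trip for the header layout described above; the constant names `HEADER_FORMAT` and `HEADER_SIZE` are introduced here for illustration only.

```python
import struct

HEADER_FORMAT = "!3I"  # network byte order, three unsigned 32-bit ints
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

# Pack a header with ver=1, bodySize=18, cmd=101 ...
head_pack = struct.pack(HEADER_FORMAT, 1, 18, 101)

# ... and unpack it again on the receiving side.
ver, body_size, cmd = struct.unpack(HEADER_FORMAT, head_pack)

print(HEADER_SIZE)          # 12
print(ver, body_size, cmd)  # 1 18 101
```

Using `struct.calcsize` keeps the header size in sync with the format string if a field is added later.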

On using a custom terminator to split packets

Some people prefer to mark the end of each packet with a custom terminator, so that the header does not need to carry the packet length. But this costs network transmission performance, because every byte read has to go through an if check to see whether it is the terminator. The message header + message length + message body layout is therefore recommended.
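For comparison, here is a minimal sketch of terminator-based framing, assuming a newline byte as the terminator. `DELIMITER` and `split_by_delimiter` are hypothetical names chosen for this sketch. `bytes.split` hides the per-byte scan, but the whole buffer still has to be searched for the terminator.

```python
DELIMITER = b"\n"  # hypothetical terminator chosen for this sketch


def split_by_delimiter(buffer):
    """Return (complete messages, leftover bytes) for a delimited buffer."""
    *messages, rest = buffer.split(DELIMITER)
    return messages, rest


messages, rest = split_by_delimiter(b"hello\nwor")
print(messages, rest)  # [b'hello'] b'wor'

# A payload that itself contains the terminator byte is split in two.
broken, _ = split_by_delimiter(b'{"text": "line1\nline2"}\n')
print(len(broken))  # 2
```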

Also, when a custom terminator is used and that symbol appears inside the message body, the data is cut off at that point. The symbol then has to be escaped, much like the backslash in \r\n. For these reasons, using a terminator to split packets is strongly discouraged.

Data format of the message body

The message body can use JSON, which generally carries the actual payload of the message. In the code below I use the data {"hello": "world"} for testing. In Python, the json module is used to generate JSON data.
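A short sketch of encoding and decoding such a body follows. By default `json.dumps` escapes non-ASCII characters, so the text length and the byte length match. Measuring the encoded bytes is still the safer habit, since the length field in the header counts bytes.

```python
import json

payload = {"hello": "world"}

# Sender: serialize to JSON text, then encode to bytes for the socket.
body = json.dumps(payload).encode("utf-8")
print(body)       # b'{"hello": "world"}'
print(len(body))  # 18

# Receiver: decode the bytes and parse the JSON text.
decoded = json.loads(body.decode("utf-8"))
print(decoded == payload)  # True
```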

Python example

The following Python code shows how to handle sticky and split packets on a TCP socket. The core is a receive buffer, dataBuffer, used as a FIFO queue, plus a small inner while loop that checks it.

The process is as follows. Data read from the socket is appended to dataBuffer, and then the inner loop is entered. If dataBuffer holds fewer bytes than the header length, leave the inner loop and continue receiving. Otherwise, read the header from the buffer and take the body length (bodySize) from it, then check whether header length + body length exceeds the amount of data in the buffer. If it does, the packet is incomplete, so leave the inner loop and continue receiving. If not, read the body, process the data, and finally remove the header and body from dataBuffer (dequeue).

[Flowchart of the receive process, drawn with Markdown]
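The process described above can also be isolated from the socket code as a pure function, which makes it easy to test. This is a sketch under the article's 12-byte header layout; `extract_frames` and `make_frame` are hypothetical helper names, not part of the article's listings.

```python
import struct

HEADER_FORMAT = "!3I"  # ver, bodySize, cmd in network byte order
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def extract_frames(data_buffer):
    """Return (list of (header, body) tuples, unconsumed bytes)."""
    frames = []
    while True:
        if len(data_buffer) < HEADER_SIZE:
            break  # not even a full header yet
        head_pack = struct.unpack(HEADER_FORMAT, data_buffer[:HEADER_SIZE])
        body_size = head_pack[1]
        if len(data_buffer) < HEADER_SIZE + body_size:
            break  # split packet: wait for the rest of the body
        body = data_buffer[HEADER_SIZE:HEADER_SIZE + body_size]
        frames.append((head_pack, body))
        # sticky packet: drop the consumed frame and keep looping
        data_buffer = data_buffer[HEADER_SIZE + body_size:]
    return frames, data_buffer


def make_frame(ver, cmd, body):
    """Hypothetical helper: build header + body for the tests below."""
    return struct.pack(HEADER_FORMAT, ver, len(body), cmd) + body


stream = make_frame(1, 101, b"abc") + make_frame(2, 102, b"defg")

# The first read delivers one full frame plus part of the next header.
frames, rest = extract_frames(stream[:20])
print(frames)     # [((1, 3, 101), b'abc')]
print(len(rest))  # 5

# The second read delivers the remainder.
frames2, rest2 = extract_frames(rest + stream[20:])
print(frames2)    # [((2, 4, 102), b'defg')]
```

The caller keeps the returned remainder and prepends it to the next chunk read from the socket.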

    # Python Version: 3.5.1
    # server-side code
    import socket
    import struct

    HOST = ''
    PORT = 1234

    dataBuffer = bytes()
    headerSize = 12

    sn = 0


    def dataHandle(headPack, body):
        global sn
        sn += 1
        print("Packet %s" % sn)
        print("ver:%s, bodySize:%s, cmd:%s" % headPack)
        print(body.decode())
        print("")


    if __name__ == '__main__':
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind((HOST, PORT))
            s.listen(1)
            conn, addr = s.accept()
            with conn:
                print('Connected by', addr)
                while True:
                    data = conn.recv(1024)
                    if not data:
                        # An empty read means the client closed the connection.
                        break
                    # Store the data in the buffer, similar to a push
                    dataBuffer += data
                    while True:
                        if len(dataBuffer) < headerSize:
                            print("Packet (%s Byte) is shorter than the header length, leaving the inner loop" % len(dataBuffer))
                            break

                        # Read the header
                        # In struct, "!" means network order and "3I" means 3 unsigned ints
                        headPack = struct.unpack('!3I', dataBuffer[:headerSize])
                        bodySize = headPack[1]

                        # Split packet: leave the inner loop and keep receiving
                        if len(dataBuffer) < headerSize + bodySize:
                            print("Packet (%s Byte) is incomplete (total %s Byte), leaving the inner loop" % (len(dataBuffer), headerSize + bodySize))
                            break

                        # Read the message body
                        body = dataBuffer[headerSize:headerSize + bodySize]

                        # Process the data
                        dataHandle(headPack, body)

                        # Sticky packet: move on to the next packet, similar to a pop
                        dataBuffer = dataBuffer[headerSize + bodySize:]

Client code for testing the server

Below is the client code used to test sticky and split packets:

    # Python Version: 3.5.1
    # client code
    import socket
    import time
    import struct
    import json

    host = "localhost"
    port = 1234

    ADDR = (host, port)

    if __name__ == '__main__':
        client = socket.socket()
        client.connect(ADDR)

        # Normal packet definition
        ver = 1
        body = json.dumps(dict(hello="world"))
        print(body)
        cmd = 101
        header = [ver, len(body), cmd]
        headPack = struct.pack("!3I", *header)
        sendData1 = headPack + body.encode()

        # Split packet definition
        ver = 2
        body = json.dumps(dict(hello="world2"))
        print(body)
        cmd = 102
        header = [ver, len(body), cmd]
        headPack = struct.pack("!3I", *header)
        sendData2_1 = headPack + body[:2].encode()
        sendData2_2 = body[2:].encode()

        # Sticky packet definition
        ver = 3
        body1 = json.dumps(dict(hello="world3"))
        print(body1)
        cmd = 103
        header = [ver, len(body1), cmd]
        headPack1 = struct.pack("!3I", *header)

        ver = 4
        body2 = json.dumps(dict(hello="world4"))
        print(body2)
        cmd = 104
        header = [ver, len(body2), cmd]
        headPack2 = struct.pack("!3I", *header)

        sendData3 = headPack1 + body1.encode() + headPack2 + body2.encode()

        # Normal packet
        client.sendall(sendData1)
        time.sleep(3)

        # Split packet test
        client.sendall(sendData2_1)
        time.sleep(0.2)
        client.sendall(sendData2_2)
        time.sleep(3)

        # Sticky packet test
        client.sendall(sendData3)
        time.sleep(3)
        client.close()

Server output

Below are the results printed by the server during the test. The receiver handles both the sticky packet and the split packet correctly.

    Connected by ('127.0.0.1', 23297)
    Packet 1
    ver:1, bodySize:18, cmd:101
    {"hello": "world"}

    Packet (0 Byte) is shorter than the header length, leaving the inner loop
    Packet (14 Byte) is incomplete (total 31 Byte), leaving the inner loop
    Packet 2
    ver:2, bodySize:19, cmd:102
    {"hello": "world2"}

    Packet (0 Byte) is shorter than the header length, leaving the inner loop
    Packet 3
    ver:3, bodySize:19, cmd:103
    {"hello": "world3"}

    Packet 4
    ver:4, bodySize:19, cmd:104
    {"hello": "world4"}

Handling sticky and split packets in a framework

In fact, whether you use a blocking or an asynchronous socket development framework, the framework itself provides a data-receiving method for developers, who generally have to override it. Below is a sample that handles sticky and split packets in the Twisted framework, showing only the core code:

    # Twisted
    import struct

    from twisted.internet.protocol import Protocol


    class MyProtocol(Protocol):
        _data_buffer = bytes()

        # code omitted

        def dataReceived(self, data):
            """Called whenever data is received."""
            self._data_buffer += data
            headerSize = 12

            while True:
                if len(self._data_buffer) < headerSize:
                    return

                # Read the message header
                # In struct, "!" means network order and "3I" means 3 unsigned ints
                headPack = struct.unpack('!3I', self._data_buffer[:headerSize])
                # Get the message body length
                bodySize = headPack[1]

                # Split packet handling
                if len(self._data_buffer) < headerSize + bodySize:
                    return

                # Read the message body content
                body = self._data_buffer[headerSize:headerSize+body
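The same pattern can be sketched with the standard library's `asyncio.Protocol` in place of Twisted. This is an analogous illustration, not the article's Twisted code: `FrameProtocol` and `message_received` are hypothetical names, and `data_received` is called by hand here rather than by an event loop.

```python
import asyncio
import struct

HEADER_FORMAT = "!3I"  # ver, bodySize, cmd in network byte order
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


class FrameProtocol(asyncio.Protocol):
    def __init__(self):
        self._data_buffer = b""
        self.messages = []

    def data_received(self, data):
        """Called by the event loop whenever bytes arrive."""
        self._data_buffer += data
        while len(self._data_buffer) >= HEADER_SIZE:
            head_pack = struct.unpack(
                HEADER_FORMAT, self._data_buffer[:HEADER_SIZE])
            end = HEADER_SIZE + head_pack[1]
            if len(self._data_buffer) < end:
                return  # split packet: wait for more data
            body = self._data_buffer[HEADER_SIZE:end]
            self._data_buffer = self._data_buffer[end:]
            self.message_received(head_pack, body)

    def message_received(self, head_pack, body):
        # Hypothetical hook: application code would go here.
        self.messages.append((head_pack, body))


# Drive the protocol by hand: one split frame, then a sticky pair.
protocol = FrameProtocol()
frame = struct.pack(HEADER_FORMAT, 1, 5, 101) + b"hello"
protocol.data_received(frame[:7])
protocol.data_received(frame[7:] + frame)
print(protocol.messages)
# [((1, 5, 101), b'hello'), ((1, 5, 101), b'hello')]
```

Keeping the buffer on the instance, as here, gives each connection its own buffer.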

