FreeRTOS-TCP performance issues with LPC1768

Hello, I am having some performance and reliability issues with the FreeRTOS-TCP stack on an NXP LPC1768. I am using FreeRTOS v10.1.0 and the NetworkInterface.c file from the download here https://interactive.freertos.org/hc/en-us/community/posts/210030166-FreeRTOS-TCP-Labs-port-and-demo-for-Embedded-Artists-LPC4088-Dev-Kit which in turn requires LPCOpen v2.10. FreeRTOS is configured with 8 priority levels. The Ethernet Rx task is created at priority 7 (configMAX_PRIORITIES - 1) and the IP task then becomes priority 6.

The network stack is running and gets an IP address from a DHCP server. If I then ping the device from a PC elsewhere on the network, I get a reply time of 0.3 ms.

I am trying to integrate the Minnow WebSockets server (and ultimately the SharkSSL security layer) from Real Time Logic. When I start the task that runs the WebSockets server (currently at priority 4), the ping response time goes out to between 3 and 5 seconds, even when no browser is trying to connect! When a browser tries to make the initial connection to the server to get the static content (index.html, some JavaScript files and images), the transfers all start to time out. Looking at the network traffic with Wireshark, I see lots of reset and TCP retransmission entries. With the debug output enabled in the driver, it seems that there are lots of receive interrupts occurring with no data associated.

I am at a loss as to how to debug this from here. I tried the LPC17xx NetworkInterface.c file that ships with FreeRTOS v10.1.0, but it doesn't appear to compile against the FreeRTOS-TCP and LPCOpen files. -Andy.

FreeRTOS-TCP performance issues with LPC1768

I am trying to integrate the Minnow WebSockets server (and ultimately the SharkSSL security layer) from Real Time Logic. When I start the task that runs the WebSockets server (currently at priority 4), the ping response time goes out to between 3 and 5 seconds, even when no browser is trying to connect!
This seems to be the thing to understand first. If I understand your description correctly, adding the above referenced library is the only change made between getting a 0.3 ms ping reply and a 3 to 5 second ping reply. That is so massively long that there must be some coarse optimisation to be done to remove most of that latency before looking in detail at any performance gains that might be made in the MAC driver itself or in the interface between the MAC and the TCP stack. I'm not familiar with the library, but I am going to guess it is doing something that is not very multithreading friendly. Are you able to trace the execution using something like Percepio Tracealyzer (http://www.freertos.org/trace) or Segger SystemView for FreeRTOS? That will help you understand where the time is being eaten up and enable a more targeted debug effort.

FreeRTOS-TCP performance issues with LPC1768

I've integrated the Percepio Tracealyzer code and captured some traces; lots of the early ones seemed to be filled with nothing more than the system tick, so I have disabled capture of that in the hope it records more useful events instead. I'm not sure exactly how to interpret what I am looking at, but the only thing I can see of any significant duration is listed in the NetEvnt queue, where the IP-task's xQueueReceive blocks for 100 ms trying to receive, fails to receive, and then blocks again for 900 ms, as shown in the attached image. This is something that repeats over and over. -Andy.

FreeRTOS-TCP performance issues with LPC1768

According to the post here, someone else ported this driver to the LPC1768 and appears to have hit the same problems I am seeing. I tried to contact them through the interactive site back in September, when I thought we were about to embark on this project, but they have not responded to my message. -Andy.

FreeRTOS-TCP performance issues with LPC1768

I think I may have made some progress with this… In the handler task that deals with received buffers from the hardware, the code checked whether the receive queue was non-empty with an if() and then processed only the first buffer. Changing it to a while(!empty) loop, so that all the pending data is processed, seems to have brought the ping response consistently down to 0.5 ms.
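In code terms the change was roughly the following. This is only a sketch: prvRxQueueNotEmpty() and prvPassRxBufferToStack() are hypothetical stand-ins for however the real driver tests its descriptor ring and hands a frame to the stack.

~~~
#include "FreeRTOS.h"
#include "task.h"

static BaseType_t prvRxQueueNotEmpty( void );   /* hypothetical helper */
static void prvPassRxBufferToStack( void );     /* hypothetical helper */

/* Deferred-interrupt handler task for received frames. Before the fix,
 * a single if() meant that only one frame was processed per wake-up,
 * leaving the rest queued until the next interrupt. */
static void prvEMACRxHandlerTask( void *pvParameters )
{
    for( ;; )
    {
        /* Block until the Rx ISR signals that frames are pending. */
        ulTaskNotifyTake( pdTRUE, portMAX_DELAY );

        /* Was: if( prvRxQueueNotEmpty() ) { ... } */
        while( prvRxQueueNotEmpty() )
        {
            /* Hand one received buffer to the TCP/IP stack. */
            prvPassRxBufferToStack();
        }
    }
}
~~~

-Andy.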

FreeRTOS-TCP performance issues with LPC1768

Hi Andy,
Changing it to a while(!empty) loop, so that all the pending data is processed
That's indeed the usual way of working: the working task is woken up by the MAC interrupt ( either by a task notification or a semaphore ). The working task checks both queues, reception and transmission, each in a while(!empty) loop. While it is working on these queues, new notifications may come in. That is no problem: the next xTaskNotifyWait() will return immediately without blocking.
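A minimal sketch of that structure, assuming task notifications are used ( prvDrainRxQueue() and prvDrainTxQueue() are hypothetical names for the driver's actual queue-servicing routines ):

~~~
#include <limits.h>
#include "FreeRTOS.h"
#include "task.h"

static void prvDrainRxQueue( void );   /* hypothetical: while( !rx-empty ) pass frames to the stack */
static void prvDrainTxQueue( void );   /* hypothetical: while( !tx-empty ) reclaim sent buffers */

static void prvEMACHandlerTask( void *pvParameters )
{
    uint32_t ulNotifiedValue;

    for( ;; )
    {
        /* If the MAC interrupt fired while the queues were being drained,
         * a notification is already pending and this call returns
         * immediately instead of blocking. */
        xTaskNotifyWait( 0UL, ULONG_MAX, &ulNotifiedValue, portMAX_DELAY );

        prvDrainRxQueue();
        prvDrainTxQueue();
    }
}
~~~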
seems to have brought the ping response consistently down to 0.5 ms
Great! In case you have Linux: try sudo ping -f <address> to generate a continuous flood of ICMP packets. Here are some items that determine the performance of an Ethernet driver:
  • Filtering unwanted traffic using MAC filters and hash filters.
  • Early filtering: packets that are accepted by the MAC can still be filtered by the driver: think of unwanted broadcast packets. It is a waste of time to pass them to the IP-stack if they're not used later on ( see the sketch at the end of this post ).
  • Consider using multicast instead of broadcast. I have seen networks that are littered with IPv4 broadcasts.
  • CRC offloading, unfortunately not available on an LPC1768
  • Use zero-copy methods wherever possible
Normal optimisations:
  • Disable stack checking
  • Use a higher C compiler optimisation level ( e.g. -O2 )
  • Disable asserts ( if you dare )
You can also check the network performance with iperf3; see this post.
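To illustrate the early-filtering item above: FreeRTOS+TCP provides eConsiderFrameForProcessing(), which lets a driver ask whether a received frame deserves a Network Buffer at all. A sketch, in which prvForwardFrameToStack() and prvRecycleRxBuffer() are hypothetical driver helpers:

~~~
#include "FreeRTOS.h"
#include "FreeRTOS_IP.h"

static void prvForwardFrameToStack( uint8_t *pucFrame, size_t xLength );  /* hypothetical */
static void prvRecycleRxBuffer( uint8_t *pucFrame );                      /* hypothetical */

static void prvFilterAndForward( uint8_t *pucFrame, size_t xLength )
{
    /* Checks the destination MAC address and frame type without
     * copying anything. */
    if( eConsiderFrameForProcessing( pucFrame ) == eProcessBuffer )
    {
        prvForwardFrameToStack( pucFrame, xLength );
    }
    else
    {
        /* Unwanted broadcast, foreign MAC address, unknown EtherType:
         * drop it here rather than waking the IP-task for nothing. */
        prvRecycleRxBuffer( pucFrame );
    }
}
~~~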

FreeRTOS-TCP performance issues with LPC1768

Things aren't quite perfect: a prolonged ping test (64 bytes) will run for several hours before the system locks up. I have added a number of breakpoints to the error handlers in the Ethernet driver to see whether it traps in any of those, but it doesn't look like it does. I'll try to capture the trace buffer the next time it fails, to see whether that shows where it has gone. -Andy.

FreeRTOS-TCP performance issues with LPC1768

a prolonged ping test (64 bytes)
Are you using the ping flood ( -f ) option for this? Or is it just a slow ping?
run for several hours before the system locks up.
When it is locked up, can't you pause the debugger and see where it hangs? Is the system totally locked up, or just extremely slow? Can you show some heartbeat, e.g. an LED that is blinked from a user task?
If you look at the NetworkInterface.c of e.g. STM32Fx, you will see that prvEMACHandlerTask() can issue some warnings:
~~~
Network buffers: 4 lowest 2
TX DMA buffers: lowest 0
Queue space: lowest 34
~~~
It can be useful to monitor these resources: the number of free Network Buffers, the number of available DMA buffers, whether all DMA transmissions are successful, and whether there is a ( DMA ) reception overflow ( which can be nasty ). It is also wise to monitor the free space on the heap.
While writing this message, I had ping running, talking to a FreeRTOS+TCP device:
~~~
root@ubuntu:~# ping -f 192.168.2.106
PING 192.168.2.106 (192.168.2.106) 56(84) bytes of data.
.^C
--- 192.168.2.106 ping statistics ---
1397706 packets transmitted, 1397705 received, 0% packet loss, time 1094968ms
rtt min/avg/max/mdev = 0.330/0.698/69.936/0.579 ms, pipe 4, ipg/ewma 0.783/0.655 ms
~~~
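A simple way to get similar numbers on the LPC1768 is to poll the stack's bookkeeping functions from a low-priority task. A minimal sketch: FreeRTOS_printf() is the logging macro used by the +TCP demos, and note that the heap watermark functions exist for heap_4.c but not for heap_3.c, which merely wraps malloc():

~~~
#include "FreeRTOS.h"
#include "task.h"
#include "FreeRTOS_IP.h"
#include "NetworkBufferManagement.h"

static void prvResourceMonitorTask( void *pvParameters )
{
    for( ;; )
    {
        /* Current and lowest-ever number of free Network Buffers. */
        FreeRTOS_printf( ( "Network buffers: %u lowest %u\n",
                           ( unsigned ) uxGetNumberOfFreeNetworkBuffers(),
                           ( unsigned ) uxGetMinimumFreeNetworkBuffers() ) );

        /* Heap watermarks: available with heap_4.c, not heap_3.c. */
        FreeRTOS_printf( ( "Heap: free %u lowest %u\n",
                           ( unsigned ) xPortGetFreeHeapSize(),
                           ( unsigned ) xPortGetMinimumEverFreeHeapSize() ) );

        vTaskDelay( pdMS_TO_TICKS( 10000UL ) );
    }
}
~~~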

FreeRTOS-TCP performance issues with LPC1768

When it gives up it appears to have hit the hard fault handler, but I haven't worked out what caused it to get there. The ping test was just done with a slow ping; I will repeat the test with the -f option and see what happens. Looking again at the initial issue I raised regarding the Minnow Server: Wireshark shows the HTTP request coming from the browser and an ACK coming back from the device, but the device then doesn't seem to send any of the data from the web server. The support team at Real Time Logic believe there is a problem with the way that large packets are sent, possibly that the driver just gives up on them. -Andy.

FreeRTOS-TCP performance issues with LPC1768

Running ping with -f shows just one . on the screen, then:
~~~
PING 192.168.10.184 (192.168.10.184): 56 data bytes
..Request timeout for icmp_seq 3659
..Request timeout for icmp_seq 3835
.Request timeout for icmp_seq 3836
~~~
Every request thereafter times out.
EDIT: I have removed the if (bReleaseAfterSend != pdFALSE) check from around the call to vReleaseNetworkBufferAndDescriptor(pxNetworkBuffer); and performance seems to be improving. The flood ping test now gives:
~~~
--- 192.168.10.184 ping statistics ---
893181 packets transmitted, 893086 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.298/0.444/14.706/0.055 ms
~~~
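For readers following along: the tail of a typical xNetworkInterfaceOutput() looks roughly like the sketch below (not the exact LPC1768 driver code), and the if() mentioned above is the ownership test that was removed.

~~~
#include "FreeRTOS.h"
#include "FreeRTOS_IP.h"
#include "NetworkBufferManagement.h"

BaseType_t xNetworkInterfaceOutput( NetworkBufferDescriptor_t * const pxNetworkBuffer,
                                    BaseType_t bReleaseAfterSend )
{
    /* ... copy pxNetworkBuffer->pucEthernetBuffer into a free DMA
     * descriptor and start the transmission ... */

    if( bReleaseAfterSend != pdFALSE )
    {
        /* The stack has passed ownership to the driver, so the buffer
         * must be freed here once it is safely queued for DMA. */
        vReleaseNetworkBufferAndDescriptor( pxNetworkBuffer );
    }

    return pdTRUE;
}
~~~

Releasing unconditionally is only safe if the stack never passes a buffer it intends to keep; if the original code was leaking buffers, the underlying bug may have been a path that skipped the release even though bReleaseAfterSend was pdTRUE.

-Andy.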

FreeRTOS-TCP performance issues with LPC1768

Not sure whether this makes any difference, but I am using heap_4.c and BufferAllocation_1.c. EDIT: For other reasons I have changed this to use heap_3.c and forced the heap into one of the 16 KB AHBSRAM regions in the LPC1768, and the behaviour is much the same.
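Incidentally, with heap_4.c the heap can also be pinned to that AHB SRAM bank via configAPPLICATION_ALLOCATED_HEAP. A sketch; the section name .bss.$RamAHB32 is the LPCOpen/LPCXpresso managed-linker convention and will differ on other toolchains:

~~~
/* FreeRTOSConfig.h */
#define configAPPLICATION_ALLOCATED_HEAP    1

/* In one C file: the application provides the heap array and places
 * it in the 16 KB AHB SRAM bank of the LPC1768. */
__attribute__( ( section( ".bss.$RamAHB32" ) ) )
uint8_t ucHeap[ configTOTAL_HEAP_SIZE ];
~~~

-Andy.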

FreeRTOS-TCP performance issues with LPC1768

I have added some of the warning messages from the STM32Fx driver and I see that the number of network buffers reported by uxGetMinimumFreeNetworkBuffers() does go to 0. Should the network driver be blocking somewhere to wait for one to free up? -Andy.

FreeRTOS-TCP performance issues with LPC1768

When the number of available Network Buffers is getting close to zero, you have a serious problem. Here are some simple strategies to investigate it. I would try not to block on availability, and instead just drop the packet, counting every drop ( see the sketch below ):
  • Suppose your driver received a packet and you want to copy it into a Network Buffer, but none is available: increase an error counter and drop the packet.
  • For a zero-copy driver: a packet has arrived, but you do not have a new Network Buffer to replace it with: leave the current Network Buffer assigned to DMA and drop the packet.
  • Suppose xNetworkInterfaceOutput() is called and there is no DMA buffer available within e.g. 10 ms: increase a counter and drop the packet.
At every place where packets are dropped, please update a counter so that you can study the behaviour!
UDP packets are stored in Network Buffers, so it is important that the task that owns the socket actually reads/consumes them. If not, you lose valuable Network Buffers. If the CPU is very busy, it may happen that such a task doesn't get enough CPU time to process them ( please note the macro ipconfigUDP_MAX_RX_PACKETS, which can help here ).
In principle, the IP-task should never block on getting resources. One exception is xNetworkInterfaceOutput(), which may wait a few ms for an available DMA buffer. But if it waited for a Network Buffer, in most cases it would be waiting for itself.
In the end, you do not want a TCP/IP stack in which packets are dropped regularly. The above method is a way of studying the behaviour. Once all parameters are well tuned, packets are rarely dropped and there should always be enough Network Buffers.
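A sketch of the drop-and-count approach for the receive path. pxGetNetworkBufferWithDescriptor() and xSendEventStructToIPTask() are real FreeRTOS+TCP calls ( the latter is declared in FreeRTOS_IP_Private.h ); the surrounding function and the counter are illustrative:

~~~
#include <string.h>
#include "FreeRTOS.h"
#include "FreeRTOS_IP.h"
#include "FreeRTOS_IP_Private.h"
#include "NetworkBufferManagement.h"

static volatile uint32_t ulRxDropCount = 0;  /* study this counter */

static void prvHandleReceivedFrame( const uint8_t *pucFrame, size_t xLength )
{
    NetworkBufferDescriptor_t *pxBuffer;
    IPStackEvent_t xRxEvent;

    /* Block time 0: if no Network Buffer is free right now, drop. */
    pxBuffer = pxGetNetworkBufferWithDescriptor( xLength, 0U );

    if( pxBuffer == NULL )
    {
        ulRxDropCount++;
        return;
    }

    memcpy( pxBuffer->pucEthernetBuffer, pucFrame, xLength );
    pxBuffer->xDataLength = xLength;

    xRxEvent.eEventType = eNetworkRxEvent;
    xRxEvent.pvData = ( void * ) pxBuffer;

    if( xSendEventStructToIPTask( &xRxEvent, 0U ) == pdFAIL )
    {
        /* The IP-task's event queue was full: release and count. */
        vReleaseNetworkBufferAndDescriptor( pxBuffer );
        ulRxDropCount++;
    }
}
~~~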