BufferAllocation_1.c xFreeBuffersList memory corruption issue

I’ve written a zero-copy network driver for FreeRTOS+TCP on STM32F4. For the most part it works well, but every once in a while I get an issue with pxGetNetworkBufferWithDescriptor returning NULL. I’m calling it with xBlockTimeTicks set to portMAX_DELAY, so in theory this should never happen.

Looking through the source code, the buffer descriptors are kept in a list called xFreeBuffersList, and are fetched by this line:

~~~
pxReturn = ( NetworkBufferDescriptor_t * ) listGET_OWNER_OF_HEAD_ENTRY( &xFreeBuffersList );
~~~

Following this, some validation is performed on pxReturn to make sure it is a valid descriptor, and NULL is returned if it isn’t. This seems to be where it’s failing:

~~~
if( ( bIsValidNetworkDescriptor( pxReturn ) != pdFALSE_UNSIGNED ) &&
    listIS_CONTAINED_WITHIN( &xFreeBuffersList, &( pxReturn->xBufferListItem ) ) )
~~~

So it’s basically looking up the ListItem_t associated with the network descriptor and verifying that it is part of xFreeBuffersList. listIS_CONTAINED_WITHIN just compares the pvContainer field of the ListItem_t to the list pointer (&xFreeBuffersList in this case). The problem seems to be that pvContainer is set to NULL. This field should be set when the descriptor is inserted into the list (i.e. when it was last released), and the descriptor is definitely getting inserted into the list correctly (it must be, given that we just got it from the head of the list), so this makes me think the list structure is getting corrupted in memory somehow.

A couple of times I’ve also seen a different issue, where listGET_OWNER_OF_HEAD_ENTRY returns some random value which isn’t a buffer descriptor. Whenever this happened, I tracked it down to fields in the list structure being invalid (sometimes NULL, sometimes random values), which also suggests memory corruption to me.

There are only 5 places in my code where I call into BufferAllocation_1.c:
  • During xNetworkInterfaceInitialise, I call pxGetNetworkBufferWithDescriptor multiple times to allocate buffers to the receive DMA descriptors.
  • During xNetworkInterfaceOutput, I assign the buffer passed in by the TCP/IP stack to a transmit DMA descriptor. If this fails (due to a DMA descriptor not being available within a given timeout), I call vReleaseNetworkBufferAndDescriptor to discard the buffer.
  • During the receive deferred interrupt processing task, I attempt to send the received buffer to the TCP/IP stack. If this fails, I call vReleaseNetworkBufferAndDescriptor to discard the buffer.
  • Also in the deferred processing task, I call pxGetNetworkBufferWithDescriptor to replace the receive buffer that was just used.
  • In the transmit complete interrupt handler, I call vNetworkBufferReleaseFromISR to release the transmit buffer.
In all of these cases, I’m calling the buffer allocation functions exactly as described in the documentation, so I don’t think this is the issue. Whenever I have to deal with a buffer in an ISR, I make sure to only call the interrupt-safe functions. Checking in BufferAllocation_1.c, all of the functions that interact with the descriptor list use ipconfigBUFFER_ALLOC_LOCK and ipconfigBUFFER_ALLOC_UNLOCK (or the interrupt-safe equivalents), so I’m not sure where the problem is happening.

I’ve written a simple test application to reproduce the issue – it basically just listens for UDP packets on port 10000 and echoes them back to the sender. I’m setting the FREERTOS_ZERO_COPY flag to reduce unnecessary buffer allocation/deallocation, to rule that out. This is the only task on the system which uses the TCP stack:

~~~
static void echoThread( void *pvParameters )
{
    const uint16_t port = 10000;

    Socket_t sock = FreeRTOS_socket( FREERTOS_AF_INET, FREERTOS_SOCK_DGRAM, FREERTOS_IPPROTO_UDP );
    CAD_ASSERT( sock != FREERTOS_INVALID_SOCKET );

    struct freertos_sockaddr hostAddr;
    hostAddr.sin_port = FreeRTOS_htons( port );
    FreeRTOS_bind( sock, &hostAddr, sizeof( hostAddr ) );

    struct freertos_sockaddr clientAddr;
    uint32_t clientAddrLength = sizeof( clientAddr );
    uint8_t *buffer;
    int32_t length;

    while( 1 )
    {
        /* Zero-copy receive: the stack hands us its own buffer. */
        length = FreeRTOS_recvfrom( sock, &buffer, 0, FREERTOS_ZERO_COPY, &clientAddr, &clientAddrLength );
        if( length > 0 )
        {
            /* Zero-copy send: on success, ownership of the buffer passes back to the stack. */
            length = FreeRTOS_sendto( sock, buffer, length, FREERTOS_ZERO_COPY, &clientAddr, clientAddrLength );
        }

        if( length <= 0 )
        {
            /* Receive or send failed: we still own the buffer, so release it. */
            FreeRTOS_ReleaseUDPPayloadBuffer( buffer );
        }
    }
}
~~~

The issue only shows up very rarely – probably once in every 100,000 to a million packets. The only reliable way I’ve found to reproduce it is to flood the network interface with traffic at a rate of several thousand packets/second. Even then it’s difficult to reproduce, sometimes taking several minutes. If it helps at all, I’ve found that it tends to show up most often just after starting to send packets – stopping and then restarting the packet generator is more likely to trigger it than just leaving it running.

I did find one other post on here (https://freertos.org/FreeRTOSSupportForumArchive/August2016/freertosTCPBufferAllocation1xFreeBuffersListcorruptionbab400aej.html) with a similar issue. In that case it was tracked down to stack overflow (identified by the 0xA5A5A5A5 fill pattern), but in my case I don’t see that pattern anywhere, so I doubt it’s stack overflow. Also, if it were stack overflow I’d expect it to happen a lot more frequently and less randomly than it does.

At this point I’m not sure how I can debug the issue any further, as the problem seems to be happening within the TCP stack itself. Any ideas?

Thanks, Tom

BufferAllocation_1.c xFreeBuffersList memory corruption issue

Hmm, this one is tricky, as you seem to have done a good level of debugging already. Is the driver code all yours, or are you also relying on some STM32 library/Cube functions? If you are using STM32 library functions, do any of them mess with the BASEPRI register? That could potentially affect critical sections. Also, how are you passing received buffers into the TCP/IP stack? Is it from an interrupt? If so, are you then trying to allocate a new buffer from the interrupt too? Are you using a recent version of FreeRTOS with configASSERT() defined? FreeRTOS V10.0.1 would be recommended, as that has asserts for just about all the interrupt priority configuration errors people make.

BufferAllocation_1.c xFreeBuffersList memory corruption issue

We’re not using any of the STM32 library code; all the driver code has been written from scratch. Nothing in my code touches BASEPRI. We’re using FreeRTOS v9, as we started work on the project just before v10 was released and haven’t got round to upgrading yet, but the TCP/IP stack code was taken from the v10 release. configASSERT is defined but isn’t failing anywhere. Receive buffers get passed back to the stack from a deferred processing task, not from the interrupt. The only buffer management that happens in an ISR is releasing the transmit buffers after the transmit complete interrupt.

One thing I’ve just noticed – vNetworkBufferReleaseFromISR calls vListInsertEnd to put the buffer back into the free buffers list. Is this safe to call from an ISR? (It’s wrapped inside portSET_INTERRUPT_MASK_FROM_ISR/portCLEAR_INTERRUPT_MASK_FROM_ISR, but I’m not sure whether this is adequate to make a non-interrupt-safe function interrupt safe.) I’ll try releasing the transmit buffers from the deferred processing task instead of directly from the interrupt, and see if that makes any difference.
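Roughly what I have in mind for the deferred release (simplified and self-contained for illustration – the types, prvDrainTxComplete and prvReleaseBuffer here are made-up stubs; the real code would call vReleaseNetworkBufferAndDescriptor and wake the task with a task notification):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TX_DESC_COUNT 4

/* Stand-in for a transmit DMA descriptor - illustration only. */
typedef struct { int owned_by_dma; void *buffer; } TxDesc_t;

static TxDesc_t xTxRing[ TX_DESC_COUNT ];
static int xReleasedCount = 0;

/* In the real driver this would be vReleaseNetworkBufferAndDescriptor(). */
static void prvReleaseBuffer( void *buf ) { ( void ) buf; xReleasedCount++; }

/* ISR side: only mark the descriptor complete and wake the task
 * (e.g. with vTaskNotifyGiveFromISR) - no list manipulation here. */
static void vTxCompleteISR( int ix ) { xTxRing[ ix ].owned_by_dma = 0; }

/* Task side: walk the ring and release any completed buffers. */
static void prvDrainTxComplete( void )
{
    for( int ix = 0; ix < TX_DESC_COUNT; ix++ )
    {
        if( !xTxRing[ ix ].owned_by_dma && xTxRing[ ix ].buffer != NULL )
        {
            prvReleaseBuffer( xTxRing[ ix ].buffer );
            xTxRing[ ix ].buffer = NULL;
        }
    }
}
```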

BufferAllocation_1.c xFreeBuffersList memory corruption issue

One thing I’ve just noticed – vNetworkBufferReleaseFromISR calls vListInsertEnd to put the buffer back into the free buffers list. Is this safe to call from an ISR? (It’s wrapped inside portSET_INTERRUPT_MASK_FROM_ISR/portCLEAR_INTERRUPT_MASK_FROM_ISR, but I’m not sure whether this is adequate to make a non-interrupt-safe function interrupt safe.)
That should be ok.
I’ll try releasing the transmit buffers from the deferred processing task instead of directly from the interrupt, and see if that makes any difference.
Let us know if that solves the problem.

BufferAllocation_1.c xFreeBuffersList memory corruption issue

Hi Tom,
I’ve written a zero-copy network driver for FreeRTOS+TCP on STM32F4
That is very brave of you, but there is already a zero-copy driver. It is not in the official distribution yet, but it was published on this list. I will post it as an attachment below (STM32Fxx.zip).
but every once in a while I get an issue with pxGetNetworkBufferWithDescriptor returning NULL.
I solved this problem by dropping a packet. When a packet has been received, you need to provide a new empty network buffer. If that is impossible, I would drop the received packet and leave the buffer in place, so it will be used to receive the next packet.
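Roughly, the idea in code (self-contained sketch; prvAllocReplacement stands in for pxGetNetworkBufferWithDescriptor() called with a zero block time, and the other names are illustrative stubs, not real driver code):

```c
#include <assert.h>
#include <stddef.h>

/* Stub allocator: stands in for pxGetNetworkBufferWithDescriptor( size, 0 ). */
static void *pxNextBuffer = NULL;
static void *prvAllocReplacement( void )
{
    void *p = pxNextBuffer;
    pxNextBuffer = NULL;
    return p;
}

static int xDroppedPackets = 0;

/* Called for each received frame: only forward the filled buffer to the
 * stack if a replacement can be obtained immediately; otherwise drop the
 * frame and leave the old buffer attached to the DMA descriptor.
 * Returns the buffer the DMA descriptor should keep. */
static void *prvHandleRxFrame( void *pxCurrentBuffer, int *pxForward )
{
    void *pxNew = prvAllocReplacement();
    if( pxNew == NULL )
    {
        xDroppedPackets++;
        *pxForward = 0;
        return pxCurrentBuffer;   /* reuse: DMA keeps the old buffer */
    }
    *pxForward = 1;               /* old buffer goes to the IP-task */
    return pxNew;                 /* DMA descriptor gets the new buffer */
}
```

This way the receive DMA never loses a buffer, at the cost of dropping a frame when the pool is momentarily empty.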
I’m calling it with xBlockTimeTicks set to portMAX_DELAY
Are you sure such a long block time is wise? Does this happen from prvEMACHandlerTask()? When the function is called from the IP-task (e.g. from xNetworkInterfaceOutput()), you should use a time-out of zero. If not, the IP-task could end up waiting for itself to release a network buffer.
During xNetworkInterfaceInitialise, I call pxGetNetworkBufferWithDescriptor multiple times to allocate buffers to the receive DMA descriptors.
Comment: during initialisation, there must be enough network buffers. You can use configASSERT() for this. If it is triggered, there is a configuration error.
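A rough self-contained sketch of that check (prvGetBuffer is a stand-in for pxGetNetworkBufferWithDescriptor, and the pool here is a stub; real code would use configASSERT() rather than a return code):

```c
#include <assert.h>
#include <stddef.h>

#define RX_DESC_COUNT        3
#define NUM_NETWORK_BUFFERS  8   /* would be ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS */

static int xBuffersLeft = NUM_NETWORK_BUFFERS;

/* Stand-in for pxGetNetworkBufferWithDescriptor( size, 0 ). */
static void *prvGetBuffer( void )
{
    static char xPool[ NUM_NETWORK_BUFFERS ];
    return ( xBuffersLeft > 0 ) ? &xPool[ --xBuffersLeft ] : NULL;
}

/* During xNetworkInterfaceInitialise(): every RX DMA descriptor must get a
 * buffer; an assert here turns a configuration error into a hard stop. */
static int prvInitRxDescriptors( void )
{
    for( int i = 0; i < RX_DESC_COUNT; i++ )
    {
        void *pxBuf = prvGetBuffer();
        if( pxBuf == NULL )
        {
            return 0;   /* real code: configASSERT( pxBuf != NULL ); */
        }
    }
    return 1;
}
```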
During xNetworkInterfaceOutput, I assign the buffer passed in by the TCP/IP stack to a transmit DMA descriptor. If this fails (due to a DMA descriptor not being available within a given timeout), I call vReleaseNetworkBufferAndDescriptor to discard the buffer.
This function has led to some confusion in the past. To summarise:
  • If ipconfigZERO_COPY_TX_DRIVER = 1, the parameter bReleaseAfterSend is always true.
  • If bReleaseAfterSend is true, it means that the driver now owns pxDescriptor. So yes, if sending fails, you must release the network buffer.
  • If sending succeeds, releasing the buffer is deferred. In my zero-copy driver, the releasing is indeed done within prvEMACHandlerTask().
I’ll stop commenting here. Please have a look at the attached driver. It has been tested on both STM32F4 and STM32F7. Regards.
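PS – the ownership rule in code, as a rough self-contained sketch (xSendViaDMA and vReleaseBuffer are made-up stand-ins for queueing the buffer on a TX descriptor and for vReleaseNetworkBufferAndDescriptor):

```c
#include <assert.h>
#include <stddef.h>

static int xReleased = 0;      /* buffers released back to the pool */
static int xOwnedByDriver = 0; /* buffers queued on a TX descriptor */
static int xDmaAccepts = 1;    /* test knob: is a TX descriptor free? */

/* Stand-in for vReleaseNetworkBufferAndDescriptor(). */
static void vReleaseBuffer( void *p ) { ( void ) p; xReleased++; }

/* Stand-in for attaching the buffer to a TX DMA descriptor. */
static int xSendViaDMA( void *p )
{
    ( void ) p;
    if( xDmaAccepts ) { xOwnedByDriver++; }
    return xDmaAccepts;
}

/* With ipconfigZERO_COPY_TX_DRIVER = 1, bReleaseAfterSend is always true:
 * the driver owns the buffer from here on.  If queueing fails, it must
 * release the buffer itself; if queueing succeeds, the release is deferred
 * until the transmit-complete interrupt (or a deferred task). */
static void vOutputSketch( void *pxBuffer, int bReleaseAfterSend )
{
    if( !xSendViaDMA( pxBuffer ) && bReleaseAfterSend )
    {
        vReleaseBuffer( pxBuffer );
    }
}
```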

BufferAllocation_1.c xFreeBuffersList memory corruption issue

That is very brave of you, but there was a zero-copy driver already. It is not in the official distribution yet, but it was published in this list. I will post it as an attachment here below (STM32Fxx.zip).
Thanks Hein, I’ll take a look at that and see how it differs from my code. From a quick glance, it seems like the buffer management is mostly the same, but I’m doing some initial filtering in the ISR to reduce context switches from dropped packets.
I solved this problem by dropping a packet. When a packet has been received, then you need to provide a new empty network buffer. If that is impossible, I would drop the received packet, and leave the buffer in place so it will be used to receive a next package.
I don’t think this will solve my issue, which I believe is down to the buffer list becoming corrupted rather than a timeout (as noted, I’m calling pxGetNetworkBufferWithDescriptor with portMAX_DELAY). Once it gets into this state, every call to pxGetNetworkBufferWithDescriptor returns NULL immediately (i.e. not due to a timeout), so without fixing the underlying issue this would just cause the receive process to drop packets forever.
Are you sure it is wise such a long time? Does this happen from prvEMACHandlerTask() ? When the function is called from the IP-task ( e.g. from xNetworkInterfaceOutput() ), you should use a time-out of zero. If not, the IP-task will probably be waiting for the IP-task to release some network buffer.
Yes, this is only done from the receive task, never from anything called by the IP task. I never allocate buffers in xNetworkInterfaceOutput; I only use the buffer that’s passed in as a parameter. My reason for using an infinite block time is that, once the receive buffer has been given back to the IP task, the new buffer must be allocated to the receive DMA regardless of how long it takes – otherwise the receive DMA will gradually be depleted of buffers. However, I agree with you that it’s better to try to get the replacement buffer first, and if that fails, drop the packet and reuse the existing buffer – I’ll change my code to do that.

BufferAllocation_1.c xFreeBuffersList memory corruption issue

Hi Tom,
but I’m doing some initial filtering in the ISR to reduce context switches from dropped packets.
Maybe you want to share some code?
I don’t think this will solve my issue, which I think is down to the buffer list becoming corrupted
I don’t know where the corruption comes from, but I doubt that there is a problem with BufferAllocation_1.c.
My reason for using an infinite block time is that, once the receive buffer has been given back to the IP task, the new buffer must be allocated to the receive DMA, regardless of how long it takes.
Here I do not agree. My choice was to drop the received packet (not forward it to the IP-task) and leave the buffer in place.
Otherwise the receive DMA will be gradually depleted of buffers.
True, that is why I would not swap the buffer.
I’ll change my code to do that.
Yes, good. But please check whether it really happens, and how often. Regards, Hein