Need general advice on debugging a crashing application

Hi there, I have an application that is exhibiting odd crashing behavior. Sometimes it will throw an exception related to accessing out-of-bounds memory (usually in the queue handler). Sometimes it will lock up without throwing an exception. I have tried to isolate the source of the crash and noticed that it seemed related to a queue I use to pass pointers to a memory array between processes. Commenting out the reading or writing of this queue seems to abate the issue. I have seen this type of odd behavior before, caused by accidentally using xSemaphoreGive instead of xSemaphoreGiveFromISR within an ISR, or by having an interrupt priority level set higher than configMAX_SYSCALL_INTERRUPT_PRIORITY. I have verified that neither of these is the case, and additionally the queue is not filled or read from an ISR, so I began to suspect stack overflows. I have set configCHECK_FOR_STACK_OVERFLOW to 3 but the overflow trap is never reached. I am open to suggestions on how to proceed, as for now I am very confused.

Which port are you using? It makes a difference to the answer. Do you have configASSERT() defined? 3 is not a valid value for configCHECK_FOR_STACK_OVERFLOW; try setting it to 2.
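For reference, checking method 2 also requires the application to supply the overflow hook. A sketch follows; the halt-loop body is an assumption (any definition that traps will do), and the exact type of the pcTaskName parameter varies slightly between FreeRTOS versions:

```c
/* In FreeRTOSConfig.h -- method 2 checks guard bytes written to the end
   of each task stack, catching overflows that method 1 can miss. */
#define configCHECK_FOR_STACK_OVERFLOW    2

/* The kernel calls this hook when an overflow is detected. */
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
{
    ( void ) xTask;
    ( void ) pcTaskName;           /* Inspect in the debugger. */
    taskDISABLE_INTERRUPTS();
    for( ;; );                     /* Halt so the fault is visible. */
}
```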

Hi Aiden, There is always a bit of guessing about what someone’s program is supposed to do before one can understand what might go wrong.
accidentally using xSemaphoreGive instead of xSemaphoreGiveFromISR within an ISR
These things must be absolutely clear and correct 🙂 Note that calling xSemaphoreGiveFromISR() from an ISR does not cause a task switch by itself. The interrupt should end with code like this:
~~~~~
portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
~~~~~
having an interrupt priority level set higher than configMAX_SYSCALL_INTERRUPT_PRIORITY.
I suppose that you’ve noticed that a higher priority is often expressed in a lower number? When debugging with configASSERT() enabled, the priorities may be checked for you in port.c:
~~~~~
configASSERT( ucCurrentPriority >= ucMaxSysCallPriority );
~~~~~
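When configASSERT() is left undefined, these checks compile away to nothing. A common way to define it in FreeRTOSConfig.h is sketched below; this exact body is an assumption, and any variant that halts the processor will serve:

```c
/* In FreeRTOSConfig.h: halt with interrupts disabled so the debugger
   stops on the failing line instead of running past the fault. */
#define configASSERT( x )             \
    if( ( x ) == 0 )                  \
    {                                 \
        taskDISABLE_INTERRUPTS();     \
        for( ;; );                    \
    }
```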
Besides checking for stack overflows: make sure that your interrupts do not cause any corruption, and protect the tasks against your interrupts.
as for now I am very confused
As a general rule: try systematic exclusion of parts of the code. Maybe you want to show what happens in your ISRs? Can you show some literal code? Regards.

Hi, I’m using the PIC32MZ port that comes with the demo – V8.0.1. I have configASSERT defined, but I am not entirely sure how to make use of it other than by checking parameters passed when creating new tasks. I’ve tried 2, but it made no difference: no stack overflow was detected, the system just halted.

Hi Hein, I will try to answer these in order. I am aware that giving the semaphore from an ISR does not cause a context switch on its own. Here is an example of an ISR where I think I am following the correct procedure:

~~~~~
void vDMA3Handler( void )
{
    /* Interrupt is called when a whole block has been moved. */
    portBASE_TYPE xHigherPriorityTaskWoken = pdFALSE;

    DCH3INTCLR = 0x00ff00ff;
    IFS4CLR = _IFS4_DMA3IF_MASK;    /* Clear any pending DMA channel 3 interrupt. */

    /* Set up the DMA channel for the next block:
       initialize DMA3 to pack frames from UART3. */
    AFEFrameWR++;
    if( AFEFrameWR == numAFEframes ) { AFEFrameWR = 0; }
    DCH3DSA = KVA_TO_PA( &AFEFrames[ AFEFrameWR ][ AFEHdrwid ] );
    DCH3INTSET = 1 << _DCH3INT_CHBCIE_POSITION;    /* DMA channel 3 block done interrupt enable. */
    DCH3CONSET = 1 << _DCH3CON_CHEN_POSITION;      /* Enable channel 3 RX. */

    /* Unblock the task. */
    xSemaphoreGiveFromISR( xSemaphoreAFEFrameReady, &xHigherPriorityTaskWoken );

    /* See comment above the call to xSemaphoreGiveFromISR(). */
    portEND_SWITCHING_ISR( xHigherPriorityTaskWoken );
}
~~~~~

Regarding your statement that higher priorities are lower numbers: this worries me a lot, since I had been under the impression that higher priority numbers mean higher priority! Is this true in all ports? On PIC chips the minimum processor/interrupt priority is 0 while the maximum is 7 – do FreeRTOS ports follow the convention of the hardware they run on, or do they always treat lower numbers as higher priority? Regarding your statement to ‘protect the tasks from the ISRs’: I am not sure how to do this. For example, I was under the impression that I could access OS objects such as queues, mutexes, semaphores and timers without having to disable interrupts – is this wrong?

Hi Aiden, The FreeRTOS port will follow the priority numbering of the hardware, so if Microchip goes from 0 (lowest) to 7 (highest), that is the order used in the source code. The FreeRTOS APIs are all interrupt-safe; they use critical sections when needed. What I meant was to protect your own data (like AFEFrameWR) and resources against sudden (unexpected) changes from your interrupts. Regards.

Hi Hein, Thanks for confirming those points. It’s nice to know that my mental model of how the OS should work matches how it does work, but at the same time I’ve obviously still made a terrible mistake somewhere or it wouldn’t be crashing 🙂 I take care of variables like AFEFrameWR by giving them the ‘volatile’ qualifier, which in the compiler I use indicates that they are modified by interrupts, so every time one is referenced it is first loaded from memory. That said, this particular variable is only used inside the ISR to keep track of where the next write should occur. What’s going on in the part of the code that seems to have trouble is that I have data streaming in over UART3, which goes via a DMA channel to fill a buffer. When the block transfer is complete (when the buffer is full) the DMA interrupt fires to direct the DMA to the next buffer, and gives a semaphore that unblocks a task:

~~~~~
while( 1 )
{
    /* Wait for the next frame of AFE data. */
    xSemaphoreTake( xSemaphoreAFEFrameReady, portMAX_DELAY );
    U4TXREG = 'C';

    /* Ship out the previous frame for processing/logging. */
    Messtart = ( unsigned char * ) &AFEFrames[ AFEFrameRD ][ 0 ];
    xQueueSendToBack( LFQueue, ( void * ) &Messtart, ( TickType_t ) 1 );

    /* Set the time stamp in the next set of AFE data: convert it to a
       CPU time and week number if applicable, then load these into the
       next frame. */
    AFEFrameRD++;
    if( AFEFrameRD == numAFEframes ) { AFEFrameRD = 0; }

    /* Since the CPU clocks are locked, we know exactly how many
       20 MHz timer ticks until the next frame. */
    AFEnextframetime = AFEnextframetime + ( 2016 * numAFEsetstocollect );

    /* Lock the PPS table before proceeding. */
    xSemaphoreTake( xMutexTimestampCPUT, portMAX_DELAY );
    flags = TimestampCPUT( AFEnextframetime, &TOWGPSps, &WNGPS );
    xSemaphoreGive( xMutexTimestampCPUT );

    if( flags == -1 )
    {
        /* Error. */
        TOWGPSps = 0;
        WNGPS = 0;
    }

    AFEFrames[ AFEFrameRD ][ 8 ]  = WNGPS & 0x00FF;
    AFEFrames[ AFEFrameRD ][ 9 ]  = WNGPS >> 8;
    AFEFrames[ AFEFrameRD ][ 10 ] = ( TOWGPSps )         & 0x00000000000000FFll;
    AFEFrames[ AFEFrameRD ][ 11 ] = ( TOWGPSps >> 8 )    & 0x00000000000000FFll;
    AFEFrames[ AFEFrameRD ][ 12 ] = ( TOWGPSps >> 16 )   & 0x00000000000000FFll;
    AFEFrames[ AFEFrameRD ][ 13 ] = ( TOWGPSps >> 24 )   & 0x00000000000000FFll;
    AFEFrames[ AFEFrameRD ][ 14 ] = ( TOWGPSps >> 32 )   & 0x00000000000000FFll;
    AFEFrames[ AFEFrameRD ][ 15 ] = ( TOWGPSps >> 40 )   & 0x00000000000000FFll;
    AFEFrames[ AFEFrameRD ][ 16 ] = ( TOWGPSps >> 48 )   & 0x00000000000000FFll;
    AFEFrames[ AFEFrameRD ][ 17 ] = ( TOWGPSps >> 56 )   & 0x00000000000000FFll;
}
~~~~~

This task inserts a pointer to the completed message into a queue, puts a time stamp into the next message buffer, and then blocks until the semaphore from the DMA channel is given again. If I comment out the line where the data is added to the queue, the crashes seem to go away. If an element is added to an already full queue, could this cause the crash? edit: There’s a * missing in the above post – I am using angle brackets and the word code to insert a code block. The tags seem to be recognized, but is there a different way to put in code snippets? I can’t find it in the formatting guidelines, just highlighting instructions.
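As an aside, the eight shift-and-mask assignments that store TOWGPSps could also be written as a loop. A minimal stand-alone sketch (the helper name is hypothetical, and it is written in plain C so it can be compiled and checked on a PC):

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 64-bit timestamp into bytes, least-significant byte first,
   equivalent to the eight explicit shift-and-mask assignments above.
   (Hypothetical helper, not part of the original application.) */
void pack_u64_le( uint8_t *dst, uint64_t value )
{
    for( int i = 0; i < 8; i++ )
    {
        dst[ i ] = ( uint8_t ) ( value >> ( 8 * i ) );
    }
}
```

The task loop could then call `pack_u64_le( &AFEFrames[ AFEFrameRD ][ 10 ], TOWGPSps );` in place of the eight assignments.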

Hi Aiden, Something that comes to mind immediately is the following:
~~~~~
for( ;; )
{
    /* Wait for the next frame of AFE data. */
    xSemaphoreTake( xSemaphoreAFEFrameReady, portMAX_DELAY );
    /* Now 1 frame will be handled. */
}
~~~~~
How did you create this semaphore? Could it happen that two or more interrupts occur before the task wakes up? A SemaphoreHandle_t is actually a QueueHandle_t with a queue length of 1 (and an item size of 0). What happens if your CPU is very busy and interrupts come too fast? Can it happen that it takes a while before xMutexTimestampCPUT becomes available in:

~~~~~
/* Lock the PPS table before proceeding. */
xSemaphoreTake( xMutexTimestampCPUT, portMAX_DELAY );
~~~~~

If your CPU is fast enough to handle the stream of frames, you can do the following: use xQueueCreate() instead of xSemaphoreCreateBinary():

~~~~~
- xSemaphoreAFEFrameReady = xSemaphoreCreateBinary();
+ xSemaphoreAFEFrameReady = xQueueCreate( numAFEframes,
+                                         semSEMAPHORE_QUEUE_ITEM_LENGTH );
~~~~~

( Make sure that numAFEframes has been initialised before creating the semaphore as above. ) This way there is much more elasticity: the consuming task may run a bit behind. Regards, Hein

Hi there Hein, What’s the proper way to include code comments on this forum? Anyway, the semaphore is declared using:

~~~~~
xSemaphoreHandle xSemaphoreAFEFrameReady;
~~~~~

It is then initialized using the following two lines:

~~~~~
xSemaphoreAFEFrameReady = xSemaphoreCreateBinary();
xSemaphoreTake( xSemaphoreAFEFrameReady, 0 );    /* Binary semaphores start 'true'. */
~~~~~

So yes, it is only a binary semaphore, and missing an event would result in the loss of a frame. That said, I’m comfortable with this: each frame is 19.2 ms long, the CPU runs at 200 MHz, and the task which is waiting for that semaphore is the highest priority task in the system. Tasks three levels below it in priority are still executing properly (disk output, for example, is working), so losing frames while waiting for this mutex seems unlikely. Even if an event here is missed, there shouldn’t (?) be a stability concern: the buffers are configured such that the writing side (the DMA interrupt) uses and updates the write index, while the task uses and updates the read index – even if they got completely out of sync there should be no penalty other than loss of frames. At least that’s my belief… can you spot a problem in the reasoning?
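The out-of-sync reasoning above can be sanity-checked in isolation. Here is a minimal stand-alone model of the index-advance pattern both sides use (the numAFEframes value of 4 is hypothetical):

```c
#include <assert.h>

/* Hypothetical stand-in for the real buffer count. */
#define numAFEframes 4

/* Advance a frame index with wraparound, the same pattern used by both
   the ISR (write index) and the task (read index).  Because each side
   only ever produces values in [0, numAFEframes), a desynchronised
   pair of indices can lose frames but cannot index out of bounds. */
int advance_frame_index( int idx )
{
    idx++;
    if( idx == numAFEframes )
    {
        idx = 0;
    }
    return idx;
}
```

So as long as each index is modified by only one side, desynchronisation alone should not explain an out-of-bounds access.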

What’s the proper way to include code comments on this forum?
About showing code in your posts: the easiest is probably to put a line with 5 tildes ~ before and after the literal code, as here:

~~~~~
int foo()
{
    /* This is an example of posting literal code. */
    return 0;
}
~~~~~

I cannot predict what the consequences would be if your task misses a frame. You can test whether the xSemaphoreGiveFromISR() in your ISR ever fails, and if it does, you can create the semaphore (queue) with a greater length, as in:

~~~~~
xSemaphoreAFEFrameReady = xQueueCreate( numAFEframes, 0 );
~~~~~
Regards.

Hi Hein, Thanks for the information on how to properly format code snippets. I have looked into the documentation on queues and semaphores, and both seem to return pdFALSE but otherwise fail safely. I will consider changing the semaphore from a binary one to a counting one if I encounter data loss. The crashing issues turned out to be an unrelated DMA channel problem. Thanks for the help.

I have looked into the documentation on queues and semaphores, and both seem to return pdFALSE but otherwise fail safely.
True, all these “send functions”:
BaseType_t xQueueSend()
BaseType_t xQueueSendFromISR()
BaseType_t xSemaphoreGive()
BaseType_t xSemaphoreGiveFromISR()
return pdFALSE in case the queue is full. If that happens in your application, it means that the previous frame has not been processed.
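To illustrate the length-1 behaviour mentioned earlier, here is a toy stand-alone model (an illustration only, not the FreeRTOS implementation) of a binary semaphore as a one-slot queue. A second give before any take fails, so that event is lost rather than queued:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: a binary semaphore behaves like a queue of length 1. */
typedef struct
{
    int count;     /* Number of pending "gives", at most length. */
    int length;    /* 1 for a binary semaphore. */
} toy_sem_t;

bool toy_give( toy_sem_t *s )
{
    if( s->count == s->length )
    {
        return false;    /* Full: the real API would return pdFALSE. */
    }
    s->count++;
    return true;
}

bool toy_take( toy_sem_t *s )
{
    if( s->count == 0 )
    {
        return false;    /* Nothing pending: would block or time out. */
    }
    s->count--;
    return true;
}
```

Creating the "semaphore" with a greater length, as suggested above, is exactly what gives the consuming task room to run behind.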
The crashing issues turned out to be an unrelated DMA channel problem.
I’m curious – is this something we (the readers of this forum) can learn from? ( I once had a mysterious problem with DMA under Linux. I used DMA to access the SD-card. It would work well for a whole day and then all of a sudden it would stop. After that, the SD-card could not be used until a cold reboot. I asked advice on forums, I asked support from Atmel; nobody could help me. I started looking myself and found that whenever the SD-card driver had crashed, there had always been a CPU frequency change during the DMA transfer. The transfer never finished and the channel became unusable. The reason for these frequency changes was to save power. ) Regards.