Task switching timings

Hi! I’m new to FreeRTOS and am trying to understand if I did something wrong… I have an STM32F103 MCU at 72 MHz. The compiler is GCC ( arm-none-eabi-gcc.exe (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204] ). For timings I use DWT->CYCCNT, which gives the count of HCLK clocks since MCU reset. It gives a perfectly repeatable, 100% consistent result.

I have two tasks. The first is the idle task created by FreeRTOS with low priority. The second is a normal-priority task (mine) created before the scheduler is started. So after the scheduler is started, my task runs first because it has the higher priority, and the idle task has no chance to run before I block my task. I put a reading of DWT->CYCCNT in vApplicationIdleHook and then just hang it in an endless loop. In my task I call xSemaphoreTake on a binary semaphore that has not been given (so it blocks my task and forces a context switch to the idle task), and I take another reading of DWT->CYCCNT just before the xSemaphoreTake. The two readings differ by 1130 CPU clocks. As I understand it, the following happens:
  1. xSemaphoreTake checks the state of the semaphore and finds it has not been given
  2. the empty list inside the semaphore is appended with the ID of my task
  3. the scheduler finds the next task to run (it’s the idle task)
  4. the scheduler sets the PendSV bit
  5. the PendSV handler switches the task context
Is it normal for FreeRTOS to spend that much time doing what I listed? Am I missing something? BTW, I measured how long portYIELD() takes to switch back to the same task (as no other task is scheduled). It is 170 clocks.
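For reference, my measurement setup looks roughly like this (a simplified sketch only; the CMSIS register names are the standard ones, configUSE_IDLE_HOOK is assumed to be 1, and the task/variable names are just placeholders):

```c
#include "FreeRTOS.h"
#include "task.h"
#include "semphr.h"
#include "stm32f1xx.h"   /* CMSIS device header, for DWT and CoreDebug */

static SemaphoreHandle_t xSem;
static volatile uint32_t ulBeforeTake, ulInIdle;

void vApplicationIdleHook( void )
{
    ulInIdle = DWT->CYCCNT;   /* second reading, taken once the idle task runs */
    for( ;; );                /* hang so the value is not overwritten */
}

static void vMyTask( void *pvParameters )
{
    ( void ) pvParameters;

    ulBeforeTake = DWT->CYCCNT;              /* first reading, just before blocking */
    xSemaphoreTake( xSem, portMAX_DELAY );   /* never given, so this blocks and forces
                                                a context switch to the idle task */
    for( ;; );
}

int main( void )
{
    /* Enable the DWT cycle counter (counts HCLK cycles since enabled). */
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    DWT->CYCCNT = 0;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;

    xSem = xSemaphoreCreateBinary();
    xTaskCreate( vMyTask, "my", configMINIMAL_STACK_SIZE, NULL, tskIDLE_PRIORITY + 1, NULL );
    vTaskStartScheduler();

    for( ;; );
}
```

The difference ulInIdle - ulBeforeTake is the 1130 clocks mentioned above.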

Task switching timings

BTW, I measured how long portYIELD() takes to switch back to the same task (as no other task is scheduled). It is 170 clocks.
That sounds about right. The numbers here are for the context switch only: https://freertos.org/FAQMem.html#ContextSwitchTime and, although shorter than your measured time (84 clocks), that will probably just be due to different configurations. Regarding the semaphore times: semaphores are relatively heavyweight objects because they are feature rich – you can have any number of tasks blocked to give, any number blocked to take, the blocked tasks are kept in priority order, and so on. The yield, which you listed separately, occurs within that too. In most cases a direct-to-task notification can be used for a much lighter-weight and faster signalling mechanism.
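For example, a binary-semaphore style signal can usually be replaced with something like the following (just a sketch; xWaitingTaskHandle is a placeholder for the handle of the task that waits):

```c
/* Waiting task: block until notified - used here like a binary semaphore. */
ulTaskNotifyTake( pdTRUE,           /* clear the notification count on exit */
                  portMAX_DELAY );  /* wait indefinitely */

/* Signalling side, for example from an ISR: */
BaseType_t xHigherPriorityTaskWoken = pdFALSE;
vTaskNotifyGiveFromISR( xWaitingTaskHandle, &xHigherPriorityTaskWoken );
portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
```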

Task switching timings

That sounds about right. The numbers here are for the context switch only: https://freertos.org/FAQMem.html#ContextSwitchTime and, although shorter than your measured time (84 clocks), that will probably just be due to different configurations.
I agree, 170 clocks seems reasonable. To the 84 you have to add:
  1. setting the PendSV bit: ~5 clocks
  2. entering/leaving the interrupt: 12 + 12 clocks
  3. the vTaskSwitchContext execution
About direct task notifications: I substituted xSemaphoreTake with ulTaskNotifyTake(pdTRUE, portMAX_DELAY). Now it takes 432 clocks. This also seems reasonable. But still… what is the difference between the semaphore and the notification in my case? From what I already described,
  1. xSemaphoreTake checks the state of the semaphore and finds it has not been given
  2. the empty list inside the semaphore is appended with the ID of my task
  3. the scheduler finds the next task to run (it’s the idle task)
  4. the scheduler sets the PendSV bit
  5. the PendSV handler switches the task context
only p2 differs. And that takes 1130 – 432 ≈ 700 clocks! BTW, the code is compiled with the -O2 option, configASSERT is undefined, configUSE_PORT_OPTIMISED_TASK_SELECTION is set to 1, and the trace and statistics facilities are turned off.
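In FreeRTOSConfig.h terms, the relevant settings are roughly these (sketch only, everything else omitted):

```c
/* Relevant fragment of FreeRTOSConfig.h for the measurements above (sketch). */
#define configUSE_PORT_OPTIMISED_TASK_SELECTION   1
#define configUSE_TRACE_FACILITY                  0
#define configGENERATE_RUN_TIME_STATS             0
/* configASSERT() is left undefined; the code is compiled with -O2. */
```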

Task switching timings

The semaphore includes much more logic as it can be used by multiple tasks at once. For example, consider the case where a task is unblocked because a semaphore becomes available, but before the task can run, a higher priority task takes the semaphore again. Now, when the unblocked task does get a chance to run, it finds the semaphore unavailable, has to recalculate its block time to take into account the length of time that passed since the function was called, and then re-block. I know that is not the path your code is taking, but you still need to go through the if()/else()/loop calls to determine that.
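Very schematically, the take side has to do something like this every time (an illustrative sketch with made-up names only – not the real xQueueSemaphoreTake() code, which blocks on the semaphore's event list rather than delaying):

```c
#include "FreeRTOS.h"
#include "task.h"

/* Hypothetical semaphore type, for illustration only. */
typedef struct { volatile UBaseType_t uxCount; /* + a priority-ordered list of waiting tasks */ } FakeSem_t;

static BaseType_t prvFakeTake( FakeSem_t *pxSem, TickType_t xTicksToWait )
{
    TimeOut_t xTimeOut;

    vTaskSetTimeOutState( &xTimeOut );           /* remember when we started waiting */

    for( ;; )
    {
        taskENTER_CRITICAL();
        if( pxSem->uxCount > 0 )
        {
            pxSem->uxCount--;                    /* available - take it */
            taskEXIT_CRITICAL();
            return pdPASS;
        }
        taskEXIT_CRITICAL();

        /* Not available - has the caller's block time expired while waiting?
           This call also adjusts xTicksToWait to the time remaining. */
        if( xTaskCheckForTimeOut( &xTimeOut, &xTicksToWait ) != pdFALSE )
        {
            return pdFAIL;
        }

        /* The real code adds the task to the semaphore's priority-ordered event
           list and blocks; here a plain delay stands in for that step.  When the
           task runs again the loop repeats, because a higher priority task may
           have taken the semaphore in the meantime. */
        vTaskDelay( 1 );
    }
}
```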

Task switching timings

You described a really strange case. I would say that if ownership of the semaphore’s ‘coin’ has already been assigned to the low-priority task, it should not be taken back. This mechanism is very much like the Cortex-M late-arriving interrupt feature, but with a big difference: late arrival is just an optimization that prevents the MCU from doing unnecessary work, whereas your mechanism changes the sequence of events. I understand it tries to do as much as possible to let the higher-priority task run, but it still seems strange to me. Anyway, thanks for your explanation and suggestion. I’ll try to implement the synchronization with direct task notifications.

Task switching timings

The way to think about it is that when the semaphore is given, the highest priority task waiting on the semaphore is woken because the semaphore is AVAILABLE, not because it has been given the semaphore. The task still needs to run to actually take the semaphore. This means the take always happens in the context of the task requesting the semaphore (which might simplify some code, especially tracing code). It also means that if the task being woken wasn’t the highest priority ready task, some other higher priority task has the chance to take the semaphore before the originally woken task does. The giving task doesn’t give the coin to the other task, it just gives it back to the semaphore, so it is ready to be taken. This is a very different situation from the interrupt case you mentioned. In the interrupt case, the low priority interrupt was already higher priority than the execution context (or it wouldn’t occur), while in the semaphore give case, the give was likely waking a task of lower priority than itself (otherwise that task would have immediately been switched to and would have taken the semaphore).

Task switching timings

Richard Damon, I understand the concept. I’m just wondering why it is implemented this way in FreeRTOS. Giving the coin to the task in kernel mode would slow the kernel part down by several CPU clocks, but would save the user-mode part several hundred clocks. Why was this trade-off chosen? And I understand that a semaphore is unnecessarily complex for the interrupt case. As I understand it, the only alternative to it is direct task notifications, but they have negative consequences: they force you to design the waiting part of the code tightly coupled to the different components. This is very much like the reactor concept for Berkeley sockets, where the key part is the code around a single place that calls select().

Task switching timings

Richard, I understand the concept. I’m just wondering why it is implemented this way in FreeRTOS. Giving the coin to the task in kernel mode would slow the kernel part down by several CPU clocks, but would save the user-mode part several hundred clocks. Why was this trade-off chosen?
…because it is the most true to the scheduling policy and reduces the risk of unbounded priority inversion.

Task switching timings

My feeling on this is that the current system is likely the simplest. The sequences that happen when you give a semaphore and when you take a semaphore are always uniform. The give operation doesn’t need to do different things to the semaphore based on the presence of a task waiting on it. (It does do some additional steps for the highest priority waiting task if there is one, but its actions on the semaphore itself are uniform.) The semaphore is always acquired by the same sequence of code. This makes it easier to ‘prove’ that the code is correct, and since one variant of the code (SafeRTOS) has gone through a certification, being able to do that is useful. You also talk about ‘kernel mode’ and ‘user mode’, but there is no distinction. Most uses of FreeRTOS are on simple processors which don’t have any such distinction. Even if run on a processor with protection, the entire semaphore code would run in protected mode.

Task switching timings

My feeling on this is that the current system is likely the simplest.
Got it. Thanks. Reliability is expensive.
You also talk about ‘kernel mode’ and ‘user mode’, but there is no distinction.
I think I mixed up the terms… by ‘kernel mode’ I meant the mode in which all OS objects are locked against modification and interrupts are disabled. I’ve also read somewhere on freertos.org that there is a policy of not having loops of unpredictable length in kernel mode, while that is acceptable in user mode when interrupts are enabled. I’ve done another experiment with direct task notifications. The scenario is (a rough code sketch follows the list):
  1. Start hardware operation
  2. Make timestamp 0. Call xTaskNotifyWait
  3. Make timestamp 1 upon entry to the idle task
  4. Get interrupt from hardware. Make timestamp 2. Call xTaskGenericNotifyFromISR
  5. Make timestamp 3 upon return from xTaskGenericNotifyFromISR
  6. Make timestamp 4 upon return from xTaskNotifyWait
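The sketch (simplified; EXTI0 and its pending-flag handling stand in for the real “hardware operation”, the variable names are placeholders, and INCLUDE_xTaskGetCurrentTaskHandle is assumed to be 1):

```c
#include "FreeRTOS.h"
#include "task.h"
#include "stm32f1xx.h"                              /* CMSIS device header */

static TaskHandle_t xWaitingTask;
static volatile uint32_t ulT0, ulT1, ulT2, ulT3, ulT4;

void vApplicationIdleHook( void )
{
    if( ulT1 == 0 )
    {
        ulT1 = DWT->CYCCNT;                         /* timestamp 1: idle task entered */
    }
}

void EXTI0_IRQHandler( void )                       /* step 4: interrupt from hardware */
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    EXTI->PR = EXTI_PR_PR0;                         /* clear the pending flag */
    ulT2 = DWT->CYCCNT;                             /* timestamp 2 */
    xTaskNotifyFromISR( xWaitingTask, 0, eNoAction, &xHigherPriorityTaskWoken );
    ulT3 = DWT->CYCCNT;                             /* timestamp 3 */

    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
}

static void vMyTask( void *pvParameters )
{
    ( void ) pvParameters;
    xWaitingTask = xTaskGetCurrentTaskHandle();

    for( ;; )
    {
        /* step 1: start the hardware operation here ... */
        ulT0 = DWT->CYCCNT;                         /* timestamp 0 */
        xTaskNotifyWait( 0, 0, NULL, portMAX_DELAY );
        ulT4 = DWT->CYCCNT;                         /* timestamp 4: back from the wait */
    }
}

/* DWT enabling, EXTI/NVIC configuration and vTaskStartScheduler() omitted for brevity. */
```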
Results:
  1. timestamp 0 – timestamp 1: 459 clocks
  2. timestamp 2 – timestamp 3: 238 clocks
  3. timestamp 3 – timestamp 4: 275 clocks
So this sums to 972 clocks and involves:
  1. Marking current task as waiting for notification
  2. Finding the highest priority non-blocked task. Setting the PendSV bit
  3. Context switch (170 clocks)
  4. Comparing the notified task’s priority to the current task’s. Setting the PendSV bit
  5. Context switch (170 clocks)
So there’s still something I don’t understand.
  1. Why do p1 + p2 take 289 clocks, especially considering the port-optimised task selection, which finds the non-blocked task very quickly?
  2. Why does p4 take 238 clocks? The task to wake is already known…
  3. And why are ~100 clocks added to the context switch here?
Is there any other mechanism for the interrupt scenario in FreeRTOS with less overhead? BTW, I’m stuck with FreeRTOS 9.0.0 because it is the latest version in STM32CubeMX. Are there any improvements in v10 regarding performance?