FreeRTOS+FAT using DMA with cache on

Hello,

I’m currently using FreeRTOS+FAT with the new NXP RT1050 processor. My configuration is:

– FreeRTOS+FAT with the file system on eMMC
– read/write transfers use DMA
– cache ON

How should I manage cache coherency when using FAT? I created a custom malloc/free that allocates memory from external SDRAM with the cache turned off. This seems to resolve the problem only partially.

The standard malloc/free on my architecture allocates memory from another external SDRAM segment with the cache turned on. So if, for example, I allocate two buffers, bufferIN and bufferOUT, using the standard malloc and then use them with ff_fread/ff_fwrite, I expect I will still have coherency problems, because DMA will copy data between the eMMC and those two buffers directly. Am I right? If yes, what are the possible solutions? Will I be forced to use memory regions without cache when working with FreeRTOS+FAT?

I could also manage cache coherency inside the “xParameters.fnWriteBlocks and xParameters.fnReadBlocks” functions, but IMO it would be better to use a memory segment without cache for DMA transfers.

Thanks in advance,
Mateusz Piesta


Hi Mateusz,

as you have seen, you can provide special functions for memory allocation in FreeRTOSFATConfig.h:

~~~
#define ffconfigMALLOC( aSize )    uncached_malloc( aSize )
#define ffconfigFREE( apPtr )      uncached_free( apPtr )
~~~

I am not an expert on CPUs and caching, but here are my thoughts and recommendations:

Cached memory works fine until you access that memory with a DMA controller, either for reading or for writing. A DMA controller bypasses the caching mechanism and reads and writes the physical memory directly.

  • Before DMA reads memory, make sure that any changes to the memory regions involved have been flushed (clean the cache).
  • After DMA has written to memory, make sure that the regions are invalidated. This forces the CPU to reload the data cache from memory.
  • While DMA is working with memory, do not access that memory (region) from the CPU.
  • Make sure that the size of the memory passed to DMA is always a multiple of a cache line (e.g. 32 bytes). Even if only 16 bytes are being used, make sure that the other 16 bytes won’t be accessed by the CPU.

I once made an error with this: I declared network buffers in a FreeRTOS+TCP driver and gave these buffers a 16-byte alignment. As a consequence, Buffer-2 got changed unintentionally because it shared a cache line with Buffer-1.
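The cache-line rule can be checked with a few lines of C. This is only a sketch: the 32-byte line size is an assumption (check your core), and `cache_span_for()` and `dma_buffer_owns_its_cache_lines()` are hypothetical helper names, not part of +FAT or any SDK.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u /* assumed D-cache line size; check your core */

typedef struct {
    uintptr_t start;  /* address of the first cache line the buffer touches */
    size_t    length; /* whole number of cache lines, in bytes */
} cache_span_t;

/* Round a buffer outwards to the cache lines it actually occupies. */
static cache_span_t cache_span_for(const void *buf, size_t len)
{
    const uintptr_t mask = (uintptr_t)CACHE_LINE_SIZE - 1u;
    uintptr_t first = (uintptr_t)buf & ~mask;
    uintptr_t last  = ((uintptr_t)buf + len + mask) & ~mask;
    cache_span_t span = { first, (size_t)(last - first) };
    return span;
}

/* Returns 1 when the buffer starts on a cache line and fills whole lines,
 * so that clean/invalidate cannot touch a neighbouring variable. */
static int dma_buffer_owns_its_cache_lines(const void *buf, size_t len)
{
    cache_span_t span = cache_span_for(buf, len);
    return (span.start == (uintptr_t)buf) && (span.length == len);
}
```

A 16-byte buffer at address 0x1010 occupies the whole line 0x1000..0x101F, so invalidating it also discards whatever else lives in the first 16 bytes of that line. That is the kind of overlap that corrupted Buffer-2.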


Hi Hein,

I came to the same conclusion. I’ve managed to get it to work by using the same technique you described. I am managing cache coherency exactly the same as you proposed.

There is one more thing that bothers me: should buffers passed to the ff_fwrite/ff_fread functions be aligned, or will +FAT handle it?

Thank you for the clarification.


> I am managing cache coherency exactly the same as you proposed.

Good to hear that. FreeRTOS+FAT uses buffers of 512 bytes each. The application provides one big buffer as FF_CreationParameters_t::pucCacheMemory, which is interpreted as an array of 512-byte blocks. So if you make sure that pucCacheMemory is well-aligned, all other blocks will be well-aligned as well. In most cases, ff_fwrite() and ff_fread() will be called with a +FAT buffer as a parameter, and thus cached-memory in your case. But there are two exceptions:
  • if you read or write “entire aligned sectors”: the user buffer will be passed directly to the access functions, thus bypassing the +FAT buffers.
  • If you use ffconfigOPTIMISE_UNALIGNED_ACCESS, a file buffer pxFile->pucBuffer will be passed to the access functions. pucBuffer was created with ffconfigMALLOC().
In the +TCP driver for Zynq, I called the cache code conditionally:

~~~
if( ucIsCachedMemory( pucEthernetBuffer ) != 0 )
{
    Xil_DCacheFlushRange( pucEthernetBuffer, xDataLength );
}
…
if( ucIsCachedMemory( pucEthernetBuffer ) != 0 )
{
    Xil_DCacheInvalidateRange( pucEthernetBuffer, xDataLength );
}
~~~

Maybe you could test the memory types in your ff_fwrite() and ff_fread() as shown above?
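A test like `ucIsCachedMemory()` can be a plain address-range comparison. The sketch below is an assumption about how one might write it: the region bounds are made-up placeholders (take the real ones from your linker script or MPU setup), and `is_cached_memory()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder bounds of the cacheable SDRAM segment (assumption). */
#define CACHED_REGION_START ((uintptr_t)0x80000000u)
#define CACHED_REGION_SIZE  ((uintptr_t)0x01000000u)

/* Returns 1 if any byte of [ptr, ptr + len) lies in the cacheable region.
 * Overflow of ptr + len at the very top of the address space is not
 * handled in this sketch. */
static int is_cached_memory(const void *ptr, size_t len)
{
    uintptr_t addr       = (uintptr_t)ptr;
    uintptr_t region_end = CACHED_REGION_START + CACHED_REGION_SIZE;

    if (len == 0u) {
        return 0;
    }
    return (addr < region_end) && (addr + len > CACHED_REGION_START);
}
```

Checking for overlap rather than only the start address means a buffer that straddles the region boundary is still treated as cached.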


Ah okay, if I understand you correctly I can pass an aligned buffer placed in a non-cacheable region to:

pxParameters->pucCacheMemory = ‘AlignedNonCacheableBuffer of size xIOManagerCacheSize’

and everything should be fine because +FAT executes read/write operations through this cache buffer? Will it be possible to use cached_malloc/free then? I see that +FAT allocates some memory for different structures, but they are not used directly in read/write operations.

Unfortunately I don’t understand the first case that you mentioned. What do you mean by “entire aligned sectors”? Do you mean writing 4 KB of aligned data?

For now ffconfigOPTIMISE_UNALIGNED_ACCESS is set to 0, so there is no problem in this case, but I will test it in the future.

I think that your solution of testing for cached/non-cached buffers will do the trick. I just need confirmation that it is safe to use cached_malloc/free in this scenario 🙂


Hi Mateusz,

In my previous post I made a typo. Where I wrote:

~~~
ff_fwrite() and ff_fread()
~~~

I actually meant the implementation in the driver, often called:

~~~
prvFFRead() and prvFFWrite()
~~~

I’m sorry if that confused anyone.

> Ah okay, if I understand you correctly I can pass an aligned buffer placed in a non-cacheable region to: pxParameters->pucCacheMemory = ‘AlignedNonCacheableBuffer of size xIOManagerCacheSize’ and everything should be fine because +FAT executes read/write operations through this cache buffer?

If the memory that you supply is non-cacheable, it must have an alignment that DMA can work with. If the memory is cacheable, its alignment must fit both with DMA and the D-CACHE. And also: when the first block is well-aligned, the subsequent blocks will also have a proper alignment because the block-size is 512 bytes.
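As a small illustration of that last point, here is a sketch that declares a cache memory block with 32-byte alignment and checks every 512-byte block in it. The 32-byte line size, the block count, and the names `fat_cache_memory` and `all_blocks_aligned()` are assumptions for the example; the attribute syntax is the GCC/Clang one.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE     512u
#define CACHE_LINE_SIZE 32u /* assumed D-cache line size */
#define CACHE_SECTORS   16u /* example number of +FAT cache blocks */

/* Candidate memory for pucCacheMemory, aligned to a cache line. */
static uint8_t fat_cache_memory[CACHE_SECTORS * SECTOR_SIZE]
    __attribute__((aligned(CACHE_LINE_SIZE)));

/* Returns 1 if every 512-byte block starting at base is cache-line aligned. */
static int all_blocks_aligned(const uint8_t *base, size_t block_count)
{
    for (size_t i = 0; i < block_count; i++) {
        uintptr_t addr = (uintptr_t)(base + i * SECTOR_SIZE);
        if ((addr % CACHE_LINE_SIZE) != 0u) {
            return 0;
        }
    }
    return 1;
}
```

Because 512 is a multiple of 32, aligning the first block is enough. To place the array in a non-cacheable region you would additionally need a section attribute that matches your own linker script.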
> Will it be possible to use cached_malloc/free then? I see that +FAT allocates some memory for different structures, but they are not used directly in read/write operations.

With the exception of pxFile->pucBuffer ( which you do not use for now ).
> Unfortunately I don’t understand the first case that you mentioned. What do you mean by “entire aligned sectors”? Do you mean writing 4 KB of aligned data?

I mean for instance this kind of access:

~~~
__attribute__ ((aligned (32))) uint8_t pucBuffer[ 16 * 512 ];

FF_Read( pxFile, 1, n * 512, pucBuffer );
FF_Write( pxFile, 1, n * 512, pucBuffer );
~~~

In these cases, the +FAT caching buffers will be bypassed, and pucBuffer will be passed directly to the driver functions prvFFRead() and prvFFWrite().

> For now ffconfigOPTIMISE_UNALIGNED_ACCESS is set to 0, so there is no problem in this case, but I will test it in the future.

As you can guess, the fastest file i/o is done when you read or write in multiple blocks of 512 bytes. This is also favourable for memory cards: contiguous reads and writes are a lot faster. The ffconfigOPTIMISE_UNALIGNED_ACCESS option is only useful if you plan to read or write in small quantities.
> I think that your solution of testing for cached/non-cached buffers will do the trick. I just need confirmation that it is safe to use cached_malloc/free in this scenario 🙂

In case you have doubts, there is some good testing software in the original +FAT release. It is in the old 160919_FreeRTOS_Labs.zip:

~~~
FreeRTOS-Plus\Demo\Common\FreeRTOS_Plus_FAT_Demos\CreateAndVerifyExampleFiles.c
FreeRTOS-Plus\Demo\Common\FreeRTOS_Plus_FAT_Demos\test\ff_stdio_tests_with_cwd.c
~~~


Looks like it works. Below are the steps I performed:

– switched +FAT to use the cacheable malloc/free
– made sure that buffers from the cacheable malloc/free are 32-byte aligned. I’m using a custom memory allocator based on heap_4.c, so it’s actually easy to align buffers allocated with malloc
– created the +FAT pucCacheMemory in a non-cached segment, 32-byte aligned, and used it instead of the malloc’ed one
– added a simple isCachedMem function which checks whether the address passed to the prvRead/Write functions is within cached memory or not
– in prvRead, if the memory is cached, I perform DCache_Invalidate AFTER the DMA transfer has completed
– in prvWrite, if the memory is cached, I perform DCache_Clean/Flush BEFORE starting the DMA transfer
– finally turned on the cache 😀

I also noticed that when using the ff_fwrite function, the pointer to the memory buffer is passed straight into the prvWrite function. Therefore, if the memory block passed to ff_fwrite is allocated dynamically, you have to make sure that it is 32-byte aligned and manage cache coherency for it.

I hope someone finds this useful. Hein, thank you very much for the meaningful discussion. As always, you are very helpful 🙂

Best Regards,
Mateusz Piesta
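For anyone who wants to see the ordering in code, here is a rough host-side sketch of the read/write steps above. Everything in it is hypothetical: `storage_ops_t` and the `demo_*` functions are stand-ins that only record the order of calls, so you would map them to your own cache maintenance and DMA routines. It is not the actual driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* Hypothetical port hooks: map these to your cache and DMA routines. */
typedef struct {
    int  (*is_cached)(const void *buf, size_t len);
    void (*dcache_clean)(const void *buf, size_t len);
    void (*dcache_invalidate)(void *buf, size_t len);
    /* Both DMA hooks return 0 once the transfer has completed. */
    int  (*dma_read)(void *buf, uint32_t sector, uint32_t count);
    int  (*dma_write)(const void *buf, uint32_t sector, uint32_t count);
} storage_ops_t;

/* Write path: clean the D-cache BEFORE the DMA reads the buffer. */
static int driver_write(const storage_ops_t *ops, const uint8_t *buf,
                        uint32_t sector, uint32_t count)
{
    size_t len = (size_t)count * SECTOR_SIZE;
    if (ops->is_cached(buf, len)) {
        ops->dcache_clean(buf, len);
    }
    return ops->dma_write(buf, sector, count);
}

/* Read path: invalidate the D-cache AFTER the DMA has filled the buffer. */
static int driver_read(const storage_ops_t *ops, uint8_t *buf,
                       uint32_t sector, uint32_t count)
{
    size_t len = (size_t)count * SECTOR_SIZE;
    int rc = ops->dma_read(buf, sector, count);
    if ((rc == 0) && ops->is_cached(buf, len)) {
        ops->dcache_invalidate(buf, len);
    }
    return rc;
}

/* Host-side stand-ins that only record the call order in a trace string. */
static char trace[8];
static int  demo_buffer_is_cached = 1;

static void trace_reset(void) { trace[0] = '\0'; }
static void trace_add(char c)
{
    size_t n = strlen(trace);
    if (n + 1u < sizeof trace) { trace[n] = c; trace[n + 1u] = '\0'; }
}
static int  demo_is_cached(const void *b, size_t l) { (void)b; (void)l; return demo_buffer_is_cached; }
static void demo_clean(const void *b, size_t l)     { (void)b; (void)l; trace_add('C'); }
static void demo_invalidate(void *b, size_t l)      { (void)b; (void)l; trace_add('I'); }
static int  demo_dma_read(void *b, uint32_t s, uint32_t c)        { (void)b; (void)s; (void)c; trace_add('R'); return 0; }
static int  demo_dma_write(const void *b, uint32_t s, uint32_t c) { (void)b; (void)s; (void)c; trace_add('W'); return 0; }

static const storage_ops_t demo_ops = {
    demo_is_cached, demo_clean, demo_invalidate, demo_dma_read, demo_dma_write
};
```

With a cached buffer the trace reads "CW" for a write and "RI" for a read. With a non-cached buffer the cache calls are skipped. Some ports additionally clean or invalidate the buffer before starting a DMA read, so that a dirty line cannot be written back over freshly transferred data; whether that is needed depends on how the buffer was used beforehand.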