I am not sure what you mean here. Having delved deeply into multi-threaded HPC coding on Apple's M3, my point was this: on Intel, atomic operations are fairly strongly consistent, and `std::atomic` mostly has to prevent the compiler and out-of-order execution from reordering loads and stores. On ARM, by contrast, writing to memory does not automatically invalidate the values held in other cores' caches. Because of that, one has to dig deep into the meaning of the acquire/release/relaxed/sequentially consistent memory-model concepts to fully optimize for minimum synchronization cost.

Superscalar (MIMD) on multithreading (MIMD)? This is an anti-pattern. There is no chance of virtualization or factories here. Is Apple NUMA CMT on SMP OS? What a single-threaded nightmare. (This is more Einstein than Newton.)

The ARM docs state "All store-release operations are multi-copy atomic". So it seems that atomic operations exist, and that using them comes with the same caveats as on Apple M?
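To make the acquire/release point concrete, here is a minimal host-side sketch of the classic message-passing pattern, with two `std::thread`s standing in for the two cores (on a Pico the second core would be started through the SDK instead, which is not shown here). The names `demo`, `payload`, `ready` and `run_once` are hypothetical. The release store to `ready` keeps the earlier plain write to `payload` from moving after it, and an acquire load that observes `true` guarantees the reader also sees that write. The guarantee is the same on x86 and ARM; what differs is the code the compiler must emit to provide it.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

namespace demo {

std::uint32_t payload = 0;       // ordinary, non-atomic data
std::atomic<bool> ready{false};  // publication flag

void producer() {
    payload = 42;                                  // (1) write the data
    ready.store(true, std::memory_order_release);  // (2) publish: (1) cannot be reordered after this
}

std::uint32_t consumer() {
    // Spin until the flag is set. The acquire load pairs with the release
    // store, so once `true` is observed, the write of 42 is visible too.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one producer/consumer round and returns what the consumer saw.
std::uint32_t run_once() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);  // reset before threads start
    std::uint32_t seen = 0;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}

}  // namespace demo
```

Replacing both orderings with `std::memory_order_relaxed` would still make the flag itself atomic, but would no longer guarantee that the consumer sees 42.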
@kilograham has answered my question and, assuming I understood him correctly, one can now use `std::atomic` to communicate between cores on both RP2040 and RP2350, though I expect performance to be higher on RP2350 for sizes <= 32 (which is what my use case will be).
Statistics: Posted by FunMiles — Fri Dec 06, 2024 2:55 pm