I agree that 8 is not 16. It is confusing. Could 2 NEON instructions be executed per clock? 8 is still not 16. Could 4 multiply and 4 add make 8 total?
How do you get to 16 FLOPs per core per clock?
I assume each core has a NEON SIMD that can do 4 floating point multiply plus accumulate operations. That is 8 FLOPs per core per clock.
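To make the counting explicit, here is a rough sketch of the arithmetic in Python. The function name and its parameters are made up for illustration, and the dual-issue case is only the hypothesis raised in this thread, not a confirmed property of the core. It assumes a 128-bit NEON register holding 4 single-precision floats and counts a fused multiply-add as 2 FLOPs.

```python
def flops_per_clock(lanes: int, fma: bool = True, pipes: int = 1) -> int:
    """Peak FLOPs per core per clock under the stated assumptions.

    lanes: floats processed by one SIMD instruction (4 for 128-bit single precision)
    fma:   whether each lane does a multiply plus an accumulate (counted as 2 FLOPs)
    pipes: how many such instructions can issue per clock (an assumption here)
    """
    ops_per_lane = 2 if fma else 1
    return lanes * ops_per_lane * pipes


# One 4-wide multiply-accumulate per clock: 4 multiplies + 4 adds
single_issue = flops_per_clock(lanes=4, pipes=1)

# Hypothetical: two NEON multiply-accumulate instructions issued per clock
dual_issue = flops_per_clock(lanes=4, pipes=2)

print(single_issue, dual_issue)  # prints "8 16"
```

So under these assumptions 4 multiplies plus 4 adds give the 8, and a figure of 16 would need two such instructions per clock.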
Your other thread
viewtopic.php?p=2278897#p2278897
seems related to this one. I'm linking it because it has a reply from a Raspberry Pi engineer stating that NEON is faster than the GPU.
Weirdly, CPU Monkey
https://www.cpu-monkey.com/en/igpu-broa ... eocore_vii
claims the VideoCore VII delivers 240 GFLOPS at 16-bit precision, which, if correct (Fido is doubtful), seems much better than NEON.
Statistics: Posted by ejolson — Thu Dec 19, 2024 4:45 pm