I do not have one "implementation"; I have been trying different approaches, and all of them delivered under 50% of memory bandwidth... I guess if anyone can propose a solution, it should be from scratch... The problem is that every approach I tried ends up generating unpredictable branches that keep the CPU from loading text from memory at full speed.
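To make "no unpredictable branches" concrete, here is a minimal sketch under loud assumptions: the actual task is not shown in this thread, so the example uses a hypothetical stand-in (counting occurrences of one byte in a text buffer), and the names `match_mask8` and `count_byte` are made up for illustration. It uses a plain SWAR (8 bytes per 64-bit word) mask instead of a per-byte `if`, so the only branch left in the hot loop is the loop condition itself. It is not a claim about what reaches full memory bandwidth.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 0x80 in every byte lane of w that equals c,
 * and 0x00 in every other lane. No data-dependent branch is involved. */
static uint64_t match_mask8(uint64_t w, uint8_t c) {
    const uint64_t lo7 = 0x7f7f7f7f7f7f7f7fULL;
    uint64_t x = w ^ (0x0101010101010101ULL * c); /* matching lanes become 0x00 */
    uint64_t t = (x & lo7) + lo7;                 /* bit 7 set if any low 7 bits set */
    return ~(t | x | lo7);                        /* 0x80 only where the lane was 0x00 */
}

/* Hypothetical stand-in task: count how many bytes of text equal c. */
size_t count_byte(const char *text, size_t n, uint8_t c) {
    size_t count = 0;
    size_t i = 0;

    /* Main loop: 8 bytes per iteration, loaded with memcpy to stay
     * well-defined for unaligned input. */
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, text + i, 8);
        uint64_t m = match_mask8(w, c) >> 7;      /* 0x01 per matching lane */
        /* Horizontal sum of the 8 lanes ends up in the top byte (max 8). */
        count += (size_t)((m * 0x0101010101010101ULL) >> 56);
    }

    /* Tail: fewer than 8 bytes left; the comparison result is added
     * directly instead of being used as a branch condition. */
    for (; i < n; i++) {
        count += ((uint8_t)text[i] == c);
    }
    return count;
}
```

Called as `count_byte(buf, len, '\n')`, this would count newlines in `buf`. The same mask-then-reduce shape carries over to SSE/AVX compare and movemask instructions, which is probably where one would go next when chasing bandwidth.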
The problem should be equivalent to: https://www.reddit.com/r/simd/comments/1hmwukl/mask_calculat...
Falvyu's and bremac's solutions seem to be the best.
Where's the code? ... Have a look at codereview[5]; the whole site is geared toward this kind of challenge.
[5] codereview.stackexchange.com
https://godbolt.org/z/3YMbaeEGh
One approach....