When all of the elements fit in local memory, it would make sense to use a single-pass solution. For scanning, a single pass may also make sense when a single workgroup needs only a few iterations to cover the input, rather than suffering the overhead of 3 passes.
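
A rough sketch of how the dispatch could look, to make the proposal concrete. None of these names exist in clogs: `ScanStrategy`, `chooseScanStrategy`, `elementsPerIteration` and `maxIterations` are all hypothetical, and the iteration threshold is a tuning parameter that would need benchmarking. It also assumes the whole of local memory (as reported by `CL_DEVICE_LOCAL_MEM_SIZE`) is available for the data, which is a simplification.

```cpp
#include <cstddef>

// Hypothetical names throughout; not part of the clogs API.
enum class ScanStrategy { SinglePass, ThreePass };

// Picks a strategy from the problem size and device limits.
//   elements             number of elements to scan
//   elementSize          size of one element in bytes
//   localMemBytes        e.g. the value of CL_DEVICE_LOCAL_MEM_SIZE
//   elementsPerIteration elements one workgroup handles per iteration
//   maxIterations        assumed cut-off for "a few iterations"
ScanStrategy chooseScanStrategy(std::size_t elements,
                                std::size_t elementSize,
                                std::size_t localMemBytes,
                                std::size_t elementsPerIteration,
                                std::size_t maxIterations)
{
    // Degenerate inputs: fall back to the general three-pass path.
    if (elementSize == 0 || elementsPerIteration == 0)
        return ScanStrategy::ThreePass;

    // Case 1: the whole input fits in local memory.
    if (elements <= localMemBytes / elementSize)
        return ScanStrategy::SinglePass;

    // Case 2: one workgroup can cover the input in a few iterations.
    // Ceiling division written to avoid overflow for large counts.
    std::size_t iterations = elements / elementsPerIteration
        + (elements % elementsPerIteration != 0 ? 1 : 0);
    if (iterations <= maxIterations)
        return ScanStrategy::SinglePass;

    return ScanStrategy::ThreePass;
}
```

For example, with 32 KiB of local memory, 16-byte elements and 1024 elements per iteration, an input of 3000 elements does not fit in local memory but needs only 3 iterations, so with `maxIterations = 4` the sketch would pick the single-pass path.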
Reported by: bmerry
Original Ticket: clogs/tickets/16