CUB
cub::BlockExchange< T, BLOCK_THREADS, ITEMS_PER_THREAD > Class Template Reference

Detailed description

template<typename T, int BLOCK_THREADS, int ITEMS_PER_THREAD>
class cub::BlockExchange< T, BLOCK_THREADS, ITEMS_PER_THREAD >

BlockExchange provides operations for reorganizing the partitioning of ordered data across a CUDA threadblock.

Overview
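BlockExchange allows threadblocks to reorganize data items between threads. More specifically, BlockExchange supports the following types of data exchanges:
  • Transposing between blocked and striped arrangements
  • Scattering ranked items to a blocked arrangement
  • Scattering ranked items to a striped arrangement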
Template Parameters
T: The data type to be exchanged.
BLOCK_THREADS: The threadblock size in threads.
ITEMS_PER_THREAD: The number of items partitioned onto each thread.
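To make the two arrangements concrete: in CUB's terminology a blocked arrangement gives each thread a run of consecutive items, while a striped arrangement deals items out so that consecutive items land in consecutive threads. The host-side C++ sketch below encodes that convention as the block-wide rank held by item slot i of thread t. BlockedRank and StripedRank are hypothetical helpers for illustration, not part of BlockExchange.

```cpp
#include <cassert>

// Hypothetical helpers (not part of CUB): the block-wide rank held in item
// slot `item` of thread `tid` under each arrangement.

// Blocked: each thread owns ITEMS_PER_THREAD consecutive items.
constexpr int BlockedRank(int tid, int item, int items_per_thread) {
    return tid * items_per_thread + item;
}

// Striped: item slot `item` of every thread forms one contiguous stripe of
// BLOCK_THREADS items, so consecutive items sit in consecutive threads.
constexpr int StripedRank(int tid, int item, int block_threads) {
    return item * block_threads + tid;
}

// With 4 threads and 2 items per thread (8 items total):
//   blocked : thread 0 -> {0, 1}, thread 1 -> {2, 3}, ...
//   striped : thread 0 -> {0, 4}, thread 1 -> {1, 5}, ...
static_assert(BlockedRank(1, 1, 2) == 3, "thread 1, item 1 (blocked)");
static_assert(StripedRank(1, 1, 4) == 5, "thread 1, item 1 (striped)");
```

Both mappings are bijections onto 0 .. BLOCK_THREADS * ITEMS_PER_THREAD - 1, which is what lets a single shared-memory buffer indexed by rank serve as the meeting point for any pair of arrangements.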
Algorithm
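Threads scatter their items into shared memory, placing each item at the position given by its block-wide order (its rank) and allowing one item of padding for every memory bank's worth of items. After a __syncthreads() barrier, items are gathered from shared memory in the desired arrangement.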
Figure: A threadblock of 16 threads reading a blocked arrangement of 64 items in a parallel "raking" fashion.
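A minimal sketch of the padding rule described above, assuming 32 shared-memory banks that each hold one item (LOG_SMEM_BANKS = 5). Skipping one slot after every 32 items shifts each successive bank-width row of items over by one bank. PaddedOffset and PaddedSize are hypothetical names; on a real device the bank count and the number of items per bank word depend on the architecture and on sizeof(T).

```cpp
#include <cassert>

// Assumption: 32 banks, one item per bank word.
constexpr int LOG_SMEM_BANKS = 5;
constexpr int SMEM_BANKS = 1 << LOG_SMEM_BANKS;

// Map a logical item index to its padded shared-memory offset: one unused
// slot is inserted after every SMEM_BANKS items.
constexpr int PaddedOffset(int item_index) {
    return item_index + (item_index >> LOG_SMEM_BANKS);
}

// Minimum number of slots a padded buffer needs for `tile_items` items.
constexpr int PaddedSize(int tile_items) {
    return tile_items == 0 ? 0 : PaddedOffset(tile_items - 1) + 1;
}

static_assert(SMEM_BANKS == 32, "modeling assumption");
```

For the 64-item tile in the figure, items 0 through 31 keep their offsets, items 32 through 63 move up by one slot, and the buffer needs 65 slots.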
Usage Considerations
  • After any operation, a subsequent __syncthreads() barrier is required if the supplied BlockExchange::SmemStorage is to be reused or repurposed by the threadblock.
Performance Considerations
  • Proper device-specific padding ensures zero bank conflicts for most types.
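To see why padding matters, the sketch below counts the worst-case bank conflict degree of the scatter phase for one 32-lane warp writing a blocked arrangement. It assumes 32 four-byte banks, one item per bank word, and that every lane stores the same item slot in the same step; these are modeling assumptions for illustration, not a description of any particular GPU or of CUB's code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Modeling assumptions (hypothetical, not taken from CUB): 32 banks, one item
// per 4-byte bank word, and a 32-lane warp storing one item slot per step.
constexpr int kLogBanks = 5;
constexpr int kBanks = 1 << kLogBanks;
constexpr int kWarpThreads = 32;

// Shared-memory offset of a logical index, with or without one padding slot
// after every kBanks items.
int SmemOffset(int index, bool padded) {
    return padded ? index + (index >> kLogBanks) : index;
}

// In step `item` of a blocked scatter, lane t stores logical index
// t * items_per_thread + item. Returns the largest number of lanes that hit
// one bank in any step (1 means conflict-free). All addresses are distinct,
// so no broadcast applies.
int ScatterConflictDegree(int items_per_thread, bool padded) {
    assert(items_per_thread > 0);
    int worst = 0;
    for (int item = 0; item < items_per_thread; ++item) {
        std::vector<int> hits(kBanks, 0);
        for (int lane = 0; lane < kWarpThreads; ++lane)
            ++hits[SmemOffset(lane * items_per_thread + item, padded) % kBanks];
        worst = std::max(worst, *std::max_element(hits.begin(), hits.end()));
    }
    return worst;
}
```

Under these assumptions, ScatterConflictDegree(4, false) returns 4 (lanes 0, 8, 16 and 24 all write to bank 0 in the first step), while ScatterConflictDegree(4, true) returns 1. With an odd ITEMS_PER_THREAD the unpadded scatter is already conflict-free in this model, because an odd stride visits all 32 banks.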

Public Types

typedef SmemStorage SmemStorage
 The operations exposed by BlockExchange require shared memory of this type. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated shared memory or union'd with other types to facilitate shared memory reuse.
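Sharing memory through a union works because a union is only as large as its largest member rounded up for alignment. The host C++ sketch below shows that size arithmetic with two hypothetical storage structs standing in for kernel phases. In real CUDA code one member would be typename BlockExchange::SmemStorage and the union would be declared __shared__ inside the kernel; the sizes chosen here are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for the storage of two kernel phases. In CUDA code
// one member would typically be `typename BlockExchange::SmemStorage`.
struct ExchangeStorage {
    int exchange[4 * 128 + 16];  // e.g. 128 threads x 4 ints, plus 1 pad per 32 items
};
struct ReductionStorage {
    int warp_partials[128 / 32];  // e.g. one partial result per warp
};

// Overlaying the phases: for these int-array members the union is exactly as
// large as its larger member, not the sum of both. On the device it would be
// declared __shared__, and a __syncthreads() must separate the last use of
// one member from the first use of the other.
union PhaseStorage {
    ExchangeStorage exchange;
    ReductionStorage reduction;
};
```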
 

Static Public Methods

Transpose exchanges
static __device__ __forceinline__ void BlockedToStriped (SmemStorage &smem_storage, T items[ITEMS_PER_THREAD])
 Transposes data items from blocked arrangement to striped arrangement.
 
static __device__ __forceinline__ void StripedToBlocked (SmemStorage &smem_storage, T items[ITEMS_PER_THREAD])
 Transposes data items from striped arrangement to blocked arrangement.
 
Scatter exchanges
static __device__ __forceinline__ void ScatterToBlocked (SmemStorage &smem_storage, T items[ITEMS_PER_THREAD], unsigned int ranks[ITEMS_PER_THREAD])
 Exchanges data items annotated by rank into blocked arrangement.
 
static __device__ __forceinline__ void ScatterToStriped (SmemStorage &smem_storage, T items[ITEMS_PER_THREAD], unsigned int ranks[ITEMS_PER_THREAD])
 Exchanges data items annotated by rank into striped arrangement.
 

Member Function Documentation

template<typename T, int BLOCK_THREADS, int ITEMS_PER_THREAD>
static __device__ __forceinline__ void cub::BlockExchange< T, BLOCK_THREADS, ITEMS_PER_THREAD >::BlockedToStriped (SmemStorage &smem_storage, T items[ITEMS_PER_THREAD])

Transposes data items from blocked arrangement to striped arrangement.

A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the supplied smem_storage is to be reused or repurposed by the threadblock.

Parameters
[in] smem_storage: Reference to the shared-memory allocation of opaque SmemStorage layout
[in,out] items: Items to exchange, converting from blocked to striped arrangement.
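The following sequential C++ model may help illustrate what BlockedToStriped computes. It is a hypothetical sketch, not CUB's implementation: items[t][i] plays the role of item i in thread t's registers, a std::vector stands in for SmemStorage, and the two loop nests correspond to the scatter and gather phases that the device runs in parallel on either side of a barrier. Padding is left out because it only affects where items sit in shared memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model (not CUB code) of BlockedToStriped.
// items[t][i] stands for item i held in thread t's registers; `smem` stands
// for the shared-memory buffer (unpadded).
template <typename T>
void BlockedToStripedModel(std::vector<std::vector<T>>& items) {
    const std::size_t block_threads = items.size();
    assert(block_threads > 0);
    const std::size_t items_per_thread = items[0].size();
    std::vector<T> smem(block_threads * items_per_thread);

    // Scatter: thread t stores its item i at blocked rank t * items_per_thread + i.
    for (std::size_t t = 0; t < block_threads; ++t) {
        assert(items[t].size() == items_per_thread);
        for (std::size_t i = 0; i < items_per_thread; ++i)
            smem[t * items_per_thread + i] = items[t][i];
    }

    // On the device a __syncthreads() barrier separates scatter from gather.

    // Gather: thread t loads striped rank i * block_threads + t into slot i.
    for (std::size_t t = 0; t < block_threads; ++t)
        for (std::size_t i = 0; i < items_per_thread; ++i)
            items[t][i] = smem[i * block_threads + t];
}
```

Starting from the blocked tile {{0, 1}, {2, 3}, {4, 5}, {6, 7}} (four threads, two items each), the model leaves thread 0 holding {0, 4}, thread 1 holding {1, 5}, and so on.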

template<typename T, int BLOCK_THREADS, int ITEMS_PER_THREAD>
static __device__ __forceinline__ void cub::BlockExchange< T, BLOCK_THREADS, ITEMS_PER_THREAD >::StripedToBlocked (SmemStorage &smem_storage, T items[ITEMS_PER_THREAD])

Transposes data items from striped arrangement to blocked arrangement.

A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the supplied smem_storage is to be reused or repurposed by the threadblock.

Parameters
[in] smem_storage: Reference to the shared-memory allocation of opaque SmemStorage layout
[in,out] items: Items to exchange, converting from striped to blocked arrangement.
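StripedToBlocked applies the inverse mapping. The hypothetical host-side C++ sketch below models it sequentially: per-thread rows of a nested vector stand in for registers, a flat vector stands in for unpadded shared memory, and the scatter uses the striped rank while the gather uses the blocked rank. It is an illustration of the data movement, not CUB's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model (not CUB code) of StripedToBlocked.
// items[t][i] stands for item i held in thread t's registers.
template <typename T>
void StripedToBlockedModel(std::vector<std::vector<T>>& items) {
    const std::size_t block_threads = items.size();
    assert(block_threads > 0);
    const std::size_t items_per_thread = items[0].size();
    std::vector<T> smem(block_threads * items_per_thread);

    // Scatter: thread t stores its item i at striped rank i * block_threads + t.
    for (std::size_t t = 0; t < block_threads; ++t) {
        assert(items[t].size() == items_per_thread);
        for (std::size_t i = 0; i < items_per_thread; ++i)
            smem[i * block_threads + t] = items[t][i];
    }

    // On the device a __syncthreads() barrier separates scatter from gather.

    // Gather: thread t loads blocked rank t * items_per_thread + i into slot i.
    for (std::size_t t = 0; t < block_threads; ++t)
        for (std::size_t i = 0; i < items_per_thread; ++i)
            items[t][i] = smem[t * items_per_thread + i];
}
```

Applying the model to the striped tile {{0, 4}, {1, 5}, {2, 6}, {3, 7}} (four threads, two items each) yields the blocked tile {{0, 1}, {2, 3}, {4, 5}, {6, 7}}, i.e. it undoes the blocked-to-striped transpose.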

template<typename T, int BLOCK_THREADS, int ITEMS_PER_THREAD>
static __device__ __forceinline__ void cub::BlockExchange< T, BLOCK_THREADS, ITEMS_PER_THREAD >::ScatterToBlocked (SmemStorage &smem_storage, T items[ITEMS_PER_THREAD], unsigned int ranks[ITEMS_PER_THREAD])

Exchanges data items annotated by rank into blocked arrangement.

A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the supplied smem_storage is to be reused or repurposed by the threadblock.

Parameters
[in] smem_storage: Reference to the shared-memory allocation of opaque SmemStorage layout
[in,out] items: Items to exchange
[in] ranks: Corresponding scatter ranks
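ScatterToBlocked can be modeled sequentially as well. The sketch below assumes, which this page does not state explicitly, that the ranks across the block form a permutation of 0 .. BLOCK_THREADS * ITEMS_PER_THREAD - 1, so each rank names an item's final position in the block-wide order; the model asserts that assumption. It is hypothetical host C++ with an unpadded buffer, not CUB's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model (not CUB code) of ScatterToBlocked.
// items[t][i] and ranks[t][i] stand for item i of thread t and its rank.
// Assumption: ranks form a permutation of 0 .. block_threads * items_per_thread - 1.
template <typename T>
void ScatterToBlockedModel(std::vector<std::vector<T>>& items,
                           const std::vector<std::vector<unsigned int>>& ranks) {
    const std::size_t block_threads = items.size();
    assert(block_threads > 0 && ranks.size() == block_threads);
    const std::size_t items_per_thread = items[0].size();
    const std::size_t tile_items = block_threads * items_per_thread;

    std::vector<T> smem(tile_items);
    std::vector<bool> written(tile_items, false);

    // Scatter: each item is stored at the shared-memory slot named by its rank.
    for (std::size_t t = 0; t < block_threads; ++t) {
        assert(items[t].size() == items_per_thread && ranks[t].size() == items_per_thread);
        for (std::size_t i = 0; i < items_per_thread; ++i) {
            const unsigned int r = ranks[t][i];
            assert(r < tile_items && !written[r]);  // permutation check
            written[r] = true;
            smem[r] = items[t][i];
        }
    }

    // On the device a __syncthreads() barrier separates scatter from gather.

    // Gather: thread t reads blocked ranks t * items_per_thread + i.
    for (std::size_t t = 0; t < block_threads; ++t)
        for (std::size_t i = 0; i < items_per_thread; ++i)
            items[t][i] = smem[t * items_per_thread + i];
}
```

For example, with two threads holding {{10, 11}, {12, 13}} and ranks {{3, 0}, {1, 2}}, the shared buffer becomes [11, 12, 13, 10] and the blocked result is {{11, 12}, {13, 10}}.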

template<typename T, int BLOCK_THREADS, int ITEMS_PER_THREAD>
static __device__ __forceinline__ void cub::BlockExchange< T, BLOCK_THREADS, ITEMS_PER_THREAD >::ScatterToStriped (SmemStorage &smem_storage, T items[ITEMS_PER_THREAD], unsigned int ranks[ITEMS_PER_THREAD])

Exchanges data items annotated by rank into striped arrangement.

A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the supplied smem_storage is to be reused or repurposed by the threadblock.

Parameters
[in] smem_storage: Reference to the shared-memory allocation of opaque SmemStorage layout
[in,out] items: Items to exchange
[in] ranks: Corresponding scatter ranks
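ScatterToStriped differs from the blocked variant only in the gather step. The hypothetical host C++ model below assumes that the ranks form a permutation of 0 .. BLOCK_THREADS * ITEMS_PER_THREAD - 1 (an assumption, asserted in the code) and uses an unpadded buffer; thread t then reads ranks t, t + BLOCK_THREADS, t + 2 * BLOCK_THREADS, and so on. It sketches the data movement only and is not CUB's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model (not CUB code) of ScatterToStriped.
// items[t][i] and ranks[t][i] stand for item i of thread t and its rank.
// Assumption: ranks form a permutation of 0 .. block_threads * items_per_thread - 1.
template <typename T>
void ScatterToStripedModel(std::vector<std::vector<T>>& items,
                           const std::vector<std::vector<unsigned int>>& ranks) {
    const std::size_t block_threads = items.size();
    assert(block_threads > 0 && ranks.size() == block_threads);
    const std::size_t items_per_thread = items[0].size();
    const std::size_t tile_items = block_threads * items_per_thread;

    std::vector<T> smem(tile_items);
    std::vector<bool> written(tile_items, false);

    // Scatter: each item is stored at the shared-memory slot named by its rank.
    for (std::size_t t = 0; t < block_threads; ++t) {
        assert(items[t].size() == items_per_thread && ranks[t].size() == items_per_thread);
        for (std::size_t i = 0; i < items_per_thread; ++i) {
            const unsigned int r = ranks[t][i];
            assert(r < tile_items && !written[r]);  // permutation check
            written[r] = true;
            smem[r] = items[t][i];
        }
    }

    // On the device a __syncthreads() barrier separates scatter from gather.

    // Gather: thread t reads striped ranks i * block_threads + t.
    for (std::size_t t = 0; t < block_threads; ++t)
        for (std::size_t i = 0; i < items_per_thread; ++i)
            items[t][i] = smem[i * block_threads + t];
}
```

With two threads holding {{10, 11}, {12, 13}} and ranks {{3, 0}, {1, 2}}, the shared buffer becomes [11, 12, 13, 10] and the striped result is {{11, 13}, {12, 10}}.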
