CUB
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Macros Groups Pages
Classes | Public Types | Static Public Methods | List of all members
cub::BlockDiscontinuity< T, BLOCK_THREADS > Class Template Reference

Detailed description

template<typename T, int BLOCK_THREADS>
class cub::BlockDiscontinuity< T, BLOCK_THREADS >

BlockDiscontinuity provides operations for flagging discontinuities within a list of data items partitioned across a CUDA threadblock.

discont_logo.png
.
Overview
The operations exposed by BlockDiscontinuity allow threadblocks to set "head flags" for data elements that are different from their predecessor (as specified by a binary boolean operator). Head flags are often useful for orchestrating segmented scans and reductions.
For convenience, BlockDiscontinuity exposes a spectrum of entrypoints that differ by:
  • How the first item is handled (always-flagged vs. compared to a specific block-wide predecessor)
  • Output (discontinuity flags only vs. discontinuity flags and a copy of the last tile item for thread0)
Template Parameters
TThe data type to be exchanged.
BLOCK_THREADSThe threadblock size in threads.
Usage Considerations
  • Assumes a blocked arrangement of elements across threads
  • Any threadblock-wide scalar inputs and outputs (e.g., tile_predecessor and last_tile_item) are only considered valid in thread0
  • After any operation, a subsequent __syncthreads() barrier is required if the supplied BlockDiscontinuity::SmemStorage is to be reused or repurposed by the threadblock
Performance Considerations
  • Zero bank conflicts for most types.
Examples
Example 1. Given a tile of 512 non-zero matrix coordinates (ordered by row) in a blocked arrangement across a 128-thread threadblock, flag the first coordinate element of each row.
#include <cub.cuh>
struct NonZero
{
int row;
int col;
float val;
};
struct NewRowOp
{
__device__ __forceinline__ bool operator()(
const NonZero& a,
const NonZero& b)
{
return (a.row != b.row);
}
};
__global__ void SomeKernel(...)
{
// Parameterize BlockDiscontinuity for the parallel execution context
typedef cub::BlockDiscontinuity<NonZero, 128> BlockDiscontinuity;
// Declare shared memory for BlockDiscontinuity
__shared__ typename BlockDiscontinuity::SmemStorage smem_storage;
// A segment of consecutive non-zeroes per thread
NonZero coordinates[4];
// Obtain items in blocked order
...
// Obtain the last item of the previous tile
NonZero block_predecessor;
if (threadIdx.x == 0)
{
block_predecessor = ...
}
// Set head flags
int head_flags[4];
BlockDiscontinuity::Flag(smem_storage, coordinates, block_predecessor, NewRowOp(), head_flags);

Public Types

typedef _SmemStorage SmemStorage
 The operations exposed by BlockDiscontinuity require shared memory of this type. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated shared memory or union'd with other types to facilitate shared memory reuse.
 

Static Public Methods

template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
static __device__
__forceinline__ void 
Flag (SmemStorage &smem_storage, T(&input)[ITEMS_PER_THREAD], FlagOp flag_op, FlagT(&flags)[ITEMS_PER_THREAD], T &last_tile_item)
 Sets discontinuity flags for a tile of threadblock items, for which the first item has no reference (and is always flagged). The last tile item of the last thread is also returned to thread0. More...
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
static __device__
__forceinline__ void 
Flag (SmemStorage &smem_storage, T(&input)[ITEMS_PER_THREAD], FlagOp flag_op, FlagT(&flags)[ITEMS_PER_THREAD])
 Sets discontinuity flags for a tile of threadblock items, for which the first item has no reference (and is always flagged). More...
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
static __device__
__forceinline__ void 
Flag (SmemStorage &smem_storage, T(&input)[ITEMS_PER_THREAD], T tile_predecessor, FlagOp flag_op, FlagT(&flags)[ITEMS_PER_THREAD], T &last_tile_item)
 Sets discontinuity flags for a tile of threadblock items. The last tile item of the last thread is also returned to thread0. More...
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
static __device__
__forceinline__ void 
Flag (SmemStorage &smem_storage, T(&input)[ITEMS_PER_THREAD], T tile_predecessor, FlagOp flag_op, FlagT(&flags)[ITEMS_PER_THREAD])
 Sets discontinuity flags for a tile of threadblock items. More...
 

Member Function Documentation

template<typename T , int BLOCK_THREADS>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
static __device__ __forceinline__ void cub::BlockDiscontinuity< T, BLOCK_THREADS >::Flag ( SmemStorage smem_storage,
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op,
FlagT(&)  flags[ITEMS_PER_THREAD],
T &  last_tile_item 
)
inlinestatic

Sets discontinuity flags for a tile of threadblock items, for which the first item has no reference (and is always flagged). The last tile item of the last thread is also returned to thread0.

Assuming a blocked arrangement of elements across threads, flagsi is set non-zero for item inputi when scan_op(previous-item, inputi) is true (where previous-item is either inputi-1, or inputITEMS_PER_THREAD-1 in the previous thread). Furthermore, flagsi is always non-zero for input0 in thread0.

The last_tile_item is undefined in threads other than thread0.

A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the supplied smem_storage is to be reused or repurposed by the threadblock.

Template Parameters
ITEMS_PER_THREAD[inferred] The number of consecutive items partitioned onto each thread.
FlagT[inferred] The flag type (must be an integer type)
FlagOp[inferred] Binary boolean functor type, having input parameters (const T &a, const T &b) and returning true if a discontinuity exists between a and b, otherwise false.
Parameters
[in]smem_storageShared reference to opaque SmemStorage layout
[in]inputInput items
[in]flag_opBinary boolean flag predicate
[out]flagsDiscontinuity flags
[out]last_tile_item[thread0 only] The last tile item (inputITEMS_PER_THREAD-1 from threadBLOCK_THREADS-1)
template<typename T , int BLOCK_THREADS>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
static __device__ __forceinline__ void cub::BlockDiscontinuity< T, BLOCK_THREADS >::Flag ( SmemStorage smem_storage,
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op,
FlagT(&)  flags[ITEMS_PER_THREAD] 
)
inlinestatic

Sets discontinuity flags for a tile of threadblock items, for which the first item has no reference (and is always flagged).

Assuming a blocked arrangement of elements across threads, flagsi is set non-zero for item inputi when scan_op(previous-item, inputi) is true (where previous-item is either inputi-1, or inputITEMS_PER_THREAD-1 in the previous thread). Furthermore, flagsi is always non-zero for input0 in thread0.

The last_tile_item is undefined in threads other than thread0.

A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the supplied smem_storage is to be reused or repurposed by the threadblock.

Template Parameters
ITEMS_PER_THREAD[inferred] The number of consecutive items partitioned onto each thread.
FlagT[inferred] The flag type (must be an integer type)
FlagOp[inferred] Binary boolean functor type, having input parameters (const T &a, const T &b) and returning true if a discontinuity exists between a and b, otherwise false.
Parameters
[in]smem_storageShared reference to opaque SmemStorage layout
[in]inputInput items
[in]flag_opBinary boolean flag predicate
[out]flagsDiscontinuity flags
template<typename T , int BLOCK_THREADS>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
static __device__ __forceinline__ void cub::BlockDiscontinuity< T, BLOCK_THREADS >::Flag ( SmemStorage smem_storage,
T(&)  input[ITEMS_PER_THREAD],
tile_predecessor,
FlagOp  flag_op,
FlagT(&)  flags[ITEMS_PER_THREAD],
T &  last_tile_item 
)
inlinestatic

Sets discontinuity flags for a tile of threadblock items. The last tile item of the last thread is also returned to thread0.

Assuming a blocked arrangement of elements across threads, flagsi is set non-zero for item inputi when scan_op(previous-item, inputi) is true (where previous-item is either inputi-1, or inputITEMS_PER_THREAD-1 in the previous thread). For thread0, item input0 is compared against /p tile_predecessor.

The tile_predecessor and last_tile_item are undefined in threads other than thread0.

A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the supplied smem_storage is to be reused or repurposed by the threadblock.

Template Parameters
ITEMS_PER_THREAD[inferred] The number of consecutive items partitioned onto each thread.
FlagT[inferred] The flag type (must be an integer type)
FlagOp[inferred] Binary boolean functor type, having input parameters (const T &a, const T &b) and returning true if a discontinuity exists between a and b, otherwise false.
Parameters
[in]smem_storageShared reference to opaque SmemStorage layout
[in]inputInput items
[in]tile_predecessor[thread0 only] Item with which to compare the first tile item (input0from thread0).
[in]flag_opBinary boolean flag predicate
[out]flagsDiscontinuity flags
[out]last_tile_item[thread0 only] The last tile item (inputITEMS_PER_THREAD-1 from threadBLOCK_THREADS-1)
template<typename T , int BLOCK_THREADS>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
static __device__ __forceinline__ void cub::BlockDiscontinuity< T, BLOCK_THREADS >::Flag ( SmemStorage smem_storage,
T(&)  input[ITEMS_PER_THREAD],
tile_predecessor,
FlagOp  flag_op,
FlagT(&)  flags[ITEMS_PER_THREAD] 
)
inlinestatic

Sets discontinuity flags for a tile of threadblock items.

Assuming a blocked arrangement of elements across threads, flagsi is set non-zero for item inputi when scan_op(previous-item, inputi) is true (where previous-item is either inputi-1, or inputITEMS_PER_THREAD-1 in the previous thread). For thread0, item input0 is compared against /p tile_predecessor.

The tile_predecessor and last_tile_item are undefined in threads other than thread0.

A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the supplied smem_storage is to be reused or repurposed by the threadblock.

Template Parameters
ITEMS_PER_THREAD[inferred] The number of consecutive items partitioned onto each thread.
FlagT[inferred] The flag type (must be an integer type)
FlagOp[inferred] Binary boolean functor type, having input parameters (const T &a, const T &b) and returning true if a discontinuity exists between a and b, otherwise false.
Parameters
[in]smem_storageShared reference to opaque SmemStorage layout
[in]inputInput items
[in]tile_predecessor[thread0 only] Item with which to compare the first tile item (input0from thread0).
[in]flag_opBinary boolean flag predicate
[out]flagsDiscontinuity flags

The documentation for this class was generated from the following file: