    Standalone sweep once softmax kernel w/ ckProfiler (#295) · 93c99f3d
    Anthony Chang authored
    * use 'sweep once' softmax kernel where applicable
    
    * threadwise copy's dst buffer can specify invalid element value
    
    * add int8 in/out float compute softmax support
    
    give a bit of leeway for int absolute tolerance, as a single data point across all test cases shows an off-by-1 error
    
    * format
    
    * softmax inherits DeviceNormalization
    
    * softmax profiler stub
    
    * tighten up reference softmax interface
    
    * example prints tensor dimension
    
    * add fp32 to softmax profiler
    
    * rename header
    
    * hook with ckProfiler
    
    * format
    
    * resolve merge conflict
    
    * resolve merge conflicts
    
    * update normalization profiler help string
    
    * resolve conflict
    
    * typo
    
    * remove residual
    
    * softmax profiler: address feedback
    
    * test for mixed precision input/output
    
    * fully qualify ck::math::isnan
    
    * add comment for device normalization interface
    
    * revise wording
    
    * constness for alpha/beta scalar pointer
softmax_blockwise.cpp 8.85 KB