    Fused attention (#345) · cac014f1
    Anthony Chang authored
    * initial stub for gemm_gemm_xdl_cshuffle
    
    * set up example code
    
    * compiles
    
    * prevent integer overflow
    
    * harmonize interface between ref_gemm and ref_batched_gemm
    
    * batched_gemm_gemm
    
    * fix example
    
    * host tensor gen: diagonal pattern in lowest two dimensions only
    
    * make c descriptors contain only integral constants
    
    * clean up
    
    * add BlockwiseGemmXdlops_v2 while exploring a unified approach
    
    * implement proper interface
    
    * tidy up example
    
    * fix compilation warnings
    
    * coarsely controlled 2nd gemm padding
    
    * remove rocm-cmake's hard requirement for certain revision
    
    * clang-format
    
    * resolve merge conflict
    
    * fix compilation error on gfx10
    
    * add acc0 elementwise op to interface
    
    * attention host validation
    
    * add blockwise softmax v1
    
    * iteratively update softmax+gemm
    
    * transpose both gemm0 and gemm1 xdl output so as to avoid broadcasting softmax max/sum
    
    * add init method for easier debugging
    
    * do away with manual thre...
CMakeLists.txt 9.05 KB