    Allow distinct K0/K1 values for A/B block descriptor (#98) · 6d4450ef
    Anthony Chang authored
    
    
    * add gitignore
    
    * host tensor: allow generating sequentially increasing value in a given dimension
    
    * gridwise gemm v3r1: allow distinct K0/K1 values for A/B block descriptor
    
    - remove dangling header include
    - modify example gemm_xdl accordingly
    - infer KPack value from M/NPerXdl
    - device conv2d fwd: update parameters accordingly for the underlying gridwise gemm v3r1
    (API for conv2d fwd stays the same for now until we decide to expose individual K0s for activation and weight)
    
    * add LDS data dump utility
    
    * profiler: reflect API change for distinct K0/K1 for A/B matrices
    
    * profiler: add conflict-free LDS write FP16 kernel instances
    
    * fix accidental perf regression
    
    * address feedback; cosmetic changes
    
    * clang-format for new files
    
    * format
    
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
.gitignore 415 Bytes