Commits · conv_bwd_data_bias_relu · liyinrong / composable_kernel

18 Sep, 2022 13 commits
- clean · 827d2194
  Chao Liu authored 2 years ago
  
- clean · 44ad1e37
  Chao Liu authored 2 years ago
  
- clean · 0d83a02e
  Chao Liu authored 2 years ago
  
- clean up cmake · 246af536
  Chao Liu authored 2 years ago
  
- reorg file · 10c90734
  Chao Liu authored 2 years ago
  
- refactor · 0d9416b9
  Chao Liu authored 2 years ago
  
- update conv fwd's bias impl · 2c52c81f
  Chao Liu authored 2 years ago
  
- Merge remote-tracking branch 'origin/develop' into conv_bwd_data_bias_relu · ecbeba88
  Chao Liu authored 2 years ago
  
- refactor · af0d03d3
  Chao Liu authored 2 years ago
  
- adding conv bwd data multiple d · 903da525
  Chao Liu authored 2 years ago
  
- adding conv bwd data multiple d · 9890b2b7
  Chao Liu authored 2 years ago
  
- adding conv bwd data multiple d · 65fcde06
  Chao Liu authored 2 years ago
  
- adding conv bwd data multiple d · 7f5e0eb7
  Chao Liu authored 2 years ago
  
17 Sep, 2022 2 commits
- adding conv bwd data multiple d · 4a8a9639
  Chao Liu authored 2 years ago
  
- adding conv bwd data multiple d · 44ec7602
  Chao Liu authored 2 years ago
  
16 Sep, 2022 2 commits
- adding conv bwd data multiple d · d24eff2d
  Chao Liu authored 2 years ago
  
- disable print for group conv multiple D (#421) · 43c898f6
  Chao Liu authored 2 years ago
  
15 Sep, 2022 3 commits
- refactor · 3282885d
  Chao Liu authored 2 years ago
  
- refactor · b67ea467
  Chao Liu authored 2 years ago
  
- adding conv_bwd multiple d · f4d0d22b
  Chao Liu authored 2 years ago
  
14 Sep, 2022 3 commits

batched_gemm + multiple_d + gemm + multiple_d (#394) · 370efa6c

ltqin authored 2 years ago


* refactor

* start

* add device gemm file

* add BatchStrideD0

* add StrideD0

* add gridwise file

* add d0 parameters to gridwise gemm

* add c layout transformer

* add d0 threadwise copy

* init kernel

* init kernel

* regular code

* nm desc put to out

* kernel parameter can not use reference

* host add bias+gelu

* run right for bias+gelu

* change AddFastGelu into another file

* interface add d1 bias parameters

* add d1 parameter to argument

* add d1 parameter to gridwise

* first complete code, not verified

* gelu change to relu and GetElementSpaceSize bug

* add instance

* start add to ckprofiler

* ckprofiler finish code

* change input parameter for ckProfiler

* fix host bias+gelu bug

* show help for ckProfiler

* fix bug where launch kernel ignores parameters

* add pad and fix about bug

* multiple d0

* add dynamic d0_element_op

* change profiler and instance to multiple d0

* example have 2 d0

* remove some comments not using

* change the 2 d0 to have their own parameters

* change d element_op name

* change class name(multiple_d)

* fix bug

* fix bug where file is not found

* update profiler

* refactor

* update profiler

* clean

* revert example change

* add gon layout

* optimize parameter for gno

* add gon to gemm+gemm

* change helping input parameters

* change to GemmPadder_v2

* using ForEach

* fix gb_per_sec

Co-authored-by: Chao Liu <lc.roy86@gmail.com>
Co-authored-by: ltqin <letaoqin@amd.com>


adding conv_bwd multiple d · 93dc71e7
Chao Liu authored 2 years ago

adding conv_bwd multiple d · 92990a17
Chao Liu authored 2 years ago


13 Sep, 2022 1 commit

Upgrade the OS and ROCM versions. (#411) · b22ebd44

Illia Silin authored 2 years ago

* upgrade the OS and ROCM versions in CK docker

* add cxx flags to link code with rocm5.2 and ck-9110 compiler

* rename the docker image

* run ONNX gemms using init=1


12 Sep, 2022 2 commits
- adding conv_bwd_data multiple_d · a9c78409
  Chao Liu authored 2 years ago
  
- adding conv_bwd_data multiple_d · 4d85cd39
  Chao Liu authored 2 years ago
  
10 Sep, 2022 5 commits
- fix instances · d5fc5357
  shaojiewang authored 2 years ago
  
- add more instances · c97808d0
  shaojiewang authored 2 years ago
  
- merge develop · 44d080c1
  shaojiewang authored 2 years ago
  
- format example file · 0cbbeb26
  shaojiewang authored 2 years ago
  
- clang-format · f829d54b
  shaojiewang authored 2 years ago
  
09 Sep, 2022 2 commits

add more instances in example and get right results · f85fa4da
shaojiewang authored 2 years ago


embedding fuse layernorm (#405) · efd1d257

carlushuang authored 2 years ago


* add gridwise/device sparse embedding

* update code

* update code

* remove useless makefile

* code fix

* workable

* work properly

* emb add

* add more instance

* format

* remove useless code

* fix format

* fix clang-tidy

* clean

* fix a compile error

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>


08 Sep, 2022 1 commit

Fix gemm-softmax-gemm-permute padding cases (#409) · d6709dc3

Anthony Chang authored 2 years ago

* fix example; make padding on by default in example; fix argument checks

* fix Gemm1KPacK which has since regressed from PR #399


07 Sep, 2022 1 commit

Add stderr to QA logfiles, process splitK and ONNX gemm kernels (#402) · ce74cea4

Illia Silin authored 2 years ago

* add processing for the onnx_gemm and splitK_gemm

* add profile_onnx_gemm.sh

* add stderr to logfiles, add splitK and onnx gemm parsing

* enable splitK gemm results posting to db


06 Sep, 2022 4 commits

Fused attention instances & padding tests (#395) · 868e5c55

Anthony Chang authored 2 years ago

* modify comment

* trim unnecessary check

* add gemm spec in kernel name

* add TNTT gemm_gemm + atten kernel instances

* refactor attention padding to better fit in unit tests

This streamlines usage where "ResetNaNToMinusInf" is now hidden from user facing device op.
Also added compile-time conditionals that load OOB value as NaN only after padding is enabled

* add adhoc padding test for atten

* shrink input value range for attention kernel validation to avoid occasional error by 1e-3

Still unsure whether this kind of deterministic floating point accuracy issue is expected
or not. May want to try exact same approach as the GPU kernel in the host reference
GEMM+Softmax+GEMM function to see if the accuracy discrepancy goes away. Until then,
shrink the input value range as it is less likely to produce errors of around ~1e-3.

* attention kernel proper granular padding for all 4 dims

* IsSupportedArgument checks

* test more padded cases

* block PadK specialization in attention kernels

* workaround clang crash for gfx908

(gfx908 only) workaround for compiler crash in fused kernels on mainline #9110; #10738 seems ok
error message was "fatal error: error in backend: Error while trying to spill VGPR0 from class
VGPR_32: Cannot scavenge register without an emergency spill slot!"
this falls back to a less ideal way of handling NPadding in fused attention kernel

* comment out kernels giving wrong results on MI100; MI200 doesn't seem affected


GemmGemm TNNT instances (#399) · fe52c94c

Anthony Chang authored 2 years ago

* add gemm_gemm TNNT instance

* sanitize Gemm1KPack

* disable instances that failed validation on mi100


Softmax client example (#396) · 3da5c19e

Adam Osewski authored 2 years ago


* Update Softmax device operation interface.

* Update ckProfiler.

* Update Softmax UT.

* Update example.

* Client example.

* Clang format

Co-authored-by: Adam Osewski <aosewski@amd.com>


refine device level launch kernel code · 24fbc9db
shaojiewang authored 2 years ago


05 Sep, 2022 1 commit
- have a first version that produces a right result · d05518db
  shaojiewang authored 2 years ago
  
