Commits · refactor_splitK · liyinrong / composable_kernel

09 May, 2022 2 commits
- clean · 91b19b3c
  Chao Liu authored 3 years ago
  
  91b19b3c
- refactor profiler · 4511f877
  Chao Liu authored 3 years ago
  
  4511f877
08 May, 2022 4 commits
- clean up · 519b6aaf
  Chao Liu authored 3 years ago
  
  519b6aaf
- clean up · c5b5d2e4
  Chao Liu authored 3 years ago
  
  c5b5d2e4
- clean up · 9685fed2
  Chao Liu authored 3 years ago
  
  9685fed2
- clean up · 11b83234
  Chao Liu authored 3 years ago
  
  11b83234
07 May, 2022 2 commits
- format · 8e3aef3b
  Chao Liu authored 3 years ago
  
  8e3aef3b
- Merge remote-tracking branch 'origin/develop' into refactor_splitK · 4a76bc07
  Chao Liu authored 3 years ago
  
  4a76bc07
05 May, 2022 1 commit
- regress to using 1 grid_desc · 06f57782
  Jianfeng yan authored 3 years ago
  
  06f57782
03 May, 2022 2 commits
- turned on other operations · 308146e7
  Jianfeng yan authored 3 years ago
  
  308146e7
- minor changes · 8e3c41a5
  Jianfeng yan authored 3 years ago
  
  8e3c41a5
01 May, 2022 1 commit
- DeviceGemmXdlSplit and DeviceGemmXdlSplitKCShuffle both work for arbitrary K · 7910f486
  Jianfeng yan authored 3 years ago
  
  7910f486
30 Apr, 2022 6 commits
- Introduce GoogleTest framework. (#204) · 8eca05a6
  Adam Osewski authored 3 years ago
```
* Use googletest for tests. Add conv2d_fwd UT.

* Add conv1D/3D to gtest UT.

* Fix: not duplicate test with CTest.

* Convert more tests to googletest.

* Fix: GIT_SHALLOW is not allowed for git commit hash.

* Clang-format

* use integer value for GEMM test
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>
```
  8eca05a6
- use integer value for GEMM test (#219) · 8a2c69ee
  Chao Liu authored 3 years ago
  
  8a2c69ee
- fix compile errors · b5a9f642
  Jianfeng yan authored 3 years ago
  
  b5a9f642
- fix compile errors · 73daa6d6
  Jianfeng yan authored 3 years ago
  
  73daa6d6
- removed A/B/CGridDesc from DeviceOps that use gridwise_gemm_v2r3 and gridwise_gemm_cshuffle · 328ab6f3
  Jianfeng yan authored 3 years ago
  
  328ab6f3
- refactor DeviceGemmXdlSplitK to support arbitrary K; remove template parameter... · e739c577
  Jianfeng yan authored 3 years ago
```
refactor DeviceGemmXdlSplitK to support arbitrary K; remove template parameter A/B/CGridDesc from gridwise_gemm_v2r3; start rewriting conv2d_backward
```
  e739c577
29 Apr, 2022 9 commits
- recover all ops in profiler · 22d63c05
  Jianfeng yan authored 3 years ago
  
  22d63c05
- move kernel_batched_gemm to file gridwise_gemm.hpp; formatting · 7e16a2cd
  Jianfeng yan authored 3 years ago
  
  7e16a2cd
- minor changes · 6d30bdb7
  Jianfeng yan authored 3 years ago
  
  6d30bdb7
- minor changes · 7af0e314
  Jianfeng yan authored 3 years ago
  
  7af0e314
- fixed segfault caused by a typo in creating a/b_grid_desc · c521ee62
  Jianfeng yan authored 3 years ago
  
  c521ee62
- splitK works with gridwise_gemm_v2r3 · 09b9ddb3
  Jianfeng yan authored 3 years ago
  
  09b9ddb3
- Update to gemm_reduce and batched_gemm_reduce (#213) · c77ae65d
  Qianfeng authored 3 years ago
```
* [Experimental] Change to gemm+reduce and batched-gemm+reduce

* Use threadwise-reduce function to improve the gridwise_gemm_reduce_xdl_cshuffle kernel

* Tiny fix in device_batched_gemm_xdl.hpp

* clang-format library/src/utility/conv_fwd_util.cpp
```
  c77ae65d
- Add gfx90a CI stage for tests (#208) · 97d8c504
  JD authored 3 years ago
```
* Add gfx90a CI stage

* upgrade to ROCm 5.1 and fix formatting
```
  97d8c504
- Hotfix for gemm test (#214) · 95e93430
  Anthony Chang authored 3 years ago
```
* pass by ref to avoid throwing away initialization results

* EOL CRLF -> LF
```
  95e93430
25 Apr, 2022 1 commit

- add comments to batched_gemm (#186) · 3956085d
  Jianfeng Yan authored 3 years ago

* add comments to batched_gemm

* formatting

* fix a typo in batched_gemm_documentation

* fix naming

3956085d

22 Apr, 2022 3 commits
- profiler: fix fp32 c-shuffle gemm tuning parameter (#194) · 7c0b1498
  Anthony Chang authored 3 years ago
  
  7c0b1498
- Clang-format only modified files. (#181) · 31d869ad
  Adam Osewski authored 3 years ago
  
  31d869ad
- use inline asm for 4x4 int8 transposition (#187) · 08a979f1
  Anthony Chang authored 3 years ago
  
  08a979f1
21 Apr, 2022 4 commits

- Convolution FWD profiler refactor. (#183) · 1a0cd5d1
  Adam Osewski authored 3 years ago

* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensions

* Rename conv nd example directory to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename example to prevent ...

1a0cd5d1

- Fix `clang-format` (#189) · 7353ec0c
  JD authored 3 years ago
```
* Fix clang-format filepath

* update docker and fix format
```
  7353ec0c
- removed unused lds loads (#196) · 860e291c
  zjing14 authored 3 years ago
  
  860e291c

- Use ck::half_t for Host Reduction (#195) · c1ef7319
  Qianfeng authored 3 years ago

* Add math functions for host

* Change to host reduction to use ck::math:

* Remove the use of half_float::half and half.hpp from reduction example/profiler/ctest

c1ef7319

15 Apr, 2022 1 commit

- Compile CK for all targets (#188) · 4221505d
  Illia Silin authored 3 years ago


* compile ck for all targets

* update the target criteria

* change the target condition

* fixed some typos

* fixed missed file

* revert changes in README

* revert device_conv3d_fwd_xdl_...

* update device_conv3d_fwd_xdl_...

* update device_batched_gemm_reduce...

* test the unused arguments fix

* test the warning suppression

* try suppress warnings in device_batched_gemm_reduce_xdl...

* fix the last warnings

* replace UNUSED with std::ignore

* fix a typo

* replaced std::ignore with ignore

* add ignore header to common_header

* refactor atomicAdd
Co-authored-by: Chao Liu <chao.liu2@amd.com>

4221505d

07 Apr, 2022 1 commit
- Fix typo in batched gemm profiler (#176) · ac0d8066
  Jianfeng Yan authored 3 years ago
```
* forgot passing BatchedCount in some profiler_batched_gemm

* delete default BatchCount
```
  ac0d8066
05 Apr, 2022 3 commits

- Common forward convolution utility refactor. (#141) · abf4bdb9
  Adam Osewski authored 3 years ago

* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensions

* Rename conv nd example directory to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename example t...

abf4bdb9

- Patch for bwd data comments (#174) · 6717168c
  ltqin authored 3 years ago
```
* change function name and way to set input zero

* change enable if
```
  6717168c

- NHWC Conv2d Bwd weight fp16 ckprofiler and test (#166) · 781cacd2
  ltqin authored 3 years ago

* change backward weight name

* start add bwd weight lib and profiler

* change tuning parameter

* change output info

* add bwd weight test

* change test info

* using conv_util

* change wgt to weight

* add }

* add fp32

781cacd2