Commits · illsilin-lwpck-141 · liyinrong / composable_kernel

15 Apr, 2022 1 commit
- refactor atomicAdd · c4bc8da2
  Chao Liu authored 3 years ago
  
  c4bc8da2
13 Apr, 2022 4 commits
- add ignore header to common_header · 708b7bf2
  illsilin authored 3 years ago
  
  708b7bf2
- replaced std::ignore with ignore · 4dac0984
  illsilin authored 3 years ago
  
  4dac0984
- fix a typo · c77ce0d5
  illsilin authored 3 years ago
  
  c77ce0d5
- replace UNUSED with std::ignore · 0a4ece66
  illsilin authored 3 years ago
  
  0a4ece66
12 Apr, 2022 1 commit
- fix the last warnings · cddfa5c2
  illsilin authored 3 years ago
  
  cddfa5c2
11 Apr, 2022 6 commits
- try suppress warnings in device_batched_gemm_reduce_xdl... · 0c767007
  illsilin authored 3 years ago
  
  0c767007
- test the warning suppression · b5e62fde
  illsilin authored 3 years ago
  
  b5e62fde
- test the unused arguments fix · 1a685834
  illsilin authored 3 years ago
  
  1a685834
- update device_batched_gemm_reduce... · e456283f
  illsilin authored 3 years ago
  
  e456283f
- update device_conv3d_fwd_xdl_... · b1395f77
  illsilin authored 3 years ago
  
  b1395f77
- revert device_conv3d_fwd_xdl_... · cfe2e353
  illsilin authored 3 years ago
  
  cfe2e353
10 Apr, 2022 1 commit
- revert changes in README · 7f168b10
  illsilin authored 3 years ago
  
  7f168b10
09 Apr, 2022 3 commits
- fixed missed file · 98ef3297
  illsilin authored 3 years ago
  
  98ef3297
- fixed some typos · 6b0f1b2f
  illsilin authored 3 years ago
  
  6b0f1b2f
- change the target condition · 3b573bb9
  illsilin authored 3 years ago
  
  3b573bb9
08 Apr, 2022 1 commit
- update the target criteria · 1ca54630
  illsilin authored 3 years ago
  
  1ca54630
07 Apr, 2022 1 commit
- compile ck for all targets · 4e075420
  illsilin authored 3 years ago
  
  4e075420
01 Apr, 2022 1 commit
- fix build (#171) · 64687816
  Chao Liu authored 3 years ago
  
  64687816
31 Mar, 2022 7 commits

Tune & add conflict-free LDS gemm kernels (#159) · 7db48f90

Anthony Chang authored 3 years ago

* retune & add conflict-free bf16/fp16 c-shuffle gemm instances

amend wrong K1 value in some fp16/bf16 kernel instances

* make gemm cshuffle's timing behavior consistent with all other functions

* clang-format

* retune & add conflict-free fp32 c-shuffle gemm instances

* retune & add conflict-free int8 c-shuffle gemm instances

* update the underlying gridwise gemm of all c-shuffle gemm kernels

* typo

7db48f90

Patch for bwd data #134 (#168) · c0e95f62

ltqin authored 3 years ago

* remove switch for NDimSpatial

* change in, out and wei name

* rename reference thumb function name

* remove test

c0e95f62

Compile for gfx908 and gfx90a (#130) · cd167e49

Chao Liu authored 3 years ago

* adding compilation for multiple targets

* fix build

* clean

* update Jenkinsfile

* update readme

* update Jenkins

* use ck::half_t instead of ushort for bf16

* rename enum classes

* clean

* rename

* clean

cd167e49

fixed issue164 (#165) · ecf337ba
Jianfeng Yan authored 3 years ago
```
* fixed issue164

* removed prints
```
ecf337ba
use single threaded tensor generator (#161) · f015c776
Anthony Chang authored 3 years ago

f015c776
batched_gemm: use profiler in ctest (#163) · c8f3acf9
Jianfeng Yan authored 3 years ago

c8f3acf9
Fix return type to be conformant with CTest. (#160) · 982f8bbc
Adam Osewski authored 3 years ago
```
Co-authored-by: Adam Osewski <aosewski@amd.com>
```
982f8bbc

30 Mar, 2022 1 commit

Batched gemm and reduction (#156) · 34c661e7

Jianfeng Yan authored 3 years ago

* adding batched_gemm_and_reduction

* batched_gemm_reduce works with batch_count=1

* fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1

* adding profiler for batched_gemm_fp16

* fixed a bug in declaration of d1 and d0; both example and profiler work

* clang-format

* cleanup

* batched_gemm_reduce: add test

* minor change

* fixed some typos in function names

34c661e7

29 Mar, 2022 2 commits

Refine kernel parameter of int8 (ScalarPerVector) (#155) · 98e1e2d0
rocking5566 authored 3 years ago
```
* Change int8 ScalarPerVector

* Modify vector width of C
```
98e1e2d0

Unified implementation of 1d/2d/3d conv bwd-data. fp32/fp16/bfp16/int8 (#134) · 0536f2b3

ltqin authored 3 years ago


* start convnd bwd data

* add 3d layout name

* add conv1d reference

* add conv3d reference

* finished example client code

* conv1d kernel finished

* fix input error

* add conv3d

* add 3d layout in conv_utils.hpp

* fix special check

* add convnd lib

* add test for bwd data

* finished test

* add check slice length

* convnd bwd data start

* profiler can be compiled

* fix some bug

* set input to zero

* modify readme for example

* fix test_convnd_bwd_data bug

* test_convnd_bwd_data parameter desc

* workaround for 1d

* workaround for 2d

* change init value

* workaround for 3d int8

* fix init value bug

* remove workaround

* fix acc data type

* add int32

* change select function to template

* tilda to tilde

* remove int32 instance

* fix commit for device hpp

* fix comments for profiler

* using profile imp to test

* add pass verification

* fix conv2d reference

* fix conflict

* remove double batched_gemm

* fix example conv2d data and test convnd

* format

* change conv2d_bwd_data return value

* remove repeat = 1

* remove conv bwd data

Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

0536f2b3

28 Mar, 2022 1 commit
- Grouped gemm test fix (#150) · fe6ce55c
  zjing14 authored 3 years ago
```
* fixed test: return res; rand gemm shapes

* fixed return
```
  fe6ce55c
25 Mar, 2022 1 commit
- ctest of batched_gemm returns 0 or 1 (#149) · 313bbea5
  Jianfeng Yan authored 3 years ago
```
* ctest of batched_gemm returns 0 or 1

* minor change
```
  313bbea5
24 Mar, 2022 3 commits

Gemm test return value (#148) · 3ba14932

rocking5566 authored 3 years ago

* Add return value

* Replace _Float16 with ck::half_t

* A test should return 0 on success and non-zero on failure

3ba14932

fixed alloc mem size (#145) · 12f4cfce
zjing14 authored 3 years ago

12f4cfce

Gemm+Reduce Fusion (#128) · f95267f1

Chao Liu authored 3 years ago

* add gridwise gemm v4r1

* rename

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* use sfc in shuffling

* remove hardcode

* remove hardcode

* refactor

* fix build

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* adding gemm+reduce

* format

* clean

* adding gemm+reduce

* adding profiler for gemm+reduce

* adding gemm+reduce profiler

* fix build

* clean up

* gemm+reduce

* fix build

* update DeviceGemm_Xdl_CShuffle; update enum to enum class

* clean up

* add test for gemm+reduce

* clean up

* refactor

* fix build

* fix build

f95267f1

23 Mar, 2022 2 commits

Unified conv3D API + support for all data types. (#133) · f91579aa

Adam Osewski authored 3 years ago

* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensions

* Rename conv nd example directory to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename examp...

f91579aa

clean (#143) · 22061366
Chao Liu authored 3 years ago

22061366

22 Mar, 2022 3 commits

Batched gemm bf16 (#142) · d91f9f11

Jianfeng Yan authored 3 years ago

* add bf16 for batched gemm

* batched_gemm_bf16 works

* recover accidentally changed files

d91f9f11

Grouped GEMM for fp16 (#126) · 716f1c7f

zjing14 authored 3 years ago

* init of grouped_gemm

* 2 gemm test

* perf test

* clean

* wrap desc into a struct

* test cast static_arr to pointer

* add ptr to GemmDesc

* add grouped gemm profiler

* fixed mem issue with unique_ptr

* clean

* clean

* finished ckprofiler

* Update README.md

* readme

* fixed readme

* add example

* improve code

* fixed comments: reserve, separate ptr and gemm_shapes

* merge group and non-group

* fixed comments: replace push_back with emplace_back to avoid copy constructor

* fixed comments: unified blk2ctile; add test

* ci fix

* fixed ci

* fixed ci

* fixed ci

716f1c7f

Reduction for int8 and bfloat16 (#125) · 9a8ee8a3

Qianfeng authored 3 years ago


* Use thread cluster descriptor and explicit M_K 2d descriptor to simplify Blockwise Reduction

* Change by replacing ReduceDims by NumReduceDims as Device Reduce interface template parameter

* Rename the folder name for the pool2d and reduce examples

* Update to reduction test scripts

* Add Readme for pool2d_fwd and reduce_blockwise examples

* Add support for int8_t reduction (ADD/AVG, MIN/MAX/AMAX)

* Tiny fix in reduce profiler and tiny update in reduce testing scripts

* Tiny fix in testing script profile_reduce_no_index.sh

* Tiny fix in testing script profile_reduce_no_index.sh

* Add support for bfp16 reduction (using bhalf_t = ushort)

* Tiny fix in amd_buffer_addressing.hpp

* Tiny change in script/profile_reduce_with_index.sh

* Use AccDataType for Beta value and use element_wise::PassThrough

* Use type_convert for type converting in host layer reduction

* Renaming and refining in Reduction profiler/device layer/examples

* Renaming and refining in Reduction profiler/device layer/examples

* Renaming all NumReduceDims to NumReduceDim

* Fix the leaked type_convert in ThreadwiseTensorSliceTransfer_v2

* Update to testing scripts to add bf16 support

* added more static_assert

* Remove buggy tunable configurations defined in device_reduce_instance_xxx.hpp

* Add static_assert to give compile-time warning for incorrect thread slice-size/vector-size configurations

* minor change

* Refine and fix (in GetWorkspaceSizeInBytes of MultiBlockPartialReduce) to make int8 completely pass

* Tiny renaming in gridwise_2d_reduction_multiblock_partial_reduce.hpp

* Tiny fix in script/profile_reduce_no_index.sh

* Refine in DeviceReduce layer with regard to using NumInvariantDim/NumReduceDim or InvariantDims/ReduceDims

* Generic renaming in host reduction and DeviceReduce layer

* Add support for 4-d all dimension reduction in the profiler and add_device_reduce_xxx instances

* Use multi-thread and simplification for host Reduction implementation

* Add ctest for reduction

* Update to clarify the using of data init method in produce_reduce/example_reduce/test_reduce/

* Update to the reduce CTest executables to enable default testing behavior when no command argument

* Renaming

Co-authored-by: Jianfeng yan <jfyan008@gmail.com>

9a8ee8a3

21 Mar, 2022 1 commit
- refactored deviceBatchedGemm; removed GridwiseBatchedGemm; added fp32 and int8 to profiler (#120) · cb87b049
  Jianfeng Yan authored 3 years ago
```
changed long_index_t to index_t when computing memory offset

uncomment other ops in profiler

added test for batched_gemm
```
  cb87b049
