Commits · external_interface · liyinrong / composable_kernel

24 Jun, 2022 8 commits

clean · ea819f32
Chao Liu authored 3 years ago

ea819f32
clean · 01c777e9
Chao Liu authored 3 years ago

01c777e9
clean · 7d1698eb
Chao Liu authored 3 years ago

7d1698eb
temporary disable client test on Jenkins · f214a8fc
Chao Liu authored 3 years ago

f214a8fc
fix build · c7ce4cc6
Chao Liu authored 3 years ago

c7ce4cc6
clean build · 2e5cd79a
Chao Liu authored 3 years ago

2e5cd79a
revert client app example · 460c3c47
Chao Liu authored 3 years ago

460c3c47
fix header path · 47bb0e65
Chao Liu authored 3 years ago

47bb0e65
23 Jun, 2022 13 commits

fix header path · 35d22632
Chao Liu authored 3 years ago

35d22632
fix header path · f3e542a5
Chao Liu authored 3 years ago

f3e542a5
fix header path · c102dc14
Chao Liu authored 3 years ago

c102dc14
fix header path · 54442e12
Chao Liu authored 3 years ago

54442e12
fix header path · 0dab8c83
Chao Liu authored 3 years ago

0dab8c83
fix header path · 3ae04c7b
Chao Liu authored 3 years ago

3ae04c7b
fix header path · 9a9cd972
Chao Liu authored 3 years ago

9a9cd972
fix header path · 4cdf46ab
Chao Liu authored 3 years ago

4cdf46ab
Merge remote-tracking branch 'origin/develop' into external_interface · 9228f8f0
Chao Liu authored 3 years ago

9228f8f0

Chao Liu authored 3 years ago

* update license

* update license

* update license

* update license

a49115b9

fix header path for all examples · 2e181fbd
Chao Liu authored 3 years ago

2e181fbd
fix header path · 294b27b8
Chao Liu authored 3 years ago

294b27b8

Testing all fwd convolution specializations. (#259) · a2edd7d8

Adam Osewski authored 3 years ago


* UniformFill with integer values.

* Log tested instance type string.

* Add UT for all convolution specializations.

* debugging conv

* Fix dangling reference bug.

* Small refinements.

* Fix call to error checking function.

* Small refinements to tests.

* Configure error tolerance
* Change problem size.
* Remove OddC case from types that do not support it.

* Add helper traits for AccumulatorDataType.

* Print first 5 errs in check_err for integral types.

* Rename FillUniform to FillUniformDistribution

* Refactor

* Do not use typed tests.
* Instead use plain fixture class with templatized member functions.
* Initialize tensors with integer values.

* Refine test instances.

* Properly set accumulator data type.
* Add another "big" instance.

* Refactor convolution tests.

* Revert "debugging conv"

This reverts commit b109516455631ff8fd6dce99cf7c14bf8e323ebb.

* Add pragma once + format + small refinement.

* Fix some unwanted changes.

* Clang-format

* Fix profile_convnd to use renamed tensor initializer.

* Add instances for ConvFWDND kernel case 2D

* Helpers to get ConvNDFwd 2D instances.

* Refactoring.

* Remove "small block" instance as it was generating compiler errors.
* Remove default template parameters values.

* Refine and fix test.

* Fix problem with default template parameter types.
* Adjust error thresholds for floating point values test.
* Use integer values initialization for instances test.
* Add tests for ConvNDFwd 2D case.

* Remove AccumulatorDataType type trait.

* Update unit-tests.

* Remove operator<< overload.

* Unlock conv1d/3d nd fwd instances.

* Enable skipping calculating reference using flag.

* Fix number of channels for first ResNet50 layer.

* Clang-format.

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

a2edd7d8

21 Jun, 2022 13 commits

fix header path · 174b1ba3
Chao Liu authored 3 years ago

174b1ba3
fix header path · d685ba24
Chao Liu authored 3 years ago

d685ba24
fix header path · dd2c9e09
Chao Liu authored 3 years ago

dd2c9e09
Merge remote-tracking branch 'origin/develop' into external_interface · bddda297
Chao Liu authored 3 years ago

bddda297
fix Issue 291 (#294) · 4634b120
Shaojie WANG authored 3 years ago

* rename for typeconvert functor

* refine code

4634b120
fix header path · 996ff99c
Chao Liu authored 3 years ago

996ff99c
remove half · bfeb09b0
Chao Liu authored 3 years ago

bfeb09b0
clean · 8074c853
Chao Liu authored 3 years ago

8074c853

Standalone softmax kernel (#284) · 15c89e81

Anthony Chang authored 3 years ago

* initial stub for standalone softmax

* start device_softmax_mk_to_mk as a wrapper to device_reduce_mk_to_m

* host softmax validates

* compiles; to implement beta scaling

* use NaN trick to efficiently ignore OOB values during sum of exponentials

* freeload device_reduce's utility functions

* clean up interface

* adding prior value (beta scaling)

* remove restriction related to perf considerations

* apply clang-format

* clean; disable diagnostics

* resolve conflicts

* add exp wrapper

* honor HostTensorDesc interface; allow implicit cast from different vector<T> type

* test softmax for fp16/fp32

* update readme

* amend commit NaN trick

* remove redundant param added during development

* format

* replace ScalarDataType with AccDataType

* separate out test programs by precision type

* move softmax sample code to its own folder

* format

* keep up with recent changes in reduction API

* remove extra header

15c89e81
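
The softmax commit above mentions a sum of exponentials and "beta scaling" with a prior value. As a rough illustration only, the sketch below assumes the common form `out = alpha * softmax(x) + beta * prior`, applied row by row. The function name `softmax_rows` and its parameters are hypothetical and are not taken from the repository; the NaN trick for out-of-bounds values is not reproduced because the commit does not describe it.

```python
import math


def softmax_rows(x, alpha=1.0, beta=0.0, prior=None):
    """Row-wise softmax over a list of rows.

    Assumed form (not confirmed by the commit log):
        out[i][j] = alpha * softmax(x[i])[j] + beta * prior[i][j]
    """
    out = []
    for i, row in enumerate(x):
        row_max = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - row_max) for v in row]
        total = sum(exps)  # sum of exponentials for this row
        scaled = [alpha * e / total for e in exps]
        if beta != 0.0:
            # blend in the prior output values (hypothetical beta scaling)
            scaled = [s + beta * p for s, p in zip(scaled, prior[i])]
        out.append(scaled)
    return out
```

With `beta` left at zero the function reduces to a plain softmax, so each output row sums to `alpha`.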

Create MIT LICENSE (#229) · be60d60d

Chao Liu authored 3 years ago

* Create LICENSE

* add contributors, add license into config.hpp

* update

be60d60d

Merge remote-tracking branch 'origin/develop' into external_interface · fe090b93
Chao Liu authored 3 years ago

fe090b93
bring up to date with the usage of __builtin_amdgcn_sched_barrier (#293) · 1ae24109
Anthony Chang authored 3 years ago

1ae24109
update readme and script (#290) · ccbd8d90
Chao Liu authored 3 years ago

ccbd8d90

19 Jun, 2022 1 commit

GEMM with Multiple Source, GEMM+Bias+Add+FastGeLU example and ckProfiler (#241) · 56adf7e9

Chao Liu authored 3 years ago

* add gelu and fast_gelu

* added GeLU and fast GeLU

* clean up

* add gemm+fastgelu example

* add gemm+gelu instances

* update profiler

* clean up

* clean up

* adding gemm+bias+activation

* clean

* adding bias

* clean

* adding gemm multiple d

* debugging

* add gemm bias add fastgelu

* rename, clean

* refactoring; add readme

* refactor

* refactor

* refactor

* refactor

* refactor

* refactor

* fix

* fix

* update example

* update example

* rename

* update example

* add ckProfiler

* clean

* clean

* clean

* clean

* add comment

* use type_convert

* clean

* clean element wise op

56adf7e9
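
The GEMM+Bias+Add+FastGeLU commit above fuses a matrix multiply with two extra source tensors and an activation. The sketch below is a plain reference computation under stated assumptions: "fast GeLU" is taken to be the widely used tanh approximation, the bias is assumed to be broadcast per output column, and the second source `d1` is assumed to have the same shape as the output. The commit log confirms none of these details, and the names `fast_gelu` and `gemm_bias_add_fast_gelu` are hypothetical.

```python
import math


def fast_gelu(x):
    """Tanh approximation of GeLU (assumed form; the commit does not state it)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gemm_bias_add_fast_gelu(a, b, bias, d1):
    """Compute E = fast_gelu(A @ B + bias + D1) on nested lists.

    a: M x K, b: K x N, bias: length N (broadcast over rows), d1: M x N.
    """
    m, k, n = len(a), len(b), len(b[0])
    e = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = sum(a[i][p] * b[p][j] for p in range(k))  # GEMM accumulation
            e[i][j] = fast_gelu(acc + bias[j] + d1[i][j])  # fused epilogue
    return e
```

A reference like this is mainly useful for checking a fused kernel's output on small inputs.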

17 Jun, 2022 5 commits

cleaning · fa40d2a4
Chao Liu authored 3 years ago

fa40d2a4

Don't look up the /sys/module/amdgpu/version file. (#287) · e4584d91

Illia Silin authored 3 years ago


* use pre-built docker instead of building a new one

* try docker.image.pull

* change syntax in docker.image()

* add 30 min timeout

* increase timeout to 3 hours

* move performance tests to first stage for testing

* set image variable to the new container name

* update image name

* check available images

* check available images in both places

* try different image name

* use image ID to refer to image

* run performance on gfx90a

* fix the gpu_arch labeling, add parameter

* move env vars out of stages

* add stand-alone performance script, MI200 tests, CU numbers

* dos2unix for run_perf_tests.sh

* try the new git credentials

* use env var for git credentials

* don't look up /sys/module/amdgpu/version

Co-authored-by: Chao Liu <chao.liu2@amd.com>

e4584d91

Regulate reduction accumulator operations and Element-wise operations (#274) · 1f543bfa

Qianfeng authored 3 years ago

* Remove template from Reduction operation classes and add template to their operator() and GetIdentityValue() interfaces

* Change to unary elementwise operators and the reduce_unary_operator (class for mapping) and dependent variations in all host layers

* Remove the data type template parameter from reduce_binary_operator (class for mapping) and dependent variations in host layers

* Add InMemoryDataOperatonSupportedOnDataType to check the matching between data type and InMemoryDataOperation

* Use struct-scope operator template instantiation for binary and unary element-wise operations

* Change a few more elementwise operations to use template for operator()

* Tiny correction in Normalize operator

* Add static_assert to check the data type applicability for some reduction accumulator and element-wise operations

* Correction in some examples with regard to using ReduceAccDataType

* Use static_assert for UnaryDivide

* Update merged code to use Element-wise operations and Reduction Accumulator operations correctly

* Tiny fix with regard to SetWorkSpacePointer()

1f543bfa

use universal workspace pointer in bwd-weight (#286) · 63cdd923
Shaojie WANG authored 3 years ago

63cdd923
add p_workspace to baseargument (#275) · c7a96ed5
ltqin authored 3 years ago

c7a96ed5
