- 18 Sep, 2022 13 commits
- 17 Sep, 2022 2 commits
- 16 Sep, 2022 2 commits
- 15 Sep, 2022 3 commits
- 14 Sep, 2022 3 commits
-
-
ltqin authored
* refactor * start * add device gemm file * add BatchStrideD0 * add stridd0 * add gridwise file * add d0 parameters to gridwise gemm * add c layout transformer * add d0 threadwise copy * init kernel * init kernel * regular code * nm desc put to out * kernel parameter can not use reference * host add bias+gelu * run right for bias+gelu * change AddFastGelu into another file * interface add d1 bias parameters * add d1 parameter to argument * add d1 parameter to gridwise * first all code,not verify * gelu change to relu and GetElementSpaceSize bug * add instance * start add to ckprofiler * ckprofiler finish code * change input parameter for ckProfiler * fix host bias+gelu bug * show help for ckProfiler * fix bug for lunch kernel ignore parametes * add pad and fix about bug * mutiple d0 * add dynamic d0_element_op * change profiler and instance to mutiple d0 * example have 2 d0 * remove some comments not using * change 2 d0 have self parameters * change d element_op name * change class name(multiple_d) * fix bug * fix bug that don't find file * update profiler * refactor * update profiler * clean * revert example change * add gon layout * optimize parameter for gno * add gon to gemm+gemm * change helping input parameters * change to GemmPadder_v2 * using ForEach * fix gb_per_sec Co-authored-by:
Chao Liu <lc.roy86@gmail.com> Co-authored-by:
ltqin <letaoqin@amd.com>
-
Chao Liu authored
-
Chao Liu authored
-
- 13 Sep, 2022 1 commit
-
-
Illia Silin authored
* upgrade the OS and ROCM versions in CK docker * add cxx flags to link code with rocm5.2 and ck-9110 compiler * rename the docker image * run ONNX gemms using init=1
-
- 12 Sep, 2022 2 commits
- 10 Sep, 2022 5 commits
-
-
shaojiewang authored
-
shaojiewang authored
-
shaojiewang authored
-
shaojiewang authored
-
shaojiewang authored
-
- 09 Sep, 2022 2 commits
-
-
shaojiewang authored
-
carlushuang authored
* add gridwise/device sparse embedding * update code * update code * remove useless makefile * code fix * workable * work properly * emb add * add more instance * format * remove useless code * fix format * fix clang-tidy * clean * fix a compile error Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
Chao Liu <lc.roy86@gmail.com>
-
- 08 Sep, 2022 1 commit
-
-
Anthony Chang authored
* fix example; make padding on by default in example; fix argument checks * fix Gemm1KPacK which has since regressed from PR #399
-
- 07 Sep, 2022 1 commit
-
-
Illia Silin authored
* add processing for the onng_gemm and splitK_gemm * add profile_onnx_gemm.sh * add stderr to logfiles, add splitK and onnx gemm parsing * enable splitK gemm wresults posting to db
-
- 06 Sep, 2022 4 commits
-
-
Anthony Chang authored
* modify comment * trim unnecessary check * add gemm spec in kernel name * add TNTT gemm_gemm + atten kernel instances * refactor attention padding to better fit in unit tests This streamlines usage where "ResetNaNToMinusInf" is now hidden from user facing device op. Also added compile-time conditionals that load OOB value as NaN only after padding is enabled * add adhoc padding test for atten * shrink input value range for attention kernel validation to avoid occasional error by 1e-3 Still unsure whether this kind of deterministic floating point accurary issue is expected or not. May want to try exact same approach as the GPU kernel in the host reference GEMM+Softmax+GEMM function to see if the accuracy discrepancy goes away. Until then, shrink the input value range as it is less likely to produce errors of around ~1e-3. * attention kernel proper granular padding for all 4 dims * IsSupportedArgument checks * test more padded cases * block PadK specialization in attention kernels * workaround clang crash for gfx908 (gfx908 only) workaround for compiler crash in fused kernels on mainline #9110; #10738 seems ok error message was "fatal error: error in backend: Error while trying to spill VGPR0 from class VGPR_32: Cannot scavenge register without an emergency spill slot!" this fall back to less ideal way of handle NPadding in fused attention kernel * comment out kernels giving wrong results on MI100; MI200 doesn't seem affected
-
Anthony Chang authored
* add gemm_gemm TNNT instance * sanitize Gemm1KPack * disable instances that failed validation on mi100
-
Adam Osewski authored
* Update Softmax device operation interface. * Update ckProfiler. * Update Softmax UT. * Update example. * Client example. * Clang format Co-authored-by:
Adam Osewski <aosewski@amd.com>
-
shaojiewang authored
-
- 05 Sep, 2022 1 commit
-
-
shaojiewang authored
-