1. 24 Jun, 2022 8 commits
  2. 23 Jun, 2022 13 commits
    • fix header path · 35d22632
      Chao Liu authored
      35d22632
    • fix header path · f3e542a5
      Chao Liu authored
      f3e542a5
    • fix header path · c102dc14
      Chao Liu authored
      c102dc14
    • fix header path · 54442e12
      Chao Liu authored
      54442e12
    • fix header path · 0dab8c83
      Chao Liu authored
      0dab8c83
    • fix header path · 3ae04c7b
      Chao Liu authored
      3ae04c7b
    • fix header path · 9a9cd972
      Chao Liu authored
      9a9cd972
    • fix header path · 4cdf46ab
      Chao Liu authored
      4cdf46ab
    • Chao Liu's avatar
    • update license (#297) · a49115b9
      Chao Liu authored
      * update license
      
      * update license
      
      * update license
      
      * update license
      a49115b9
    • fix header path for all examples · 2e181fbd
      Chao Liu authored
      2e181fbd
    • fix header path · 294b27b8
      Chao Liu authored
      294b27b8
    • Testing all fwd convolution specializations. (#259) · a2edd7d8
      Adam Osewski authored
      
      * UniforFill with integer values.
      
      * Log tested instance type string.
      
      * Add UT for all convolution specializations.
      
      * debugging conv
      
      * Fix dangling reference bug.
      
      * Small refinements.
      
      * Fix call to error checking function.
      
      * Small refinements to tests.
      
      * Configure error tolerance
      * Change problem size.
      * Remove OddC case from types that do not support it.
      
      * Add helper traits for AccumulatorDataType.
      
      * Print first 5 errs in check_err for integral types.
      
      * Rename FillUniform to FillUniformDistribution
      
      * Refactor
      
      * Do not use typed tests.
      * Instead use plain fixture class with templatized member functions.
      * Initialize tensors with integer values.
      
      * Refine test instances.
      
      * Properly set accumulator data type.
      * Add another "big" instance.
      
      * Refactor convolution tests.
      
      * Revert "debugging conv"
      
      This reverts commit b109516455631ff8fd6dce99cf7c14bf8e323ebb.
      
      * Add pragma once + format + small refinement.
      
      * Fix some unwanted changes.
      
      * Clang-format
      
      * Fix profile_convnd to use renamed tensor initializer.
      
      * Add instances for ConvFWDND kernel case 2D
      
      * Helpers to get ConvNDFwd 2D instances.
      
      * Refactoring.
      
      * Remove "small block" instance as it was generating compiler errors.
      * Remove default template parameters values.
      
      * Refine and fix test.
      
      * Fix problem with default template parameter types.
      * Adjust error thresholds for floating point values test.
      * Use integer values initialization for instances test.
      * Add tests for ConvNDFwd 2D case.
      
      * Remove AccumulatorDataType type trait.
      
      * Update unit-tests.
      
      * Remove operator<< overload.
      
      * Unlock conv1d/3d nd fwd instances.
      
      * Enable skipping calculating reference using flag.
      
      * Fix number of channels for first ResNet50 layer.
      
      * Clang-format.
      Co-authored-by: Adam Osewski <aosewski@amd.com>
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
      a2edd7d8
  3. 21 Jun, 2022 13 commits
  4. 19 Jun, 2022 1 commit
    • GEMM with Multiple Source, GEMM+Bias+Add+FastGeLU example and ckProfiler (#241) · 56adf7e9
      Chao Liu authored
      * add gelu and fast_gelu
      
      * added GeLU and fast GeLU
      
      * clean up
      
      * add gemm+fastgelu example
      
      * add gemm+gelu instances
      
      * update profiler
      
      * clean up
      
      * clean up
      
      * adding gemm+bias+activation
      
      * clean
      
      * adding bias
      
      * clean
      
      * adding gemm multiple d
      
      * debugging
      
      * add gemm bias add fastgelu
      
      * rename, clean
      
      * refactoring; add readme
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * fix
      
      * fix
      
      * update example
      
      * update example
      
      * rename
      
      * update example
      
      * add ckProfiler
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * add comment
      
      * use type_convert
      
      * clean
      
      * clean element wise op
      56adf7e9
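      For readers unfamiliar with the activation named in the commit above, the following is a minimal sketch of GeLU and a commonly used tanh-based approximation, applied as a GEMM+Bias+Add epilogue on one output element. The function names (`gelu_sketch`, `fast_gelu_sketch`, `bias_add_fast_gelu_sketch`) and the choice of the tanh approximation are assumptions made for illustration; the commit log does not state which formula the library uses.

```cpp
#include <cmath>

// Exact GeLU: x * Phi(x), where Phi is the standard normal CDF.
inline float gelu_sketch(float x)
{
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}

// Tanh-based approximation of GeLU (assumed form, not taken from the commit).
inline float fast_gelu_sketch(float x)
{
    const float k = 0.7978845608f; // sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}

// Hypothetical epilogue for one element: c is the GEMM result,
// bias and residual are the two extra source tensors' values.
inline float bias_add_fast_gelu_sketch(float c, float bias, float residual)
{
    return fast_gelu_sketch(c + bias + residual);
}
```

      The approximation trades the `erf` call for a `tanh`, which is often cheaper on GPUs; the two functions agree closely over typical activation ranges.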
  5. 17 Jun, 2022 5 commits
    • cleaning · fa40d2a4
      Chao Liu authored
      fa40d2a4
    • Don't look up the /sys/module/amdgpu/version file. (#287) · e4584d91
      Illia Silin authored
      
      * use pre-built docker instead of building a new one
      
      * try docker.image.pull
      
      * change syntax in docker.image()
      
      * add 30 min timeout
      
      * increase timeout to 3 hours
      
      * move performance tests to first stage for testing
      
      * set image variable to the new container name
      
      * update image name
      
      * check available images
      
      * check available images in both places
      
      * try different image name
      
      * use image ID to refer to image
      
      * run performance on gfx90a
      
      * fix the gpu_arch labeling, add parameter
      
      * move env vars out of stages
      
      * add stand-alone performance script, MI200 tests, CU numbers
      
      * dos2unix for run_perf_tests.sh
      
      * try the new git credentials
      
      * use env var for git credentials
      
      * don't look up /sys/module/amdgpu/version
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
      e4584d91
    • Regulate reduction accumulator operations and Element-wise operations (#274) · 1f543bfa
      Qianfeng authored
      * Remove template from Reduction operation classes and add template to their operator() and GetIdentityValue() interfaces
      
      * Change to unary elementwise operators and the reduce_unary_operator (class for mapping) and dependent variations in all host layers
      
      * Remove the data type template parameter from reduce_binary_operator (class for mapping) and dependent variations in host layers
      
      * Add InMemoryDataOperatonSupportedOnDataType to check the matching between data type and InMemoryDataOperation
      
      * Use struct-scope operator template instantiation for binary and unary element-wise operations
      
      * Change a few more elementwise operations to use template for operator()
      
      * Tiny correction in Normalize operator
      
      * Add static_assert to check the data type applicability for some reduction accumulator and element-wise operations
      
      * Correction in some examples with regard to using ReduceAccDataType
      
      * Use static_assert for UnaryDivide
      
      * Update merged code to use Element-wise operations and Reduction Accumulator operations correctly
      
      * Tiny fix with regard to SetWorkSpacePointer()
      1f543bfa
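      The first item of the commit above moves the data-type template parameter from the reduction operation class onto its `operator()` and `GetIdentityValue()` members. A minimal sketch of that pattern follows; the names `AddSketch`, `MaxSketch`, and `reduce_sketch` are hypothetical, and details such as whether `GetIdentityValue()` is static are assumptions, not taken from the library.

```cpp
#include <cstddef>
#include <limits>
#include <type_traits>

// Non-template operation class; the data type is chosen per call.
struct AddSketch
{
    template <typename T>
    static constexpr T GetIdentityValue()
    {
        return T{0};
    }

    template <typename T>
    void operator()(T& acc, T v) const
    {
        // Reject unsupported data types at compile time.
        static_assert(std::is_arithmetic<T>::value, "AddSketch needs an arithmetic type");
        acc += v;
    }
};

struct MaxSketch
{
    template <typename T>
    static constexpr T GetIdentityValue()
    {
        return std::numeric_limits<T>::lowest();
    }

    template <typename T>
    void operator()(T& acc, T v) const
    {
        static_assert(std::is_arithmetic<T>::value, "MaxSketch needs an arithmetic type");
        if(v > acc)
            acc = v;
    }
};

// Hypothetical host-side reduction: the accumulator type AccT is picked by the
// caller, independently of the operation class and of the input type InT.
template <typename AccT, typename Op, typename InT, std::size_t N>
AccT reduce_sketch(const InT (&data)[N], Op op)
{
    AccT acc = Op::template GetIdentityValue<AccT>();
    for(std::size_t i = 0; i < N; ++i)
        op(acc, static_cast<AccT>(data[i]));
    return acc;
}
```

      With this shape, a single operation object such as `AddSketch{}` can serve several accumulator types, for example `reduce_sketch<long>(xs, AddSketch{})` and `reduce_sketch<float>(xs, AddSketch{})` on the same input array.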
    • Shaojie WANG's avatar
      63cdd923
    • add p_workspace to baseargument (#275) · c7a96ed5
      ltqin authored
      c7a96ed5