1. 25 Sep, 2020 2 commits
  2. 23 Sep, 2020 1 commit
    • CUTLASS 2.3 initial commit (#134) · c53f3339
      Andrew Kerr authored
      CUTLASS 2.3 adds GEMMs targeting Sparse Tensor Cores on the NVIDIA Ampere Architecture, fast SGEMM, and small matrix classes, along with bug fixes and performance enhancements.
  3. 13 Jul, 2020 1 commit
    • Typoes (#107) · 4dac7490
      hwu36 authored
      * Update splitk_gemm.cu
      
      * Update gemm_bias_relu.cu
      
      * Update mma_sm75.h
  4. 17 Jun, 2020 1 commit
  5. 15 Jun, 2020 1 commit
    • Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (#100) · 1ab10279
      Andrew Kerr authored
      - Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>.
      - Enhancement to CUTLASS Utility Library's HostTensorPlanarComplex template to support copy-in and copy-out
      - Added test_examples target to build and test all CUTLASS examples
      - Minor edits to documentation to point to GTC 2020 webinar
  6. 08 Jun, 2020 1 commit
    • CUTLASS 2.2 (#96) · 86931fef
      Andrew Kerr authored
      Adds support for NVIDIA Ampere Architecture features. CUDA 11 Toolkit recommended.
  7. 08 Apr, 2020 1 commit
  8. 07 Apr, 2020 1 commit
    • CUTLASS 2.1 (#83) · 96dab34a
      Andrew Kerr authored
      CUTLASS 2.1 contributes:
      - BLAS-style host-side API added to CUTLASS Library
      - Planar Complex GEMM kernels targeting Volta and Turing Tensor Cores
      - Minor enhancements and bug fixes
  9. 22 Nov, 2019 1 commit
  10. 20 Nov, 2019 4 commits
    • Improved formatting, clarity, and content of several documents. (#64) · 8aca98f9
      Andrew Kerr authored
      * Improved formatting, clarity, and content of several documents.
    • Dustyn Blasig's avatar
    • CUTLASS 2.0 (#62) · fb335f6a
      Andrew Kerr authored
      CUTLASS 2.0
      
      Substantially refactored for
      
      - Better performance, particularly for native Turing Tensor Cores
      - Robust and durable templates spanning the design space
      - Encapsulated functionality embodying modern C++11 programming techniques
      - Optimized containers and data types for efficient, generic, portable device code
      
      Updates to:
      - Quick start guide
      - Documentation
      - Utilities
      - CUTLASS Profiler
      
      Native Turing Tensor Cores:
      - Efficient GEMM kernels targeting Turing Tensor Cores
      - Mixed-precision floating point, 8-bit integer, 4-bit integer, and binarized operands
      
      Coverage of existing CUTLASS functionality:
      - GEMM kernels targeting CUDA and Tensor Cores in NVIDIA GPUs
      - Volta Tensor Cores through native mma.sync and through WMMA API
      - Optimizations such as parallel reductions, threadblock rasterization, and intra-threadblock reductions
      - Batched GEMM operations
      - Complex-valued GEMMs
      
      Note: this commit and all that follow require a host compiler supporting C++11 or greater.
  11. 10 Jul, 2019 1 commit
  12. 13 May, 2019 1 commit
  13. 10 May, 2019 1 commit
  14. 09 May, 2019 1 commit
  15. 02 May, 2019 1 commit
    • Make CUTLASS compileable with Clang. · e18292db
      Artem Belevich authored
      Requires a recent clang build (r359248 or newer).
      
      Enable compilation with clang with these options:
      cmake -DCUDA_COMPILER=clang -DCMAKE_CXX_COMPILER=/path/to/clang++
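
The two options above can be assembled into a full configure command. The following is a minimal sketch, not part of the commit: the `CLANGXX` variable, the `clang_configure_cmd` helper, and the `..` source path (which assumes an out-of-source build from a subdirectory of the CUTLASS checkout) are illustrative assumptions. It only prints the command, so it can be inspected before anything is run.

```shell
#!/bin/sh
# Sketch: build the clang configure command from the two options quoted
# in the commit message. CLANGXX is a hypothetical variable; the default
# below is the placeholder path from the commit message, not a real path.
CLANGXX="${CLANGXX:-/path/to/clang++}"

# Hypothetical helper: print the cmake command for a given clang++ path.
# The ".." source directory assumes the command is run from a build
# subdirectory of the CUTLASS checkout.
clang_configure_cmd() {
  echo "cmake .. -DCUDA_COMPILER=clang -DCMAKE_CXX_COMPILER=$1"
}

# Print the command so it can be reviewed before running it.
clang_configure_cmd "$CLANGXX"
```

To configure for real, the printed command would be run from a build directory, with `CLANGXX` pointing at a clang build recent enough for the commit's stated requirement (r359248 or newer).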
  16. 19 Apr, 2019 1 commit
    • cutlass 1.3.1 (#46) · fe3438a3
      Timmy authored
      CUTLASS 1.3.1 patch resolves failing test with NVRTC.
  17. 20 Mar, 2019 1 commit
    • Cutlass 1.3 Release (#42) · 877bdcac
      Andrew Kerr authored
      CUTLASS 1.3 Release
      - Efficient GEMM kernel targeting Volta Tensor Cores via mma.sync instruction added in CUDA 10.1.
  18. 19 Dec, 2018 3 commits
  19. 26 Oct, 2018 4 commits
  20. 29 Sep, 2018 2 commits
  21. 28 Sep, 2018 2 commits
  22. 21 Sep, 2018 1 commit
  23. 20 Sep, 2018 6 commits
  24. 19 Sep, 2018 1 commit