- 25 Sep, 2020 2 commits
-
-
Dustyn Blasig authored
CUTLASS 2.3.0
-
akerr authored
-
- 23 Sep, 2020 1 commit
-
-
Andrew Kerr authored
CUTLASS 2.3 adds GEMMs targeting Sparse Tensor Cores on the NVIDIA Ampere Architecture, fast SGEMM, small matrix classes, bug fixes, and performance enhancements.
-
- 13 Jul, 2020 1 commit
-
-
hwu36 authored
  * Update splitk_gemm.cu
  * Update gemm_bias_relu.cu
  * Update mma_sm75.h
-
- 17 Jun, 2020 1 commit
-
-
Andrew Kerr authored
* Updated documentation of fused GEMM example and removed UNITY BUILD batch size. The default batch size when unity build is enabled tends to be favorable.
-
- 15 Jun, 2020 1 commit
-
-
Andrew Kerr authored
  - Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>.
  - Enhancement to CUTLASS Utility Library's HostTensorPlanarComplex template to support copy-in and copy-out
  - Added test_examples target to build and test all CUTLASS examples
  - Minor edits to documentation to point to GTC 2020 webinar
-
- 08 Jun, 2020 1 commit
-
-
Andrew Kerr authored
Adds support for NVIDIA Ampere Architecture features. CUDA 11 Toolkit recommended.
-
- 08 Apr, 2020 1 commit
-
-
Vijay Thakkar authored
#70 only updated the documentation. This commit applies the same Python version bump to the CMake configuration as well.
-
- 07 Apr, 2020 1 commit
-
-
Andrew Kerr authored
CUTLASS 2.1 contributes:
  - BLAS-style host-side API added to CUTLASS Library
  - Planar Complex GEMM kernels targeting Volta and Turing Tensor Cores
  - Minor enhancements and bug fixes
-
- 22 Nov, 2019 1 commit
-
-
Andrew Kerr authored
-
- 20 Nov, 2019 4 commits
-
-
Andrew Kerr authored
-
Andrew Kerr authored
* Improved formatting, clarity, and content of several documents.
-
Dustyn Blasig authored
-
Andrew Kerr authored
CUTLASS 2.0
Substantially refactored for
  - Better performance, particularly for native Turing Tensor Cores
  - Robust and durable templates spanning the design space
  - Encapsulated functionality embodying modern C++11 programming techniques
  - Optimized containers and data types for efficient, generic, portable device code
Updates to:
  - Quick start guide
  - Documentation
  - Utilities
  - CUTLASS Profiler
Native Turing Tensor Cores
  - Efficient GEMM kernels targeting Turing Tensor Cores
  - Mixed-precision floating point, 8-bit integer, 4-bit integer, and binarized operands
Coverage of existing CUTLASS functionality:
  - GEMM kernels targeting CUDA and Tensor Cores in NVIDIA GPUs
  - Volta Tensor Cores through native mma.sync and through WMMA API
  - Optimizations such as parallel reductions, threadblock rasterization, and intra-threadblock reductions
  - Batched GEMM operations
  - Complex-valued GEMMs
Note: this commit and all that follow require a host compiler supporting C++11 or greater.
-
- 10 Jul, 2019 1 commit
-
-
Andrew Kerr authored
  * Fixed performance defect with indirect access to pointer array for Volta TensorCores TN arrangement.
  * Updated patch version and changelog.
  * Updated patch version and changelog.
  * Added link to changelog in readme.
  * Fixed markdown link
-
- 13 May, 2019 1 commit
-
-
Timmy authored
Make CUTLASS compilable with Clang.
-
- 10 May, 2019 1 commit
-
-
Artem Belevich authored
-
- 09 May, 2019 1 commit
-
-
gthomascollignon authored
-
- 02 May, 2019 1 commit
-
-
Artem Belevich authored
Requires a recent clang build (r359248 or newer). Enable compilation with clang with these options:
cmake -DCUDA_COMPILER=clang -DCMAKE_CXX_COMPILER=/path/to/clang++
-
- 19 Apr, 2019 1 commit
-
-
Timmy authored
CUTLASS 1.3.1 patch resolves failing test with NVRTC.
-
- 20 Mar, 2019 1 commit
-
-
Andrew Kerr authored
CUTLASS 1.3 Release
  - Efficient GEMM kernel targeting Volta Tensor Cores via mma.sync instruction added in CUDA 10.1.
-
- 19 Dec, 2018 3 commits
-
-
Andrew Kerr authored
Removed patch version from README.
-
Andrew Kerr authored
Resolved issue for incorrect SGEMM on Maxwell architecture.
-
akerr authored
-
- 26 Oct, 2018 4 commits
-
-
Andrew Kerr authored
CUTLASS 1.2
-
Andrew Kerr authored
-
Andrew Kerr authored
-
akerr authored
-
- 29 Sep, 2018 2 commits
-
-
Andrew Kerr authored
Fixed cutlass_utilities example.
-
akerr authored
CUDA 9 lacks host-side conversions from float=>half. Instead, we must reinterpret_cast<> from cutlass::half_t => half.
-
- 28 Sep, 2018 2 commits
-
-
Andrew Kerr authored
Fixed typo
-
Andrew Kerr authored
Fixed typo
-
- 21 Sep, 2018 1 commit
-
-
Andrew Kerr authored
Clarification to README
-
- 20 Sep, 2018 6 commits
-
-
akerr authored
-
Andrew Kerr authored
Updated CUTLASS.md
-
akerr authored
-
Andrew Kerr authored
CUTLASS 1.1
-
akerr authored
-
akerr authored
-
- 19 Sep, 2018 1 commit
-
-
akerr authored
-