Commits · v2.3.0 · liyinrong / cutlass

25 Sep, 2020 2 commits
- Merge pull request #135 from NVIDIA/cutlass_2.3_final · c2b80ad4
  Dustyn Blasig authored 4 years ago
```
CUTLASS 2.3.0
```
  c2b80ad4
- CUTLASS 2.3.0 final. · 37a8f9e5
  akerr authored 4 years ago
  
  37a8f9e5
23 Sep, 2020 1 commit

CUTLASS 2.3 initial commit (#134) · c53f3339

Andrew Kerr authored 4 years ago

CUTLASS 2.3 adds GEMMs targeting Sparse Tensor Cores on the NVIDIA Ampere Architecture, fast SGEMM, small matrix classes, bug fixes, and performance enhancements.

c53f3339

13 Jul, 2020 1 commit

Typoes (#107) · 4dac7490

hwu36 authored 5 years ago

* Update splitk_gemm.cu

* Update gemm_bias_relu.cu

* Update mma_sm75.h

4dac7490

17 Jun, 2020 1 commit

Added examples to enable the unity build (#102) · fd7e058d

Andrew Kerr authored 5 years ago

* Updated documentation of fused GEMM example and removed UNITY BUILD batch size. The default batch size when unity build is enabled tends to be favorable.

fd7e058d

15 Jun, 2020 1 commit

Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (#100) · 1ab10279

Andrew Kerr authored 5 years ago

- Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>.
- Enhancement to CUTLASS Utility Library's HostTensorPlanarComplex template to support copy-in and copy-out
- Added test_examples target to build and test all CUTLASS examples
- Minor edits to documentation to point to GTC 2020 webinar

1ab10279

08 Jun, 2020 1 commit

CUTLASS 2.2 (#96) · 86931fef

Andrew Kerr authored 5 years ago

Adds support for NVIDIA Ampere Architecture features. CUDA 11 Toolkit recommended.

86931fef

08 Apr, 2020 1 commit
- update tools/library/CMakeLists to require python 3.6 according to #70 (#82) · e33d90b3
  Vijay Thakkar authored 5 years ago
```
#70 only updates the documentation. This commit reflects this bump in python version to the CMake configuration as well.
```
  e33d90b3
07 Apr, 2020 1 commit

CUTLASS 2.1 (#83) · 96dab34a

Andrew Kerr authored 5 years ago

CUTLASS 2.1 contributes:
- BLAS-style host-side API added to CUTLASS Library
- Planar Complex GEMM kernels targeting Volta and Turing Tensor Cores
- Minor enhancements and bug fixes

96dab34a

22 Nov, 2019 1 commit
- Need Python 3.6 to use enum.auto() (#70) · 7c0cd26d
  Andrew Kerr authored 5 years ago
  
  7c0cd26d
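
For context on the Python 3.6 requirement above: `enum.auto()` was added to the standard library in Python 3.6, so scripts that use it fail on older interpreters. The sketch below is illustrative only. The `LayoutKind` class and its members are hypothetical names chosen for this example, not taken from the CUTLASS sources.

```python
import enum
import sys

# enum.auto() was introduced in Python 3.6; fail early on older interpreters.
assert sys.version_info >= (3, 6), "enum.auto() requires Python 3.6 or newer"


class LayoutKind(enum.Enum):
    """Hypothetical enumeration, used only to illustrate enum.auto()."""

    ColumnMajor = enum.auto()  # auto() assigns the next integer value
    RowMajor = enum.auto()


# Collect the automatically assigned values in definition order.
values = [member.value for member in LayoutKind]
```

With the default `Enum` behaviour, `auto()` numbers members starting from 1, so member values never need to be written out or renumbered by hand.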
20 Nov, 2019 4 commits

Removed redundant conjugation operations from matrix_traits. (#65) · 45ecbc88
Andrew Kerr authored 5 years ago

45ecbc88
Improved formatting, clarity, and content of several documents. (#64) · 8aca98f9
Andrew Kerr authored 5 years ago
```
* Improved formatting, clarity, and content of several documents.
```
8aca98f9
Clang GPU compilation requires explicit CUDACC version flags (#63) · f4d9c8f7
Dustyn Blasig authored 5 years ago

f4d9c8f7

CUTLASS 2.0 (#62) · fb335f6a

Andrew Kerr authored 5 years ago

CUTLASS 2.0

Substantially refactored for

- Better performance, particularly for native Turing Tensor Cores
- Robust and durable templates spanning the design space
- Encapsulated functionality embodying modern C++11 programming techniques
- Optimized containers and data types for efficient, generic, portable device code

Updates to:
- Quick start guide
- Documentation
- Utilities
- CUTLASS Profiler

Native Turing Tensor Cores
- Efficient GEMM kernels targeting Turing Tensor Cores
- Mixed-precision floating point, 8-bit integer, 4-bit integer, and binarized operands

Coverage of existing CUTLASS functionality:
- GEMM kernels targeting CUDA and Tensor Cores in NVIDIA GPUs
- Volta Tensor Cores through native mma.sync and through WMMA API
- Optimizations such as parallel reductions, threadblock rasterization, and intra-threadblock reductions
- Batched GEMM operations
- Complex-valued GEMMs

Note: this commit and all that follow require a host compiler supporting C++11 or greater.

fb335f6a

10 Jul, 2019 1 commit

Performance enhancement for Volta Tensor Cores TN layout (#53) · b5cab177

Andrew Kerr authored 6 years ago

* Fixed performance defect with indirect access to pointer array for Volta TensorCores TN arrangement.

* Updated patch version and changelog.

* Updated patch version and changelog.

* Added link to changelog in readme.

* Fixed markdown link

b5cab177

13 May, 2019 1 commit
- Merge pull request #47 from Artem-B/cutlass-1.3-clang · eb417359
  Timmy authored 6 years ago
```
Make CUTLASS compileable with Clang.
```
  eb417359
10 May, 2019 1 commit
- Addressed code review comments. · fb8b3a98
  Artem Belevich authored 6 years ago
  
  fb8b3a98
09 May, 2019 1 commit
- Added missing file (#48) · d9d35787
  gthomascollignon authored 6 years ago
  
  d9d35787
02 May, 2019 1 commit

Make CUTLASS compileable with Clang. · e18292db

Artem Belevich authored 6 years ago

Requires a recent clang build (r359248 or newer).

Enable compilation with clang with these options:
cmake -DCUDA_COMPILER=clang -DCMAKE_CXX_COMPILER=/path/to/clang++

e18292db

19 Apr, 2019 1 commit
- cutlass 1.3.1 (#46) · fe3438a3
  Timmy authored 6 years ago
```
CUTLASS 1.3.1 patch resolves failing test with NVRTC.
```
  fe3438a3
20 Mar, 2019 1 commit

Cutlass 1.3 Release (#42) · 877bdcac

Andrew Kerr authored 6 years ago

CUTLASS 1.3 Release
- Efficient GEMM kernel targeting Volta Tensor Cores via mma.sync instruction added in CUDA 10.1.

877bdcac

19 Dec, 2018 3 commits
- Removed patch version from README. · 19a9d64e
  Andrew Kerr authored 6 years ago
```
Removed patch version from README.
```
  19a9d64e
- Merge pull request #38 from NVIDIA/resolve_maxwell · 80e6f7c8
  Andrew Kerr authored 6 years ago
```
Resolved issue for incorrect SGEMM on Maxwell architecture.
```
  80e6f7c8
- Resolved issue for incorrect SGEMM on Maxwell architecture. · 822b0952
  akerr authored 6 years ago
  
  822b0952
26 Oct, 2018 4 commits
- Merge pull request #33 from NVIDIA/cutlass_1.2 · ed2ed4d6
  Andrew Kerr authored 6 years ago
```
CUTLASS 1.2
```
  ed2ed4d6
- Minor edit to CHANGELOG. · 4db423c4
  Andrew Kerr authored 6 years ago
  
  4db423c4
- Updating Doxygen docs · b2bc0d3b
  Andrew Kerr authored 6 years ago
  
  b2bc0d3b
- CUTLASS 1.2 · 74df0331
  akerr authored 6 years ago
  
  74df0331
29 Sep, 2018 2 commits
- Merge pull request #30 from NVIDIA/fix_utilities_example · 2332df49
  Andrew Kerr authored 6 years ago
```
Fixed cutlass_utilities example. 
```
  2332df49
- CUDA 9 lacks host-side conversions from float=>half. Instead, we must... · cfe4b933
  akerr authored 6 years ago
```
CUDA 9 lacks host-side conversions from float=>half. Instead, we must reinterpret_cast<> from cutlass::half_t => half.
```
  cfe4b933
28 Sep, 2018 2 commits
- Merge pull request #28 from NVIDIA/cutlass_1.1 · 6877595a
  Andrew Kerr authored 6 years ago
```
Fixed typeo
```
  6877595a
- Fixed typeo · 69e3709d
  Andrew Kerr authored 6 years ago
```
Fixed typeo
```
  69e3709d
21 Sep, 2018 1 commit
- Merge pull request #26 from NVIDIA/cutlass_1.1 · d419094c
  Andrew Kerr authored 6 years ago
```
Clarification to README
```
  d419094c
20 Sep, 2018 6 commits
- Clarification to README · 1a7ac522
  akerr authored 6 years ago
  
  1a7ac522
- Merge pull request #25 from NVIDIA/cutlass_1.1 · bf6eec53
  Andrew Kerr authored 6 years ago
```
Updated CUTLASS.md
```
  bf6eec53
- Updated copyright of CUTLASS.md · 206e38da
  akerr authored 6 years ago
  
  206e38da
- Merge pull request #24 from NVIDIA/cutlass_1.1 · d85f6a1c
  Andrew Kerr authored 6 years ago
```
Cutlass 1.1
```
  d85f6a1c
- Reduced range of random values to avoid bit-level inconsistencies for large matrices. · 0826572c
  akerr authored 6 years ago
  
  0826572c
- Updated README and CHANGELOG. · 77d1e0ca
  akerr authored 6 years ago
  
  77d1e0ca
19 Sep, 2018 1 commit
- Updated doxygen · d7137f9c
  akerr authored 6 years ago
  
  d7137f9c
