{"id":13418,"date":"2026-04-27T10:00:50","date_gmt":"2026-04-27T17:00:50","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/directx\/?p=13418"},"modified":"2026-04-27T10:55:01","modified_gmt":"2026-04-27T17:55:01","slug":"d3d12-linalg-preview","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/directx\/d3d12-linalg-preview\/","title":{"rendered":"D3D12 LinAlg Matrix Preview"},"content":{"rendered":"<p>Welcome to the D3D12 LinAlg Matrix Preview release!<\/p>\n<p>Today, we are excited to announce the preview release for the D3D12 Linear Algebra APIs! This feature set unlocks comprehensive hardware acceleration for Matrix-oriented operations across various use cases. Previously, we announced the <a href=\"https:\/\/devblogs.microsoft.com\/directx\/agility-sdk-1-711\/\">WaveMMA<\/a> and <a href=\"https:\/\/devblogs.microsoft.com\/directx\/cooperative-vector\/\">Cooperative Vectors<\/a> features which supported narrow matrix operation use cases; the LinAlg feature set being announced today subsumes these APIs into a singular set of orthogonal APIs. With today&#8217;s announcement, we are enabling developers to both efficiently drive neural rendering techniques directly from individual shader threads in real-time graphics pipelines and utilize higher bandwidth matrix MMA operations for ML and image processing applications, all in a singular combined API.<\/p>\n<p>The application of machine learning techniques is now ubiquitous across the industry. For graphics development, neural network based rendering methods, which we\u2019ve been calling neural rendering, are quickly growing in popularity. At the same time, offloading high bandwidth matrix compute onto the GPU is unquestionably at an all-time high. 
As such, GPU vendors continue to adopt and expand specialized hardware for matrix operations, and the new LinAlg Matrix APIs put the power of that hardware into your hands!<\/p>\n<p>This blog post is part of the larger SM6.10 preview announcement. See the <a href=\"https:\/\/devblogs.microsoft.com\/directx\/shader-model-6-10-agilitysdk-720-preview\/\">parent blog post<\/a> for the full feature set. Also see the GDC 2026 blog putting this feature in the context of the overall ML story for DirectX <a href=\"https:\/\/devblogs.microsoft.com\/directx\/evolving-directx-for-the-ml-era-on-windows\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n<hr \/>\n<h3 id=\"motivation\">Motivation<\/h3>\n<p>Unlocking efficient use of the GPU\u2019s specialized matrix hardware is the core motivation for the LinAlg Matrix APIs. Thanks to the preview process, we were able to go back to the drawing board and evolve the previous design. Thank you for all the feedback, and please keep it coming! 
We will continue to evolve the LinAlg Matrix APIs over the preview period in response to real world feedback.<\/p>\n<p>The new API supports three modes of operation (called Matrix Scopes), each targeting a different key matrix use case:<\/p>\n<p><span style=\"text-decoration: underline;\">MatrixScope::Thread (previously previewed as Cooperative Vectors)<\/span><\/p>\n<p>A thread-scope matrix fits into a traditional rendering pipeline in place of an existing shader. A potential example is running inference on a neural network trained to compute lighting. Replacing only a single shader enables easy adoption of ML techniques. Inference is understood at a high level by the driver, so it can be mapped to dedicated hardware acceleration.<\/p>\n<p><span style=\"text-decoration: underline;\">MatrixScope::Wave (previously previewed as WaveMMA)<\/span><\/p>\n<p>High bandwidth dedicated matrix multiplication hardware is increasingly available in contemporary GPUs. A wave-scope matrix surfaces access to this hardware for complex machine learning and image processing applications. A typical application may employ smaller matrices or manually tiled larger matrices for hardware accelerated matrix-matrix multiplications.<\/p>\n<p><span style=\"text-decoration: underline;\">MatrixScope::ThreadGroup<\/span><\/p>\n<p>MatrixScope::ThreadGroup is new to the LinAlg Matrix API. It is compatible with all the operations of a wave-scope matrix above, but serves a different use case. The input and weight matrices used in LLM-like networks are much larger than the allowed sizes for wave-scope matrices. To serve this case with a wave-scope matrix, manual tiling is mandatory, and achieving performance across hardware would require multiple different kernels. Conversely, a threadgroup-scope matrix&#8217;s larger size avoids manual tiling. 
The tiling decision is shifted to the driver, allowing you to ship a single implementation while still retaining optimal tiling.<\/p>\n<hr \/>\n<h3 id=\"feature-overview\">Feature Overview<\/h3>\n<p>Shader Model 6.10 introduces high level linear algebra APIs building on top of the <a href=\"https:\/\/github.com\/microsoft\/hlsl-specs\/blob\/main\/proposals\/0026-hlsl-long-vector-type.md\">Long Vectors<\/a> and <a href=\"https:\/\/github.com\/microsoft\/hlsl-specs\/blob\/main\/proposals\/0030-dxil-vectors.md\">Native DXIL Vectors<\/a> features released as part of SM6.9. The high level API is converted into a &#8220;mid level&#8221; API consumed by the driver. The mid level API preserves high-level context, enabling the driver to take advantage of the underlying hardware capabilities, while the high level API enables better source level usage rules and fast iteration. This API is centered on the new Matrix type, provided as a permissively licensed HLSL source header. Depending on the declaration of a specific Matrix instance, various operations are enabled or disabled at compilation time. For example, MatrixScope::Thread is roughly limited to matrix-vector operations, while MatrixScope::Wave and MatrixScope::ThreadGroup are roughly limited to matrix-matrix operations. You can view the full table of available operations in the <a href=\"https:\/\/github.com\/microsoft\/hlsl-specs\/blob\/main\/proposals\/0035-linalg-matrix.md\">LinAlg spec<\/a>.<\/p>\n<h3 id=\"code-examples\">Code Examples<\/h3>\n<p>Below are examples of primary use cases for the <code>Matrix<\/code> header, each serving a different goal with different available operations. 
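<\/p>\n<p>All of the Load and Store calls in the examples below take a byte offset and a row stride. For a row-major matrix, the row stride is the number of columns times the element size, and element (row, col) starts at ((row * columns) + col) * element size. As a standalone sanity check of that addressing math (plain C++ with no D3D dependencies; the <code>RowMajorOffset<\/code> helper is illustrative, not part of the API):<\/p>\n<pre class=\"prettyprint language-cpp\"><code class=\"language-cpp\">\/\/ Byte offset of element (row, col) in a row-major matrix with\r\n\/\/ `cols` columns of `elemSize`-byte elements.\r\nunsigned RowMajorOffset(unsigned row, unsigned col, unsigned cols,\r\n                        unsigned elemSize) {\r\n  return (row * cols + col) * elemSize;\r\n}\r\n\r\nint main() {\r\n  \/\/ 16x16 F16 matrix: row stride = 16 columns * 2 bytes = 32 bytes.\r\n  if (RowMajorOffset(1, 0, 16, 2) != 32) return 1;\r\n  \/\/ Start of the A tile at (tile_row = 2, k = 16) in a 1024x1024 F16\r\n  \/\/ matrix: ((tile_row * 16) * 1024 + 16) * 2 = 65568.\r\n  if (RowMajorOffset(2 * 16, 16, 1024, 2) != 65568) return 2;\r\n  return 0;\r\n}<\/code><\/pre>\n<p>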
You can find some of these examples\u00a0<a href=\"https:\/\/github.com\/llvm-beanz\/linalg-examples\">here<\/a> on GitHub.<\/p>\n<h4>Cooperative Vectors Example<\/h4>\n<pre class=\"prettyprint language-cpp\"><code class=\"language-cpp\">\/\/ Compiled with the line below\r\n\/\/   bin\/dxc -I .\/include\/hlsl -T cs_6_10 -enable-16bit-types coop-vec-example.hlsl\r\n\r\n\/\/ System header containing the LinAlg Matrix APIs\r\n#include &lt;dx\/linalg.h&gt;\r\n\r\n\/\/ The API is nested under dx::linalg. Simplify the example by using it\r\nusing namespace dx::linalg;\r\n\r\n\/\/ Byte address buffer to load the matrices from\r\nByteAddressBuffer InBuff : register(t0);\r\n\r\n[numthreads(8, 1, 1)]\r\n[shader(\"compute\")]\r\nvoid main() {\r\n  \/\/ The Matrix type names can get quite long. Alias them for readability\r\n  \/\/ Looking at the template arguments we have:\r\n  \/\/   ComponentType::F16 - The matrix holds an F16 type\r\n  \/\/   16 - The M dimension of the matrix is 16\r\n  \/\/   16 - The N dimension of the matrix is 16\r\n  \/\/   MatrixUse::A - The Matrix is an \"A\" matrix, so it only fits into the \"A\"\r\n  \/\/     slot of various functions\r\n  \/\/   MatrixScope::Thread - The Matrix is a \"Thread\" matrix, so it may only be\r\n  \/\/     used with \"Thread Matrix\" operations. 
These are the operations \r\n  \/\/     previously covered under the Cooperative Vector API\r\n  using MatrixATy =\r\n      Matrix&lt;ComponentType::F16, 16, 16, MatrixUse::A, MatrixScope::Thread&gt;;\r\n\r\n  \/\/ Set up data for later by loading the matrix and creating null vectors\r\n  vector&lt;float16_t, 16&gt; Vec = (vector&lt;float16_t, 16&gt;)0;\r\n  vector&lt;float16_t, 16&gt; Bias = (vector&lt;float16_t, 16&gt;)0;\r\n  MatrixATy MatA = MatrixATy::Load&lt;MatrixLayout::RowMajor&gt;(\r\n      InBuff, 0, \/* Row stride = number of columns * element size *\/ 16 * 2);\r\n\r\n  \/\/ Do an F16 Matrix x Vector multiply\r\n  vector&lt;float16_t, 16&gt; Layer1 = Multiply&lt;float16_t&gt;(MatA, Vec);\r\n\r\n  \/\/ Do an F16 Matrix x Vector multiply with a bias vector\r\n  vector&lt;float16_t, 16&gt; Layer2 = MultiplyAdd&lt;float16_t&gt;(MatA, Layer1, Bias);\r\n\r\n  \/\/ Create a reference to an in-memory vector at offset 4096 in InBuff\r\n  \/\/ without actually loading it in\r\n  VectorRef&lt;ComponentType::F8_E4M3FN, 16&gt; MemBias = {InBuff,\r\n                                                     \/*start offset*\/ 4096};\r\n\r\n  \/\/ Do an F16 Matrix x Vector multiply with a bias vector stored in memory\r\n  vector&lt;float16_t, 16&gt; Layer3 = MultiplyAdd&lt;float16_t&gt;(MatA, Layer2, MemBias);\r\n\r\n  \/\/ Create some packed data\r\n  vector&lt;uint8_t4_packed, 4&gt; SomeData = (vector&lt;uint8_t4_packed, 4&gt;)0;\r\n\r\n  \/\/ Do a MatVecMulAdd but reinterpret the packed data as F8_E4M3FN with a bias\r\n  \/\/ stored in memory\r\n  vector&lt;float16_t, 16&gt; Layer4 = MultiplyAdd&lt;float16_t&gt;(\r\n      MatA, MakeInterpretedVector&lt;ComponentType::F8_E4M3FN&gt;(SomeData), MemBias);\r\n  \/\/ Do a MatVecMulAdd but reinterpret the packed data as F8_E4M3FN with a\r\n  \/\/ regular bias vector\r\n  vector&lt;float16_t, 16&gt; Layer5 = MultiplyAdd&lt;float16_t&gt;(\r\n      MatA, MakeInterpretedVector&lt;ComponentType::F8_E4M3FN&gt;(SomeData), Bias);\r\n\r\n  
\/\/ Create some uint data\r\n  vector&lt;uint, 16&gt; SomeData2 = (vector&lt;uint, 16&gt;)0;\r\n\r\n  \/\/ Do a MatVecMulAdd but convert SomeData2 from U32 to F8_E4M3FN first\r\n  vector&lt;float16_t, 16&gt; Layer6 = MultiplyAdd&lt;float16_t&gt;(\r\n      MatA, Convert&lt;ComponentType::F8_E4M3FN, ComponentType::U32&gt;(SomeData2),\r\n      MemBias);\r\n}<\/code><\/pre>\n<h4>OuterProduct and InterlockedAccumulate Example<\/h4>\n<pre class=\"prettyprint language-cpp\"><code class=\"language-cpp\">\/\/ Compiled with the line below\r\n\/\/   bin\/dxc -I .\/include\/hlsl -T cs_6_10 -enable-16bit-types outerproduct-example.hlsl\r\n\r\n\/\/ System header containing the LinAlg Matrix APIs\r\n#include &lt;dx\/linalg.h&gt;\r\n\r\n\/\/ The API is nested under dx::linalg. Simplify the example by using it\r\nusing namespace dx::linalg;\r\n\r\n\/\/ Byte address buffer to store the results into\r\nRWByteAddressBuffer OutBuff : register(u0);\r\n\r\n[numthreads(8, 1, 1)]\r\n[shader(\"compute\")]\r\nvoid main() {\r\n  \/\/ The Matrix type names can get quite long. Alias them for readability\r\n  \/\/ Looking at the template arguments we have:\r\n  \/\/   ComponentType::F16 - The matrix holds an F16 type\r\n  \/\/   16 - The M dimension of the matrix is 16\r\n  \/\/   8 - The N dimension of the matrix is 8\r\n  \/\/   MatrixUse::Accumulator - The Matrix is an \"Accumulator\" matrix, so it\r\n  \/\/     only fits into the \"Accumulator\" slot of various functions\r\n  \/\/   MatrixScope::Thread - The Matrix is a \"Thread\" matrix, so it may only be\r\n  \/\/     used with \"Thread Matrix\" operations. 
\r\n  using MatrixAccumTy = Matrix&lt;ComponentType::F16, 16, 8,\r\n                               MatrixUse::Accumulator, MatrixScope::Thread&gt;;\r\n\r\n  \/\/ Create some F16 vectors with placeholder data\r\n  vector&lt;float16_t, 16&gt; VecA = (vector&lt;float16_t, 16&gt;)0;\r\n  vector&lt;float16_t, 8&gt; VecB = (vector&lt;float16_t, 8&gt;)0;\r\n\r\n  \/\/ Create an Accumulator matrix by taking the outer product of the two vectors\r\n  MatrixAccumTy MatAcc =\r\n      OuterProduct&lt;ComponentType::F16&gt;(VecA, VecB);\r\n\r\n  \/\/ Atomically accumulate the result into the output buffer\r\n  MatAcc.InterlockedAccumulate(OutBuff, 0);\r\n}<\/code><\/pre>\n<h4>Wave Matrix Example<\/h4>\n<pre class=\"prettyprint language-cpp\"><code class=\"language-cpp\">\/\/ Compiled with the line below\r\n\/\/   bin\/dxc -I .\/include\/hlsl -T cs_6_10 -enable-16bit-types linalg-wave.hlsl\r\n\r\n\/\/ System header containing the LinAlg Matrix APIs\r\n#include &lt;dx\/linalg.h&gt;\r\n\r\n\/\/ The API is nested under dx::linalg. Simplify the example by using it\r\nusing namespace dx::linalg;\r\n\r\n\/\/ This shader performs matrix multiplication C = \u03b1*A*B + \u03b2*C\r\n\/\/ where A, B, and C are matrices of dimensions MxK, KxN, and MxN respectively.\r\n\/\/ The shader uses wave-level parallelism to compute tiles of the output matrix\r\n\/\/ C. Each wave computes a TILE_SIZExTILE_SIZE tile of C. 
The dispatch must\r\n\/\/ allocate waves for each tile of the MxN output matrix.\r\n\r\n\/\/ GEMM constants\r\ncbuffer GemmConstants : register(b0)\r\n{\r\n    float alpha;    \/\/ Scalar multiplier for A*B\r\n    float beta;     \/\/ Scalar multiplier for existing C\r\n}\r\n\r\nByteAddressBuffer MatrixA;\r\nByteAddressBuffer MatrixB;\r\nRWByteAddressBuffer MatrixC;\r\n\r\n\/\/ Matrix dimensions - can be configured as needed\r\n#define M 1024    \/\/ Rows in A and C\r\n#define N 1024    \/\/ Columns in B and C\r\n#define K 1024    \/\/ Columns in A, rows in B\r\n#define TILE_SIZE 16\r\n\r\n\/\/ Optimized GEMM using wave-level parallelism\r\n[numthreads(TILE_SIZE, 1, 1)]\r\nvoid main(uint3 group_id : SV_GroupID)\r\n{\r\n    \/\/ Matrix type definitions for wave scope\r\n    using MatrixATy = Matrix&lt;ComponentType::F16, TILE_SIZE, TILE_SIZE, MatrixUse::A, MatrixScope::Wave&gt;;\r\n    using MatrixBTy = Matrix&lt;ComponentType::F16, TILE_SIZE, TILE_SIZE, MatrixUse::B, MatrixScope::Wave&gt;;\r\n    using MatrixResultTy = Matrix&lt;ComponentType::F32, TILE_SIZE, TILE_SIZE, MatrixUse::Accumulator, MatrixScope::Wave&gt;;\r\n\r\n    \/\/ Calculate tile coordinates for this thread group\r\n    uint tile_row = group_id.y;\r\n    uint tile_col = group_id.x;\r\n\r\n    \/\/ Initialize accumulator\r\n    MatrixResultTy c_tile = MatrixResultTy::Splat(0.0f);\r\n\r\n    \/\/ Perform tiled matrix multiplication across K dimension\r\n    for (uint k = 0; k &lt; K; k += TILE_SIZE)\r\n    {\r\n        \/\/ Calculate byte offsets for A and B tiles\r\n        uint a_offset = ((tile_row * TILE_SIZE) * K + k) * sizeof(half);\r\n        uint b_offset = (k * N + (tile_col * TILE_SIZE)) * sizeof(half);\r\n\r\n        \/\/ Load A and B tiles for this K iteration using ByteAddressBuffer\r\n        MatrixATy a_k_tile = MatrixATy::Load(\r\n            MatrixA, a_offset, K * sizeof(half), MatrixLayout::RowMajor);\r\n\r\n        MatrixBTy b_k_tile = MatrixBTy::Load(\r\n            MatrixB, 
b_offset, N * sizeof(half), MatrixLayout::RowMajor);\r\n\r\n        \/\/ Multiply and accumulate with mixed precision (half inputs -&gt; float accumulation)\r\n        c_tile.MultiplyAccumulate(a_k_tile, b_k_tile);\r\n    }\r\n\r\n    \/\/ Calculate output offset for GEMM equation: C = \u03b1*A*B + \u03b2*C\r\n    uint c_offset = ((tile_row * TILE_SIZE) * N + (tile_col * TILE_SIZE)) * sizeof(float);\r\n\r\n    \/\/ Load existing C tile\r\n    MatrixResultTy c_existing = MatrixResultTy::Load(MatrixC, c_offset, N * sizeof(float), MatrixLayout::RowMajor);\r\n\r\n    \/\/ Apply GEMM scaling element-wise: \u03b1*A*B + \u03b2*C\r\n    for (uint i = 0; i &lt; c_tile.Length(); i++) {\r\n        float ab_val = c_tile.Get(i);\r\n        float c_val = c_existing.Get(i);\r\n        float result = alpha * ab_val + beta * c_val;\r\n        c_tile.Set(i, result);\r\n    }\r\n\r\n    c_tile.Store(MatrixC, c_offset, N * sizeof(float), MatrixLayout::RowMajor);\r\n}<\/code><\/pre>\n<h4>ThreadGroup Matrix Example<\/h4>\n<pre class=\"prettyprint language-cpp\"><code class=\"language-cpp\">\/\/ Compiled with the line below\r\n\/\/   bin\/dxc -I .\/include\/hlsl -T cs_6_10 -enable-16bit-types linalg-threadgroup.hlsl\r\n\r\n\/\/ System header containing the LinAlg Matrix APIs\r\n#include &lt;dx\/linalg.h&gt;\r\n\r\n\/\/ The API is nested under dx::linalg. Simplify the example by using it\r\nusing namespace dx::linalg;\r\n\r\n\/\/ This shader performs matrix multiplication C = \u03b1*A*B + \u03b2*C\r\n\/\/ where A, B, and C are matrices of dimensions MxK, KxN, and MxN respectively.\r\n\/\/ The shader uses threadgroup-level parallelism to compute tiles of the output\r\n\/\/ matrix C. 
The GPU driver will generate code to split the matrix into optimal\r\n\/\/ tiles based on the hardware capabilities.\r\n\r\n\/\/ GEMM constants\r\ncbuffer GemmConstants : register(b0)\r\n{\r\n    float alpha;    \/\/ Scalar multiplier for A*B\r\n    float beta;     \/\/ Scalar multiplier for existing C\r\n}\r\n\r\nByteAddressBuffer MatrixA;\r\nByteAddressBuffer MatrixB;\r\nRWByteAddressBuffer MatrixC;\r\n\r\n\/\/ Matrix dimensions - can be configured as needed\r\n#define M 1024    \/\/ Rows in A and C\r\n#define N 1024    \/\/ Columns in B and C\r\n#define K 1024    \/\/ Columns in A, rows in B\r\n\r\n\/\/ Optimized GEMM using threadgroup-level parallelism\r\n[numthreads(1024, 1, 1)]\r\nvoid main()\r\n{\r\n    \/\/ Matrix type definitions for threadgroup scope\r\n    using MatrixATy = Matrix&lt;ComponentType::F16, M, K, MatrixUse::A, MatrixScope::ThreadGroup&gt;;\r\n    using MatrixBTy = Matrix&lt;ComponentType::F16, N, K, MatrixUse::B, MatrixScope::ThreadGroup&gt;;\r\n    using MatrixResultTy = Matrix&lt;ComponentType::F32, M, N, MatrixUse::Accumulator, MatrixScope::ThreadGroup&gt;;\r\n\r\n    MatrixATy a_matrix = MatrixATy::Load(MatrixA, 0, K * sizeof(half), MatrixLayout::RowMajor);\r\n\r\n    MatrixBTy b_matrix = MatrixBTy::Load(MatrixB, 0, N * sizeof(half), MatrixLayout::RowMajor);\r\n\r\n    \/\/ Load existing C matrix for GEMM equation: C = \u03b1*A*B + \u03b2*C\r\n    MatrixResultTy c_existing = MatrixResultTy::Load(MatrixC, 0, N * sizeof(float), MatrixLayout::RowMajor);\r\n\r\n    \/\/ Compute A*B\r\n    MatrixResultTy ab_result = Multiply&lt;ComponentType::F32&gt;(a_matrix, b_matrix);\r\n\r\n    \/\/ Apply GEMM scaling element-wise: \u03b1*A*B + \u03b2*C\r\n    for (uint i = 0; i &lt; ab_result.Length(); i++) {\r\n        float ab_val = ab_result.Get(i);\r\n        float c_val = c_existing.Get(i);\r\n        float result = alpha * ab_val + beta * c_val;\r\n        ab_result.Set(i, result);\r\n    }\r\n\r\n    ab_result.Store(MatrixC, 0, N * sizeof(float), 
MatrixLayout::RowMajor);\r\n}<\/code><\/pre>\n<hr \/>\n<h3 id=\"data-preparation\">Data Preparation<\/h3>\n<p>D3D provides methods for converting weight and bias matrix data between layouts and types. The available destination layouts are:<\/p>\n<pre class=\"prettyprint language-cpp\"><code class=\"language-cpp\">enum D3D12_LINEAR_ALGEBRA_MATRIX_LAYOUT {\r\n    D3D12_LINEAR_ALGEBRA_MATRIX_LAYOUT_ROW_MAJOR,\r\n    D3D12_LINEAR_ALGEBRA_MATRIX_LAYOUT_COLUMN_MAJOR,\r\n    D3D12_LINEAR_ALGEBRA_MATRIX_LAYOUT_MUL_OPTIMAL,\r\n    D3D12_LINEAR_ALGEBRA_MATRIX_LAYOUT_OUTER_PRODUCT_OPTIMAL\r\n};<\/code><\/pre>\n<p>For instance, <code>D3D12_LINEAR_ALGEBRA_MATRIX_LAYOUT_MUL_OPTIMAL<\/code> is a device-specific layout for optimal use with Matrix-Vector operations such as <code>MultiplyAdd<\/code> in the code examples above.<\/p>\n<p>See <code>ID3D12DevicePreview::GetLinearAlgebraMatrixConversionDestinationInfo()<\/code> and <code>ID3D12CommandListPreview::ConvertLinearAlgebraMatrix()<\/code> in the D3D LinAlg spec <a href=\"https:\/\/microsoft.github.io\/DirectX-Specs\/d3d\/D3D12LinearAlgebraRuntimeFeatureSupport.html#convert-matrix-to-desired-layout-and-type\">here<\/a>.<\/p>\n<hr \/>\n<h3>Get Running<\/h3>\n<p>LinAlg is part of Shader Model 6.10, currently in preview. 
This requires:<\/p>\n<ul>\n<li>AgilitySDK 1.720.1-preview available <a href=\"https:\/\/devblogs.microsoft.com\/directx\/directx12agility\/\">here<\/a>.<\/li>\n<li>Preview Shader Model 6.10 support in DXC available <a href=\"https:\/\/github.com\/microsoft\/DirectXShaderCompiler\/releases\/tag\/v1.10.2605.2\">here<\/a>.<\/li>\n<\/ul>\n<p><strong>Device Support<\/strong>:<\/p>\n<table style=\"border-collapse: collapse; width: 100%; height: 96px;\">\n<tbody>\n<tr style=\"height: 24px;\">\n<td style=\"width: 3.02804%; height: 24px;\"><strong>NVIDIA:<\/strong><\/td>\n<td style=\"width: 96.972%; height: 24px;\">Contact your developer relations representative for in-development driver access.<\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 3.02804%; height: 24px;\"><strong>Intel:<\/strong><\/td>\n<td style=\"width: 96.972%; height: 24px;\">Support planned in an upcoming release.<\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 3.02804%; height: 24px;\"><strong>AMD:<\/strong><\/td>\n<td style=\"width: 96.972%; height: 24px;\"><a href=\"https:\/\/www.amd.com\/en\/resources\/support-articles\/release-notes\/RN-RAD-MS-AGILITY-SDK-25-30-41-02.html\" target=\"_blank\" rel=\"noreferrer noopener\">AMD Software: AgilitySDK Developer Preview Edition 25.30.41.02<\/a><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 3.02804%; height: 24px;\"><strong>WARP:<\/strong><\/td>\n<td style=\"width: 96.972%; height: 24px;\">Available in the latest WARP software rasterizer preview <a href=\"https:\/\/www.nuget.org\/packages\/Microsoft.Direct3D.WARP\/1.0.20-preview\">here<\/a>.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<hr \/>\n<h3>Checking for Support<\/h3>\n<p>To enable the LinAlg preview with the AgilitySDK above, turn on experimental feature support in code before creating a D3D12 device:<\/p>\n<pre class=\"prettyprint language-cpp\"><code class=\"language-cpp\">UUID Features[] = { D3D12ExperimentalShaderModels };\r\nThrowIfFailed(D3D12EnableExperimentalFeatures(_countof(Features), Features, nullptr, nullptr));\r\n<\/code><\/pre>\n<p>The API provides many different dimensions of hardware support. 
To fully explore the granular API, see the documentation <a href=\"https:\/\/microsoft.github.io\/DirectX-Specs\/d3d\/D3D12LinearAlgebraRuntimeFeatureSupport.html#granular-capability-query-api\">here<\/a>.<\/p>\n<p>To quickly get started with LinAlg, query the device for Tier 1 support:<\/p>\n<pre class=\"prettyprint language-cpp\"><code class=\"language-cpp\">D3D12_FEATURE_DATA_LINEAR_ALGEBRA_SUPPORT linearAlgebraSupport = {};\r\nHRESULT hr = device-&gt;CheckFeatureSupport(\r\n    D3D12_FEATURE_LINEAR_ALGEBRA_SUPPORT,\r\n    &amp;linearAlgebraSupport,\r\n    sizeof(linearAlgebraSupport));\r\n\r\nif (SUCCEEDED(hr) &amp;&amp; linearAlgebraSupport.LinearAlgebraTier &gt;= D3D12_LINEAR_ALGEBRA_TIER_1)\r\n{\r\n    \/\/ Device supports Tier 1 linear algebra operations\r\n}<\/code><\/pre>\n<p>Supported Tier 1 features are listed <a href=\"https:\/\/microsoft.github.io\/DirectX-Specs\/d3d\/D3D12LinearAlgebraRuntimeFeatureSupport.html#tier-1-support\">here<\/a>. Other support tiers are documented in that spec as well.<\/p>\n<hr \/>\n<h3>PIX<\/h3>\n<p>As usual, Day One PIX support is available. 
Check <a href=\"https:\/\/devblogs.microsoft.com\/pix\/\">here<\/a> for the latest information.<\/p>\n<hr \/>\n<h3>Content from GPU Vendors<\/h3>\n<h4>AMD<\/h4>\n<p>Linear Algebra Matrix is supported on AMD Radeon\u2122 RX 9000 series graphics products using the <a href=\"https:\/\/www.amd.com\/en\/resources\/support-articles\/release-notes\/RN-RAD-MS-AGILITY-SDK-25-30-41-02.html\" target=\"_blank\" rel=\"noreferrer noopener\">AMD Software: AgilitySDK Developer Preview Edition 25.30.41.02<\/a> driver.<\/p>\n<h4>Intel<\/h4>\n<blockquote><p><span data-contrast=\"auto\">We\u2019re working on a Linear Algebra implementation leveraging our XMX 
cores and expect to share it with ISVs later this year. This new API replaces cooperative vectors and enables efficient use of vector-matrix and matrix-matrix multiplication. It\u2019s a key enabler for neural rendering techniques like texture set neural compression and more. We\u2019re excited to see how developers will leverage this capability and can\u2019t wait to see all the new cool rendering algorithms that will be developed on top of it!<\/span><\/p>\n<p>\u2013 Matth\u00e4us Chajdas, Senior Principal Engineer<\/p><\/blockquote>\n<h4>NVIDIA<\/h4>\n<p>Contact your developer relations representative for in-development driver access.<\/p>\n<hr \/>\n","protected":false},"excerpt":{"rendered":"<p>Welcome to the D3D12 LinAlg Matrix Preview release! Today, we are excited to announce the preview release for the D3D12 Linear Algebra APIs! This feature set unlocks comprehensive hardware acceleration for Matrix-oriented operations across various use cases. Previously, we announced the WaveMMA and Cooperative Vectors features which supported narrow matrix operation use cases; the LinAlg [&hellip;]<\/p>\n","protected":false},"author":211849,"featured_media":12651,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-13418","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-directx"],"acf":[],"blog_post_summary":"<p>Welcome to the D3D12 LinAlg Matrix Preview release! Today, we are excited to announce the preview release for the D3D12 Linear Algebra APIs! This feature set unlocks comprehensive hardware acceleration for Matrix-oriented operations across various use cases. 
Previously, we announced the WaveMMA and Cooperative Vectors features which supported narrow matrix operation use cases; the LinAlg [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts\/13418","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/users\/211849"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/comments?post=13418"}],"version-history":[{"count":2,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts\/13418\/revisions"}],"predecessor-version":[{"id":13637,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts\/13418\/revisions\/13637"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/media\/12651"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/media?parent=13418"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/categories?post=13418"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/tags?post=13418"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}