<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Chip Log]]></title><description><![CDATA[Deep dives into memory, compute, and all things ASIC/SoC engineering]]></description><link>https://www.chiplog.io</link><image><url>https://substackcdn.com/image/fetch/$s_!fd69!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F681da9cd-7cda-4874-a53a-49921c6fb514_180x180.png</url><title>Chip Log</title><link>https://www.chiplog.io</link></image><generator>Substack</generator><lastBuildDate>Mon, 04 May 2026 14:13:53 GMT</lastBuildDate><atom:link href="https://www.chiplog.io/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Chip Log]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[chiplog@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[chiplog@substack.com]]></itunes:email><itunes:name><![CDATA[Subbu]]></itunes:name></itunes:owner><itunes:author><![CDATA[Subbu]]></itunes:author><googleplay:owner><![CDATA[chiplog@substack.com]]></googleplay:owner><googleplay:email><![CDATA[chiplog@substack.com]]></googleplay:email><googleplay:author><![CDATA[Subbu]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Why speculative decoding wants two kinds of silicon: NVIDIA, Groq, d-Matrix, Gimlet Labs, NVLink Fusion]]></title><description><![CDATA[Disaggregated speculative decoding and why the new class of accelerators are a better fit than GPUs]]></description><link>https://www.chiplog.io/p/why-speculative-decoding-wants-two</link><guid 
isPermaLink="false">https://www.chiplog.io/p/why-speculative-decoding-wants-two</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Sat, 14 Mar 2026 12:02:44 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1d3e2c1e-ddd2-4ae5-bd5c-fce884ea1c90_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Over the last few years, most of the conversation in AI hardware has understandably centered on training. Bigger clusters, higher bandwidth fabrics, more FLOPs, and ever-larger memory.</p><p>Inference is now forcing a different kind of conversation.</p><p>LLM inference has two phases, prefill and decode. Prefill processes the prompt and builds the initial KV cache. Decode is what happens next: the model generates one token, feeds that token back in, then generates the next token, and so on. That loop (one token at a time) is inherently serial, which is why decode often becomes the limiting step in production inference. It shows up directly as <strong>user-visible latency</strong>, because it determines how long it takes to produce a complete response. 
The technical term for this is <strong>autoregressive decoding</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O5ag!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ab0a6f-5295-4340-addd-e1832a631812_1262x382.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O5ag!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ab0a6f-5295-4340-addd-e1832a631812_1262x382.png 424w, https://substackcdn.com/image/fetch/$s_!O5ag!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ab0a6f-5295-4340-addd-e1832a631812_1262x382.png 848w, https://substackcdn.com/image/fetch/$s_!O5ag!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ab0a6f-5295-4340-addd-e1832a631812_1262x382.png 1272w, https://substackcdn.com/image/fetch/$s_!O5ag!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ab0a6f-5295-4340-addd-e1832a631812_1262x382.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!O5ag!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ab0a6f-5295-4340-addd-e1832a631812_1262x382.png" width="569" height="172.23296354992075" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87ab0a6f-5295-4340-addd-e1832a631812_1262x382.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf5e6c7c-3748-4624-8391-5d87371d81d7_1262x382.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:382,&quot;width&quot;:1262,&quot;resizeWidth&quot;:569,&quot;bytes&quot;:58873,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/189813953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf5e6c7c-3748-4624-8391-5d87371d81d7_1262x382.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!O5ag!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ab0a6f-5295-4340-addd-e1832a631812_1262x382.png 424w, https://substackcdn.com/image/fetch/$s_!O5ag!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ab0a6f-5295-4340-addd-e1832a631812_1262x382.png 848w, https://substackcdn.com/image/fetch/$s_!O5ag!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ab0a6f-5295-4340-addd-e1832a631812_1262x382.png 1272w, https://substackcdn.com/image/fetch/$s_!O5ag!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ab0a6f-5295-4340-addd-e1832a631812_1262x382.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption">Two phases of LLM inference - Prefill and Decode. 
Decode is serial by nature.</figcaption></figure></div><p>One of the most promising responses to the serial nature of the decode phase is <strong>speculative decoding</strong>. Instead of asking an expensive trillion-parameter model to do all the work, it splits the decode phase into two parts:</p><ul><li><p>A smaller, faster &#8220;<strong>draft</strong> <strong>model</strong>&#8221; that proposes tokens quickly, and</p></li><li><p>A larger &#8220;<strong>target</strong> <strong>model</strong>&#8221; that verifies these proposals and decides which tokens to accept and commit.</p></li></ul><p>Once you look at inference in this way, the hardware implications become clear. <strong>Drafting and verification have different memory, compute, and latency requirements.</strong> The <em>target model</em> (larger parent) benefits from the high-throughput characteristics of large GPU systems. The <em>draft model</em> (smaller child) often benefits from hardware that is optimized for low latency, small batches, and fast token-by-token generation. As a result, the two models can benefit from <strong>very different silicon architectures,</strong> each tuned to its part of the pipeline.</p><p>That&#8217;s why I find the recent NVIDIA&#8211;Groq deal strategically interesting. 
<em>It&#8217;s less about any single chip, and more about what it signals.</em> The industry is treating draft-model-class inference seriously, and speculative decoding is one of the cleanest system-level mechanisms to turn that into lower latency and better economics.</p><p>This post systematically explores this topic and is organized as follows:</p><ul><li><p>A simple explanation of speculative decoding, a metric called <em>acceptance rate,</em> and why it matters</p></li><li><p>Why disaggregating the draft and target models is a system-level win</p></li><li><p>Why running draft models on Blackwell or Rubin GPUs can be an awkward fit, and why specialized architectures like d-Matrix and Groq are a strong alternative</p></li><li><p>3D DRAM-based XPUs versus HBM-based GPUs</p></li><li><p>NVLink Fusion and how it can enable heterogeneous chip ecosystems</p></li><li><p>What speculative decoding means for this era of Agentic AI</p></li><li><p>Conclusion &#8212; SRAM vs 3D DRAM vs HBM</p></li></ul><p><em>Disclosure: I work at d-Matrix. This post reflects my personal views and does not represent the views of the company.</em></p><h1>Speculative decoding, simply explained</h1><p>In traditional LLM serving, you deploy a single model, say <strong>Llama-70B</strong>, and it generates <em>every</em> token itself. During the decode phase, it repeats the same serial loop over and over. Generate a token, append it, run again, and continue until the answer is complete.</p><p><strong>Speculative decoding</strong>, an idea introduced by <a href="https://arxiv.org/pdf/2211.17192">Google</a> and <a href="https://arxiv.org/pdf/2302.01318">DeepMind</a> in 2023, rests on the observation that this large model does not need to generate <em>every</em> token by itself. 
Instead, the end-to-end latency can be reduced while keeping output quality high by splitting decode into two phases.</p><ul><li><p><strong>Draft phase:</strong> A small model that closely approximates the large model (for example, Llama-7B) generates the next <em>K</em> tokens sequentially, but quickly.</p></li><li><p><strong>Target phase:</strong> The larger, more accurate model (for example, Llama-70B) verifies those <em>K</em> draft tokens in a single pass, treating them as a batch. It accepts draft tokens in order up to the first one it disagrees with, commits that accepted prefix, and discards the rest.</p></li></ul><p>Then the process repeats. Draft a few more tokens, verify the next chunk, commit what&#8217;s accepted. This <strong>draft &#8594; verify loop</strong> continues until the full response is generated.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zYeY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F656007a1-15f7-4409-9947-d7637ed991b4_1402x462.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zYeY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F656007a1-15f7-4409-9947-d7637ed991b4_1402x462.png 424w, https://substackcdn.com/image/fetch/$s_!zYeY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F656007a1-15f7-4409-9947-d7637ed991b4_1402x462.png 848w, https://substackcdn.com/image/fetch/$s_!zYeY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F656007a1-15f7-4409-9947-d7637ed991b4_1402x462.png 1272w, 
https://substackcdn.com/image/fetch/$s_!zYeY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F656007a1-15f7-4409-9947-d7637ed991b4_1402x462.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zYeY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F656007a1-15f7-4409-9947-d7637ed991b4_1402x462.png" width="1402" height="462" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/656007a1-15f7-4409-9947-d7637ed991b4_1402x462.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b66a0ea1-3dbb-457c-889a-0881cec4124d_1402x462.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:462,&quot;width&quot;:1402,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:85496,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/189813953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb66a0ea1-3dbb-457c-889a-0881cec4124d_1402x462.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zYeY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F656007a1-15f7-4409-9947-d7637ed991b4_1402x462.png 424w, https://substackcdn.com/image/fetch/$s_!zYeY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F656007a1-15f7-4409-9947-d7637ed991b4_1402x462.png 848w, 
https://substackcdn.com/image/fetch/$s_!zYeY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F656007a1-15f7-4409-9947-d7637ed991b4_1402x462.png 1272w, https://substackcdn.com/image/fetch/$s_!zYeY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F656007a1-15f7-4409-9947-d7637ed991b4_1402x462.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The figure below, from Google&#8217;s paper, shows how speculative decoding reduces wall-clock time compared to standard inference. 
This is quite promising because it can deliver meaningful gains in both latency and throughput <strong>without sacrificing output quality</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ceyq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3b0a697-dcd5-4399-8e6e-51ad30d2c16a_2844x1004.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ceyq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3b0a697-dcd5-4399-8e6e-51ad30d2c16a_2844x1004.png 424w, https://substackcdn.com/image/fetch/$s_!ceyq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3b0a697-dcd5-4399-8e6e-51ad30d2c16a_2844x1004.png 848w, https://substackcdn.com/image/fetch/$s_!ceyq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3b0a697-dcd5-4399-8e6e-51ad30d2c16a_2844x1004.png 1272w, https://substackcdn.com/image/fetch/$s_!ceyq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3b0a697-dcd5-4399-8e6e-51ad30d2c16a_2844x1004.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ceyq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3b0a697-dcd5-4399-8e6e-51ad30d2c16a_2844x1004.png" width="1456" height="514" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a3b0a697-dcd5-4399-8e6e-51ad30d2c16a_2844x1004.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/11e19174-cdb7-4e7b-be21-a57bb87e1ea0_2844x1004.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:514,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:332316,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/189813953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e19174-cdb7-4e7b-be21-a57bb87e1ea0_2844x1004.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ceyq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3b0a697-dcd5-4399-8e6e-51ad30d2c16a_2844x1004.png 424w, https://substackcdn.com/image/fetch/$s_!ceyq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3b0a697-dcd5-4399-8e6e-51ad30d2c16a_2844x1004.png 848w, https://substackcdn.com/image/fetch/$s_!ceyq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3b0a697-dcd5-4399-8e6e-51ad30d2c16a_2844x1004.png 1272w, https://substackcdn.com/image/fetch/$s_!ceyq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3b0a697-dcd5-4399-8e6e-51ad30d2c16a_2844x1004.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>K</em>, the number of draft tokens,<em> </em>cannot be increased indefinitely. There is a point of diminishing returns, as the rejection rate increases. [Source: <a href="https://arxiv.org/pdf/2211.17192">Google</a>. 
Annotations by Chiplog]</figcaption></figure></div><h3>The promise of speculative decoding</h3><p>A <a href="https://developer.nvidia.com/blog/tensorrt-llm-speculative-decoding-boosts-inference-throughput-by-up-to-3-6x/">technical post by <strong>NVIDIA</strong> from <strong>Dec 2024</strong></a> shows just how dramatic speculative decoding can be in practice.</p><ul><li><p>Llama-70B on one Hopper H200 yields about <strong>51 tokens/sec</strong> without speculative decoding. But when they pair it with a tiny Llama-1B draft model, still on the same GPU, the throughput jumps to about <strong>146 tokens/sec</strong>.</p></li><li><p>For a much larger model, Llama-405B on 4&#215; H200, they show about <strong>33 tokens/sec</strong> without speculative decoding. 
With speculative decoding using a Llama-3B draft model, throughput jumps roughly <strong>3.6&#215;</strong> to about <strong>120 tokens/sec</strong>.</p></li></ul><p><em>A 3x improvement in inference throughput from one technique is quite incredible.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KY35!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014b2264-75f0-4478-a120-dcfe7f30881f_2226x648.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KY35!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014b2264-75f0-4478-a120-dcfe7f30881f_2226x648.png 424w, https://substackcdn.com/image/fetch/$s_!KY35!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014b2264-75f0-4478-a120-dcfe7f30881f_2226x648.png 848w, https://substackcdn.com/image/fetch/$s_!KY35!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014b2264-75f0-4478-a120-dcfe7f30881f_2226x648.png 1272w, https://substackcdn.com/image/fetch/$s_!KY35!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014b2264-75f0-4478-a120-dcfe7f30881f_2226x648.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KY35!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014b2264-75f0-4478-a120-dcfe7f30881f_2226x648.png" width="1456" height="424" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/014b2264-75f0-4478-a120-dcfe7f30881f_2226x648.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:424,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:353499,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/189813953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014b2264-75f0-4478-a120-dcfe7f30881f_2226x648.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KY35!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014b2264-75f0-4478-a120-dcfe7f30881f_2226x648.png 424w, https://substackcdn.com/image/fetch/$s_!KY35!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014b2264-75f0-4478-a120-dcfe7f30881f_2226x648.png 848w, https://substackcdn.com/image/fetch/$s_!KY35!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014b2264-75f0-4478-a120-dcfe7f30881f_2226x648.png 1272w, https://substackcdn.com/image/fetch/$s_!KY35!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014b2264-75f0-4478-a120-dcfe7f30881f_2226x648.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Speculative Decoding inference with Llama-70B as the target model and Llama-1B as the draft yields <em>146 tps</em>, compared to <em>51 tps</em> with just vanilla Llama-70B. 
Deployed on 1 x H200 [Source: <a href="https://developer.nvidia.com/blog/tensorrt-llm-speculative-decoding-boosts-inference-throughput-by-up-to-3-6x/">NVIDIA</a>]</figcaption></figure></div><h3>Acceptance Rate</h3><p>The original intuition with speculative decoding was that there are simple, predictable continuations where a draft model can easily guess what comes next, like completing:</p><blockquote><p><em>&#8220;The quick brown fox &#8230;&#8221;</em></p></blockquote><p>But in more esoteric cases (rare subject domains, weird proper nouns, highly constrained formatting, tricky reasoning steps) the draft may diverge quickly, and most of the tokens get rejected by the target model.</p><p>A key metric to notice here is the <strong>acceptance rate</strong>. The draft model still has to do a decent job. If the target model accepts something like <strong>80&#8211;90%</strong> of the draft tokens, then speculative decoding starts to feel useful. It will reduce end-to-end latency and raise throughput. But if most of the draft tokens are rejected then the whole trick falls apart. You&#8217;re doing extra work to draft tokens that you end up throwing away.</p><h3>Medusa &amp; EAGLE: Alternative mechanisms of speculative decoding</h3><p>Since its introduction in 2023, speculative decoding has remained an active research area and has gotten better over time. The core idea is still the same: keep output quality high while reducing how often the expensive target model has to step through the serial decode loop.</p><p>A lot of follow-on work focuses on improvements such as <strong>higher acceptance rates, lower overhead, </strong>and<strong> simpler deployment</strong>. </p><p>An important approach is avoiding the requirement to load two completely separate models. <a href="https://arxiv.org/abs/2401.10774">Medusa</a> and <a href="https://arxiv.org/pdf/2503.01840v1">EAGLE</a> take this approach. 
They add a draft stage inside the target model itself so you get speculative style candidates without standing up a full separate draft model. This is equivalent to <em>grafting</em> a small draft head onto the LLM&#8217;s pipeline.</p><p>A useful reference here is the recent <a href="https://arxiv.org/pdf/2502.19732v4">survey of speculative decoding techniques</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kGby!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44837bd4-f06d-479c-8e6e-4c136cf2014d_1280x720.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kGby!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44837bd4-f06d-479c-8e6e-4c136cf2014d_1280x720.gif 424w, https://substackcdn.com/image/fetch/$s_!kGby!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44837bd4-f06d-479c-8e6e-4c136cf2014d_1280x720.gif 848w, https://substackcdn.com/image/fetch/$s_!kGby!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44837bd4-f06d-479c-8e6e-4c136cf2014d_1280x720.gif 1272w, https://substackcdn.com/image/fetch/$s_!kGby!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44837bd4-f06d-479c-8e6e-4c136cf2014d_1280x720.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kGby!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44837bd4-f06d-479c-8e6e-4c136cf2014d_1280x720.gif" width="1280" height="720" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/44837bd4-f06d-479c-8e6e-4c136cf2014d_1280x720.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1983725,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/189813953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44837bd4-f06d-479c-8e6e-4c136cf2014d_1280x720.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kGby!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44837bd4-f06d-479c-8e6e-4c136cf2014d_1280x720.gif 424w, https://substackcdn.com/image/fetch/$s_!kGby!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44837bd4-f06d-479c-8e6e-4c136cf2014d_1280x720.gif 848w, https://substackcdn.com/image/fetch/$s_!kGby!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44837bd4-f06d-479c-8e6e-4c136cf2014d_1280x720.gif 1272w, https://substackcdn.com/image/fetch/$s_!kGby!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44837bd4-f06d-479c-8e6e-4c136cf2014d_1280x720.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: <a href="https://developer.nvidia.com/blog/an-introduction-to-speculative-decoding-for-reducing-latency-in-ai-inference/">NVIDIA</a></figcaption></figure></div><h1>Why disaggregating draft and target models is a system-level win</h1><p>In June 2025, the <strong>ByteDance</strong> team published a remarkable paper called <strong><a href="https://arxiv.org/pdf/2506.11309v1">SwiftSpec</a></strong> and argued that existing speculative decoding systems were leaving significant performance on the 
table. They traced it to two main issues.</p><p>First, the draft stage sits idle while the target stage performs verification, i.e., the <em>draft &#8594; verify</em> loop still runs sequentially. The <em>N<sup>th</sup></em> draft stage cannot start until the <em>(N-1)<sup>th</sup></em> verify step has completed, so the draft and verify latencies add up on every iteration instead of overlapping.</p><p>Second, large target models usually require multiple GPUs with model parallelism. Earlier deployments often co-located draft and target, and then tried to scale them both the same way across GPUs. SwiftSpec points out why this is inefficient in small-batch serving. The draft and target have very different compute needs, and when you force both through the same tensor-parallel setup, you run into compute imbalance, KV cache consistency issues, and communication overhead that is hard to amortize at low batch sizes.</p><p>They solved these two problems by first <strong>disaggregating the draft and the target models</strong> and running them on different groups of NVIDIA Hopper H200 GPUs, as shown in the figure below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eNz-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9112359a-559e-40bd-b40d-2f4a9593fb51_4282x1422.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eNz-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9112359a-559e-40bd-b40d-2f4a9593fb51_4282x1422.png 424w, 
https://substackcdn.com/image/fetch/$s_!eNz-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9112359a-559e-40bd-b40d-2f4a9593fb51_4282x1422.png 848w, https://substackcdn.com/image/fetch/$s_!eNz-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9112359a-559e-40bd-b40d-2f4a9593fb51_4282x1422.png 1272w, https://substackcdn.com/image/fetch/$s_!eNz-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9112359a-559e-40bd-b40d-2f4a9593fb51_4282x1422.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eNz-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9112359a-559e-40bd-b40d-2f4a9593fb51_4282x1422.png" width="1456" height="484" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9112359a-559e-40bd-b40d-2f4a9593fb51_4282x1422.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a3786de7-478e-43d2-b224-ec8333c75f8c_4282x1422.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:484,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1120475,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/189813953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3786de7-478e-43d2-b224-ec8333c75f8c_4282x1422.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!eNz-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9112359a-559e-40bd-b40d-2f4a9593fb51_4282x1422.png 424w, https://substackcdn.com/image/fetch/$s_!eNz-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9112359a-559e-40bd-b40d-2f4a9593fb51_4282x1422.png 848w, https://substackcdn.com/image/fetch/$s_!eNz-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9112359a-559e-40bd-b40d-2f4a9593fb51_4282x1422.png 1272w, https://substackcdn.com/image/fetch/$s_!eNz-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9112359a-559e-40bd-b40d-2f4a9593fb51_4282x1422.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">SwiftSpec - Disaggregated speculative decoding. [Source: <a href="https://arxiv.org/pdf/2506.11309v1">ByteDance</a>, annotation by Chiplog]</figcaption></figure></div><p>This allowed each side to scale differently depending upon the models deployed, whether it&#8217;s Llama, Qwen, or DeepSeek. The figure below shows how end-to-end performance changes as you vary how many H200 GPUs are allocated to the target versus the draft model.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iXlx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9019a5-f6ad-4bba-b58f-23420687b195_3424x1984.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iXlx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9019a5-f6ad-4bba-b58f-23420687b195_3424x1984.png 424w, https://substackcdn.com/image/fetch/$s_!iXlx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9019a5-f6ad-4bba-b58f-23420687b195_3424x1984.png 848w, https://substackcdn.com/image/fetch/$s_!iXlx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9019a5-f6ad-4bba-b58f-23420687b195_3424x1984.png 1272w, 
https://substackcdn.com/image/fetch/$s_!iXlx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9019a5-f6ad-4bba-b58f-23420687b195_3424x1984.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iXlx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9019a5-f6ad-4bba-b58f-23420687b195_3424x1984.png" width="570" height="330.4120879120879" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9e9019a5-f6ad-4bba-b58f-23420687b195_3424x1984.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:844,&quot;width&quot;:1456,&quot;resizeWidth&quot;:570,&quot;bytes&quot;:1022830,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/189813953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9019a5-f6ad-4bba-b58f-23420687b195_3424x1984.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iXlx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9019a5-f6ad-4bba-b58f-23420687b195_3424x1984.png 424w, https://substackcdn.com/image/fetch/$s_!iXlx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9019a5-f6ad-4bba-b58f-23420687b195_3424x1984.png 848w, 
https://substackcdn.com/image/fetch/$s_!iXlx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9019a5-f6ad-4bba-b58f-23420687b195_3424x1984.png 1272w, https://substackcdn.com/image/fetch/$s_!iXlx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e9019a5-f6ad-4bba-b58f-23420687b195_3424x1984.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Target TP = 2, 4, 6 indicates the number of H200 GPUs allocated to the Target Model. 
The remaining GPUs in the 8 x H200 DGX system were allocated to the draft model. [Source: <a href="https://arxiv.org/pdf/2506.11309v1">ByteDance</a>]</figcaption></figure></div><p>They also developed a novel scheme to <strong>let the two models run asynchronously</strong>. The draft model generates a tree of tokens and can move ahead and generate candidate tokens for the <em>next iteration</em> while the target stage is still verifying the <em>previous one</em>. After each step, the two models synchronize state, and the draft &#8220;<em>re-roots</em>&#8221; by keeping only the portion of its token tree that the target has verified. This is represented in the figure below.</p><p>With this setup, they report an average <strong>1.75x speedup</strong> over previous speculative decoding systems. The highlight of the paper is their claim that SwiftSpec serves Llama3-70B at <strong>348 tokens/s</strong> on 8 NVIDIA Hopper GPUs, which they describe as the <strong>fastest known system for low-latency LLM serving at this scale</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!72NY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78780945-5042-45b4-91e5-bfa9d4aee181_2822x2184.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!72NY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78780945-5042-45b4-91e5-bfa9d4aee181_2822x2184.png 424w, https://substackcdn.com/image/fetch/$s_!72NY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78780945-5042-45b4-91e5-bfa9d4aee181_2822x2184.png 848w, 
https://substackcdn.com/image/fetch/$s_!72NY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78780945-5042-45b4-91e5-bfa9d4aee181_2822x2184.png 1272w, https://substackcdn.com/image/fetch/$s_!72NY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78780945-5042-45b4-91e5-bfa9d4aee181_2822x2184.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!72NY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78780945-5042-45b4-91e5-bfa9d4aee181_2822x2184.png" width="508" height="393.21153846153845" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/78780945-5042-45b4-91e5-bfa9d4aee181_2822x2184.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ec3f696a-5456-4ff4-8c31-3a4689b8b26f_2822x2184.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1127,&quot;width&quot;:1456,&quot;resizeWidth&quot;:508,&quot;bytes&quot;:1135935,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/189813953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec3f696a-5456-4ff4-8c31-3a4689b8b26f_2822x2184.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!72NY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78780945-5042-45b4-91e5-bfa9d4aee181_2822x2184.png 424w, 
https://substackcdn.com/image/fetch/$s_!72NY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78780945-5042-45b4-91e5-bfa9d4aee181_2822x2184.png 848w, https://substackcdn.com/image/fetch/$s_!72NY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78780945-5042-45b4-91e5-bfa9d4aee181_2822x2184.png 1272w, https://substackcdn.com/image/fetch/$s_!72NY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78780945-5042-45b4-91e5-bfa9d4aee181_2822x2184.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Draft model re-rooting. [Source: <a href="https://arxiv.org/pdf/2506.11309v1">ByteDance</a>, annotation by Chiplog]</figcaption></figure></div><h2>Why Blackwell and Rubin are awkward draft engines and why heterogeneous XPUs can win</h2><h3>Decode is memory bound and draft decoding is the worst-case version of that</h3><p><a href="https://arxiv.org/pdf/2506.11309v1">SwiftSpec</a> has a section called <strong>&#8220;GPU Constraints for Low Batch Size&#8221;</strong>. The basic point is that GPUs are incredible when you can keep them busy with large batches and big, regular kernels.</p>
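<p>To put a rough number on the memory-bound argument, here is a back-of-the-envelope sketch in Python. It assumes that at batch size 1 every decode step streams all model weights from HBM exactly once, and it ignores KV-cache reads, inter-GPU communication, and compute time. The function name <code>decode_tokens_per_s_upper_bound</code> and the example inputs (70B parameters at 2 bytes each, 4.8 TB/s of HBM bandwidth per H200 as listed in NVIDIA's public spec, 8 GPUs) are illustrative assumptions, not figures taken from the SwiftSpec paper.</p>

```python
def decode_tokens_per_s_upper_bound(
    num_params: float,
    bytes_per_param: float,
    hbm_bandwidth_bytes_per_s: float,
    num_gpus: int = 1,
) -> float:
    """Bandwidth-only ceiling on batch-1 decode speed, in tokens per second.

    Assumption (hypothetical model, not from the paper): each generated token
    reads every weight from HBM once, and weights are sharded evenly across
    GPUs. KV-cache traffic, communication, and compute are ignored, so real
    systems without speculation land below this number.
    """
    weight_bytes = num_params * bytes_per_param
    aggregate_bandwidth = hbm_bandwidth_bytes_per_s * num_gpus
    return aggregate_bandwidth / weight_bytes


# Illustrative inputs: 70e9 parameters, 16-bit weights, 8 GPUs at 4.8 TB/s each.
ceiling = decode_tokens_per_s_upper_bound(
    num_params=70e9,
    bytes_per_param=2,
    hbm_bandwidth_bytes_per_s=4.8e12,
    num_gpus=8,
)
print(f"bandwidth-only ceiling: {ceiling:.0f} tokens/s")  # prints about 274
```

<p>The ceiling does not depend on FLOPs at all, which is the sense in which small-batch decode is memory bound. Verifying several drafted tokens in a single target pass spreads that one sweep over the weights across more than one output token, and that is the lever speculative decoding pulls.</p>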
      <p>
          <a href="https://www.chiplog.io/p/why-speculative-decoding-wants-two">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Arista, AMD Pensando, and the Making of a Billionaire CEO]]></title><description><![CDATA[How Cisco and its culture of spin-ins led to the creation of Pensando and Silicon Valley's richest Indian-American CEO]]></description><link>https://www.chiplog.io/p/arista-amd-pensando-and-the-making</link><guid isPermaLink="false">https://www.chiplog.io/p/arista-amd-pensando-and-the-making</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Sun, 01 Mar 2026 18:18:16 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/338db969-e454-4ea4-8314-97b73167f680_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>The networking world is often viewed as a battle of specs: bandwidth, latency, port density, and power efficiency. But one of the more consequential stories of the past two decades had less to do with silicon and more to do with culture.</strong></p><p>To understand why Arista Networks exists today, and how it grew into a company valued at over $100 billion, you have to rewind to the mid-2000s and step inside <a href="https://en.wikipedia.org/wiki/John_T._Chambers">John Chambers&#8217;</a> Cisco. You have to understand the unique, highly lucrative, and ultimately controversial business model known as the <strong>&#8220;Spin-In&#8221;.</strong></p><p>This is the story of the team behind Pensando (acquired by AMD in 2022), the rise of Jayshree Ullal, and how internal conflicts inadvertently funded the creation of Cisco&#8217;s fiercest competitor, Arista.</p><p>I worked at Cisco from the mid-2000s to early 2010s and witnessed some of these dynamics firsthand.</p><h2>The &#8220;Spin-In&#8221; Innovation Machine</h2><p>In the 2000s, Cisco CEO John Chambers had a problem. Cisco was becoming a behemoth, and as with many other big companies, it started moving slowly and losing its competitive edge. 
Chambers worried that top talent would leave to start their own companies, or that Cisco would miss the next big technology wave because of bureaucratic inertia.</p><p>His solution was the &#8220;Spin-In.&#8221;</p><p>Instead of forcing his best engineers to navigate HR and finance approvals to build a new product, Cisco would fund them to leave. They would form a &#8220;startup&#8221; just down the street. Cisco would own the rights to buy them back later at a pre-determined, massive valuation if they hit their milestones.</p><p>It was genius, but it created a two-tier system: the chosen few got startup equity and autonomy; everyone else got a salary and corporate red tape.</p><h2>The &#8220;MPLS&#8221; Team</h2><p>The primary beneficiaries of this model were a legendary quartet of executives. Inside Cisco, they were known by the moniker <strong>MPLS</strong>. It was a cheeky pun on the famous networking protocol (<a href="https://en.wikipedia.org/wiki/Multiprotocol_Label_Switching">Multiprotocol Label Switching</a>), but it stood for their first names:</p><ul><li><p><strong>M</strong>ario Mazzola</p></li><li><p><strong>P</strong>rem Jain</p></li><li><p><strong>L</strong>uca Cafiero</p></li><li><p><strong>S</strong>oni Jiandani</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ezvv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d67005-4e77-4938-8f74-53cf50e5d6f0_1400x922.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ezvv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d67005-4e77-4938-8f74-53cf50e5d6f0_1400x922.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!ezvv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d67005-4e77-4938-8f74-53cf50e5d6f0_1400x922.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ezvv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d67005-4e77-4938-8f74-53cf50e5d6f0_1400x922.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ezvv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d67005-4e77-4938-8f74-53cf50e5d6f0_1400x922.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ezvv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d67005-4e77-4938-8f74-53cf50e5d6f0_1400x922.jpeg" width="619" height="407.6557142857143" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59d67005-4e77-4938-8f74-53cf50e5d6f0_1400x922.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:922,&quot;width&quot;:1400,&quot;resizeWidth&quot;:619,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ezvv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d67005-4e77-4938-8f74-53cf50e5d6f0_1400x922.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!ezvv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d67005-4e77-4938-8f74-53cf50e5d6f0_1400x922.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ezvv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d67005-4e77-4938-8f74-53cf50e5d6f0_1400x922.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ezvv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d67005-4e77-4938-8f74-53cf50e5d6f0_1400x922.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">L-to-R: Soni, Mario, Luca, Prem [Image Source: <a href="https://medium.com/the-technews/ciscos-prime-engineers-team-known-as-the-heart-soul-and-brain-of-the-company-have-resigned-8718b6987625">Medium - The Tech News</a>]</figcaption></figure></div><p>This team had a Midas touch. Every few years, the MPLS team would &#8220;leave,&#8221; build the next big thing, and get bought back by Cisco for hundreds of millions. Of course, to their credit, the products developed by each of their companies went on to become important pieces in Cisco&#8217;s roadmap.</p><ol><li><p><strong>Crescendo (1993):</strong> Acquired for ~$95M. This became the <strong>Catalyst 5000/6000</strong> series, arguably the most successful networking product in history.</p></li><li><p><strong>Andiamo (2002):</strong> A storage spin-in acquired for ~$750M. This became the <strong>MDS 9000</strong> (SAN switching).</p></li><li><p><strong>Nuova Systems (2008):</strong> A data center spin-in acquired for ~$678M. This became the <strong>Nexus</strong> switch and the <strong>UCS</strong> server line.</p></li><li><p><strong>Insieme (2013):</strong> An SDN spin-in acquired for ~$863M. 
This became <strong>Cisco ACI </strong>(Application Centric Infrastructure).</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dENN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780a4aa8-96cb-474a-9483-998860add751_1456x1448.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dENN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780a4aa8-96cb-474a-9483-998860add751_1456x1448.png 424w, https://substackcdn.com/image/fetch/$s_!dENN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780a4aa8-96cb-474a-9483-998860add751_1456x1448.png 848w, https://substackcdn.com/image/fetch/$s_!dENN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780a4aa8-96cb-474a-9483-998860add751_1456x1448.png 1272w, https://substackcdn.com/image/fetch/$s_!dENN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780a4aa8-96cb-474a-9483-998860add751_1456x1448.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dENN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780a4aa8-96cb-474a-9483-998860add751_1456x1448.png" width="474" height="471.3956043956044" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/780a4aa8-96cb-474a-9483-998860add751_1456x1448.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bd851023-2bc1-4ed2-9e34-88a6f91ba82c_1456x1448.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1448,&quot;width&quot;:1456,&quot;resizeWidth&quot;:474,&quot;bytes&quot;:348728,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/187536884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd851023-2bc1-4ed2-9e34-88a6f91ba82c_1456x1448.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dENN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780a4aa8-96cb-474a-9483-998860add751_1456x1448.png 424w, https://substackcdn.com/image/fetch/$s_!dENN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780a4aa8-96cb-474a-9483-998860add751_1456x1448.png 848w, https://substackcdn.com/image/fetch/$s_!dENN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780a4aa8-96cb-474a-9483-998860add751_1456x1448.png 1272w, https://substackcdn.com/image/fetch/$s_!dENN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F780a4aa8-96cb-474a-9483-998860add751_1456x1448.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">News report from 2013 capturing the mood when Cisco acquired Insieme, the MPLS team&#8217;s third spin-in (Source: <a href="https://www.businessinsider.com/cisco-buys-insieme-for-863-million-2013-11">Business Insider</a>)</figcaption></figure></div><h2>The Collision Course</h2><p>One of the key figures who had joined Cisco via the Crescendo acquisition in 1993 was Jayshree Ullal. She went on to become a seminal figure within Cisco, loved and admired by the majority of Cisco employees. By 2008, she was Senior VP of the Data Center, Switching, and Services Group. She was responsible for over <strong>$10 billion in revenue</strong>. 
</p><p>So, while the MPLS team was operating in their lucrative semi-private bubble, <strong>Jayshree Ullal</strong> was running the actual business within Cisco. She was the one executing, scaling, and managing the massive Catalyst portfolio that paid the bills.</p><p>The friction reached a breaking point with <strong>Nuova Systems</strong>.</p><p>Nuova (the MPLS team&#8217;s 2008 spin-in) was building the Nexus switch. This wasn&#8217;t just a complementary product; it was the future of the data center. It was designed to handle a unified fabric of Ethernet and storage.</p><p>The problem was that it directly cannibalized the Catalyst business Ullal was running.</p><p>Ullal found herself in an impossible position. She was managing the company&#8217;s biggest vertical, yet the future roadmap was being built by a &#8220;startup&#8221; team that operated outside her control. A team that was guaranteed a massive payout while her own team was bound by standard corporate compensation.</p><p>Realizing that the structure favored the &#8220;spin-in&#8221; inner circle over operational leadership, Ullal resigned in May 2008.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eDJu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69f9d6-c9e0-48ad-913a-64b517621778_900x200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eDJu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69f9d6-c9e0-48ad-913a-64b517621778_900x200.png 424w, https://substackcdn.com/image/fetch/$s_!eDJu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69f9d6-c9e0-48ad-913a-64b517621778_900x200.png 848w, 
https://substackcdn.com/image/fetch/$s_!eDJu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69f9d6-c9e0-48ad-913a-64b517621778_900x200.png 1272w, https://substackcdn.com/image/fetch/$s_!eDJu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69f9d6-c9e0-48ad-913a-64b517621778_900x200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eDJu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69f9d6-c9e0-48ad-913a-64b517621778_900x200.png" width="516" height="114.66666666666667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb69f9d6-c9e0-48ad-913a-64b517621778_900x200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:200,&quot;width&quot;:900,&quot;resizeWidth&quot;:516,&quot;bytes&quot;:29961,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/187536884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69f9d6-c9e0-48ad-913a-64b517621778_900x200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eDJu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69f9d6-c9e0-48ad-913a-64b517621778_900x200.png 424w, 
https://substackcdn.com/image/fetch/$s_!eDJu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69f9d6-c9e0-48ad-913a-64b517621778_900x200.png 848w, https://substackcdn.com/image/fetch/$s_!eDJu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69f9d6-c9e0-48ad-913a-64b517621778_900x200.png 1272w, https://substackcdn.com/image/fetch/$s_!eDJu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb69f9d6-c9e0-48ad-913a-64b517621778_900x200.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">In 2008 Jayshree Ullal surprised many by leaving Cisco. Just months later she re-emerged as Arista&#8217;s new CEO (Source: <a href="https://www.networkworld.com/article/901388/cisco-subnet-cisco-s-jayshree-ullal-quits.html">Network World</a>)</figcaption></figure></div><h2>The Vindication: Arista Networks</h2><p>Five months later, Ullal joined <strong>Arista Networks</strong> as CEO, teaming up with founders Andy Bechtolsheim and David Cheriton.</p><p>At Arista, Ullal didn&#8217;t just compete with Cisco; she exploited the weakness of the MPLS philosophy.</p><p>The MPLS team (and Cisco generally) believed in <strong>Custom ASICs</strong>. They believed that to get the best performance, you had to design your own chips. While custom chips offer differentiation and have legitimate advantages, they are proprietary and time-consuming to produce.</p><p>Ullal and Arista took a different bet: <strong>Merchant Silicon</strong>. They believed that generic chips from Broadcom would eventually catch up to custom silicon in speed, but at a fraction of the cost. 
Arista focused their engineering power not on chips but on software, building a programmable, Linux-based stack that the hyperscalers (Google, Microsoft, Meta) were desperate for.</p><p>She was right.</p><p>While Cisco was busy integrating their expensive Nuova/Insieme acquisitions and fighting complexity, Arista&#8217;s <em>&#8220;commodity chip + superior software&#8221;</em> approach swept through the cloud market.</p><blockquote><p><em>The piece recounts the bitter rivalry between <strong>Cisco</strong> and <strong>Jayshree Ullal</strong>. What started as respect between them turned personal as Arista&#8217;s success eroded Cisco&#8217;s market share in switching &#8212; a business Jayshree Ullal once ran. </em></p><p><em>Cisco responded with internal &#8220;war room&#8221; efforts and legal battles, accusing Arista of copying its technology, while Arista characterized those tactics as defensiveness from a legacy company losing ground to a faster, more innovative competitor.</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E6Gn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F812721e9-7b34-464e-a7bd-04b982c9ff43_692x173.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E6Gn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F812721e9-7b34-464e-a7bd-04b982c9ff43_692x173.png 424w, https://substackcdn.com/image/fetch/$s_!E6Gn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F812721e9-7b34-464e-a7bd-04b982c9ff43_692x173.png 848w, 
https://substackcdn.com/image/fetch/$s_!E6Gn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F812721e9-7b34-464e-a7bd-04b982c9ff43_692x173.png 1272w, https://substackcdn.com/image/fetch/$s_!E6Gn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F812721e9-7b34-464e-a7bd-04b982c9ff43_692x173.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E6Gn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F812721e9-7b34-464e-a7bd-04b982c9ff43_692x173.png" width="544" height="136" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/812721e9-7b34-464e-a7bd-04b982c9ff43_692x173.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:173,&quot;width&quot;:692,&quot;resizeWidth&quot;:544,&quot;bytes&quot;:33169,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/187536884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F812721e9-7b34-464e-a7bd-04b982c9ff43_692x173.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E6Gn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F812721e9-7b34-464e-a7bd-04b982c9ff43_692x173.png 424w, https://substackcdn.com/image/fetch/$s_!E6Gn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F812721e9-7b34-464e-a7bd-04b982c9ff43_692x173.png 848w, 
https://substackcdn.com/image/fetch/$s_!E6Gn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F812721e9-7b34-464e-a7bd-04b982c9ff43_692x173.png 1272w, https://substackcdn.com/image/fetch/$s_!E6Gn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F812721e9-7b34-464e-a7bd-04b982c9ff43_692x173.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Source: <a href="https://www.foxbusiness.com/features/ciscos-feud-with-former-star-executive-is-ugly">Fox Business</a></figcaption></figure></div></blockquote><h2>The Aftermath: End of the spin-in model</h2><p>The spin-in era eventually ended. When Chuck Robbins took over as Cisco CEO in 2015, he dismantled the model, recognizing that it destroyed internal morale and siloed innovation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WMrj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7529873c-5f2b-41b7-929d-baa9dc7d6ee8_2048x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WMrj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7529873c-5f2b-41b7-929d-baa9dc7d6ee8_2048x1024.png 424w, https://substackcdn.com/image/fetch/$s_!WMrj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7529873c-5f2b-41b7-929d-baa9dc7d6ee8_2048x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!WMrj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7529873c-5f2b-41b7-929d-baa9dc7d6ee8_2048x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!WMrj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7529873c-5f2b-41b7-929d-baa9dc7d6ee8_2048x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WMrj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7529873c-5f2b-41b7-929d-baa9dc7d6ee8_2048x1024.png" width="555" height="277.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7529873c-5f2b-41b7-929d-baa9dc7d6ee8_2048x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:555,&quot;bytes&quot;:549076,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/187536884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7529873c-5f2b-41b7-929d-baa9dc7d6ee8_2048x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WMrj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7529873c-5f2b-41b7-929d-baa9dc7d6ee8_2048x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!WMrj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7529873c-5f2b-41b7-929d-baa9dc7d6ee8_2048x1024.png 848w, https://substackcdn.com/image/fetch/$s_!WMrj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7529873c-5f2b-41b7-929d-baa9dc7d6ee8_2048x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!WMrj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7529873c-5f2b-41b7-929d-baa9dc7d6ee8_2048x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">News article dated June 6, 2016 [Source: <a href="https://siliconangle.com/2016/06/06/four-top-cisco-execs-collectively-known-as-mpls-leave-the-company/">Silicon Angle</a>]</figcaption></figure></div><p>The MPLS team left Cisco one final time to found <strong>Pensando Systems</strong> (focused on DPUs/SmartNICs). Cisco did <em>not</em> buy them back this time. Instead, <strong>AMD</strong> acquired Pensando for <strong>$1.9 billion</strong> in 2022.</p><p>Meanwhile, <strong>Jayshree Ullal</strong> went on to build Arista into one of the most successful networking companies of the cloud era. Today she is widely regarded as one of Silicon Valley&#8217;s most effective CEOs. And with a net worth north of $5 billion, if you assumed the richest Indian-American CEO in the Valley was Satya Nadella or Sundar Pichai &#8212; you&#8217;d be mistaken.</p><p><strong>It&#8217;s Jayshree.</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/p/arista-amd-pensando-and-the-making?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/p/arista-amd-pensando-and-the-making?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption"><em>Chip Log is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</em></p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h1>Read this next &#8230;</h1><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;acbac7ab-3b58-4c7e-b75e-a23d0688e673&quot;,&quot;caption&quot;:&quot;If I had to summarize the NVIDIA + Mellanox relationship in a single sentence, it would be this: Every time GPUs got faster, the network became the bottleneck &#8212; and NVIDIA and Mellanox kept removing that bottleneck layer by layer.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How Mellanox set the stage for NVIDIA&#8217;s AI dominance and signaled the Storage 
Supercycle&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:91270315,&quot;name&quot;:&quot;Subbu&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6aad272-4260-4006-8b7e-97c8deb711c0_1267x1267.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2026-02-06T03:57:41.844Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61982b4e-1f92-4d47-9a6b-331e72bfef29_2752x1536.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.chiplog.io/p/how-mellanox-set-the-stage-for-nvidias&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:184659153,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:0,&quot;publication_id&quot;:2033567,&quot;publication_name&quot;:&quot;Chip Log&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!fd69!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F681da9cd-7cda-4874-a53a-49921c6fb514_180x180.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;7389811e-4365-4f46-882e-9767b9c80dac&quot;,&quot;caption&quot;:&quot;At CES 2026, NVIDIA announced the Context Memory Storage Platform, a new appliance designed to expand KV cache capacity beyond the GPU rack. 
The fanfare around this device is definitely warranted &#8212; but, in my view, is also partially misplaced.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Analysis of NVIDIA&#8217;s Bluefield-4 DPU and KV-Cache Context Memory Storage Platform (CES 2026): Architecture, Strategy, Dynamo, WEKA, Enfabrica&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:91270315,&quot;name&quot;:&quot;Subbu&quot;,&quot;bio&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6aad272-4260-4006-8b7e-97c8deb711c0_1267x1267.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;post_date&quot;:&quot;2026-01-11T07:01:21.930Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1eae3740-1a32-49a9-a6d7-011637767a06_2848x1504.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.chiplog.io/p/analysis-of-nvidias-bluefield-4-dpu&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:184085884,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:19,&quot;comment_count&quot;:1,&quot;publication_id&quot;:2033567,&quot;publication_name&quot;:&quot;Chip Log&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!fd69!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F681da9cd-7cda-4874-a53a-49921c6fb514_180x180.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[How Mellanox set the stage for NVIDIA’s AI dominance and signaled the Storage Supercycle]]></title><description><![CDATA[Deep dive of the MLNX+NVIDIA partnership, GPUDirect RDMA, GPUDirect Storage, Bluefield DPU, Spectrum-X, 
and what could be coming next]]></description><link>https://www.chiplog.io/p/how-mellanox-set-the-stage-for-nvidias</link><guid isPermaLink="false">https://www.chiplog.io/p/how-mellanox-set-the-stage-for-nvidias</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Fri, 06 Feb 2026 03:57:41 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/61982b4e-1f92-4d47-9a6b-331e72bfef29_2752x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If I had to summarize the NVIDIA + Mellanox relationship in a single sentence, it would be this: <strong>Every time GPUs got faster, the network became the bottleneck &#8212; and NVIDIA and Mellanox kept removing that bottleneck layer by layer.</strong></p><p>This article traces the pivotal innovations that emerged from this partnership and how they put NVIDIA in the dominant position it holds today. We&#8217;ll dissect how Jensen played his hand in five phases, each one pushing the GPU further from its role as a graphics &#8220;sidekick&#8221; and closer to becoming the heart of the modern AI data center.</p><ol><li><p><strong>2009: GPUDirect Shared Memory &#8212;</strong> Breaking the ice between the GPU and the NIC.</p></li><li><p><strong>2013: GPUDirect RDMA &#8212;</strong> NICs and GPUs learn to talk directly, accelerating the GPU-to-GPU path.</p></li><li><p><strong>2019: GPUDirect Storage &#8212;</strong> The storage bottleneck is removed, accelerating the path from SSD to GPU.</p></li><li><p><strong>2022&#8211;Present: Spectrum-X &amp; BlueField.</strong> Building AI factories, and taking over the data center to challenge Broadcom.</p></li><li><p><strong>Retrospective analysis and the Future &#8212;</strong> What&#8217;s next for the GPU-centric architecture?</p></li></ol><p>By the end, you&#8217;ll see how the signals for companies like <a href="https://substack.com/discover/stocks/SNDK">SNDK</a> and <a href="https://substack.com/discover/stocks/WD">WD</a> were 
visible much earlier than most of us realized &#8212; it was in front of us all along.</p><p>The story goes like this&#8230;</p><div><hr></div><h1>2009-10: Tianhe-1A and China&#8217;s Emergence in Supercomputing</h1><p>In 2009, during the <strong>Tesla and Fermi era</strong> of NVIDIA GPUs, there was a surge of interest from the scientific and high-performance computing (HPC) communities in using GPUs to accelerate computation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vK03!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe513e2fb-43e4-4fec-a1d6-74f8c4c1b0bf_2222x724.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vK03!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe513e2fb-43e4-4fec-a1d6-74f8c4c1b0bf_2222x724.png 424w, https://substackcdn.com/image/fetch/$s_!vK03!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe513e2fb-43e4-4fec-a1d6-74f8c4c1b0bf_2222x724.png 848w, https://substackcdn.com/image/fetch/$s_!vK03!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe513e2fb-43e4-4fec-a1d6-74f8c4c1b0bf_2222x724.png 1272w, https://substackcdn.com/image/fetch/$s_!vK03!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe513e2fb-43e4-4fec-a1d6-74f8c4c1b0bf_2222x724.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!vK03!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe513e2fb-43e4-4fec-a1d6-74f8c4c1b0bf_2222x724.png" width="1456" height="474" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e513e2fb-43e4-4fec-a1d6-74f8c4c1b0bf_2222x724.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f5a03362-249a-4915-b48b-cbb8fb1f2b18_2222x724.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:474,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:434563,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184659153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5a03362-249a-4915-b48b-cbb8fb1f2b18_2222x724.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vK03!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe513e2fb-43e4-4fec-a1d6-74f8c4c1b0bf_2222x724.png 424w, https://substackcdn.com/image/fetch/$s_!vK03!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe513e2fb-43e4-4fec-a1d6-74f8c4c1b0bf_2222x724.png 848w, https://substackcdn.com/image/fetch/$s_!vK03!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe513e2fb-43e4-4fec-a1d6-74f8c4c1b0bf_2222x724.png 1272w, 
https://substackcdn.com/image/fetch/$s_!vK03!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe513e2fb-43e4-4fec-a1d6-74f8c4c1b0bf_2222x724.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Performance evaluation of scientific applications on heterogenous clusters (<a href="http://impact.crhc.illinois.edu/shared/papers/lci09_paper.pdf">Source: 2009</a>)</figcaption></figure></div><p>This interest wasn&#8217;t speculative. Real work was already showing that GPUs could fundamentally reshape the computing landscape. 
For instance, <a href="https://dl.acm.org/doi/epdf/10.1145/1513895.1513901">NVIDIA demonstrated that LINPACK</a>, the benchmark used to rank the world&#8217;s fastest supercomputers, could be accelerated with GPUs. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-Z2T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fc6083e-d2a3-443d-a3ed-2588349db37c_1804x1624.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-Z2T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fc6083e-d2a3-443d-a3ed-2588349db37c_1804x1624.png 424w, https://substackcdn.com/image/fetch/$s_!-Z2T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fc6083e-d2a3-443d-a3ed-2588349db37c_1804x1624.png 848w, https://substackcdn.com/image/fetch/$s_!-Z2T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fc6083e-d2a3-443d-a3ed-2588349db37c_1804x1624.png 1272w, https://substackcdn.com/image/fetch/$s_!-Z2T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fc6083e-d2a3-443d-a3ed-2588349db37c_1804x1624.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-Z2T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fc6083e-d2a3-443d-a3ed-2588349db37c_1804x1624.png" width="420" height="378.1730769230769" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7fc6083e-d2a3-443d-a3ed-2588349db37c_1804x1624.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1311,&quot;width&quot;:1456,&quot;resizeWidth&quot;:420,&quot;bytes&quot;:609356,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184659153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fc6083e-d2a3-443d-a3ed-2588349db37c_1804x1624.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-Z2T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fc6083e-d2a3-443d-a3ed-2588349db37c_1804x1624.png 424w, https://substackcdn.com/image/fetch/$s_!-Z2T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fc6083e-d2a3-443d-a3ed-2588349db37c_1804x1624.png 848w, https://substackcdn.com/image/fetch/$s_!-Z2T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fc6083e-d2a3-443d-a3ed-2588349db37c_1804x1624.png 1272w, https://substackcdn.com/image/fetch/$s_!-Z2T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fc6083e-d2a3-443d-a3ed-2588349db37c_1804x1624.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: <a href="https://dl.acm.org/doi/10.1145/1513895.1513901">Accelerating LINPACK with CUDA on heterogenous clusters (2009)</a></figcaption></figure></div><p>At the same time, China&#8217;s ambition to enter the supercomputing arena had become obvious. System by system, they climbed the <a href="https://top500.org/">TOP500</a> rankings. Then, in November 2010, they did it. <strong><a href="https://top500.org/lists/top500/2010/11/">Tianhe-1A</a></strong>, installed at the National Supercomputing Center in Tianjin, surpassed the American <strong><a href="https://top500.org/lists/top500/2009/11/">Cray XT5</a></strong> supercomputer, installed at Oak Ridge National Laboratory, to claim the #1 spot. It marked the end of a roughly six-year run of U.S. 
dominance.</p><p>What&#8217;s remarkable isn&#8217;t just that China took the top position; it&#8217;s how <em>quickly</em> the underlying architecture shifted. Compare the top five supercomputers from November 2009 and November 2010: within a single year, <strong>three of the top five supercomputers were GPU-accelerated</strong>, all with NVIDIA GPUs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JvV7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe625feaa-e596-4f8e-bf77-a3d6ea7cb3ac_4924x1942.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JvV7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe625feaa-e596-4f8e-bf77-a3d6ea7cb3ac_4924x1942.png 424w, https://substackcdn.com/image/fetch/$s_!JvV7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe625feaa-e596-4f8e-bf77-a3d6ea7cb3ac_4924x1942.png 848w, https://substackcdn.com/image/fetch/$s_!JvV7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe625feaa-e596-4f8e-bf77-a3d6ea7cb3ac_4924x1942.png 1272w, https://substackcdn.com/image/fetch/$s_!JvV7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe625feaa-e596-4f8e-bf77-a3d6ea7cb3ac_4924x1942.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JvV7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe625feaa-e596-4f8e-bf77-a3d6ea7cb3ac_4924x1942.png" width="1456" 
height="574" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e625feaa-e596-4f8e-bf77-a3d6ea7cb3ac_4924x1942.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dfe73949-ea73-470c-9213-e3f0d495e11e_4924x1942.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:574,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2331600,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184659153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfe73949-ea73-470c-9213-e3f0d495e11e_4924x1942.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JvV7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe625feaa-e596-4f8e-bf77-a3d6ea7cb3ac_4924x1942.png 424w, https://substackcdn.com/image/fetch/$s_!JvV7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe625feaa-e596-4f8e-bf77-a3d6ea7cb3ac_4924x1942.png 848w, https://substackcdn.com/image/fetch/$s_!JvV7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe625feaa-e596-4f8e-bf77-a3d6ea7cb3ac_4924x1942.png 1272w, https://substackcdn.com/image/fetch/$s_!JvV7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe625feaa-e596-4f8e-bf77-a3d6ea7cb3ac_4924x1942.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Comparing the top 5 supercomputers from 2009 and 2010. (Source: <a href="https://top500.org/lists/top500/2010/11/">top500.org</a>)</figcaption></figure></div><p>The promise that GPUs were living up to in this era had nothing to do with AlexNet, deep learning, or transformers. This was pure high-performance computing, which makes me think that Jensen&#8217;s conviction in GPUs as a foundational computing primitive long predated AI as we know it today.</p><p>It&#8217;s not hard to draw architectural parallels between Tianhe-1A&#8217;s compute trays and something like a modern GB200. 
The machine was composed of:</p><ul><li><p><strong>14,336 Xeon processors</strong> and <strong>7,168 NVIDIA GPUs</strong>, arranged in <strong>3,584 compute trays</strong>, each containing <strong>two</strong> nodes.</p></li><li><p>Each node had <strong>2&#215; Intel Xeon X5670</strong> 6-core CPUs and <strong>1&#215; NVIDIA Tesla M2050</strong> GPU.</p></li><li><p>The interconnect was proprietary and delivered roughly 2&#215; the bandwidth of InfiniBand at the time.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J6Pn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e148ea8-06aa-4104-8116-92348ca2c12c_2482x1540.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J6Pn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e148ea8-06aa-4104-8116-92348ca2c12c_2482x1540.png 424w, https://substackcdn.com/image/fetch/$s_!J6Pn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e148ea8-06aa-4104-8116-92348ca2c12c_2482x1540.png 848w, https://substackcdn.com/image/fetch/$s_!J6Pn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e148ea8-06aa-4104-8116-92348ca2c12c_2482x1540.png 1272w, https://substackcdn.com/image/fetch/$s_!J6Pn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e148ea8-06aa-4104-8116-92348ca2c12c_2482x1540.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!J6Pn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e148ea8-06aa-4104-8116-92348ca2c12c_2482x1540.png" width="506" height="313.8173076923077" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e148ea8-06aa-4104-8116-92348ca2c12c_2482x1540.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c58ada0c-5367-4571-a992-048d6fa6c18f_2482x1540.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:903,&quot;width&quot;:1456,&quot;resizeWidth&quot;:506,&quot;bytes&quot;:165373,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184659153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58ada0c-5367-4571-a992-048d6fa6c18f_2482x1540.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J6Pn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e148ea8-06aa-4104-8116-92348ca2c12c_2482x1540.png 424w, https://substackcdn.com/image/fetch/$s_!J6Pn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e148ea8-06aa-4104-8116-92348ca2c12c_2482x1540.png 848w, https://substackcdn.com/image/fetch/$s_!J6Pn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e148ea8-06aa-4104-8116-92348ca2c12c_2482x1540.png 1272w, 
https://substackcdn.com/image/fetch/$s_!J6Pn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e148ea8-06aa-4104-8116-92348ca2c12c_2482x1540.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Tianhe-1A compute tray</figcaption></figure></div><p>While Tianhe-1A and several other signs showed that GPUs were the future of high-performance computing, there was a <strong>fundamental problem</strong>.</p><p>For a large compute cluster to work cohesively as one unit, its nodes have to communicate through 
high-speed links. But <strong>GPUs were stuck behind CPUs</strong>. Every movement of data had to be orchestrated by the host CPU. GPUs couldn&#8217;t talk to the network directly. They couldn&#8217;t talk to storage directly. They were fast, but fenced in.</p><p>That bottleneck set the stage for what came next.</p><p>In 2009, NVIDIA and Mellanox formed a partnership to address exactly this problem: how to liberate GPUs from the CPU-centric data path and make them first-class citizens in the data center.</p><h1>2010: GPUDirect &#8220;Shared Memory&#8221; &#8212; Fermi + ConnectX-2</h1><p>At <strong>SC09</strong> (Supercomputing 2009), NVIDIA and Mellanox formally announced their partnership.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Yojw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b1856c4-8a42-4f3a-8011-23a67be0a6a6_2562x1244.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Yojw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b1856c4-8a42-4f3a-8011-23a67be0a6a6_2562x1244.png 424w, https://substackcdn.com/image/fetch/$s_!Yojw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b1856c4-8a42-4f3a-8011-23a67be0a6a6_2562x1244.png 848w, 
https://substackcdn.com/image/fetch/$s_!Yojw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b1856c4-8a42-4f3a-8011-23a67be0a6a6_2562x1244.png 1272w, https://substackcdn.com/image/fetch/$s_!Yojw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b1856c4-8a42-4f3a-8011-23a67be0a6a6_2562x1244.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Yojw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b1856c4-8a42-4f3a-8011-23a67be0a6a6_2562x1244.png" width="687" height="333.59134615384613" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b1856c4-8a42-4f3a-8011-23a67be0a6a6_2562x1244.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5b774d6-345a-4bed-93d5-5fa775f72974_2562x1244.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:707,&quot;width&quot;:1456,&quot;resizeWidth&quot;:687,&quot;bytes&quot;:1095896,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184659153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5b774d6-345a-4bed-93d5-5fa775f72974_2562x1244.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Yojw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b1856c4-8a42-4f3a-8011-23a67be0a6a6_2562x1244.png 424w, 
https://substackcdn.com/image/fetch/$s_!Yojw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b1856c4-8a42-4f3a-8011-23a67be0a6a6_2562x1244.png 848w, https://substackcdn.com/image/fetch/$s_!Yojw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b1856c4-8a42-4f3a-8011-23a67be0a6a6_2562x1244.png 1272w, https://substackcdn.com/image/fetch/$s_!Yojw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b1856c4-8a42-4f3a-8011-23a67be0a6a6_2562x1244.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>At the time, a typical GPU-accelerated server looked like this: a CPU running Linux acted as the master, with a GPU and an InfiniBand NIC attached to it over PCIe. During PCIe enumeration, the GPU and the Mellanox card were assigned <strong>separate address regions in the CPU&#8217;s memory</strong> (host memory). As a result, the CPU was responsible for initiating and orchestrating <em>all</em> data movement between the GPU and the InfiniBand network.</p><p>This led to an especially wasteful stage in the data path called the <strong>buffer copy</strong>. The GPU puts data in system RAM (Buffer A). The CPU then reads Buffer A and copies it to Buffer B, which the network card can see. The first goal of the partnership was simple: eliminate this buffer copy step.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XC4G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad120f37-b696-4629-9c54-0f75b408d5aa_2200x982.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XC4G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad120f37-b696-4629-9c54-0f75b408d5aa_2200x982.png 424w, https://substackcdn.com/image/fetch/$s_!XC4G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad120f37-b696-4629-9c54-0f75b408d5aa_2200x982.png 848w, https://substackcdn.com/image/fetch/$s_!XC4G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad120f37-b696-4629-9c54-0f75b408d5aa_2200x982.png 
1272w, https://substackcdn.com/image/fetch/$s_!XC4G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad120f37-b696-4629-9c54-0f75b408d5aa_2200x982.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XC4G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad120f37-b696-4629-9c54-0f75b408d5aa_2200x982.png" width="1456" height="650" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ad120f37-b696-4629-9c54-0f75b408d5aa_2200x982.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:650,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:345468,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184659153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad120f37-b696-4629-9c54-0f75b408d5aa_2200x982.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XC4G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad120f37-b696-4629-9c54-0f75b408d5aa_2200x982.png 424w, https://substackcdn.com/image/fetch/$s_!XC4G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad120f37-b696-4629-9c54-0f75b408d5aa_2200x982.png 848w, 
https://substackcdn.com/image/fetch/$s_!XC4G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad120f37-b696-4629-9c54-0f75b408d5aa_2200x982.png 1272w, https://substackcdn.com/image/fetch/$s_!XC4G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad120f37-b696-4629-9c54-0f75b408d5aa_2200x982.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: <a 
href="https://developer.download.nvidia.com/devzone/devcenter/cuda/docs/GPUDirect_Technology_Overview.pdf">NVIDIA</a></figcaption></figure></div><p>It sounds deceptively simple, but making it happen required changes to three pieces of software: the Linux kernel, the NVIDIA driver, and the Mellanox driver.</p><ul><li><p><strong>Linux kernel updates</strong> to allow the NVIDIA and Mellanox drivers to <em>share host memory</em>. This made it possible for the InfiniBand NIC to directly access buffers allocated by the CUDA library, eliminating the need for buffer copies.</p></li><li><p><strong>Driver updates on both sides</strong>. The NVIDIA and Mellanox drivers had to shake hands and agree to use <strong>Buffer A</strong> for everything. The Mellanox driver registered callbacks that allowed the NVIDIA driver to notify it of any changes made to the shared buffers at run time.</p></li></ul><p>This simple update <em>accelerated GPU communications over InfiniBand by 30%</em>. More importantly, it broke the ice between the GPU and the NIC. This was the first step toward direct, coordinated data movement.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3t8T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552086cb-98f1-42e5-b844-263771115b5f_2444x582.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3t8T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552086cb-98f1-42e5-b844-263771115b5f_2444x582.png 424w, https://substackcdn.com/image/fetch/$s_!3t8T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552086cb-98f1-42e5-b844-263771115b5f_2444x582.png 848w, 
https://substackcdn.com/image/fetch/$s_!3t8T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552086cb-98f1-42e5-b844-263771115b5f_2444x582.png 1272w, https://substackcdn.com/image/fetch/$s_!3t8T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552086cb-98f1-42e5-b844-263771115b5f_2444x582.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3t8T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552086cb-98f1-42e5-b844-263771115b5f_2444x582.png" width="1456" height="347" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/552086cb-98f1-42e5-b844-263771115b5f_2444x582.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:347,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:343770,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184659153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552086cb-98f1-42e5-b844-263771115b5f_2444x582.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3t8T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552086cb-98f1-42e5-b844-263771115b5f_2444x582.png 424w, https://substackcdn.com/image/fetch/$s_!3t8T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552086cb-98f1-42e5-b844-263771115b5f_2444x582.png 
848w, https://substackcdn.com/image/fetch/$s_!3t8T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552086cb-98f1-42e5-b844-263771115b5f_2444x582.png 1272w, https://substackcdn.com/image/fetch/$s_!3t8T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552086cb-98f1-42e5-b844-263771115b5f_2444x582.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">The original GPUDirect paper. Worth reading in its entirety. [<a href="https://www.osti.gov/servlets/purl/1120826">Source</a>]</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YjDi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4e555-f3a2-4557-b672-9387df79b88e_2464x2084.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YjDi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4e555-f3a2-4557-b672-9387df79b88e_2464x2084.png 424w, https://substackcdn.com/image/fetch/$s_!YjDi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4e555-f3a2-4557-b672-9387df79b88e_2464x2084.png 848w, https://substackcdn.com/image/fetch/$s_!YjDi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4e555-f3a2-4557-b672-9387df79b88e_2464x2084.png 1272w, 
https://substackcdn.com/image/fetch/$s_!YjDi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4e555-f3a2-4557-b672-9387df79b88e_2464x2084.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YjDi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4e555-f3a2-4557-b672-9387df79b88e_2464x2084.png" width="1456" height="1231" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/efc4e555-f3a2-4557-b672-9387df79b88e_2464x2084.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9da59f48-9311-4925-a918-4d335d840277_2464x2084.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1231,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2695340,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184659153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da59f48-9311-4925-a918-4d335d840277_2464x2084.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YjDi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4e555-f3a2-4557-b672-9387df79b88e_2464x2084.png 424w, https://substackcdn.com/image/fetch/$s_!YjDi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4e555-f3a2-4557-b672-9387df79b88e_2464x2084.png 848w, 
https://substackcdn.com/image/fetch/$s_!YjDi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4e555-f3a2-4557-b672-9387df79b88e_2464x2084.png 1272w, https://substackcdn.com/image/fetch/$s_!YjDi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4e555-f3a2-4557-b672-9387df79b88e_2464x2084.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">A node in the Japanese TSUBAME2.0 supercomputer. 
It used Mellanox NICs, InfiniBand switches, and NVIDIA GPUs, and leveraged GPUDirect. [<a href="https://www.gsic.titech.ac.jp/sites/default/files/TSUBAME_SPECIFICATIONS_en_0.pdf">Source</a>]</figcaption></figure></div><h1>2013: GPUDirect RDMA &#8212; Kepler, ConnectX-3, accelerating GPU-GPU path</h1><p>Even with the 2010 GPUDirect improvements, data still had to detour through the CPU&#8217;s system memory. That extra hop added latency and capped bandwidth at the speed of the host memory bus.</p><p>Which naturally raised the question: <strong>why can&#8217;t the NIC just read directly from the GPU&#8217;s memory?</strong></p><p>That question is exactly what <strong>GPUDirect RDMA</strong> (Remote Direct Memory Access) answered, and it became possible with Kepler-class GPUs (such as the NVIDIA Tesla K40).</p><h3>Why this wasn&#8217;t possible before Kepler</h3><p>On older GPUs (Fermi and earlier), the interface between PCIe and GPU VRAM was fundamentally constrained. Although a GPU might have had 6 GB of VRAM, only a relatively small aperture (typically ~256 MB, exposed via BAR1) was addressable from the PCIe bus at any given time.</p><p>To access data outside that window, the NVIDIA driver (running on the CPU) had to reprogram the BAR1 mapping to point to a different region of VRAM. The GPU would slide the window, and only then could the CPU read the data. This remapping was entirely CPU- and driver-controlled, and a device like a network card has no way to issue these &#8220;Move Window&#8221; commands. It just sends a &#8220;Read&#8221; request to a physical address. 
Without a stable, fully mapped view of GPU memory, direct access was impossible.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3n4B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5a1a07-2a73-4d1c-946d-39237dc9e2e1_2464x1124.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3n4B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5a1a07-2a73-4d1c-946d-39237dc9e2e1_2464x1124.png 424w, https://substackcdn.com/image/fetch/$s_!3n4B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5a1a07-2a73-4d1c-946d-39237dc9e2e1_2464x1124.png 848w, https://substackcdn.com/image/fetch/$s_!3n4B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5a1a07-2a73-4d1c-946d-39237dc9e2e1_2464x1124.png 1272w, https://substackcdn.com/image/fetch/$s_!3n4B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5a1a07-2a73-4d1c-946d-39237dc9e2e1_2464x1124.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3n4B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5a1a07-2a73-4d1c-946d-39237dc9e2e1_2464x1124.png" width="1456" height="664" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c5a1a07-2a73-4d1c-946d-39237dc9e2e1_2464x1124.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2bf51967-89f2-4bd2-9a68-a08bb51040eb_2464x1124.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:664,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:687203,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184659153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bf51967-89f2-4bd2-9a68-a08bb51040eb_2464x1124.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3n4B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5a1a07-2a73-4d1c-946d-39237dc9e2e1_2464x1124.png 424w, https://substackcdn.com/image/fetch/$s_!3n4B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5a1a07-2a73-4d1c-946d-39237dc9e2e1_2464x1124.png 848w, https://substackcdn.com/image/fetch/$s_!3n4B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5a1a07-2a73-4d1c-946d-39237dc9e2e1_2464x1124.png 1272w, https://substackcdn.com/image/fetch/$s_!3n4B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5a1a07-2a73-4d1c-946d-39237dc9e2e1_2464x1124.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">With a universal virtual address, each GPU&#8217;s VRAM is uniquely addressable from PCIe. (Source: <a href="https://developer.nvidia.com/blog/introduction-cuda-aware-mpi/">NVIDIA</a>)</figcaption></figure></div><h3>What changed with Kepler</h3><p>Kepler fundamentally changed this model.</p><p>With Kepler, NVIDIA exposed <strong>the entire GPU VRAM address space directly to the PCIe bus</strong>. 
This meant that a NIC could now access <em>any</em> location in GPU memory at any time, without CPU involvement and without remapping windows.</p><p>For the first time, the network could treat GPU memory as a first-class RDMA target.</p><h3>The software stack catches up</h3><p>On the software side, NVIDIA introduced GPUDirect RDMA support in <strong>CUDA 5.0</strong>, while <strong>Mellanox</strong> updated its <strong><a href="https://github.com/larrystevenwise/nvidia_peer_memory">MLNX_OFED</a></strong> drivers (i.e., the InfiniBand drivers) to enable true peer-to-peer RDMA paths between GPU memory and Mellanox adapters such as the <strong>ConnectX-3</strong>. Together, these changes enabled a direct data path between the GPU and the NIC.</p><pre><code>GPU VRAM &#8644; Mellanox NIC &#8644; Network</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L-x6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb38295-422a-4ab2-b9d9-9b52bc0b5e95_3302x1602.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L-x6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb38295-422a-4ab2-b9d9-9b52bc0b5e95_3302x1602.png 424w, https://substackcdn.com/image/fetch/$s_!L-x6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb38295-422a-4ab2-b9d9-9b52bc0b5e95_3302x1602.png 848w, https://substackcdn.com/image/fetch/$s_!L-x6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb38295-422a-4ab2-b9d9-9b52bc0b5e95_3302x1602.png 1272w, 
https://substackcdn.com/image/fetch/$s_!L-x6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb38295-422a-4ab2-b9d9-9b52bc0b5e95_3302x1602.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L-x6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb38295-422a-4ab2-b9d9-9b52bc0b5e95_3302x1602.png" width="1456" height="706" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7fb38295-422a-4ab2-b9d9-9b52bc0b5e95_3302x1602.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:706,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1397087,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184659153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb38295-422a-4ab2-b9d9-9b52bc0b5e95_3302x1602.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L-x6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb38295-422a-4ab2-b9d9-9b52bc0b5e95_3302x1602.png 424w, https://substackcdn.com/image/fetch/$s_!L-x6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb38295-422a-4ab2-b9d9-9b52bc0b5e95_3302x1602.png 848w, 
https://substackcdn.com/image/fetch/$s_!L-x6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb38295-422a-4ab2-b9d9-9b52bc0b5e95_3302x1602.png 1272w, https://substackcdn.com/image/fetch/$s_!L-x6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fb38295-422a-4ab2-b9d9-9b52bc0b5e95_3302x1602.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!au21!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F924e0ebd-bf41-469a-b7d5-d0a60da741fe_2544x1202.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!au21!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F924e0ebd-bf41-469a-b7d5-d0a60da741fe_2544x1202.png 424w, https://substackcdn.com/image/fetch/$s_!au21!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F924e0ebd-bf41-469a-b7d5-d0a60da741fe_2544x1202.png 848w, https://substackcdn.com/image/fetch/$s_!au21!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F924e0ebd-bf41-469a-b7d5-d0a60da741fe_2544x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!au21!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F924e0ebd-bf41-469a-b7d5-d0a60da741fe_2544x1202.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!au21!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F924e0ebd-bf41-469a-b7d5-d0a60da741fe_2544x1202.png" width="1456" height="688" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/924e0ebd-bf41-469a-b7d5-d0a60da741fe_2544x1202.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:688,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:791523,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184659153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F924e0ebd-bf41-469a-b7d5-d0a60da741fe_2544x1202.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!au21!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F924e0ebd-bf41-469a-b7d5-d0a60da741fe_2544x1202.png 424w, https://substackcdn.com/image/fetch/$s_!au21!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F924e0ebd-bf41-469a-b7d5-d0a60da741fe_2544x1202.png 848w, https://substackcdn.com/image/fetch/$s_!au21!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F924e0ebd-bf41-469a-b7d5-d0a60da741fe_2544x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!au21!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F924e0ebd-bf41-469a-b7d5-d0a60da741fe_2544x1202.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">GPUDirect RDMA (Source: <a href="https://images.nvidia.com/content/gtc-kr/part_6_mellanox.pdf?utm_source=chatgpt.com">NVIDIA</a>)</figcaption></figure></div><h3>How programmers actually used it: CUDA-Aware MPI</h3><p>Next, it&#8217;s worth understanding how programmers actually used this feature. The real unsung heroes are the developers and maintainers of <strong>MPI</strong> (Message Passing Interface), the standard message-passing API used in HPC. It&#8217;s how the nodes of a cluster exchange partial results when working together on one giant physics problem. 
</p><p>Before GPUDirect RDMA, MPI code had to copy data from the GPU into the CPU&#8217;s memory using <code>cudaMemcpy(host_mem_ptr, gpu_mem_ptr, size, cudaMemcpyDeviceToHost)</code>, and then send it to the peer using <code>MPI_Send(host_mem_ptr, ...)</code>.</p><p>With GPUDirect RDMA, the MPI libraries were updated so that you could call <code>MPI_Send(gpu_mem_ptr, ...)</code> directly. Under the hood, this tells the NIC to read the GPU memory directly. The device-to-host <code>cudaMemcpy</code> step was eliminated entirely.</p><p>Once again, the change to MPI looks deceptively simple, but the implications, and the work involved in enabling it, were profound. In 2013, the MPI core developers <a href="https://www.youtube.com/watch?v=AxXfqTRC3ZU">presented this update publicly</a>, showing concrete improvements in both latency and throughput enabled by GPUDirect RDMA. It&#8217;s well worth watching that presentation to understand just how significant the shift was.</p><p>NVIDIA also published developer blog posts on how to use <a href="https://developer.nvidia.com/blog/introduction-cuda-aware-mpi/">CUDA-aware MPI in practice</a> and on <a href="https://developer.nvidia.com/blog/benchmarking-gpudirect-rdma-on-modern-server-platforms/#:~:text=Oct%2007%2C%202014,exposed%20via%20CUDA%2Daware%20MPI.">benchmarking</a> <a href="https://developer.nvidia.com/blog/benchmarking-cuda-aware-mpi/">CUDA-aware MPI</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4elF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450c9b08-bda7-4306-9f87-8999204995fb_2544x1202.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!4elF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450c9b08-bda7-4306-9f87-8999204995fb_2544x1202.png 424w, https://substackcdn.com/image/fetch/$s_!4elF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450c9b08-bda7-4306-9f87-8999204995fb_2544x1202.png 848w, https://substackcdn.com/image/fetch/$s_!4elF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450c9b08-bda7-4306-9f87-8999204995fb_2544x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!4elF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450c9b08-bda7-4306-9f87-8999204995fb_2544x1202.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4elF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450c9b08-bda7-4306-9f87-8999204995fb_2544x1202.png" width="1456" height="688" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/450c9b08-bda7-4306-9f87-8999204995fb_2544x1202.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:688,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:865668,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184659153?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450c9b08-bda7-4306-9f87-8999204995fb_2544x1202.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4elF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450c9b08-bda7-4306-9f87-8999204995fb_2544x1202.png 424w, https://substackcdn.com/image/fetch/$s_!4elF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450c9b08-bda7-4306-9f87-8999204995fb_2544x1202.png 848w, https://substackcdn.com/image/fetch/$s_!4elF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450c9b08-bda7-4306-9f87-8999204995fb_2544x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!4elF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F450c9b08-bda7-4306-9f87-8999204995fb_2544x1202.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h1>2019: GPUDirect Storage &#8212; Accelerating GPU to SSD data path</h1><p>By the end of the 2010s, you could see NVIDIA&#8217;s attention shifting to <strong>storage</strong>.</p>
   ]]></content:encoded></item><item><title><![CDATA[Analysis of NVIDIA’s Bluefield-4 DPU and KV-Cache Context Memory Storage Platform (CES 2026): Architecture, Strategy, Dynamo, WEKA, Enfabrica]]></title><description><![CDATA[At CES 2026, NVIDIA announced the Context Memory Storage Platform, a new appliance designed to expand KV cache capacity beyond the GPU rack.]]></description><link>https://www.chiplog.io/p/analysis-of-nvidias-bluefield-4-dpu</link><guid isPermaLink="false">https://www.chiplog.io/p/analysis-of-nvidias-bluefield-4-dpu</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Sun, 11 Jan 2026 07:01:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1eae3740-1a32-49a9-a6d7-011637767a06_2848x1504.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At CES 2026, NVIDIA announced the <strong>Context Memory Storage Platform</strong>, a new appliance designed to expand KV cache capacity beyond the GPU rack. The fanfare around this device is definitely warranted &#8212; but, in my view, is also <em>partially</em> misplaced.</p><p>From a hardware perspective, Supermicro sells a product, called a JBOF (Just a Bunch of Flash), much like this one, <a href="https://www.supermicro.com/solutions/Solution_Brief_JBOF-Petascale-NVIDIA-BlueField3-DPU.pdf">which can be purchased </a><strong><a href="https://www.supermicro.com/solutions/Solution_Brief_JBOF-Petascale-NVIDIA-BlueField3-DPU.pdf">today</a></strong>. 
This device uses the previous generation<strong> NVIDIA</strong> <strong>BlueField-3 DPU</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O-3f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67d41457-9be2-41f6-ac48-cfef6f81fd82_2464x1264.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O-3f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67d41457-9be2-41f6-ac48-cfef6f81fd82_2464x1264.png 424w, https://substackcdn.com/image/fetch/$s_!O-3f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67d41457-9be2-41f6-ac48-cfef6f81fd82_2464x1264.png 848w, https://substackcdn.com/image/fetch/$s_!O-3f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67d41457-9be2-41f6-ac48-cfef6f81fd82_2464x1264.png 1272w, https://substackcdn.com/image/fetch/$s_!O-3f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67d41457-9be2-41f6-ac48-cfef6f81fd82_2464x1264.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!O-3f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67d41457-9be2-41f6-ac48-cfef6f81fd82_2464x1264.png" width="678" height="347.8475274725275" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67d41457-9be2-41f6-ac48-cfef6f81fd82_2464x1264.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/af965e3d-e1d6-4584-acb2-fb47454d5a4d_2464x1264.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:747,&quot;width&quot;:1456,&quot;resizeWidth&quot;:678,&quot;bytes&quot;:1560812,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184085884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf965e3d-e1d6-4584-acb2-fb47454d5a4d_2464x1264.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!O-3f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67d41457-9be2-41f6-ac48-cfef6f81fd82_2464x1264.png 424w, https://substackcdn.com/image/fetch/$s_!O-3f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67d41457-9be2-41f6-ac48-cfef6f81fd82_2464x1264.png 848w, https://substackcdn.com/image/fetch/$s_!O-3f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67d41457-9be2-41f6-ac48-cfef6f81fd82_2464x1264.png 1272w, https://substackcdn.com/image/fetch/$s_!O-3f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67d41457-9be2-41f6-ac48-cfef6f81fd82_2464x1264.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button 
tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Supermicro storage appliance with the previous-generation BlueField-3 DPU</figcaption></figure></div><p>From a software perspective, NVIDIA has also invested in and partnered with <strong><a href="http://weka.io/">WEKA</a></strong>, which recently introduced a software offering called the <strong><a href="https://www.weka.io/product/augmented-memory-grid/">Augmented Memory Grid</a></strong>. This system leverages <strong>NVIDIA Dynamo</strong> and <strong>NIXL</strong> to connect GPU clusters like GB200 to petabyte-scale NVMe systems and provide fast, <strong>persistent KV cache storage</strong>. Crucially, it enables KV cache to move directly in and out of GPU HBM with minimal overhead. 
They call it the WEKA Token Warehouse (more on that later).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aCEn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4a6ca8b-de4f-4908-a872-29365794f298_1250x712.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aCEn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4a6ca8b-de4f-4908-a872-29365794f298_1250x712.png 424w, https://substackcdn.com/image/fetch/$s_!aCEn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4a6ca8b-de4f-4908-a872-29365794f298_1250x712.png 848w, https://substackcdn.com/image/fetch/$s_!aCEn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4a6ca8b-de4f-4908-a872-29365794f298_1250x712.png 1272w, https://substackcdn.com/image/fetch/$s_!aCEn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4a6ca8b-de4f-4908-a872-29365794f298_1250x712.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aCEn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4a6ca8b-de4f-4908-a872-29365794f298_1250x712.png" width="660" height="375.936" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4a6ca8b-de4f-4908-a872-29365794f298_1250x712.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:712,&quot;width&quot;:1250,&quot;resizeWidth&quot;:660,&quot;bytes&quot;:545962,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184085884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4a6ca8b-de4f-4908-a872-29365794f298_1250x712.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aCEn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4a6ca8b-de4f-4908-a872-29365794f298_1250x712.png 424w, https://substackcdn.com/image/fetch/$s_!aCEn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4a6ca8b-de4f-4908-a872-29365794f298_1250x712.png 848w, https://substackcdn.com/image/fetch/$s_!aCEn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4a6ca8b-de4f-4908-a872-29365794f298_1250x712.png 1272w, https://substackcdn.com/image/fetch/$s_!aCEn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4a6ca8b-de4f-4908-a872-29365794f298_1250x712.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: weka.io</figcaption></figure></div><p>In the rest of this article, I&#8217;ll take a step back and unpack what all of this means, focusing on <strong>three key dimensions</strong>.</p><p>First, we&#8217;ll put some numbers in context and examine <strong>how big the KV cache problem is</strong> in modern inference workloads and why it is a limiting factor for scale and utilization.</p><p>Next, I&#8217;ll walk through the <strong>system architecture</strong>. 
This includes what a DPU actually is, how the platform integrates with NVIDIA&#8217;s GB200, GB300, and Vera Rubin systems, and why the combination of this hardware with NVIDIA&#8217;s Dynamo software stack, rather than the hardware alone, is the real differentiator.</p><p>Finally, I&#8217;ll focus on what I see as the most important part of the discussion: <strong>the strategy</strong>. I&#8217;ll zoom out and speculate on NVIDIA&#8217;s broader play here, specifically how this mirrors AWS&#8217;s strategy with Annapurna Labs and how the $900M Enfabrica deal from last September might be the key to what comes next in this product&#8217;s roadmap. I&#8217;ll also look at how this fits into <strong>NVIDIA&#8217;s continued expansion across the data-center stack</strong> through its growing portfolio of six core chips and investments in companies such as <a href="https://www.vastdata.com/">VAST Data</a> and <a href="http://weka.io">WEKA</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W6Mv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F986519c9-4473-45cd-b9a7-841ca61ee88e_3222x1246.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W6Mv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F986519c9-4473-45cd-b9a7-841ca61ee88e_3222x1246.png 424w, https://substackcdn.com/image/fetch/$s_!W6Mv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F986519c9-4473-45cd-b9a7-841ca61ee88e_3222x1246.png 848w, 
https://substackcdn.com/image/fetch/$s_!W6Mv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F986519c9-4473-45cd-b9a7-841ca61ee88e_3222x1246.png 1272w, https://substackcdn.com/image/fetch/$s_!W6Mv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F986519c9-4473-45cd-b9a7-841ca61ee88e_3222x1246.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!W6Mv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F986519c9-4473-45cd-b9a7-841ca61ee88e_3222x1246.png" width="603" height="233.16552197802199" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/986519c9-4473-45cd-b9a7-841ca61ee88e_3222x1246.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77ab04e4-f34b-4f0e-bb58-1356a253bcf0_3222x1246.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:563,&quot;width&quot;:1456,&quot;resizeWidth&quot;:603,&quot;bytes&quot;:632221,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184085884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77ab04e4-f34b-4f0e-bb58-1356a253bcf0_3222x1246.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!W6Mv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F986519c9-4473-45cd-b9a7-841ca61ee88e_3222x1246.png 424w, 
https://substackcdn.com/image/fetch/$s_!W6Mv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F986519c9-4473-45cd-b9a7-841ca61ee88e_3222x1246.png 848w, https://substackcdn.com/image/fetch/$s_!W6Mv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F986519c9-4473-45cd-b9a7-841ca61ee88e_3222x1246.png 1272w, https://substackcdn.com/image/fetch/$s_!W6Mv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F986519c9-4473-45cd-b9a7-841ca61ee88e_3222x1246.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Source: NVIDIA</figcaption></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/subscribe?"><span>Subscribe now</span></a></p><h1>The Problem</h1><h4><em>How bad is the KV-Cache memory footprint?</em></h4><p>Let&#8217;s consider a classic LLM like GPT-3 with 175B parameters.</p><ul><li><p>Every token in the context, whether it comes from the prompt or is generated by the model, adds roughly <strong>4.5 MB</strong> of KV cache at FP16.</p></li><li><p>For a user chat session with around 2,048 tokens (that&#8217;s roughly <strong>1,500 words</strong>), this translates to about <strong>10 GB</strong> of memory just for KV cache.</p></li></ul><p>In a <strong>GB200</strong> compute tray with 4&#215; Blackwell GPUs and <strong>744 GB total HBM3e</strong>, storing the model weights alone takes about <strong>350 GB</strong> (175B parameters &#215; 2 bytes for FP16). 
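</p><p>The figures so far follow from a few multiplications. Here is a minimal sketch in Python, assuming the commonly cited GPT-3 175B shape (96 layers, hidden size 12,288), FP16 values, and plain multi-head attention with no KV sharing. The variable names are mine, and the 744 GB capacity is simply the tray figure quoted above.</p>

```python
# Back-of-the-envelope KV cache sizing for a GPT-3-class model.
# Assumptions (illustrative, not from a vendor spec sheet): 96 layers,
# hidden size 12288, FP16 storage, full multi-head attention.
N_LAYERS = 96
D_MODEL = 12288
BYTES_PER_VALUE = 2          # FP16
N_PARAMS = 175_000_000_000
TRAY_HBM_GB = 744            # 4x Blackwell GPUs, figure quoted in the text
SESSION_TOKENS = 2048

# Each layer caches one key vector and one value vector of size D_MODEL per token.
kv_bytes_per_token = 2 * N_LAYERS * D_MODEL * BYTES_PER_VALUE
kv_bytes_per_session = kv_bytes_per_token * SESSION_TOKENS
weight_gb = N_PARAMS * BYTES_PER_VALUE / 1e9
headroom_gb = TRAY_HBM_GB - weight_gb

print(f"KV cache per token:   {kv_bytes_per_token / 2**20:.1f} MiB")
print(f"KV cache per session: {kv_bytes_per_session / 2**30:.1f} GiB")
print(f"Model weights (FP16): {weight_gb:.0f} GB")
print(f"HBM left over:        {headroom_gb:.0f} GB")
```

<p>Under these assumptions a 2,048-token session comes to 9 GiB (about 9.7 GB), which I round up to 10 GB.</p><p>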
That leaves roughly <strong>400 GB </strong>for everything else &#8212; including KV cache.</p><p>At ~10 GB per user session, that&#8217;s <strong>only ~40 concurrent users per tray</strong>. Scaling to millions of users with these numbers is unrealistic.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M5Ye!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F288ac873-3a24-4beb-8605-b37428335f9c_1890x726.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M5Ye!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F288ac873-3a24-4beb-8605-b37428335f9c_1890x726.png 424w, https://substackcdn.com/image/fetch/$s_!M5Ye!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F288ac873-3a24-4beb-8605-b37428335f9c_1890x726.png 848w, https://substackcdn.com/image/fetch/$s_!M5Ye!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F288ac873-3a24-4beb-8605-b37428335f9c_1890x726.png 1272w, https://substackcdn.com/image/fetch/$s_!M5Ye!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F288ac873-3a24-4beb-8605-b37428335f9c_1890x726.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M5Ye!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F288ac873-3a24-4beb-8605-b37428335f9c_1890x726.png" width="1456" height="559" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/288ac873-3a24-4beb-8605-b37428335f9c_1890x726.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:559,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:464947,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184085884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F288ac873-3a24-4beb-8605-b37428335f9c_1890x726.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M5Ye!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F288ac873-3a24-4beb-8605-b37428335f9c_1890x726.png 424w, https://substackcdn.com/image/fetch/$s_!M5Ye!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F288ac873-3a24-4beb-8605-b37428335f9c_1890x726.png 848w, https://substackcdn.com/image/fetch/$s_!M5Ye!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F288ac873-3a24-4beb-8605-b37428335f9c_1890x726.png 1272w, https://substackcdn.com/image/fetch/$s_!M5Ye!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F288ac873-3a24-4beb-8605-b37428335f9c_1890x726.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: NVIDIA</figcaption></figure></div><p>Of course, GPT-3 is old by today&#8217;s standards. Since then, there have been significant optimizations in attention mechanisms. <strong>Grouped-Query Attention (GQA)</strong>, introduced by Google and popularized by Meta&#8217;s Llama 2, and more recently DeepSeek&#8217;s <strong>Multi-Head Latent Attention (MLA)</strong>, have dramatically reduced KV cache size. 
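</p><p>To see where the reduction comes from, here is a rough sketch of the per-token KV cache size under each attention scheme. The model shapes are assumptions based on the publicly documented configurations as I understand them: GPT-3 175B for multi-head attention, Llama 2 70B standing in for a GQA model, and DeepSeek-V3 for MLA. The function names are hypothetical and only for illustration.</p>

```python
# Per-token KV cache size under three attention schemes (16-bit values).
# All model shapes below are assumptions taken from public model descriptions.
BYTES = 2  # FP16/BF16 per cached value

def kv_bytes_mha_or_gqa(n_layers, n_kv_heads, head_dim):
    # Keys and values are cached per layer for every KV head.
    # GQA shrinks the cache by sharing each KV head across several query heads.
    return 2 * n_layers * n_kv_heads * head_dim * BYTES

def kv_bytes_mla(n_layers, latent_dim, rope_dim):
    # MLA caches one compressed latent vector plus a small RoPE key per layer.
    return n_layers * (latent_dim + rope_dim) * BYTES

per_token = {
    "GPT-3 175B (MHA)": kv_bytes_mha_or_gqa(96, 96, 128),
    "Llama 2 70B (GQA)": kv_bytes_mha_or_gqa(80, 8, 128),
    "DeepSeek-V3 (MLA)": kv_bytes_mla(61, 512, 64),
}

for name, size in per_token.items():
    print(f"{name}: {size / 1000:.1f} KB per token")
```

<p>With these shapes, GQA lands in the hundreds of kilobytes per token, while MLA gets down to roughly 70 KB.</p><p>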
With MLA, the per-token KV cache footprint drops from ~4.5 MB (4,608 KB to be precise) to around <strong>71 KB</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fdM4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9251af5a-c49e-4274-8421-46dd1224cd3d_2142x824.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fdM4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9251af5a-c49e-4274-8421-46dd1224cd3d_2142x824.png 424w, https://substackcdn.com/image/fetch/$s_!fdM4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9251af5a-c49e-4274-8421-46dd1224cd3d_2142x824.png 848w, https://substackcdn.com/image/fetch/$s_!fdM4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9251af5a-c49e-4274-8421-46dd1224cd3d_2142x824.png 1272w, https://substackcdn.com/image/fetch/$s_!fdM4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9251af5a-c49e-4274-8421-46dd1224cd3d_2142x824.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fdM4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9251af5a-c49e-4274-8421-46dd1224cd3d_2142x824.png" width="504" height="193.84615384615384" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9251af5a-c49e-4274-8421-46dd1224cd3d_2142x824.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db03a6cf-668c-40ae-b762-fe45f69dbd1b_2142x824.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:560,&quot;width&quot;:1456,&quot;resizeWidth&quot;:504,&quot;bytes&quot;:536601,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184085884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb03a6cf-668c-40ae-b762-fe45f69dbd1b_2142x824.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fdM4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9251af5a-c49e-4274-8421-46dd1224cd3d_2142x824.png 424w, https://substackcdn.com/image/fetch/$s_!fdM4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9251af5a-c49e-4274-8421-46dd1224cd3d_2142x824.png 848w, https://substackcdn.com/image/fetch/$s_!fdM4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9251af5a-c49e-4274-8421-46dd1224cd3d_2142x824.png 1272w, https://substackcdn.com/image/fetch/$s_!fdM4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9251af5a-c49e-4274-8421-46dd1224cd3d_2142x824.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Source: DeepSeek</figcaption></figure></div><p><em>But this 
doesn&#8217;t actually make the problem go away.</em></p><p>We are now firmly in the era of <strong>reasoning</strong> models. Even if the user-visible conversation is only a few hundred words, the model&#8217;s internal chain of thought can generate tens of thousands of tokens. All of that intermediate state becomes part of the KV cache, and it has to be stored somewhere.</p><p>If we assume a more realistic average of 10,000 tokens per session, each session still needs roughly 0.7 GB at ~71 KB per token, so a GB200 compute tray can <strong>still support fewer than 1,000 users</strong>. This is why Jensen repeatedly emphasized in the keynote that KV cache management is <strong>one of the biggest pain points</strong> for AI labs, cloud service providers, and customers.</p><p>There&#8217;s also a product-level issue here. If I think about my own usage of ChatGPT or Gemini, I often pick up a chat session from days or weeks ago. That means KV cache can&#8217;t just live in GPU memory and disappear. It needs to be offloaded, stored, and retrieved later when the session resumes.</p><p><strong>KV cache is no longer a temporary artifact. It&#8217;s becoming persistent state.</strong></p><blockquote><p><em><strong>Note:</strong> I&#8217;ll be diving deep into the history of KV cache optimization in my next post, including MHA, GQA, MLA, quantization, and compression. Subscribe to get notified when that drops.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/subscribe?"><span>Subscribe now</span></a></p></blockquote><h1>The Solution</h1><h3>Expanding KV-Cache outside the rack</h3><p>Today, there is <em>some</em> breathing room. On a GB200 compute tray, alongside the 4 GPUs, you have <strong>2 Grace CPUs</strong> with a combined <strong>~1 TB of LPDDR5X memory</strong>. 
KV cache can be evicted from GPU HBM into CPU DRAM.</p><p>This is what Jensen was referring to in the keynote when he said, <em>&#8220;Right now, the GPUs in each node have 1 TB of space.&#8221;</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-m31!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33b96007-c9d0-4ad4-8c05-dcf422199ae4_2568x1956.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-m31!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33b96007-c9d0-4ad4-8c05-dcf422199ae4_2568x1956.png 424w, https://substackcdn.com/image/fetch/$s_!-m31!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33b96007-c9d0-4ad4-8c05-dcf422199ae4_2568x1956.png 848w, https://substackcdn.com/image/fetch/$s_!-m31!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33b96007-c9d0-4ad4-8c05-dcf422199ae4_2568x1956.png 1272w, https://substackcdn.com/image/fetch/$s_!-m31!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33b96007-c9d0-4ad4-8c05-dcf422199ae4_2568x1956.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-m31!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33b96007-c9d0-4ad4-8c05-dcf422199ae4_2568x1956.png" width="1456" height="1109" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/33b96007-c9d0-4ad4-8c05-dcf422199ae4_2568x1956.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b5a2021-a9da-42a8-8631-99a0e445344e_2568x1956.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1109,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5705564,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184085884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5a2021-a9da-42a8-8631-99a0e445344e_2568x1956.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-m31!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33b96007-c9d0-4ad4-8c05-dcf422199ae4_2568x1956.png 424w, https://substackcdn.com/image/fetch/$s_!-m31!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33b96007-c9d0-4ad4-8c05-dcf422199ae4_2568x1956.png 848w, https://substackcdn.com/image/fetch/$s_!-m31!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33b96007-c9d0-4ad4-8c05-dcf422199ae4_2568x1956.png 1272w, https://substackcdn.com/image/fetch/$s_!-m31!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33b96007-c9d0-4ad4-8c05-dcf422199ae4_2568x1956.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button 
tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image source: NVIDIA. Annotation by Chip Log.</figcaption></figure></div><p>But this is still nowhere near enough. CPU memory is just another tier in the same rack, and it doesn&#8217;t solve the problem of large-scale, persistent context storage.</p><p>NVIDIA&#8217;s solution is to expand KV cache outside the rack entirely, using network-attached storage. To make this viable at scale, NVIDIA needs both a<strong> hardware solution </strong>and a <strong>software stack</strong> that understands <em>KV cache as something that can move fluidly across the memory hierarchy</em>.</p><h3>Hardware: BlueField-4 and the role of a DPU</h3><p>At the center of this solution is the <strong>BlueField-4 DPU</strong>. 
But, <em>what exactly is a DPU?</em></p><p>At a high level, a DPU lets you attach large amounts of storage (typically NVMe SSDs) on one side, and connect that storage to a GPU or CPU cluster on the other side using high-speed networking. The goal is to make this <strong>remote storage appear</strong> to the compute node <strong>as if it were locally attached</strong>.</p><p>For this to work, the connection must be both low latency and high bandwidth. Without a DPU, you would need a separate NIC, a CPU, and a PCIe switch to achieve something similar, and the result would be slower, more complex, and harder to scale.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wJWU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F307383bd-af78-4df3-a912-aaaf1d0b8aad_2568x744.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wJWU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F307383bd-af78-4df3-a912-aaaf1d0b8aad_2568x744.png 424w, https://substackcdn.com/image/fetch/$s_!wJWU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F307383bd-af78-4df3-a912-aaaf1d0b8aad_2568x744.png 848w, https://substackcdn.com/image/fetch/$s_!wJWU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F307383bd-af78-4df3-a912-aaaf1d0b8aad_2568x744.png 1272w, https://substackcdn.com/image/fetch/$s_!wJWU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F307383bd-af78-4df3-a912-aaaf1d0b8aad_2568x744.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!wJWU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F307383bd-af78-4df3-a912-aaaf1d0b8aad_2568x744.png" width="1456" height="422" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/307383bd-af78-4df3-a912-aaaf1d0b8aad_2568x744.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:422,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:266230,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184085884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F307383bd-af78-4df3-a912-aaaf1d0b8aad_2568x744.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wJWU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F307383bd-af78-4df3-a912-aaaf1d0b8aad_2568x744.png 424w, https://substackcdn.com/image/fetch/$s_!wJWU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F307383bd-af78-4df3-a912-aaaf1d0b8aad_2568x744.png 848w, https://substackcdn.com/image/fetch/$s_!wJWU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F307383bd-af78-4df3-a912-aaaf1d0b8aad_2568x744.png 1272w, https://substackcdn.com/image/fetch/$s_!wJWU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F307383bd-af78-4df3-a912-aaaf1d0b8aad_2568x744.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a><figcaption class="image-caption">Source: Supermicro</figcaption></figure></div><p>A DPU collapses all of this into a <strong>single ASIC</strong>. The BlueField-4 combines networking, general-purpose processing, PCIe switching, and hardware acceleration for features that are <strong>mandatory</strong> for storage expansion, such as VXLAN, encryption and decryption, and traffic management. 
Remember, these <strong>racks and pods are multi-tenant environments</strong>.</p><p>The BlueField line came to NVIDIA through the Mellanox acquisition in 2020, and BlueField-4 is the latest generation of that architecture.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4PM7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59dba569-37ad-43d8-bc9d-283b5a9ae6da_2052x1474.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4PM7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59dba569-37ad-43d8-bc9d-283b5a9ae6da_2052x1474.png 424w, https://substackcdn.com/image/fetch/$s_!4PM7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59dba569-37ad-43d8-bc9d-283b5a9ae6da_2052x1474.png 848w, https://substackcdn.com/image/fetch/$s_!4PM7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59dba569-37ad-43d8-bc9d-283b5a9ae6da_2052x1474.png 1272w, https://substackcdn.com/image/fetch/$s_!4PM7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59dba569-37ad-43d8-bc9d-283b5a9ae6da_2052x1474.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4PM7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59dba569-37ad-43d8-bc9d-283b5a9ae6da_2052x1474.png" width="548" height="393.6868131868132" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59dba569-37ad-43d8-bc9d-283b5a9ae6da_2052x1474.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1046,&quot;width&quot;:1456,&quot;resizeWidth&quot;:548,&quot;bytes&quot;:2370170,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184085884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59dba569-37ad-43d8-bc9d-283b5a9ae6da_2052x1474.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4PM7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59dba569-37ad-43d8-bc9d-283b5a9ae6da_2052x1474.png 424w, https://substackcdn.com/image/fetch/$s_!4PM7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59dba569-37ad-43d8-bc9d-283b5a9ae6da_2052x1474.png 848w, https://substackcdn.com/image/fetch/$s_!4PM7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59dba569-37ad-43d8-bc9d-283b5a9ae6da_2052x1474.png 1272w, https://substackcdn.com/image/fetch/$s_!4PM7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59dba569-37ad-43d8-bc9d-283b5a9ae6da_2052x1474.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">This is a die shot of the previous-generation BlueField-3 DPU</figcaption></figure></div><p>In the Inference Context Memory Storage Platform, NVIDIA uses <strong>four BlueField-4 DPUs</strong>. 
Each DPU is connected to roughly 150 TB of storage, for a <strong>total of about 600 TB</strong> per appliance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mZdp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda2c123b-adef-4228-933e-5bfe5a9a2953_1920x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mZdp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda2c123b-adef-4228-933e-5bfe5a9a2953_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!mZdp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda2c123b-adef-4228-933e-5bfe5a9a2953_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!mZdp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda2c123b-adef-4228-933e-5bfe5a9a2953_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!mZdp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda2c123b-adef-4228-933e-5bfe5a9a2953_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mZdp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda2c123b-adef-4228-933e-5bfe5a9a2953_1920x1080.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da2c123b-adef-4228-933e-5bfe5a9a2953_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2850671,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184085884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda2c123b-adef-4228-933e-5bfe5a9a2953_1920x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mZdp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda2c123b-adef-4228-933e-5bfe5a9a2953_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!mZdp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda2c123b-adef-4228-933e-5bfe5a9a2953_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!mZdp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda2c123b-adef-4228-933e-5bfe5a9a2953_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!mZdp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda2c123b-adef-4228-933e-5bfe5a9a2953_1920x1080.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">NVIDIA Inference Context Memory Storage Platform. Source: NVIDIA</figcaption></figure></div><blockquote><p><em>Other notable companies that made DPUs were <strong>Fungible</strong> (acquired by Microsoft in 2023) and <strong>Pensando</strong> (acquired by AMD in 2022). Fungible was a failure, while Pensando was a success. We&#8217;ll explore these DPUs in a separate article.</em></p></blockquote><h3>Software: NVIDIA Dynamo, NIXL, KV Block Management, and DOCA</h3><p>Hardware alone isn&#8217;t sufficient. As I mentioned in the introduction, <em>you can already buy a <a href="https://www.supermicro.com/en/products/jbof">Supermicro JBOF today that uses BlueField-3 DPUs</a>.</em> There is nothing fundamentally new about attaching NVMe over the network. 
The real differentiation is software.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mYWh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb7bd2cd-b5d5-457a-9940-c2000f488ee5_2410x530.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mYWh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb7bd2cd-b5d5-457a-9940-c2000f488ee5_2410x530.png 424w, https://substackcdn.com/image/fetch/$s_!mYWh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb7bd2cd-b5d5-457a-9940-c2000f488ee5_2410x530.png 848w, https://substackcdn.com/image/fetch/$s_!mYWh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb7bd2cd-b5d5-457a-9940-c2000f488ee5_2410x530.png 1272w, https://substackcdn.com/image/fetch/$s_!mYWh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb7bd2cd-b5d5-457a-9940-c2000f488ee5_2410x530.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mYWh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb7bd2cd-b5d5-457a-9940-c2000f488ee5_2410x530.png" width="1456" height="320" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb7bd2cd-b5d5-457a-9940-c2000f488ee5_2410x530.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:690875,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184085884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb7bd2cd-b5d5-457a-9940-c2000f488ee5_2410x530.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mYWh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb7bd2cd-b5d5-457a-9940-c2000f488ee5_2410x530.png 424w, https://substackcdn.com/image/fetch/$s_!mYWh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb7bd2cd-b5d5-457a-9940-c2000f488ee5_2410x530.png 848w, https://substackcdn.com/image/fetch/$s_!mYWh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb7bd2cd-b5d5-457a-9940-c2000f488ee5_2410x530.png 1272w, https://substackcdn.com/image/fetch/$s_!mYWh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb7bd2cd-b5d5-457a-9940-c2000f488ee5_2410x530.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Supermicro JBOF (just a bunch of flash) with NVIDIA BF-3 DPUs. 
3RU height with 4 x DPUs.</figcaption></figure></div><p>At GTC 2025, NVIDIA announced Dynamo, a new inference framework built from the ground up. Dynamo does many things, but one of its core goals is <strong>KV block management.</strong> This includes native support for evicting KV cache from GPU memory, offloading it to CPU memory or external storage, and retrieving it later.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7LdQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc759d242-e9fe-40c6-bf75-dab164cdfb43_1400x616.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7LdQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc759d242-e9fe-40c6-bf75-dab164cdfb43_1400x616.png 424w, https://substackcdn.com/image/fetch/$s_!7LdQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc759d242-e9fe-40c6-bf75-dab164cdfb43_1400x616.png 848w, https://substackcdn.com/image/fetch/$s_!7LdQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc759d242-e9fe-40c6-bf75-dab164cdfb43_1400x616.png 1272w, https://substackcdn.com/image/fetch/$s_!7LdQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc759d242-e9fe-40c6-bf75-dab164cdfb43_1400x616.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7LdQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc759d242-e9fe-40c6-bf75-dab164cdfb43_1400x616.png" width="1400" height="616" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c759d242-e9fe-40c6-bf75-dab164cdfb43_1400x616.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:616,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:337596,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184085884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc759d242-e9fe-40c6-bf75-dab164cdfb43_1400x616.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7LdQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc759d242-e9fe-40c6-bf75-dab164cdfb43_1400x616.png 424w, https://substackcdn.com/image/fetch/$s_!7LdQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc759d242-e9fe-40c6-bf75-dab164cdfb43_1400x616.png 848w, https://substackcdn.com/image/fetch/$s_!7LdQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc759d242-e9fe-40c6-bf75-dab164cdfb43_1400x616.png 1272w, https://substackcdn.com/image/fetch/$s_!7LdQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc759d242-e9fe-40c6-bf75-dab164cdfb43_1400x616.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: NVIDIA</figcaption></figure></div><p>A key part of this is a new asynchronous transport library called <strong>NIXL</strong> (NVIDIA Inference Xfer Library), which lets KV cache move anywhere in the memory hierarchy (HBM, Grace or Vera CPU memory, or fully off-rack storage) without interrupting ongoing GPU computation.</p><p>The Inference Context Memory Storage Platform is the <strong>hardware counterpart to Dynamo</strong>. 
It is a purpose-built appliance with the necessary networking and data-processing features required to make large-scale KV cache offload and retrieval practical in NVL72 and NVL144 racks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7h6S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b800b92-7645-42db-8073-dbba9de6290d_1872x1062.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7h6S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b800b92-7645-42db-8073-dbba9de6290d_1872x1062.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7h6S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b800b92-7645-42db-8073-dbba9de6290d_1872x1062.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7h6S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b800b92-7645-42db-8073-dbba9de6290d_1872x1062.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7h6S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b800b92-7645-42db-8073-dbba9de6290d_1872x1062.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7h6S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b800b92-7645-42db-8073-dbba9de6290d_1872x1062.jpeg" width="1456" height="826" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b800b92-7645-42db-8073-dbba9de6290d_1872x1062.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:826,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:128810,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184085884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b800b92-7645-42db-8073-dbba9de6290d_1872x1062.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7h6S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b800b92-7645-42db-8073-dbba9de6290d_1872x1062.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7h6S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b800b92-7645-42db-8073-dbba9de6290d_1872x1062.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7h6S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b800b92-7645-42db-8073-dbba9de6290d_1872x1062.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7h6S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b800b92-7645-42db-8073-dbba9de6290d_1872x1062.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: NVIDIA</figcaption></figure></div><p>While discussing NVIDIA Dynamo and Mellanox, it&#8217;s worth remembering that NVIDIA&#8217;s push toward greater control over AI storage goes back to its 2022 acquisition of <strong><a href="https://www.nextplatform.com/2022/03/07/with-excelero-storage-nvidia-now-owns-a-nearly-complete-hpc-ai-stack/">Excelero</a></strong>. That acquisition brought key technologies (such as NVMesh, a low-latency block storage layer) into <strong>DOCA</strong> (Data Center Infrastructure-on-a-Chip Architecture), the <strong>software stack that runs on BlueField DPUs</strong>. 
It can be thought of as CUDA for NVIDIA&#8217;s networking and data-center infrastructure stack.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Qv3b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc179beea-8fc2-4152-a9ff-988f41d26e13_1972x494.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qv3b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc179beea-8fc2-4152-a9ff-988f41d26e13_1972x494.png 424w, https://substackcdn.com/image/fetch/$s_!Qv3b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc179beea-8fc2-4152-a9ff-988f41d26e13_1972x494.png 848w, https://substackcdn.com/image/fetch/$s_!Qv3b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc179beea-8fc2-4152-a9ff-988f41d26e13_1972x494.png 1272w, https://substackcdn.com/image/fetch/$s_!Qv3b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc179beea-8fc2-4152-a9ff-988f41d26e13_1972x494.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qv3b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc179beea-8fc2-4152-a9ff-988f41d26e13_1972x494.png" width="687" height="172.22184065934067" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c179beea-8fc2-4152-a9ff-988f41d26e13_1972x494.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7ce9adc-942f-4122-9c45-030919758559_1972x494.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:365,&quot;width&quot;:1456,&quot;resizeWidth&quot;:687,&quot;bytes&quot;:112402,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/184085884?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7ce9adc-942f-4122-9c45-030919758559_1972x494.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Qv3b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc179beea-8fc2-4152-a9ff-988f41d26e13_1972x494.png 424w, https://substackcdn.com/image/fetch/$s_!Qv3b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc179beea-8fc2-4152-a9ff-988f41d26e13_1972x494.png 848w, https://substackcdn.com/image/fetch/$s_!Qv3b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc179beea-8fc2-4152-a9ff-988f41d26e13_1972x494.png 1272w, https://substackcdn.com/image/fetch/$s_!Qv3b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc179beea-8fc2-4152-a9ff-988f41d26e13_1972x494.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Source: NVIDIA Bluefield-4 DPU data 
sheet</figcaption></figure></div><h1>The Strategy</h1><h3>Disaggregating Compute and Context</h3><p>This entire announcement strongly reminds me of AWS&#8217;s approach to disaggregated compute and storage.</p><p>AWS acquired <strong>Annapurna Labs</strong> in 2015 for roughly $350&#8211;370 million to solve a fundamental efficiency problem in its data centers known as the "virtualization tax". At the time, AWS was reliant on commodity chips from Intel and AMD. It faced two major issues:</p><ol><li><p><strong>The "Virtualization Tax":</strong> A significant portion (estimated at around 30%) of a server's processing power was being spent on the overhead of running the cloud (the hypervisor, networking, and security protocols) rather than on customer applications.</p></li><li><p><strong>Lack of Control:</strong> AWS was stuck following Intel's roadmap and release cycles, which slowed its ability to innovate or reduce costs.</p></li></ol><p>Annapurna Labs provided the solution with what eventually became the <strong>AWS Nitro System</strong>. This technology allowed AWS to offload those "overhead" tasks (storage, networking, security) onto a dedicated, low-cost card.</p><p>This move unlocked massive value. AWS could now sell nearly 100% of a server's resources to customers, as the main CPU was no longer burdened by administrative tasks. It also dramatically improved network and storage performance, <strong>making remote storage feel like a local drive</strong>.</p><p>This was a massive win for end users and customers as well. One of AWS&#8217;s most important innovations was Elastic Block Store (EBS). Compute and storage were decoupled, and the software layer, along with specialized hardware, made it easy to choose instance type and storage size independently, depending on the application.</p><p>NVIDIA appears to be following a very similar playbook. 
<strong>Mellanox, Excelero, and the BlueField DPU are the Nitro equivalent.</strong> The Context Memory Storage Platform is the mechanism for scaling storage attached to a GB200 or Vera Rubin superchip.</p><p>Except instead of block storage, the resource being disaggregated is <strong>model context</strong>.</p><h3>Scaling up to Petabyte/Exabyte token warehouses, the Enfabrica deal, and what might come next</h3>
      <p>
          <a href="https://www.chiplog.io/p/analysis-of-nvidias-bluefield-4-dpu">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[A Deep Dive into NVIDIA Rubin CPX: History, Architecture, Splitwise/DistServe, Inference Economics, and Limitations]]></title><description><![CDATA[A first principles analysis of why NVIDIA decided to make this new class of accelerator]]></description><link>https://www.chiplog.io/p/a-deep-dive-into-nvidia-rubin-cpx</link><guid isPermaLink="false">https://www.chiplog.io/p/a-deep-dive-into-nvidia-rubin-cpx</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Sat, 27 Dec 2025 21:48:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f6c59430-dc51-4ece-9397-c8ff148f854a_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Introduction</h1><p>The Grace-Blackwell CPU + GPU couplet will be succeeded by the new Vera-Rubin platform. In addition to the LPDDR-based Vera CPU and the HBM4-based Rubin GPU, there&#8217;s now a third processor in the mix: the GDDR7-based <strong>Rubin CPX.</strong> </p><p>The idea behind this new processor is straightforward &#8212; make inference cheaper for providers while improving performance for users. This is done by separating the two stages of LLM inference, prefill and decode: the <strong>CPX runs the prefill step</strong> and the <strong>Rubin GPU runs the decode step</strong>. </p><p>At GTC in March 2025, <a href="https://developer.nvidia.com/dynamo">NVIDIA released Dynamo</a>, their new inference framework, which makes disaggregated inference a first-class citizen, and at the AI Summit in September 2025 they <a href="https://nvidianews.nvidia.com/news/nvidia-unveils-rubin-cpx-a-new-class-of-gpu-designed-for-massive-context-inference">announced Rubin CPX</a>. 
This marks NVIDIA&#8217;s firm commitment to this disaggregated model of inference.</p><p>In this article, we&#8217;ll take a step back and analyze from first principles: </p><ul><li><p>The core issues with LLM inference, and what prefill and decode are.</p></li><li><p>Why separating prefill and decode helps, and the research that uncovered these optimizations. </p></li><li><p>Then we&#8217;ll walk through the Rubin CPX architecture, examine its capabilities and limitations, and compare it with Blackwell and Rubin GPUs.</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/subscribe?"><span>Subscribe now</span></a></p><h1>What makes LLM inference difficult</h1><h3>Quality of Service</h3><p>If you&#8217;re a paying user of ChatGPT, Gemini, or Claude Code, you have certain expectations about how responsive the service should be. The industry term for these expectations is a <strong>Service Level Agreement (SLA)</strong>. For LLMs, two metrics determine whether that SLA is being met: <strong>Time to First Token (TTFT)</strong> and <strong>Time Per Output Token (TPOT)</strong>. </p><p>These metrics are fairly intuitive. In a chatbot, TTFT is the time between submitting a prompt and seeing the first token appear. This phase of processing is called <strong>Prefill</strong>. TPOT is the time it takes to generate each subsequent token. This is the <strong>Decode </strong>phase.</p><p>Different applications have different requirements. A chatbot like ChatGPT needs a fast TTFT so the response feels immediate, but TPOT only needs to keep up with normal reading speed. 
In summarization, users tolerate a slower TTFT, but once the summary starts, they expect the rest of the output to appear quickly, so TPOT matters more. With tools like Claude Code, both TTFT and TPOT need to be fast.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8uIQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8143b42-4854-442d-b138-342375278a6a_2240x1000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8uIQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8143b42-4854-442d-b138-342375278a6a_2240x1000.png 424w, https://substackcdn.com/image/fetch/$s_!8uIQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8143b42-4854-442d-b138-342375278a6a_2240x1000.png 848w, https://substackcdn.com/image/fetch/$s_!8uIQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8143b42-4854-442d-b138-342375278a6a_2240x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!8uIQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8143b42-4854-442d-b138-342375278a6a_2240x1000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8uIQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8143b42-4854-442d-b138-342375278a6a_2240x1000.png" width="1456" height="650" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e8143b42-4854-442d-b138-342375278a6a_2240x1000.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6dd14ae1-7932-41c6-9497-490e4be052ae_2240x1000.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:650,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8uIQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8143b42-4854-442d-b138-342375278a6a_2240x1000.png 424w, https://substackcdn.com/image/fetch/$s_!8uIQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8143b42-4854-442d-b138-342375278a6a_2240x1000.png 848w, https://substackcdn.com/image/fetch/$s_!8uIQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8143b42-4854-442d-b138-342375278a6a_2240x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!8uIQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8143b42-4854-442d-b138-342375278a6a_2240x1000.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Visualizing TTFT (Time to First Token) and TPOT (Time Per Output Token)</figcaption></figure></div><p>What complicates things further for inference providers is that <strong>memory and compute demands vary dramatically across applications</strong>. The size of the input (prefill phase) and the amount of output (decode phase) can differ by orders of magnitude. Text-to-image prompts are short but produce large outputs. Summarization and code generation often have long inputs but shorter outputs, and chat sits somewhere in between. </p><h3>Economics</h3><p>Managing the economics of LLM serving while meeting SLAs is tricky. Different workload shapes from different customers make it hard to use the same hardware efficiently without breaking someone&#8217;s SLA.</p><p>For simplicity, imagine OpenAI is running ChatGPT on a single NVIDIA H100. 
To maximize profit, they want to serve as many user requests as possible on that one GPU while still meeting TTFT and TPOT targets. <strong>That effective &#8220;packing&#8221; of users per GPU is the batch size</strong>. If they try to fit too many users, latency spikes and people get frustrated. If they&#8217;re too conservative, they end up needing a prohibitively large GPU cluster to serve their millions of users.</p><p>To make things worse, <strong>you don&#8217;t always need many users to overload a system</strong>; you just need the wrong mix. Consider <strong>just two users</strong>. One asks, <em>&#8220;I want to learn the piano. Give me step-by-step instructions and resources.&#8221;</em> At the same time, the other <em>uploads a</em> <em>50,000-word document and asks for a summary</em>. On a shared GPU, the first user&#8217;s TTFT might look fine, but once the long summarization request arrives, their TPOT can get hammered, and they&#8217;ll see a noticeable pause in the middle of their chat.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zYmi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cd7be74-ada9-40dc-87e1-765e8c5606b3_3124x804.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zYmi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cd7be74-ada9-40dc-87e1-765e8c5606b3_3124x804.png 424w, https://substackcdn.com/image/fetch/$s_!zYmi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cd7be74-ada9-40dc-87e1-765e8c5606b3_3124x804.png 848w, 
https://substackcdn.com/image/fetch/$s_!zYmi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cd7be74-ada9-40dc-87e1-765e8c5606b3_3124x804.png 1272w, https://substackcdn.com/image/fetch/$s_!zYmi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cd7be74-ada9-40dc-87e1-765e8c5606b3_3124x804.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zYmi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cd7be74-ada9-40dc-87e1-765e8c5606b3_3124x804.png" width="1456" height="375" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8cd7be74-ada9-40dc-87e1-765e8c5606b3_3124x804.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:375,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zYmi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cd7be74-ada9-40dc-87e1-765e8c5606b3_3124x804.png 424w, https://substackcdn.com/image/fetch/$s_!zYmi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cd7be74-ada9-40dc-87e1-765e8c5606b3_3124x804.png 848w, 
https://substackcdn.com/image/fetch/$s_!zYmi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cd7be74-ada9-40dc-87e1-765e8c5606b3_3124x804.png 1272w, https://substackcdn.com/image/fetch/$s_!zYmi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cd7be74-ada9-40dc-87e1-765e8c5606b3_3124x804.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So in order to maximize revenue while meeting SLAs and keeping users happy, we need to do two things: </p><ol><li><p>Understand how 
different applications stress the GPU, i.e., what their workloads actually look like in terms of compute and memory</p></li><li><p>Find an efficient way to batch and schedule these requests so we can get the most throughput out of the hardware while still hitting TTFT and TPOT targets.</p></li></ol><h1>How we got to today&#8217;s LLM inference</h1><p>Next, we&#8217;ll look at how LLM inference has evolved over the years through the lens of four seminal papers. These papers offer valuable insights, and <strong>understanding what they uncovered is key to seeing the path that led to Rubin CPX</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HS4E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04e9a6d3-5614-4d94-9abc-1d4a48e4aa35_3488x1084.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HS4E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04e9a6d3-5614-4d94-9abc-1d4a48e4aa35_3488x1084.png 424w, https://substackcdn.com/image/fetch/$s_!HS4E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04e9a6d3-5614-4d94-9abc-1d4a48e4aa35_3488x1084.png 848w, https://substackcdn.com/image/fetch/$s_!HS4E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04e9a6d3-5614-4d94-9abc-1d4a48e4aa35_3488x1084.png 1272w, https://substackcdn.com/image/fetch/$s_!HS4E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04e9a6d3-5614-4d94-9abc-1d4a48e4aa35_3488x1084.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!HS4E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04e9a6d3-5614-4d94-9abc-1d4a48e4aa35_3488x1084.png" width="1456" height="452" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/04e9a6d3-5614-4d94-9abc-1d4a48e4aa35_3488x1084.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8aca8236-d1f2-4e9b-ad59-87b0b47ac0b7_3488x1084.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:452,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HS4E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04e9a6d3-5614-4d94-9abc-1d4a48e4aa35_3488x1084.png 424w, https://substackcdn.com/image/fetch/$s_!HS4E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04e9a6d3-5614-4d94-9abc-1d4a48e4aa35_3488x1084.png 848w, https://substackcdn.com/image/fetch/$s_!HS4E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04e9a6d3-5614-4d94-9abc-1d4a48e4aa35_3488x1084.png 1272w, https://substackcdn.com/image/fetch/$s_!HS4E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04e9a6d3-5614-4d94-9abc-1d4a48e4aa35_3488x1084.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Notable works that optimized LLM inference serving</figcaption></figure></div><h3>ORCA: 2022</h3><pre><code><em><strong>Paper: </strong><a href="https://www.usenix.org/conference/osdi22/presentation/yu">ORCA: A Distributed Serving System for Transformer-Based Generative Models</a>
<strong>
Key innovations:</strong>
  - Iteration-level scheduling
  - Selective and continuous batching</em></code></pre><p>Inference serving systems usually have two parts: an <strong>inference server or scheduler</strong>, which receives and batches user requests, and an <strong>execution engine</strong>, which issues kernels to the GPU. </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MdsF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56ae34b5-1459-4f02-8841-e1861bcd4cd7_1051x393.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MdsF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56ae34b5-1459-4f02-8841-e1861bcd4cd7_1051x393.png 424w, https://substackcdn.com/image/fetch/$s_!MdsF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56ae34b5-1459-4f02-8841-e1861bcd4cd7_1051x393.png 848w, https://substackcdn.com/image/fetch/$s_!MdsF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56ae34b5-1459-4f02-8841-e1861bcd4cd7_1051x393.png 1272w, https://substackcdn.com/image/fetch/$s_!MdsF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56ae34b5-1459-4f02-8841-e1861bcd4cd7_1051x393.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MdsF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56ae34b5-1459-4f02-8841-e1861bcd4cd7_1051x393.png" width="581" height="217.25309229305424" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/56ae34b5-1459-4f02-8841-e1861bcd4cd7_1051x393.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:393,&quot;width&quot;:1051,&quot;resizeWidth&quot;:581,&quot;bytes&quot;:450528,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176137303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56ae34b5-1459-4f02-8841-e1861bcd4cd7_1051x393.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MdsF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56ae34b5-1459-4f02-8841-e1861bcd4cd7_1051x393.png 424w, https://substackcdn.com/image/fetch/$s_!MdsF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56ae34b5-1459-4f02-8841-e1861bcd4cd7_1051x393.png 848w, https://substackcdn.com/image/fetch/$s_!MdsF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56ae34b5-1459-4f02-8841-e1861bcd4cd7_1051x393.png 1272w, https://substackcdn.com/image/fetch/$s_!MdsF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56ae34b5-1459-4f02-8841-e1861bcd4cd7_1051x393.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Source: <a href="https://www.usenix.org/conference/osdi22/presentation/yu">ORCA presentation at OSDI &#8216;22</a></figcaption></figure></div><p>For models like ResNet, used in 
image recognition, this setup is simple. Every request is a single forward pass through the network, all inputs have the same shape, batching is trivial, and the end-to-end latency is predictable. When the batch is done processing, the users get their responses. This is called <strong>request-level scheduling</strong>. </p><p>The ORCA authors showed that when you apply this kind of request-level serving system to LLMs, where requests vary widely in length and compute time, you run into two major problems.</p><ul><li><p>Requests that finish early are held back. For instance, in a setup with NVIDIA <a href="https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html">Triton</a> (as the server) + <a href="https://github.com/NVIDIA/FasterTransformer">FasterTransformer</a> (as the execution engine), if three user requests were batched together and one finished early, that user wouldn&#8217;t get a response until all requests in the batch were done. </p></li><li><p>Requests that arrive late are queued. If new requests arrived while a batch was running, they had to wait for the entire batch to finish before being scheduled, even if there were empty slots available. 
</p></li></ul><p>The diagram below shows how these issues lead to high latency and poor GPU utilization, but the <strong>bigger consequence is their effect on the cost of inference</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vE7B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F923fb70b-19ec-4594-beca-1f3ebdf28dbe_2844x3324.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vE7B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F923fb70b-19ec-4594-beca-1f3ebdf28dbe_2844x3324.png 424w, https://substackcdn.com/image/fetch/$s_!vE7B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F923fb70b-19ec-4594-beca-1f3ebdf28dbe_2844x3324.png 848w, https://substackcdn.com/image/fetch/$s_!vE7B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F923fb70b-19ec-4594-beca-1f3ebdf28dbe_2844x3324.png 1272w, https://substackcdn.com/image/fetch/$s_!vE7B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F923fb70b-19ec-4594-beca-1f3ebdf28dbe_2844x3324.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vE7B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F923fb70b-19ec-4594-beca-1f3ebdf28dbe_2844x3324.png" width="529" height="618.3777472527472" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/923fb70b-19ec-4594-beca-1f3ebdf28dbe_2844x3324.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d731c6f-8548-4d7c-9f89-1b93d83a5fbe_2844x3324.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1702,&quot;width&quot;:1456,&quot;resizeWidth&quot;:529,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!vE7B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F923fb70b-19ec-4594-beca-1f3ebdf28dbe_2844x3324.png 424w, https://substackcdn.com/image/fetch/$s_!vE7B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F923fb70b-19ec-4594-beca-1f3ebdf28dbe_2844x3324.png 848w, https://substackcdn.com/image/fetch/$s_!vE7B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F923fb70b-19ec-4594-beca-1f3ebdf28dbe_2844x3324.png 1272w, https://substackcdn.com/image/fetch/$s_!vE7B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F923fb70b-19ec-4594-beca-1f3ebdf28dbe_2844x3324.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>ORCA replaced request-level scheduling with <strong>iteration-level scheduling</strong>, so batching decisions are made at every token-generation iteration rather than once per request. When a user&#8217;s request finishes, its response is returned immediately instead of waiting for the rest of the batch. This also allowed <strong>new requests to be batched in continuously</strong> as older ones relinquished their slots.</p><p>Fixing these issues increased throughput. 
With ORCA, the cost of serving a GPT-3 175B model on 2 nodes, each with 8 A100 GPUs, went down from <strong>$476,000/month to $14,000/month.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HZy5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7af2d1-6888-452b-a1fa-2683d5ee793e_923x793.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HZy5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7af2d1-6888-452b-a1fa-2683d5ee793e_923x793.png 424w, https://substackcdn.com/image/fetch/$s_!HZy5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7af2d1-6888-452b-a1fa-2683d5ee793e_923x793.png 848w, https://substackcdn.com/image/fetch/$s_!HZy5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7af2d1-6888-452b-a1fa-2683d5ee793e_923x793.png 1272w, https://substackcdn.com/image/fetch/$s_!HZy5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7af2d1-6888-452b-a1fa-2683d5ee793e_923x793.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HZy5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7af2d1-6888-452b-a1fa-2683d5ee793e_923x793.png" width="557" height="478.5492957746479" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a7af2d1-6888-452b-a1fa-2683d5ee793e_923x793.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:793,&quot;width&quot;:923,&quot;resizeWidth&quot;:557,&quot;bytes&quot;:915559,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176137303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7af2d1-6888-452b-a1fa-2683d5ee793e_923x793.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HZy5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7af2d1-6888-452b-a1fa-2683d5ee793e_923x793.png 424w, https://substackcdn.com/image/fetch/$s_!HZy5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7af2d1-6888-452b-a1fa-2683d5ee793e_923x793.png 848w, https://substackcdn.com/image/fetch/$s_!HZy5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7af2d1-6888-452b-a1fa-2683d5ee793e_923x793.png 1272w, https://substackcdn.com/image/fetch/$s_!HZy5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7af2d1-6888-452b-a1fa-2683d5ee793e_923x793.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: <a href="https://www.usenix.org/conference/osdi22/presentation/yu">ORCA presentation at OSDI &#8217;22</a></figcaption></figure></div><h3>SARATHI: 2023</h3><pre><code><em><strong>Paper: </strong><a href="https://arxiv.org/abs/2308.16369">SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills</a>

<strong>Key innovation:</strong> Chunked prefill</em></code></pre><p>This paper built on ORCA and <strong>reduced inference costs by another 25%</strong>. The key insight was that the prefill and decode stages have very different compute utilization patterns. <strong>Prefill can saturate a GPU even with a single request</strong>, while <strong>decode only becomes compute-efficient at large batch sizes</strong>. In other words, prefills were efficient, but decodes suffered from poor GPU utilization.</p><p>Another issue in ORCA was that, even within an active batch, some work was serialized because requests were not parallelized efficiently. This caused &#8220;bubbles&#8221; (idle GPU time), especially when a large model was split across GPUs. The ORCA paper itself acknowledges this limitation, and SARATHI focused on addressing it.</p><p>SARATHI split each prefill into equal-sized chunks and <strong>built batches containing one prefill chunk and multiple decode requests</strong>. With this setup, the prefill chunk saturated the GPU, allowing the decode requests to &#8220;piggyback&#8221; at a much lower cost: up to an order of magnitude cheaper than running them in a decode-only batch.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ccgQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e85b4fb-1372-4cb5-b385-bde318924101_1298x1428.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ccgQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e85b4fb-1372-4cb5-b385-bde318924101_1298x1428.png 424w, 
https://substackcdn.com/image/fetch/$s_!ccgQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e85b4fb-1372-4cb5-b385-bde318924101_1298x1428.png 848w, https://substackcdn.com/image/fetch/$s_!ccgQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e85b4fb-1372-4cb5-b385-bde318924101_1298x1428.png 1272w, https://substackcdn.com/image/fetch/$s_!ccgQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e85b4fb-1372-4cb5-b385-bde318924101_1298x1428.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ccgQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e85b4fb-1372-4cb5-b385-bde318924101_1298x1428.png" width="543" height="597.3836671802774" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e85b4fb-1372-4cb5-b385-bde318924101_1298x1428.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1428,&quot;width&quot;:1298,&quot;resizeWidth&quot;:543,&quot;bytes&quot;:414213,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176137303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e85b4fb-1372-4cb5-b385-bde318924101_1298x1428.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!ccgQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e85b4fb-1372-4cb5-b385-bde318924101_1298x1428.png 424w, https://substackcdn.com/image/fetch/$s_!ccgQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e85b4fb-1372-4cb5-b385-bde318924101_1298x1428.png 848w, https://substackcdn.com/image/fetch/$s_!ccgQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e85b4fb-1372-4cb5-b385-bde318924101_1298x1428.png 1272w, https://substackcdn.com/image/fetch/$s_!ccgQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e85b4fb-1372-4cb5-b385-bde318924101_1298x1428.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: SARATHI paper</figcaption></figure></div><p>Although SARATHI improved decode throughput by up to 10&#215;, it introduced one drawback. Chunking made prefills slower. As a result, the overall speedup over ORCA was only about 1.25&#215;. Here&#8217;s how the paper describes it:</p><blockquote><p><em>We note that although we improve decode efficiency by up to an order of magnitude, the end-to-end speedups and in turn monetary savings in inference cost are in the order of [only] 25%. <strong>This is because our technique only improves decodes and not prefills.</strong></em></p></blockquote><h3>Splitwise &amp; DistServe: 2024</h3><pre><code><em><strong>Papers: </strong><a href="https://arxiv.org/abs/2311.18677">Splitwise</a> &amp; <a href="https://arxiv.org/abs/2401.09670">DistServe</a></em>

<em><strong>Key innovation:</strong></em> <em>Disaggregated prefill and decode</em></code></pre><p>This brings us to the two papers most relevant to Rubin CPX: <strong>Splitwise</strong> and <strong>DistServe</strong>. These parallel efforts showed the benefits of fully disaggregating prefill and decode and running them on separate GPU clusters. Splitwise demonstrated that disaggregation can deliver up to <strong>1.4&#215; higher throughput at 20% lower cost</strong> compared to existing designs that run both phases on the same machines, or <strong>2.35&#215; more throughput</strong> under the same power and cost budget. Splitwise also showed that the decode phase can run on less compute-capable hardware with better perf/W and perf/$.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JyHG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc84a71be-81c1-43f5-a8bb-46933807ed19_2608x848.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JyHG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc84a71be-81c1-43f5-a8bb-46933807ed19_2608x848.png 424w, https://substackcdn.com/image/fetch/$s_!JyHG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc84a71be-81c1-43f5-a8bb-46933807ed19_2608x848.png 848w, https://substackcdn.com/image/fetch/$s_!JyHG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc84a71be-81c1-43f5-a8bb-46933807ed19_2608x848.png 1272w, 
https://substackcdn.com/image/fetch/$s_!JyHG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc84a71be-81c1-43f5-a8bb-46933807ed19_2608x848.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JyHG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc84a71be-81c1-43f5-a8bb-46933807ed19_2608x848.png" width="1456" height="473" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c84a71be-81c1-43f5-a8bb-46933807ed19_2608x848.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/32e970e6-1737-497f-9010-68abd21c5bdf_2608x848.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:473,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JyHG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc84a71be-81c1-43f5-a8bb-46933807ed19_2608x848.png 424w, https://substackcdn.com/image/fetch/$s_!JyHG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc84a71be-81c1-43f5-a8bb-46933807ed19_2608x848.png 848w, https://substackcdn.com/image/fetch/$s_!JyHG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc84a71be-81c1-43f5-a8bb-46933807ed19_2608x848.png 1272w, 
https://substackcdn.com/image/fetch/$s_!JyHG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc84a71be-81c1-43f5-a8bb-46933807ed19_2608x848.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Disaggregated prefill and decode</figcaption></figure></div><p>The authors of Splitwise clearly captured the impact of disaggregation. Their baseline was the standard setup where prefill and decode run together on A100 or H100 GPUs. They then compared this against configurations where prefill and decode were split across different machine types. 
The two charts below show throughput under iso-power (same power budget) and iso-cost (same server and operating cost). My annotations explain how to read them.</p><pre><code>Machine combinations simulated:

+ A100.Prefill + A100.Decode
+ H100.Prefill + H100.Decode
+ H100.Prefill + A100.Decode
+ H100.Prefill + H100Cap.Decode (the H100 GPUs in the decode cluster were power-capped).</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1Bzz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bd0be88-04fe-493f-ac5b-24907555dd18_1291x901.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1Bzz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bd0be88-04fe-493f-ac5b-24907555dd18_1291x901.png 424w, https://substackcdn.com/image/fetch/$s_!1Bzz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bd0be88-04fe-493f-ac5b-24907555dd18_1291x901.png 848w, https://substackcdn.com/image/fetch/$s_!1Bzz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bd0be88-04fe-493f-ac5b-24907555dd18_1291x901.png 1272w, https://substackcdn.com/image/fetch/$s_!1Bzz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bd0be88-04fe-493f-ac5b-24907555dd18_1291x901.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1Bzz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bd0be88-04fe-493f-ac5b-24907555dd18_1291x901.png" width="1291" height="901" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5bd0be88-04fe-493f-ac5b-24907555dd18_1291x901.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c15fcaf5-505e-4171-931e-df9d387faff7_1291x901.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:901,&quot;width&quot;:1291,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:434576,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176137303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc15fcaf5-505e-4171-931e-df9d387faff7_1291x901.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1Bzz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bd0be88-04fe-493f-ac5b-24907555dd18_1291x901.png 424w, https://substackcdn.com/image/fetch/$s_!1Bzz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bd0be88-04fe-493f-ac5b-24907555dd18_1291x901.png 848w, https://substackcdn.com/image/fetch/$s_!1Bzz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bd0be88-04fe-493f-ac5b-24907555dd18_1291x901.png 1272w, https://substackcdn.com/image/fetch/$s_!1Bzz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bd0be88-04fe-493f-ac5b-24907555dd18_1291x901.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: Splitwise paper. Annotations by Chiplog.</figcaption></figure></div><p>DistServe contributed additional ideas. Given a model, workload, latency target, and machine types, their algorithm determines how many machines to allocate for prefill and decode and what parallelism strategies to use for each of them.</p><h4><em>KV-Cache transfer</em></h4><p>One inherent overhead in disaggregation is KV-cache transfer. After the prefill cluster processes the prompt, the KV-cache must be moved into the decode cluster&#8217;s memory. 
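</p><p>To get a feel for the size of this overhead, here is a back-of-the-envelope Python sketch. The model shape (80 layers, 8 KV heads, head dimension 128, 16-bit values), the 4,096-token prompt, and the 50 GB/s link are illustrative assumptions, not figures from either paper, and the helper names are hypothetical.</p><pre><code>
```python
def kv_cache_bytes(prompt_tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Size of the KV-cache that prefill produces for one prompt."""
    # Each token stores one key vector and one value vector (the factor 2)
    # per layer and per KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * prompt_tokens


def transfer_ms(num_bytes, link_gb_per_s):
    """Idealized transfer time in milliseconds.

    Ignores link latency, protocol overhead, and any overlap with compute.
    """
    return num_bytes / (link_gb_per_s * 1e9) * 1e3


# Assumed example: a 4,096-token prompt on a hypothetical 80-layer model.
size = kv_cache_bytes(prompt_tokens=4096, layers=80, kv_heads=8, head_dim=128)
print(size / 2**30, "GiB")
print(round(transfer_ms(size, link_gb_per_s=50), 1), "ms")
```
</code></pre><p>Under these assumptions, the prompt&#8217;s KV-cache is 1.25 GiB and takes roughly 27 ms to move, before any latency or protocol overhead is counted.</p><p>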
Both papers analyzed this cost and proposed strategies to reduce its impact.</p><h3>NVIDIA Dynamo: GTC 2025</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cFBW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F281501d9-8fd4-4783-be31-44c8660027a6_1872x1062.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cFBW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F281501d9-8fd4-4783-be31-44c8660027a6_1872x1062.png 424w, https://substackcdn.com/image/fetch/$s_!cFBW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F281501d9-8fd4-4783-be31-44c8660027a6_1872x1062.png 848w, https://substackcdn.com/image/fetch/$s_!cFBW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F281501d9-8fd4-4783-be31-44c8660027a6_1872x1062.png 1272w, https://substackcdn.com/image/fetch/$s_!cFBW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F281501d9-8fd4-4783-be31-44c8660027a6_1872x1062.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cFBW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F281501d9-8fd4-4783-be31-44c8660027a6_1872x1062.png" width="587" height="333.00961538461536" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/281501d9-8fd4-4783-be31-44c8660027a6_1872x1062.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:826,&quot;width&quot;:1456,&quot;resizeWidth&quot;:587,&quot;bytes&quot;:318678,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176137303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F281501d9-8fd4-4783-be31-44c8660027a6_1872x1062.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cFBW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F281501d9-8fd4-4783-be31-44c8660027a6_1872x1062.png 424w, https://substackcdn.com/image/fetch/$s_!cFBW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F281501d9-8fd4-4783-be31-44c8660027a6_1872x1062.png 848w, https://substackcdn.com/image/fetch/$s_!cFBW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F281501d9-8fd4-4783-be31-44c8660027a6_1872x1062.png 1272w, https://substackcdn.com/image/fetch/$s_!cFBW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F281501d9-8fd4-4783-be31-44c8660027a6_1872x1062.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: NVIDIA (GTC 2025)</figcaption></figure></div><p>Finally, in March 2025 at GTC, NVIDIA announced <strong>Dynamo</strong>, their new inference framework. 
Like ORCA, Splitwise, and DistServe, it acts as the orchestrator, bringing together many of the ideas from these papers and adding several new capabilities of its own.</p><blockquote><p><strong>Notable:</strong></p><p><em>In the GTC session, NVIDIA mentioned that in 2024 and 2025 it had met Chinese customers who were <a href="https://www.youtube.com/watch?v=3C-6STonTLU&amp;t=4383s">pairing H800 GPUs for prefill and H20 GPUs for decode</a>, which yielded substantial cost savings.</em></p></blockquote><p>Some of Dynamo&#8217;s key features include:</p><ul><li><p><strong>KV-cache aware request routing</strong>, which sends user requests to the right deployment cluster based on cache locality.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rORa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb62585ca-08da-4998-bd00-d26b98cc8ff3_2016x1088.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rORa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb62585ca-08da-4998-bd00-d26b98cc8ff3_2016x1088.png 424w, https://substackcdn.com/image/fetch/$s_!rORa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb62585ca-08da-4998-bd00-d26b98cc8ff3_2016x1088.png 848w, https://substackcdn.com/image/fetch/$s_!rORa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb62585ca-08da-4998-bd00-d26b98cc8ff3_2016x1088.png 1272w, 
https://substackcdn.com/image/fetch/$s_!rORa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb62585ca-08da-4998-bd00-d26b98cc8ff3_2016x1088.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rORa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb62585ca-08da-4998-bd00-d26b98cc8ff3_2016x1088.png" width="1456" height="786" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b62585ca-08da-4998-bd00-d26b98cc8ff3_2016x1088.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:786,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:493795,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176137303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb62585ca-08da-4998-bd00-d26b98cc8ff3_2016x1088.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!rORa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb62585ca-08da-4998-bd00-d26b98cc8ff3_2016x1088.png 424w, https://substackcdn.com/image/fetch/$s_!rORa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb62585ca-08da-4998-bd00-d26b98cc8ff3_2016x1088.png 848w, 
https://substackcdn.com/image/fetch/$s_!rORa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb62585ca-08da-4998-bd00-d26b98cc8ff3_2016x1088.png 1272w, https://substackcdn.com/image/fetch/$s_!rORa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb62585ca-08da-4998-bd00-d26b98cc8ff3_2016x1088.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: NVIDIA (GTC 2025)</figcaption></figure></div></li><li><p>A <strong>KV-Block Manager</strong> that can 
offload and retrieve KV-cache, significantly improving performance for workloads like code generation and multi-turn conversations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VF5K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684181d5-c03a-476d-a1eb-4050e85e3245_1898x1050.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VF5K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684181d5-c03a-476d-a1eb-4050e85e3245_1898x1050.png 424w, https://substackcdn.com/image/fetch/$s_!VF5K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684181d5-c03a-476d-a1eb-4050e85e3245_1898x1050.png 848w, https://substackcdn.com/image/fetch/$s_!VF5K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684181d5-c03a-476d-a1eb-4050e85e3245_1898x1050.png 1272w, https://substackcdn.com/image/fetch/$s_!VF5K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684181d5-c03a-476d-a1eb-4050e85e3245_1898x1050.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VF5K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684181d5-c03a-476d-a1eb-4050e85e3245_1898x1050.png" width="1456" height="805" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/684181d5-c03a-476d-a1eb-4050e85e3245_1898x1050.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:805,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:401816,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176137303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684181d5-c03a-476d-a1eb-4050e85e3245_1898x1050.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!VF5K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684181d5-c03a-476d-a1eb-4050e85e3245_1898x1050.png 424w, https://substackcdn.com/image/fetch/$s_!VF5K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684181d5-c03a-476d-a1eb-4050e85e3245_1898x1050.png 848w, https://substackcdn.com/image/fetch/$s_!VF5K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684181d5-c03a-476d-a1eb-4050e85e3245_1898x1050.png 1272w, https://substackcdn.com/image/fetch/$s_!VF5K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684181d5-c03a-476d-a1eb-4050e85e3245_1898x1050.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: NVIDIA (GTC 2025)</figcaption></figure></div></li><li><p><strong>Production-grade serving tools</strong>, including fault tolerance and auto-scaling for separate prefill and decode clusters.</p></li><li><p>The <strong>NIXL library</strong>, which provides an asynchronous peer-to-peer transfer API for moving data, such as KV-cache, between prefill and decode machines and across memory hierarchies. The async behavior is important because it lets communication overlap with computation. 
NIXL differs from NCCL, which is designed mainly for collective operations across a group of GPUs rather than for point-to-point transfers between arbitrary endpoints and memory tiers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!29DS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f90c797-ac8b-4cd7-8d18-cb7fec355c84_1268x570.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!29DS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f90c797-ac8b-4cd7-8d18-cb7fec355c84_1268x570.png 424w, https://substackcdn.com/image/fetch/$s_!29DS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f90c797-ac8b-4cd7-8d18-cb7fec355c84_1268x570.png 848w, https://substackcdn.com/image/fetch/$s_!29DS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f90c797-ac8b-4cd7-8d18-cb7fec355c84_1268x570.png 1272w, https://substackcdn.com/image/fetch/$s_!29DS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f90c797-ac8b-4cd7-8d18-cb7fec355c84_1268x570.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!29DS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f90c797-ac8b-4cd7-8d18-cb7fec355c84_1268x570.png" width="1268" height="570" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f90c797-ac8b-4cd7-8d18-cb7fec355c84_1268x570.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:570,&quot;width&quot;:1268,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:231483,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176137303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f90c797-ac8b-4cd7-8d18-cb7fec355c84_1268x570.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!29DS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f90c797-ac8b-4cd7-8d18-cb7fec355c84_1268x570.png 424w, https://substackcdn.com/image/fetch/$s_!29DS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f90c797-ac8b-4cd7-8d18-cb7fec355c84_1268x570.png 848w, https://substackcdn.com/image/fetch/$s_!29DS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f90c797-ac8b-4cd7-8d18-cb7fec355c84_1268x570.png 1272w, https://substackcdn.com/image/fetch/$s_!29DS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f90c797-ac8b-4cd7-8d18-cb7fec355c84_1268x570.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: NVIDIA</figcaption></figure></div></li><li><p>An <strong>AI Configurator</strong>, which recommends the best deployment setup (cluster size, parallelism strategies, and so on) based on the model and latency requirements.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cy-q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0378ad93-0abe-41ca-83cd-8e4e2bce70fa_1518x1552.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!cy-q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0378ad93-0abe-41ca-83cd-8e4e2bce70fa_1518x1552.png 424w, https://substackcdn.com/image/fetch/$s_!cy-q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0378ad93-0abe-41ca-83cd-8e4e2bce70fa_1518x1552.png 848w, https://substackcdn.com/image/fetch/$s_!cy-q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0378ad93-0abe-41ca-83cd-8e4e2bce70fa_1518x1552.png 1272w, https://substackcdn.com/image/fetch/$s_!cy-q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0378ad93-0abe-41ca-83cd-8e4e2bce70fa_1518x1552.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cy-q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0378ad93-0abe-41ca-83cd-8e4e2bce70fa_1518x1552.png" width="486" height="497.0151098901099" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0378ad93-0abe-41ca-83cd-8e4e2bce70fa_1518x1552.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1489,&quot;width&quot;:1456,&quot;resizeWidth&quot;:486,&quot;bytes&quot;:318327,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176137303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0378ad93-0abe-41ca-83cd-8e4e2bce70fa_1518x1552.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cy-q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0378ad93-0abe-41ca-83cd-8e4e2bce70fa_1518x1552.png 424w, https://substackcdn.com/image/fetch/$s_!cy-q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0378ad93-0abe-41ca-83cd-8e4e2bce70fa_1518x1552.png 848w, https://substackcdn.com/image/fetch/$s_!cy-q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0378ad93-0abe-41ca-83cd-8e4e2bce70fa_1518x1552.png 1272w, https://substackcdn.com/image/fetch/$s_!cy-q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0378ad93-0abe-41ca-83cd-8e4e2bce70fa_1518x1552.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Dynamo transaction flow. (Source: NVIDIA, GTC 2025)</figcaption></figure></div><div><hr></div><h1>Rubin CPX</h1><p>This finally brings us to Rubin CPX. In my article <em><a href="https://www.chiplog.io/p/3-great-examples-of-how-asics-and">Three examples of how ASICs and FPGAs are used as accelerators</a></em>, I describe a recurring pattern in engineering: whenever possible, software optimizations transition into specialized hardware. With NVIDIA Dynamo and inference engines like vLLM pushing the industry toward disaggregated inference, the logical next step is to formalize the architecture and <strong>build hardware tailored to the use case</strong> rather than relying on general-purpose GPUs, and <strong>Rubin CPX is exactly that</strong>.</p><p>To appreciate the capabilities of this new processor, it helps to compare it with NVIDIA&#8217;s broader GPU lineup. A quick look at the spec sheet shows that Rubin CPX isn&#8217;t a watered-down B200 or Rubin GPU. It&#8217;s much closer to an enhanced version of the <a href="https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/rtx-pro-6000/">NVIDIA RTX 6000 PRO</a> Blackwell workstation-class GPU.</p><p>This is evident from the inclusion of NVENC/NVDEC video engines (to accelerate video generation workloads). Primary connectivity is PCIe, and there&#8217;s no NVLink. The memory is GDDR7 rather than HBM. 
Let&#8217;s dig into these further.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1DKh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff670ebe4-ad59-4079-a997-3e0ecca9893a_2882x1722.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1DKh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff670ebe4-ad59-4079-a997-3e0ecca9893a_2882x1722.png 424w, https://substackcdn.com/image/fetch/$s_!1DKh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff670ebe4-ad59-4079-a997-3e0ecca9893a_2882x1722.png 848w, https://substackcdn.com/image/fetch/$s_!1DKh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff670ebe4-ad59-4079-a997-3e0ecca9893a_2882x1722.png 1272w, https://substackcdn.com/image/fetch/$s_!1DKh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff670ebe4-ad59-4079-a997-3e0ecca9893a_2882x1722.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1DKh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff670ebe4-ad59-4079-a997-3e0ecca9893a_2882x1722.png" width="1456" height="870" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f670ebe4-ad59-4079-a997-3e0ecca9893a_2882x1722.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d328b723-e778-40bc-92b3-41d013c66981_2882x1722.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:870,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1747153,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176137303?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd328b723-e778-40bc-92b3-41d013c66981_2882x1722.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1DKh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff670ebe4-ad59-4079-a997-3e0ecca9893a_2882x1722.png 424w, https://substackcdn.com/image/fetch/$s_!1DKh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff670ebe4-ad59-4079-a997-3e0ecca9893a_2882x1722.png 848w, https://substackcdn.com/image/fetch/$s_!1DKh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff670ebe4-ad59-4079-a997-3e0ecca9893a_2882x1722.png 1272w, https://substackcdn.com/image/fetch/$s_!1DKh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff670ebe4-ad59-4079-a997-3e0ecca9893a_2882x1722.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button 
tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Comparing CPX with RTX 6000 PRO</figcaption></figure></div><h3>Compute</h3><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HS0b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb79d5c15-2182-4807-8d01-39c1013f89da_2724x328.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HS0b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb79d5c15-2182-4807-8d01-39c1013f89da_2724x328.png 424w, 
https://substackcdn.com/image/fetch/$s_!HS0b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb79d5c15-2182-4807-8d01-39c1013f89da_2724x328.png 848w, https://substackcdn.com/image/fetch/$s_!HS0b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb79d5c15-2182-4807-8d01-39c1013f89da_2724x328.png 1272w, https://substackcdn.com/image/fetch/$s_!HS0b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb79d5c15-2182-4807-8d01-39c1013f89da_2724x328.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HS0b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb79d5c15-2182-4807-8d01-39c1013f89da_2724x328.png" width="1456" height="175" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b79d5c15-2182-4807-8d01-39c1013f89da_2724x328.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/af6a1323-cbac-4f54-99a0-ac58edd4d76c_2724x328.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:175,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HS0b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb79d5c15-2182-4807-8d01-39c1013f89da_2724x328.png 424w, 
https://substackcdn.com/image/fetch/$s_!HS0b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb79d5c15-2182-4807-8d01-39c1013f89da_2724x328.png 848w, https://substackcdn.com/image/fetch/$s_!HS0b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb79d5c15-2182-4807-8d01-39c1013f89da_2724x328.png 1272w, https://substackcdn.com/image/fetch/$s_!HS0b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb79d5c15-2182-4807-8d01-39c1013f89da_2724x328.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Comparing compute between RTX 6000 PRO, CPX and Rubin GPU</figcaption></figure></div><p>Rubin CPX delivers 30 PFLOPS of sparse FP4 compute, which is substantial considering that the dual-die Rubin GPU delivers 50 PFLOPS, or roughly 25 PFLOPS per die. On a per-die basis, then, CPX provides more FP4 compute than a single Rubin GPU die.</p>
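<p>As a quick sanity check on the per-die comparison above, here is a minimal Python sketch of the arithmetic. It uses only the two figures quoted in this article (30 PFLOPS for Rubin CPX, 50 PFLOPS for the dual-die Rubin GPU). Treating CPX as a single die and assuming that Rubin&#8217;s throughput splits evenly across its two dies are simplifications made for illustration, and the helper name is hypothetical.</p>

```python
# Sparse FP4 throughput figures quoted in the article, in PFLOPS.
CPX_PFLOPS = 30.0     # Rubin CPX, treated here as a single die (assumption)
RUBIN_PFLOPS = 50.0   # Rubin GPU, dual-die package
RUBIN_DIES = 2


def pflops_per_die(total_pflops, dies):
    """Split a package-level figure evenly across its dies (assumption)."""
    return total_pflops / dies


rubin_per_die = pflops_per_die(RUBIN_PFLOPS, RUBIN_DIES)
cpx_per_die = pflops_per_die(CPX_PFLOPS, 1)
ratio = cpx_per_die / rubin_per_die

print(f"Rubin per die: {rubin_per_die:.1f} PFLOPS")
print(f"CPX per die:   {cpx_per_die:.1f} PFLOPS")
print(f"CPX to Rubin-die ratio: {ratio:.2f}x")
```

<p>Under those assumptions, Rubin works out to 25 PFLOPS per die, so CPX comes out about 1.2x ahead on a per-die basis.</p>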
      <p>
          <a href="https://www.chiplog.io/p/a-deep-dive-into-nvidia-rubin-cpx">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Analysis of NVIDIA DGX Spark's GB10 SoC]]></title><description><![CDATA[Architecture, methodology, and memory interface of the GB10 SoC]]></description><link>https://www.chiplog.io/p/analysis-of-nvidia-dgx-sparks-gb10</link><guid isPermaLink="false">https://www.chiplog.io/p/analysis-of-nvidia-dgx-sparks-gb10</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Tue, 21 Oct 2025 04:57:53 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/638d5411-29d0-4b6b-b73c-0c33b8217165_1042x678.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This article takes a closer look at the <strong>Grace-Blackwell GB10 SoC</strong>, the chip powering NVIDIA&#8217;s newly released <strong>DGX Spark</strong>. Here&#8217;s a quick timeline of how we got here:</p><ul><li><p><strong>January 2025 :</strong> NVIDIA <a href="https://www.youtube.com/live/k82RwXqZHY8?si=vc8S7RoAmnDW5DSB&amp;t=4956">announced </a><em><a href="https://www.youtube.com/live/k82RwXqZHY8?si=vc8S7RoAmnDW5DSB&amp;t=4956">Project DIGITS </a></em><a href="https://www.youtube.com/live/k82RwXqZHY8?si=vc8S7RoAmnDW5DSB&amp;t=4956">at CES</a>, revealing the form factor, compute capabilities, and a <a href="https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwell-on-every-desk-and-at-every-ai-developers-fingertips">new partnership with </a><strong><a href="https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwell-on-every-desk-and-at-every-ai-developers-fingertips">MediaTek</a></strong><a href="https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwell-on-every-desk-and-at-every-ai-developers-fingertips"> to co-develop the </a><strong><a href="https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwell-on-every-desk-and-at-every-ai-developers-fingertips">GB10</a></strong><a href="https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwell-on-every-desk-and-at-every-ai-developers-fingertips"> 
chip</a>.</p></li><li><p><strong>March 2025:</strong> The project was <a href="https://nvidianews.nvidia.com/news/nvidia-announces-dgx-spark-and-dgx-station-personal-ai-computers">officially renamed </a><strong><a href="https://nvidianews.nvidia.com/news/nvidia-announces-dgx-spark-and-dgx-station-personal-ai-computers">DGX Spark</a></strong>, replacing the DIGITS codename.</p></li><li><p><strong>September 2025:</strong> Chief Architect of the GB10 SoC, Andi Skende, shared its architecture and development process in great detail at the <strong><a href="https://hc2025.hotchips.org/#clip=16hoeeeh601w">Hot Chips conference</a></strong>.</p></li><li><p><strong>October 2025:</strong> <strong>DGX Spark</strong> officially began shipping.</p></li></ul><h1>Architecture</h1><h3>2.5D</h3><p>Like other <strong>Grace-Hopper (GH)</strong> and <strong>Grace-Blackwell (GB)</strong> configurations, the <strong>GB10</strong> has a CPU and a GPU component. However, unlike its higher-end counterparts, the <strong>GH200</strong> and <strong>GB200</strong>, where the CPU and GPU are housed in separate packages, the GB10 uses a <strong>multi-die package</strong>. 
In this design, the CPU and GPU dies are connected via a <strong>C2C</strong> interconnect over an <strong>interposer</strong>, following a <strong>2.5D integration</strong> approach.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4g0i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b953e6c-d8eb-450e-8817-01cd2a832e83_632x582.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4g0i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b953e6c-d8eb-450e-8817-01cd2a832e83_632x582.png 424w, https://substackcdn.com/image/fetch/$s_!4g0i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b953e6c-d8eb-450e-8817-01cd2a832e83_632x582.png 848w, https://substackcdn.com/image/fetch/$s_!4g0i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b953e6c-d8eb-450e-8817-01cd2a832e83_632x582.png 1272w, https://substackcdn.com/image/fetch/$s_!4g0i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b953e6c-d8eb-450e-8817-01cd2a832e83_632x582.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4g0i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b953e6c-d8eb-450e-8817-01cd2a832e83_632x582.png" width="500" height="460.44303797468353" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b953e6c-d8eb-450e-8817-01cd2a832e83_632x582.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb1661a1-2f24-4f20-9ba7-092d49f61c4b_632x582.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:582,&quot;width&quot;:632,&quot;resizeWidth&quot;:500,&quot;bytes&quot;:1481354,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176499817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb1661a1-2f24-4f20-9ba7-092d49f61c4b_632x582.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4g0i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b953e6c-d8eb-450e-8817-01cd2a832e83_632x582.png 424w, https://substackcdn.com/image/fetch/$s_!4g0i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b953e6c-d8eb-450e-8817-01cd2a832e83_632x582.png 848w, https://substackcdn.com/image/fetch/$s_!4g0i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b953e6c-d8eb-450e-8817-01cd2a832e83_632x582.png 1272w, https://substackcdn.com/image/fetch/$s_!4g0i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b953e6c-d8eb-450e-8817-01cd2a832e83_632x582.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">GB10 SoC [Source: NVIDIA, Hot Chips 2025]</figcaption></figure></div><h3>Not really the Grace CPU</h3><p>In my opinion, this was the biggest surprise. Although the CPU is labeled &#8220;Grace,&#8221; it&#8217;s <em>not</em> a cut-down version of the Grace CPU used in GH200, GB200, or the Grace Superchip systems. Instead, it&#8217;s a <strong>20-core Arm CPU IP developed by MediaTek</strong>.</p><p>While NVIDIA had announced its partnership with MediaTek back in January, I did not expect such a deep collaboration. 
This isn&#8217;t a small integration, it&#8217;s a full-fledged co-design effort.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Jm6Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F164a3e77-af58-484d-928c-22bce47b1e34_1675x1215.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Jm6Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F164a3e77-af58-484d-928c-22bce47b1e34_1675x1215.png 424w, https://substackcdn.com/image/fetch/$s_!Jm6Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F164a3e77-af58-484d-928c-22bce47b1e34_1675x1215.png 848w, https://substackcdn.com/image/fetch/$s_!Jm6Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F164a3e77-af58-484d-928c-22bce47b1e34_1675x1215.png 1272w, https://substackcdn.com/image/fetch/$s_!Jm6Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F164a3e77-af58-484d-928c-22bce47b1e34_1675x1215.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Jm6Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F164a3e77-af58-484d-928c-22bce47b1e34_1675x1215.png" width="1456" height="1056" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/164a3e77-af58-484d-928c-22bce47b1e34_1675x1215.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76894fe2-cbd0-4ac3-bfe4-ad53524e712e_1675x1215.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1056,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:278826,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176499817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76894fe2-cbd0-4ac3-bfe4-ad53524e712e_1675x1215.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Jm6Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F164a3e77-af58-484d-928c-22bce47b1e34_1675x1215.png 424w, https://substackcdn.com/image/fetch/$s_!Jm6Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F164a3e77-af58-484d-928c-22bce47b1e34_1675x1215.png 848w, https://substackcdn.com/image/fetch/$s_!Jm6Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F164a3e77-af58-484d-928c-22bce47b1e34_1675x1215.png 1272w, https://substackcdn.com/image/fetch/$s_!Jm6Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F164a3e77-af58-484d-928c-22bce47b1e34_1675x1215.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button 
tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">CPU co-design: Primarily an IP from MediaTek, with C2C and Display IPs from NVIDIA. 
[Source: NVIDIA, Hot Chips 2025, Annotation by Chiplog]</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Yijj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc493d35-1915-4c31-8c56-62a1d9431bc7_2746x1272.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Yijj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc493d35-1915-4c31-8c56-62a1d9431bc7_2746x1272.png 424w, https://substackcdn.com/image/fetch/$s_!Yijj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc493d35-1915-4c31-8c56-62a1d9431bc7_2746x1272.png 848w, https://substackcdn.com/image/fetch/$s_!Yijj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc493d35-1915-4c31-8c56-62a1d9431bc7_2746x1272.png 1272w, https://substackcdn.com/image/fetch/$s_!Yijj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc493d35-1915-4c31-8c56-62a1d9431bc7_2746x1272.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Yijj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc493d35-1915-4c31-8c56-62a1d9431bc7_2746x1272.png" width="1456" height="674" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cc493d35-1915-4c31-8c56-62a1d9431bc7_2746x1272.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:674,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:202757,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176499817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc493d35-1915-4c31-8c56-62a1d9431bc7_2746x1272.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Yijj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc493d35-1915-4c31-8c56-62a1d9431bc7_2746x1272.png 424w, https://substackcdn.com/image/fetch/$s_!Yijj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc493d35-1915-4c31-8c56-62a1d9431bc7_2746x1272.png 848w, https://substackcdn.com/image/fetch/$s_!Yijj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc493d35-1915-4c31-8c56-62a1d9431bc7_2746x1272.png 1272w, https://substackcdn.com/image/fetch/$s_!Yijj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc493d35-1915-4c31-8c56-62a1d9431bc7_2746x1272.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">In comparison, this is the NVIDIA-designed Grace CPU with its Scalable Coherency Fabric [Source: NVIDIA, <a href="https://www.hc34.hotchips.org/assets/program/conference/day2/ADAS%20and%20Grace/HC2022.NVIDIA%20Grace.JonathonEvans.v5.pdf#:~:text=Up%20to%20512GB%20of%20LPDDR5x%20memory%20&#9642;,&#9642;%20But%20why%20LPPDR?%20Remember%20the%20Superchips?">Hot Chips 2022</a>]</figcaption></figure></div><h3>Memory subsystem</h3><p>The <strong>Blackwell iGPU die</strong>, as expected, is based on NVIDIA&#8217;s Blackwell GPU architecture. It doesn&#8217;t have dedicated memory of its own. Instead, it connects to the system&#8217;s <strong>LPDDR5X DRAM</strong> through <strong>memory controllers located in the MediaTek CPU die</strong>. 
The<strong> C2C interconnect</strong> provides around <strong>600 GB/s of aggregate bandwidth</strong>, more than enough for the iGPU to access the full system memory bandwidth.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lu_b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F540ab6f3-2743-4a5d-b085-e6e714410cc3_740x1153.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lu_b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F540ab6f3-2743-4a5d-b085-e6e714410cc3_740x1153.png 424w, https://substackcdn.com/image/fetch/$s_!lu_b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F540ab6f3-2743-4a5d-b085-e6e714410cc3_740x1153.png 848w, https://substackcdn.com/image/fetch/$s_!lu_b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F540ab6f3-2743-4a5d-b085-e6e714410cc3_740x1153.png 1272w, https://substackcdn.com/image/fetch/$s_!lu_b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F540ab6f3-2743-4a5d-b085-e6e714410cc3_740x1153.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lu_b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F540ab6f3-2743-4a5d-b085-e6e714410cc3_740x1153.png" width="334" height="520.4081081081081" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/540ab6f3-2743-4a5d-b085-e6e714410cc3_740x1153.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1153,&quot;width&quot;:740,&quot;resizeWidth&quot;:334,&quot;bytes&quot;:312407,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176499817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F540ab6f3-2743-4a5d-b085-e6e714410cc3_740x1153.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lu_b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F540ab6f3-2743-4a5d-b085-e6e714410cc3_740x1153.png 424w, https://substackcdn.com/image/fetch/$s_!lu_b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F540ab6f3-2743-4a5d-b085-e6e714410cc3_740x1153.png 848w, https://substackcdn.com/image/fetch/$s_!lu_b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F540ab6f3-2743-4a5d-b085-e6e714410cc3_740x1153.png 1272w, https://substackcdn.com/image/fetch/$s_!lu_b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F540ab6f3-2743-4a5d-b085-e6e714410cc3_740x1153.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">GPU accesses LPDDR5X memory through controllers in the CPU die [Source: NVIDIA, Hot Chips 2025]</figcaption></figure></div><p>This kind of data flow, where the GPU accesses memory attached to the CPU in a <strong>unified fashion</strong>, isn&#8217;t new. 
It&#8217;s an approach we saw in Grace Hopper presentations, and the DGX Spark simply extends that same design philosophy.</p><p>Here are 2 slides from the Grace CPU and Grace Hopper architecture document, where this unified memory model is emphasized.</p><ul><li><p>The GPU can access <strong>LPDDR5X memory over NVLink C2C</strong> in a <strong>coherent</strong> and transparent manner.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dYZS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65220240-3d0d-412b-bd38-759ba7b87719_1638x614.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dYZS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65220240-3d0d-412b-bd38-759ba7b87719_1638x614.png 424w, https://substackcdn.com/image/fetch/$s_!dYZS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65220240-3d0d-412b-bd38-759ba7b87719_1638x614.png 848w, https://substackcdn.com/image/fetch/$s_!dYZS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65220240-3d0d-412b-bd38-759ba7b87719_1638x614.png 1272w, https://substackcdn.com/image/fetch/$s_!dYZS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65220240-3d0d-412b-bd38-759ba7b87719_1638x614.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dYZS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65220240-3d0d-412b-bd38-759ba7b87719_1638x614.png" width="1456" 
height="546" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/65220240-3d0d-412b-bd38-759ba7b87719_1638x614.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:546,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:143231,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176499817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65220240-3d0d-412b-bd38-759ba7b87719_1638x614.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dYZS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65220240-3d0d-412b-bd38-759ba7b87719_1638x614.png 424w, https://substackcdn.com/image/fetch/$s_!dYZS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65220240-3d0d-412b-bd38-759ba7b87719_1638x614.png 848w, https://substackcdn.com/image/fetch/$s_!dYZS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65220240-3d0d-412b-bd38-759ba7b87719_1638x614.png 1272w, https://substackcdn.com/image/fetch/$s_!dYZS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65220240-3d0d-412b-bd38-759ba7b87719_1638x614.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: NVIDIA, <a href="https://resources.nvidia.com/en-us-hpc-ai/gh200-grace-hopper">Grace Hopper Arch Document</a></figcaption></figure></div><ul><li><p>It&#8217;s more efficient to have <strong>4&#215; Grace + 4&#215; Hopper</strong> nodes rather than <strong>16&#215; Hopper</strong>, to reach the <strong>2.5 TB</strong> memory footprint required for training.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TuSr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4e3871-3f4b-4473-8aa8-a1c6ea31d758_2574x1432.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!TuSr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4e3871-3f4b-4473-8aa8-a1c6ea31d758_2574x1432.png 424w, https://substackcdn.com/image/fetch/$s_!TuSr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4e3871-3f4b-4473-8aa8-a1c6ea31d758_2574x1432.png 848w, https://substackcdn.com/image/fetch/$s_!TuSr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4e3871-3f4b-4473-8aa8-a1c6ea31d758_2574x1432.png 1272w, https://substackcdn.com/image/fetch/$s_!TuSr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4e3871-3f4b-4473-8aa8-a1c6ea31d758_2574x1432.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TuSr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4e3871-3f4b-4473-8aa8-a1c6ea31d758_2574x1432.png" width="1456" height="810" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c4e3871-3f4b-4473-8aa8-a1c6ea31d758_2574x1432.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:219572,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176499817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4e3871-3f4b-4473-8aa8-a1c6ea31d758_2574x1432.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TuSr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4e3871-3f4b-4473-8aa8-a1c6ea31d758_2574x1432.png 424w, https://substackcdn.com/image/fetch/$s_!TuSr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4e3871-3f4b-4473-8aa8-a1c6ea31d758_2574x1432.png 848w, https://substackcdn.com/image/fetch/$s_!TuSr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4e3871-3f4b-4473-8aa8-a1c6ea31d758_2574x1432.png 1272w, https://substackcdn.com/image/fetch/$s_!TuSr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4e3871-3f4b-4473-8aa8-a1c6ea31d758_2574x1432.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: NVIDIA, <a href="https://www.hc34.hotchips.org/assets/program/conference/day2/ADAS%20and%20Grace/HC2022.NVIDIA%20Grace.JonathonEvans.v5.pdf#:~:text=Up%20to%20512GB%20of%20LPDDR5x%20memory%20&#9642;,&#9642;%20But%20why%20LPPDR?%20Remember%20the%20Superchips?">Hot Chips 2022</a></figcaption></figure></div><h4>GB10 Memory Complex</h4><p>Regarding the memory subsystem itself:</p><ul><li><p>The <strong>memory controllers</strong> are part of the <strong>MediaTek CPU IP</strong>.</p></li><li><p>There are <strong>8 LPDDR5X DRAM packages</strong> soldered to the board, each with a <strong>32-bit interface</strong>, for a <strong>total interface width of 256 bits</strong>.</p></li><li><p>The <strong>datasheet</strong> lists total memory bandwidth at <strong>273 GB/s</strong>, which corresponds to a per-pin data rate of <strong>8.533 Gbps</strong>.</p><pre><code>(256 bits &#215; 8.533 Gbps per pin) / 8 bits per byte &#8776; 273 GB/s</code></pre></li><li><p>In the <strong>Hot Chips</strong> presentation, the memory data rate was specified as <strong>9.4 Gbps</strong> (about 300 GB/s of bandwidth by the same calculation). 
Suggesting some capability to overclock, perhaps?</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_bj7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6ff2d39-aec6-4996-bb28-1d2eaaa4d819_979x800.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_bj7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6ff2d39-aec6-4996-bb28-1d2eaaa4d819_979x800.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_bj7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6ff2d39-aec6-4996-bb28-1d2eaaa4d819_979x800.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_bj7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6ff2d39-aec6-4996-bb28-1d2eaaa4d819_979x800.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_bj7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6ff2d39-aec6-4996-bb28-1d2eaaa4d819_979x800.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_bj7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6ff2d39-aec6-4996-bb28-1d2eaaa4d819_979x800.jpeg" width="979" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f6ff2d39-aec6-4996-bb28-1d2eaaa4d819_979x800.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:979,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:273632,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/176499817?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6ff2d39-aec6-4996-bb28-1d2eaaa4d819_979x800.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_bj7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6ff2d39-aec6-4996-bb28-1d2eaaa4d819_979x800.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_bj7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6ff2d39-aec6-4996-bb28-1d2eaaa4d819_979x800.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_bj7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6ff2d39-aec6-4996-bb28-1d2eaaa4d819_979x800.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_bj7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6ff2d39-aec6-4996-bb28-1d2eaaa4d819_979x800.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">DGX Spark GB10 Board [Source: <a href="https://www.servethehome.com/nvidia-dgx-spark-review-the-gb10-machine-is-so-freaking-cool/2/">ServeTheHome</a>]</figcaption></figure></div><h1>Methodology</h1><p>In his <strong>Hot Chips</strong> presentation, <strong>Andi Skende</strong>, GB10&#8217;s chief architect, spent considerable time detailing the <strong>verification strategies</strong> used in developing the chip. </p><p>Here are a few excerpts from the talk, with my comments in italics:</p><blockquote><ul><li><p>We had to do very <strong>extensive performance modeling of the GPU IP traffic into MediaTek&#8217;s memory controller subsystem</strong> to make sure we achieve the best efficiency while also giving the CPU core reasonable latency. 
It is a challenging problem to balance UMA <em>(Unified Memory Architecture)</em> architecture where latency and efficiency are competing with each other. </p></li><li><p>We followed a very strict IP model <strong>when delivering our IP to MediaTek for integration into their SoC</strong>, IP such as the C2C and Display. <em>(NVIDIA sent its IP over to MediaTek. So did MediaTek close timing, run synthesis, and do all the backend work for the CPU die?)</em></p></li><li><p>On the verification side, we took a multi-prong approach, we used BFMs (<em>Bus Functional Models</em>) where possible and a <strong>hierarchical approach to verification.</strong> </p></li><li><p>We did co-simulation for very complex features that required end-to-end simulation by <strong>bringing the 2 dies together into a co-simulation platform</strong>. <strong>CPU to GPU</strong> <strong>coherency</strong> is one such feature that requires a lot of verification and co-simulation to <strong>make sure you&#8217;ve covered all the corner cases, there are no race conditions, no deadlocks, etc</strong>. </p></li><li><p>Last but not the least <strong>we relied on emulation</strong> very heavily, to bring together MediaTek&#8217;s hardware, NVIDIA hardware, firmware and software, working together pre-silicon. Where we booted the whole OS including running full app frames before we taped-out, and <strong>all that enabled us to tape-out and go to production with first silicon, A0 silicon, one tape-out no respins</strong>.</p></li></ul><p><em><strong>Andi Skende, Hot Chips 2025</strong></em></p><div><hr></div><p><em>Chiplog Note: Simulation and emulation are pre-silicon verification strategies. Simulation can only be run on smaller portions of the design, and the window of time over which the design can be checked is small, on the order of milliseconds. 
</em></p><p><em>In emulation, larger portions of the design are synthesized and run on dedicated hardware like <a href="https://www.cadence.com/en_US/home/tools/system-design-and-verification/emulation-and-prototyping/palladium.html">Cadence&#8217;s Palladium</a> or <a href="https://eda.sw.siemens.com/en-US/ic/verification-and-validation/hardware-assisted-verification/">Siemens Mentor Graphics&#8217; Veloce</a>. The hardware and licenses for emulation alone cost millions of dollars.</em></p></blockquote><p>Executing a collaboration of this scale and shipping A0 silicon is an admirable achievement, even for a company like NVIDIA with virtually unlimited resources.</p><h1>Conclusion</h1><p>As expected, there&#8217;s been plenty of discussion in forums and comment sections about whether the <strong>DGX Spark</strong> justifies its <strong>$4,000 price tag</strong>, and how it stacks up against the <strong>RTX 5090</strong> or an equivalently configured <strong>Apple Mac Studio</strong>. But regardless of raw performance metrics, this project is a <strong>major win for NVIDIA</strong>.</p><p>Designing this chip entirely in-house would have required a dedicated team and measurably more engineering resources, likely pulling focus away from NVIDIA&#8217;s main high-end SoC efforts. Instead, NVIDIA <strong>co-designed with MediaTek</strong>, leveraged its <strong>CPU</strong> and <strong>memory controller IPs</strong>, and established a <strong>collaborative workflow</strong> that led to a successful <strong>A0 tape-out</strong>, which is a significant feather in NVIDIA&#8217;s cap. 
</p><p>NVIDIA can now pursue the DGX Spark product line and build successors to the GB10 with less internal overhead.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/p/analysis-of-nvidia-dgx-sparks-gb10?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/p/analysis-of-nvidia-dgx-sparks-gb10?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h1>Support Chiplog</h1><p><em>Chiplog is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</em> </p><p><em>Right now memberships are <strong>50% off.</strong> That&#8217;s $37.50/year (~$3/month).</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/subscribe?coupon=cee2edcc&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/subscribe?coupon=cee2edcc"><span>Subscribe</span></a></p>]]></content:encoded></item><item><title><![CDATA[5 different ways DRAM and Compute are integrated]]></title><description><![CDATA[Examples of how GDDR, DDR, LPDDR, HBM and 3D DRAM are integrated with compute]]></description><link>https://www.chiplog.io/p/5-different-ways-dram-and-compute</link><guid isPermaLink="false">https://www.chiplog.io/p/5-different-ways-dram-and-compute</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Sun, 05 Oct 2025 20:56:09 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/6d524942-6c4b-46ec-bca3-e7baff8e10b1_1456x1048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the last two posts, we dug into the <a 
href="https://www.chiplog.io/p/fundamental-guide-to-understanding-880">fundamental building blocks of DRAM</a> and got into the weeds with <a href="https://www.chiplog.io/p/fundamental-guide-to-understanding">DRAM timing parameters and how it affects performance</a>. Today&#8217;s post is lighter and more visual. We&#8217;ll break down <strong>how modern systems are built with different types of DRAM</strong>: DDR, GDDR, LPDDR, HBM, 3D DRAM and eDRAM.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oOWo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44b3f72-7965-4ccb-ac87-bf4d86556622_4084x1364.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oOWo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44b3f72-7965-4ccb-ac87-bf4d86556622_4084x1364.png 424w, https://substackcdn.com/image/fetch/$s_!oOWo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44b3f72-7965-4ccb-ac87-bf4d86556622_4084x1364.png 848w, https://substackcdn.com/image/fetch/$s_!oOWo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44b3f72-7965-4ccb-ac87-bf4d86556622_4084x1364.png 1272w, https://substackcdn.com/image/fetch/$s_!oOWo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44b3f72-7965-4ccb-ac87-bf4d86556622_4084x1364.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!oOWo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44b3f72-7965-4ccb-ac87-bf4d86556622_4084x1364.png" width="1456" height="486" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c44b3f72-7965-4ccb-ac87-bf4d86556622_4084x1364.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b851a5f0-8bbe-4806-a83b-08ebae18db72_4084x1364.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:486,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oOWo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44b3f72-7965-4ccb-ac87-bf4d86556622_4084x1364.png 424w, https://substackcdn.com/image/fetch/$s_!oOWo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44b3f72-7965-4ccb-ac87-bf4d86556622_4084x1364.png 848w, https://substackcdn.com/image/fetch/$s_!oOWo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44b3f72-7965-4ccb-ac87-bf4d86556622_4084x1364.png 1272w, https://substackcdn.com/image/fetch/$s_!oOWo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc44b3f72-7965-4ccb-ac87-bf4d86556622_4084x1364.png 1456w" sizes="100vw" fetchpriority="high"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">DRAM (in blue) relative to the logic die</figcaption></figure></div><p>I&#8217;ve categorized the sections based on where the memory physically sits relative to the compute/processor die:</p><ol><li><p><strong>DRAM on the Board</strong> &#8211; Memory soldered on the PCB. Common with DDR, GDDR, and sometimes LPDDR.</p></li><li><p><strong>DRAM on Package</strong> &#8211; Memory-on-Package (MoP) and Package-on-Package (PoP) configurations, used with LPDDR.</p></li><li><p><strong>DRAM on Interposer</strong> &#8211; Found with HBM.</p></li><li><p><strong>DRAM on Die</strong> &#8211; Early look at 3D-DRAM. 
(<em><a href="https://www.d-matrix.ai">d-Matrix</a>, the company I work for, had a big announcement on this just last week.</em>)</p></li><li><p><strong>DRAM in Die</strong> &#8211; Embedded DRAM.</p></li></ol><h2><code>&#128274; </code>For Members</h2><ul><li><p>A brief comment on <strong>hybrid bonding</strong>.</p></li><li><p><strong>Deep dive into NVIDIA Grace CPU&#8217;s LPDDR5X memory complex</strong> </p><ul><li><p>The <strong>Grace CPU is arguably more interesting</strong> than Hopper or Blackwell GPUs. </p></li><li><p>Grace uses <strong>special off-catalog DRAM components</strong>, and I&#8217;ll walk through why that is and what makes them unique.</p></li><li><p><strong>NVIDIA is exercising its special relationship with Micron</strong>, and the topology is changing from the Grace-Hopper (GH) Superchip to the latest Grace-Blackwell (GB) motherboards.</p></li></ul></li><li><p>A comparison of the <strong>NVIDIA Hopper and Blackwell HBM complexes</strong>.</p></li></ul><p><em>Chiplog is a reader-supported newsletter. Right now memberships are <strong>25% off.</strong> That&#8217;s $74/year (~$6/month).</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/fall25&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/fall25"><span>Subscribe</span></a></p><blockquote><h4>Acknowledgement</h4><p><em>This article wouldn&#8217;t have been possible without the excellent images from <strong><a href="https://www.techpowerup.com">TechPowerUp</a></strong> and <strong><a href="https://www.servethehome.com">ServeTheHome</a></strong>. Many thanks to them for their fantastic work.</em></p></blockquote><div><hr></div><h1>DRAM on Board (<em>DDR &amp; GDDR</em>)</h1><p>This type of memory integration is the most common. We see it in laptops, desktops, and servers. 
In this setup, the DRAM package is either soldered directly onto the PCB or first mounted onto a smaller daughter card (a <strong>DIMM module</strong>) that plugs into the main PCB through a socket. The processor and memory communicate through traces routed across the motherboard. Here are a few examples.</p><h4>Example 1: GDDR in NVIDIA GeForce cards</h4><p>Below are annotated images from three generations of NVIDIA graphics cards. In each case, GDDR memory is soldered directly onto the board. The DRAM devices are <strong>width-cascaded</strong>, forming a wide memory interface that the GPU can access in parallel. </p><ol><li><p>NVIDIA GTX 780 Ti with GDDR5</p></li><li><p>NVIDIA RTX 4080/4090 with GDDR6X</p></li><li><p>NVIDIA RTX 5080/5090 with GDDR7</p></li></ol><p>The <strong>GTX 780 Ti</strong> is fed by <strong>12 GDDR5 chips</strong>, each with a <strong>32-bit data bus</strong>. Together, these 12 devices provide the GPU with a <strong>384-bit wide memory interface.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fb9m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc838354f-a017-4de2-8cb4-d72e2fa85e9f_4880x2068.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fb9m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc838354f-a017-4de2-8cb4-d72e2fa85e9f_4880x2068.png 424w, https://substackcdn.com/image/fetch/$s_!Fb9m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc838354f-a017-4de2-8cb4-d72e2fa85e9f_4880x2068.png 848w, 
https://substackcdn.com/image/fetch/$s_!Fb9m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc838354f-a017-4de2-8cb4-d72e2fa85e9f_4880x2068.png 1272w, https://substackcdn.com/image/fetch/$s_!Fb9m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc838354f-a017-4de2-8cb4-d72e2fa85e9f_4880x2068.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fb9m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc838354f-a017-4de2-8cb4-d72e2fa85e9f_4880x2068.png" width="1456" height="617" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c838354f-a017-4de2-8cb4-d72e2fa85e9f_4880x2068.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:617,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fb9m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc838354f-a017-4de2-8cb4-d72e2fa85e9f_4880x2068.png 424w, https://substackcdn.com/image/fetch/$s_!Fb9m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc838354f-a017-4de2-8cb4-d72e2fa85e9f_4880x2068.png 848w, 
https://substackcdn.com/image/fetch/$s_!Fb9m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc838354f-a017-4de2-8cb4-d72e2fa85e9f_4880x2068.png 1272w, https://substackcdn.com/image/fetch/$s_!Fb9m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc838354f-a017-4de2-8cb4-d72e2fa85e9f_4880x2068.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">NVIDIA GeForce GTX 780 Ti [Source: <a 
href="https://www.techpowerup.com/gpu-specs/geforce-gtx-780-ti.c2512">TechPowerup</a>]</figcaption></figure></div><p>Next, let&#8217;s compare the <strong>RTX 4080</strong> and <strong>RTX 4090</strong>.</p><ul><li><p>The <strong>RTX 4080</strong> uses <strong>8 GDDR6X devices</strong>, each with a <strong>32-bit data bus</strong>, giving the GPU a <strong>256-bit wide memory interface</strong>. With each device providing <strong>2 GB of capacity</strong>, the card has a total of <strong>16 GB</strong> of memory.</p></li><li><p>In comparison, the <strong>RTX 4090</strong> has <strong>12 GDDR6X devices</strong> for a <strong>384-bit interface</strong>, with a total capacity of <strong>24 GB</strong>.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cvKS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60afbe7-68ca-499c-ad42-2d1d2e1928c9_2430x1281.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cvKS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60afbe7-68ca-499c-ad42-2d1d2e1928c9_2430x1281.png 424w, https://substackcdn.com/image/fetch/$s_!cvKS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60afbe7-68ca-499c-ad42-2d1d2e1928c9_2430x1281.png 848w, https://substackcdn.com/image/fetch/$s_!cvKS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60afbe7-68ca-499c-ad42-2d1d2e1928c9_2430x1281.png 1272w, 
https://substackcdn.com/image/fetch/$s_!cvKS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60afbe7-68ca-499c-ad42-2d1d2e1928c9_2430x1281.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cvKS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60afbe7-68ca-499c-ad42-2d1d2e1928c9_2430x1281.png" width="1456" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d60afbe7-68ca-499c-ad42-2d1d2e1928c9_2430x1281.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bab20356-39d3-4110-bae2-7ae3161444e3_2430x1281.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3451859,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/172834045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab20356-39d3-4110-bae2-7ae3161444e3_2430x1281.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cvKS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60afbe7-68ca-499c-ad42-2d1d2e1928c9_2430x1281.png 424w, https://substackcdn.com/image/fetch/$s_!cvKS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60afbe7-68ca-499c-ad42-2d1d2e1928c9_2430x1281.png 848w, 
https://substackcdn.com/image/fetch/$s_!cvKS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60afbe7-68ca-499c-ad42-2d1d2e1928c9_2430x1281.png 1272w, https://substackcdn.com/image/fetch/$s_!cvKS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd60afbe7-68ca-499c-ad42-2d1d2e1928c9_2430x1281.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Annotation by Chiplog. 
Image source TechPowerUp - <a href="https://www.techpowerup.com/gpu-specs/geforce-rtx-4080.c3888">4080</a>, <a href="https://www.techpowerup.com/gpu-specs/geforce-rtx-4090.c3889">4090</a></figcaption></figure></div><p>Finally, the <strong>latest generation RTX 5080 and 5090</strong> move to <strong>GDDR7</strong>. The 5080 is equipped with <strong>8 devices</strong> for a <strong>256-bit bus</strong>, while the 5090 doubles that with <strong>16 devices</strong>, delivering a massive <strong>512-bit bus width</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!viqI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3369a5c9-0aea-4558-a7a5-9cac3810dfb7_2405x1429.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!viqI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3369a5c9-0aea-4558-a7a5-9cac3810dfb7_2405x1429.png 424w, https://substackcdn.com/image/fetch/$s_!viqI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3369a5c9-0aea-4558-a7a5-9cac3810dfb7_2405x1429.png 848w, https://substackcdn.com/image/fetch/$s_!viqI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3369a5c9-0aea-4558-a7a5-9cac3810dfb7_2405x1429.png 1272w, https://substackcdn.com/image/fetch/$s_!viqI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3369a5c9-0aea-4558-a7a5-9cac3810dfb7_2405x1429.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!viqI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3369a5c9-0aea-4558-a7a5-9cac3810dfb7_2405x1429.png" width="1456" height="865" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3369a5c9-0aea-4558-a7a5-9cac3810dfb7_2405x1429.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:865,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5526264,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/172834045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3369a5c9-0aea-4558-a7a5-9cac3810dfb7_2405x1429.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!viqI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3369a5c9-0aea-4558-a7a5-9cac3810dfb7_2405x1429.png 424w, https://substackcdn.com/image/fetch/$s_!viqI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3369a5c9-0aea-4558-a7a5-9cac3810dfb7_2405x1429.png 848w, https://substackcdn.com/image/fetch/$s_!viqI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3369a5c9-0aea-4558-a7a5-9cac3810dfb7_2405x1429.png 1272w, https://substackcdn.com/image/fetch/$s_!viqI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3369a5c9-0aea-4558-a7a5-9cac3810dfb7_2405x1429.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Annotation by Chiplog. Image source TechPowerUp - <a href="https://www.techpowerup.com/gpu-specs/geforce-rtx-5080.c4217">5080</a>, <a href="https://www.techpowerup.com/gpu-specs/geforce-rtx-5090.c4216">5090</a></figcaption></figure></div><blockquote><p><em>While it may seem obvious, it&#8217;s worth emphasizing that every additional DRAM device comes at a cost. 
It consumes <strong>PCB area</strong>, requires <strong>dedicated memory controller logic on the processor die</strong>, and increases <strong>routing complexity</strong> on the board.</em></p><p><em>Eventually, physical limits take over: you run out of either PCB real estate or processor die area to accommodate more controllers. These constraints ultimately define the <strong>maximum bus width</strong> and the <strong>upper bound on memory capacity</strong> for a given graphics card.</em></p></blockquote><h4>Example 2: DDR5 in Intel 6700/6500 SuperMicro cards</h4><p>From the graphics card examples, we saw that even at the very high end, memory tops out at around <strong>32 GB of DRAM capacity</strong>. But 32 GB isn&#8217;t much if your application demands more. So how do you scale beyond that?</p><p>This is where <strong>memory modules</strong> come in, a form of integration more common with CPUs. For example, this <strong>Micron</strong> <strong>128 GB DDR5 RDIMM</strong>, priced at roughly <strong>$1,000</strong>, packs <strong>40 DDR5 DRAM devices</strong> (20 soldered on the front and 20 on the back). 
The name <strong>Dual Inline Memory Module (DIMM)</strong> comes from the module&#8217;s edge connector, which carries separate electrical contacts on each side of the board.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fiwm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ab1146-4ae9-4a6f-befe-384cbb24a20d_1024x240.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fiwm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ab1146-4ae9-4a6f-befe-384cbb24a20d_1024x240.png 424w, https://substackcdn.com/image/fetch/$s_!Fiwm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ab1146-4ae9-4a6f-befe-384cbb24a20d_1024x240.png 848w, https://substackcdn.com/image/fetch/$s_!Fiwm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ab1146-4ae9-4a6f-befe-384cbb24a20d_1024x240.png 1272w, https://substackcdn.com/image/fetch/$s_!Fiwm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ab1146-4ae9-4a6f-befe-384cbb24a20d_1024x240.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fiwm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ab1146-4ae9-4a6f-befe-384cbb24a20d_1024x240.png" width="1024" height="240" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03ab1146-4ae9-4a6f-befe-384cbb24a20d_1024x240.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:240,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Micron 128GB DDR5-5600 RDIMM 2Rx4 CL46&quot;,&quot;title&quot;:&quot;Micron 128GB DDR5-5600 RDIMM 2Rx4 CL46&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Micron 128GB DDR5-5600 RDIMM 2Rx4 CL46" title="Micron 128GB DDR5-5600 RDIMM 2Rx4 CL46" srcset="https://substackcdn.com/image/fetch/$s_!Fiwm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ab1146-4ae9-4a6f-befe-384cbb24a20d_1024x240.png 424w, https://substackcdn.com/image/fetch/$s_!Fiwm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ab1146-4ae9-4a6f-befe-384cbb24a20d_1024x240.png 848w, https://substackcdn.com/image/fetch/$s_!Fiwm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ab1146-4ae9-4a6f-befe-384cbb24a20d_1024x240.png 1272w, https://substackcdn.com/image/fetch/$s_!Fiwm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03ab1146-4ae9-4a6f-befe-384cbb24a20d_1024x240.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">128GB RDIMM (Source: <a 
href="https://www.crucial.com/memory/server-ddr5/mtc40f2047s1rc56br?srsltid=AfmBOop_z3gmpZASNI1K1419iTAM2oPDJf_EvMplIw_miTZ6RoCn1v-8">Micron</a>)</figcaption></figure></div><p>These modules then plug into the system&#8217;s motherboard. For instance, a <a href="https://www.supermicro.com/en/products/motherboard/x14sbw-tf">Supermicro X14 server board</a> designed for Intel&#8217;s latest 6700-series CPUs supports <strong>8 DIMM slots</strong>. Fully populated with 128 GB sticks, the system reaches a total capacity of <strong>1 TB (1024 GB) of ECC-protected DRAM</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k6uv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f6e8451-6c7d-45d6-ace6-61ad18c2248d_965x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k6uv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f6e8451-6c7d-45d6-ace6-61ad18c2248d_965x600.png 424w, https://substackcdn.com/image/fetch/$s_!k6uv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f6e8451-6c7d-45d6-ace6-61ad18c2248d_965x600.png 848w, https://substackcdn.com/image/fetch/$s_!k6uv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f6e8451-6c7d-45d6-ace6-61ad18c2248d_965x600.png 1272w, https://substackcdn.com/image/fetch/$s_!k6uv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f6e8451-6c7d-45d6-ace6-61ad18c2248d_965x600.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!k6uv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f6e8451-6c7d-45d6-ace6-61ad18c2248d_965x600.png" width="965" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f6e8451-6c7d-45d6-ace6-61ad18c2248d_965x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:965,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1131939,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/172834045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f6e8451-6c7d-45d6-ace6-61ad18c2248d_965x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k6uv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f6e8451-6c7d-45d6-ace6-61ad18c2248d_965x600.png 424w, https://substackcdn.com/image/fetch/$s_!k6uv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f6e8451-6c7d-45d6-ace6-61ad18c2248d_965x600.png 848w, https://substackcdn.com/image/fetch/$s_!k6uv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f6e8451-6c7d-45d6-ace6-61ad18c2248d_965x600.png 1272w, https://substackcdn.com/image/fetch/$s_!k6uv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f6e8451-6c7d-45d6-ace6-61ad18c2248d_965x600.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a><figcaption class="image-caption">Supermicro X14 Server Board for Intel 6700/6500 CPUs. Image Source: <a href="https://www.supermicro.com/en/products/motherboard/x14sbw-tf">Supermicro</a></figcaption></figure></div><p>The scaling method here is different from GPUs. While GPUs expand capacity by <strong>width-cascading</strong> DRAM devices (wider buses), DIMMs expand via <strong>depth-cascading</strong>, adding more memory devices behind the same interface width. This approach increases <strong>total memory capacity</strong> without widening the bus. 
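</p><p>The device counts and capacities quoted above can be reproduced with a few lines of arithmetic. The Python sketch below is only an illustration: the <code>Rdimm</code> class, its field names, and <code>system_capacity_gb</code> are hypothetical, and the 64 data bits plus 16 ECC bits per rank is an assumption chosen to be consistent with the module&#8217;s 2Rx4 organization and its 40 devices, not a figure taken from a datasheet.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rdimm:
    """Hypothetical description of an ECC RDIMM (illustrative names only)."""
    capacity_gb: int         # usable data capacity of the module
    ranks: int               # the "2R" in 2Rx4
    device_width_bits: int   # the "x4" in 2Rx4
    data_bits: int = 64      # assumed data width per rank
    ecc_bits: int = 16       # assumed ECC width per rank

    def devices_per_rank(self) -> int:
        # Width-cascading: enough devices side by side to fill the rank width.
        return (self.data_bits + self.ecc_bits) // self.device_width_bits

    def total_devices(self) -> int:
        # Depth-cascading: extra ranks add devices behind the same interface.
        return self.devices_per_rank() * self.ranks

    def data_devices(self) -> int:
        # Devices that hold user data (ECC devices excluded).
        return (self.data_bits // self.device_width_bits) * self.ranks

    def gbit_per_device(self) -> int:
        # Density each data device must have to reach the module capacity.
        return self.capacity_gb * 8 // self.data_devices()


def system_capacity_gb(dimm: Rdimm, populated_slots: int) -> int:
    """Total data capacity when every populated slot holds the same module."""
    return dimm.capacity_gb * populated_slots


micron_128gb = Rdimm(capacity_gb=128, ranks=2, device_width_bits=4)
print(micron_128gb.devices_per_rank())      # 20
print(micron_128gb.total_devices())         # 40
print(micron_128gb.gbit_per_device())       # 32
print(system_capacity_gb(micron_128gb, 8))  # 1024
```

<p>Under these assumptions, a second rank doubles the device count from 20 to 40 while the interface width stays the same, which is the depth-cascading idea in numbers, and eight such modules give the 1024 GB total.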
</p><p>The <strong>Intel 6700 CPU</strong>, for instance, supports <strong>8 DDR5 memory channels</strong>, and system capacity can be scaled by populating those channels with different DIMM sizes.</p><blockquote><p><em><strong>GPUs optimize for bandwidth, CPUs optimize for capacity.</strong> The type of integration reflects the workload priorities. Fast, parallel access for GPUs versus large, expandable memory pools for CPUs.</em></p></blockquote><h4>Types of DIMMs</h4><p>There are several different types of DIMMs to choose from, depending on <strong>form factor</strong> and <strong>functionality</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zVDJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8770de4-4d92-45a7-b097-cd66cf7b5df5_1645x398.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zVDJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8770de4-4d92-45a7-b097-cd66cf7b5df5_1645x398.png 424w, https://substackcdn.com/image/fetch/$s_!zVDJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8770de4-4d92-45a7-b097-cd66cf7b5df5_1645x398.png 848w, https://substackcdn.com/image/fetch/$s_!zVDJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8770de4-4d92-45a7-b097-cd66cf7b5df5_1645x398.png 1272w, https://substackcdn.com/image/fetch/$s_!zVDJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8770de4-4d92-45a7-b097-cd66cf7b5df5_1645x398.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!zVDJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8770de4-4d92-45a7-b097-cd66cf7b5df5_1645x398.png" width="1456" height="352" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8770de4-4d92-45a7-b097-cd66cf7b5df5_1645x398.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:352,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:102065,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/172834045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8770de4-4d92-45a7-b097-cd66cf7b5df5_1645x398.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zVDJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8770de4-4d92-45a7-b097-cd66cf7b5df5_1645x398.png 424w, https://substackcdn.com/image/fetch/$s_!zVDJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8770de4-4d92-45a7-b097-cd66cf7b5df5_1645x398.png 848w, https://substackcdn.com/image/fetch/$s_!zVDJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8770de4-4d92-45a7-b097-cd66cf7b5df5_1645x398.png 1272w, https://substackcdn.com/image/fetch/$s_!zVDJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8770de4-4d92-45a7-b097-cd66cf7b5df5_1645x398.png 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Types of DIMMs [Source: <a href="https://www.micron.com/content/dam/micron/global/secure/products/white-paper/dram-module-reference-guide-en.pdf">Micron</a>]</figcaption></figure></div><ul><li><p><strong>Form factor:</strong> A standard DIMM measures <strong>133.35 mm &#215; 31.25 mm</strong>, whereas a <strong>SODIMM (Small Outline DIMM)</strong> is more compact at <strong>69.6 mm &#215; 30 mm</strong>, making it suitable for laptops and other space-constrained devices.</p></li><li><p><strong>Functionality:</strong> Different applications impose different requirements. Laptops are cost-sensitive, so <strong>unbuffered DIMMs (UDIMMs)</strong> are usually sufficient. Servers in data centers, however, demand higher reliability and capacity, so they use <strong>ECC-protected RDIMMs (Registered DIMMs)</strong>. In RDIMMs, a chip called the <strong>RCD (Registering Clock Driver)</strong>, located on the module, buffers and regenerates critical signals, such as the clock and command/address lines, before they reach the DRAM components. 
This <strong>improves signal integrity</strong> and <strong>allows for larger</strong>, more stable <strong>memory</strong> configurations.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MwEl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac6268c0-ccde-4887-8c17-5b5e9f24e714_1630x344.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MwEl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac6268c0-ccde-4887-8c17-5b5e9f24e714_1630x344.png 424w, https://substackcdn.com/image/fetch/$s_!MwEl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac6268c0-ccde-4887-8c17-5b5e9f24e714_1630x344.png 848w, https://substackcdn.com/image/fetch/$s_!MwEl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac6268c0-ccde-4887-8c17-5b5e9f24e714_1630x344.png 1272w, https://substackcdn.com/image/fetch/$s_!MwEl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac6268c0-ccde-4887-8c17-5b5e9f24e714_1630x344.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MwEl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac6268c0-ccde-4887-8c17-5b5e9f24e714_1630x344.png" width="1456" height="307" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac6268c0-ccde-4887-8c17-5b5e9f24e714_1630x344.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:307,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:95097,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/172834045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac6268c0-ccde-4887-8c17-5b5e9f24e714_1630x344.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MwEl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac6268c0-ccde-4887-8c17-5b5e9f24e714_1630x344.png 424w, https://substackcdn.com/image/fetch/$s_!MwEl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac6268c0-ccde-4887-8c17-5b5e9f24e714_1630x344.png 848w, https://substackcdn.com/image/fetch/$s_!MwEl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac6268c0-ccde-4887-8c17-5b5e9f24e714_1630x344.png 1272w, https://substackcdn.com/image/fetch/$s_!MwEl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac6268c0-ccde-4887-8c17-5b5e9f24e714_1630x344.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Source: <a href="https://www.micron.com/content/dam/micron/global/secure/products/white-paper/dram-module-reference-guide-en.pdf">Micron DRAM Module Reference 
Guide</a></figcaption></figure></div><p>Beyond standard offerings, specialized vendors such as <strong><a href="http://www.virtium.com">Virtium</a></strong> and <strong><a href="https://www.smartm.com">Smart Modular</a></strong> provide <strong>custom DIMM modules</strong>. They can even source DRAM components from vendors like Micron or Samsung and build modules tailored to specific needs.</p><ul><li><p>For example, <strong>Virtium</strong> manufactures <strong>VLP (Very Low Profile)</strong> and <strong>ULP (Ultra Low Profile)</strong> DIMMs to reduce z-height in space-constrained systems. These are not available in Micron&#8217;s catalog.</p></li><li><p><strong>Smart Modular</strong>, on the other hand, offers <strong>Liquid Immersion DDR5 modules</strong> and <strong>ECC &#8220;C&#8221; SODIMMs</strong>, which regenerate and amplify the host clock signal for even stronger signal integrity.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SP-0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff87ceabe-ea4a-4151-8255-7a2f9f6142cc_2226x574.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SP-0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff87ceabe-ea4a-4151-8255-7a2f9f6142cc_2226x574.png 424w, https://substackcdn.com/image/fetch/$s_!SP-0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff87ceabe-ea4a-4151-8255-7a2f9f6142cc_2226x574.png 848w, 
https://substackcdn.com/image/fetch/$s_!SP-0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff87ceabe-ea4a-4151-8255-7a2f9f6142cc_2226x574.png 1272w, https://substackcdn.com/image/fetch/$s_!SP-0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff87ceabe-ea4a-4151-8255-7a2f9f6142cc_2226x574.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SP-0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff87ceabe-ea4a-4151-8255-7a2f9f6142cc_2226x574.png" width="1456" height="375" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f87ceabe-ea4a-4151-8255-7a2f9f6142cc_2226x574.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:375,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1414018,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/172834045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff87ceabe-ea4a-4151-8255-7a2f9f6142cc_2226x574.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SP-0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff87ceabe-ea4a-4151-8255-7a2f9f6142cc_2226x574.png 424w, https://substackcdn.com/image/fetch/$s_!SP-0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff87ceabe-ea4a-4151-8255-7a2f9f6142cc_2226x574.png 
848w, https://substackcdn.com/image/fetch/$s_!SP-0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff87ceabe-ea4a-4151-8255-7a2f9f6142cc_2226x574.png 1272w, https://substackcdn.com/image/fetch/$s_!SP-0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff87ceabe-ea4a-4151-8255-7a2f9f6142cc_2226x574.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><a href="https://www.virtium.com/ddr5/">Virtium DIMM Product Guide</a></figcaption></figure></div><div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jTpx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd968828f-39c2-4f96-ba43-0bf220e75f9d_2086x694.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jTpx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd968828f-39c2-4f96-ba43-0bf220e75f9d_2086x694.png 424w, https://substackcdn.com/image/fetch/$s_!jTpx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd968828f-39c2-4f96-ba43-0bf220e75f9d_2086x694.png 848w, https://substackcdn.com/image/fetch/$s_!jTpx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd968828f-39c2-4f96-ba43-0bf220e75f9d_2086x694.png 1272w, https://substackcdn.com/image/fetch/$s_!jTpx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd968828f-39c2-4f96-ba43-0bf220e75f9d_2086x694.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jTpx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd968828f-39c2-4f96-ba43-0bf220e75f9d_2086x694.png" width="1456" height="484" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d968828f-39c2-4f96-ba43-0bf220e75f9d_2086x694.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:484,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:408935,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/172834045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd968828f-39c2-4f96-ba43-0bf220e75f9d_2086x694.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jTpx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd968828f-39c2-4f96-ba43-0bf220e75f9d_2086x694.png 424w, https://substackcdn.com/image/fetch/$s_!jTpx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd968828f-39c2-4f96-ba43-0bf220e75f9d_2086x694.png 848w, https://substackcdn.com/image/fetch/$s_!jTpx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd968828f-39c2-4f96-ba43-0bf220e75f9d_2086x694.png 1272w, https://substackcdn.com/image/fetch/$s_!jTpx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd968828f-39c2-4f96-ba43-0bf220e75f9d_2086x694.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><a href="https://www.smartm.com/product/list/ddr5">Smart Modular DIMM Product Guide</a></figcaption></figure></div><h1>DRAM on Package (<em>LPDDR</em>)</h1><p>Like GDDR and DDR5, <strong>LPDDR memory</strong> can be directly soldered onto the PCB or mounted on an <strong>LPCAMM module</strong> (Low Power Compression Attached Memory Module) and then attached to the motherboard. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9UfB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5a1d744-2100-452f-a51a-4a43f875052b_1280x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9UfB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5a1d744-2100-452f-a51a-4a43f875052b_1280x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9UfB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5a1d744-2100-452f-a51a-4a43f875052b_1280x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9UfB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5a1d744-2100-452f-a51a-4a43f875052b_1280x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9UfB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5a1d744-2100-452f-a51a-4a43f875052b_1280x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9UfB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5a1d744-2100-452f-a51a-4a43f875052b_1280x768.jpeg" width="488" height="292.8" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5a1d744-2100-452f-a51a-4a43f875052b_1280x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1280,&quot;resizeWidth&quot;:488,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9UfB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5a1d744-2100-452f-a51a-4a43f875052b_1280x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9UfB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5a1d744-2100-452f-a51a-4a43f875052b_1280x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9UfB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5a1d744-2100-452f-a51a-4a43f875052b_1280x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9UfB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5a1d744-2100-452f-a51a-4a43f875052b_1280x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image Source: <a href="https://www.micron.com/products/memory/dram-components/lpddr-components/lpcamm2">Micron</a></figcaption></figure></div><p>However, the more common and increasingly important approach is the style used in Apple&#8217;s <strong>M-series</strong> and <strong>A-series SoCs</strong>, bringing LPDDR DRAM closer to the processor through advanced packaging.</p><h4>Memory-on-Package (MoP), or On-Package Memory (OPM)</h4><p>In <strong>Intel&#8217;s Meteor/Lunar Lake</strong> and <strong>Apple&#8217;s M-series</strong> chips, LPDDR DRAM and the logic die <strong>share the same package substrate</strong>. 
The dies are connected through wires routed within the substrate itself.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H3N_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0b3a8a1-03e4-423d-bb8f-f1bb6e5aa514_2192x1040.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H3N_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0b3a8a1-03e4-423d-bb8f-f1bb6e5aa514_2192x1040.png 424w, https://substackcdn.com/image/fetch/$s_!H3N_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0b3a8a1-03e4-423d-bb8f-f1bb6e5aa514_2192x1040.png 848w, https://substackcdn.com/image/fetch/$s_!H3N_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0b3a8a1-03e4-423d-bb8f-f1bb6e5aa514_2192x1040.png 1272w, https://substackcdn.com/image/fetch/$s_!H3N_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0b3a8a1-03e4-423d-bb8f-f1bb6e5aa514_2192x1040.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H3N_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0b3a8a1-03e4-423d-bb8f-f1bb6e5aa514_2192x1040.png" width="2192" height="1040" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0b3a8a1-03e4-423d-bb8f-f1bb6e5aa514_2192x1040.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1040,&quot;width&quot;:2192,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2424758,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/172834045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdab2abd2-2031-4a11-8efc-1c3677ebea10_2192x1040.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!H3N_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0b3a8a1-03e4-423d-bb8f-f1bb6e5aa514_2192x1040.png 424w, https://substackcdn.com/image/fetch/$s_!H3N_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0b3a8a1-03e4-423d-bb8f-f1bb6e5aa514_2192x1040.png 848w, https://substackcdn.com/image/fetch/$s_!H3N_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0b3a8a1-03e4-423d-bb8f-f1bb6e5aa514_2192x1040.png 1272w, https://substackcdn.com/image/fetch/$s_!H3N_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0b3a8a1-03e4-423d-bb8f-f1bb6e5aa514_2192x1040.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image Source: Apple</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RdFg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbc926eb-5930-44da-9458-75adc5b36829_1024x682.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RdFg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbc926eb-5930-44da-9458-75adc5b36829_1024x682.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!RdFg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbc926eb-5930-44da-9458-75adc5b36829_1024x682.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RdFg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbc926eb-5930-44da-9458-75adc5b36829_1024x682.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RdFg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbc926eb-5930-44da-9458-75adc5b36829_1024x682.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RdFg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbc926eb-5930-44da-9458-75adc5b36829_1024x682.jpeg" width="581" height="386.955078125" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bbc926eb-5930-44da-9458-75adc5b36829_1024x682.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:682,&quot;width&quot;:1024,&quot;resizeWidth&quot;:581,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Intel Lunar Lake Core Ultra processor&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Intel Lunar Lake Core Ultra processor" title="Intel Lunar Lake Core Ultra processor" srcset="https://substackcdn.com/image/fetch/$s_!RdFg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbc926eb-5930-44da-9458-75adc5b36829_1024x682.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!RdFg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbc926eb-5930-44da-9458-75adc5b36829_1024x682.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RdFg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbc926eb-5930-44da-9458-75adc5b36829_1024x682.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RdFg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbc926eb-5930-44da-9458-75adc5b36829_1024x682.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Intel Lunar Lake with MoP [Source: <a href="https://www.pcworld.com/article/2350967/lunar-lake-deep-dive-intels-new-ai-laptop-cpu-is-utterly-different.html">PCWorld</a>]</figcaption></figure></div><p>Tightly integrating the memory and logic die sacrifices the flexibility of PCB-based DRAM (where modules can be swapped or scaled), but it saves valuable <strong>PCB area</strong>. That space can then be repurposed, for example, to fit a larger battery or to reduce the number of PCB layers, lowering assembly complexity and cost.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!94Hb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc58b599-e2c7-4602-aeec-106e6e5efb75_867x482.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!94Hb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc58b599-e2c7-4602-aeec-106e6e5efb75_867x482.jpeg 424w, https://substackcdn.com/image/fetch/$s_!94Hb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc58b599-e2c7-4602-aeec-106e6e5efb75_867x482.jpeg 848w, https://substackcdn.com/image/fetch/$s_!94Hb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc58b599-e2c7-4602-aeec-106e6e5efb75_867x482.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!94Hb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc58b599-e2c7-4602-aeec-106e6e5efb75_867x482.jpeg 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!94Hb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc58b599-e2c7-4602-aeec-106e6e5efb75_867x482.jpeg" width="867" height="482" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc58b599-e2c7-4602-aeec-106e6e5efb75_867x482.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:482,&quot;width&quot;:867,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!94Hb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc58b599-e2c7-4602-aeec-106e6e5efb75_867x482.jpeg 424w, https://substackcdn.com/image/fetch/$s_!94Hb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc58b599-e2c7-4602-aeec-106e6e5efb75_867x482.jpeg 848w, https://substackcdn.com/image/fetch/$s_!94Hb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc58b599-e2c7-4602-aeec-106e6e5efb75_867x482.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!94Hb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc58b599-e2c7-4602-aeec-106e6e5efb75_867x482.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Apple MacBook Air with M2 [Source: <a href="https://www.techpowerup.com/ssd-specs/apple-macbook-air-m2-256-gb.d1412">TechPowerUp</a>]</figcaption></figure></div><h4>Package-on-Package (PoP)</h4><p>Chips that go into phones, such as Apple&#8217;s <strong>A-series processors</strong>, Qualcomm <strong>Snapdragon,</strong> and Samsung <strong>Exynos,</strong> take integration a step further. Here, the DRAM package is physically stacked on top of the SoC. 
This <strong>PoP design</strong> tightens the coupling between compute and memory, minimizing the footprint even further.</p><p>Below is a snippet from <a href="https://www.techinsights.com/products/apq-2109-801">TechInsights&#8217; A14 Bionic package breakdown</a>, which illustrates this approach in practice:</p><blockquote><p><em>[A14 Bionic uses] package-on-package (PoP) assembly using TSMC&#8217;s integrated fan out (InFO) technology. </em></p><p><em>The <strong>top package contains four DRAM dies configured in two stacks of two dies</strong>. The memory is wire bonded to a printed wiring board (PWB) ball grid array (BGA). The BGA is connected to the bottom of the package by through molding vias (TMVs).</em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PLd-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade4abd8-5f5b-4f07-af32-0c0b6cd3eb14_1200x417.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PLd-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade4abd8-5f5b-4f07-af32-0c0b6cd3eb14_1200x417.png 424w, https://substackcdn.com/image/fetch/$s_!PLd-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade4abd8-5f5b-4f07-af32-0c0b6cd3eb14_1200x417.png 848w, https://substackcdn.com/image/fetch/$s_!PLd-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade4abd8-5f5b-4f07-af32-0c0b6cd3eb14_1200x417.png 1272w, 
https://substackcdn.com/image/fetch/$s_!PLd-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade4abd8-5f5b-4f07-af32-0c0b6cd3eb14_1200x417.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PLd-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade4abd8-5f5b-4f07-af32-0c0b6cd3eb14_1200x417.png" width="1200" height="417" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ade4abd8-5f5b-4f07-af32-0c0b6cd3eb14_1200x417.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:417,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32218,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/172834045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade4abd8-5f5b-4f07-af32-0c0b6cd3eb14_1200x417.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PLd-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade4abd8-5f5b-4f07-af32-0c0b6cd3eb14_1200x417.png 424w, https://substackcdn.com/image/fetch/$s_!PLd-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade4abd8-5f5b-4f07-af32-0c0b6cd3eb14_1200x417.png 848w, https://substackcdn.com/image/fetch/$s_!PLd-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade4abd8-5f5b-4f07-af32-0c0b6cd3eb14_1200x417.png 
1272w, https://substackcdn.com/image/fetch/$s_!PLd-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade4abd8-5f5b-4f07-af32-0c0b6cd3eb14_1200x417.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">TSMC&#8217;s Integrated Fan-Out Package on Package (InFO-PoP) [Source: <a href="https://3dfabric.tsmc.com/english/dedicatedFoundry/technology/InFO.htm">TSMC</a>]</figcaption></figure></div><h1>DRAM on Interposer (<strong>HBM</strong>)</h1><p>With HBM, the integration gets even tighter than LPDDR&#8217;s Memory-on-Package. 
Instead of sitting on the substrate (as in MoP), the compute die and DRAM stacks are placed side by side on an interposer, a setup commonly called <em>2.5D integration</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fVEI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794d0601-e74d-4296-add5-e6ae9c7f2aca_1737x708.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fVEI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794d0601-e74d-4296-add5-e6ae9c7f2aca_1737x708.png 424w, https://substackcdn.com/image/fetch/$s_!fVEI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794d0601-e74d-4296-add5-e6ae9c7f2aca_1737x708.png 848w, https://substackcdn.com/image/fetch/$s_!fVEI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794d0601-e74d-4296-add5-e6ae9c7f2aca_1737x708.png 1272w, https://substackcdn.com/image/fetch/$s_!fVEI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794d0601-e74d-4296-add5-e6ae9c7f2aca_1737x708.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fVEI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794d0601-e74d-4296-add5-e6ae9c7f2aca_1737x708.png" width="1456" height="593" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/794d0601-e74d-4296-add5-e6ae9c7f2aca_1737x708.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c294d398-84a6-49b8-9997-124bd67f984d_1737x708.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:593,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:420960,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/172834045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc294d398-84a6-49b8-9997-124bd67f984d_1737x708.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fVEI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794d0601-e74d-4296-add5-e6ae9c7f2aca_1737x708.png 424w, https://substackcdn.com/image/fetch/$s_!fVEI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794d0601-e74d-4296-add5-e6ae9c7f2aca_1737x708.png 848w, https://substackcdn.com/image/fetch/$s_!fVEI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794d0601-e74d-4296-add5-e6ae9c7f2aca_1737x708.png 1272w, https://substackcdn.com/image/fetch/$s_!fVEI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794d0601-e74d-4296-add5-e6ae9c7f2aca_1737x708.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Annotation by Chiplog. Image source: <a href="https://en.wikipedia.org/wiki/High_Bandwidth_Memory">Wikipedia</a></figcaption></figure></div><p>The biggest advantage of using an interposer is routing density. Interposers support much finer connections than substrates. The compute die and HBM stack connect to the interposer through <strong>micro-bumps</strong>, while substrates rely on larger <strong>C4</strong> (controlled collapse chip connection) bumps. To put numbers on it, micro-bump pitch is around 40 &#956;m, compared to 400 &#956;m (0.4 mm) for C4 bumps, an order of magnitude tighter. Because bumps are laid out in a two-dimensional grid, a 10x tighter pitch allows roughly 100x more connections per unit area, so you can route far more wires across the interface.</p><p>This density shows up in bandwidth. 
HBM3e provides <strong>1024 data lanes per stack</strong>, and HBM4 will double that to <strong>2048</strong>. For example, NVIDIA&#8217;s Blackwell GPU integrates <strong>eight stacks of HBM3e</strong>, giving a combined bus width of 8,192 lanes. Each stack can deliver roughly <strong>1 TB/s</strong> of bandwidth, which adds up to a staggering <strong>~8 TB/s total</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tnej!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8af3372-d6d5-4cd4-8a7c-664d6e80e29d_1467x660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tnej!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8af3372-d6d5-4cd4-8a7c-664d6e80e29d_1467x660.png 424w, https://substackcdn.com/image/fetch/$s_!Tnej!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8af3372-d6d5-4cd4-8a7c-664d6e80e29d_1467x660.png 848w, https://substackcdn.com/image/fetch/$s_!Tnej!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8af3372-d6d5-4cd4-8a7c-664d6e80e29d_1467x660.png 1272w, https://substackcdn.com/image/fetch/$s_!Tnej!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8af3372-d6d5-4cd4-8a7c-664d6e80e29d_1467x660.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tnej!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8af3372-d6d5-4cd4-8a7c-664d6e80e29d_1467x660.png" width="1456" height="655" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8af3372-d6d5-4cd4-8a7c-664d6e80e29d_1467x660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:655,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3123221,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/172834045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8af3372-d6d5-4cd4-8a7c-664d6e80e29d_1467x660.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Tnej!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8af3372-d6d5-4cd4-8a7c-664d6e80e29d_1467x660.png 424w, https://substackcdn.com/image/fetch/$s_!Tnej!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8af3372-d6d5-4cd4-8a7c-664d6e80e29d_1467x660.png 848w, https://substackcdn.com/image/fetch/$s_!Tnej!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8af3372-d6d5-4cd4-8a7c-664d6e80e29d_1467x660.png 1272w, https://substackcdn.com/image/fetch/$s_!Tnej!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8af3372-d6d5-4cd4-8a7c-664d6e80e29d_1467x660.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>DRAM on Die (3D DRAM)</h1><p>The next step beyond HBM&#8217;s 2.5D architecture is <strong>3D DRAM</strong>. At <strong>d-Matrix</strong> (<em>where I work</em>), we had <a href="https://www.d-matrix.ai/scaling-ai-inference-with-3dimc/">an exciting announcement about this</a>.</p><p>In this approach, the DRAM is attached <em>directly</em> to the logic die instead of sitting beside it on an interposer. 
The image below (from our public release) illustrates this integration.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HIFQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78220395-1ac8-4ac5-94c8-62baf7a8617b_1320x624.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HIFQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78220395-1ac8-4ac5-94c8-62baf7a8617b_1320x624.jpeg 424w, https://substackcdn.com/image/fetch/$s_!HIFQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78220395-1ac8-4ac5-94c8-62baf7a8617b_1320x624.jpeg 848w, https://substackcdn.com/image/fetch/$s_!HIFQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78220395-1ac8-4ac5-94c8-62baf7a8617b_1320x624.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!HIFQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78220395-1ac8-4ac5-94c8-62baf7a8617b_1320x624.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HIFQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78220395-1ac8-4ac5-94c8-62baf7a8617b_1320x624.jpeg" width="1320" height="624" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/78220395-1ac8-4ac5-94c8-62baf7a8617b_1320x624.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:624,&quot;width&quot;:1320,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;d-Matrix Pavehawk&quot;,&quot;title&quot;:&quot;pavehawk-3dimc-optimized&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="d-Matrix Pavehawk" title="pavehawk-3dimc-optimized" srcset="https://substackcdn.com/image/fetch/$s_!HIFQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78220395-1ac8-4ac5-94c8-62baf7a8617b_1320x624.jpeg 424w, https://substackcdn.com/image/fetch/$s_!HIFQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78220395-1ac8-4ac5-94c8-62baf7a8617b_1320x624.jpeg 848w, https://substackcdn.com/image/fetch/$s_!HIFQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78220395-1ac8-4ac5-94c8-62baf7a8617b_1320x624.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!HIFQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78220395-1ac8-4ac5-94c8-62baf7a8617b_1320x624.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">3D DRAM [Image source: <a href="https://www.d-matrix.ai/scaling-ai-inference-with-3dimc/">d-Matrix</a>]</figcaption></figure></div><p>As some of the coverage around our announcement noted, 3D DRAM holds tremendous promise for pushing bandwidth and efficiency even further.</p><blockquote><h4><a href="https://siliconangle.com/2025/08/25/d-matrix-reveals-plan-scale-ais-memory-wall-3d-dram-based-chip-architecture/">Silicon Angle</a></h4><p><em>According to Bhoja [CTO], by combining 3D DRAM with its specialized interconnects, Raptor will be able to smash through the memory wall and unlock significant gains in terms of AI performance and cost-efficiency. 
He said the company is targeting an ambitious 10-times improvement in memory bandwidth and 10-times better energy efficiency with Raptor when running inference workloads, compared with existing HBM4 memory technology.</em></p><p><em>&#8220;These are not incremental gains &#8212; they are step-function improvements that redefine what&#8217;s possible for inference at scale,&#8221; Bhoja said.</em></p></blockquote><p></p><blockquote><h4><a href="https://blocksandfiles.com/2025/09/02/d-matrix-building-high-bandwidth-memory-rival-for-ai-inference/">Blocks and Files</a></h4><p><em>&#8220;Our next-generation architecture, Raptor, will incorporate 3DIMC into its design &#8211; benefiting from what we and our customers learn from testing on Pavehawk. By stacking memory vertically and integrating tightly with compute chiplets, Raptor promises to break through the memory wall and unlock entirely new levels of performance and TCO.&#8221;</em></p></blockquote><p>I&#8217;ll leave it at that for now. For the latest updates, keep an eye on <a href="https://www.linkedin.com/company/d-matrix/posts/?feedView=all">d-Matrix</a> or follow me on <a href="https://www.linkedin.com/in/suganesh/">LinkedIn</a> and <a href="https://x.com/subbdue">X</a>.</p><h1>DRAM <em>in</em> Die (eDRAM)</h1><p>The last chip I worked on with <strong>embedded DRAM (eDRAM)</strong> was back in the late 2000s and early 2010s, when IBM was building chips on 32nm and larger nodes.</p><p>In IBM&#8217;s eDRAM, the DRAM cells are part of the logic die itself (similar to how SRAM is integrated) and are built using <strong>deep trench capacitors</strong>. 
That means the DRAM cells are fabricated in the same process node as the rest of the chip.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uaZU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e22bb7-e6e9-4092-99a7-55b1a6784b65_300x207.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uaZU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e22bb7-e6e9-4092-99a7-55b1a6784b65_300x207.png 424w, https://substackcdn.com/image/fetch/$s_!uaZU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e22bb7-e6e9-4092-99a7-55b1a6784b65_300x207.png 848w, https://substackcdn.com/image/fetch/$s_!uaZU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e22bb7-e6e9-4092-99a7-55b1a6784b65_300x207.png 1272w, https://substackcdn.com/image/fetch/$s_!uaZU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e22bb7-e6e9-4092-99a7-55b1a6784b65_300x207.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uaZU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e22bb7-e6e9-4092-99a7-55b1a6784b65_300x207.png" width="300" height="207" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3e22bb7-e6e9-4092-99a7-55b1a6784b65_300x207.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:207,&quot;width&quot;:300,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;IBM 45nm_branded&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="IBM 45nm_branded" title="IBM 45nm_branded" srcset="https://substackcdn.com/image/fetch/$s_!uaZU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e22bb7-e6e9-4092-99a7-55b1a6784b65_300x207.png 424w, https://substackcdn.com/image/fetch/$s_!uaZU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e22bb7-e6e9-4092-99a7-55b1a6784b65_300x207.png 848w, https://substackcdn.com/image/fetch/$s_!uaZU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e22bb7-e6e9-4092-99a7-55b1a6784b65_300x207.png 1272w, https://substackcdn.com/image/fetch/$s_!uaZU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e22bb7-e6e9-4092-99a7-55b1a6784b65_300x207.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Embedded DRAM in IBM Power 7+ (32-nm) [Image Source: <a href="https://web.archive.org/web/20150425221620/http://www.chipworks.com/en/technical-competitive-analysis/resources/blog/intel-e-dram-shows-up-in-the-wild/">Chipworks</a>]</figcaption></figure></div><p>IBM&#8217;s <strong>Power7</strong> and 
<strong>Power8</strong> processors used eDRAM for their L3 cache.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HxWM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bd3af8-73f1-4d5f-ad01-8988ef48c538_1644x1086.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HxWM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bd3af8-73f1-4d5f-ad01-8988ef48c538_1644x1086.png 424w, https://substackcdn.com/image/fetch/$s_!HxWM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bd3af8-73f1-4d5f-ad01-8988ef48c538_1644x1086.png 848w, https://substackcdn.com/image/fetch/$s_!HxWM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bd3af8-73f1-4d5f-ad01-8988ef48c538_1644x1086.png 1272w, https://substackcdn.com/image/fetch/$s_!HxWM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bd3af8-73f1-4d5f-ad01-8988ef48c538_1644x1086.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HxWM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bd3af8-73f1-4d5f-ad01-8988ef48c538_1644x1086.png" width="1456" height="962" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/29bd3af8-73f1-4d5f-ad01-8988ef48c538_1644x1086.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:962,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2135388,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/172834045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bd3af8-73f1-4d5f-ad01-8988ef48c538_1644x1086.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HxWM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bd3af8-73f1-4d5f-ad01-8988ef48c538_1644x1086.png 424w, https://substackcdn.com/image/fetch/$s_!HxWM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bd3af8-73f1-4d5f-ad01-8988ef48c538_1644x1086.png 848w, https://substackcdn.com/image/fetch/$s_!HxWM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bd3af8-73f1-4d5f-ad01-8988ef48c538_1644x1086.png 1272w, https://substackcdn.com/image/fetch/$s_!HxWM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29bd3af8-73f1-4d5f-ad01-8988ef48c538_1644x1086.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">L3 cache implemented using eDRAM [Source: <a href="https://www.redbooks.ibm.com/redpapers/pdfs/redp4639.pdf">IBM Official</a>]</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yK-6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d5fe9ad-f93f-4b8b-b780-cd82e9d97b15_1636x1040.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yK-6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d5fe9ad-f93f-4b8b-b780-cd82e9d97b15_1636x1040.png 424w, 
https://substackcdn.com/image/fetch/$s_!yK-6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d5fe9ad-f93f-4b8b-b780-cd82e9d97b15_1636x1040.png 848w, https://substackcdn.com/image/fetch/$s_!yK-6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d5fe9ad-f93f-4b8b-b780-cd82e9d97b15_1636x1040.png 1272w, https://substackcdn.com/image/fetch/$s_!yK-6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d5fe9ad-f93f-4b8b-b780-cd82e9d97b15_1636x1040.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yK-6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d5fe9ad-f93f-4b8b-b780-cd82e9d97b15_1636x1040.png" width="613" height="389.8612637362637" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d5fe9ad-f93f-4b8b-b780-cd82e9d97b15_1636x1040.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:926,&quot;width&quot;:1456,&quot;resizeWidth&quot;:613,&quot;bytes&quot;:258982,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/172834045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d5fe9ad-f93f-4b8b-b780-cd82e9d97b15_1636x1040.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!yK-6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d5fe9ad-f93f-4b8b-b780-cd82e9d97b15_1636x1040.png 424w, https://substackcdn.com/image/fetch/$s_!yK-6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d5fe9ad-f93f-4b8b-b780-cd82e9d97b15_1636x1040.png 848w, https://substackcdn.com/image/fetch/$s_!yK-6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d5fe9ad-f93f-4b8b-b780-cd82e9d97b15_1636x1040.png 1272w, https://substackcdn.com/image/fetch/$s_!yK-6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d5fe9ad-f93f-4b8b-b780-cd82e9d97b15_1636x1040.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: <a href="https://www.redbooks.ibm.com/redpapers/pdfs/redp4639.pdf">IBM Official</a></figcaption></figure></div><p>You might also remember it from Microsoft&#8217;s <strong>Xbox 360</strong>, which used eDRAM for graphics. Intel&#8217;s <strong>Haswell</strong> generation also advertised eDRAM, though in that case it was implemented differently: the eDRAM was a separate chip integrated with the CPU through a <strong>Multi-Chip Module (MCM)</strong> approach.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Cvk8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e7cb1df-bcb7-4b6b-90b1-eee7932ecb9a_500x480.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Cvk8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e7cb1df-bcb7-4b6b-90b1-eee7932ecb9a_500x480.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Cvk8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e7cb1df-bcb7-4b6b-90b1-eee7932ecb9a_500x480.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Cvk8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e7cb1df-bcb7-4b6b-90b1-eee7932ecb9a_500x480.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!Cvk8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e7cb1df-bcb7-4b6b-90b1-eee7932ecb9a_500x480.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Cvk8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e7cb1df-bcb7-4b6b-90b1-eee7932ecb9a_500x480.jpeg" width="500" height="480" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e7cb1df-bcb7-4b6b-90b1-eee7932ecb9a_500x480.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:480,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Cvk8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e7cb1df-bcb7-4b6b-90b1-eee7932ecb9a_500x480.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Cvk8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e7cb1df-bcb7-4b6b-90b1-eee7932ecb9a_500x480.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Cvk8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e7cb1df-bcb7-4b6b-90b1-eee7932ecb9a_500x480.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!Cvk8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e7cb1df-bcb7-4b6b-90b1-eee7932ecb9a_500x480.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Intel Haswell with eDRAM outside the Processor [Image Source: <a href="https://www.eetimes.com/intels-embedded-dram-new-era-of-cache-memory/?_ga">EETimes</a>]</figcaption></figure></div><p class="button-wrapper" 
data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/p/5-different-ways-dram-and-compute?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/p/5-different-ways-dram-and-compute?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h1>In a nutshell</h1><p>This diagram summarizes the types of DRAM integration we&#8217;ve discussed so far.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aPDY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6d9449-c9da-47d9-aa22-1245d97b6c65_4084x1364.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aPDY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6d9449-c9da-47d9-aa22-1245d97b6c65_4084x1364.png 424w, https://substackcdn.com/image/fetch/$s_!aPDY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6d9449-c9da-47d9-aa22-1245d97b6c65_4084x1364.png 848w, https://substackcdn.com/image/fetch/$s_!aPDY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6d9449-c9da-47d9-aa22-1245d97b6c65_4084x1364.png 1272w, https://substackcdn.com/image/fetch/$s_!aPDY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6d9449-c9da-47d9-aa22-1245d97b6c65_4084x1364.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!aPDY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6d9449-c9da-47d9-aa22-1245d97b6c65_4084x1364.png" width="1456" height="486" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e6d9449-c9da-47d9-aa22-1245d97b6c65_4084x1364.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/84a8a244-b594-4439-a591-d124fb7a7495_4084x1364.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:486,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aPDY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6d9449-c9da-47d9-aa22-1245d97b6c65_4084x1364.png 424w, https://substackcdn.com/image/fetch/$s_!aPDY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6d9449-c9da-47d9-aa22-1245d97b6c65_4084x1364.png 848w, https://substackcdn.com/image/fetch/$s_!aPDY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6d9449-c9da-47d9-aa22-1245d97b6c65_4084x1364.png 1272w, https://substackcdn.com/image/fetch/$s_!aPDY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6d9449-c9da-47d9-aa22-1245d97b6c65_4084x1364.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Summary of DRAM integrations</figcaption></figure></div><div><hr></div><h1><code>&#128274; </code>Hybrid bonding, Grace, Hopper, Blackwell </h1><p>In the members-only section, we&#8217;ll dive into:</p><ul><li><p>A brief comment on <strong>hybrid bonding</strong>.</p></li><li><p>A <strong>deep dive into NVIDIA Grace CPU&#8217;s LPDDR5X memory complex</strong>. How NVIDIA is exercising its special relationship with Micron, and how the motherboards are evolving from the Grace-Hopper (GH) to Grace-Blackwell (GB) systems.</p></li><li><p>A <strong>comparison of NVIDIA Hopper vs. 
Blackwell HBM complexes</strong>.</p></li></ul><p><em>Chiplog is a member-supported newsletter. Right now memberships are <strong>25% off.</strong> That&#8217;s <strong>$74/year</strong> (~$6/month).</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/fall25&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/fall25"><span>Subscribe</span></a></p>
      <p>
          <a href="https://www.chiplog.io/p/5-different-ways-dram-and-compute">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Fundamental guide to understanding DRAM performance and timing parameters]]></title><description><![CDATA[How DRAM timing parameters affect HBM, LPDDR, GDDR bandwidth. How to get the most performance out of your memory.]]></description><link>https://www.chiplog.io/p/fundamental-guide-to-understanding-880</link><guid isPermaLink="false">https://www.chiplog.io/p/fundamental-guide-to-understanding-880</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Mon, 25 Aug 2025 08:27:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!vA4r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6810e126-d2eb-4a23-b9f0-76a7846441fd_4088x3164.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In my <a href="https://www.chiplog.io/p/fundamental-guide-to-understanding">last post</a>, I stated that in order to design a robust memory subsystem, you must understand <strong>3 key dimensions of the DRAM</strong>:</p><ol><li><p><em><s>Physical</s></em><s> </s><em><s>Structure </s>- </em><a href="https://www.chiplog.io/p/fundamental-guide-to-understanding">We tackled this last time</a>, breaking down how DRAM is organized at the cell, bank, and die levels.</p></li><li><p><em>Timing Parameters - </em>Reading from and writing to DRAM is like a carefully choreographed dance. <strong>Timing parameters </strong>dictate the rules that have to be followed while accessing DRAM and <strong>have a big effect on performance</strong>.</p></li><li><p><em>Initialization and Calibration - </em>Over its lifetime, a DRAM device will be subject to variations in <strong>Process, Voltage </strong>and<strong> Temperature</strong> (PVT). 
Memory controllers use special circuits and algorithms to periodically tune the interface so it runs reliably in all conditions.</p></li></ol><p>This post focuses on<strong> Dimension #2: DRAM timing parameters and their effect on performance.</strong> We&#8217;ll approach the topic from first principles, because this foundation is essential when deciding:</p><ul><li><p>For <em>ML architectures</em>, how best to store and access weights, activations, embedding tables, KV caches, and other critical data structures.</p></li><li><p>For <em>cybersecurity and networking architectures</em>, how to store hash tables.</p></li><li><p>For <em>compute offload</em>, how to design algorithms that read from memory efficiently and maximize performance.</p></li></ul><p>Specifically, we&#8217;ll cover:</p><ul><li><p>The difference between <strong>bandwidth</strong> and <strong>throughput</strong>.</p></li><li><p>A systematic exploration of the most important DRAM <strong>timing parameters</strong>, and <strong>how they shape the effective bandwidth</strong> you can extract from modern memories like HBM, LPDDR, and GDDR.</p></li><li><p>Practical insights into what makes a <strong>good memory access pattern</strong> versus a <strong>bad one</strong>.</p></li><li><p><strong>How to develop an intuition</strong> for designing an efficient data storage architecture for your application.</p></li></ul><blockquote><div><hr></div><h4><code>&#128274; </code><em>For members:</em></h4><ul><li><p>We&#8217;ll take a look at <strong>two more important timing parameters</strong> - tFAW and tWTR, which play a crucial role in performance.</p></li><li><p>We&#8217;ll take a <strong>brief look at how memory controllers work</strong>, and the structures at your disposal to help maximize throughput. 
</p></li><li><p><strong>How</strong> companies like <strong>Meta evaluate and study the performance</strong> of their DRAM architecture.</p></li><li><p>Tools and strategies that are used to study performance.</p></li><li><p>Common mistakes made during performance estimation.</p></li></ul><p><em>Chiplog is a member-supported newsletter. Right now memberships are <strong>25% off.</strong> That&#8217;s $74/year (~$6/month).</em></p><p><a href="https://www.chiplog.io/summer25">Subscribe now</a></p><div><hr></div></blockquote><h1>Bandwidth vs Throughput</h1><p><strong>Bandwidth is the theoretical peak data rate</strong> that a memory can achieve. For example, the maximum bandwidth of an LPDDR5 <a href="https://www.chiplog.io/i/170018464/memory-channels">memory channel</a> is <code>~102Gbps</code>. </p><p>Each memory channel has 16 data lanes<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, with each lane capable of supporting 6400Mbps. So <code>6400Mbps * 16 = 102,400Mbps</code>, or <code>~102Gbps</code> (12.8GB/s). 
This is the raw bandwidth.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rt0U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff12bb6fe-25cf-48a6-a737-67a02d5efea6_1704x1484.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rt0U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff12bb6fe-25cf-48a6-a737-67a02d5efea6_1704x1484.png 424w, https://substackcdn.com/image/fetch/$s_!rt0U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff12bb6fe-25cf-48a6-a737-67a02d5efea6_1704x1484.png 848w, https://substackcdn.com/image/fetch/$s_!rt0U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff12bb6fe-25cf-48a6-a737-67a02d5efea6_1704x1484.png 1272w, https://substackcdn.com/image/fetch/$s_!rt0U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff12bb6fe-25cf-48a6-a737-67a02d5efea6_1704x1484.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rt0U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff12bb6fe-25cf-48a6-a737-67a02d5efea6_1704x1484.png" width="397" height="345.739010989011" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f12bb6fe-25cf-48a6-a737-67a02d5efea6_1704x1484.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1268,&quot;width&quot;:1456,&quot;resizeWidth&quot;:397,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rt0U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff12bb6fe-25cf-48a6-a737-67a02d5efea6_1704x1484.png 424w, https://substackcdn.com/image/fetch/$s_!rt0U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff12bb6fe-25cf-48a6-a737-67a02d5efea6_1704x1484.png 848w, https://substackcdn.com/image/fetch/$s_!rt0U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff12bb6fe-25cf-48a6-a737-67a02d5efea6_1704x1484.png 1272w, https://substackcdn.com/image/fetch/$s_!rt0U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff12bb6fe-25cf-48a6-a737-67a02d5efea6_1704x1484.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In reality, this bandwidth is not attainable. The <strong>actual data rate that can be achieved is called throughput. </strong>To understand why the attainable throughput is less than the maximum bandwidth, we need to understand <strong>DRAM timing parameters</strong>.</p><h1>What are timing parameters</h1><p>Due to the physics governing DRAM devices, certain <strong>timing constraints or delays have to be observed while performing operations on the memory</strong>. The effect of timing parameters on memory bandwidth is best understood in the context of specific memory operations. We&#8217;ll next look at the <em>Refresh</em> command and a <em>Read</em> operation.</p><h2>REFRESH command</h2><p>At the lowest level, a DRAM cell is essentially a capacitor that holds charge, with a transistor acting as a switch. Since the capacitor leaks charge over time, it <strong>has to be </strong><em><strong>Refreshed</strong></em><strong> periodically to ensure data is not lost</strong>. 
</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kKEc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa999a3a-62bd-4808-b111-edbf5ef19296_1248x780.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kKEc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa999a3a-62bd-4808-b111-edbf5ef19296_1248x780.png 424w, https://substackcdn.com/image/fetch/$s_!kKEc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa999a3a-62bd-4808-b111-edbf5ef19296_1248x780.png 848w, https://substackcdn.com/image/fetch/$s_!kKEc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa999a3a-62bd-4808-b111-edbf5ef19296_1248x780.png 1272w, https://substackcdn.com/image/fetch/$s_!kKEc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa999a3a-62bd-4808-b111-edbf5ef19296_1248x780.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kKEc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa999a3a-62bd-4808-b111-edbf5ef19296_1248x780.png" width="353" height="220.625" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fa999a3a-62bd-4808-b111-edbf5ef19296_1248x780.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:780,&quot;width&quot;:1248,&quot;resizeWidth&quot;:353,&quot;bytes&quot;:689888,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/171200581?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa999a3a-62bd-4808-b111-edbf5ef19296_1248x780.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kKEc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa999a3a-62bd-4808-b111-edbf5ef19296_1248x780.png 424w, https://substackcdn.com/image/fetch/$s_!kKEc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa999a3a-62bd-4808-b111-edbf5ef19296_1248x780.png 848w, https://substackcdn.com/image/fetch/$s_!kKEc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa999a3a-62bd-4808-b111-edbf5ef19296_1248x780.png 1272w, https://substackcdn.com/image/fetch/$s_!kKEc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa999a3a-62bd-4808-b111-edbf5ef19296_1248x780.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>There are 2 main timing parameters related to the refresh operation.</p><ol><li><p><em><strong>tREFI</strong></em><strong> </strong><em>(<strong>Ref</strong>resh 
<strong>I</strong>nterval)</em><strong>:</strong> This parameter specifies <strong>how often</strong> the memory has to be refreshed so that data is not lost<em>.</em></p></li><li><p><em><strong>tRFC</strong> (Refresh Cycle Time)</em>: This parameter specifies for how much time the DRAM is <strong>unavailable</strong> once it enters Refresh.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mnv5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff132455-eb17-4126-a74f-cd538f709574_2114x318.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mnv5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff132455-eb17-4126-a74f-cd538f709574_2114x318.png 424w, https://substackcdn.com/image/fetch/$s_!mnv5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff132455-eb17-4126-a74f-cd538f709574_2114x318.png 848w, https://substackcdn.com/image/fetch/$s_!mnv5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff132455-eb17-4126-a74f-cd538f709574_2114x318.png 1272w, https://substackcdn.com/image/fetch/$s_!mnv5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff132455-eb17-4126-a74f-cd538f709574_2114x318.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mnv5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff132455-eb17-4126-a74f-cd538f709574_2114x318.png" width="1456" height="219" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff132455-eb17-4126-a74f-cd538f709574_2114x318.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:219,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 1: tRRD timing&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 1: tRRD timing" title="Figure 1: tRRD timing" srcset="https://substackcdn.com/image/fetch/$s_!mnv5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff132455-eb17-4126-a74f-cd538f709574_2114x318.png 424w, https://substackcdn.com/image/fetch/$s_!mnv5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff132455-eb17-4126-a74f-cd538f709574_2114x318.png 848w, https://substackcdn.com/image/fetch/$s_!mnv5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff132455-eb17-4126-a74f-cd538f709574_2114x318.png 1272w, https://substackcdn.com/image/fetch/$s_!mnv5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff132455-eb17-4126-a74f-cd538f709574_2114x318.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">REFRESH timing [Source: Micron DDR4 datasheet]</figcaption></figure></div><p>For example, in the case of LPDDR5:</p><ul><li><p>The memory needs to be refreshed every <strong>3.9us</strong> (<code>tREFI</code>).</p></li><li><p>Once the memory enters refresh it is unavailable for 
<strong>~200ns</strong> (<code>tRFC</code>).</p></li></ul><blockquote><p><em>So, if the memory is unavailable for </em><code>~200ns</code><em> every </em><code>3.9us</code><em> (i.e., </em><code>3900ns</code><em>), then <strong>we lose about 5% of the bandwidth JUST to Refresh.</strong></em></p><p><code>200/3900 = ~5.1%</code></p></blockquote><h2>Simple READ operation</h2><p>Going up one level, DRAM cells are arranged in a <strong>grid of Rows and Columns</strong>. One such grid is called a <strong>memory bank</strong>.</p><p>From the outside, reading from a memory bank may look like an atomic operation. In reality, it is a <strong>multi-step process</strong> between the processor (the memory controller, to be precise) and the DRAM. The 3 steps typically involved in a READ operation are:</p><ol><li><p><em><strong>Activate (ACT)</strong></em><strong>:</strong> A portion of the read address identifies which <em>row</em> within the <em>bank</em> the data is stored in. The processor first sends an <strong>ACT</strong> command to the DRAM to transfer the data from that row&#8217;s memory cells into the bank&#8217;s <strong>Sense Amps</strong>.</p></li><li><p><em><strong>Column-Address-Strobe (CAS)</strong></em>: </p><ol><li><p>The activation process takes some time, so the column address can be sent to the DRAM only after a delay of <em>tRCD (Row-to-Column Delay)</em>. </p></li><li><p>Then, after a latency of <em>tCL (CAS Latency)</em>, the data is returned.</p></li></ol></li><li><p><em><strong>Precharge (PRE)</strong></em>: </p><ol><li><p>Before accessing another row within the same bank, the currently open row has to be deactivated. This process is called <em>Precharging</em>. </p></li><li><p>Once the PRE command is issued, the bank has to wait for <em>tRP</em> before another row can be activated.</p></li><li><p>Also, care has to be taken to ensure that the time between two ACT commands to the same bank is at least <em>tRC</em>. 
This is the row cycle time.</p></li></ol></li></ol><p>These timing parameters are illustrated below. As you can see, a <strong>basic read operation involves a number of delays and latencies</strong>. </p><blockquote><p><em>In the context of LPDDR5 running at 6400Mbps, <strong>tRCD is 18ns</strong> and <strong>tCL is ~22ns</strong>. After all that setup, it <strong>takes just 2.5ns to read the data back from that column</strong>. So roughly 40ns of setup buys 2.5ns of data transfer: the overhead is substantial. </em></p><p><em>In the upcoming section we&#8217;ll see how <strong>hiding these latencies is crucial</strong> to extract maximum performance out of the DRAM.</em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2f3l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facf04d59-09ef-4f09-b9b4-ca8dcd3729d3_3008x1328.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2f3l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facf04d59-09ef-4f09-b9b4-ca8dcd3729d3_3008x1328.png 424w, https://substackcdn.com/image/fetch/$s_!2f3l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facf04d59-09ef-4f09-b9b4-ca8dcd3729d3_3008x1328.png 848w, https://substackcdn.com/image/fetch/$s_!2f3l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facf04d59-09ef-4f09-b9b4-ca8dcd3729d3_3008x1328.png 1272w, https://substackcdn.com/image/fetch/$s_!2f3l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facf04d59-09ef-4f09-b9b4-ca8dcd3729d3_3008x1328.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2f3l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facf04d59-09ef-4f09-b9b4-ca8dcd3729d3_3008x1328.png" width="1456" height="643" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/acf04d59-09ef-4f09-b9b4-ca8dcd3729d3_3008x1328.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0493a37c-2a4c-47ea-8937-b347a8cc3bec_3008x1328.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:643,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2f3l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facf04d59-09ef-4f09-b9b4-ca8dcd3729d3_3008x1328.png 424w, https://substackcdn.com/image/fetch/$s_!2f3l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facf04d59-09ef-4f09-b9b4-ca8dcd3729d3_3008x1328.png 848w, https://substackcdn.com/image/fetch/$s_!2f3l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facf04d59-09ef-4f09-b9b4-ca8dcd3729d3_3008x1328.png 1272w, https://substackcdn.com/image/fetch/$s_!2f3l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Facf04d59-09ef-4f09-b9b4-ca8dcd3729d3_3008x1328.png 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Timing parameters involved in a READ operation</figcaption></figure></div><h2>Hiding latencies to maximize throughput</h2><p>Let&#8217;s stick with READ operations and examine <strong>4 different back-to-back read scenarios</strong>, and how the <strong>performance <a href="https://english.stackexchange.com/questions/46496/fluctuates-widely-or-wildly#:~:text=The%20two%20originals%20likely%20being:%20*%20varies,by%20large%20and%20small%20values%2C%20seemingly%20unpredictable.">varies widely</a> for each of them</strong>.</p><h3>Page hits vs page miss</h3><ul><li><p><em>Scenario 1:</em> Back-to-back reads to the 
<strong>same bank</strong> but <strong>different rows</strong>. </p></li><li><p><em>Scenario 2:</em> Back-to-back reads to the <strong>same bank</strong> and <strong>same row</strong>.</p></li></ul><p>In <em>scenario 1</em>, since the relevant data is located in 2 different rows, the first row has to be deactivated after it is accessed, and the second row has to go through the activation process all over again. This scenario is also called a <strong>Page Miss</strong>.</p><p>But in <em>scenario 2</em>, both reads have the same row address; only the column address is different. Here the row activation process happens only once, after which the processor sends two CAS (column-address-strobe) commands and the data is read back. This scenario is called a <strong>Page Hit</strong>.</p><blockquote><p>In scenario 2, the two CAS commands are separated by a timing parameter called <em>tCCD (column-to-column delay)</em>.</p></blockquote><p>As evident from the figure below, in the case of a <strong>page miss it takes a lot longer to fetch the data from the DRAM</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!K_UL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd618d7-42fc-4195-af43-45c23bff76d7_5088x2884.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!K_UL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd618d7-42fc-4195-af43-45c23bff76d7_5088x2884.png 424w, https://substackcdn.com/image/fetch/$s_!K_UL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd618d7-42fc-4195-af43-45c23bff76d7_5088x2884.png 848w, 
https://substackcdn.com/image/fetch/$s_!K_UL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd618d7-42fc-4195-af43-45c23bff76d7_5088x2884.png 1272w, https://substackcdn.com/image/fetch/$s_!K_UL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd618d7-42fc-4195-af43-45c23bff76d7_5088x2884.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!K_UL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd618d7-42fc-4195-af43-45c23bff76d7_5088x2884.png" width="1456" height="825" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dbd618d7-42fc-4195-af43-45c23bff76d7_5088x2884.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/45ea9582-bae0-40ee-8cd4-053dcfa3ce1f_5088x2884.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!K_UL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd618d7-42fc-4195-af43-45c23bff76d7_5088x2884.png 424w, https://substackcdn.com/image/fetch/$s_!K_UL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd618d7-42fc-4195-af43-45c23bff76d7_5088x2884.png 848w, 
https://substackcdn.com/image/fetch/$s_!K_UL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd618d7-42fc-4195-af43-45c23bff76d7_5088x2884.png 1272w, https://substackcdn.com/image/fetch/$s_!K_UL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd618d7-42fc-4195-af43-45c23bff76d7_5088x2884.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Page hit versus page miss</figcaption></figure></div><h3>Bank Groups vs Banks</h3><p>While page hits are important, 
accessing the same bank in back-to-back reads is not good. We can do better to hide some of these latencies, especially the <em>tRCD</em>.</p><p>Fortunately for us, the DRAM die is made up of not just one bank but a collection of banks, which are arranged in <strong>Bank Groups</strong>. Next, we'll look at how bank groups help us improve performance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jnRq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06988283-71b0-485e-a373-258e3782882a_1768x1884.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jnRq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06988283-71b0-485e-a373-258e3782882a_1768x1884.png 424w, https://substackcdn.com/image/fetch/$s_!jnRq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06988283-71b0-485e-a373-258e3782882a_1768x1884.png 848w, https://substackcdn.com/image/fetch/$s_!jnRq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06988283-71b0-485e-a373-258e3782882a_1768x1884.png 1272w, https://substackcdn.com/image/fetch/$s_!jnRq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06988283-71b0-485e-a373-258e3782882a_1768x1884.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jnRq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06988283-71b0-485e-a373-258e3782882a_1768x1884.png" width="374" height="398.65934065934067" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06988283-71b0-485e-a373-258e3782882a_1768x1884.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f2ba6a0a-7c69-4069-9908-17930de7b70a_1768x1884.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1552,&quot;width&quot;:1456,&quot;resizeWidth&quot;:374,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!jnRq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06988283-71b0-485e-a373-258e3782882a_1768x1884.png 424w, https://substackcdn.com/image/fetch/$s_!jnRq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06988283-71b0-485e-a373-258e3782882a_1768x1884.png 848w, https://substackcdn.com/image/fetch/$s_!jnRq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06988283-71b0-485e-a373-258e3782882a_1768x1884.png 1272w, https://substackcdn.com/image/fetch/$s_!jnRq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06988283-71b0-485e-a373-258e3782882a_1768x1884.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" 
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Banks and Bank Groups</figcaption></figure></div><p><strong>Each bank has a set of Sense Amps</strong>. Banks are like different books: you are allowed to keep a page (i.e., a row) open in each bank. So, when the memory controller sees two reads to different banks, </p><ul><li><p>It issues an ACTIVATE to the first bank, and <strong>while waiting for </strong><em><strong>tRCD</strong></em> (Row-to-Column Delay) it issues a second ACTIVATE to the second bank and prepares it for the second read. 
By issuing back-to-back reads to different banks, you give the memory controller the opportunity to <strong>hide the </strong><em><strong>tRCD</strong></em><strong> latency for the second bank</strong>.</p></li><li><p>Similarly, once the controller issues the CAS (column address strobe) for the first read, it issues the second CAS to the second bank <strong>while waiting for </strong><em><strong>tCL</strong></em> (Read Latency). </p></li><li><p>This process of doing things in parallel reduces the total latency to finish both reads.</p></li></ul><p>This <strong>principle of hiding latencies is crucial</strong> to extracting as much performance as possible out of the DRAM, and fundamentally speaking, the <strong>data has to </strong><em><strong>first</strong></em><strong> be stored in a suitable fashion to help the controller hide the latencies</strong>.</p><p>But banks and bank groups are not equal. Let&#8217;s examine the next two scenarios:</p><ul><li><p><em>Scenario 3</em>: Back-to-back reads to different banks in the <strong>same bank group</strong></p></li><li><p><em>Scenario 4</em>: Back-to-back reads to different banks in <strong>different bank groups</strong></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vA4r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6810e126-d2eb-4a23-b9f0-76a7846441fd_4088x3164.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vA4r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6810e126-d2eb-4a23-b9f0-76a7846441fd_4088x3164.png 424w, 
https://substackcdn.com/image/fetch/$s_!vA4r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6810e126-d2eb-4a23-b9f0-76a7846441fd_4088x3164.png 848w, https://substackcdn.com/image/fetch/$s_!vA4r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6810e126-d2eb-4a23-b9f0-76a7846441fd_4088x3164.png 1272w, https://substackcdn.com/image/fetch/$s_!vA4r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6810e126-d2eb-4a23-b9f0-76a7846441fd_4088x3164.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vA4r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6810e126-d2eb-4a23-b9f0-76a7846441fd_4088x3164.png" width="1456" height="1127" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6810e126-d2eb-4a23-b9f0-76a7846441fd_4088x3164.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4808a991-c0ca-454b-b4f7-dbe2229faf04_4088x3164.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1127,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vA4r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6810e126-d2eb-4a23-b9f0-76a7846441fd_4088x3164.png 424w, 
https://substackcdn.com/image/fetch/$s_!vA4r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6810e126-d2eb-4a23-b9f0-76a7846441fd_4088x3164.png 848w, https://substackcdn.com/image/fetch/$s_!vA4r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6810e126-d2eb-4a23-b9f0-76a7846441fd_4088x3164.png 1272w, https://substackcdn.com/image/fetch/$s_!vA4r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6810e126-d2eb-4a23-b9f0-76a7846441fd_4088x3164.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Effects of <em>tCCD_L</em> versus <em>tCCD_S</em></figcaption></figure></div><p>In <em>scenario 3</em>, since the back-to-back reads go to different banks of the <strong>same bank group</strong>, the two reads have to be separated by the <em>tCCD_L</em> (Column-to-column-<strong>Long</strong>) timing parameter (<em>which is 4 clock cycles in LPDDR5</em>).</p><p>In <em>scenario 4</em>, the back-to-back reads go to <strong>different bank groups</strong>. In this case, the CAS (column-address-strobe) commands only need to be separated by <em>tCCD_S</em> (Column-to-column-<strong>Short</strong>). So <strong>scenario 4 ends up being the best case for DRAM performance</strong>. </p><blockquote><p><em>For LPDDR5 operating at 6400 Mbps, <strong>tCCD_S</strong> is <strong>2 clock cycles (2.5 ns)</strong>. That happens to be exactly the amount of time needed to read out the data from a single column. This detail is critical. To keep the data bus fully occupied and extract maximum performance from the memory, two conditions have to be met:</em></p><ol><li><p><em>Successive reads should target different bank groups.</em></p></li><li><p><em>A high proportion of the accesses should be page hits.</em></p></li></ol><p><em>With this setup, you can first issue a sequence of ACTIVATE commands, spaced <strong>tRRD_S</strong> apart, to open rows across multiple bank groups. Once those pages are open, you can issue a burst of reads, spaced <strong>tCCD_S</strong> apart, to stream data continuously from the open pages. 
</em></p><p><em>In this best-case scenario, the memory bus stays busy almost all the time, allowing you to capture roughly <strong>93% of the theoretical bandwidth</strong>.</em></p></blockquote><h2>Summary so far &#8230;</h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3AKM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ca8b76-713d-4ba5-b28a-0befdbe52679_2648x564.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3AKM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ca8b76-713d-4ba5-b28a-0befdbe52679_2648x564.png 424w, https://substackcdn.com/image/fetch/$s_!3AKM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ca8b76-713d-4ba5-b28a-0befdbe52679_2648x564.png 848w, https://substackcdn.com/image/fetch/$s_!3AKM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ca8b76-713d-4ba5-b28a-0befdbe52679_2648x564.png 1272w, https://substackcdn.com/image/fetch/$s_!3AKM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ca8b76-713d-4ba5-b28a-0befdbe52679_2648x564.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3AKM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ca8b76-713d-4ba5-b28a-0befdbe52679_2648x564.png" width="1456" height="310" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/57ca8b76-713d-4ba5-b28a-0befdbe52679_2648x564.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/682e07af-734f-4288-a4d5-d9b213dd3569_2648x564.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:310,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3AKM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ca8b76-713d-4ba5-b28a-0befdbe52679_2648x564.png 424w, https://substackcdn.com/image/fetch/$s_!3AKM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ca8b76-713d-4ba5-b28a-0befdbe52679_2648x564.png 848w, https://substackcdn.com/image/fetch/$s_!3AKM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ca8b76-713d-4ba5-b28a-0befdbe52679_2648x564.png 1272w, https://substackcdn.com/image/fetch/$s_!3AKM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ca8b76-713d-4ba5-b28a-0befdbe52679_2648x564.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Split of total bandwidth</figcaption></figure></div><ul><li><p>Timing parameters are a necessary evil. 
But it <em>is</em> possible to work around <em>some</em> of them by hiding the latencies.</p></li><li><p>For the best performance, back-to-back reads should be issued to different Bank Groups. This is when the latencies introduced by the timing parameters are smallest.</p></li><li><p>The performance is worst when back-to-back reads are issued to different rows of the same bank.</p></li><li><p>A basic understanding of timing parameters is necessary to develop an intuition for how good or bad the access pattern is for your workload. Take the time to study the illustrations above; there is a lot of detail in them. </p></li><li><p>Here&#8217;s a convenient <a href="https://www.systemverilog.io/design/ddr4-timing-parameters-cheatsheet/">cheatsheet of all the timing parameters we discussed</a> above.</p></li></ul><div><hr></div><h1><code>&#128274; </code>Meta, memory controllers, and estimating performance</h1><p>In the members-only section,</p><ul><li><p>We&#8217;ll take a look at <strong>two more timing parameters</strong>, tFAW and tWTR, which play a crucial role in performance.</p></li><li><p>We&#8217;ll take a <strong>brief look at how memory controllers work</strong>, and the structures at your disposal to help maximize throughput. 
</p></li><li><p><strong>How</strong> companies like <strong>Meta and Apple evaluate and study the performance</strong> of their DRAM architecture.</p></li><li><p>Tools and strategies that are used to study performance.</p></li><li><p>Common mistakes made during performance estimation.</p></li></ul><p><em>Chiplog is a member-supported newsletter. Right now memberships are <strong>25% off</strong>: that&#8217;s <strong>$74/year</strong> (~$6/month).</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/summer25&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/summer25"><span>Subscribe</span></a></p>
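<p>The <em>tCCD_S</em> and <em>tCCD_L</em> figures quoted earlier for LPDDR5 at 6400 Mbps can be sanity-checked with a few lines of arithmetic. The sketch below is only a rough model, not a description of any real controller: it assumes every read is a page hit, that each read occupies the data bus for the 2.5 ns mentioned in the text, and it ignores refresh and every other timing parameter. The names <code>bus_utilization</code>, <code>BURST_NS</code> and the other constants are made up for illustration.</p>

```python
from fractions import Fraction

# Figures taken from the text (LPDDR5 at 6400 Mbps).
DATA_RATE_GBPS = Fraction(64, 10)  # 6400 Mbps per data pin = 6.4 bits per ns
TCCD_S_NS = Fraction(5, 2)         # tCCD_S = 2 clock cycles = 2.5 ns
TCCD_S_CLOCKS = 2
TCCD_L_CLOCKS = 4                  # tCCD_L = 4 clock cycles

# Derived: the clock period implied by "2 clock cycles (2.5 ns)".
CLOCK_NS = TCCD_S_NS / TCCD_S_CLOCKS

# Assumption: one read keeps the data bus busy for 2.5 ns, the time the
# text gives for reading out the data from a single column.
BURST_NS = Fraction(5, 2)


def bus_utilization(spacing_clocks):
    """Fraction of time the data bus carries data when page-hit reads
    are issued once every `spacing_clocks` clock cycles."""
    spacing_ns = spacing_clocks * CLOCK_NS
    return min(Fraction(1), BURST_NS / spacing_ns)


different_groups = bus_utilization(TCCD_S_CLOCKS)  # scenario 4
same_group = bus_utilization(TCCD_L_CLOCKS)        # scenario 3
bits_per_pin = BURST_NS * DATA_RATE_GBPS           # bits moved per pin per read

print(f"clock period: {float(CLOCK_NS)} ns")
print(f"different bank groups: {float(different_groups):.0%} of peak")
print(f"same bank group: {float(same_group):.0%} of peak")
print(f"bits per pin per read: {bits_per_pin}")
```

<p>Under these assumptions, reads spaced <em>tCCD_S</em> apart leave no gaps on the data bus, while reads spaced <em>tCCD_L</em> apart leave it idle half the time. Because the model ignores refresh, ACTIVATE/PRECHARGE overheads and read/write turnaround, treat its result as an upper bound rather than a prediction of achieved bandwidth.</p>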
      <p>
          <a href="https://www.chiplog.io/p/fundamental-guide-to-understanding-880">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Fundamental guide to understanding DRAM Memory]]></title><description><![CDATA[Fundamental concepts behind HBM, LPDDR, GDDR and such memories]]></description><link>https://www.chiplog.io/p/fundamental-guide-to-understanding</link><guid isPermaLink="false">https://www.chiplog.io/p/fundamental-guide-to-understanding</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Sun, 10 Aug 2025 22:03:15 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/41b0a56f-280a-4ecf-b42c-e93fe4e00065_1200x675.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>All DRAM memories</strong>, whether HBM3, LPDDR5, GDDR6, or DDR5, are <strong>built on the same fundamental concepts</strong>. From an engineering perspective, these memories are challenging to work with, and in most ASICs and SoCs (no matter the application: AI training/inference, CPU, GPU, or Networking), <strong>performance often ends up being</strong> <strong>memory-bound</strong>. </p><p>So, in order to design a robust memory subsystem, make these memories run reliably, and squeeze every bit of performance out of them, you need to understand <strong>3 key dimensions</strong>:</p><ol><li><p><em>Physical</em> <em>Structure - </em>Understanding the physical structure is required to efficiently store and retrieve data from the memory. A <strong>good data storage architecture is critical to performance</strong>.</p></li><li><p><em>Timing Parameters - </em>Reading from and writing to DRAM is like a carefully choreographed dance. We are bound by the physics of these devices. 
<strong>Timing parameters dictate the rules that have to be followed while accessing DRAM memories</strong>, i.e., how fast you can access the memory, how often it has to be refreshed to retain data, and so on.</p></li><li><p><em>Initialization and Calibration - </em>Over its lifetime, a DRAM device will be subject to variations in <strong>Process, Voltage </strong>and<strong> Temperature</strong> (PVT). Memory controllers use special circuits and algorithms to periodically tune the interface so it runs reliably in all conditions.</p></li></ol><p><strong>In this post, we will focus on Dimension #1</strong>: the physical structure of DRAM. We&#8217;ll start from a single DRAM cell and work our way up to a complete package, while discussing ranks, banks, channels, page hits/misses and other terminology. </p><blockquote><div><hr></div><h4><code>&#128274; </code><em>For members:</em></h4><ul><li><p>We&#8217;ll take a <strong>side-by-side look at HBM3E and LPDDR5X</strong>, two of the most popular DRAM types that are shipping in volume right now</p></li><li><p>Explore their structural differences </p></li><li><p>Understand <strong>why HBM&#8217;s design delivers such staggering bandwidth</strong> compared to LPDDR5</p></li></ul><p><em>Chiplog is a member-supported newsletter. 
Right now memberships are <strong>25% off.</strong> That&#8217;s $74/year (~$6/month).</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/summer25&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/summer25"><span>Subscribe now</span></a></p><div><hr></div></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pm2N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525546a-468a-47ab-8347-daa33d1b1c87_1924x688.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pm2N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525546a-468a-47ab-8347-daa33d1b1c87_1924x688.png 424w, https://substackcdn.com/image/fetch/$s_!pm2N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525546a-468a-47ab-8347-daa33d1b1c87_1924x688.png 848w, https://substackcdn.com/image/fetch/$s_!pm2N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525546a-468a-47ab-8347-daa33d1b1c87_1924x688.png 1272w, https://substackcdn.com/image/fetch/$s_!pm2N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525546a-468a-47ab-8347-daa33d1b1c87_1924x688.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!pm2N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525546a-468a-47ab-8347-daa33d1b1c87_1924x688.png" width="1456" height="521" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c525546a-468a-47ab-8347-daa33d1b1c87_1924x688.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e6383f6-4f44-40bc-bb29-13905a9486b4_1924x688.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:521,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:304742,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/170018464?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e6383f6-4f44-40bc-bb29-13905a9486b4_1924x688.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pm2N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525546a-468a-47ab-8347-daa33d1b1c87_1924x688.png 424w, https://substackcdn.com/image/fetch/$s_!pm2N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525546a-468a-47ab-8347-daa33d1b1c87_1924x688.png 848w, https://substackcdn.com/image/fetch/$s_!pm2N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525546a-468a-47ab-8347-daa33d1b1c87_1924x688.png 1272w, 
https://substackcdn.com/image/fetch/$s_!pm2N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc525546a-468a-47ab-8347-daa33d1b1c87_1924x688.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">DRAM from a single cell to a full package</figcaption></figure></div><div><hr></div><h1>Structure of a DRAM</h1><h3><em>Cell</em></h3><p>At the lowest level, <strong>a memory cell stores 1 bit of information</strong>. This cell is essentially a capacitor that holds the charge and a transistor acting as a switch. 
Since the capacitor discharges over time, the information eventually fades unless the capacitor is periodically <strong>REFRESH</strong>ed. This is where the <strong>D</strong> in DRAM comes from - it refers to <strong>Dynamic</strong> as opposed to the <strong>Static</strong> in SRAM.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L_KO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b99e6a-2608-471a-9a03-dc5d37be6af0_3164x2764.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L_KO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b99e6a-2608-471a-9a03-dc5d37be6af0_3164x2764.png 424w, https://substackcdn.com/image/fetch/$s_!L_KO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b99e6a-2608-471a-9a03-dc5d37be6af0_3164x2764.png 848w, https://substackcdn.com/image/fetch/$s_!L_KO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b99e6a-2608-471a-9a03-dc5d37be6af0_3164x2764.png 1272w, https://substackcdn.com/image/fetch/$s_!L_KO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b99e6a-2608-471a-9a03-dc5d37be6af0_3164x2764.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L_KO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b99e6a-2608-471a-9a03-dc5d37be6af0_3164x2764.png" width="584" height="510.1978021978022" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85b99e6a-2608-471a-9a03-dc5d37be6af0_3164x2764.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1272,&quot;width&quot;:1456,&quot;resizeWidth&quot;:584,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L_KO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b99e6a-2608-471a-9a03-dc5d37be6af0_3164x2764.png 424w, https://substackcdn.com/image/fetch/$s_!L_KO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b99e6a-2608-471a-9a03-dc5d37be6af0_3164x2764.png 848w, https://substackcdn.com/image/fetch/$s_!L_KO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b99e6a-2608-471a-9a03-dc5d37be6af0_3164x2764.png 1272w, https://substackcdn.com/image/fetch/$s_!L_KO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b99e6a-2608-471a-9a03-dc5d37be6af0_3164x2764.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><em>Rows, columns, banks</em></h3><p>When you zoom out one level, you will see these <strong>memory cells arranged in a grid of </strong><em><strong>Rows</strong></em><strong> and </strong><em><strong>Columns</strong></em><strong>. </strong>One such grid is called a <strong>Bank</strong>. </p><p>In order to read or write to a specific location in the memory, an address has to be specified. 
<strong>In this address the Bank, Row and Column numbers are encoded</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YpXK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850a902c-e74e-483d-bd09-e6d4497295c6_435x174.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YpXK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850a902c-e74e-483d-bd09-e6d4497295c6_435x174.png 424w, https://substackcdn.com/image/fetch/$s_!YpXK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850a902c-e74e-483d-bd09-e6d4497295c6_435x174.png 848w, https://substackcdn.com/image/fetch/$s_!YpXK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850a902c-e74e-483d-bd09-e6d4497295c6_435x174.png 1272w, https://substackcdn.com/image/fetch/$s_!YpXK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850a902c-e74e-483d-bd09-e6d4497295c6_435x174.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YpXK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850a902c-e74e-483d-bd09-e6d4497295c6_435x174.png" width="435" height="174" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/850a902c-e74e-483d-bd09-e6d4497295c6_435x174.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a107f6e-13ae-47cc-8759-facb0e52887b_435x174.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:174,&quot;width&quot;:435,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19847,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/170018464?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a107f6e-13ae-47cc-8759-facb0e52887b_435x174.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YpXK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850a902c-e74e-483d-bd09-e6d4497295c6_435x174.png 424w, https://substackcdn.com/image/fetch/$s_!YpXK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850a902c-e74e-483d-bd09-e6d4497295c6_435x174.png 848w, https://substackcdn.com/image/fetch/$s_!YpXK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850a902c-e74e-483d-bd09-e6d4497295c6_435x174.png 1272w, https://substackcdn.com/image/fetch/$s_!YpXK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850a902c-e74e-483d-bd09-e6d4497295c6_435x174.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Address showing different portions &#8212; Column, Row, Bank, Rank 
<em>(More on rank later)</em></figcaption></figure></div><p>  <strong>A memory bank is like a book</strong>, where </p><ul><li><p>BANK address indicates which book you want to read or write to</p></li><li><p>ROW address is the page number within that book</p></li><li><p>COLUMN address is the line number within that page</p></li></ul><p>There are 3 distinct phases in accessing memory:</p><ul><li><p><strong>ACTIVATE: </strong> First, a Row Address Decoder ACTIVATEs the row specified in the address and brings its data into a structure called the <strong>Sense Amp</strong>. <em>This is equivalent to opening the desired page of the book.</em></p></li><li><p><strong>COLUMN STROBE:</strong> Then a Column Address Decoder extracts data from the specified column and streams it out. <em>This is the same as locating and reading the line in the open page.</em></p></li><li><p><strong>PRECHARGE:</strong> Once the access is complete, the Row has to be closed and the data in the Sense Amps returned to the Row cells. This closing of the row is called PRECHARGE. <em>This is the same as closing the book.</em></p></li></ul><p>The GIF below shows a Memory <code>READ</code> operation in action.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;987e9ef0-f170-4be3-b699-f1bedaf82c1a&quot;,&quot;duration&quot;:null}"></div><blockquote><pre><code><em><strong>### SIDE NOTE: Page hits and page misses</strong>

You may have heard that "Page Hits" 
improve performance. This is indeed correct. 

Continuing with our analogy of the DRAM bank 
being a book ...

<strong>Page hit scenario:</strong>
If all the data you are trying to retrieve 
are present on the same page, then
+ You can open to the page (ROW address) once
+ Read all the lines (COL addresses) 
+ Close the page
 
<strong>Page miss scenario:</strong>
If the data you are trying to access are 
located in different pages, then you will have to
+ Open the first page
+ Read some lines 
+ Close that page
+ Open the next page
+ Read some more lines
+ Close that page
+ ... and so on
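
The two scenarios above can be sketched as a 
simple command count. This is only a rough model, 
not how a real memory controller is written: the 
name count_commands and the row numbers are made 
up for illustration, and it assumes a single bank 
whose row stays open until a different row is 
requested.

```python
def count_commands(rows):
    """Count DRAM commands for reads to one bank.

    rows holds the row (page) number of each read.
    Assumption: a row stays open until a read
    targets a different row, and the last open
    row is closed at the end.
    """
    counts = {"ACTIVATE": 0, "READ": 0, "PRECHARGE": 0}
    open_row = None
    for row in rows:
        if row != open_row:
            if open_row is not None:
                counts["PRECHARGE"] += 1  # close old page
            counts["ACTIVATE"] += 1       # open new page
            open_row = row
        counts["READ"] += 1               # read one line
    if open_row is not None:
        counts["PRECHARGE"] += 1          # close last page
    return counts


page_hits = count_commands([7, 7, 7, 7])
page_misses = count_commands([7, 8, 9, 10])
```

Under these assumptions, four reads to the same 
page need one ACTIVATE and one PRECHARGE in total, 
while four reads to four different pages need one 
of each per read.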

Needless to say, there is a <strong>significant penalty</strong> 
<strong>to page misses</strong> since there are many more ACTIVATEs 
and PRECHARGEs to be performed.<strong>
</strong></em></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QobC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7f7967-13cb-4950-8bb7-fef3940a9ec1_3124x1600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QobC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7f7967-13cb-4950-8bb7-fef3940a9ec1_3124x1600.png 424w, https://substackcdn.com/image/fetch/$s_!QobC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7f7967-13cb-4950-8bb7-fef3940a9ec1_3124x1600.png 848w, https://substackcdn.com/image/fetch/$s_!QobC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7f7967-13cb-4950-8bb7-fef3940a9ec1_3124x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!QobC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7f7967-13cb-4950-8bb7-fef3940a9ec1_3124x1600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QobC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7f7967-13cb-4950-8bb7-fef3940a9ec1_3124x1600.png" width="1456" height="746" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d7f7967-13cb-4950-8bb7-fef3940a9ec1_3124x1600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:746,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QobC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7f7967-13cb-4950-8bb7-fef3940a9ec1_3124x1600.png 424w, https://substackcdn.com/image/fetch/$s_!QobC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7f7967-13cb-4950-8bb7-fef3940a9ec1_3124x1600.png 848w, https://substackcdn.com/image/fetch/$s_!QobC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7f7967-13cb-4950-8bb7-fef3940a9ec1_3124x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!QobC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d7f7967-13cb-4950-8bb7-fef3940a9ec1_3124x1600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Page hits versus page misses</figcaption></figure></div></blockquote><h3><em>Memory die</em></h3><p>A memory die is a single piece of silicon cut from a wafer, and it is made up of a collection of banks. Memory vendors such as Micron, Samsung and SK Hynix make memory dies of various <strong>densities</strong>, depending upon the number of rows, columns and banks. </p><p>The table below shows the densities in which LPDDR5 memory dies are made. 
Notice how larger densities simply have more Rows.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Shu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dbc0f59-c930-4a30-9b3e-17a0ec1bfd7a_556x329.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Shu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dbc0f59-c930-4a30-9b3e-17a0ec1bfd7a_556x329.png 424w, https://substackcdn.com/image/fetch/$s_!2Shu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dbc0f59-c930-4a30-9b3e-17a0ec1bfd7a_556x329.png 848w, https://substackcdn.com/image/fetch/$s_!2Shu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dbc0f59-c930-4a30-9b3e-17a0ec1bfd7a_556x329.png 1272w, https://substackcdn.com/image/fetch/$s_!2Shu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dbc0f59-c930-4a30-9b3e-17a0ec1bfd7a_556x329.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Shu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dbc0f59-c930-4a30-9b3e-17a0ec1bfd7a_556x329.png" width="556" height="329" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8dbc0f59-c930-4a30-9b3e-17a0ec1bfd7a_556x329.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86c56311-e1a4-402d-a7c8-fe33e351a33a_556x329.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:329,&quot;width&quot;:556,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28439,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/170018464?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86c56311-e1a4-402d-a7c8-fe33e351a33a_556x329.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2Shu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dbc0f59-c930-4a30-9b3e-17a0ec1bfd7a_556x329.png 424w, https://substackcdn.com/image/fetch/$s_!2Shu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dbc0f59-c930-4a30-9b3e-17a0ec1bfd7a_556x329.png 848w, https://substackcdn.com/image/fetch/$s_!2Shu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dbc0f59-c930-4a30-9b3e-17a0ec1bfd7a_556x329.png 1272w, https://substackcdn.com/image/fetch/$s_!2Shu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dbc0f59-c930-4a30-9b3e-17a0ec1bfd7a_556x329.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Standard LPDDR5 memory densities</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3-v2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fbbff57-8fb2-4810-a2e2-3450a7180dd9_170x116.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3-v2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fbbff57-8fb2-4810-a2e2-3450a7180dd9_170x116.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!3-v2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fbbff57-8fb2-4810-a2e2-3450a7180dd9_170x116.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3-v2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fbbff57-8fb2-4810-a2e2-3450a7180dd9_170x116.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3-v2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fbbff57-8fb2-4810-a2e2-3450a7180dd9_170x116.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3-v2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fbbff57-8fb2-4810-a2e2-3450a7180dd9_170x116.jpeg" width="268" height="182.87058823529412" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0fbbff57-8fb2-4810-a2e2-3450a7180dd9_170x116.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:116,&quot;width&quot;:170,&quot;resizeWidth&quot;:268,&quot;bytes&quot;:11596,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/170018464?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fbbff57-8fb2-4810-a2e2-3450a7180dd9_170x116.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!3-v2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fbbff57-8fb2-4810-a2e2-3450a7180dd9_170x116.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3-v2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fbbff57-8fb2-4810-a2e2-3450a7180dd9_170x116.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3-v2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fbbff57-8fb2-4810-a2e2-3450a7180dd9_170x116.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3-v2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fbbff57-8fb2-4810-a2e2-3450a7180dd9_170x116.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Memory die</figcaption></figure></div><h3><em>Memory channels</em></h3><p><strong>The interface to the Memory is like a pipe</strong> &#8230; a bundle of wires. This bundle has a name &#8212; <em>Command-Address (CA) and Data (DQ) bus.</em></p><p>On one end of this pipe you have the brain (Processor/ASIC with the memory controller &#8212; usually called the logic die). At the other end is <em>some arrangement of Memory dies</em>. 
This pipe with a collection of memory dies at the end of it is called a <strong>&#8220;channel&#8221;</strong>, and the image below shows 3 different <strong>arrangements</strong> of memory dies.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!z5kC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9eb1de39-f96e-4372-8b1d-28a701fadb6e_3288x1808.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z5kC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9eb1de39-f96e-4372-8b1d-28a701fadb6e_3288x1808.png 424w, https://substackcdn.com/image/fetch/$s_!z5kC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9eb1de39-f96e-4372-8b1d-28a701fadb6e_3288x1808.png 848w, https://substackcdn.com/image/fetch/$s_!z5kC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9eb1de39-f96e-4372-8b1d-28a701fadb6e_3288x1808.png 1272w, https://substackcdn.com/image/fetch/$s_!z5kC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9eb1de39-f96e-4372-8b1d-28a701fadb6e_3288x1808.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z5kC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9eb1de39-f96e-4372-8b1d-28a701fadb6e_3288x1808.png" width="678" height="372.99313186813185" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9eb1de39-f96e-4372-8b1d-28a701fadb6e_3288x1808.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:801,&quot;width&quot;:1456,&quot;resizeWidth&quot;:678,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!z5kC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9eb1de39-f96e-4372-8b1d-28a701fadb6e_3288x1808.png 424w, https://substackcdn.com/image/fetch/$s_!z5kC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9eb1de39-f96e-4372-8b1d-28a701fadb6e_3288x1808.png 848w, https://substackcdn.com/image/fetch/$s_!z5kC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9eb1de39-f96e-4372-8b1d-28a701fadb6e_3288x1808.png 1272w, https://substackcdn.com/image/fetch/$s_!z5kC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9eb1de39-f96e-4372-8b1d-28a701fadb6e_3288x1808.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image shows how 2Gb memory dies can be arranged in different ways to create three different sizes: 2Gb, 4Gb and 8Gb</figcaption></figure></div><h3><em>Memory Package</em></h3><p>If the memory is a stand-alone unit, as with LPDDR5 or DDR5, then it is packaged so that it can be soldered onto the board or placed within the main SoC package (Memory-on-Package). <strong>This package is made up of multiple channels</strong>. 
Each channel is connected to a different memory controller and works independently.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fund!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29e91f4-aac2-439b-a388-af50cb51b1ae_678x373.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fund!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29e91f4-aac2-439b-a388-af50cb51b1ae_678x373.png 424w, https://substackcdn.com/image/fetch/$s_!Fund!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29e91f4-aac2-439b-a388-af50cb51b1ae_678x373.png 848w, https://substackcdn.com/image/fetch/$s_!Fund!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29e91f4-aac2-439b-a388-af50cb51b1ae_678x373.png 1272w, https://substackcdn.com/image/fetch/$s_!Fund!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29e91f4-aac2-439b-a388-af50cb51b1ae_678x373.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fund!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29e91f4-aac2-439b-a388-af50cb51b1ae_678x373.png" width="506" height="278.3746312684366" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c29e91f4-aac2-439b-a388-af50cb51b1ae_678x373.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:373,&quot;width&quot;:678,&quot;resizeWidth&quot;:506,&quot;bytes&quot;:111431,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/170018464?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29e91f4-aac2-439b-a388-af50cb51b1ae_678x373.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fund!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29e91f4-aac2-439b-a388-af50cb51b1ae_678x373.png 424w, https://substackcdn.com/image/fetch/$s_!Fund!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29e91f4-aac2-439b-a388-af50cb51b1ae_678x373.png 848w, https://substackcdn.com/image/fetch/$s_!Fund!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29e91f4-aac2-439b-a388-af50cb51b1ae_678x373.png 1272w, https://substackcdn.com/image/fetch/$s_!Fund!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29e91f4-aac2-439b-a388-af50cb51b1ae_678x373.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A package showing 4 channels with each channel consisting of 2 dies</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vG7r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd552db9-91a5-4b76-828f-86ebfefde60c_285x177.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vG7r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd552db9-91a5-4b76-828f-86ebfefde60c_285x177.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!vG7r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd552db9-91a5-4b76-828f-86ebfefde60c_285x177.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vG7r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd552db9-91a5-4b76-828f-86ebfefde60c_285x177.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vG7r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd552db9-91a5-4b76-828f-86ebfefde60c_285x177.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vG7r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd552db9-91a5-4b76-828f-86ebfefde60c_285x177.jpeg" width="285" height="177" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd552db9-91a5-4b76-828f-86ebfefde60c_285x177.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:177,&quot;width&quot;:285,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13804,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/170018464?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd552db9-91a5-4b76-828f-86ebfefde60c_285x177.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vG7r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd552db9-91a5-4b76-828f-86ebfefde60c_285x177.jpeg 
424w, https://substackcdn.com/image/fetch/$s_!vG7r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd552db9-91a5-4b76-828f-86ebfefde60c_285x177.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vG7r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd552db9-91a5-4b76-828f-86ebfefde60c_285x177.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vG7r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd552db9-91a5-4b76-828f-86ebfefde60c_285x177.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">A Samsung LPDDR5 memory package which can be soldered onto the PCB</figcaption></figure></div><h1>Performance</h1><p>I&#8217;ll wrap up the free portion of this article with a quick discussion on performance.</p><p>When deciding the memory architecture for an SoC and deciding which memory type to use (HBM, LPDDR, DDR or GDDR), you really only have <strong>3 main levers to pull with regard to performance</strong>. 
(<em>Of course, cost, power, and availability are important factors that guide this decision too.</em>)</p><p>Using the analogy of the <strong>compute die being connected to the memory through a pipe</strong>, we can ask 3 questions:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iQ6J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651778a6-df32-47ea-9c89-4783df0b87ce_2364x1048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iQ6J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651778a6-df32-47ea-9c89-4783df0b87ce_2364x1048.png 424w, https://substackcdn.com/image/fetch/$s_!iQ6J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651778a6-df32-47ea-9c89-4783df0b87ce_2364x1048.png 848w, https://substackcdn.com/image/fetch/$s_!iQ6J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651778a6-df32-47ea-9c89-4783df0b87ce_2364x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!iQ6J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651778a6-df32-47ea-9c89-4783df0b87ce_2364x1048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iQ6J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651778a6-df32-47ea-9c89-4783df0b87ce_2364x1048.png" width="1456" height="645" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/651778a6-df32-47ea-9c89-4783df0b87ce_2364x1048.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb7b7938-531b-4ee1-8c85-a934ff14ad47_2364x1048.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:645,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iQ6J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651778a6-df32-47ea-9c89-4783df0b87ce_2364x1048.png 424w, https://substackcdn.com/image/fetch/$s_!iQ6J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651778a6-df32-47ea-9c89-4783df0b87ce_2364x1048.png 848w, https://substackcdn.com/image/fetch/$s_!iQ6J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651778a6-df32-47ea-9c89-4783df0b87ce_2364x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!iQ6J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651778a6-df32-47ea-9c89-4783df0b87ce_2364x1048.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">3 questions to ask when deciding which memory type to use</figcaption></figure></div><h4><strong>1. How many pipes? (Number of memory channels)</strong></h4><p>If each memory channel delivers 10GB/s of bandwidth, you first have to decide how many channels your SoC&#8217;s application requires. For instance, if you need 40GB/s of total bandwidth, you will need four channels.</p><p>Each channel needs its own memory controller on the logic/compute die, <strong>which eats up both die area and beachfront</strong>. Beachfront is the chip-edge real estate where I/O interfaces live. It is a very precious resource, and <strong>something we&#8217;ll dive into in the next article </strong><em>(another reason to subscribe if you haven&#8217;t already!)</em><strong>.</strong></p><h4><strong>2. Size of each pipe? 
(Memory interface width)</strong></h4><p>The memory interface width dictates the <strong>number of data (DQ) bits that can be transmitted in parallel</strong>. The <strong>wider the DQ bus, the greater the bandwidth</strong> offered by the memory, since more bits can be transported between the compute die and the memory in one clock cycle.</p><h4><strong>3. How fast does data flow through these pipes? (Memory frequency)</strong></h4><p>DRAMs can be operated at various frequencies. For example, LPDDR5 can run at a maximum of 6400Mbps per pin, but also at lower rates such as 3200Mbps and 5400Mbps. But cranking the dial to the max isn&#8217;t always feasible.</p><p>Higher frequencies draw more power, are harder to run reliably, and can introduce signal integrity headaches. That means your system design and cooling requirements also get more complex.</p><p>To balance performance with thermals and stability, techniques like <strong>Dynamic Frequency Scaling (DFS)</strong> are used to keep the TDP in check. But this complicates software design. </p><blockquote><h4><em><strong>Page Hits &amp; Data Architecture</strong>  </em></h4><p><em>Any discussion on performance is incomplete without talking about <strong>Data Storage Architecture.</strong> </em></p><p><em>Well-structured storage and retrieval patterns can significantly improve how well the memory bandwidth is utilized. Even with the best hardware, the <strong>wrong access patterns can leave much of the memory bandwidth untapped</strong>.</em></p></blockquote><div><hr></div><h1><code>&#128274; </code>Comparing HBM3E versus LPDDR5X</h1><p>The rest of this post is for members. 
</p><ul><li><p>We&#8217;ll take a <strong>side-by-side look at HBM3E and LPDDR5X</strong> &#8212; two of the most popular DRAM types that are shipping in volume right now</p></li><li><p>Explore their structural differences </p></li><li><p>Understand <strong>why HBM&#8217;s design delivers such staggering bandwidth</strong> compared to LPDDR5</p></li></ul><p><em>Right now memberships are <strong>25% off.</strong> That&#8217;s <strong>$74/year</strong> (~$6/month). With this membership, you&#8217;ll be supporting <strong>chiplog.io</strong>.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/summer25&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/summer25"><span>Subscribe</span></a></p>
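<p>As a back-of-the-envelope sketch, the three levers from the Performance section (number of channels, interface width, data rate) multiply together into a theoretical peak bandwidth. The function name, parameter names, and example numbers below are illustrative assumptions, not figures for any specific part, and the bandwidth a real system sustains will be lower depending on access patterns.</p>

```python
def peak_bandwidth_gb_per_s(channels: int, width_bits: int, rate_mbps: float) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes).

    channels   -- number of memory channels (how many pipes)
    width_bits -- DQ width of each channel in bits (size of each pipe)
    rate_mbps  -- per-pin data rate in Mbps (how fast data flows)
    """
    # Total bits moved per second across every DQ pin of every channel.
    bits_per_second = channels * width_bits * rate_mbps * 1e6
    # Convert bits to bytes, then bytes to gigabytes.
    return bits_per_second / 8 / 1e9


# Illustrative numbers only: four 16-bit channels at 6400Mbps per pin.
example = peak_bandwidth_gb_per_s(channels=4, width_bits=16, rate_mbps=6400)
```

<p>With these assumed numbers, four 16-bit channels at 6400Mbps per pin work out to 51.2GB/s. Halving the data rate or the channel count halves the result, which is why all three levers have to be weighed together.</p>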
      <p>
          <a href="https://www.chiplog.io/p/fundamental-guide-to-understanding">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[3 Great Examples of how ASICs and FPGAs are used as Accelerators]]></title><description><![CDATA[Fast inverse square root, blazing fast regex pattern matching, matrix arithmetic and more...]]></description><link>https://www.chiplog.io/p/3-great-examples-of-how-asics-and</link><guid isPermaLink="false">https://www.chiplog.io/p/3-great-examples-of-how-asics-and</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Sun, 27 Jul 2025 20:45:30 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ce505e53-0512-4aa0-90ca-2ff8cd0a609a_910x660.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this article,</p><ul><li><p>I&#8217;ll first loosely define <strong>what an accelerator is</strong> and the <strong>scale</strong> on which the <strong>effectiveness</strong> of devices is measured.</p></li><li><p>Then I&#8217;ll examine three real-world <strong>examples of hardware acceleration</strong> and the <strong>design patterns</strong> that power them.</p></li></ul><h2>Power, Performance and Area (PPA)</h2><p>Let&#8217;s start with the proverbial PPA chart. In the world of hardware design, there's a fundamental trade-off that engineers constantly wrestle with:</p><ul><li><p><strong>Power:</strong> How much energy does it consume to operate?</p></li><li><p><strong>Performance:</strong> How fast can it get the job done?</p></li><li><p><strong>Area/Cost:</strong> How much silicon real estate (and therefore money) does it take to fabricate? And how hard is the system to build, if it is feasible at all?</p></li></ul><p>Whether you're building a general-purpose CPU, a GPU, or a hyper-specialized ASIC, your design will ultimately be judged on this scale. 
The &#8220;<strong>true north</strong>&#8221; for any chip designer is achieving maximum performance with minimal power and area.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ia3H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03d92cb9-dc6e-4eef-9bf5-1823da793d25_1512x1334.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ia3H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03d92cb9-dc6e-4eef-9bf5-1823da793d25_1512x1334.png 424w, https://substackcdn.com/image/fetch/$s_!ia3H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03d92cb9-dc6e-4eef-9bf5-1823da793d25_1512x1334.png 848w, https://substackcdn.com/image/fetch/$s_!ia3H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03d92cb9-dc6e-4eef-9bf5-1823da793d25_1512x1334.png 1272w, https://substackcdn.com/image/fetch/$s_!ia3H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03d92cb9-dc6e-4eef-9bf5-1823da793d25_1512x1334.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ia3H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03d92cb9-dc6e-4eef-9bf5-1823da793d25_1512x1334.png" width="524" height="462.4587912087912" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03d92cb9-dc6e-4eef-9bf5-1823da793d25_1512x1334.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1285,&quot;width&quot;:1456,&quot;resizeWidth&quot;:524,&quot;bytes&quot;:182710,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/168836283?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03d92cb9-dc6e-4eef-9bf5-1823da793d25_1512x1334.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ia3H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03d92cb9-dc6e-4eef-9bf5-1823da793d25_1512x1334.png 424w, https://substackcdn.com/image/fetch/$s_!ia3H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03d92cb9-dc6e-4eef-9bf5-1823da793d25_1512x1334.png 848w, https://substackcdn.com/image/fetch/$s_!ia3H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03d92cb9-dc6e-4eef-9bf5-1823da793d25_1512x1334.png 1272w, https://substackcdn.com/image/fetch/$s_!ia3H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03d92cb9-dc6e-4eef-9bf5-1823da793d25_1512x1334.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">&#8220;True North&#8221; - Ideal PPA</figcaption></figure></div><h2>Why build accelerators?</h2><p>One pattern emerges again and again in both human behavior and engineering: <strong>when a task is repeated often enough, someone will eventually build a tool to make it faster.</strong> Just as construction workers don&#8217;t rely on hand saws when they can use power tools, software engineers and hardware designers don&#8217;t keep performing the same slow computations in general-purpose environments if they can offload them to something more efficient.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!8cdZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1de7f7ec-8acd-4dce-888b-e5b72876ed47_1466x1280.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8cdZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1de7f7ec-8acd-4dce-888b-e5b72876ed47_1466x1280.png 424w, https://substackcdn.com/image/fetch/$s_!8cdZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1de7f7ec-8acd-4dce-888b-e5b72876ed47_1466x1280.png 848w, https://substackcdn.com/image/fetch/$s_!8cdZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1de7f7ec-8acd-4dce-888b-e5b72876ed47_1466x1280.png 1272w, https://substackcdn.com/image/fetch/$s_!8cdZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1de7f7ec-8acd-4dce-888b-e5b72876ed47_1466x1280.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8cdZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1de7f7ec-8acd-4dce-888b-e5b72876ed47_1466x1280.png" width="483" height="421.6298076923077" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1de7f7ec-8acd-4dce-888b-e5b72876ed47_1466x1280.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1271,&quot;width&quot;:1456,&quot;resizeWidth&quot;:483,&quot;bytes&quot;:216521,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/168836283?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1de7f7ec-8acd-4dce-888b-e5b72876ed47_1466x1280.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8cdZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1de7f7ec-8acd-4dce-888b-e5b72876ed47_1466x1280.png 424w, https://substackcdn.com/image/fetch/$s_!8cdZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1de7f7ec-8acd-4dce-888b-e5b72876ed47_1466x1280.png 848w, https://substackcdn.com/image/fetch/$s_!8cdZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1de7f7ec-8acd-4dce-888b-e5b72876ed47_1466x1280.png 1272w, https://substackcdn.com/image/fetch/$s_!8cdZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1de7f7ec-8acd-4dce-888b-e5b72876ed47_1466x1280.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Comparing PPA for Handsaw versus Power Tool</figcaption></figure></div><p>Also, the term <em>&#8220;accelerator&#8221;</em> is not limited to hardware. As an obsessive gamer in the 90s and 00s, I was captivated by one <strong>software accelerator</strong> in particular: the <strong>Fast Inverse Square Root</strong>.</p><p>3D games like Quake had to compute <strong>1/&#8730;x</strong> repeatedly, for each point, to determine lighting and reflections on surfaces. On early Intel processors, doing this calculation with a square root followed by a division was very slow. Quake III Arena famously replaced this code with the <em>Fast Inverse Square Root</em>, a trick widely credited to John Carmack (although its origins predate the game), which fans discovered when <em>id Software</em> released the game&#8217;s source code. This is the <strong>Gamer&#8217;s Accelerator</strong>. 
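</p><p>To make the trick concrete, here is a rough Python sketch of the same idea. It assumes IEEE 754 single-precision floats and a positive input; the names <code>fast_inv_sqrt</code> and <code>MAGIC</code> are chosen here for illustration, while the constant itself is the well-known one from the released Quake III Arena source. It is not a drop-in replacement for the original C routine, only a way to see the steps.</p>

```python
import struct

# Magic constant from the released Quake III Arena source code.
MAGIC = 0x5F3759DF


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for a positive number, treated as a 32-bit float."""
    # Reinterpret the 32-bit float's bit pattern as an unsigned integer.
    i = struct.unpack("=I", struct.pack("=f", x))[0]
    # A shift and a subtraction on the raw bits give a cheap first guess.
    i = MAGIC - (i >> 1)
    # Reinterpret the integer bits as a float again.
    y = struct.unpack("=f", struct.pack("=I", i))[0]
    # One Newton-Raphson step refines the guess.
    return y * (1.5 - 0.5 * x * y * y)
```

<p>The point of the sketch is that the expensive square root and division are replaced by an integer shift, a subtraction, and a handful of multiplications, and the single refinement step lands within a fraction of a percent of the true value.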
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mW8v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbae6271-e45c-49d0-b33c-37b874124d0a_2450x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mW8v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbae6271-e45c-49d0-b33c-37b874124d0a_2450x1080.png 424w, https://substackcdn.com/image/fetch/$s_!mW8v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbae6271-e45c-49d0-b33c-37b874124d0a_2450x1080.png 848w, https://substackcdn.com/image/fetch/$s_!mW8v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbae6271-e45c-49d0-b33c-37b874124d0a_2450x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!mW8v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbae6271-e45c-49d0-b33c-37b874124d0a_2450x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mW8v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbae6271-e45c-49d0-b33c-37b874124d0a_2450x1080.png" width="728" height="321" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bbae6271-e45c-49d0-b33c-37b874124d0a_2450x1080.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/591d4914-e624-4c63-a966-b561900280d6_2450x1080.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:642,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:1441972,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/168836283?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591d4914-e624-4c63-a966-b561900280d6_2450x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mW8v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbae6271-e45c-49d0-b33c-37b874124d0a_2450x1080.png 424w, https://substackcdn.com/image/fetch/$s_!mW8v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbae6271-e45c-49d0-b33c-37b874124d0a_2450x1080.png 848w, https://substackcdn.com/image/fetch/$s_!mW8v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbae6271-e45c-49d0-b33c-37b874124d0a_2450x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!mW8v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbae6271-e45c-49d0-b33c-37b874124d0a_2450x1080.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Replacing a one-liner with a bit-twiddly magic number makes the inverse sq-root faster. <strong>Comments are from the original source code.</strong></figcaption></figure></div><p></p><div><hr></div><h2>3 Hardware Design Patterns that Accelerate</h2><h3>1. High-Speed Regular Expression Pattern Matching</h3><p>In cybersecurity, devices known as <em>network firewalls</em> perform <strong>deep packet inspection (DPI)</strong> on packets flowing over the network to detect malicious payloads and sensitive data leaks such as SSNs and credit card numbers. 
</p><p>The signatures to be identified are written as <strong>regular expressions</strong>, which are then compiled into <em>deterministic</em> or <em>non-deterministic finite automata</em> (DFA and NFA). For instance, suppose you are searching for the patterns:</p><ul><li><p><code>abb</code></p></li><li><p><code>aabb</code></p></li><li><p><code>baaab</code></p></li></ul><p>These patterns can be combined into the regular expression <code>(a|b)*(abb|a+b)</code>, which, when represented as a finite automaton, becomes the following state machine.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3xtQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ac043-562c-4db5-aabb-b38fc6dc454f_2672x1768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3xtQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ac043-562c-4db5-aabb-b38fc6dc454f_2672x1768.png 424w, https://substackcdn.com/image/fetch/$s_!3xtQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ac043-562c-4db5-aabb-b38fc6dc454f_2672x1768.png 848w, https://substackcdn.com/image/fetch/$s_!3xtQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ac043-562c-4db5-aabb-b38fc6dc454f_2672x1768.png 1272w, https://substackcdn.com/image/fetch/$s_!3xtQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ac043-562c-4db5-aabb-b38fc6dc454f_2672x1768.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3xtQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ac043-562c-4db5-aabb-b38fc6dc454f_2672x1768.png" width="505" height="334.00755494505495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a9ac043-562c-4db5-aabb-b38fc6dc454f_2672x1768.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/483b183b-c048-4379-b386-62fb85f55599_2672x1768.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:963,&quot;width&quot;:1456,&quot;resizeWidth&quot;:505,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3xtQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ac043-562c-4db5-aabb-b38fc6dc454f_2672x1768.png 424w, https://substackcdn.com/image/fetch/$s_!3xtQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ac043-562c-4db5-aabb-b38fc6dc454f_2672x1768.png 848w, https://substackcdn.com/image/fetch/$s_!3xtQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ac043-562c-4db5-aabb-b38fc6dc454f_2672x1768.png 1272w, https://substackcdn.com/image/fetch/$s_!3xtQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ac043-562c-4db5-aabb-b38fc6dc454f_2672x1768.png 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">* = 0 or more occurrences, + = 1 or more occurrences</figcaption></figure></div><p>The firewall then <strong>maintains state for each individual user stream</strong>, using <strong>hash tables</strong>, and takes action if there is a match for any of these signatures. In a high throughput environment, such as a data center, there will be <strong>millions of concurrent traffic streams to manage</strong>.</p><p>Performing these <strong>hash calculations and pattern matches</strong> <strong>on every single packet</strong> <strong>is expensive</strong>. 
Purely software firewalls running on general-purpose processors will not scale beyond a certain bandwidth, say 20 Gbps. An alternative is to offload such frequent computations to an <strong>FPGA</strong>, which can perform the same computation in a fraction of the time that software would take.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!th9z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39b1254b-68ef-4d0b-be7d-28b947ee24c9_1952x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!th9z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39b1254b-68ef-4d0b-be7d-28b947ee24c9_1952x720.png 424w, https://substackcdn.com/image/fetch/$s_!th9z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39b1254b-68ef-4d0b-be7d-28b947ee24c9_1952x720.png 848w, https://substackcdn.com/image/fetch/$s_!th9z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39b1254b-68ef-4d0b-be7d-28b947ee24c9_1952x720.png 1272w, https://substackcdn.com/image/fetch/$s_!th9z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39b1254b-68ef-4d0b-be7d-28b947ee24c9_1952x720.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!th9z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39b1254b-68ef-4d0b-be7d-28b947ee24c9_1952x720.png" width="514" height="189.59016393442624" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/39b1254b-68ef-4d0b-be7d-28b947ee24c9_1952x720.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/496514e6-d8fc-432a-aee2-c74bebd57827_1952x720.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1952,&quot;resizeWidth&quot;:514,&quot;bytes&quot;:81741,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!th9z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39b1254b-68ef-4d0b-be7d-28b947ee24c9_1952x720.png 424w, https://substackcdn.com/image/fetch/$s_!th9z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39b1254b-68ef-4d0b-be7d-28b947ee24c9_1952x720.png 848w, https://substackcdn.com/image/fetch/$s_!th9z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39b1254b-68ef-4d0b-be7d-28b947ee24c9_1952x720.png 1272w, https://substackcdn.com/image/fetch/$s_!th9z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39b1254b-68ef-4d0b-be7d-28b947ee24c9_1952x720.png 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Software running on multiple EC2 instances could offload compute to a single FPGA</figcaption></figure></div><p>Implementing such acceleration is now readily achievable with services such as <strong>AWS F2 FPGA instances</strong>. The system could be architected so that a pool of servers offloads its most expensive computation to a single FPGA instance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h2x4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc3bc054-0533-4cc7-9cec-8f211a8cfc5b_2450x1300.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h2x4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc3bc054-0533-4cc7-9cec-8f211a8cfc5b_2450x1300.png 424w, https://substackcdn.com/image/fetch/$s_!h2x4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc3bc054-0533-4cc7-9cec-8f211a8cfc5b_2450x1300.png 848w, https://substackcdn.com/image/fetch/$s_!h2x4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc3bc054-0533-4cc7-9cec-8f211a8cfc5b_2450x1300.png 1272w, https://substackcdn.com/image/fetch/$s_!h2x4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc3bc054-0533-4cc7-9cec-8f211a8cfc5b_2450x1300.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!h2x4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc3bc054-0533-4cc7-9cec-8f211a8cfc5b_2450x1300.png" width="1456" height="773" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cc3bc054-0533-4cc7-9cec-8f211a8cfc5b_2450x1300.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e263679-a142-4fd8-89bd-6697b5a1ca02_2450x1300.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:773,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1006328,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/168836283?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e263679-a142-4fd8-89bd-6697b5a1ca02_2450x1300.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!h2x4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc3bc054-0533-4cc7-9cec-8f211a8cfc5b_2450x1300.png 424w, https://substackcdn.com/image/fetch/$s_!h2x4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc3bc054-0533-4cc7-9cec-8f211a8cfc5b_2450x1300.png 848w, https://substackcdn.com/image/fetch/$s_!h2x4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc3bc054-0533-4cc7-9cec-8f211a8cfc5b_2450x1300.png 1272w, 
https://substackcdn.com/image/fetch/$s_!h2x4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc3bc054-0533-4cc7-9cec-8f211a8cfc5b_2450x1300.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">AWS F2 Instance</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!g5pq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5aa71e-3156-46ab-a427-e9d111b4b6d6_1500x1294.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g5pq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5aa71e-3156-46ab-a427-e9d111b4b6d6_1500x1294.png 424w, https://substackcdn.com/image/fetch/$s_!g5pq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5aa71e-3156-46ab-a427-e9d111b4b6d6_1500x1294.png 848w, https://substackcdn.com/image/fetch/$s_!g5pq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5aa71e-3156-46ab-a427-e9d111b4b6d6_1500x1294.png 1272w, https://substackcdn.com/image/fetch/$s_!g5pq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5aa71e-3156-46ab-a427-e9d111b4b6d6_1500x1294.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g5pq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5aa71e-3156-46ab-a427-e9d111b4b6d6_1500x1294.png" width="451" height="389.04945054945057" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8c5aa71e-3156-46ab-a427-e9d111b4b6d6_1500x1294.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/892fbea5-fdb3-4233-8840-aca3523bc8e1_1500x1294.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1256,&quot;width&quot;:1456,&quot;resizeWidth&quot;:451,&quot;bytes&quot;:207049,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/168836283?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F892fbea5-fdb3-4233-8840-aca3523bc8e1_1500x1294.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g5pq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5aa71e-3156-46ab-a427-e9d111b4b6d6_1500x1294.png 424w, https://substackcdn.com/image/fetch/$s_!g5pq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5aa71e-3156-46ab-a427-e9d111b4b6d6_1500x1294.png 848w, https://substackcdn.com/image/fetch/$s_!g5pq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5aa71e-3156-46ab-a427-e9d111b4b6d6_1500x1294.png 1272w, https://substackcdn.com/image/fetch/$s_!g5pq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5aa71e-3156-46ab-a427-e9d111b4b6d6_1500x1294.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">What system designers are trying to achieve when employing FPGA acceleration: increased performance with slightly reduced cost and power.</figcaption></figure></div><h3>2. Data-Center Scale Cybersecurity</h3><p>Let&#8217;s stick with cybersecurity and firewalls.</p><p>In places like a coffee shop or a small office, where there may be 20 people connected to the Wi-Fi network at a given time, the entire firewall software stack can run on a single processor. 
This is evident from the hardware portfolios of <a href="https://www.fortinet.com/products/next-generation-firewall">Fortinet</a> and <a href="https://www.paloaltonetworks.com/resources/pa-series-next-generation-firewalls-hardware-architectures">Palo Alto Networks</a>, whose lower-end systems have just one processor.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vZ0U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc28c1aea-fbdf-4dc8-be3d-9387d5f0074e_1456x1048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vZ0U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc28c1aea-fbdf-4dc8-be3d-9387d5f0074e_1456x1048.png 424w, https://substackcdn.com/image/fetch/$s_!vZ0U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc28c1aea-fbdf-4dc8-be3d-9387d5f0074e_1456x1048.png 848w, https://substackcdn.com/image/fetch/$s_!vZ0U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc28c1aea-fbdf-4dc8-be3d-9387d5f0074e_1456x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!vZ0U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc28c1aea-fbdf-4dc8-be3d-9387d5f0074e_1456x1048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vZ0U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc28c1aea-fbdf-4dc8-be3d-9387d5f0074e_1456x1048.png" width="491" height="353.4120879120879" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c28c1aea-fbdf-4dc8-be3d-9387d5f0074e_1456x1048.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4687afa6-f849-41d9-b154-30c41838981d_1456x1048.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1048,&quot;width&quot;:1456,&quot;resizeWidth&quot;:491,&quot;bytes&quot;:266644,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/168836283?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4687afa6-f849-41d9-b154-30c41838981d_1456x1048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vZ0U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc28c1aea-fbdf-4dc8-be3d-9387d5f0074e_1456x1048.png 424w, https://substackcdn.com/image/fetch/$s_!vZ0U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc28c1aea-fbdf-4dc8-be3d-9387d5f0074e_1456x1048.png 848w, https://substackcdn.com/image/fetch/$s_!vZ0U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc28c1aea-fbdf-4dc8-be3d-9387d5f0074e_1456x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!vZ0U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc28c1aea-fbdf-4dc8-be3d-9387d5f0074e_1456x1048.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Images courtesy of the PANW and Fortinet company websites</figcaption></figure></div><p>But at the scale of a cloud data center or an ISP, things get more complex. You're handling <strong>terabits per second</strong> of encrypted HTTPS traffic. These high-end firewall systems have software running on multiple processors in a distributed fashion.</p><p>For this, an ASIC (Application-Specific Integrated Circuit) is typically required to ingest the sheer volume of traffic. The ASIC analyzes, pre-classifies, splits, and schedules massive network flows into smaller chunks for the downstream CPUs to process. 
<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>The image below is from the Palo Alto Networks website. It shows how a high-end system has an <strong>ASIC at the front end</strong> to ingest traffic and an <strong>FPGA at the back end</strong> to offload expensive compute from the processors running the firewall software.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8SJm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fd6145-56e3-48d3-8f48-478af53d241d_1980x1800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8SJm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fd6145-56e3-48d3-8f48-478af53d241d_1980x1800.png 424w, https://substackcdn.com/image/fetch/$s_!8SJm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fd6145-56e3-48d3-8f48-478af53d241d_1980x1800.png 848w, https://substackcdn.com/image/fetch/$s_!8SJm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fd6145-56e3-48d3-8f48-478af53d241d_1980x1800.png 1272w, https://substackcdn.com/image/fetch/$s_!8SJm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fd6145-56e3-48d3-8f48-478af53d241d_1980x1800.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!8SJm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fd6145-56e3-48d3-8f48-478af53d241d_1980x1800.png" width="563" height="511.9587912087912" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/92fd6145-56e3-48d3-8f48-478af53d241d_1980x1800.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/84a2e03c-c58e-486d-9968-4a37edabd941_1980x1800.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1324,&quot;width&quot;:1456,&quot;resizeWidth&quot;:563,&quot;bytes&quot;:423951,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/168836283?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84a2e03c-c58e-486d-9968-4a37edabd941_1980x1800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8SJm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fd6145-56e3-48d3-8f48-478af53d241d_1980x1800.png 424w, https://substackcdn.com/image/fetch/$s_!8SJm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fd6145-56e3-48d3-8f48-478af53d241d_1980x1800.png 848w, https://substackcdn.com/image/fetch/$s_!8SJm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fd6145-56e3-48d3-8f48-478af53d241d_1980x1800.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8SJm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fd6145-56e3-48d3-8f48-478af53d241d_1980x1800.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">PANW system architecture. 
(Image from PANW company website)</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WHX8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5699673-fd57-4b3f-9a1b-8885a4c8451c_2472x1272.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WHX8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5699673-fd57-4b3f-9a1b-8885a4c8451c_2472x1272.png 424w, https://substackcdn.com/image/fetch/$s_!WHX8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5699673-fd57-4b3f-9a1b-8885a4c8451c_2472x1272.png 848w, https://substackcdn.com/image/fetch/$s_!WHX8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5699673-fd57-4b3f-9a1b-8885a4c8451c_2472x1272.png 1272w, https://substackcdn.com/image/fetch/$s_!WHX8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5699673-fd57-4b3f-9a1b-8885a4c8451c_2472x1272.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WHX8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5699673-fd57-4b3f-9a1b-8885a4c8451c_2472x1272.png" width="551" height="283.44711538461536" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5699673-fd57-4b3f-9a1b-8885a4c8451c_2472x1272.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f225c75c-faf6-4914-a4c2-a62544e5d7f3_2472x1272.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:749,&quot;width&quot;:1456,&quot;resizeWidth&quot;:551,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WHX8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5699673-fd57-4b3f-9a1b-8885a4c8451c_2472x1272.png 424w, https://substackcdn.com/image/fetch/$s_!WHX8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5699673-fd57-4b3f-9a1b-8885a4c8451c_2472x1272.png 848w, https://substackcdn.com/image/fetch/$s_!WHX8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5699673-fd57-4b3f-9a1b-8885a4c8451c_2472x1272.png 1272w, https://substackcdn.com/image/fetch/$s_!WHX8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5699673-fd57-4b3f-9a1b-8885a4c8451c_2472x1272.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Alternate view of the system architecture</figcaption></figure></div><blockquote><p><em>So, when you scale up an application from a single processor to a collection of them, specialized chips capable of performing specific computations are required to make that scaling possible.</em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Sldh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be4fe07-4995-4240-84b6-e79b6f11363c_1490x1290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Sldh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be4fe07-4995-4240-84b6-e79b6f11363c_1490x1290.png 424w, https://substackcdn.com/image/fetch/$s_!Sldh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be4fe07-4995-4240-84b6-e79b6f11363c_1490x1290.png 848w, https://substackcdn.com/image/fetch/$s_!Sldh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be4fe07-4995-4240-84b6-e79b6f11363c_1490x1290.png 1272w, https://substackcdn.com/image/fetch/$s_!Sldh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be4fe07-4995-4240-84b6-e79b6f11363c_1490x1290.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Sldh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be4fe07-4995-4240-84b6-e79b6f11363c_1490x1290.png" width="412" height="356.82142857142856" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4be4fe07-4995-4240-84b6-e79b6f11363c_1490x1290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1261,&quot;width&quot;:1456,&quot;resizeWidth&quot;:412,&quot;bytes&quot;:220477,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/168836283?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be4fe07-4995-4240-84b6-e79b6f11363c_1490x1290.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Sldh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be4fe07-4995-4240-84b6-e79b6f11363c_1490x1290.png 424w, https://substackcdn.com/image/fetch/$s_!Sldh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be4fe07-4995-4240-84b6-e79b6f11363c_1490x1290.png 848w, https://substackcdn.com/image/fetch/$s_!Sldh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be4fe07-4995-4240-84b6-e79b6f11363c_1490x1290.png 1272w, https://substackcdn.com/image/fetch/$s_!Sldh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4be4fe07-4995-4240-84b6-e79b6f11363c_1490x1290.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Comparing PPA between low-end and high-end devices</figcaption></figure></div><div><hr></div><h3>3. Accelerating Matrix Arithmetic with Systolic Arrays</h3><p>The last two examples achieved acceleration by converting portions of the software into hardware circuits. 
This one accelerates by creating <strong>Domain-Specific Architectures (DSAs)</strong> rather than using general-purpose processors.</p><p>In the conventional <strong>Von Neumann architecture</strong>, instructions are <strong>fetched and executed sequentially</strong>, and the processing pipeline is <strong>separated from memory by long paths</strong> and cache stages. Together, these limit the performance available from general-purpose processors.</p><p>The <strong>primary benefit of domain-specific architectures</strong>, as you will see next, is that they unlock the power of <strong>concurrency</strong>. However, parallel computation alone yields only a moderate speedup. These designs also use <strong>custom memory architectures</strong>, such as co-locating small memories with the processing elements, to achieve an <strong>order of magnitude</strong> increase in performance.</p><p>First published in 1978 by H. T. Kung, the <strong><a href="https://www.eecs.harvard.edu/~htk/publication/1982-kung-why-systolic-architecture.pdf">systolic array architecture</a></strong> is shown below. Philosophically, it is one of the most elegant designs in computer architecture &#8230; it&#8217;s poetic. 
In these systems, data pulses through a grid of processing elements (PEs), each of which performs a small computation and passes its result to the next.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dzOT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e59be3-c0cc-44a8-9f1f-ed93a34bbe6c_3004x3324.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dzOT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e59be3-c0cc-44a8-9f1f-ed93a34bbe6c_3004x3324.png 424w, https://substackcdn.com/image/fetch/$s_!dzOT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e59be3-c0cc-44a8-9f1f-ed93a34bbe6c_3004x3324.png 848w, https://substackcdn.com/image/fetch/$s_!dzOT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e59be3-c0cc-44a8-9f1f-ed93a34bbe6c_3004x3324.png 1272w, https://substackcdn.com/image/fetch/$s_!dzOT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e59be3-c0cc-44a8-9f1f-ed93a34bbe6c_3004x3324.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dzOT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e59be3-c0cc-44a8-9f1f-ed93a34bbe6c_3004x3324.png" width="432" height="477.989010989011" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0e59be3-c0cc-44a8-9f1f-ed93a34bbe6c_3004x3324.png&quot;,&quot;srcNoWatermark&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eff4fc81-c8bb-4837-a57d-fe3275da3d34_3004x3324.png&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1611,&quot;width&quot;:1456,&quot;resizeWidth&quot;:432,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dzOT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e59be3-c0cc-44a8-9f1f-ed93a34bbe6c_3004x3324.png 424w, https://substackcdn.com/image/fetch/$s_!dzOT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e59be3-c0cc-44a8-9f1f-ed93a34bbe6c_3004x3324.png 848w, https://substackcdn.com/image/fetch/$s_!dzOT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e59be3-c0cc-44a8-9f1f-ed93a34bbe6c_3004x3324.png 1272w, https://substackcdn.com/image/fetch/$s_!dzOT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e59be3-c0cc-44a8-9f1f-ed93a34bbe6c_3004x3324.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Example of a 3x3 matrix multiply-and-accumulate using systolic arrays</figcaption></figure></div><p>This architecture is also the foundation of <strong><a href="https://arxiv.org/abs/1704.04760">Google&#8217;s Tensor Processing Unit (TPU)</a></strong>.</p><blockquote><p><em>As reading a large SRAM uses much more power than arithmetic, the matrix unit uses systolic execution to save energy by reducing reads and writes of the Unified Buffer. </em></p><p><em>It relies on data from different directions arriving at cells in an array at regular intervals where they are combined. Figure 4 shows that data flows in from the left, and the weights are loaded from the top. </em></p><p><em>A given 256-element multiply-accumulate operation moves through the matrix as a diagonal wavefront. 
The weights are preloaded, and take effect with the advancing wave alongside the first data of a new block. Control and data are pipelined to give the illusion that the 256 inputs are read at once, and that they instantly update one location of each of 256 accumulators.</em></p><p><strong>Jouppi et al</strong>., <em>"<a href="https://arxiv.org/abs/1704.04760">In-Datacenter Performance Analysis of a Tensor Processing Unit</a>"</em>, ISCA 2017</p></blockquote><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_7r-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F496dc01a-70e0-4b28-bf71-add6da66a2b9_1822x820.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_7r-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F496dc01a-70e0-4b28-bf71-add6da66a2b9_1822x820.png 424w, https://substackcdn.com/image/fetch/$s_!_7r-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F496dc01a-70e0-4b28-bf71-add6da66a2b9_1822x820.png 848w, https://substackcdn.com/image/fetch/$s_!_7r-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F496dc01a-70e0-4b28-bf71-add6da66a2b9_1822x820.png 1272w, https://substackcdn.com/image/fetch/$s_!_7r-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F496dc01a-70e0-4b28-bf71-add6da66a2b9_1822x820.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!_7r-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F496dc01a-70e0-4b28-bf71-add6da66a2b9_1822x820.png" width="1456" height="655" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/496dc01a-70e0-4b28-bf71-add6da66a2b9_1822x820.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:655,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1120675,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/168836283?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F496dc01a-70e0-4b28-bf71-add6da66a2b9_1822x820.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!_7r-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F496dc01a-70e0-4b28-bf71-add6da66a2b9_1822x820.png 424w, https://substackcdn.com/image/fetch/$s_!_7r-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F496dc01a-70e0-4b28-bf71-add6da66a2b9_1822x820.png 848w, https://substackcdn.com/image/fetch/$s_!_7r-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F496dc01a-70e0-4b28-bf71-add6da66a2b9_1822x820.png 1272w, https://substackcdn.com/image/fetch/$s_!_7r-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F496dc01a-70e0-4b28-bf71-add6da66a2b9_1822x820.png 
1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><a href="https://arxiv.org/abs/1704.04760">Jouppi et al.</a></figcaption></figure></div><p>The beauty and poetry come from the <strong>flexibility to create different geometries and control how data flows within the array mesh</strong>, and to construct designs tailored to different problems and applications. 
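</p><p>To make the dataflow from this section concrete, here is a minimal Python sketch of a systolic matrix multiply. It is an illustration under stated assumptions, not the TPU&#8217;s design: it models an <em>output-stationary</em> array, where each PE keeps its own accumulator while both operands stream through, whereas the TPU excerpt above describes preloaded weights. The function name <code>systolic_matmul</code>, the square-matrix restriction, and the input skewing scheme are all choices made for this example.</p>
<pre>
```python
def systolic_matmul(A, B):
    """Cycle-by-cycle model of an n x n output-stationary systolic array.

    Hypothetical helper for illustration; assumes A and B are square
    lists of lists with the same size n.
    """
    n = len(A)
    acc = [[0] * n for _ in range(n)]    # one accumulator per PE
    a_reg = [[0] * n for _ in range(n)]  # operand each PE forwards to the right
    b_reg = [[0] * n for _ in range(n)]  # operand each PE forwards downward
    cycles = 3 * n - 2                   # time for the last operand pair to reach PE(n-1, n-1)

    for t in range(cycles):
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                # Row i of A enters from the left edge, delayed by i cycles.
                if j == 0:
                    k = t - i
                    a_in = A[i][k] if k in range(n) else 0
                else:
                    a_in = a_reg[i][j - 1]
                # Column j of B enters from the top edge, delayed by j cycles.
                if i == 0:
                    k = t - j
                    b_in = B[k][j] if k in range(n) else 0
                else:
                    b_in = b_reg[i - 1][j]
                # Each PE does one multiply-accumulate, then passes both operands on.
                acc[i][j] += a_in * b_in
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b

    return acc, cycles
```
</pre>
<p>For example, <code>systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])</code> returns <code>([[19, 22], [43, 50]], 4)</code>. Every PE only ever talks to its immediate neighbours, and each operand is read from the edge of the array once and then reused as it moves across the grid, which is the property the TPU excerpt credits with reducing buffer reads.</p><p>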
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hUOy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80d334d5-2edd-42ea-8122-4b7b3e4de755_1456x1048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hUOy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80d334d5-2edd-42ea-8122-4b7b3e4de755_1456x1048.png 424w, https://substackcdn.com/image/fetch/$s_!hUOy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80d334d5-2edd-42ea-8122-4b7b3e4de755_1456x1048.png 848w, https://substackcdn.com/image/fetch/$s_!hUOy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80d334d5-2edd-42ea-8122-4b7b3e4de755_1456x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!hUOy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80d334d5-2edd-42ea-8122-4b7b3e4de755_1456x1048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hUOy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80d334d5-2edd-42ea-8122-4b7b3e4de755_1456x1048.png" width="1456" height="1048" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/80d334d5-2edd-42ea-8122-4b7b3e4de755_1456x1048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1048,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:391091,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/168836283?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80d334d5-2edd-42ea-8122-4b7b3e4de755_1456x1048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hUOy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80d334d5-2edd-42ea-8122-4b7b3e4de755_1456x1048.png 424w, https://substackcdn.com/image/fetch/$s_!hUOy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80d334d5-2edd-42ea-8122-4b7b3e4de755_1456x1048.png 848w, https://substackcdn.com/image/fetch/$s_!hUOy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80d334d5-2edd-42ea-8122-4b7b3e4de755_1456x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!hUOy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80d334d5-2edd-42ea-8122-4b7b3e4de755_1456x1048.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://www.eecs.harvard.edu/~htk/publication/1982-kung-why-systolic-architecture.pdf">Why Systolic Architectures? - H. T. Kung, 1982</a></figcaption></figure></div><h2>Final Thoughts</h2><p>Those were three examples of how FPGAs and ASICs can be used to accelerate workloads. The main takeaways are:</p><ol><li><p>Software design patterns sometimes work really well when implemented in hardware.</p></li><li><p>Domain-specific architectures with application-specific computation and memory structures can exploit concurrency and speed up execution by orders of magnitude compared to general-purpose CPUs or even GPUs.</p></li></ol><div><hr></div><p><em>That&#8217;s it for this edition of Chiplog. 
If you enjoyed this article and would like to support my work here at <a href="https://www.chiplog.io">chiplog.io</a> as well as <a href="https://www.systemverilog.io">systemverilog.io</a>, please do consider becoming a member today.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><strong>US Patent:</strong> <em><a href="https://patents.google.com/patent/US11968178B2/">Reduction and acceleration of a deterministic finite automaton</a>, Subramani Ganesh, et al.</em> </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" 
contenteditable="false" target="_self">2</a><div class="footnote-content"><p><strong>US Patent:</strong> <em><a href="https://patents.google.com/patent/US9906495B2">Network device implementing two-stage flow information aggregation</a>, Sidong Li, Subramani Ganesh, et al.</em></p></div></div>]]></content:encoded></item><item><title><![CDATA[How to prepare and apply for hardware engineering jobs]]></title><description><![CDATA[A guide especially for new college grads and early in career engineers]]></description><link>https://www.chiplog.io/p/how-to-prepare-and-apply-for-hardware</link><guid isPermaLink="false">https://www.chiplog.io/p/how-to-prepare-and-apply-for-hardware</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Mon, 14 Jul 2025 06:55:06 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/6e8463ee-9004-4d8d-9805-fe075d1b52cd_962x467.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>The process of applying and interviewing for a job is always stressful</strong>, whether you&#8217;re a new college graduate or a seasoned engineer of 15 years.</p><p>This article offers a <strong>3-step guide</strong> on how to prepare and apply for your first, or next, job in hardware engineering.</p><p>&#128274; <em>For members, <strong>I&#8217;ve also included an extensive spreadsheet of 100 companies</strong> in hardware engineering to kickstart your search.</em></p><h2>Step 1: Understanding the available opportunities </h2><p>The journey of a chip or a hardware system, from conception to finished product, involves a number of sub-specializations in hardware engineering.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xrfc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0faadf7-752c-43dd-aac1-fc5a8cb41a01_2644x1324.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xrfc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0faadf7-752c-43dd-aac1-fc5a8cb41a01_2644x1324.png 424w, https://substackcdn.com/image/fetch/$s_!xrfc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0faadf7-752c-43dd-aac1-fc5a8cb41a01_2644x1324.png 848w, https://substackcdn.com/image/fetch/$s_!xrfc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0faadf7-752c-43dd-aac1-fc5a8cb41a01_2644x1324.png 1272w, https://substackcdn.com/image/fetch/$s_!xrfc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0faadf7-752c-43dd-aac1-fc5a8cb41a01_2644x1324.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xrfc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0faadf7-752c-43dd-aac1-fc5a8cb41a01_2644x1324.png" width="1456" height="729" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b0faadf7-752c-43dd-aac1-fc5a8cb41a01_2644x1324.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:729,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!xrfc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0faadf7-752c-43dd-aac1-fc5a8cb41a01_2644x1324.png 424w, https://substackcdn.com/image/fetch/$s_!xrfc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0faadf7-752c-43dd-aac1-fc5a8cb41a01_2644x1324.png 848w, https://substackcdn.com/image/fetch/$s_!xrfc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0faadf7-752c-43dd-aac1-fc5a8cb41a01_2644x1324.png 1272w, https://substackcdn.com/image/fetch/$s_!xrfc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0faadf7-752c-43dd-aac1-fc5a8cb41a01_2644x1324.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Different engineering roles involved in chip development through its various stages, in a <strong>fabless</strong> company</figcaption></figure></div><p>By accepting a job offer, you&#8217;re not just picking the company but also which specialization to focus your career on. So, the first step is to get a good understanding of the lay of the land. </p><blockquote><p><em>While doing my graduate studies, I fell into this trap of applying only for jobs related to the subjects I had studied, because I felt these were the only interviews I could crack.</em></p><p><em>In hindsight, there were several opportunities in the neighborhood of my expertise that were worth considering.</em> <em>So, <strong>understand the lay of the land and confidently cast a wide net.</strong></em></p></blockquote><h2>Step 2: Applying for the job</h2><p>Once you have a grasp of the opportunities out there, the next step is figuring out <strong>how to reach them effectively</strong>.</p><h4>Tailor your resume</h4><ul><li><p>A single resume is insufficient. <strong>Create multiple versions</strong> of your resume tailored to different roles.</p></li><li><p>Depending on the job description and what that company does, minor tweaks to your resume to <strong>highlight pertinent skills</strong> can significantly increase your chances of getting that call. </p></li></ul><h4>Referrals work</h4><ul><li><p><strong>Referrals</strong> are better than a blind application to a job listing. 
They often <strong>get your resume seen faster and by the right people.</strong></p></li><li><p><strong>Leverage your network</strong>: friends, alumni, family, and acquaintances. Let people know you&#8217;re actively looking.</p></li></ul><h4>Cold emailing is okay</h4><ul><li><p>Don&#8217;t hesitate to reach out to people on LinkedIn. <strong>A cold email is totally acceptable if it&#8217;s</strong> <strong>polite, direct, and genuine</strong>. </p></li><li><p>The worst they can do is not reply. In the best case, you get a referral or even an interview.</p></li></ul><h2>Step 3: Preparing for the interview</h2><h4>The mindset</h4><ul><li><p>Your goal should be this: if you get the interview, <strong>you should have a near 100% chance of converting it into an offer.</strong> </p></li><li><p>Leave nothing on the table. The only reason you should walk away without an offer is a genuine mismatch, for example when the team was only exploring whether you might fit.</p></li></ul><h4>Know your work</h4><ul><li><p>Almost always, the first question will be about your past experience. It sounds obvious, but the <strong>ability to clearly explain your past work is the first step towards success in your interview</strong>. This is where you make that first impression.</p></li><li><p>You should rehearse your answers so that they are clear and concise. You&#8217;ll be anxious as it is, so you don&#8217;t want to be searching for the right words during the interview. </p></li></ul><blockquote><p><em>I&#8217;ve made this mistake myself. I was once asked about a patent I had filed 8 years earlier. I hadn&#8217;t revisited the work beforehand and, in that moment, I blanked. It made me look unprepared and maybe even dishonest. 
</em></p><p><em>It is critical to <strong>revisit and rehearse</strong> your own experience before the interview.</em></p></blockquote><h4>Understand the surrounding 20%</h4><ul><li><p>As a principle, you should <strong>strive to understand not just your own work but its surrounding 20%</strong> as well. While it is not practical to know everything, this is a good general approach to any problem or project that&#8217;s assigned to you.</p></li><li><p>This extra 20% covers the history behind the problem, other comparable designs, and what has and has not worked. This depth shows that you&#8217;re not just a doer but a thinker as well. </p></li></ul><h4>Standing out as a new college grad</h4><p>As a new grad, your resume might look like a hundred others. So how do you stand out? How do you make a good first impression during your interview?</p><ul><li><p><strong>Personality and communication skills:</strong> Personality is something that can be worked on. Things like how you speak and present yourself matter. Interviews are not just about your technical chops.</p></li><li><p><strong>Stay curious and well-read:</strong> </p><ul><li><p>When you go through a panel of interviews, for a portion of the interview you can rely on your past work/project experience. </p></li><li><p>But there will be technical questions on topics you haven&#8217;t directly worked on. Being well-read (i.e., knowing what companies in your industry are doing and the top papers published) will help you come up with a hypothesis and give a good, thoughtful answer (even if it&#8217;s incorrect). 
</p></li><li><p>This also shows your willingness to learn and become good at your craft.</p></li></ul></li></ul><h4>What interviewers are looking for</h4><p>As someone who interviews candidates and helps build teams, I can tell you that my evaluation boils down to three things:</p><ul><li><p><strong>Personality:</strong> Can I imagine working with this person for the next 3, 5, or 8 years? Are they respectful, open, and articulate?</p></li><li><p><strong>Technical Aptitude:</strong> This isn&#8217;t about IQ. It&#8217;s about the quality of your past work, how well you know your industry, and whether you can apply your knowledge creatively to problems you haven&#8217;t seen before. </p></li><li><p><strong>Willingness and Ambition: </strong>How much do you care about the quality of your work? How important is it for you to get better at your craft? </p></li></ul><div><hr></div><h2>List of 100 companies</h2><p><em>That&#8217;s it for this edition. To kickstart your job search, I&#8217;ve prepared a spreadsheet with 100 semiconductor companies based in North America. </em></p><p><em>This section is a token of thanks to members. You can support Chiplog through this link. Purchasing power parity and student discounts are available. Please send an email to <strong>members@chiplog.io</strong>.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/subscribe?"><span>Subscribe now</span></a></p>
      <p>
          <a href="https://www.chiplog.io/p/how-to-prepare-and-apply-for-hardware">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How Intel makes sure the FDIV bug never happens again]]></title><description><![CDATA[The people and breakthroughs behind Intel&#8217;s quiet revolution in formal verification]]></description><link>https://www.chiplog.io/p/how-intel-makes-sure-the-fdiv-bug</link><guid isPermaLink="false">https://www.chiplog.io/p/how-intel-makes-sure-the-fdiv-bug</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Fri, 25 Apr 2025 20:22:29 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8fd3acf6-3753-47d4-bf4c-88e8bc6cb379_1456x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>FDIV</h1><p>This historic issue has been <a href="http://www.righto.com/2024/12/this-die-photo-of-pentium-shows.html">extensively analyzed and reported on</a>. Here&#8217;s a brief summary:</p><ul><li><p>In 1994, a math professor discovered anomalies in calculations with Intel Pentium CPUs.</p></li><li><p>Intel attempted to brush it under the rug, claiming it isn&#8217;t a big deal.</p></li><li><p>However, the whole situation blew up, resulting in a recall that cost Intel ~$500 million.</p></li></ul><h1>The problem</h1><p>To understand how a problem like this goes undetected before manufacturing, we must understand how chips are tested and verified before fabrication.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TUvE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7838d8f-f437-40fa-b7d1-2b44895a6833_2560x888.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!TUvE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7838d8f-f437-40fa-b7d1-2b44895a6833_2560x888.png 424w, https://substackcdn.com/image/fetch/$s_!TUvE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7838d8f-f437-40fa-b7d1-2b44895a6833_2560x888.png 848w, https://substackcdn.com/image/fetch/$s_!TUvE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7838d8f-f437-40fa-b7d1-2b44895a6833_2560x888.png 1272w, https://substackcdn.com/image/fetch/$s_!TUvE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7838d8f-f437-40fa-b7d1-2b44895a6833_2560x888.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TUvE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7838d8f-f437-40fa-b7d1-2b44895a6833_2560x888.png" width="1456" height="505" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7838d8f-f437-40fa-b7d1-2b44895a6833_2560x888.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:505,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!TUvE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7838d8f-f437-40fa-b7d1-2b44895a6833_2560x888.png 424w, https://substackcdn.com/image/fetch/$s_!TUvE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7838d8f-f437-40fa-b7d1-2b44895a6833_2560x888.png 848w, https://substackcdn.com/image/fetch/$s_!TUvE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7838d8f-f437-40fa-b7d1-2b44895a6833_2560x888.png 1272w, https://substackcdn.com/image/fetch/$s_!TUvE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7838d8f-f437-40fa-b7d1-2b44895a6833_2560x888.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Simulation</figcaption></figure></div><p><strong>Pre-silicon verification</strong> (i.e., testing of chip designs before they are sent for fabrication) is primarily done through a method called <strong>simulation</strong>; this was true in the 90s and remains true today. This method involves writing tests that stimulate the design with as many input combinations as practical and then checking that it behaves as expected. </p><p>Companies such as Apple, Intel, AMD, and Nvidia spend considerable effort on simulating chip designs before tape-out. The memory, compute, and licenses for special tools alone cost several million dollars, and a fleet of machines regresses through massive test suites 24x7x365 in the hope of uncovering any design flaws.</p><p>But even with such a rigorous process, bugs such as FDIV have shown us that this method of testing is insufficient. </p><blockquote><p><em><strong>The issue is</strong> <strong>that simulation is inherently an incomplete process</strong>. It's analogous to using brute force to solve a problem.</em></p><p><em>This strategy works well for many designs, but for others, such as a Floating Point Unit (FPU), you will never cover all input combinations in a single lifetime.</em></p></blockquote><pre><code><em>To be fair, there is a reason why simulation is the primary method used in the industry for chip verification.

+ There is an ecosystem of mature industry tools and methodologies that have been used to successfully tape-out chips over the past decades.

+ During pre-silicon verification, chip designs are often decomposed into smaller modules, for which simulation generally works well.</em></code></pre><h1>The solution: Verification using Formal methods</h1><p>Consider this equation,</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nkdL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cfd12a-3ac3-44ab-affa-be3d0eab00af_1442x146.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nkdL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cfd12a-3ac3-44ab-affa-be3d0eab00af_1442x146.png 424w, https://substackcdn.com/image/fetch/$s_!nkdL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cfd12a-3ac3-44ab-affa-be3d0eab00af_1442x146.png 848w, https://substackcdn.com/image/fetch/$s_!nkdL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cfd12a-3ac3-44ab-affa-be3d0eab00af_1442x146.png 1272w, https://substackcdn.com/image/fetch/$s_!nkdL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cfd12a-3ac3-44ab-affa-be3d0eab00af_1442x146.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nkdL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cfd12a-3ac3-44ab-affa-be3d0eab00af_1442x146.png" width="540" height="54.674063800277395" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46cfd12a-3ac3-44ab-affa-be3d0eab00af_1442x146.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:146,&quot;width&quot;:1442,&quot;resizeWidth&quot;:540,&quot;bytes&quot;:26646,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/155779481?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cfd12a-3ac3-44ab-affa-be3d0eab00af_1442x146.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nkdL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cfd12a-3ac3-44ab-affa-be3d0eab00af_1442x146.png 424w, https://substackcdn.com/image/fetch/$s_!nkdL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cfd12a-3ac3-44ab-affa-be3d0eab00af_1442x146.png 848w, https://substackcdn.com/image/fetch/$s_!nkdL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cfd12a-3ac3-44ab-affa-be3d0eab00af_1442x146.png 1272w, https://substackcdn.com/image/fetch/$s_!nkdL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cfd12a-3ac3-44ab-affa-be3d0eab00af_1442x146.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>There are two ways to verify if it is correct.</p><ol><li><p><strong>Brute force:</strong> If you take a simulation-like approach, you might try 10,000 random values for 
<em><strong>n</strong></em>. If it works, you will then have to optimistically conclude that the equation is right. But, if we&#8217;re being honest, you haven&#8217;t exhaustively proven that the equation is indeed correct. </p></li><li><p><strong>Formal proof:</strong> You might recall from high school math that <em>mathematical induction</em> is the right way to exhaustively prove this equation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dZ9b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910367c0-4c1c-41f0-800c-58a0d0a41727_1292x1148.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dZ9b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910367c0-4c1c-41f0-800c-58a0d0a41727_1292x1148.png 424w, https://substackcdn.com/image/fetch/$s_!dZ9b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910367c0-4c1c-41f0-800c-58a0d0a41727_1292x1148.png 848w, https://substackcdn.com/image/fetch/$s_!dZ9b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910367c0-4c1c-41f0-800c-58a0d0a41727_1292x1148.png 1272w, https://substackcdn.com/image/fetch/$s_!dZ9b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910367c0-4c1c-41f0-800c-58a0d0a41727_1292x1148.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dZ9b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910367c0-4c1c-41f0-800c-58a0d0a41727_1292x1148.png" 
width="589" height="523.3529411764706" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/910367c0-4c1c-41f0-800c-58a0d0a41727_1292x1148.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1148,&quot;width&quot;:1292,&quot;resizeWidth&quot;:589,&quot;bytes&quot;:143537,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/155779481?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910367c0-4c1c-41f0-800c-58a0d0a41727_1292x1148.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dZ9b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910367c0-4c1c-41f0-800c-58a0d0a41727_1292x1148.png 424w, https://substackcdn.com/image/fetch/$s_!dZ9b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910367c0-4c1c-41f0-800c-58a0d0a41727_1292x1148.png 848w, https://substackcdn.com/image/fetch/$s_!dZ9b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910367c0-4c1c-41f0-800c-58a0d0a41727_1292x1148.png 1272w, https://substackcdn.com/image/fetch/$s_!dZ9b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910367c0-4c1c-41f0-800c-58a0d0a41727_1292x1148.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Proof by mathematical induction</figcaption></figure></div></li></ol><blockquote><p><em>The main takeaway from this section is that,</em></p><p><em><strong>Simulation is not the only method</strong> to check if a circuit is doing what it is supposed to do. 
<strong>There is another approach called Formal Verification, </strong>which is capable of exhaustively proving that a design works.</em></p></blockquote><h1>The issue with Formal Verification</h1><p>However, Formal isn&#8217;t without its own issues.</p><ul><li><p>Consider mathematical induction from the previous section. We conveniently used it to prove the equation, <strong>but how does it actually work?</strong> How can you simply assume something to be true for <strong>n=k</strong> and then use that assumption to <a href="https://math.stackexchange.com/questions/19485/dominoes-and-induction-or-how-does-induction-work">construct a proof for </a><strong><a href="https://math.stackexchange.com/questions/19485/dominoes-and-induction-or-how-does-induction-work">n=k+1</a></strong>?</p></li><li><p>As another example, consider <a href="https://en.wikipedia.org/wiki/Pythagorean_theorem">Pythagoras&#8217;</a> or <a href="https://en.wikipedia.org/wiki/Euclid%27s_theorem">Euclid&#8217;s theorem</a>. 
These theorems rely on completely different mathematics and reasoning techniques to <a href="https://en.wikipedia.org/wiki/Pythagorean_theorem#Proofs_using_constructed_squares">arrive at their proofs</a>.</p></li></ul><blockquote><p><em>The core issue with Formal methods is that <strong>understanding the mathematical basis</strong> for Formal proofs, which in the case of hardware verification means <a href="https://en.wikipedia.org/wiki/Linear_temporal_logic">LTL</a>, <a href="https://en.wikipedia.org/wiki/Computation_tree_logic">CTL</a>, <a href="https://en.wikipedia.org/wiki/Model_checking">Model Checking</a> and <a href="https://en.wikipedia.org/wiki/Automated_theorem_proving">Theorem Proving</a>,</em> <em><strong>is quite challenging</strong>, and <strong>coming up with fresh proof approaches is even more difficult.</strong></em></p><p><em><strong>Formal Verification has only recently</strong> (in the last ten years) broken away from being purely in the realm of Math and Computer Science PhDs and <strong>entered the toolkit of the average engineer.</strong></em></p><p><em>We finally have a collection of mature tools, methodologies and proof techniques which <strong>can be learnt and applied within the constraints of aggressive tape-out schedules</strong>. </em></p><p><em>In the following section, we will look at how Intel contributed to this effort.</em></p></blockquote><h1>Formal Verification at Intel</h1><h2>Evolution</h2><p>After the fallout from the FDIV problem in 1995, Intel knew it needed to do something different. At the time, there were no established alternatives to simulation. 
But <strong>Intel correctly identified that the answer lay in the field of what I will loosely call </strong><em><strong>Automated Reasoning</strong></em><strong>.</strong> </p><p>The idea behind automated reasoning is to have a computer program prove mathematical theorems or, given a formal model of a system (such as a state transition diagram), have a program tell you whether certain equations hold in that system. Automated theorem proving <a href="https://en.wikipedia.org/wiki/Automated_theorem_proving#First_implementations">first saw success in the 1950s</a>. But further research in the field was largely motivated by the desire to improve <em>solvers/proof assistants</em> as well as software/systems reliability (see Edsger Dijkstra&#8217;s essay <a href="https://www.cs.utexas.edu/~EWD/transcriptions/EWD03xx/EWD303.html">On the Reliability of Programs</a>). It was only in the 1980s that researchers began to explore the effectiveness of this approach in hardware verification.</p><p>Fast forward to 1995: there were really three strategies worth exploring for hardware verification: Model Checking, SAT solvers and theorem provers. Ed Clarke, the co-inventor of Model Checking, said this in his <a href="https://youtu.be/I1lf2MBy3J4?si=EVwzUT6EMeChltBv&amp;t=1287">2007 Turing Award Lecture</a>.</p><pre><code><em>When Intel made the announcement that there was a real problem with the Pentium 1 chip, a former CMU student who had taken one of my classes, Manpreet Khaira (then a Principal at Intel), called me up and said &#8220;send me one of your students&#8221;.

I sent him Xudong Zhao, who was my best student at the time. Xudong spent 3-4 months at Intel and <strong>was able to show that his PhD thesis topic, Word Level Model Checking, could indeed have found the Pentium error </strong>and moreover he was able to show that <strong>the fix Intel had come up with did indeed fix the problem.</strong></em>

<a href="https://youtu.be/I1lf2MBy3J4?si=EVwzUT6EMeChltBv&amp;t=1287">Ed Clarke, 2007 Turing Award Lecture</a></code></pre><p>Subsequently, Intel committed time and money in developing Formal Verification. This chart summarizes Intel's evolution in this field.</p><div class="pullquote"><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wuG4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe9d23bc-03b0-468e-9701-48dc8d865208_4596x2352.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wuG4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe9d23bc-03b0-468e-9701-48dc8d865208_4596x2352.png 424w, https://substackcdn.com/image/fetch/$s_!wuG4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe9d23bc-03b0-468e-9701-48dc8d865208_4596x2352.png 848w, https://substackcdn.com/image/fetch/$s_!wuG4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe9d23bc-03b0-468e-9701-48dc8d865208_4596x2352.png 1272w, https://substackcdn.com/image/fetch/$s_!wuG4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe9d23bc-03b0-468e-9701-48dc8d865208_4596x2352.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wuG4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe9d23bc-03b0-468e-9701-48dc8d865208_4596x2352.png" width="1456" height="745" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe9d23bc-03b0-468e-9701-48dc8d865208_4596x2352.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:745,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wuG4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe9d23bc-03b0-468e-9701-48dc8d865208_4596x2352.png 424w, https://substackcdn.com/image/fetch/$s_!wuG4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe9d23bc-03b0-468e-9701-48dc8d865208_4596x2352.png 848w, https://substackcdn.com/image/fetch/$s_!wuG4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe9d23bc-03b0-468e-9701-48dc8d865208_4596x2352.png 1272w, https://substackcdn.com/image/fetch/$s_!wuG4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe9d23bc-03b0-468e-9701-48dc8d865208_4596x2352.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Evolution of Formal Verification at Intel</figcaption></figure></div></div><h3>People</h3><p>When I scanned through the research published between 1995 and 2010, I noticed that practically every researcher from around the world who worked in the field of Model Checking or Automated Theorem Proving did a stint at Intel. Bringing these experts into the same cauldron meant that new Formal techniques and methods were published at a brisk pace. 
(Calling it a cauldron seems appropriate given the heat Intel was experiencing at the time).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DXIR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91f7bcef-0c28-446b-8e8e-c184a8a5f4f3_1980x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DXIR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91f7bcef-0c28-446b-8e8e-c184a8a5f4f3_1980x1080.png 424w, https://substackcdn.com/image/fetch/$s_!DXIR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91f7bcef-0c28-446b-8e8e-c184a8a5f4f3_1980x1080.png 848w, https://substackcdn.com/image/fetch/$s_!DXIR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91f7bcef-0c28-446b-8e8e-c184a8a5f4f3_1980x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!DXIR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91f7bcef-0c28-446b-8e8e-c184a8a5f4f3_1980x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DXIR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91f7bcef-0c28-446b-8e8e-c184a8a5f4f3_1980x1080.png" width="1456" height="794" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/91f7bcef-0c28-446b-8e8e-c184a8a5f4f3_1980x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:799081,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/155779481?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91f7bcef-0c28-446b-8e8e-c184a8a5f4f3_1980x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DXIR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91f7bcef-0c28-446b-8e8e-c184a8a5f4f3_1980x1080.png 424w, https://substackcdn.com/image/fetch/$s_!DXIR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91f7bcef-0c28-446b-8e8e-c184a8a5f4f3_1980x1080.png 848w, https://substackcdn.com/image/fetch/$s_!DXIR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91f7bcef-0c28-446b-8e8e-c184a8a5f4f3_1980x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!DXIR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91f7bcef-0c28-446b-8e8e-c184a8a5f4f3_1980x1080.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Notable researchers in Intel&#8217;s Strategic CAD Labs (SCL) https://doi.org/10.1109/54.936245</figcaption></figure></div><h3>Techniques</h3><p>Working with Intel gave these researchers the opportunity to apply their ideas to a real-world problem and, more importantly, iterate on these ideas until they matured and were robust enough for an industry setting.</p><p>For example, vanilla Model Checking has been known to have capacity issues. It only allows the user to reason with small designs, nowhere sufficient for something like the Intel Pentium processor. Several creative abstractions had to be constructed to boost the capacity of Model Checkers. The best idea to emerge from these academic collaborations was <strong>Symbolic Trajectory Evaluation (STE)</strong>, which was the brainchild of <strong>Carl H. Seger</strong>. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pXG7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e7728b-209f-4a6e-b093-89660e03f320_1920x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pXG7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e7728b-209f-4a6e-b093-89660e03f320_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!pXG7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e7728b-209f-4a6e-b093-89660e03f320_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!pXG7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e7728b-209f-4a6e-b093-89660e03f320_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!pXG7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e7728b-209f-4a6e-b093-89660e03f320_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pXG7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e7728b-209f-4a6e-b093-89660e03f320_1920x1080.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79e7728b-209f-4a6e-b093-89660e03f320_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:319516,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/155779481?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e7728b-209f-4a6e-b093-89660e03f320_1920x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pXG7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e7728b-209f-4a6e-b093-89660e03f320_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!pXG7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e7728b-209f-4a6e-b093-89660e03f320_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!pXG7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e7728b-209f-4a6e-b093-89660e03f320_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!pXG7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79e7728b-209f-4a6e-b093-89660e03f320_1920x1080.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Symbolic Trajectory Evaluation</figcaption></figure></div><p>But even a venerable technique such as <strong>STE is only as good as the operator who uses it</strong>. Limor Fix and a few others published seminal work on answering the question &#8220;<em>Are we done, yet?</em>&#8221;, that is, determining whether the Formal analysis of a design is indeed complete, with no open holes. 
</p><p>Even here, there was a significant degree of engagement with academia, as seen in the papers below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nuG8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e30a8a-c794-4ad4-a757-0fe4ebb4c4b9_1920x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nuG8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e30a8a-c794-4ad4-a757-0fe4ebb4c4b9_1920x1200.png 424w, https://substackcdn.com/image/fetch/$s_!nuG8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e30a8a-c794-4ad4-a757-0fe4ebb4c4b9_1920x1200.png 848w, https://substackcdn.com/image/fetch/$s_!nuG8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e30a8a-c794-4ad4-a757-0fe4ebb4c4b9_1920x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!nuG8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e30a8a-c794-4ad4-a757-0fe4ebb4c4b9_1920x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nuG8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e30a8a-c794-4ad4-a757-0fe4ebb4c4b9_1920x1200.png" width="1456" height="910" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/45e30a8a-c794-4ad4-a757-0fe4ebb4c4b9_1920x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:275670,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/155779481?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e30a8a-c794-4ad4-a757-0fe4ebb4c4b9_1920x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nuG8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e30a8a-c794-4ad4-a757-0fe4ebb4c4b9_1920x1200.png 424w, https://substackcdn.com/image/fetch/$s_!nuG8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e30a8a-c794-4ad4-a757-0fe4ebb4c4b9_1920x1200.png 848w, https://substackcdn.com/image/fetch/$s_!nuG8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e30a8a-c794-4ad4-a757-0fe4ebb4c4b9_1920x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!nuG8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45e30a8a-c794-4ad4-a757-0fe4ebb4c4b9_1920x1200.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Formal coverage</figcaption></figure></div><h3>Tools</h3><p>To Formally verify a chip design, which is typically represented as <a href="https://en.wikipedia.org/wiki/Register-transfer_level">RTL</a>, it first has to be synthesized into a suitable form, such as a <a href="https://en.wikipedia.org/wiki/Binary_decision_diagram">Binary Decision Diagram (BDD)</a>. In 1993, while at the University of British Columbia, Carl Seger developed a tool called VOSS. After joining Intel in 1995, he collaborated with others like Tom Melham to develop its successor, called <strong>Forte.</strong> It was designed specifically for industrial-scale circuits, with a unified interface to a BDD, a Model Checker, a SAT solver and a Theorem Prover [3, 6]. 
This made it seamless to verify a circuit by applying various techniques, such as STE.</p><p>Along with a tool like Forte, you require Formal semantics to write properties and ask questions about the design, such as &#8220;<em>if event-A occurs, then does event-B occur 3 clock cycles later?</em>&#8221; For this they created a Formal Specification language called <strong>ForSpec</strong>.</p><p>ASIC/SoC engineers will be delighted to learn that <strong>Intel donated</strong> <strong>ForSpec</strong> to Accellera and <strong>major parts of it were adopted during the creation of the SystemVerilog Assertions language (SVA)</strong>. SVA is the current industry standard for performing Formal Verification.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8U3C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e660b73-2905-442d-a718-02bd424edddf_1182x1280.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8U3C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e660b73-2905-442d-a718-02bd424edddf_1182x1280.png 424w, https://substackcdn.com/image/fetch/$s_!8U3C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e660b73-2905-442d-a718-02bd424edddf_1182x1280.png 848w, https://substackcdn.com/image/fetch/$s_!8U3C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e660b73-2905-442d-a718-02bd424edddf_1182x1280.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8U3C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e660b73-2905-442d-a718-02bd424edddf_1182x1280.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8U3C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e660b73-2905-442d-a718-02bd424edddf_1182x1280.png" width="567" height="614.010152284264" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e660b73-2905-442d-a718-02bd424edddf_1182x1280.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1280,&quot;width&quot;:1182,&quot;resizeWidth&quot;:567,&quot;bytes&quot;:409130,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/155779481?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e660b73-2905-442d-a718-02bd424edddf_1182x1280.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8U3C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e660b73-2905-442d-a718-02bd424edddf_1182x1280.png 424w, https://substackcdn.com/image/fetch/$s_!8U3C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e660b73-2905-442d-a718-02bd424edddf_1182x1280.png 848w, 
https://substackcdn.com/image/fetch/$s_!8U3C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e660b73-2905-442d-a718-02bd424edddf_1182x1280.png 1272w, https://substackcdn.com/image/fetch/$s_!8U3C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e660b73-2905-442d-a718-02bd424edddf_1182x1280.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Forte&#8217;s GUI - https://doi.org/10.1109/54.936245</figcaption></figure></div><h3>Methodology</h3><p>In order to 
perform effective Formal Verification and produce high-quality results, the Formal team at Intel developed, in addition to tools and techniques, a robust methodology spelling out how these tools should be applied. This was critical in democratizing Formal Verification within Intel and ensuring its adoption across projects in the company.</p><blockquote><p>Here&#8217;s a snippet from <em>&#8220;Making Formal Property Verification Mainstream at Intel&#8221; by M Achutha KiranKumar, et al.</em></p><p><em>A 2-week dedicated FV training was conducted for all FV champions and interested designers. The <strong>training catered to meet the expectations of varied expertise level users</strong>. The training included topics such as FV basics, FV applications for specific verification problems, and working demos. The identified FV champions were given additional trainings on tackling convergence issues and on more complex areas. <strong>The response seen in trainings was very encouraging, with almost 85 engineers becoming formally literate</strong>. 
A round-the-clock formal assistance network from various resources was available after the training.</em></p></blockquote><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4q-d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3e7c7b-7255-4804-a21c-52fe517a3c09_1920x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4q-d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3e7c7b-7255-4804-a21c-52fe517a3c09_1920x1200.png 424w, https://substackcdn.com/image/fetch/$s_!4q-d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3e7c7b-7255-4804-a21c-52fe517a3c09_1920x1200.png 848w, https://substackcdn.com/image/fetch/$s_!4q-d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3e7c7b-7255-4804-a21c-52fe517a3c09_1920x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!4q-d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3e7c7b-7255-4804-a21c-52fe517a3c09_1920x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4q-d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3e7c7b-7255-4804-a21c-52fe517a3c09_1920x1200.png" width="1456" height="910" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c3e7c7b-7255-4804-a21c-52fe517a3c09_1920x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:317610,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.chiplog.io/i/155779481?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3e7c7b-7255-4804-a21c-52fe517a3c09_1920x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4q-d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3e7c7b-7255-4804-a21c-52fe517a3c09_1920x1200.png 424w, https://substackcdn.com/image/fetch/$s_!4q-d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3e7c7b-7255-4804-a21c-52fe517a3c09_1920x1200.png 848w, https://substackcdn.com/image/fetch/$s_!4q-d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3e7c7b-7255-4804-a21c-52fe517a3c09_1920x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!4q-d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3e7c7b-7255-4804-a21c-52fe517a3c09_1920x1200.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>Conclusion</h1><p>In some ways, the FDIV bug served as a blessing in disguise for the semiconductor industry. Intel helped bring an esoteric field from academic research into mainstream chip development. Today, Formal Verification has grown into a thriving micro-industry, with the three major CAD companies (Cadence, Synopsys and Siemens) selling advanced tools akin to Forte and providing training as well. 
Now every chip company considers Formal Verification a mandatory part of its chip development process.</p><p>As a cherry on top, more recently, engineers from Intel&#8217;s FVCTO (Formal Verification Central Tech Office), which houses some of the best Formal Verification engineers, published the definitive book on Applied Formal Verification, upon which engineers like me have come to rely: <em>Formal Verification: An Essential Toolkit for Modern VLSI Design, Erik Seligman et al.</em></p><div><hr></div><p>If you want to get started with Formal Verification, I recommend these three in-depth tutorials I&#8217;ve written on <a href="https://www.systemverilog.io">systemverilog.io</a>.</p><ol><li><p><a href="https://www.systemverilog.io/verification/sva-basics/">SystemVerilog Assertions Basics</a></p></li><li><p><a href="https://www.systemverilog.io/verification/gentle-introduction-to-formal-verification/">A Gentle Introduction to Formal Verification</a></p></li><li><p><a href="https://www.systemverilog.io/verification/blueprint-for-formal-verification/">A Blueprint for Formal Verification</a> </p></li></ol><h1>References</h1><blockquote><p><em>Many of the key references are included as images in earlier sections. This reference list offers a broader selection of interesting papers, along with brief commentary on a few. It&#8217;s also a small gesture of appreciation for the support from my patrons.</em></p></blockquote>
      <p>
          <a href="https://www.chiplog.io/p/how-intel-makes-sure-the-fdiv-bug">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[3 reasons why everyone must pursue a side project]]></title><description><![CDATA[Side projects are the antidote that your doctor did not prescribe, but should have. In this essay, we will look at three uses of this drug-free therapy.]]></description><link>https://www.chiplog.io/p/3-reasons-why-everyone-must-pursue</link><guid isPermaLink="false">https://www.chiplog.io/p/3-reasons-why-everyone-must-pursue</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Sat, 28 Dec 2024 20:02:50 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ca25cf21-00a5-47d2-9419-baeb067acb0f_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>1. An antidote to stress</h1><p><strong>Distractions</strong>, in general, are an effective antidote to a stressful day at work or a hectic weekend with the family. However, not all types of distractions are equal. </p><p><strong>Passive distractions</strong> such as Instagram, YouTube and Netflix are <em>depleting</em>. But <strong>active distractions</strong>, like playing an instrument or cooking, are <em>rejuvenating</em>. </p><p><strong>Side projects are an active distraction. </strong>They transport you mentally and help you break away from your daily anxieties. They quench a certain thirst that you didn&#8217;t know needed quenching.</p><p>With side projects, <strong>you are doing something you have full ownership over. </strong>You pursue something for its own sake. There aren&#8217;t many things in life that you can truly say that about.</p><blockquote><p><em>side_projects === drug_free_therapy</em></p></blockquote><pre><code>Reason 1: Rejuvenation</code></pre><h1>2. An antidote to diminishing attention span</h1><p><strong>Attention</strong> is the one true currency we have. Everyone is vying for it. Our Instagram and YouTube feeds have been meticulously engineered to capture our attention. 
Podcasts and blogs, including this Substack, all require your attention to ensure their success.</p><p>The volume of stimulus thrown at us every day has fractured our attention span. This fragmentation has affected children and grown-ups alike. Another therapeutic effect of side projects is that they are capable of healing our attention span.</p><p>I have friends who are hobby musicians and amateur woodworkers trying to produce original work on nights and weekends while supporting their families and raising children. Pulling this off demands a certain discipline, and <strong>side projects are the sandbox where your mind has the opportunity to sharpen this focus and discipline</strong>.</p><pre><code>Reason 2: Honing your one true currency - attention</code></pre><h1>3. An antidote to retirement</h1><p><a href="https://www.amazon.com/Finite-Infinite-Games-James-Carse/dp/1476731713">In his book</a>, James Carse proposes that everything we do in life is either a finite or an infinite game. </p><p><strong>Your 9-to-5 is a finite game that is often mistaken for an infinite one</strong>. When I meet someone new at kids&#8217; birthday parties, the first question is always: &#8220;<em>What do you do for work?</em>&#8221; We as a society have absorbed into our collective consciousness the idea that the day job is our identity.</p><p>In retirement, you will have the physical aspects figured out: morning walks, afternoon yoga, and a few classes. But it is the mental side of retirement that can be deadly. Making side projects a part of your DNA and your main identity puts you in a better place. </p><p>However, becoming a hobbyist or tinkerer is not something that happens overnight. Just like any other muscle, it needs to be worked on. 
<strong>Side projects are the infinite game everyone should be playing.</strong></p><pre><code>Reason 3: Developing your infinite game</code></pre><h1>Bonus</h1><blockquote><p><strong>Serendipity</strong></p><p>These side endeavors could open new doors and opportunities that you might not have pursued otherwise.</p></blockquote><p></p><blockquote><p><strong>Setting an example</strong></p><p>Your tinkering never goes unnoticed. </p><p>I spent a big part of my teen years hanging around a friend&#8217;s place. His dad was an HR executive by day, but his shelves were full of programming books. I would always find him walking around with one. Even when driving, the book would be positioned between the driver and passenger seats, ready to read during traffic stops. Later in life he went on to start a tech company. </p><p>Observing him had a significant impact on many of my behaviors and interests.</p></blockquote><h1>Where do you begin?</h1><ol><li><p><strong>If you have a few ideas for side projects but have been hesitant to get started</strong>, one way to beat the inertia is to commit to a 14-day sprint. Each session can be a short 15 minutes of tinkering, making a tiny bit of progress until the process becomes enjoyable and the habit becomes ingrained.</p></li><li><p>If you are <strong>still trying to figure out what side project to pursue</strong>, the best way to discover your next interest is to <em>learn</em> and <em>imitate</em>. Take an online course, or pick up a book and work through its examples. Starting your journey by imitating someone you admire removes the pressure to create something new and original. 
</p></li></ol><h1>A few of my favorite side projects</h1><ul><li><p><a href="https://arun.is">arun.is</a></p></li><li><p><a href="https://www.instagram.com/mohitbhoite?igsh=MWV6NG15dDVicGh5Zw==">Mohit Bhoite</a></p></li><li><p><a href="https://www.viksnewsletter.com">Vik&#8217;s Newsletter</a></p></li></ul><div><hr></div><p>That&#8217;s it for this session, see you in the next one. </p><p></p>]]></content:encoded></item><item><title><![CDATA[Apprentice, Specialist, Lieutenant, Leader - Ideas on how to craft your career]]></title><description><![CDATA[As an individual contributor, should you be a jack of many trades or a master of one?]]></description><link>https://www.chiplog.io/p/apprentice-specialist-lieutenant</link><guid isPermaLink="false">https://www.chiplog.io/p/apprentice-specialist-lieutenant</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Sun, 17 Nov 2024 00:23:56 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e219226d-61bd-45f5-a46b-6404cbb8e0be_1456x1048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As you grow into your career, should you become a jack of many trades or a master of one? 
How should an individual contributor craft or shape their career?</p><div><hr></div><p>This progression is one way to develop a relevant profile and become an indispensable engineer. </p><pre><code><strong>Apprentice</strong> &#8594; <strong>Specialist</strong> &#8594; <strong>Lieutenant</strong> &#8594; <strong>Leader</strong> </code></pre><p>Here are a few ideas on how to think about each point along this career pathway.</p><h3>Apprentice [Years 0 to 5]</h3><p>Fresh out of university, you are a blank slate. As an Electrical/Computer Engineer, you will graduate with a set of broad skills, almost never having experienced a proper chip tape-out, with the goal of working in some aspect of chip development (RTL design, design verification, analog design, physical design, etc.).</p><p>When you start that first job, you <strong>must</strong> be able to do a variety of tasks and become a jack of many trades. This will require you to hustle and learn quickly, but the benefit of this hustle is that you are demonstrating to your team leads that:</p><ul><li><p>You have the capacity to learn quickly and contribute to the project</p></li><li><p>You can solve any problem that&#8217;s thrown your way</p></li></ul><p>This helps you win the confidence of your peers and dramatically increases your chances of receiving high-quality problems to work on. </p><p>But, most importantly, becoming a jack of many trades allows you to learn how the sausage is made: the tools and methodologies used to design and validate a successful chip, how production code is written by experienced engineers, and the various stages in SoC/ASIC development such as Performance Modeling, DFT, Physical Design, Timing Closure, Emulation, Verification, Post-Silicon Validation, and so on.</p><h3>Specialist [Years 6 to 11]</h3><p>At this point in your career, you&#8217;ve experienced a few tape-outs. So, as a natural progression you will be given the opportunity to own a meaningful chunk of the design. 
Use this to develop deep expertise in at least one area of your field.</p><p>Being regarded as the &#8220;go-to&#8221; person for a specific building block brings additional benefits, such as interacting with cross-functional teams and becoming a proper stakeholder in meetings.</p><p>In my experience, this is when I genuinely began to believe in my abilities as an engineer and to feel that my contributions were making a meaningful difference. </p><p>Generally speaking, developing this feeling is a significant milestone. Your career shouldn't feel like a vast generic blob. Milestones like the one above will help you put stakes in the ground which you can look back on with confidence.</p><h3>Lieutenant [Years 12 to 17]</h3><p>As the lieutenant you are the linchpin of the team. You are responsible for executing several important functions. You will have to wear three different hats:</p><ul><li><p>You will continue to work as a specialist to deliver a specific complex portion of the chip, in the process producing a large amount of code.</p></li><li><p>You will leverage your experience from being a jack of many trades to carry out auxiliary functions. For example, you might develop tools, methodologies, and workflows for the rest of the team to use.</p></li><li><p>You will be entrusted by the team director to execute their vision, and sometimes to fill in for them in their absence. You will be responsible not just for the work you produce but also for that produced by the team.</p></li></ul><p>You are responsible for interviewing candidates and building a team on behalf of your manager, and for recommending how to get past technical challenges. 
At this stage you should be developing, based on your experience, your own philosophy of how things have to be done.</p><p>Regularly reflect on your experience, make notes, and refine your craft.</p><h3>Leader [Years 18+]</h3><p>As a leader, i.e., a Director, Architect, or Distinguished Engineer, you now have the opportunity to employ the lessons from your experience, build products, and lead teams according to your vision.</p><h1>Conclusion</h1><pre><code>Apprentice &#8594; Specialist &#8594; Lieutenant &#8594; Leader </code></pre><p>This is by no means the only way to think about your career. But coming up with a framework like this, and understanding what role you are playing in the team, will help you set effective goals and get the most out of <strong>that point</strong> in your career. </p><div><hr></div><p>That&#8217;s it for this session. See you in the next one.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Chip Log! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[How to get the most out of your work experience]]></title><description><![CDATA[A blueprint to extract two different kinds of experiences, implicit and explicit, from your work]]></description><link>https://www.chiplog.io/p/how-to-get-the-most-out-of-your-work</link><guid isPermaLink="false">https://www.chiplog.io/p/how-to-get-the-most-out-of-your-work</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Fri, 04 Oct 2024 05:21:22 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0bc118e3-5381-43b9-aaf9-a5271ddcdd9e_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>The Problem</h1><p>Chip/SoC development cycles are long, typically taking 2-3 years to design, implement, verify, tape-out, and then bringup. Considering the complexity of chips these days, in order to stay competitive and ship a product in a reasonable amount of time, the team sizes even at chip startups are large (at least in the tens, if not hundreds).</p><p>From the individual engineer&#8217;s point of view, with the ever-shrinking project timelines, there&#8217;s enormous pressure to deliver your work at a high quality and in a short amount of time. This means that you will likely work in a silo on a small section of the design throughout the project.</p><p>In subsequent projects (for example, the next revision of the chip), you are likely to continue working on the next iteration of the same block. 
This is obviously beneficial for the company because they already have someone familiar with that block. But <em>you may prefer this option as well</em>: you now carry a certain weight as the expert or go-to person for that block, and since you slogged the last time, doing it again will allow you to take a little break and work at an easier pace (the code you wrote is reusable).</p><p><strong>The problem</strong> really is that at the end of 6-7 years, when it comes time to sum up your experience at that job, you realize you&#8217;ve just worked on one thing. If you are thinking about changing jobs, your set of marketable skills may not be very impressive to potential employers.</p><h1>The Question</h1><p>How do you get the most out of your current job? </p><h1>A Potential Solution</h1><p>You can extract two different kinds of experience from your work: <strong>implicit</strong> <strong>experience</strong> and <strong>explicit</strong> <strong>experience</strong>.</p><h3>Explicit experience</h3><p>Explicit experience is the work you&#8217;ve been assigned, the tasks that you perform day to day. Here are some ideas to make the most of it. Doing the following will demonstrate to your current and future employers that you have the <em>aptitude, willingness, and capacity</em> to <strong>own</strong> the work entrusted to you:</p><ul><li><p><strong>History:</strong> Go a step further and learn the history behind your particular block. If you are working with the GDDR6 memory interface, for instance, investigate what distinguishes GDDR from LPDDR and HBM, as well as the subtle differences between GDDR6 and its predecessor, GDDR5.</p></li><li><p><strong>Compare:</strong> Look into the most recent advancements in your immediate area of work and discover how other companies are doing the same thing. 
Papers from conferences like DVCon, Hot Chips, DesignCon and SNUG are the best resource for this.</p><pre><code><em>It&#8217;s easy to get caught up in the rush to meet deadlines and get hyper-focused on the task at hand. All the exploratory work mentioned above might have to come at the cost of a few weeknights or weekends, but it will make your work more satisfying.</em></code></pre></li><li><p><strong>Patent:</strong> Explore whether any aspect of the design (or a variation of it) is unique and patentable. Most businesses value this kind of contribution. </p></li><li><p><strong>Publish:</strong> Find an opportunity to publish and present your work at a conference. In the process of preparing for a conference, you will pick up several important soft skills that will make you a well-rounded engineer. It also provides the opportunity to network with engineers from other organizations.</p></li><li><p><strong>Workflow:</strong> Take the time to develop a good workflow. Iterate and refine it. When a task is assigned to you, how do you start? What tools do you use to maintain daily checklists? How proficient are you with diagramming tools and Vim/Emacs? Do you have keyboard shortcuts for frequent actions? These are little things that improve your style and mode of work.</p><pre><code><em>You must use your explicit experience to hone your craft and polish your tools so that when it is time to sculpt (i.e., write code), you will produce something you are proud of.</em></code></pre></li></ul><h3>Implicit experience</h3><p>Implicit experience is the experience you can derive from just being part of the project and a wider team. Here are a few ways to maximize your implicit experience.</p><ul><li><p><strong>Fly on the wall:</strong> Being in a chip team means you have access to specialists in other domains, such as Static Timing Analysis (STA) engineers, emulation engineers, seasoned architects, signal integrity experts, and many more. 
By being a fly on the wall during meetings (and paying attention) you can experience the kinds of problems people are facing in their own blocks and how they are working through them. During lunch or a walk, you can further probe your teammates to understand their domain a bit more. </p><pre><code><em>With implicit experience you are essentially collecting stories of others&#8217; experience. Stories are important, because these stories will then influence how you think and design your blocks in the future. These stories are part of the experience you are developing as you transition from a junior to a senior engineer to a director.</em></code></pre></li><li><p><strong>Finding your Minor:</strong> At university, there&#8217;s this concept of majoring in one subject and minoring in another. This very much translates to work as well. While you are &#8220;majoring&#8221; in the tasks assigned to you, pick a minor (some other part of the project which interests you).</p></li><li><p><strong>Seeking a mentor:</strong> This can be a whole article in itself. Mentors are not just for new college grads; you are never too old to have a mentor. </p><pre><code><em>Good mentorship allows us to stand on the shoulders of giants and look further ahead. Seek out a mentor, someone you aspire to become.</em></code></pre></li></ul><h1>In a nutshell</h1><p>If you approach your job with this mindset of <strong>implicit</strong> and <strong>explicit</strong> experience, when you sum up your work at that workplace it will look different. You will genuinely feel like an experienced engineer and you will have developed a much richer tool chest. You will have a better framework to solve problems, and you will be able to borrow from a richer set of lessons.</p><div><hr></div><p>That&#8217;s it for this session. 
See you in the next one.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Chip Log! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[3 Lessons from a Japanese Master Craftsman ]]></title><description><![CDATA[Writing beautiful code, benevolent mentors, and the mind body connection]]></description><link>https://www.chiplog.io/p/3-lessons-from-a-japanese-master</link><guid isPermaLink="false">https://www.chiplog.io/p/3-lessons-from-a-japanese-master</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Sun, 15 Sep 2024 18:58:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/nAsOGZiy39w" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Prologue</h1><p>I am a hardware engineer, yet there is nothing tangible in today&#8217;s <a href="https://en.wikipedia.org/wiki/3_nm_process">3nm semiconductor engineering</a>. I feel like a software developer because I write code all day. Sure, it&#8217;s the type of <a href="https://en.wikipedia.org/wiki/SystemVerilog">code that turns into transistors</a>, but still, for the most part I&#8217;m sitting at a computer staring at a screen. 
Mechanical watches with their <a href="https://teddybaldassarre.com/blogs/watches/watch-movements">intricate movements</a> quench my thirst for tangible hardware engineering.</p><p>This insightful 5-minute video below is a conversation with craftsmen/women from Grand Seiko, a respected Japanese watchmaker. They are famous for inventing a revolutionary movement known as the <a href="https://teddybaldassarre.com/blogs/watches/grand-seiko-spring-drive">Spring Drive</a>.</p><div id="youtube2-nAsOGZiy39w" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;nAsOGZiy39w&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/nAsOGZiy39w?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h1>Lesson 1</h1><blockquote><p><em>I think that being able to warm up the body before starting work is the best.</em></p><p><em>If the condition of the body isn&#8217;t good, then from starting in the morning until heading home, you can&#8217;t concentrate.</em></p><p><strong>Minute 2:00</strong></p></blockquote><p>The level of creativity and quality of your work as an engineer is determined by more than just your intelligence, knowledge, and experience. For instance, one bad night&#8217;s sleep makes me operate like <strong>my</strong> kindergartner who was refused ice cream (cranky, stumbling, fumbling and incoherent), rather than demonstrating my 15+ years of experience. 
</p><p>Exercise feels like a super drug, the simplest method to break out of a rut when stuck on a problem.</p><h1>Lesson 2</h1><blockquote><p><em>In making such a complex device, it is important that the next generation is able to service them since we are the ones that made them.</em></p><p><em>And, naturally, I will tell the younger workers 100% of what I know. Because I think that passing along these skills is such an important thing.</em></p><p><strong>Minute 3:27</strong></p></blockquote><p>In my first job out of graduate school, one of my assignments was to help bring up the high-speed <a href="https://en.wikipedia.org/wiki/SerDes">SerDes</a> interface on a chip that had just arrived in the lab. We were at least a week (or two?) into the bringup and these interfaces just wouldn&#8217;t function. I could feel the heat in the room, and the bosses' hourly update requests were nerve-wracking.</p><p>I was provided with Tcl scripts, a stack of intimidating C firmware, and a 600-page user guide for the SerDes PHY IP, with terms like DFE and FFE which I didn&#8217;t quite understand. I was a good student and graduated with honors, but no amount of books or classes can adequately prepare you for your first chip bringup under the watchful eye of your manager.</p><p>The only reason this episode had a happy ending was that I was fortunate to have a benevolent mentor who sat down with me and helped me comprehend the problem, but also gave me the space to learn, ask dumb questions, experiment, analyze and reach my own conclusions. When we finally got things working, I felt like I&#8217;d earned my moment in the spotlight. </p><p>As a senior engineer now playing the role of a mentor, I try not to forget this. 
It takes the right kind of mentorship to help a junior engineer overcome their impostor syndrome and gain confidence in their engineering abilities.</p><h1>Lesson 3</h1><blockquote><p><em>This is the minimum requirement in making a beautiful watch: First, each individual part must be beautiful</em></p><p><strong>Minute 0:35</strong></p></blockquote><p>This reminds me of a system architect I once worked with, a true 10X engineer. While the systems he architected were innovative, robust, and made the company a lot of money, I was always in awe of the code he produced. At the lowest level, even simple things like where he chose to place an empty line in a code block, his brief but self-explanatory variable and function names, and the premeditated flexibility he added to the code for future enhancements were stunning.</p><p>That&#8217;s it for this issue, see you in the next one!</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Chip Log! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[A simple (overlooked) tool to help you achieve your goals]]></title><description><![CDATA[External scaffolding for the mind]]></description><link>https://www.chiplog.io/p/a-simple-overlooked-tool-to-help</link><guid isPermaLink="false">https://www.chiplog.io/p/a-simple-overlooked-tool-to-help</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Fri, 06 Sep 2024 19:39:18 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0fc000b9-3264-4886-82a1-3b3eb96ed421_1456x1048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>External scaffolding for the mind</h2><p>This here is my habit tracker. I have it stuck to my kitchen pantry door. And you can bet that I visit the pantry at least a million times every day. It feels like this piece of paper is in front of my face all day, every day. 
This is the best way I've found for achieving a specific result.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5-sr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd762c94f-a87c-42be-a635-8c79dec42f11_4028x2016.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5-sr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd762c94f-a87c-42be-a635-8c79dec42f11_4028x2016.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5-sr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd762c94f-a87c-42be-a635-8c79dec42f11_4028x2016.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5-sr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd762c94f-a87c-42be-a635-8c79dec42f11_4028x2016.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5-sr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd762c94f-a87c-42be-a635-8c79dec42f11_4028x2016.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5-sr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd762c94f-a87c-42be-a635-8c79dec42f11_4028x2016.jpeg" width="1456" height="729" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d762c94f-a87c-42be-a635-8c79dec42f11_4028x2016.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:729,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:425920,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5-sr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd762c94f-a87c-42be-a635-8c79dec42f11_4028x2016.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5-sr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd762c94f-a87c-42be-a635-8c79dec42f11_4028x2016.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5-sr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd762c94f-a87c-42be-a635-8c79dec42f11_4028x2016.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5-sr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd762c94f-a87c-42be-a635-8c79dec42f11_4028x2016.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">My habit tracker. Helps me track my quarterly results, just like a corporation would.</figcaption></figure></div><p>As I&#8217;ve gotten older and my obligations at work and home have grown, I recognize my brain needs scaffolding. It requires a supporting structure to carry out day-to-day tasks. Without it, three things occur &#8212;</p><ol><li><p><strong>Losing focus:</strong> The important things tend to fade into the background and less important things that appear urgent, but are <em>not,</em> take their place. Fast-forward three months, I'm left asking, "Where has time gone? 
Why are all of these crucial tasks still where I left them months ago?"</p></li><li><p><strong>Untrusting mind:</strong> When I&#8217;m in the shower, out for a run, or just before I fall asleep, this exhausting voice in my head begins rehearsing the list of tasks that must be completed, much like <a href="https://www.reddit.com/r/gameofthrones/comments/bigke1/spoilers_aryas_complete_list/">Arya Stark&#8217;s kill list</a>. My best guess as to why this happens is that my mind is terrified of forgetting what needs to be done.</p></li><li><p><strong>Defeat:</strong> Life feels overwhelming. I am left with a depressing sense of defeat, as though I can't keep up with life and fulfill my responsibilities.</p></li></ol><h2>Keeping focus</h2><p>At work, I've come to rely on multiple pieces of scaffolding to get my tasks done: <a href="https://www.jetpens.com/Midori-MD-Sticky-Memo-Pads/ct/4349">Post-its</a>, a <a href="https://www.jetpens.com/Kokuyo-Campus-Smart-Ring-60-Binder-Notebook-B5-26-Rings-Clear/pd/25851">notebook</a>, and <a href="https://obsidian.md">Obsidian</a>.</p><ul><li><p>It helps me always know what is important and stay on track with my goals.</p></li><li><p>After working through an immediate issue, it helps me find my way back to the previously incomplete task more easily, i.e., context switching.</p></li><li><p>It helps me keep a paper trail so that I can periodically reflect on my work and evaluate if my processes need to change or if I need assistance from my team/manager.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!18cV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f462ad-f67d-46cc-83e0-6e9549bd146d_4032x3024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!18cV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f462ad-f67d-46cc-83e0-6e9549bd146d_4032x3024.png 424w, https://substackcdn.com/image/fetch/$s_!18cV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f462ad-f67d-46cc-83e0-6e9549bd146d_4032x3024.png 848w, https://substackcdn.com/image/fetch/$s_!18cV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f462ad-f67d-46cc-83e0-6e9549bd146d_4032x3024.png 1272w, https://substackcdn.com/image/fetch/$s_!18cV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f462ad-f67d-46cc-83e0-6e9549bd146d_4032x3024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!18cV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f462ad-f67d-46cc-83e0-6e9549bd146d_4032x3024.png" width="616" height="462" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/38f462ad-f67d-46cc-83e0-6e9549bd146d_4032x3024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:616,&quot;bytes&quot;:3946791,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!18cV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f462ad-f67d-46cc-83e0-6e9549bd146d_4032x3024.png 424w, https://substackcdn.com/image/fetch/$s_!18cV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f462ad-f67d-46cc-83e0-6e9549bd146d_4032x3024.png 848w, https://substackcdn.com/image/fetch/$s_!18cV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f462ad-f67d-46cc-83e0-6e9549bd146d_4032x3024.png 1272w, https://substackcdn.com/image/fetch/$s_!18cV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f462ad-f67d-46cc-83e0-6e9549bd146d_4032x3024.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Post-its on top of planned work signify a context switch: urgent interruptions that require my immediate attention. Once done, the Post-its pop off and I get back to the previously unfinished task.</figcaption></figure></div><h2>Trusting mind</h2><p>I've discovered that picking up a pen and writing down my thoughts is the best antidote to a noisy mind. This is the form of scaffolding my inner voice needs to be reassured that some action will be taken to address its concerns.</p><p>I usually find the need for a brain dump either in the morning when I arrive at work, or right before bed. I use Post-its so that this list can travel with me.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IKBO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc0a2fe-d1d1-4b92-9a6a-c5218f98e1ed_2100x1576.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IKBO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc0a2fe-d1d1-4b92-9a6a-c5218f98e1ed_2100x1576.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IKBO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc0a2fe-d1d1-4b92-9a6a-c5218f98e1ed_2100x1576.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!IKBO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc0a2fe-d1d1-4b92-9a6a-c5218f98e1ed_2100x1576.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IKBO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc0a2fe-d1d1-4b92-9a6a-c5218f98e1ed_2100x1576.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IKBO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc0a2fe-d1d1-4b92-9a6a-c5218f98e1ed_2100x1576.jpeg" width="588" height="441.40384615384613" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2dc0a2fe-d1d1-4b92-9a6a-c5218f98e1ed_2100x1576.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1093,&quot;width&quot;:1456,&quot;resizeWidth&quot;:588,&quot;bytes&quot;:754855,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!IKBO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc0a2fe-d1d1-4b92-9a6a-c5218f98e1ed_2100x1576.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IKBO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc0a2fe-d1d1-4b92-9a6a-c5218f98e1ed_2100x1576.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!IKBO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc0a2fe-d1d1-4b92-9a6a-c5218f98e1ed_2100x1576.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IKBO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc0a2fe-d1d1-4b92-9a6a-c5218f98e1ed_2100x1576.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">This post-it, of my morning brain dump, spent the day stuck to the monitor and hitched a ride back home while 
stuck to the laptop</figcaption></figure></div><h2>Victory</h2><p>When I check off a commitment from my habit tracker hanging on the pantry/fridge door or my Obsidian task list, I feel a sense of victory. Because it confirms my ability to be disciplined and not be captured by the waywardness of daily life.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rrTm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b10e0ea-c0ba-4808-85ac-d6d37b53e558_3000x3024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rrTm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b10e0ea-c0ba-4808-85ac-d6d37b53e558_3000x3024.png 424w, https://substackcdn.com/image/fetch/$s_!rrTm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b10e0ea-c0ba-4808-85ac-d6d37b53e558_3000x3024.png 848w, https://substackcdn.com/image/fetch/$s_!rrTm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b10e0ea-c0ba-4808-85ac-d6d37b53e558_3000x3024.png 1272w, https://substackcdn.com/image/fetch/$s_!rrTm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b10e0ea-c0ba-4808-85ac-d6d37b53e558_3000x3024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rrTm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b10e0ea-c0ba-4808-85ac-d6d37b53e558_3000x3024.png" width="522" height="526.3021978021978" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b10e0ea-c0ba-4808-85ac-d6d37b53e558_3000x3024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1468,&quot;width&quot;:1456,&quot;resizeWidth&quot;:522,&quot;bytes&quot;:1071655,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rrTm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b10e0ea-c0ba-4808-85ac-d6d37b53e558_3000x3024.png 424w, https://substackcdn.com/image/fetch/$s_!rrTm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b10e0ea-c0ba-4808-85ac-d6d37b53e558_3000x3024.png 848w, https://substackcdn.com/image/fetch/$s_!rrTm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b10e0ea-c0ba-4808-85ac-d6d37b53e558_3000x3024.png 1272w, https://substackcdn.com/image/fetch/$s_!rrTm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b10e0ea-c0ba-4808-85ac-d6d37b53e558_3000x3024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">My master task list on Obsidian. Blue check marks indicate completed tasks. In a future post I&#8217;ll talk about how I use Obsidian+Notebook+PostIts to manage my work.</figcaption></figure></div><p></p><div><hr></div><p>Image Credit: <a href="https://www.reddit.com/r/Naruto/comments/16pk3s1/why_did_konan_not_tell_naruto_about_the_akatsuki/">Reddit</a></p>
<p></p><p></p>]]></content:encoded></item><item><title><![CDATA[Simple secret to writing world class documentation]]></title><description><![CDATA[There are usually three aspects of good documentation: completeness, accuracy and clarity.]]></description><link>https://www.chiplog.io/p/simple-secret-to-writing-world-class</link><guid isPermaLink="false">https://www.chiplog.io/p/simple-secret-to-writing-world-class</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Sun, 18 Aug 2024 06:05:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/acf727a5-1672-454f-abf3-2b93f8de2362_1456x1048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p><em>There are usually three aspects of good documentation: completeness, accuracy and clarity.</em></p><p><strong><a href="https://amzn.to/3YJMp4Y">Software Engineering at Google</a>, Tom Manshreck [Chapter 10]</strong></p></blockquote><h2>The problem</h2><p>Frustration with documentation usually falls into four tiers:</p><ul><li><p><strong>Tier 0:</strong> <em>Where is the documentation? </em></p><p>There is no documentation available, so you must rely on inspecting code or consulting with someone to figure things out.</p></li><li><p><strong>Tier 1:</strong> <em>The Cheat Sheet</em></p><p>Sparse, incomplete documentation, primarily written by the engineer for themselves.</p></li><li><p><strong>Tier 2:</strong> <em>Rust</em></p><p>Documents are present; however, they have not been properly maintained or updated. 
Such documents are dangerous: they may mislead readers and cause a loss of productivity.</p></li><li><p><strong>Tier 3:</strong> <em>The 500-page monolith</em></p><p>The documentation is all there, but it is unorganized and difficult to parse and understand.</p></li></ul><h2>Why you should strive for good documentation</h2><ul><li><p>This is the first line of defense. The document review is where the team gets involved and provides feedback. This is the best opportunity for you to get your design vetted. The better your document is, the more your team will comprehend it and help you identify flaws.</p></li><li><p>It is your opportunity to truly showcase your work and demonstrate your clarity of thought. Good documentation demonstrates that you are a diligent engineer who pays attention to detail and can be counted on to own the block. It is an excellent indicator of the quality of your code.</p></li><li><p>There is a self-reflective aspect to writing documentation. This is where you put pen to paper and distill your ideas into meaningful paragraphs. The process of writing has helped me weed out issues and reevaluate my decisions.</p></li><li><p>Lastly, <em>CYA</em> &#8212; Fabricating a chip is expensive. You must maintain proper bookkeeping over the course of your deliverables and be able to explain why particular design or verification decisions were made. Were they reviewed? Who signed off on them? In short, a good document offers you insurance of sorts.</p></li></ul><h2>Potential solutions</h2><h3>Tiers 0, 1 &amp; 2</h3><p>The good news is that <strong>tiers 0, 1 and 2</strong> can be easily solved by putting a process in place. The manager or project lead can then ensure that good documentation practices are upheld. 
Two eventualities that need to be addressed are:</p><ul><li><p>Engineers who believe <em>their time is better spent doing other things</em> should be nudged and reminded that the task is only complete once the documentation is completed.</p></li><li><p>For engineers who are motivated to write great documentation <em>but lack the time</em>, the manager should set aside time and allow them to accomplish the work to a high standard.</p></li></ul><h3>Tier 3</h3><p>Designs these days are complex, and specifications will be lengthy. There is no way around it; however, here are some things to consider with regard to document organization and style of writing.</p><h4>I. Know your audience</h4><p>In semiconductor engineering, you can safely assume that people of <em>at least</em> three different nationalities will read your documentation. I&#8217;m South Asian, and while English is my first language, my proficiency is a distant second to that of a native English speaker from the West. Add to that, people from different countries come with their own special flavor of English, not only in terms of accent but also in how we phrase our sentences to communicate technical ideas.</p><pre><code>So, one of the secrets to good documentation in the hardware industry is: <strong>know your audience</strong>. Considering this one fact alone will bring you much appreciation from your team.</code></pre><h4>II. Understand how your document is used</h4><p>Understanding what YOU look for in a document will in turn help you become a better writer. 
I find myself using documents in two modes:</p><p><em><strong>As a tutorial:</strong></em> </p><ul><li><p>When I&#8217;m assigned a block of work, I need to deeply understand how it works, so that I can meet my obligations.</p></li><li><p>I rely on the document to be complete, with clear diagrams and the necessary hyperlinks to other resources that I might need.</p></li></ul><p><em><strong>As a reference:</strong></em> </p><ul><li><p>In the course of my work, I often revisit sections of the specification to clarify my understanding of a feature, or quickly extract information. </p></li><li><p>For this, I appreciate it when key information is represented as a table, a flow chart, or pseudo-code instead of a wall of text.</p></li></ul><h4>III. Refactoring</h4><ul><li><p>Documentation, like code, needs to be refactored to meet production quality standards. </p></li><li><p>If you observe that numerous coworkers are asking similar questions, it is likely that something in the document needs to be explained differently.</p></li><li><p>After writing a few paragraphs, I assess my writing in terms of effective bandwidth. The thing about writers I admire (and quite frankly, envy), such as <a href="https://www.noahpinion.blog">Noahpinion</a> and <a href="https://newsletter.weskao.com">Wes Kao</a>, is that they have a large vocabulary to dip into, which I lack. They can effectively explain ideas in a single sentence because of the language they use and how creatively they construct their sentences. I can't achieve what they do with just words, so I make significant use of diagrams and visualizations. 
As they say, a picture is worth a thousand words.</p></li></ul><div><hr></div><p>That&#8217;s it for this episode, stay tuned for the next one.</p><div><hr></div><p>Opening Picture: A scene from <a href="https://en.wikipedia.org/wiki/Death_Note">Death Note</a></p>]]></content:encoded></item><item><title><![CDATA[Recipe for engineers to reach their potential]]></title><description><![CDATA[This post was inspired by a snippet from Tony Fadell&#8217;s book &#8212; Build.]]></description><link>https://www.chiplog.io/p/recipe-for-engineers-to-reach-their</link><guid isPermaLink="false">https://www.chiplog.io/p/recipe-for-engineers-to-reach-their</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Wed, 07 Aug 2024 12:36:35 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/80228d66-7dc4-4434-b705-002d99a9fdf2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This post was inspired by a snippet from <a href="https://amzn.to/4flRu9q">Tony Fadell&#8217;s book &#8212; Build</a>.</p><blockquote><p><em>Over the last 30+ years I&#8217;ve seen what humans need to reach their potential, to disrupt what needs disrupting, to forge their own unorthodox 
path.</em></p><p><strong>Build, Tony Fadell</strong></p></blockquote><h2>The Problem</h2><p>I&#8217;ve <a href="https://www.systemverilog.io/engineering/how-to-conduct-a-technical-interview/">interviewed</a> several engineers with 5 to 15 years of experience. The majority of them were excellent, but many, in my estimation, had yet to realize their full potential. They are smart, communicate clearly, and demonstrate the necessary <strong>aptitude</strong> and the <strong>will</strong> to tackle larger engineering problems. Unfortunately, their work experience hasn&#8217;t afforded them the opportunity to rise to their true potential.</p><h3>The Baker&#8217;s Paradox</h3><p>In some ways, corporate work is equivalent to waking up every day and making a hundred loaves of bread. Your baking must be uniform, resulting in similar-tasting bread of consistent shape, size, and quality.</p><p>While at school, you are taught to make bread but you haven&#8217;t had that many reps yet. You start working for a company and manufacture bread day after day, all day. You become skilled at creating one type of bread, your manager will regard you as a reliable employee, and you may even earn the title of <em>Star Baker</em> because the bread you produce is good and consistent.</p><p>However, you are still far from your true potential. You can only make one (or a few) kinds of bread exceptionally well. With time, you become complacent. The job pays well, you receive regular promotions, and you may even mentor or manage a few junior bakers who look up to you. After 10-15 years you are regarded as a competent baker, but you haven&#8217;t <em>really</em> learnt any new skills in a long time. </p><p>In reality, you have progressed from a </p><p><strong>Beginner &#8594; Advanced Beginner</strong></p><p>and not from </p><p><strong>Beginner &#8594; Expert</strong></p><h3>The Self-Learning Void</h3><p>Another problem is the difficulty in self-learning. 
Semiconductor engineering does not lend itself well to weekend experimentation. It&#8217;s not like software engineering, where a hobby can be turned into a <strong>M</strong>inimum <strong>V</strong>iable SaaS <strong>P</strong>roduct or a Web App for little to no money before being taken to the market and evaluated for economic viability.</p><p>Cadence, Synopsys, and Siemens sell pricey chip-making tools. <a href="https://github.com/mattvenn/awesome-opensource-asic-resources/blob/main/README.md">Open Source tools</a> and the ability to tape out small projects are in the works, but we&#8217;re still getting there. The <em>Maker</em> world with <a href="https://nandland.com/">FPGAs</a>, Arduinos, Raspberry Pi, sensors and breadboards, while exciting, is an insufficient proxy.</p><p>Another issue is that, unlike in the field of software, information in hardware engineering is not readily shared. Companies are secretive and apprehensive about giving any edge to their competitors. There are far fewer blogs and YouTube channels, and activity on Stack Overflow is sparse. </p><p>The side effect of a non-vibrant community is that engineers mostly learn on the job from their peers and mentors, and <em>end up specializing only in the specific narrow silo that they work in</em>. </p><pre><code>Giving credit where it is due, there are a number of incredible <a href="https://www.semianalysis.com/">semiconductor</a> and <a href="https://thechipletter.substack.com/">computer engineering substacks</a>. However, compared to the software industry, we as a community still have a long way to go.</code></pre><h2>Potential Solutions</h2><p>The recipe to reach your true potential needs 4 ingredients:</p><pre><code><strong>1.</strong> <strong>Workshop:</strong> A place to experiment, learn and make mistakes while creating new recipes.

<strong>2.</strong> <strong>Inspiration:</strong> A mentor, someone you can ask questions. Someone who can diagnose the issue with your sourdough by looking at it. Someone you aspire to become one day.

<strong>3.</strong> <strong>Community:</strong> 
   <strong>+</strong> Expose yourself to things you would not have considered. The chance to be amazed and wonder, "Holy sh*t you can really do that?" 
   <strong>+</strong> A bunch of equally passionate learners. A study group, just like in university, where you can discuss new recipes and exchange ideas.

<strong>4. Stage:</strong> An opportunity to showcase, present, and receive feedback on your work. </code></pre><p>Now, how do you apply these 4 ingredients to your own development?</p><ul><li><p><strong>Workshop</strong></p><ul><li><p>Work is not the place to experiment. Semiconductor companies are risk-averse, for good reason. You are expected to use tried-and-true methods.</p></li><li><p>Today&#8217;s chips are complex, and the majority of ASIC/SoC engineers work on a sub-module of the chip. However, the code base and your teammates are at your disposal. Dig through other people&#8217;s code, study documentation, and ask questions during lunches, walks or any opportunity you get. Ask them why they made certain architectural decisions. Why was this memory type selected over the other? Why was this hash algorithm selected? Even if you only work on a portion of the chip, you must develop a sufficient understanding of the whole chip through this type of investigation.</p></li><li><p>Learn to compile and simulate other pieces of design you aren&#8217;t working on. You can of course use company resources and tools for this. Your company stands to benefit from your increased expertise.</p></li><li><p>If you&#8217;ve always inherited large code bases, learn to build toy designs from first principles. You may not be able to use company resources and tools for this, but I&#8217;ve found <a href="https://www.edaplayground.com/">EDA Playground</a> and my own <a href="https://digilent.com/shop/zybo-z7-zynq-7000-arm-fpga-soc-development-board/">Xilinx FPGA</a> setup at home very handy for this type of exploration.</p></li></ul></li><li><p><strong>Inspiration</strong></p><ul><li><p>Actively seek out a mentor, whether within or outside the company. Someone with whom you feel comfortable sharing your experiments or discussing ideas that excite you. 
Someone whose code you enjoy reading and whose concepts inspire you.</p></li><li><p>Don&#8217;t limit yourself to books and blogs about hardware engineering. The world of software engineering is a tremendous source of inspiration, and you&#8217;ll be astonished at how readily concepts learned in software engineering can be applied to chip development. </p></li><li><p><a href="https://amzn.to/3YvcPHy">Refactoring</a> and <a href="https://amzn.to/3YzAGpC">The Pragmatic Programmer</a> are two of my all-time favorite books.</p></li></ul></li><li><p><strong>Community</strong></p><ul><li><p>In 2012, one of my software engineering friends pulled me to PyCon, the Python conference. As a hardware engineer, I had no business being there. But the conference was just down the road from where I worked, so I tagged along with this friend. That one weekend forever changed the way I think about and write code. It exposed me to a whole new world of engineering, and even now, 12 years later, I&#8217;m still going down rabbit holes that I started on after that conference. This conference inspired me to build a set of skills that are uncommon among hardware engineers.</p></li><li><p>Conferences like <a href="https://dvcon.org/">DVCon</a> and <a href="https://www.synopsys.com/community/snug.html">SNUG</a> are the BEST places to learn what engineers at other companies have discovered. They are the best learning resource for semiconductor engineers. At every chance, ask your company to sponsor your attendance at these conferences.</p></li></ul></li><li><p><strong>Stage</strong></p><ul><li><p>Presenting your work is important. 
It forces you to take a pause from all of your learning, reflect on it, and distill your ideas into coherent paragraphs and diagrams.</p></li><li><p>At least once in your career, you must present a paper at one of the conferences mentioned above.</p></li><li><p>This process will teach you some really good skills; you have to step out of your comfort zone, write a proposal, and perhaps get rejected. And if you are accepted, fight those nerves, overcome the impostor syndrome, and get up on that stage.</p></li><li><p>Start a blog or Substack; it&#8217;s never been easier to do this.</p></li></ul></li></ul><p>That&#8217;s all for this edition of Chip Log. If you enjoyed what you read, share it with your friends and colleagues. Subscribe to be notified of the next post.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chiplog.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chiplog.io/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p>Image credit: <a href="https://screenrant.com/naruto-9-nine-tails-form-mode/">Screen Rant</a></p>]]></content:encoded></item><item><title><![CDATA[Deriving pleasure out of mundane every day code]]></title><description><![CDATA[While solving a tough problem or writing an involved piece of code, my mind is entirely occupied in solving that problem.]]></description><link>https://www.chiplog.io/p/deriving-pleasure-out-of-mundane</link><guid isPermaLink="false">https://www.chiplog.io/p/deriving-pleasure-out-of-mundane</guid><dc:creator><![CDATA[Subbu]]></dc:creator><pubDate>Sat, 27 Jul 2024 05:48:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FQkn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6626f6c2-8aa5-4b70-b387-45c8d5ebdabe_1000x563.jpeg" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FQkn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6626f6c2-8aa5-4b70-b387-45c8d5ebdabe_1000x563.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FQkn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6626f6c2-8aa5-4b70-b387-45c8d5ebdabe_1000x563.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FQkn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6626f6c2-8aa5-4b70-b387-45c8d5ebdabe_1000x563.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FQkn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6626f6c2-8aa5-4b70-b387-45c8d5ebdabe_1000x563.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FQkn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6626f6c2-8aa5-4b70-b387-45c8d5ebdabe_1000x563.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FQkn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6626f6c2-8aa5-4b70-b387-45c8d5ebdabe_1000x563.jpeg" width="610" height="343.43" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6626f6c2-8aa5-4b70-b387-45c8d5ebdabe_1000x563.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:563,&quot;width&quot;:1000,&quot;resizeWidth&quot;:610,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FQkn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6626f6c2-8aa5-4b70-b387-45c8d5ebdabe_1000x563.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FQkn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6626f6c2-8aa5-4b70-b387-45c8d5ebdabe_1000x563.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FQkn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6626f6c2-8aa5-4b70-b387-45c8d5ebdabe_1000x563.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FQkn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6626f6c2-8aa5-4b70-b387-45c8d5ebdabe_1000x563.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>While solving a tough problem or writing an involved piece of code, my mind is entirely occupied with solving that problem. But at least half the time, I&#8217;m writing mundane code. Just implementing your average features, going through the motions and churning out blocks of code. During this time my mind <strong>wanders</strong> &#8212; often anxious about upcoming meetings at work, or over-analyzing previous conversations, or worrying about deadlines. My mind is not at peace.</p><p>This type of <strong>wandering</strong> is annoying. It makes me impatient. I can&#8217;t wait to get done with the immediate work in front of me. I&#8217;m in a <strong>rush</strong>, enveloped by the feeling that this thing, this feature, is wasting my time. I have more important things to get to, sh*t, the deadline is fast approaching ... </p><p>But now I recognize this mundane-ness while it is in progress and I remind myself to enjoy it. Turn that anxiety into a time to relax my mind. 
This work needs to get done too, right? And it cannot be rushed anyway. <strong>Rushing</strong> is how mistakes are made, and this simple mundane code could come back to bite me.</p><p><em>Music is my drug to facilitate enjoying mundane code, and maybe a good cup of decaf to go along with it.</em></p><div id="youtube2-jfKfPfyJRdk" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;jfKfPfyJRdk&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/jfKfPfyJRdk?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div>]]></content:encoded></item></channel></rss>