Marvell Blog

Archive for the 'Ethernet Switching' Category

  • January 09, 2025

    Custom HBM: What Is It and Why It’s the Future

    By Michael Kanellos, Head of Influencer Relations, Marvell

    How do you get more data to the processor faster?

    That has been the central question for computing architects and chip designers since the dawn of the computer age. And it’s taken on even greater urgency with AI. The greater amount of data a processor can access, the more accurate and nuanced the answers will be from the algorithm. Adding more memory, however, can also add cost, latency, and power. 

    Marvell has pioneered an architecture for custom high-bandwidth memory (HBM) solutions for AI accelerators (XPUs) and will collaborate with Samsung, Micron and SK hynix to bring tailored memory solutions to market. (See comments from Micron, Samsung, SK hynix and Marvell here in the release.)

    Customizing the HBM element of XPUs can, among other benefits, increase the amount of memory inside XPUs by 33%, reduce the power consumed by the memory I/O interfaces by over 70%, and free up to 25% of silicon area to add more compute logic, depending on the XPU design1.

    The shift, part of the overall trend toward custom XPUs, will have a fundamental and far-reaching impact on the performance, power consumption and design of XPUs. Invented in 2013, HBM consists of vertical stacks of high-speed DRAM sitting on a chip called the HBM base die, which controls the I/O interfaces and manages the system. The base die and DRAM chips are connected by metal bumps.

    Vertical stacking has effectively allowed chip designers to increase the amount of memory close to the processor for better performance. A scant few years ago, cutting-edge accelerators contained 80GB of HBM2. Next year, the high-water mark will reach 288GB. 
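
    To put those capacity numbers in perspective, here is a rough, illustrative calculation of how stacking adds up; the stack height, per-die capacity and stack count below are assumptions chosen for the example, not figures from this post.

        # Illustrative only: how vertical stacking drives XPU memory capacity.
        # Stack height, per-die capacity and stack count are assumed values.
        dies_per_stack = 12      # assumed 12-high DRAM stack
        gb_per_die = 3           # assumed 3 GB per DRAM die
        stacks_per_xpu = 8       # assumed number of HBM stacks beside the compute die

        gb_per_stack = dies_per_stack * gb_per_die
        total_gb = gb_per_stack * stacks_per_xpu
        print(f"{gb_per_stack} GB per stack x {stacks_per_xpu} stacks = {total_gb} GB per XPU")
        # -> 36 GB per stack x 8 stacks = 288 GB per XPU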

    Still, the desire for more memory will continue, putting pressure on designers to economize on space, power and cost. HBM currently can account for 25% of the available real estate inside an XPU and 40% of the total cost3. HBM4, the current cutting-edge standard, features an I/O consisting of 32 64-bit channels (2,048 bits in total), a width that is already making some aspects of chip packaging extremely complex.
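
    The scale of that interface is easier to appreciate with a quick back-of-the-envelope calculation; the per-pin data rate used below is an assumption for illustration, not a figure from this post.

        # Illustrative only: aggregate width and bandwidth of an HBM4-style interface.
        channels = 32            # 32 channels per stack, as noted above
        bits_per_channel = 64    # 64-bit channels
        gbps_per_pin = 8         # assumed per-pin data rate in Gb/s

        width_bits = channels * bits_per_channel          # 2,048 signal bits
        bandwidth_gb_s = width_bits * gbps_per_pin / 8    # gigabytes per second
        print(f"{width_bits}-bit interface, ~{bandwidth_gb_s / 1000:.1f} TB/s per stack")
        # -> 2048-bit interface, ~2.0 TB/s per stack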

    All About Optimizing XPU TCO

    The Marvell custom HBM compute architecture involves optimizing the base HBM die and its interfaces, which today are built around JEDEC standards, with solutions tailored to dovetail with the design, characteristics and performance objectives of the host AI compute die.

    Imagine that a hyperscaler wants an AI inference XPU for edge data centers squeezed into dense business districts or urban corridors. Cost and power consumption will be at a premium, while absolute compute performance will likely be less important. A custom HBM solution might involve shrinking the AI compute die, prioritizing chip size and power above other considerations.

    At the other end of the spectrum, an HBM subsystem for XPUs powering a massive AI training cluster might be tuned for capacity and high bandwidth. In this situation, the emphasis could be on reducing the size of the I/O interface. Shrinking the I/O creates space for more interfaces along the so-called beachfront at the edge of the chip and hence boosts total bandwidth.
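
    To make the beachfront tradeoff concrete, here is a simple illustrative model; the die-edge length, interface widths and per-interface bandwidth are invented numbers, chosen only to show why a narrower I/O raises total bandwidth.

        # Illustrative only: fixed die-edge ("beachfront") length, variable interface width.
        beachfront_mm = 24.0          # assumed die-edge length available for HBM interfaces
        bw_per_interface_gb_s = 1600  # assumed bandwidth of one HBM interface, GB/s

        for interface_width_mm in (4.0, 3.0):   # standard vs. shrunken custom I/O
            n_interfaces = int(beachfront_mm // interface_width_mm)
            total_gb_s = n_interfaces * bw_per_interface_gb_s
            print(f"{interface_width_mm} mm per interface -> "
                  f"{n_interfaces} interfaces, {total_gb_s} GB/s total")
        # -> 4.0 mm: 6 interfaces, 9600 GB/s; 3.0 mm: 8 interfaces, 12800 GB/s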

  • November 06, 2024

    Encrypting Data in Use: How HSMs Can Advance Next-Generation Confidential Compute

    By Bill Hagerstrand, Director, Security Business, Marvell

    It’s time for more security coffee talk with Bill! I was first exposed to the term “Confidential Compute” back in 2016, when I started reading about a radical new technology from Intel. Intel first introduced the idea in 2015 with its 6th generation of CPUs, called Skylake. The technology was named SGX (Software Guard Extensions): a set of CPU instructions that allows user-level or OS code to define a trusted execution environment (TEE) built into Intel CPUs. The CPU encrypts a portion of memory, called the enclave, and any data or code running in the enclave is decrypted, at runtime, only within the CPU. This provides added protection against read access by any other code running on the same system.

    The idea was to protect data in the elusive “data in use” phase. Data exists in three states: at rest, in motion and in use. Data at rest can be encrypted with self-encrypting hard drives—pretty much standard these days—and most databases already support encryption. Data in motion is already protected by encryption and authentication protocols such as TLS/SSL/HTTPS/IPsec.
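
    As a concrete illustration of the first two states, here is a minimal Python sketch using the widely available cryptography package and the standard-library ssl module; it is not Marvell or Intel code, just a reminder of how at-rest and in-motion protection are routinely handled and why data in use remains the gap.

        # Minimal sketch of data-at-rest and data-in-motion protection.
        # Requires the third-party "cryptography" package (pip install cryptography).
        import ssl
        from cryptography.fernet import Fernet

        # Data at rest: encrypt before writing to disk (self-encrypting drives and
        # database-level encryption do the same job transparently).
        key = Fernet.generate_key()
        cipher = Fernet(key)
        stored = cipher.encrypt(b"customer record")     # what actually lands on storage

        # Data in motion: TLS protects the bytes on the wire.
        tls_context = ssl.create_default_context()      # used when opening a secure socket

        # Data in use: to compute on the record, something must call decrypt(), and the
        # plaintext then exists in memory. A TEE (SGX, SEV, CCA) keeps that plaintext
        # inside a hardware-protected enclave instead of ordinary, readable RAM.
        plaintext = cipher.decrypt(stored)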

    Data in use, however, is a fundamentally different challenge: for any application to make “use” of the data, it must be decrypted. Intel solved this by creating a TEE within the CPU that lets the application and its unencrypted data run securely and privately, shielded from any other user or code on the same computer. AMD provides a similar technology it calls AMD SEV (Secure Encrypted Virtualization). Arm, meanwhile, calls its solution Arm CCA (Confidential Compute Architecture). Metaphorically, a TEE is like the scene in courtroom dramas where the judge and the attorneys debate the admissibility of evidence in a private room. Decisions are made in private; the outside world gets to see the result, but not the reasoning, and if there’s a mistake, the underlying materials can still be examined later.

    LiquidSecurity2 enclave with app or AI algorithm

  • October 24, 2024

    Cloud-Managed Enterprise (CME) Switches Powered by SONiC

    By Gidi Navon, Senior Principal Architect, Marvell

    Open Networking is not a new concept. SONiC (Software for Open Networking in the Cloud) has been around for some time, and cloud-managed campus switches based on a proprietary NOS (network operating system) are also not new. What’s truly novel is a comprehensive solution that finally brings open-source SONiC to campus networks and adds a layer of cloud management and zero-trust provisioning, all running on a cost-optimized hardware platform specifically tailored to campus networks: “Cloud-Managed Enterprise,” or CME.

    In recent years, Open Networking has offered hyperscale operators the option to run open-source software on a variety of merchant silicon, providing freedom from the lock-in imposed by the big system vendors. The next challenge was to bring similar benefits to campus networks, particularly to what is now referred to as CME.

    This blog will demonstrate how the Marvell® Prestera® switches, equipped with a comprehensive software development kit, together with the collaborative efforts of a vibrant industry community called OpenLAN Switching (OLS), have created this open, cloud-managed solution for campus networks.
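
    As a small taste of what the open NOS layer looks like in practice, here is a sketch of a minimal, SONiC-style config_db.json fragment for a campus access port, built in Python; the port name, VLAN ID and field values are illustrative assumptions, not taken from the Marvell or OLS material.

        # Illustrative sketch of a SONiC-style config_db.json fragment.
        # Port name, VLAN ID and attribute values are assumptions for the example.
        import json

        config_db = {
            "PORT": {
                "Ethernet0": {"admin_status": "up", "speed": "1000", "mtu": "9100"},
            },
            "VLAN": {
                "Vlan100": {"vlanid": "100"},
            },
            "VLAN_MEMBER": {
                "Vlan100|Ethernet0": {"tagging_mode": "untagged"},
            },
        }

        print(json.dumps(config_db, indent=2))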

     

    The power of a community

    The first thing to recognize is that it took teamwork across multiple companies to create this open solution. Under the umbrella of the Telecom Infra Project (TIP), various companies came together and created a working group called OpenLAN. Within OpenLAN, two sub-working groups were formed: OpenWi-Fi and OpenLAN Switching (OLS), the latter being the subgroup relevant to this discussion.

    Numerous companies actively participated in the collaborative effort—Figure 1 mentions just some of them—organized according to their role in the solution.

    Figure 1: Breakdown of companies involved in OpenLAN Switching

  • June 11, 2024

    How AI Will Drive Cloud Switch Innovation

    This article is part five in a series on talks delivered at Accelerated Infrastructure for the AI Era, a one-day symposium held by Marvell in April 2024. 

    AI has fundamentally changed the network switching landscape. AI requirements are driving foundational shifts in the industry roadmap, expanding the use cases for cloud switching semiconductors and creating opportunities to redefine the terrain.

    Here’s how AI will drive cloud switching innovation.

    A changing network requires a change in scale

    In a modern cloud data center, the compute servers are connected to one another and to the internet through a network of high-bandwidth switches. The approach is like that of the internet itself, allowing operators to build a network of any size while mixing and matching products from various vendors to create a network architecture specific to their needs.

    Such a high-bandwidth switching network is critical for AI applications, and a higher-performing network can lead to a more profitable deployment.

    However, expanding and extending the general-purpose cloud network to AI isn’t quite as simple as adding more building blocks. In the world of general-purpose computing, one or more workloads can fit on a single server CPU. In contrast, AI’s large datasets don’t fit on a single processor, whether it’s a CPU, GPU or other accelerated compute device (XPU), making it necessary to distribute the workload across multiple processors. These accelerated processors must then function as a single computing element.
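
    A rough sizing example makes the point; the parameter count, precision and per-device memory below are assumptions chosen for illustration, not figures from the talk.

        # Illustrative only: why a large AI model must be split across many XPUs.
        params_billion = 1000    # assumed 1-trillion-parameter model
        bytes_per_param = 2      # assumed 16-bit weights
        hbm_gb_per_xpu = 192     # assumed HBM capacity of one accelerator

        weights_gb = params_billion * bytes_per_param    # billions of params x bytes = GB
        min_xpus = -(-weights_gb // hbm_gb_per_xpu)      # ceiling division
        print(f"{weights_gb} GB of weights alone -> at least {min_xpus} XPUs, "
              "before activations, optimizer state or KV caches are counted")
        # -> 2000 GB of weights alone -> at least 11 XPUs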

    AI calls for enhanced cloud switch architecture

    AI requires accelerated infrastructure to split workloads across many processors.
