Bits, Bytes & Bricks: Heracles and the FHE Hardware Race — Computing on Data You Never Decrypt

What Intel's new chip means for the future of privacy-preserving computation

I have been tracking fully homomorphic encryption for a while now — mostly as a theoretical boundary condition in my privacy-enhancing technologies research. The math has always been elegant. The performance has always been the problem. When a cryptographic operation takes tens of thousands of times longer than its plaintext equivalent, it lives in academic papers, not production systems.

That calculus is starting to shift. And the signal worth paying attention to right now is a chip called Heracles.

The Problem FHE Has Always Had

Let me frame the core issue for practitioners who haven't delved deeply into this.

Fully homomorphic encryption is, at its conceptual heart, a way to perform arbitrary computations on encrypted data without ever decrypting it. The server doing the computation never sees the plaintext. The result is returned encrypted, and only the party holding the key can read it. For anyone building systems that handle sensitive data — medical records, financial transactions, genomic information, private AI queries — this is the Holy Grail of privacy architecture. You get the computing power of the cloud without handing your data to the cloud in the clear.

The catch has always been performance. FHE is computationally brutal. The encrypted data grows by orders of magnitude compared to the original plaintext. The operations required (polynomial transforms, a noise-cancelling process called bootstrapping, and some genuinely odd-named operations like "twiddling" and "automorphism") are deeply inefficient on general-purpose CPUs. A CPU can do it, but slowly, burning roughly 10,000 times as many clock cycles on integer operations as it would on unencrypted data. GPUs excel at parallel computation but sacrifice the precision FHE demands. Nobody has yet built hardware that's actually right-shaped for this workload.

Until now, possibly.

What Intel Just Demonstrated

Last month at the IEEE International Solid-State Circuits Conference in San Francisco, Intel demonstrated Heracles, a purpose-built FHE accelerator that has been under development for five years as part of a DARPA program. The headline numbers are striking enough to take seriously.

Compared to a top-of-the-line Intel Xeon server CPU, Heracles achieved speedups ranging from 1,074 to 5,547 times across seven key FHE operations. On the specific benchmark Intel ran publicly — a private voter ballot verification query against an encrypted database — the Xeon took 15 milliseconds. Heracles did it in 14 microseconds. For a single query, that difference is imperceptible. At 100 million queries, you are looking at more than 17 days of CPU work versus 23 minutes on Heracles.
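To make that scaling concrete, here is a quick back-of-the-envelope sketch using only the two latencies quoted above. It assumes queries run strictly one after another with no batching or parallelism, which is my simplification and not something Intel stated.

```python
# Per-query latencies quoted from Intel's public demo, in microseconds.
XEON_US = 15_000        # 15 milliseconds
HERACLES_US = 14        # 14 microseconds
QUERIES = 100_000_000   # 100 million queries, run sequentially (assumption)

# Total time = latency * queries, converted from microseconds.
xeon_days = XEON_US * QUERIES / 1e6 / 86_400
heracles_minutes = HERACLES_US * QUERIES / 1e6 / 60
speedup = XEON_US / HERACLES_US

print(f"Xeon:     {xeon_days:.1f} days")            # Xeon:     17.4 days
print(f"Heracles: {heracles_minutes:.1f} minutes")  # Heracles: 23.3 minutes
print(f"Speedup:  {speedup:.0f}x")                  # Speedup:  1071x
```

The ratio from these rounded latencies comes out near 1,071x, in line with the low end of the quoted speedup range.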

The demo itself is worth understanding because it illustrates exactly why FHE matters for real institutional use cases. A voter wants to confirm her ballot was recorded correctly. The government holds an encrypted database of voters and votes. Using FHE, the voter encrypts her own ID and ballot choice on her end and sends the encrypted query to the server. The server determines whether it matches the encrypted database using the encrypted query — without ever decrypting either. It returns an encrypted result. The voter decrypts it on her side. At no point does the government's computation infrastructure see either the voter's identity or her ballot in plaintext.

That is a meaningful security architecture. And until very recently, it was practically unusable at scale.
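For readers who want to see the shape of that flow in code, here is a deliberately tiny sketch. Big caveats: it uses textbook Paillier, which is only additively homomorphic and not fully homomorphic, with toy key sizes that offer no real security. It is not the scheme Intel ran. The record encoding, the single shared key pair, and every function name here are my own assumptions for illustration. What it does show is the core idea: the server combines two ciphertexts and returns a third without decrypting anything.

```python
import secrets
from math import gcd

# Toy key: two well-known Mersenne primes. Far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
PHI = (P - 1) * (Q - 1)   # secret, held only by the voter
MU = pow(PHI, -1, N)      # precomputed decryption constant

def random_unit():
    """Random value in [1, N) that is invertible modulo N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if gcd(r, N) == 1:
            return r

def encrypt(m):
    """Paillier encryption with generator g = N + 1 (public operation)."""
    return (pow(N + 1, m, N2) * pow(random_unit(), N, N2)) % N2

def decrypt(c):
    """Needs the secret PHI, so only the key holder can call this."""
    return ((pow(c, PHI, N2) - 1) // N * MU) % N

def blinded_difference(enc_query, enc_record):
    """Server side: builds Enc(k * (query - record)) from ciphertexts only."""
    diff = (enc_query * pow(enc_record, -1, N2)) % N2   # Enc(query - record)
    return pow(diff, random_unit(), N2)                 # scale by a random k

def encode(voter_id, choice):
    """Hypothetical record encoding: voter ID and ballot choice in one integer."""
    return voter_id * 10 + choice

stored = encrypt(encode(48213, 2))   # encrypted database entry
match = decrypt(blinded_difference(encrypt(encode(48213, 2)), stored))
mismatch = decrypt(blinded_difference(encrypt(encode(48213, 3)), stored))
print(match == 0, mismatch == 0)     # True False
```

A decrypted zero means the ballot matches; any other value is a random-looking multiple of the difference and reveals nothing useful. A real FHE deployment would support far richer computation than this single subtraction, which is exactly why the hardware cost is so much higher.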

What Makes Heracles Different

Heracles is not a tweak to existing silicon. It is a ground-up rethinking of what an FHE workload actually needs.

At its physical core, the chip is built on Intel's most advanced 3-nanometer FinFET process — the same technology Intel uses for its best products — and measures roughly 200 square millimeters, about 20 times larger than competing FHE research chips. It is flanked in a liquid-cooled package by two 24-gigabyte high-bandwidth memory chips, a configuration you normally see only in AI training GPUs. That memory decision is telling: the data explosion problem in FHE is as much about bandwidth as it is about compute, and Intel has treated it accordingly, pairing 819 GB-per-second memory connections with 9.6 terabytes-per-second on-chip data movement.

The compute architecture centers on 64 SIMD cores, called tile-pairs, arranged in an 8x8 grid and connected by a 2D mesh network with 512-byte buses. These cores are purpose-built to run the polynomial arithmetic and transform operations required by FHE, performing them in parallel rather than serially. The chip runs three synchronized instruction streams simultaneously: one moving data onto and off the processor, one managing internal data movement, and one running the arithmetic. This is the kind of design discipline that comes from five years of focused engineering on a single problem.
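If "polynomial transform" sounds abstract, the sketch below may help. I am assuming the transforms in question are number-theoretic transforms (NTTs), which are standard in lattice-based FHE schemes, though nothing above names them. The modulus, length, and root here are toy values I picked, and the transform is the naive quadratic version rather than the fast butterfly form hardware would use. Real schemes also use a negacyclic variant; this plain cyclic version just shows why the transform matters: it turns polynomial multiplication into independent element-wise multiplications.

```python
Q = 257   # toy prime modulus; real FHE moduli are far larger
N = 8     # transform length
W = 4     # primitive 8th root of unity mod 257 (4**4 % 257 == 256, 4**8 % 257 == 1)

def ntt(values, root):
    """Naive O(N^2) transform: out[k] = sum_j values[j] * root^(j*k) mod Q.

    The powers of the root are the "twiddle factors".
    """
    return [sum(v * pow(root, j * k, Q) for j, v in enumerate(values)) % Q
            for k in range(N)]

def intt(values):
    """Inverse transform: same sum with the inverse root, scaled by 1/N."""
    n_inv = pow(N, -1, Q)
    w_inv = pow(W, -1, Q)
    return [(x * n_inv) % Q for x in ntt(values, w_inv)]

def poly_mul(a, b):
    """Multiply two polynomials (coefficient lists, lowest degree first)."""
    fa, fb = ntt(a, W), ntt(b, W)
    pointwise = [(x * y) % Q for x, y in zip(fa, fb)]  # each slot is independent
    return intt(pointwise)

a = [1, 2, 3, 0, 0, 0, 0, 0]   # 1 + 2x + 3x^2
b = [4, 5, 0, 0, 0, 0, 0, 0]   # 4 + 5x
print(poly_mul(a, b))          # [4, 13, 22, 15, 0, 0, 0, 0]
```

The element-wise step in the middle is the part that maps naturally onto wide SIMD lanes, since no slot depends on any other.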

One architectural bet made early in the Heracles project deserves attention. The team chose to work in 32-bit arithmetic chunks rather than 64-bit, even though FHE requires much larger numbers. This seems counterintuitive, since FHE demands precision on very large integers, but by breaking those large numbers into 32-bit pieces that can be computed independently, they gained significant parallelism. The 32-bit circuits are physically smaller, so more of them fit on the die, and they can all run simultaneously. It was a risky design call that appears to have paid off.
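Here is a minimal sketch of how a large number can be split into independent sub-32-bit pieces. I am assuming the pieces are residues in a residue number system (RNS), the usual trick in FHE software libraries; the description above does not say exactly how Heracles does the split, and the three moduli below are simply pairwise coprime values I chose just under 2^32.

```python
from math import prod

# Three pairwise coprime moduli, each fitting in 32 bits (illustrative choice).
MODULI = (2**32 - 5, 2**32 - 17, 2**32 - 65)

def to_residues(x):
    """Split a large integer into one small residue per modulus."""
    return tuple(x % m for m in MODULI)

def mul_residues(a, b):
    """Multiply lane by lane; no lane needs data from any other lane."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_residues(residues):
    """Recombine with the Chinese remainder theorem."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)
    return total % big_m

x = 123_456_789_012_345
y = 987_654_321
product = from_residues(mul_residues(to_residues(x), to_residues(y)))
print(product == x * y)   # True
```

The result is exact as long as the true product stays below the product of the moduli (about 2^96 here). Each lane only ever handles values that fit in 32 bits, which is the property that lets many small circuits work side by side.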

The Competitive Landscape

Intel is not alone in this race, and the ecosystem developing around FHE hardware is worth watching closely.

Duality Technologies, an FHE software firm whose CTO, Kurt Rohloff, described the Heracles results as "very good work," was part of a competing accelerator team in the same DARPA program. Duality's position is instructive: they are focused less on new hardware and more on software products for the kinds of encrypted queries Intel demonstrated. Rohloff's view is that at current scales, software is sufficient; specialized hardware becomes necessary as workloads shift toward deeper machine learning operations such as neural networks, LLMs, and semantic search.

Niobium Microsystems, a chip startup spun out of another DARPA competitor, is pitching its chip as "the world's first commercially viable FHE accelerator." It recently announced a deal worth approximately $6.9 million with Seoul-based chip design firm Semifive to develop its FHE accelerator for fabrication on Samsung's 8-nanometer process. Intel has not yet announced commercial availability plans for Heracles, which gives Niobium an interesting window.

Other players — Fabric Cryptography, Cornami, and Optalysys — are building their own approaches. The most technically distinct is Optalysys, whose CEO Nick New argues that Heracles represents roughly the ceiling of what a fully digital approach can achieve. Optalysys is using photonic chips to perform FHE's compute-intensive transform steps using the physics of light rather than digital logic. Their photonic chip is on its seventh generation, and they are working toward a 3D-integrated commercial product — photonic chip for the transforms, custom silicon for the rest — potentially ready in two to three years. If that works, it would push performance well beyond what any digital accelerator can achieve.

What This Means for AI and Sensitive Data Workloads

Here is where this gets immediately relevant to the work I do at the intersection of AI governance and privacy architecture.

The scenarios that FHE hardware makes practical for the first time are exactly the ones that have been architecturally stuck. Federated learning, where contributors cannot trust the aggregation layer. Inference on private user data where neither the query nor the model weights should be visible to the infrastructure. Encrypted database search where even the server processing the query cannot see what was asked or what was found.

Duality's demonstration of an FHE-encrypted transformer model — a smaller-scale version of BERT — points toward the trajectory. Today it works on compact models. As hardware improves, the model size that FHE can accommodate in a reasonable time scales up with it. The end state, which feels meaningfully closer than it did twelve months ago, is AI inference that is provably private: the model provider cannot see your query, and you cannot extract the model weights. That is a different trust model than anything we have today with cloud AI APIs.

John Barrus at Niobium put it plainly: "There are a lot of smaller models that, even with FHE's data expansion, will run just fine on accelerated hardware." I believe him. And as someone who has spent the past year deep in agentic AI risk research, I find the prospect of AI agents operating on encrypted data without ever seeing plaintext personally significant. It changes what is possible in high-sensitivity deployment environments.

My Read on Where This Is Heading

Sanu Mathew, who leads security circuits research at Intel, described Heracles as "like the first microprocessor — the start of a whole journey." That framing is either marketing or genuine conviction, and in this case, I think it is the latter.

FHE has been a "maybe someday" technology for long enough that healthy skepticism is warranted. But the confluence of DARPA investment, Intel's 3nm engineering resources, multiple serious startups with real capital, and a photonics approach that could push past digital limits — this is not the same landscape as five years ago. The hardware is catching up to the math.

For practitioners building privacy architectures today, I would hold the following positions. Confidential computing with TEEs remains the most immediately deployable option for protecting data in use: its operational maturity and cloud provider support are in place, and the performance overhead is manageable. FHE hardware is the one to watch for workloads where you cannot trust the compute environment even with hardware attestation, and where the encryption must hold even against a fully compromised host. That scenario is rarer but more demanding, and it now has a credible hardware roadmap.

The combination of TEE-based confidential computing for near-term deployments and FHE hardware for the most trust-hostile environments represents, in my view, the serious privacy architecture stack for the next decade. Both are moving faster than most security teams realize.

Keep watching this space.

Sunday, March 22, 2026
