What can you do with FHE?

Here are five demonstrations of programs that can be built in FHE. The first two, Minesweeper and Conway's Game of Life, are toy examples built for fun rather than realistic applications. The last three, regression, image classification, and language model inference, are genuine use cases whose purpose is to keep data private. Every clip below runs in real time, reflecting the actual runtime of each FHE program under secure parameters. An explanation of how these demos work behind the scenes is given in section 2.4.

Minesweeper

To start a Minesweeper game, the client chooses the board size and the number of mines, both in plaintext, and the server generates a random board that it keeps hidden from the client. The client plays by sending its moves to the server encrypted, and the server replies with the encrypted neighbor mine counts of the revealed cells, which the client decrypts to read. If any revealed cell is a mine, the reply comes back as a corrupted ciphertext instead of valid counts.

Because of the privacy-preserving nature of FHE, the server cannot tell whether a move hit a mine, so it has no way to end the game. Recognizing a loss falls to the client: a single mine permanently corrupts the encrypted state, so once a reply comes back corrupted the client must start a new game. Recognizing a win falls to the client as well: once the number of still-unrevealed cells equals the number of mines it asked for, the client knows it has won.
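
The game logic described above can be modeled in plaintext. The sketch below is only an illustration with no encryption in it: the function names are hypothetical, and the flag `reply_ok` stands in for "the reply decrypted to valid counts", since how the demo detects a corrupted reply is not specified here.

```python
import random

def make_board(rows, cols, mines, seed=None):
    """Server side: place `mines` mines uniformly at random (plaintext model)."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    return set(rng.sample(cells, mines))

def neighbor_count(mine_set, rows, cols, r, c):
    """Number of mines in the up-to-eight cells around (r, c)."""
    return sum(
        (r + dr, c + dc) in mine_set
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
    )

def client_status(rows, cols, mines, revealed, reply_ok):
    """Client side: decide the game state from what the client sees locally."""
    if not reply_ok:                          # reply was corrupted: a mine was hit
        return "lost"
    if rows * cols - len(revealed) == mines:  # only the mines remain unrevealed
        return "won"
    return "playing"
```

Both end conditions are evaluated by the client from its own view of the game, which matches the point that the server cannot end the game.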

Conway's Game of Life

In the Game of Life demo, the server keeps an encrypted 128×128 Game of Life board. Every 70 ms it advances the board by one generation, computed entirely on the ciphertext, then sends the new board to the client to decrypt and display. The client therefore sees the board evolve continuously and in real time, while the server never learns which cells are alive.

The client can also shape the simulation while it runs. At any moment the client can draw a 128×128 pattern of its own, encrypt the pattern, and send the encryption to the server, which merges the pattern into the live board by taking the encrypted union of the two. New patterns can be dropped in at any time without pausing the evolution.
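
As a plaintext reference for the two operations above, here is a minimal NumPy sketch of one generation and of the union merge. It is not the demo's encrypted circuit. The wrap-around (toroidal) boundary is an assumption, and the rule is applied with ordinary comparisons, which an FHE program would have to express with homomorphic operations instead.

```python
import numpy as np

def life_step(board):
    """One Game of Life generation on a 0/1 array (plaintext reference)."""
    # Neighbor counts use only cyclic shifts and additions.
    # Toroidal wrap at the edges is an assumption of this sketch.
    n = sum(
        np.roll(np.roll(board, dr, axis=0), dc, axis=1)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    born = n == 3                          # any cell with 3 neighbors is alive
    survives = (board == 1) & (n == 2)     # a live cell with 2 neighbors stays alive
    return (born | survives).astype(np.uint8)

def merge(board, pattern):
    """Union of two 0/1 boards written arithmetically: a OR b = a + b - a*b."""
    return board + pattern - board * pattern
```

Writing the union as `a + b - a*b` uses only additions and one multiplication, the kind of arithmetic that homomorphic schemes support.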

Regression

The regression demo fits a regression model, either linear or logistic, over encrypted data contributed by many different people, without any of them revealing their data.

Before encryption, each feature is standardized using a public mean and standard deviation, supplied as the expected means and expected standard deviations. These are not secret: they only need to be reasonable estimates of each feature's true mean and spread across the dataset. For the best accuracy they can themselves be computed under FHE and then decrypted, so that even these summary statistics are obtained without exposing any individual's data.

The model is fit by gradient descent. Writing $X$ for the standardized feature matrix, $y$ for the target vector, $\beta$ for the coefficients, and $n$ for the number of samples, each iteration applies the update

$$\beta \;\leftarrow\; \beta - \eta\left(\tfrac{1}{n}\,X^\top(\hat y - y) + \lambda\,\beta\right),$$

where $\hat y = X\beta$ for linear regression and $\hat y = \sigma(X\beta)$ for logistic regression. The learning rate $\eta$ controls the step size, the ridge parameter $\lambda$ penalizes large coefficients, and the update is repeated for a chosen number of iterations, all homomorphically on the encrypted data.
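
The update rule can be checked on unencrypted data. The following is a plaintext NumPy sketch of the linear case, not the homomorphic implementation; the function names and default hyperparameters are illustrative assumptions.

```python
import numpy as np

def standardize(raw, expected_mean, expected_std):
    """Standardize features with public estimates rather than exact statistics."""
    return (raw - expected_mean) / expected_std

def fit_linear(X, y, eta=0.1, lam=0.0, iterations=200):
    """Ridge-regularized gradient descent for linear regression (plaintext)."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(iterations):
        y_hat = X @ beta                            # linear prediction
        grad = X.T @ (y_hat - y) / n + lam * beta   # (1/n) X^T (y_hat - y) + lambda * beta
        beta = beta - eta * grad                    # one gradient step
    return beta
```

Each iteration uses only matrix-vector products, additions, and multiplications by constants, which is why the linear case needs no approximation.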

For logistic regression, the sigmoid $\sigma$ is not a polynomial, so CKKS cannot evaluate it directly. A cubic approximation is good enough: on the interval $[-8, 8]$ the sigmoid is closely matched by

$$\sigma(x) \;\approx\; 0.5 + 0.15012\,x - 0.001593\,x^3,$$

and this polynomial stands in for $\sigma$ at every step of training.
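
To see how far the cubic strays from the true sigmoid, one can measure the gap on a grid. This is a plaintext check using the coefficients quoted above; the grid resolution is an arbitrary choice.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_cubic(x):
    """Cubic stand-in for the sigmoid, with the coefficients quoted above."""
    return 0.5 + 0.15012 * x - 0.001593 * x**3

# Largest absolute gap between the two on a fine grid over [-8, 8].
xs = np.linspace(-8.0, 8.0, 1601)
max_err = np.max(np.abs(sigmoid(xs) - sigmoid_cubic(xs)))
print(f"max |sigmoid - cubic| on [-8, 8]: {max_err:.3f}")
```

Outside $[-8, 8]$ the cubic term dominates and the polynomial diverges from the sigmoid, which is one reason the standardization step matters: it keeps $X\beta$ in the range where the approximation was fitted.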

In a deployment setting, the participants first run a multiparty protocol that produces a single shared public key. Each of them encrypts their own data under that key and uploads the ciphertext, the server runs the regression homomorphically over the combined encrypted data, and at the end the participants coordinate to decrypt only the final result: the fitted coefficients of the model.

This is useful whenever the participants' data is sensitive, for example:

predicting disease progression from patient metrics,
predicting a credit score from someone's financial profile,
predicting an insurance charge from customer attributes, or
predicting the presence of a medical condition.

In practice, though, the multiparty key generation is rarely carried out by the participants themselves. Individuals hand their raw health or financial data to a local institution they trust, such as a hospital or bank, and it is those institutions that run the multiparty key generation among themselves.

Image classification

For the image classification demo, the model must be trained ahead of time on ordinary, unencrypted data, and only inference runs under encryption. The specific demo in the clip above classifies handwritten digits from MNIST: the client encrypts a 28×28 grayscale image and sends it to the server, which evaluates the whole network on the ciphertext and produces the encrypted class logits. The client decrypts those and takes the largest as the predicted digit.

The network has about 17,000 parameters and is the following stack of layers:

Step	Output shape	Notes
Input image	$28 \times 28 \times 1$	MNIST grayscale image
Pad and encrypt	$32 \times 32 \times 1$	Zero-padded and packed into ciphertext slots
Convolution 1	$32 \times 32 \times 4$	$3 \times 3$, one input channel to four output channels
`tanh_approx`	$32 \times 32 \times 4$	Slotwise polynomial activation
Convolution 2	$32 \times 32 \times 8$	$3 \times 3$, four input channels to eight output channels
`tanh_approx`	$32 \times 32 \times 8$	Slotwise polynomial activation
Average pool	$8 \times 8 \times 8$	$4 \times 4$ pooling, leaving $512$ values
Repack	$512$	Rearranges pooled values for dense layers
Dense 1	$32$	Hidden layer
`tanh_approx`	$32$	Slotwise polynomial activation
Dense 2	$10$	One logit per digit
Decrypt and argmax	$10 \to 1$	Client reads the predicted digit
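
The parameter count can be reproduced from the table. The table does not say whether the layers carry bias terms, so the sketch below treats that as an assumption and computes both variants; the helper names are illustrative.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Weights of a k x k convolution, plus one bias per output channel if used."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def dense_params(n_in, n_out, bias=True):
    """Weights of a fully connected layer, plus one bias per output if used."""
    return n_in * n_out + (n_out if bias else 0)

def mnist_net_params(bias=True):
    pooled = 8 * 8 * 8                    # 512 values after the 4x4 average pool
    return (
        conv_params(3, 1, 4, bias)        # Convolution 1
        + conv_params(3, 4, 8, bias)      # Convolution 2
        + dense_params(pooled, 32, bias)  # Dense 1
        + dense_params(32, 10, bias)      # Dense 2
    )

print(mnist_net_params(bias=True))   # 17082
print(mnist_net_params(bias=False))  # 17028
```

Either way the total is about 17,000, and nearly all of it sits in the first dense layer.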

The activation `tanh_approx` is a polynomial approximation of $\tanh$, namely $0.785x - 0.056x^3$.
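
In plaintext the activation is a single line of code. The sketch below applies it elementwise to a NumPy array, mirroring the slotwise evaluation in the table, and prints it next to the true $\tanh$ at a few arbitrarily chosen sample points.

```python
import numpy as np

def tanh_approx(x):
    """Odd cubic stand-in for tanh, with the coefficients quoted above."""
    return 0.785 * x - 0.056 * x**3

samples = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
for x, exact, approx in zip(samples, np.tanh(samples), tanh_approx(samples)):
    print(f"x={x:+.1f}  tanh={exact:+.4f}  tanh_approx={approx:+.4f}")
```

A cubic needs only a couple of ciphertext multiplications per activation, which keeps the multiplicative depth of the network low.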

This arrangement protects both parties: the server holds the weights and performs the classification but never sees the client's image, while the client obtains the prediction but never learns the weights.

Language model inference

For the language model demo, the model is trained on unencrypted data, and at inference time it takes an encrypted text string as input. The tokenizer and the embedding matrix remain unencrypted, so the prompt is tokenized and embedded into vectors in the clear; everything from there on is encrypted.

In the specific demo in the clip above, the model is a 4-layer, GPT-2-style decoder-only transformer with embedding dimension $256$, vocabulary size $16384$, context length $128$, and $2$ attention heads of dimension $128$ each, for a total of $7{,}372{,}816$ parameters (the token embedding matrix is weight-tied with the LM head, so it is counted once). It is norm-free: rather than layer normalization, each sub-layer is gated by a learnable scalar that starts at zero (ReZero), so a block of the network computes

$$x \leftarrow x + \alpha_{\text{attn}}\,\text{Attn}(x), \qquad x \leftarrow x + \alpha_{\text{ffn}}\,\text{FFN}(x).$$

Writing $E$ for the token embedding matrix, the input to the stack is the sum of token and learned positional embeddings, $h_0 = E[t] + \text{PosEmb}(p)$. Each attention sub-layer uses bias-less projections $W_Q, W_K, W_V, W_O$ and, instead of a softmax, an elementwise sigmoid with a learnable per-head bias $b$ and a causal mask $M$. With $d = 128$ denoting the head dimension:

$$Q = xW_Q,\quad K = xW_K,\quad V = xW_V, \qquad \text{Attn}(x) = \Big(\big(\sigma\big(\tfrac{QK^\top}{\sqrt{d}} + b\big)\odot M\big)\,V\Big)W_O.$$
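
A plaintext NumPy sketch of this attention variant follows. It assumes the usual multi-head convention, in which each head has its own projections and bias and the head outputs are concatenated before $W_O$; the formula above does not spell this out. The function names are illustrative, and the exact sigmoid is used here, leaving open how the demo evaluates it under encryption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_attention_head(x, W_Q, W_K, W_V, b):
    """One head: sigma(Q K^T / sqrt(d) + b), causally masked, times V."""
    Q, K, V = x @ W_Q, x @ W_K, x @ W_V
    d = Q.shape[-1]                        # head dimension
    T = x.shape[0]                         # sequence length
    M = np.tril(np.ones((T, T)))           # causal mask: 1 where key position <= query
    weights = sigmoid(Q @ K.T / np.sqrt(d) + b) * M
    return weights @ V

def attention(x, heads, W_O):
    """Concatenate per-head outputs (assumed convention), then apply W_O.

    `heads` is a list of (W_Q, W_K, W_V, b) tuples, one per head.
    """
    out = np.concatenate([sigmoid_attention_head(x, *h) for h in heads], axis=-1)
    return out @ W_O
```

Unlike a softmax, the sigmoid is applied to each score independently, so no row-wise normalization (and hence no division by a sum over the row) is needed.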

Each feed-forward sub-layer is a bias-less two-layer MLP with hidden dimension $4 \times 256 = 1024$:

$$\text{FFN}(x) = \text{GELU}(xW_1)\,W_2.$$

Finally, the LM head reuses the embedding matrix (weight tying) to produce encrypted logits over the vocabulary, $\text{logits} = h_N E^\top$, where $h_N$ is the output of the last of the $N = 4$ blocks. The next token is sampled from these logits and appended to the context.
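
The quoted parameter total can be reproduced from the architecture as described. The breakdown below is a consistency check that assumes the only parameters are the ones named in this section: the tied token embeddings, the positional embeddings, the bias-less projections, one sigmoid bias per head, and two ReZero scalars per block.

```python
def lm_param_count(layers=4, d_model=256, vocab=16384, context=128, heads=2):
    token_emb = vocab * d_model          # shared with the LM head (weight tying)
    pos_emb = context * d_model          # learned positional embeddings
    attn = 4 * d_model * d_model         # W_Q, W_K, W_V, W_O, no biases
    ffn = 2 * d_model * (4 * d_model)    # W_1 and W_2, no biases
    head_bias = heads                    # one sigmoid bias b per head
    rezero = 2                           # alpha_attn and alpha_ffn
    per_layer = attn + ffn + head_bias + rezero
    return token_emb + pos_emb + layers * per_layer

print(lm_param_count())  # 7372816
```

More than half of the total is the embedding matrix, which stays unencrypted and is applied in the clear on the input side.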

As in the image classification demo, the server holds the weights and runs the model but never sees the client's query, while the client receives its generated text but never learns the weights.