version 1.0, May 2018

# General Description

Implementing optimized modular multipliers on FPGAs using DSP slices is a complex task for cryptographic designers. Several algorithms and architectures have been proposed in the scientific literature.

Hyper-threaded modular multipliers (HTMMs) are proposed as fast, small, and efficient $\mathrm{GF}(P)$ multipliers for elliptic curve cryptography (ECC) and hyper-elliptic curve cryptography (HECC). See the paper in the Reference section for details and documentation.

Hyper-threading is used to efficiently compute several independent multiplications at the same time. Multiple logical multipliers (LMs) are available in the same physical unit. They share the same resources (DSP slices) without “bubbles” in the pipeline. In an HTMM, all clock cycles are used to compute the result(s), as depicted in the figure below, which illustrates the behavior for 3 LMs and 2 words per field element.
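The benefit of interleaving independent multiplications can be illustrated with a toy scheduling model. This is only a sketch and not the HTMM architecture from the paper: it assumes an abstract pipeline with one issue slot per cycle, where a logical multiplier must wait `latency` cycles between two of its dependent steps. The function name, the number of steps, and the latency value are illustrative assumptions.

```python
def interleaved_schedule(num_lms, steps_per_lm, latency):
    """Toy model: schedule dependent steps of independent multiplications.

    One step can be issued per cycle. A step of a given logical multiplier
    (LM) can only be issued `latency` cycles after the previous step of the
    same LM. Returns one entry per cycle: (lm, step), or None for a bubble.
    """
    next_step = [0] * num_lms   # next step index to issue for each LM
    ready_at = [0] * num_lms    # first cycle at which each LM may issue again
    schedule = []
    cycle = 0
    while any(step < steps_per_lm for step in next_step):
        candidates = [lm for lm in range(num_lms)
                      if next_step[lm] < steps_per_lm and ready_at[lm] <= cycle]
        if candidates:
            # Serve the LM that became ready first (ties broken by index).
            lm = min(candidates, key=lambda i: (ready_at[i], i))
            schedule.append((lm, next_step[lm]))
            next_step[lm] += 1
            ready_at[lm] = cycle + latency
        else:
            schedule.append(None)  # no LM is ready: the issue slot is wasted
        cycle += 1
    return schedule


if __name__ == "__main__":
    shared = interleaved_schedule(num_lms=3, steps_per_lm=4, latency=3)
    single = interleaved_schedule(num_lms=1, steps_per_lm=4, latency=3)
    print("3 LMs:", len(shared), "cycles,", shared.count(None), "bubbles")
    print("1 LM :", len(single), "cycles,", single.count(None), "bubbles")
```

In this model, three LMs with a 3-cycle dependency fill all 12 issue slots, whereas a single multiplier leaves 6 of its 10 cycles idle.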

For flexibility, HTMMs support generic primes (i.e. $P$ has an arbitrary and dense binary representation), and $P$ can be changed at run time.
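To make "generic prime" concrete, here is a minimal software sketch of a word-serial modular multiplication in which $P$ is an ordinary input. It assumes a Montgomery-style reduction, a common choice when the modulus has no special structure; this section does not state which algorithm the HTMM uses, so see the paper for the actual one. The function name, the 17-bit word size, and the word count are illustrative assumptions. The code requires Python 3.8 or later for `pow(p, -1, m)`.

```python
def montgomery_multiply(a, b, p, word_bits, num_words):
    """Return a * b * R^-1 mod p, with R = 2**(word_bits * num_words).

    Assumes p is odd, p < R, and 0 <= a, b < p. Nothing in the loop
    depends on the structure of p, so p can change between calls.
    """
    base = 1 << word_bits
    mask = base - 1
    p_inv = (-pow(p, -1, base)) & mask       # -p^-1 mod 2^word_bits
    t = 0
    for i in range(num_words):
        a_i = (a >> (i * word_bits)) & mask  # i-th word of a
        t += a_i * b                         # partial product
        q = ((t & mask) * p_inv) & mask      # makes t + q*p divisible by base
        t = (t + q * p) >> word_bits         # exact division by base
    return t - p if t >= p else t            # single final subtraction


if __name__ == "__main__":
    word_bits, num_words = 17, 16            # illustrative values only
    r = 1 << (word_bits * num_words)
    # Two different moduli handled by the same code, with no specialization.
    for p in (2**255 - 19, 2**256 - 2**224 + 2**192 + 2**96 - 1):
        a, b = 0x1234567890ABCDEF % p, (3**150) % p
        expected = (a * b * pow(r, -1, p)) % p
        assert montgomery_multiply(a, b, p, word_bits, num_words) == expected
    print("ok")
```

The result stays in Montgomery form (scaled by $R^{-1}$), which is why the check multiplies the plain product by `pow(r, -1, p)`.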

This website presents an open source tool that generates HTMMs for several types of FPGAs and a large range of input parameters, enabling efficient design space exploration.

# Reference

The motivations, the analysis of the state of the art, and the proposed HTMM algorithm and architecture are detailed in the paper:

Generation of Finely-Threaded GF(P) Multipliers for Flexible Curve based Cryptography on FPGAs.
by Gabriel Gallin and Arnaud Tisserand.
IEEE Transactions on Computers, vol. 68, n. 11, pp. 1612-1622, Nov. 2019.
DOI: 10.1109/TC.2019.2920352
PDF access

Please cite this paper as the main source of information on the HTMM generator.

# HTMM generator

The HTMM generator is a set of Python (2.7) programs and bash scripts used through a command-line interface.

## Generator input

The HTMM specification provided by the user includes:

• the width of the finite field elements (e.g. 128, 256 bits);
• the number of logical multipliers per physical one (see paper in reference);
• the operand and internal values decomposition to efficiently fulfill
• the target FPGA (see current limitations below);
• possible optimizations (see paper in reference):
  • fast vs small operator;
  • BRAM vs DRAM for operands memories;
  • latency reduction.
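The field width and the operand decomposition listed above together determine how many words represent one field element. The sketch below shows that arithmetic in plain Python. The helper names and the word sizes in the examples are assumptions chosen for illustration; they are not taken from the generator's specification format.

```python
def num_words(field_bits, word_bits):
    """Number of word_bits-wide words needed for a field_bits-wide element."""
    return -(-field_bits // word_bits)  # ceiling division


def to_words(x, word_bits, count):
    """Split a non-negative integer into `count` words, least significant first."""
    if x < 0 or x >> (word_bits * count):
        raise ValueError("value does not fit in the requested number of words")
    mask = (1 << word_bits) - 1
    return [(x >> (i * word_bits)) & mask for i in range(count)]


def from_words(words, word_bits):
    """Rebuild the integer from its words (inverse of to_words)."""
    return sum(w << (i * word_bits) for i, w in enumerate(words))


if __name__ == "__main__":
    print(num_words(128, 64))   # 2 words per field element
    print(num_words(256, 17))   # 16 words per field element
    x = 2**255 - 19
    words = to_words(x, 17, num_words(256, 17))
    assert from_words(words, 17) == x
```

A smaller word size means more words per element and therefore more steps per multiplication, which is one of the trade-offs explored by varying the generator inputs.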

## Generator output

The generated output includes:

• the VHDL sources of the target HTMM;
• simulation scripts (based on Sage mathematical software);
• implementation scripts (see current limitations below).

## Current limitations

Version 1.0 of the HTMM generator supports only the Xilinx Virtex-4, Virtex-5, Virtex-7, and Spartan-6 FPGAs, together with the Xilinx ISE 14.7 CAD tools. However, users can extend the generator to other FPGAs and CAD tools by modifying the scripts.

Link to the generator: archive file HTMM_generator.tar.gz in the Files tab/pane of the HTMM repository.

License: CeCILL-B.

## Usage

The main script is htmm_generator.sh, located in the root directory.

Each target HTMM is specified in a dedicated file provided by the user. If several specifications are provided, all the corresponding HTMMs will be generated (simulated and implemented for result analysis).

Specification examples are presented below.

## Proof of the latency reduction optimization

In the Files tab/pane of the HTMM repository, we provide a short proof (proof_latency_reduction.pdf) of the latency reduction optimization proposed in the reference paper.

# Results Examples

A set of 36 HTMM specifications, covering commonly used parameter values, has been generated and implemented on several Xilinx FPGAs. The corresponding inputs and outputs are accessible as archive files in the Files tab/pane of the HTMM repository.

• Examples of HTMM specifications (HTMM_CONFIGS.tar.gz);
• Generated results for those specifications:
  • HTMM multipliers on Virtex-4 (HTMM_PROJECTS_V4.tar.gz);
  • HTMM multipliers on Virtex-5 (HTMM_PROJECTS_V5.tar.gz);
  • HTMM multipliers on Virtex-7 (HTMM_PROJECTS_V7.tar.gz);
  • HTMM multipliers on Spartan-6 (HTMM_PROJECTS_S6.tar.gz);
• FPGA implementation results for all those HTMMs on the 4 FPGAs (results_htmms.html).

# Validation

The complete validation scheme is detailed in the paper. The HTMM repository also provides a Sage script (algo_trace.sage) that generates detailed traces of the algorithm's behavior with the latency optimization.

# Acknowledgments

This work was done in the HAH project partially funded by Labex CominLab, Labex Lebesgue and Brittany Region. We sincerely thank Xilinx for University Program donations.