Riya Bisht

Hardcoded Inference on FPGA

A Short Primer on My Ongoing Research at Vicharak #

During my R&D internship at Vicharak, I had the opportunity to dive deep into a fascinating project that reimagines how Deep Neural Networks (DNNs) can be deployed efficiently on FPGAs — with a special focus on low-bit quantized models and hardcoded inference flows.

DNNs have revolutionized fields like computer vision and natural language processing, but deploying them on resource-constrained edge devices remains a challenge.
GPUs and NPUs, though powerful, suffer from the Von Neumann bottleneck — the cost of frequently moving data between memory and compute units.
This is especially wasteful for low-bit models like Binary Neural Networks (BNNs), where the cost of moving weights and activations can cancel out the performance and energy savings that quantization is meant to deliver.
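To make this concrete, here is a small Python sketch of the standard XNOR-popcount trick for binarized dot products. It is purely illustrative and not code from the project; the vector length and packing scheme are assumptions. The point is that once weights and activations are in {-1, +1}, the arithmetic itself becomes almost free, leaving data movement as the dominant cost.

```python
# Illustrative sketch: a dot product between {-1, +1} vectors reduces to
# XNOR + popcount on packed bit words. Bit widths here are assumptions.

def pack_bits(values):
    """Pack a list of +1/-1 values into an integer; bit i = 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word

def binary_dot(w_bits, x_bits, n):
    """Dot product of two {-1, +1} vectors of length n, given their packed bits."""
    # XNOR marks positions where weight and activation agree (product = +1).
    agree = ~(w_bits ^ x_bits) & ((1 << n) - 1)
    matches = bin(agree).count("1")
    # Agreements contribute +1, disagreements contribute -1.
    return 2 * matches - n

if __name__ == "__main__":
    w = [1, -1, 1, 1, -1, -1, 1, -1]
    x = [1, 1, -1, 1, -1, 1, 1, -1]
    assert binary_dot(pack_bits(w), pack_bits(x), len(w)) == sum(a * b for a, b in zip(w, x))
```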

Key Challenge #

Design a system that completely eliminates memory access during inference by embedding the model directly into the FPGA's logic fabric.
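One way to picture what "embedding the model into the logic fabric" means: once a binary neuron's weights are frozen, its entire input-output behavior can be enumerated offline into a truth table, which is exactly the kind of object an FPGA LUT stores. The sketch below is a Python analogy with made-up weights and threshold, not the project's actual flow.

```python
# Hedged sketch of "hardcoding" a model: a frozen binary neuron is turned into
# a truth table, so inference needs no weight fetch at all. Weights and
# threshold are illustrative examples, not values from the project.

FIXED_WEIGHTS = [1, -1, 1, -1]   # frozen after training (illustrative)
THRESHOLD = 0                    # illustrative sign-activation threshold

def neuron(x_bits):
    """Binary neuron on a 4-bit input; each input bit encodes +1 (1) or -1 (0)."""
    acc = 0
    for i, w in enumerate(FIXED_WEIGHTS):
        x = 1 if (x_bits >> i) & 1 else -1
        acc += w * x
    return 1 if acc > THRESHOLD else 0

# Enumerate all 2^4 inputs once, offline: this 16-entry table is what a single
# 4-input LUT would store, and it fully replaces the weights at inference time.
TRUTH_TABLE = [neuron(i) for i in range(16)]

def hardcoded_neuron(x_bits):
    return TRUTH_TABLE[x_bits & 0xF]

if __name__ == "__main__":
    assert all(hardcoded_neuron(i) == neuron(i) for i in range(16))
```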


The Core Idea: Dataflow-Style Inference on FPGAs #

FPGAs (Field-Programmable Gate Arrays) are reconfigurable chips that allow defining custom data paths using logic gates and Look-Up Tables (LUTs).
Unlike CPUs or GPUs, FPGAs can execute logic operations in parallel without fetching instructions, making them perfect for static, hardwired models.
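As a rough mental model, a k-input LUT is nothing more than 2^k configuration bits indexed by its inputs; "computing" with it is a single table lookup with no instruction stream involved. The class below is a simplified, vendor-agnostic software sketch of that idea.

```python
# Tiny software model of a k-input FPGA LUT (an assumption-level sketch, not a
# vendor-specific description): 2^k configuration bits, indexed by the inputs.

class LUT:
    def __init__(self, k, config_bits):
        # config_bits holds 2^k entries; entry i is the output for input pattern i.
        assert len(config_bits) == 2 ** k
        self.k = k
        self.config_bits = config_bits

    def __call__(self, *inputs):
        index = 0
        for i, bit in enumerate(inputs):
            index |= (bit & 1) << i
        return self.config_bits[index]

# Example: a 2-input LUT configured as XNOR (outputs 1 when the inputs agree).
xnor = LUT(2, [1, 0, 0, 1])
assert xnor(0, 0) == 1 and xnor(1, 0) == 0 and xnor(1, 1) == 1
```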

We experimented with a dataflow-style architecture that embeds the trained, low-bit quantized model directly into the FPGA's logic fabric, so inference runs as a fixed pipeline of parallel logic operations with no external memory access. A toy sketch of this dataflow flavor is shown below.
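The following toy pipeline uses made-up layer functions and is only meant to convey the structure: every stage is a fully specialized function with its weights already folded in, and activations pass straight from one stage to the next rather than through a shared memory.

```python
# Dataflow-style sketch under stated assumptions: layer shapes, logic, and the
# final decision rule are illustrative, not the project's actual model.

def layer1(x):
    # Two hardwired "neurons"; their weights are already folded into the logic.
    n0 = 1 if bin((x ^ 0b1010) & 0xF).count("1") >= 2 else 0
    n1 = 1 if (x & 0b0011) != 0 else 0
    return (n1 << 1) | n0

def layer2(x):
    # Final 2-bit-to-1-bit decision stage.
    return 1 if x == 0b11 else 0

def pipeline(sample):
    # On the FPGA these stages run concurrently on a stream of inputs;
    # composing them here just shows that no weight or instruction fetch occurs.
    return layer2(layer1(sample & 0xF))

if __name__ == "__main__":
    print([pipeline(s) for s in range(16)])
```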


Why This Matters #

This work demonstrates that ultra-efficient inference is possible without relying on GPUs or large memory systems, using only the FPGA's on-chip logic fabric.

It paves the way for deploying AI on resource-constrained edge devices, where energy budgets and memory bandwidth rule out conventional accelerators.


Project Setup #

Vaaman (FPGA board), FTDI (USB-to-UART converter), JTAG debugger/programmer

Figure: project setup
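For completeness, here is a hedged sketch of how a host machine might drive the board over the FTDI UART using pyserial. The device path, baud rate, and one-byte request/response protocol are all assumptions for illustration; the actual interface on Vaaman may differ.

```python
# Host-side sketch (assumed protocol, not the project's actual one): the FTDI
# adapter exposes the FPGA's UART as a serial device; we send one input sample
# and read back the hardcoded network's output byte. Requires pyserial.

import serial  # pip install pyserial

PORT = "/dev/ttyUSB0"   # assumption: FTDI USB-to-UART device node
BAUD = 115200           # assumption: baud rate configured on the FPGA side

def run_inference(sample_byte: int) -> int:
    with serial.Serial(PORT, BAUD, timeout=1.0) as ser:
        ser.write(bytes([sample_byte]))   # stream the input activation bits
        reply = ser.read(1)               # the fabric returns the prediction
        if not reply:
            raise TimeoutError("no response from FPGA")
        return reply[0]

if __name__ == "__main__":
    print("prediction:", run_inference(0b10110010))
```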

For a brief overview, watch "Training Logic Gate Networks" and go through my Bachelor's Thesis Presentation.
