Harideep Nair

Research Scientist, Meta

Computer Architecture, Machine/Deep Learning, Neuromorphic Computing

PhD Thesis - Cortical Columns Computing Systems: Microarchitecture Model, Functional Building Blocks, and Design Framework

View Resume View PhD Dissertation View Google Scholar

About Me

I am a Computer Architect passionate about the future of "intelligent" computing hardware. The current manifestations of such processors (although not exactly "intelligent" like the brain) are Machine/Deep Learning Accelerators. However, I believe brain-inspired neuromorphic processors (an early manifestation of a silicon neocortex) are right around the corner. I received my PhD from ECE at Carnegie Mellon University, advised by Prof. John Paul Shen (NCAL). As part of my research, I also worked closely with Prof. Jim Smith, Emeritus at UW-Madison. My main research focus is Neuromorphic Computer Architecture, in which I explore new brain-inspired paradigms of computing. Currently, I am building on Prof. Smith's work on Temporal Neural Networks and Space-Time Algebra to design microarchitecture for implementing energy-efficient sensory processing units using standard CMOS technology. I also enjoy working on projects broadly related to Computer Architecture and Machine Learning.

During my PhD, I collaborated closely with MediaTek, where I completed a continuous two-year internship. At MediaTek, I was fortunate to work on bleeding-edge Deep Learning Accelerator architecture, compiler stack, and performance modeling. At CMU, I helped co-create and proliferate two graduate courses: Modern Computer Architecture and Design (18-740) and Neuromorphic Computer Architecture and Processor Design (18-743). I have 5+ years of teaching experience over 11 semesters across the two courses (Head TA for 10 semesters), delivering lectures, developing lab assignments, leading multiple teams of TAs, and mentoring 35+ teams of graduate students on research projects.

I speak four languages fluently (English, Malayalam, Hindi, Marathi), have finished basic courses in Sanskrit (a long time ago) and German, and am a beginner in Spanish and French. In my free time, I like to go on short as well as long drives around my home state of California. I also enjoy playing chess, badminton and cricket. I used to actively collect coins in high school, and I currently have a foreign currency collection of coins (72 countries) and notes (15 countries).

Feel free to connect with me on LinkedIn.

Education

Carnegie Mellon University

Aug 2018 - Oct 2024

Ph.D. in Electrical and Computer Engineering

Key Courses: Foundations of Computer Systems (18-600), Machine Learning (10-601), Deep Learning (11-785), Hardware Architectures for Machine Learning (18-663), Neural Computation (15-686), Systems and Toolchains for AI engineers (18-813), Modern Computer Architecture and Design (18-740 | TA), Neuromorphic Computer Architecture (18-743 | TA)

Indian Institute of Technology (IIT) Bombay

Jul 2013 - Jul 2018

Dual Degree (B.Tech + M.Tech) in Electrical Engineering with Minor in Computer Science

Key Courses: Microprocessors, Advanced Computer Architecture, VLSI Design, Physics of Transistors, Operating Systems, Computer/Network Security, Statistics, Calculus, Linear Algebra, Quantum Physics

National University of Singapore

Aug 2016 - Dec 2016

Semester Exchange Program in Electrical and Computer Engineering

Key Courses: Computer Vision and Image Processing, Integrated Analog Design, Embedded Hardware System Design, Fuzzy/Neural Systems

Publications and Talks

Book Chapters

Conference and Journal Publications

Invited Talks

Honors and Awards

International Level

National Level

State Level

Institute Level

Experience

Oct 2024 – Present

Meta

Research Scientist, Machine Learning Hardware Architect

Working on next-gen architecture for ML accelerator chip targeting wearables.

Jan 2021 – Dec 2022

MediaTek Inc.

AI Computer Architecture Research Intern

Worked on architectural simulator and ISA finetuning for in-house AI accelerator within production mobile SoCs. Developed efficient microarchitecture designs for components within next-generation AI accelerator targeting future mobile SoCs. Implemented the designs in Verilog RTL, performed functional verification and further assisted with UVM verification.

May 2020 – Aug 2020

MediaTek Inc.

AI Computer Architecture Research Intern

Explored the usability and robustness of in-house AI software ecosystem, NeuroPilot, and contributed to its documentation. Further developed AI applications using NeuroPilot for edge inferencing on Dimensity SoC.

May 2016 – Aug 2016

Purdue University

Visiting Research Scholar

Modeled a multi-layered IC stack in Ansys HFSS and determined the IC stack layers responsible for significant EM signal leakage. Further simulated a theoretical EM Side Channel Analysis using MATLAB and successfully extracted the correct key byte using correlation analysis.

Nov 2015 – Dec 2015

USHVA Clean Technology Pvt. Ltd.

Embedded System Intern

Part of the team building a Smart Home Solar Power System with wireless load control and data monitoring. Created a Wireless Central hub and 6 Mini hubs using PIC MCUs, and RF/Wi-Fi Modules (an IoT system). Managed transmission of control signals from server to appliances and power dissipation data back to server.

Projects

Facial Emotion Recognition using Efficient Deep Neural Networks

Collaborated with three other graduate students to develop a CNN-based solution for the Facial Emotion Recognition (FER) problem, with the goal of efficient edge inferencing. The idea was to take a small baseline CNN and add an appropriate attention mechanism so that it focuses on relevant facial features. The proposed solution achieved 83% and 63.5% accuracy on the CK+ and FER2013 (among top 10 in ICML 2013 FER Challenge) datasets respectively, while being 8x faster and 3x more power-efficient than state-of-the-art VGG-19 on the Snapdragon 855 mobile platform.

View Project

Hardware Aware Neural Network Architectures Using FBNets

The goal was to propose a differentiable Neural Architecture Search (NAS) approach, inspired by FBNets, that generates effective neural network (NN) architectures heavily optimized for a given target device. The key idea was to extend the loss function to include an energy constraint alongside the typical loss function and the latency constraint used by FBNets. After extensive experimentation and loss function tuning in PyTorch, the new loss function successfully generated NN architectures that balanced accuracy, latency and energy consumption for the Raspberry Pi (used as an example target device). The trained child architectures provided up to 2.5x speedup and 3.8x reduction in energy with tolerable accuracy loss (4-5%). This work (with equal contribution from all three authors) is available for perusal on arXiv.
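As a rough illustration of what "extending the loss function" can look like, here is a minimal sketch in plain Python. It is not the project's code: the multiplicative energy factor, the names `expected_cost` and `hardware_aware_loss`, the lookup-table layout, and the default coefficients are all assumptions made for this example. The latency factor follows the form published for FBNet (cross-entropy scaled by `alpha * log(LAT)^beta`), and the energy factor simply mirrors it.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_cost(arch_logits, cost_table):
    """Expected cost of the super-net.

    arch_logits[l][i] is the architecture parameter for candidate block i
    in layer l; cost_table[l][i] is that block's measured cost (latency or
    energy) on the target device. Both layouts are assumed for illustration.
    """
    total = 0.0
    for layer_logits, layer_costs in zip(arch_logits, cost_table):
        weights = softmax(layer_logits)
        total += sum(w * c for w, c in zip(weights, layer_costs))
    return total


def hardware_aware_loss(ce_loss, arch_logits, latency_lut, energy_lut,
                        alpha=0.2, beta=0.6, gamma=0.2, delta=0.6):
    """Cross-entropy scaled by a latency factor and an (assumed) energy factor.

    Costs must be expressed in units where the expected cost exceeds 1,
    so that both logarithms are positive.
    """
    latency = expected_cost(arch_logits, latency_lut)
    energy = expected_cost(arch_logits, energy_lut)
    latency_factor = alpha * math.log(latency) ** beta
    energy_factor = gamma * math.log(energy) ** delta   # hypothetical extension
    return ce_loss * latency_factor * energy_factor
```

With such a loss, architecture parameters that favor slower or more energy-hungry blocks are penalized even when the cross-entropy is unchanged, which is what steers the search toward device-friendly architectures.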

View Project

A Hybrid Energy-Efficient Microarchitecture with Dynamic Renaming Optimizations

The goal was to design a hybrid microarchitecture with energy efficiency similar to that of in-order (InO) processors while providing close to out-of-order (OoO) performance. The proposed architecture used only InO structures without any expensive dynamic scheduling hardware. It comprised a free-flow front-end of functional units, and in-order queues at the back-end for exposing Memory Level Parallelism. It also implemented certain dynamic optimizations at the renaming stage of the pipeline, namely Move Elimination, Memory Bypassing and Constant Folding. These three optimizations collapse the corresponding instruction dependencies, reducing the total cycle count for execution. In Snipersim simulations, this hybrid architecture displayed very high energy efficiency (150% improvement over OoO and 80% over InO) while keeping performance close to that of the OoO processor (lagging by only 3.5%).
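To make the rename-stage optimizations concrete, here is a toy sketch of move elimination and constant folding on a rename table. It is only an illustration under simplifying assumptions, not the simulated design: the three-opcode instruction format (`li`, `mov`, `addi`), the function name `rename`, and the table encoding are hypothetical, memory bypassing is left out, and physical-register reclamation (which real move elimination needs) is ignored.

```python
from itertools import count


def rename(program):
    """Rename a list of (op, dst, src, imm) tuples.

    Returns (issued, table): the instructions that still have to execute,
    and the final mapping from architectural registers to either
    ("reg", physical_name) or ("const", value).
    """
    fresh = count()          # allocator for physical register numbers
    table = {}
    issued = []

    def lookup(arch_reg):
        # Registers never written in this snippet get a fresh physical register.
        if arch_reg not in table:
            table[arch_reg] = ("reg", f"p{next(fresh)}")
        return table[arch_reg]

    for op, dst, src, imm in program:
        if op == "li":
            # The value is known at rename time, so just record it.
            table[dst] = ("const", imm)
        elif op == "mov":
            # Move elimination: dst aliases whatever src maps to.
            table[dst] = lookup(src)
        elif op == "addi":
            kind, value = lookup(src)
            if kind == "const":
                # Constant folding: compute the result in the renamer.
                table[dst] = ("const", value + imm)
            else:
                new_reg = f"p{next(fresh)}"
                issued.append(("addi", new_reg, value, imm))
                table[dst] = ("reg", new_reg)
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return issued, table
```

For the sequence `li r1, 4; addi r2, r1, 3; mov r3, r2; addi r4, r0, 1`, only the last instruction reaches the execution units; the first three are resolved entirely in the rename table, which is how the dependency chains collapse.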

View Publication

Hardware Acceleration of Advanced Encryption Standard (AES)

Implemented an AES decryption engine on a Xilinx Zynq-7000 FPGA using VHDL, which takes an encrypted image and displays the decrypted one on an LCD monitor. It consisted of two custom coprocessors, one for executing the AES decryption algorithm and another for implementing a TFT controller peripheral to display graphics on the LCD, with AXI-Stream (AXIS) communication between the Processing System (PS) and Programmable Logic (PL). The AES decryption algorithm was also run in C to validate the hardware implementation, and the decrypted image was successfully displayed on the monitor.

Cycle-Level Modeling of a Front-end Execution Architecture

Developed a hardware description model in Verilog for a processor consisting of a free-flowing in-order front-end coupled with a shrunken OoO backend, targeting a simplified 16-bit RISC instruction set with 15 instructions. Tomasulo dynamic scheduling along with register renaming and a hybrid Reservation Station structure/Re-Order Buffer were implemented. Performed cycle-level simulations in Altera Quartus Prime to validate the model.

Computer Systems

As part of the Computer Systems course, I developed a dynamic memory allocator for C programs and optimized it for space utilization, improving its throughput by almost 1.5x. I also designed an interactive command-line interpreter using appropriate signal handlers for running user programs, and a multi-cache simulator in C with MSI cache coherence protocol.

Microarchitecture HDL Implementation

Designed non-pipelined as well as pipelined versions of a multi-cycle 16-bit RISC processor consisting of 15 instructions in Verilog HDL and simulated it in Modelsim-Altera Simulator. The designs were demonstrated successfully by implementing on a DE0-Nano Development Board. As part of another project, I also designed the data path and controller of a CISC microcode based processor consisting of 19 instructions.

Voltage-Actuated MEMS Sensor

The goal was to design a suspended-beam MEMS metal switch with better sensitivity than conventional sensors. The proposed structure consisted of a metal beam with an insulator and an air gap beneath it. When the voltage across the beam and the ground electrode is varied, the beam deflects and ultimately collapses once the pull-in voltage is exceeded. Its sensitivity was optimised by choosing appropriate dimensions and materials through extensive C-V experiments in MEMS+ software, achieving a subthreshold swing below 60 mV/decade.

GSM-based AC Remote Control

Devised a system using Pt-51 (8052 architecture) Board, a GSM Modem (SIM800C) and an IR LED to control the temperature of an AC using GSM-based text messages. AT commands were sent via UART interface to initialize the GSM Module as well as to retrieve temperature information from the text message received by it. This temperature information was transmitted to the AC via the IR LED with the help of NEC Protocol. Decoding of AT commands and NEC encoding of temperature information were performed by the 8052 microcontroller.
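For readers unfamiliar with the NEC protocol, the sketch below builds a standard NEC frame in Python: an 8-bit address and 8-bit command, each followed by its bitwise inverse, sent LSB first after a 9 ms burst and 4.5 ms space. The function names are hypothetical, and how a temperature maps onto the address and command bytes depends on the specific AC unit, so that mapping is not shown; the original firmware ran on the 8052, not in Python.

```python
UNIT_US = 562.5  # basic NEC time unit in microseconds


def nec_frame_bits(address, command):
    """Return the 32 data bits of a standard NEC frame in transmit order."""
    if not (0 <= address <= 0xFF and 0 <= command <= 0xFF):
        raise ValueError("address and command must each fit in one byte")

    def lsb_first(byte):
        return [(byte >> i) & 1 for i in range(8)]

    return (lsb_first(address) + lsb_first(address ^ 0xFF)
            + lsb_first(command) + lsb_first(command ^ 0xFF))


def nec_pulse_train(address, command):
    """Return (mark_us, space_us) pairs: carrier-on time, then carrier-off time."""
    pulses = [(9000.0, 4500.0)]                      # leading burst and space
    for bit in nec_frame_bits(address, command):
        # A '1' has a long space (3 units), a '0' a short one (1 unit).
        pulses.append((UNIT_US, 3 * UNIT_US if bit else UNIT_US))
    pulses.append((UNIT_US, 0.0))                    # final burst ends the frame
    return pulses
```

Because every byte is sent together with its inverse, each frame contains exactly sixteen ones and sixteen zeros, so the frame duration is constant regardless of the payload.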

Ultrasonic Local Positioning System

Designed a prototype Local Positioning System (LPS) using three ultrasound transmitters, a receiver, two Pt-51 Boards and two Xbee Modules, which could display the position of the receiver on an LCD. Used the Xbee Modules to enable synchronized sequential sending of ultrasonic pulses from the transmitters. Implemented the trilateration method on the receiver side to calculate the receiver's position.
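The trilateration step can be sketched as follows. This is a minimal Python illustration, assuming a 2D layout, known transmitter coordinates, and distances already derived from time of flight (distance = speed of sound × travel time); the function name `trilaterate` and the coordinate conventions are hypothetical and not taken from the original firmware.

```python
def trilaterate(anchors, distances):
    """Estimate a 2D position from three anchor points and three ranges.

    anchors:   [(x1, y1), (x2, y2), (x3, y3)] transmitter positions
    distances: [r1, r2, r3] measured ranges to each transmitter

    Subtracting the first circle equation from the other two removes the
    quadratic terms and leaves a 2x2 linear system, solved by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances

    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("transmitters are collinear; position is ambiguous")

    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

With transmitters at (0, 0), (4, 0) and (0, 3) and ranges measured from the point (1, 1), the function recovers that point up to floating-point error. With noisy ranges the three circles no longer meet at a single point, and the result is the solution of the linearized system rather than an exact intersection.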

Teaching

18-743: Neuromorphic Computer Architecture | CMU

As Head TA since the course's inception, I helped develop course material, delivered lectures and coordinated work among five TAs over four offerings. I was the primary developer of the hardware and software frameworks used in the class.

18-740: Modern Computer Architecture and Design | CMU

As Head TA since the course's inception, I helped develop course material, delivered lectures and coordinated work among six TAs over three offerings. I served as the primary student liaison with Qualcomm and MediaTek, and helped establish industry collaboration for lab assignments exploring CPU, GPU and NPU cores inside Qualcomm's and MediaTek's state-of-the-art mobile SoCs.

EE309: Microprocessors | IIT Bombay

As an undergraduate TA, I mainly helped with grading and proctoring of quizzes and exams.

Skills

Extra-Curriculars

Volunteer Experience

Extra-Curricular Activities

Get in Touch