Institute of Semiconductors, Chinese Academy of Sciences

A 460 GOPS/W Improved Mnemonic Descent Method-Based Hardwired Accelerator for Face Alignment

2021-04-01

Author(s): Mo, HY (Mo, Huiyu); Liu, LB (Liu, Leibo); Zhu, WP (Zhu, Wenping); Li, Q (Li, Qiang); Yin, SY (Yin, Shouyi); Wei, SJ (Wei, Shaojun)

Source: IEEE TRANSACTIONS ON MULTIMEDIA Volume: 23 Pages: 1122-1135 DOI: 10.1109/TMM.2020.2993943 Published: 2021

Abstract: The mnemonic descent method (MDM) algorithm is the first end-to-end recurrent convolutional system for high-accuracy face alignment. However, its heavy computational complexity and high memory access demands make it difficult to satisfy the requirements of real-time applications. To address this problem, an improved MDM (I-MDM) algorithm is proposed for efficient hardware implementation, based on several hardware-oriented optimizations. First, a patch merging mechanism is introduced to dynamically cluster and eliminate redundant landmarks, which significantly reduces computational complexity with minimal accuracy loss. Second, a dedicated convolutional layer is inserted to halve the number of computations and memory accesses of the subsequent fully connected layer, yielding a 4.42% decrease in the failure rate. Third, a lightweight preprocessing method named dual regressors (DR) is proposed to reinitialize face images, which can greatly improve the overall accuracy. Moreover, compared with a similar method, the DR method can reduce computations and memory storage by nearly 99.9%. Overall, compared with the MDM algorithm, I-MDM not only reduces the number of computations by 23.5% but also decreases the failure rate by 17.9% on the 300-W test set. Based on the proposed I-MDM algorithm, an I-MDM-based hardwired accelerator is presented using the TSMC 65 nm CMOS process. First, compared with similar solutions, the gradient calculation operation is rearranged and loaded pixels are reused in the HoG feature extraction to eliminate all division operations and 25% of off-chip memory accesses. Second, patch-independent central activations are used to enable patch-level pipelined operations, yielding a 2x acceleration in the overall process. This accelerator achieves 460 GOPS/W energy efficiency at 330 MHz, which is 38x higher than the most recent face alignment accelerator in the same process.
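To put the headline figures in more familiar units, the short sketch below converts the quoted 460 GOPS/W into energy per operation and derives the efficiency of the comparison design implied by the "38x higher" claim. It assumes that "38x higher" means a ratio of 38 between the two efficiencies, and it does not know how the paper counts an "operation"; the function names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract.
EFFICIENCY_GOPS_PER_W = 460.0  # energy efficiency of the accelerator
RATIO_VS_PRIOR = 38.0          # "38x higher", read here as a ratio (assumption)


def energy_per_op_pj(gops_per_watt: float) -> float:
    """Convert GOPS/W to picojoules per operation.

    1 GOPS/W equals 1e9 operations per joule, so one operation costs
    1 / (gops_per_watt * 1e9) joules, i.e. 1000 / gops_per_watt picojoules.
    """
    return 1000.0 / gops_per_watt


def implied_baseline_gops_per_w(efficiency: float, ratio: float) -> float:
    """Efficiency of the comparison design implied by an 'N x higher' claim."""
    return efficiency / ratio


if __name__ == "__main__":
    pj = energy_per_op_pj(EFFICIENCY_GOPS_PER_W)
    base = implied_baseline_gops_per_w(EFFICIENCY_GOPS_PER_W, RATIO_VS_PRIOR)
    print(f"{pj:.2f} pJ per operation")
    print(f"{base:.1f} GOPS/W implied for the comparison design")
```

Under these assumptions the script reports roughly 2.17 pJ per operation and an implied baseline of about 12.1 GOPS/W. Both numbers are derived from the abstract alone and should be checked against the full paper before being cited.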

Accession Number: WOS:000623420300023

Author Identifiers:

Mo, Huiyu (ORCID: 0000-0002-3373-7178)

ISSN: 1520-9210

eISSN: 1941-0077

Full Text: https://ieeexplore.ieee.org/document/9091213


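Copyright: Institute of Semiconductors, Chinese Academy of Sciences

Registration No.: 京ICP备05085259-1号; Beijing Public Network Security Registration No. 110402500052; Statement of the Institute of Semiconductors, Chinese Academy of Sciences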
