IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Low-Power Redundant-Transition-Free TSPC Dual-Edge-Triggering Flip-Flop Using Single-Transistor-Clocked Buffer
Zisong Wang, Peiyi Zhao, Tom Springer, Congyi Zhu, Jaccob Mau, Andrew Wells, Yinshui Xia, Lingli Wang
Keywords: Clocks; Latches; Transistors; Inverters; MOS devices; Switches; Power demand; Dual edge triggering (DET); dynamic power; flip-flop (FF)
Abstracts: In the modern graphics processing unit (GPU)/artificial intelligence (AI) era, the flip-flop (FF) has become one of the most power-hungry blocks in processors. To address this issue, a novel single-phase-clock dual-edge-triggering (DET) FF using a single-transistor-clocked (STC) buffer (STCB) is proposed. The STCB uses a single clocked transistor in the data sampling path, which completely removes the clock redundant transitions (RTs) and internal RTs that exist in other DET designs. In post-layout simulations in 22-nm fully depleted silicon-on-insulator (FD-SOI) CMOS at 10% switching activity, the proposed STC-DET consumes 14% and 9.5% less power than the prior state-of-the-art low-power DET at 0.4 and 0.8 V, respectively. It also achieves the lowest power-delay product (PDP) among the DETs.
Energy-Efficient Wide-Range Level Shifter With a Logic Error Detection Circuit
Jihwan Park, Hanwool Jeong
Keywords: Voltage; Transistors; Light emitting diodes; Turning; Current limiters; Very large scale integration; Preamplifiers; Current mirror; level shifter (LS); low power; near-threshold operation; wide range
Abstracts: In this brief, an energy-efficient, wide-range level shifter (LS) with a logic error detection circuit (LEDC) is proposed. The proposed LS is based on a current mirror-based LS (CMLS), and a feedback pFET is added to eliminate the static current, which is a limitation of the CMLS. Wilson’s CMLS (WCMLS) similarly addresses this problem with a feedback pFET; however, because of that feedback pFET, it cannot fully convert the low supply voltage ($V_{\mathrm{DDL}}$) to the high supply voltage ($V_{\mathrm{DDH}}$). In contrast, the proposed LS can convert $V_{\mathrm{DDL}}$ to the full $V_{\mathrm{DDH}}$ using the LEDC. To compare the proposed LS with previously proposed LSs, post-layout simulation was performed using a 7-nm FinFET model. The simulation results show that the propagation delay and energy of the proposed LS are 0.21 ns and 20.43 fJ, respectively, at a low/high voltage of 0.4/1.2 V and an input frequency of 1 MHz.
An Efficient Massive MIMO Detector Based on Approximate Expectation Propagation
Yangyang Chen, Suwen Song, Zhongfeng Wang, Jun Lin
Keywords: Approximation algorithms; Detectors; Modulation; Convergence; Throughput; Hardware; Signal to noise ratio; Approximate expectation propagation (EP); hardware implementation; high throughput; massive multiple-input–multiple-output (MIMO); second-order Richardson iteration (SORI)
Abstracts: Among expectation propagation (EP)-based massive multiple-input–multiple-output (MIMO) detection algorithms, EP with weighted Neumann-series approximation (EPA-wNSA) has the lowest computational complexity but requires many iterations to guarantee detection performance, which severely limits the throughput of hardware implementations. Through the joint optimization of algorithm and hardware architecture, we propose an EP-based detector with higher throughput and area efficiency. First, the second-order Richardson iteration (SORI) algorithm replaces the wNSA algorithm for faster convergence. Then, three algorithmic transformations are proposed to minimize the overall complexity. Simulation results show that the proposed EPA-SORI algorithm requires far fewer iterations than EPA-wNSA to achieve comparable or even better detection performance. Furthermore, an efficient detector architecture is designed by incorporating multiple optimization methods, such as reverse data flow, advanced addition, and rounding cells. Implemented in Taiwan Semiconductor Manufacturing Company (TSMC) 28-nm CMOS technology, the proposed detector has $2.2\times$ higher throughput than the state-of-the-art EP-based detector.
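For readers unfamiliar with the iteration named in this abstract, the following is a minimal sketch of one textbook form of a second-order (momentum-augmented) Richardson iteration for solving A x = b with a symmetric positive-definite A. The abstract does not give the paper's exact recurrence or parameter choice, so the function name, the step-size and momentum formulas, and the 2×2 example system are all assumptions made for illustration only.

```python
import math

def second_order_richardson(A, b, lam_min, lam_max, iters=50):
    """Approximately solve A x = b for a symmetric positive-definite A.

    A is a list of rows; lam_min and lam_max are bounds on the eigenvalues
    of A. The update is
        x_next = x + alpha * (b - A x) + beta * (x - x_prev),
    using the classical parameter choice for this two-term recurrence
    (an illustrative assumption, not taken from the paper).
    """
    n = len(b)
    s_min, s_max = math.sqrt(lam_min), math.sqrt(lam_max)
    alpha = 4.0 / (s_max + s_min) ** 2                # step size
    beta = ((s_max - s_min) / (s_max + s_min)) ** 2   # momentum weight
    x_prev = [0.0] * n
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = b - A x
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x_next = [x[i] + alpha * r[i] + beta * (x[i] - x_prev[i])
                  for i in range(n)]
        x_prev, x = x, x_next
    return x

# Hypothetical 2x2 system; its eigenvalues are (7 ± sqrt(5)) / 2.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
lam_min = (7 - math.sqrt(5)) / 2
lam_max = (7 + math.sqrt(5)) / 2
x = second_order_richardson(A, b, lam_min, lam_max)
```

The exact solution of this example system is (1/11, 7/11). The momentum term beta * (x - x_prev) is what distinguishes the second-order form from the plain first-order Richardson update.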
A Reconfigurable Multiple Transform Selection Architecture for VVC
Zhijian Hao, Heming Sun, Guoqing Xiang, Peng Zhang, Xiaoyang Zeng, Yibo Fan
Keywords: Transforms; Computer architecture; Hardware; Encoding; Discrete cosine transforms; Matrix decomposition; Feature extraction; Area-efficient; data arrangement; improved calculate scheme; transform architecture; unified shift-adder unit (SAU); Versatile Video Coding (VVC)
Abstracts: Video coding plays an important role in today's information-driven world, as video accounts for the largest share of network traffic. The latest video coding standard, Versatile Video Coding (VVC), introduces a new transform scheme, multiple transform selection (MTS), which brings considerable coding gains at the expense of high coding complexity. In this article, we propose a reconfigurable MTS architecture that supports all transform types in VVC with square and rectangular sizes ranging from $4\times4$ to $32\times32$. First, we explore the features of the three types of transform matrices and extract those that are beneficial to designing a unified architecture. Then, we present an improved calculation scheme for general transforms, in which the transform matrix is decomposed into two simpler matrices to increase the similarity and decrease the complexity of the matrices involved in the three types of transform operations. Thanks to the improved calculation scheme, a unified shift-adder unit (SAU) is designed and heavily reused across the different transform types. Moreover, we provide a twirling two-point splicing (T2S) scheme to improve reusability and resolve data mismatch when performing discrete cosine transform (DCT)-II of different sizes. As a result, an architecture with a constant throughput of 32 pixels/cycle is implemented and specified in Verilog HDL. The synthesis results indicate that the application-specific integrated circuit (ASIC)-based and field-programmable gate array (FPGA)-based hardware architectures achieve significant advantages in both area and power consumption compared to existing methods in the literature.
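The shift-adder idea mentioned in this abstract can be illustrated in general terms: multiplying by a fixed integer coefficient can be done with shifts and additions only, one shifted copy of the input per set bit of the coefficient. The sketch below shows only that generic technique. It is not the authors' SAU, and the function names and the example constants 83 and 36 are chosen here purely for illustration.

```python
def shift_add_terms(c):
    """Return the bit positions that are set in the non-negative integer c."""
    return [k for k in range(c.bit_length()) if (c >> k) & 1]

def shift_add_multiply(x, c):
    """Multiply integer x by a non-negative constant c using shifts and adds.

    Each set bit k of c contributes the term (x << k), so the product is
    the sum of those shifted copies of x.
    """
    return sum(x << k for k in shift_add_terms(c))

# 83 = 64 + 16 + 2 + 1, so 83 * x = (x << 6) + (x << 4) + (x << 1) + x.
terms_83 = shift_add_terms(83)
product = shift_add_multiply(7, 83)
```

In hardware, a unit built this way needs no general multiplier, which is the usual motivation for sharing one shift-add structure across several fixed coefficients.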
An OOK and Binary FSK Reconfigurable Dual-Band Noncoherent IR-UWB Receiver Supporting Ternary Signaling
Nakisa Shams, Amin Pourvali Kakhki, Morteza Nabavi, Frederic Nabki
Keywords: Receivers; Frequency shift keying; Dual band; RF signals; Radio frequency; Modulation; Demodulation; Binary frequency-shift keying (FSK); concurrent dual band; impulse radio; noncoherent architecture; ON-OFF keying (OOK); programmable current switching; reconfigurable structure; ternary signaling; ultrawideband
Abstracts: This article presents a multiband low-power and low-complexity impulse radio ultrawideband (IR-UWB) noncoherent receiver. The proposed receiver can be digitally reconfigured in three different modes of operation, including two single-band modes and one concurrent dual-band mode. In the two single-band modes, the proposed envelope detection architecture is capable of receiving and demodulating an ON–OFF keying (OOK) pulse stream at RF center frequencies of 2.8 or 4.8 GHz. In the concurrent dual-band mode, the proposed architecture is able to demodulate binary frequency-shift keying (FSK), in addition to OOK demodulation, at center frequencies of 3 and 5 GHz. The receiver is composed of a reconfigurable low-power differential low noise amplifier (LNA), a fully differential squarer (self-mixer circuit), low-pass filter (LPF), and variable gain baseband (BB) amplifiers. The receiver is fabricated in TSMC 130-nm CMOS process technology. The receiver can operate at up to 150 Mb/s with the ternary signaling that is enabled by the binary FSK modulation combined with the OOK modulation in concurrent dual-band mode. At its maximum gain, the receiver achieves a sensitivity of −72 dBm at a bit error rate (BER) of $10^{-3}$ at a 100-Mb/s data rate. It consumes 11.9 and 13.2 mW from a 1.2-V supply in the single-band modes and concurrent dual-band mode, respectively.
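One way to picture how combining OOK with binary FSK yields three symbols is a decision rule over the detected energy in the two bands: no pulse in either band, a pulse in the lower band, or a pulse in the upper band. The sketch below is only an assumed illustration of that idea. The symbol mapping, the single shared threshold, and the function name are hypothetical and are not described in the abstract.

```python
import math

def decide_symbol(energy_low, energy_high, threshold):
    """Map two band energies to a ternary symbol.

    Assumed mapping (hypothetical):
      0 = no pulse detected in either band (OOK "off"),
      1 = pulse in the lower band,
      2 = pulse in the upper band.
    """
    if max(energy_low, energy_high) < threshold:
        return 0
    return 1 if energy_low >= energy_high else 2

# A ternary symbol carries log2(3) bits, versus 1 bit for a binary symbol.
bits_per_ternary_symbol = math.log2(3)

# Hypothetical per-symbol energies (arbitrary units) for three received slots.
received = [(0.02, 0.01), (0.90, 0.05), (0.03, 0.75)]
symbols = [decide_symbol(lo, hi, threshold=0.2) for lo, hi in received]
```

A noncoherent receiver of this kind only needs envelope energy per band, not carrier phase, which is why a simple comparison suffices in the sketch.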
A Speculative Divide-and-Conquer Optimization Method for Large Analog/Mixed-Signal Circuits: A High-Speed FFE SST Transmitter Example
Kwangmin Kim, Hyoseok Song, Byeongcheol Lee, Byungsub Kim
Keywords: Optimization; Optimization methods; Complexity theory; Integrated circuit modeling; Load modeling; Mathematical models; Circuit optimization; Analog/mixed-signal (AMS) design optimization; circuit optimization; feed forward equalization (FFE); high-speed link transmitter (TX); large AMS circuit
Abstracts: We propose a speculative divide-and-conquer (SDnC) method that enables optimization of a large analog/mixed-signal (AMS) circuit. Because the modules of AMS circuits interact strongly with neighboring modules, they cannot be optimized individually. The design parameters of all modules must therefore be co-optimized for global optimization, so the design space grows exponentially with circuit size. Although metaheuristic algorithms can enhance optimization efficiency, they cannot handle very large circuits because of the exponentially larger design space to explore. The proposed method applies the divide-and-conquer (DnC) strategy to circuit optimization while taking the modules’ interactions into account, allowing modules to be individually evaluated and optimized. This DnC-based optimization method hierarchically and systematically reduces both the optimization complexity and the evaluation complexity, and thus scales well with circuit size. The proposed method can also be combined with other metaheuristic algorithms such as particle swarm optimization (PSO) or artificial intelligence for faster optimization. In experiments, the proposed method enabled, for the first time, optimization of a high-speed link transmitter that has 2400–15 000 transistors and 45–47 independent design parameters. The optimization time was improved by 43 and three orders of magnitude compared to parameter sweep and PSO, respectively. The penalties of the DnC speed-up were only 3.4% and 9.3% estimation errors in power consumption and eye height, respectively. Because of the greatly improved speed, the proposed method also enables quantitative analysis of the performance-power tradeoff of a large AMS circuit.
Eliminating Minimum Implant Area Violations With Design Quality Preservation
Eunsol Jeong, Taewhan Kim, Heechun Park
Keywords: Timing; Implants; Power demand; Standards; Delays; Very large scale integration; Task analysis; Intra-row minimum implant area (MIA) violations; mixed integer-linear programming (MILP); multi-Vₜ designs
Abstracts: Minimum implant area (MIA) violations have emerged in sub-micrometer technology, which requires a certain minimum area of each threshold voltage ($V_{\text{t}}$) region for fabrication. Eliminating MIA violations in the sign-off layout has thus become an unavoidable task for high-performance multiple-$V_{\text{t}}$ designs. Conventional approaches, as well as previous efforts to remove MIA violations, severely degrade the final design, because locally moving cells or reassigning $V_{\text{t}}$s violates timing constraints or sharply increases power consumption. In this article, we propose a comprehensive MIA violation removal algorithm that fully and systematically controls the timing budget and power overhead with three sequential steps: 1) removing intra-row MIA violations by $V_{\text{t}}$ reassignment under timing preservation and minimal power increments; 2) removing inter-row MIA violations with a theoretically optimal $V_{\text{t}}$ reassignment while satisfying timing constraints; and 3) refining the $V_{\text{t}}$ reassignment to recover the power loss without violating either MIA constraints or timing closure. Moreover, we introduce a preprocessing algorithm at the preroute stage that removes a large number of MIA violations in advance for additional runtime reduction without design quality degradation. Experiments on benchmark circuits show that our proposed approach completely resolves MIA violations while ensuring no timing violations and using 34.6% less power overhead on average than the conventional approaches and previous works. In addition, our preprocessing step removes 45%–88% of MIA violations before the routing stage, which makes MIA removal in the final stage 41% faster on average with similar design quality.