José Flich; Giovanni Agosta; Philipp Ampletzer; David Atienza Alonso; Carlo Brandolese; Etienne Cappe; Alessandro Cilardo; Leon Dragić; Alexandre Dray; Alen Duspara; William Fornaciari; Edoardo Fusella; Mirko Gagliardi; Gerald Guillaume; Daniel Hofman; Ynse Hoornenborg; Arman Iranfar; Mario Kovač; Simone Libutti; Bruno Maitre; José Maria Martínez; Giuseppe Massari; Koen Meinds; Hrvoje Mlinarić; Ermis Papastefanakis; Tomás Picornell; Igor Piljić; Anna Pupykina; Federico Reghenzani; Isabelle Staub; Rafael Tornero; Michele Zanella; Marina Zapater; Davide Zoni;
Abstracts:The Horizon 2020 MANGO project aims at exploring deeply heterogeneous accelerators for use in High-Performance Computing systems running multiple applications with different Quality of Service (QoS) levels. The main goal of the project is to exploit customization to adapt computing resources to reach the desired QoS. For this purpose, it explores different but interrelated mechanisms across the architecture and system software. In particular, in this paper we focus on the runtime resource management, the thermal management, and support provided for parallel programming, as well as introducing three applications on which the project foreground will be validated.
Macarena C. Martínez-Rodríguez; Piedad Brox; Iluminada Baturone;
Abstracts:This paper analyzes three cryptographic modules suitable for the digital design of trusted virtual sensors in integrated circuits, using 90-nm CMOS technology. One of them, based on the keyed-hash message authentication code (HMAC) standard employing a PHOTON-80/20/16 lightweight hash function, ensures integrity and authentication of the virtual measurement. The other two, based on CAESAR (the Competition for Authenticated Encryption: Security, Applicability, and Robustness) third-round candidates AEGIS-128 and ASCON-128, also ensure confidentiality. The cryptographic key required is not stored in the sensor but recovered in a configuration operation mode from non-sensitive data stored in the non-volatile memory of the sensor and from the start-up values of the sensor SRAM acting as a Physical Unclonable Function (PUF), thus ensuring that the sensor is not counterfeit. The start-up values of the SRAM are also employed in the configuration operation mode to generate the seed of the nonces that make sensor outputs different and, hence, resistant to replay attacks. The configuration operation mode is slower with the CAESAR candidates because the cryptographic key and nonce have 128 bits instead of the 60 bits of the key and 32 bits of the nonce in HMAC. Configuration takes 416.8 µs working at 50 MHz using HMAC and 426.2 µs using CAESAR candidates. On the other hand, the trusted sensing mode is much faster with CAESAR candidates, with similar power consumption. Trusted sensing takes 212.62 µs at 50 MHz using HMAC, 0.72 µs using ASCON, and 0.42 µs using AEGIS. AEGIS allows the fastest trusted measurements at the cost of more silicon area, 4.4 times more area than HMAC and 5.4 times more than ASCON. ASCON allows fast measurements with the smallest area occupation. The module implementing ASCON occupies 0.026 mm² in a 90-nm CMOS technology.
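As a rough software illustration of the HMAC-plus-nonce idea described above, the following minimal sketch authenticates a measurement together with a 32-bit nonce and rejects replayed nonces. It is not the hardware design from the paper: SHA-256 stands in for the PHOTON-80/20/16 hash (which is not in the Python standard library), and the names `tag_measurement` and `verify`, the 8-byte key, and the set of seen nonces are all assumptions made for illustration.

```python
import hmac
import hashlib


def tag_measurement(key: bytes, nonce: int, measurement: bytes) -> bytes:
    """Compute an HMAC tag over a 32-bit nonce followed by the measurement.

    SHA-256 is an assumed stand-in for the lightweight hash in the abstract.
    """
    message = nonce.to_bytes(4, "big") + measurement
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, nonce: int, measurement: bytes, tag: bytes, seen_nonces: set) -> bool:
    """Accept a measurement only if its tag is valid and its nonce is fresh."""
    if nonce in seen_nonces:
        return False  # same nonce seen before: treat as a replay
    expected = tag_measurement(key, nonce, measurement)
    if not hmac.compare_digest(expected, tag):
        return False  # integrity or authentication failure
    seen_nonces.add(nonce)
    return True
```

A receiver would keep `seen_nonces` across calls, so that a tagged measurement captured on the wire is rejected when it is sent a second time.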
Abstracts:Linear detection algorithms require a series of sequential, complex processes that need high-performance processors to reduce computing time when implemented in software. In this paper, we design a linear detection hardware accelerator with parallel computing capability through our proposed pipelined multiprocessor system-on-a-chip (SoC) design methodology; it contains an upper pipelined controller that controls the operation of the underlying Canny edge detection module and the Hough transform module. That is, we first use the edge detection module to get the edge information, and then use the Hough transform to improve the accuracy of the linear detection results. Finally, pipeline control is adopted to enhance the effectiveness of the module. Based on the Canny process and the Gaussian blurring method, this study can reduce the false detections caused by noise and decrease the number of operations and the resource usage without affecting straight-line detection. Compared with Xu and Chen, the proposed method reduces circuit resources by 84% and 74%, respectively. The hardware function circuit generated from our methodology has a good decentralized architecture and scalability, and it is easier to use in all kinds of embedded systems.
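The Hough voting step that follows edge detection can be sketched in software as below. This is a plain, unpipelined illustration and not the accelerator from the paper: it assumes the edge pixels have already been extracted (for example by a Canny stage), and the function names, the 180-step angle resolution, and the integer rounding of rho are illustrative choices.

```python
import math


def hough_lines(edge_points, width, height, n_theta=180):
    """Accumulate votes in (rho, theta) space for each edge pixel (x, y)."""
    max_rho = int(math.ceil(math.hypot(width, height)))
    # rho lies in [-max_rho, max_rho]; shift by max_rho to get a list index.
    acc = [[0] * n_theta for _ in range(2 * max_rho + 1)]
    for x, y in edge_points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[rho + max_rho][t] += 1
    return acc, max_rho


def strongest_line(acc, max_rho, n_theta=180):
    """Return (rho, theta in degrees, votes) of the most voted accumulator cell."""
    votes, r, t = max(
        (acc[r][t], r, t) for r in range(len(acc)) for t in range(n_theta)
    )
    return r - max_rho, t * 180.0 / n_theta, votes
```

Because neighbouring angle bins can collect the same number of votes for a short line, the reported angle is only accurate to about one bin; a finer `n_theta` or peak smoothing would be needed for a sharper estimate.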
E. Torti; A. Fontanella; G. Florimbi; F. Leporati; H. Fabelo; S. Ortega; G.M. Callico;
Abstracts:The HypErspectraL Imaging Cancer Detection (HELICoiD) European project aims at developing a methodology for tumor tissue classification through hyperspectral imaging (HSI) techniques. This paper describes the development of a parallel implementation of the Support Vector Machines (SVMs) algorithm employed for the classification of hyperspectral (HS) images of in vivo human brain tissue. SVMs have demonstrated high accuracy in the supervised classification of biological tissues, and especially in the classification of human brain tumors. In this work, both the training and the classification stages of the SVMs were accelerated using Graphics Processing Units (GPUs). The acceleration of the training stage allows new samples to be incorporated during surgical procedures to create new mathematical models of the classifier. Results show that the developed system is capable of performing efficient training and real-time compliant classification.
Wasif Afzal; Hugo Bruneliere; Davide Di Ruscio; Andrey Sadovykh; Silvia Mazzini; Eric Cariou; Dragos Truscan; Jordi Cabot; Abel Gómez; Jesús Gorroñogoitia; Luigi Pomante; Pavel Smrz;
Abstracts:A major challenge for the European electronic industry is to enhance productivity by ensuring quality of development, integration and maintenance while reducing the associated costs. Model-Driven Engineering (MDE) principles and techniques have already shown promising capabilities, but they still need to scale up to support real-world scenarios implied by the full deployment and use of complex electronic components and systems. Moreover, maintaining efficient traceability, integration, and communication between two fundamental system life cycle phases (design time and runtime) is another challenge requiring the scalability of MDE. This paper presents an overview of the ECSEL (http://www.ecsel-ju.eu/web/index.php) project entitled “MegaModelling at runtime – Scalable model-based framework for continuous development and runtime validation of complex systems” (MegaM@Rt2), whose aim is to address the above-mentioned challenges facing MDE. Driven by both large and small industrial enterprises, with the support of research partners and technology providers, MegaM@Rt2 aims to deliver a framework of tools and methods for: 1) system engineering/design and continuous development, 2) related runtime analysis and 3) global models and traceability management. Diverse industrial use cases (covering strategic domains such as aeronautics, railway, construction and telecommunications) will integrate and demonstrate the validity of the MegaM@Rt2 solution. This paper provides an overview of the MegaM@Rt2 project with respect to its approach, mission, and objectives, as well as its implementation details. It further introduces the consortium and describes the work packages and a few already produced deliverables.
Abstracts:This paper presents a technique using a genetic algorithm to compute an efficient routing for an application-specific NoC (Network-on-Chip). The main goal of this paper is to introduce multi-objective optimization techniques to address NoC routing. Thus, Pareto optimization has been used to determine non-dominated solutions according to two fixed objectives: (i) avoiding the reuse of the same links as far as possible to reduce congestion; (ii) reducing the number of loops to limit the risk of deadlocks. The proposed method, called MORGA (Multi-Objective Routing based on Genetic Algorithm), uses two steps: (i) an off-line process consisting of selecting a non-dominated solution among a pre-calculated population of solutions; (ii) an on-line process that transmits data according to the selected solution by means of routing tables. MORGA is also applicable in the presence of permanent faulty links by calculating fault-free solutions. A reconfiguration of the routing tables is performed when a new application is loaded on the system. Results show how a selection of the most appropriate solution can provide considerable improvement in performance.
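The notion of non-dominated solutions over the two objectives above can be illustrated with a small filter. This is only a sketch of Pareto dominance, not the MORGA genetic algorithm itself: each candidate routing is assumed to be summarised as a hypothetical pair `(link_reuse, loops)`, both to be minimised.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates (minimisation)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]
```

For instance, among the made-up scores `(3, 1)`, `(2, 2)`, `(4, 4)`, `(1, 5)` and `(2, 3)`, the pair `(4, 4)` is dominated by `(2, 2)` and would be dropped; an off-line step would then pick one of the survivors.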
Abstracts:In this work, an embedded system for point-to-point secure transmission of encrypted signals was developed. This portable system, which is also experimentally analyzed, includes a pair of dsPIC33FJ128MC802 digital signal controllers, and the algorithm is based on the Rössler oscillator with constant and chaotic parameters. The synchronization between both devices was analyzed via the measurement of the synchronization error and the transient time. The viability of the complete system was experimentally studied by sending encrypted signals by wired and wireless methods from the master device to the slave, where they are decrypted. As the originally acquired signals show some white-noise contamination, the transmitted and decoded signals are filtered through a Kalman filter embedded in the same algorithm, reaching approximately -8 dB of noise diminution. Afterwards, the decoded signals are compared with the initial ones by the Pearson correlation coefficient. When the synchronization error is fixed at 1 × 10⁻⁵, the experimental results exhibit transient times of 4.45 s and 2.69 min for the wired and wireless transmitting methods, respectively. The estimated correlation coefficients lie in the interval 0.99963 < R < 0.999999; hence, the initial encrypted signal is entirely received by the slave device. The system is able to work in real time; nevertheless, the determined sampling frequency is drastically diminished when the point-to-point communication is carried out wirelessly.
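The comparison between original and decoded signals relies on the Pearson correlation coefficient, which can be computed as in the sketch below. The function name and the sample-list interface are assumptions for illustration; the signals used in the paper are not reproduced here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient R between two equal-length sample lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, each without the common 1/n factor.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)
```

A value of R close to 1 indicates that the decoded signal is, up to scale and offset, a near-exact copy of the original.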
Lampros Pyrgas; Paris Kitsos; Athanassios Skodras;
Abstracts:The discrete Hartley transform is a real-valued transform similar to the complex Fourier transform that finds numerous applications in a variety of fields including pattern recognition and signal and image processing. In this paper, we propose and study two compact and versatile hardware architectures for the computation of the 8-point, 16-point and 32-point Two-Band Fast Discrete Hartley Transform. These highly modular architectures have a symmetric and regular structure consisting of two blocks, a multiplication block and an addition/subtraction block. The first architecture utilizes 8 multipliers and 16 adders/subtractors, achieving a maximum clock frequency of 95 MHz. The second architecture utilizes only 4 multipliers and 8 adders/subtractors, achieving a maximum clock frequency of 100 MHz; however, it requires additional multiplexers and more clock cycles (from 1 to 58, depending on the number of points) for the computation. As a result, the proposed hardware architectures constitute an efficient choice for area-restricted applications such as embedded or pervasive computing systems.
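For reference, the transform itself can be written directly from its definition, H[k] = Σ x[n]·(cos(2πnk/N) + sin(2πnk/N)). The sketch below is the O(N²) definition only, not the Two-Band fast algorithm or either hardware architecture from the paper; the function names are illustrative.

```python
import math


def dht(x):
    """Direct discrete Hartley transform of a real sequence."""
    n = len(x)
    return [
        sum(
            x[j] * (math.cos(2 * math.pi * j * k / n) + math.sin(2 * math.pi * j * k / n))
            for j in range(n)
        )
        for k in range(n)
    ]


def idht(h):
    """Inverse DHT: the transform is its own inverse up to a factor 1/N."""
    n = len(h)
    return [value / n for value in dht(h)]
```

The involution property is what makes the transform convenient in hardware: the same datapath serves both the forward and the inverse direction, with only a scaling by 1/N.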
Abstracts:Aliasing in test response compaction is an important source of fault coverage loss. Methods to avoid the aliasing mostly require modification of the compactor to some extent. This can lead to a higher compactor complexity and consequently to higher area overhead, longer signal propagation delays, etc.
Abstracts:Financial and commercial applications depend on decimal arithmetic because they must produce results that match exactly those obtained by human calculations. Decimal multiplication is a frequently used operation in these applications and also in the design of decimal floating-point units. In this paper we propose a new architecture for parallel decimal multiplication that improves the area of previous decimal multipliers while maintaining the best performance. A decimal adder based on a mixed BCD/excess-6 representation of the operands is utilized. A new partial product generation unit is proposed based on a 5221 recoding of the multiplier digits. With the proposed multiplier, we are able to improve on state-of-the-art parallel decimal multipliers targeting LUT-6 FPGAs. Compared to previous decimal multipliers, implementation results for 2, 4, 8, 16, 32 and 34 digits show that the proposed multiplier achieves over 20% better area without performance degradation.
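The role of an excess-6 bias in decimal addition can be shown with a textbook digit-serial sketch: adding 6 to one operand digit makes the binary carry out of a 4-bit nibble coincide with the decimal carry. This is a generic illustration under that assumption, not the mixed BCD/excess-6 adder or the 5221 recoding proposed in the paper; `bcd_add` and its digit-list interface are hypothetical.

```python
def bcd_add(a_digits, b_digits):
    """Add two equal-length lists of decimal digits, least significant first.

    Returns the sum digits (least significant first) followed by the final carry.
    """
    result = []
    carry = 0
    for a, b in zip(a_digits, b_digits):
        s = (a + 6) + b + carry  # bias one operand digit by 6 (excess-6)
        if s > 15:
            # Binary carry out of the nibble equals the decimal carry;
            # the low four bits already hold the correct BCD digit.
            digit, carry = s & 0xF, 1
        else:
            # No carry: remove the bias to recover the BCD digit.
            digit, carry = s - 6, 0
        result.append(digit)
    result.append(carry)
    return result
```

With digits stored least significant first, adding 125 and 437 means calling `bcd_add([5, 2, 1], [7, 3, 4])`, and the carry chain behaves exactly like a binary adder's, which is the property that lets such adders map well onto FPGA carry logic.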