Reliability modeling of dynamic thermal management in multicore processor

With the continuous downscaling of semiconductor technology, growing power density and the resulting thermal issues in multi-core processors have become critical challenges. The system reliability issues associated with increased power dissipation affect the reliability of thermal management. High temperatures and l...


Bibliographic Details
Main Author: Pour, Somayeh Rahimi
Format: Thesis
Language: English
Published: 2018
Subjects: Electronic apparatus and appliances - Temperature control; Heat - Transmission; Microprocessors
Online Access: http://psasir.upm.edu.my/id/eprint/104250/
http://psasir.upm.edu.my/id/eprint/104250/1/SOMAYEH%20RAHIMI%20POUR%20-%20IR.pdf
_version_ 1848864237423689728
author Pour, Somayeh Rahimi
building UPM Institutional Repository
collection Online Access
description With the continuous downscaling of semiconductor technology, growing power density and the resulting thermal issues in multi-core processors have become critical challenges. The system reliability issues associated with increased power dissipation affect the reliability of thermal management. High temperatures and large thermal variations across the die create severe challenges for system reliability, performance, leakage power, and cooling costs. Dynamic thermal management (DTM) methods regulate the operating temperature based on the temperature profile provided by thermal sensors, which is transmitted over the network-on-chip (NoC) in multi-core systems. DTM efficiency is highly dependent on the accuracy of the thermal data. Temperature profile inaccuracies are caused by various factors, including sensor placement, sensor device imprecision, and interconnect deep sub-micron (DSM) noise. While inaccuracies due to sensor placement and sensor device imprecision have been widely addressed, only limited study has been performed on the impact of interconnect DSM noise on DTM efficiency. Hence, this thesis develops a comprehensive simulator model to investigate the impact of interconnect DSM noise on thermal data accuracy and DTM efficiency. The simulation results demonstrate that DSM noise severely affects the most significant bits (MSBs) of the thermal data, which leads to significant degradation of DTM performance. To mitigate the impact of DSM noise on DTM efficiency, an NoC fault-tolerance scheme is proposed that exploits the inherent characteristics of DSM noise affecting the thermal data. Compared with the standard coding scheme, it achieves lower cost in terms of area and power consumption while increasing DTM efficiency by 38%. The second source of chip reliability concerns is the power delivery network (PDN). The PDN suffers from long-term reliability threats such as electro-migration (EM).
The loss of limited Controlled Collapse Chip Connection (C4) pads to electro-migration makes delivering a stable supply voltage more critical. The C4 bump failure mechanism depends on current density, on-chip voltage noise, and temperature. In this thesis, the dependency of the C4 bump failure mechanism on each individual bump's temperature is explored, which leads to a more accurate mean-time-to-failure (MTTF) estimate for the whole system. The simulation results demonstrate that assuming a uniform temperature leads to underestimating the system MTTF by up to 16 times, owing to the exponential dependency of C4 bump failure on temperature.
first_indexed 2025-11-15T13:45:37Z
format Thesis
id upm-104250
institution Universiti Putra Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T13:45:37Z
publishDate 2018
recordtype eprints
repository_type Digital Repository
spelling upm-1042502023-07-25T02:02:00Z http://psasir.upm.edu.my/id/eprint/104250/ Reliability modeling of dynamic thermal management in multicore processor Pour, Somayeh Rahimi 2018-01 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/104250/1/SOMAYEH%20RAHIMI%20POUR%20-%20IR.pdf Pour, Somayeh Rahimi (2018) Reliability modeling of dynamic thermal management in multicore processor. Doctoral thesis, Universiti Putra Malaysia. Electronic apparatus and appliances - Temperature control Heat - Transmission Microprocessors
title Reliability modeling of dynamic thermal management in multicore processor
topic Electronic apparatus and appliances - Temperature control
Heat - Transmission
Microprocessors
url http://psasir.upm.edu.my/id/eprint/104250/
http://psasir.upm.edu.my/id/eprint/104250/1/SOMAYEH%20RAHIMI%20POUR%20-%20IR.pdf
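
The description attributes the MTTF gap to the exponential dependency of C4 bump failure on temperature. A minimal sketch of that effect follows, assuming the widely used Black's equation for electromigration, MTTF = A * J^(-n) * exp(Ea / (k * T)). The record does not say which model the thesis uses, and the activation energy, current exponent, bump temperatures, and the choice of the hottest bump as the "uniform" temperature are all hypothetical values chosen for illustration.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def black_mttf(current_density, temp_kelvin, prefactor=1.0, n=2.0, ea_ev=0.8):
    """Black's equation: MTTF = A * J**(-n) * exp(Ea / (k * T)).

    prefactor, n, and ea_ev are illustrative assumptions, not thesis values.
    """
    return prefactor * current_density ** (-n) * math.exp(
        ea_ev / (BOLTZMANN_EV * temp_kelvin)
    )


# Hypothetical per-bump temperatures in kelvin (not taken from the thesis).
bump_temps = [330.0, 340.0, 350.0, 360.0]
j = 1.0  # normalized current density, assumed equal for every bump

# MTTF of each bump evaluated at its own temperature.
per_bump = [black_mttf(j, t) for t in bump_temps]

# Uniform-temperature assumption: every bump is treated as if it sat at the
# hottest temperature on the die.
uniform = black_mttf(j, max(bump_temps))

# How much longer each bump is expected to last than the uniform model predicts.
ratios = [m / uniform for m in per_bump]

for t, r in zip(bump_temps, ratios):
    print(f"T = {t:.0f} K: per-bump MTTF is {r:.2f}x the uniform-temperature MTTF")
```

Under these assumed parameters, a 30 K spread between the coolest and hottest bump already changes the per-bump MTTF by roughly an order of magnitude, which is the kind of sensitivity that makes a single uniform temperature pessimistic for the cooler bumps.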