Commit 7e6c0003 authored by Guillem Cabo

Merge branch 'jk/FT-rebase' into 'develop'

add fault tolerance

See merge request !11
parents a383924a 9d9db298
junk
hdl/PMU_raw/ hdl/PMU_raw/
*.swp *.swp
hdl/tmp* hdl/tmp*
*.questa.log
*.lquesta.log
[submodule "tools/DAVOS"]
path = tools/DAVOS
url = https://github.com/GuillemCabo/DAVOS.git
MIT License MIT License
Copyright (c) 2021 CAOS_HW / HDL_IP Copyright (c) 2021 Barcelona Supercomputing Center (BSC-CNS)
Permission is hereby granted, free of charge, to any person obtaining a copy Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal of this software and associated documentation files (the "Software"), to deal
......
...@@ -4,7 +4,7 @@ This repository contains the RTL and documentation for the unit. ...@@ -4,7 +4,7 @@ This repository contains the RTL and documentation for the unit.
* The specs for each feature and memory map calculator can be found under the ```docs``` folder. * The specs for each feature and memory map calculator can be found under the ```docs``` folder.
* Top levels for different configurations or wrappers are foun in ```rtl```. * Top levels for different configurations or wrappers are found in ```rtl```.
* RTL for Submodules (MCCU, RDC, Counters, etc..) can be foun in ```submodules```. * RTL for Submodules (MCCU, RDC, Counters, etc..) can be found in ```submodules```.
* Synth contains scripts for early area and frequency evaluation with yosys. * Synth contains scripts for early area and frequency evaluation with yosys.
* ```tb``` contains testbenes and verification scripts. * ```tb``` contains testbenches and verification scripts.
\ No newline at end of file
...@@ -2,3 +2,4 @@ AXI_PMU/ ...@@ -2,3 +2,4 @@ AXI_PMU/
tmp/ tmp/
pmu_ahb/ pmu_ahb/
\.*\.log \.*\.log
.verilator.log
...@@ -14,7 +14,6 @@ rm -rf ./AXI_PMU ...@@ -14,7 +14,6 @@ rm -rf ./AXI_PMU
############ ############
## TOP pmu_ahb.sv ## TOP pmu_ahb.sv
############ ############
# Run Verilator # Run Verilator
printf "Please wait, running Verilator\n" printf "Please wait, running Verilator\n"
verilator --lint-only ../hdl/pmu_ahb.sv \ verilator --lint-only ../hdl/pmu_ahb.sv \
...@@ -24,7 +23,13 @@ verilator --lint-only ../hdl/pmu_ahb.sv \ ...@@ -24,7 +23,13 @@ verilator --lint-only ../hdl/pmu_ahb.sv \
../submodules/RDC/hdl/RDC.sv \ ../submodules/RDC/hdl/RDC.sv \
../submodules/quota/PMU_quota.sv \ ../submodules/quota/PMU_quota.sv \
../submodules/counters/PMU_counters.sv \ ../submodules/counters/PMU_counters.sv \
../submodules/overflow/PMU_overflow.sv 2> $VERILATOR_LOG0 ../submodules/overflow/PMU_overflow.sv \
../submodules/seu_ip/hamming32t26d_enc.sv \
../submodules/seu_ip/hamming32t26d_dec.sv \
../submodules/seu_ip/triple_reg.sv \
../submodules/seu_ip/way3_voter.sv \
../submodules/seu_ip/way3u2a_voter.sv \
../submodules/seu_ip/way3ua_voter.sv 2> $VERILATOR_LOG0
# Run Questa # Run Questa
printf "Please wait, running Spyglass\n" printf "Please wait, running Spyglass\n"
...@@ -33,6 +38,13 @@ printf "Please wait, running Spyglass\n" ...@@ -33,6 +38,13 @@ printf "Please wait, running Spyglass\n"
../submodules/crossbar/hdl/crossbar.sv \ ../submodules/crossbar/hdl/crossbar.sv \
../submodules/MCCU/hdl/MCCU.sv \ ../submodules/MCCU/hdl/MCCU.sv \
../submodules/RDC/hdl/RDC.sv \ ../submodules/RDC/hdl/RDC.sv \
../submodules/overflow/PMU_overflow.sv \
../submodules/seu_ip/hamming32t26d_enc.sv \
../submodules/seu_ip/hamming32t26d_dec.sv \
../submodules/seu_ip/triple_reg.sv \
../submodules/seu_ip/way3_voter.sv \
../submodules/seu_ip/way3u2a_voter.sv \
../submodules/seu_ip/way3ua_voter.sv \
../submodules/quota/PMU_quota.sv \ ../submodules/quota/PMU_quota.sv \
../submodules/counters/PMU_counters.sv \ ../submodules/counters/PMU_counters.sv \
../submodules/overflow/PMU_overflow.sv 1> /dev/null ../submodules/overflow/PMU_overflow.sv 1> /dev/null
......
...@@ -12,7 +12,9 @@ rm -f $LOG ...@@ -12,7 +12,9 @@ rm -f $LOG
# Go to target folder # Go to target folder
cd ../tb/questa_sim/ || exit 1 cd ../tb/questa_sim/ || exit 1
# Declare folders of tests to be executed # Declare folders of tests to be executed
declare -a StringArray=("tb_axi_pmu/" "tb_pmu_ahb/" "tb_pmu_raw/") declare -a StringArray=("tb_axi_pmu/" "tb_com_tr/" "tb_hamming16td11/" "tb_hamming32td26/"
"tb_pmu_ahb/" "tb_pmu_raw/" "tb_reg_sbf/" "tb_MCCU" "tb_crossbar"
"tb_com_tr/")
# Iterate the string array using for loop # Iterate the string array using for loop
for val in ${StringArray[@]}; do for val in ${StringArray[@]}; do
......
*.out
*.toc
*.gz
*.aux
*.log
...@@ -10,7 +10,7 @@ ...@@ -10,7 +10,7 @@
\hline \hline
clk\_i & INPUT & 1 & - & Width of data registers & module port clk\_i & INPUT & 1 & - & Width of data registers & module port
\\ \\
rstn\_i & INPUT & 1 & - & Active low asyncronous reset. It... & module port rstn\_i & INPUT & 1 & - & Active low syncronous reset. It... & module port
\\ \\
enable\_i & INPUT & 1 & - & can be generated & module port enable\_i & INPUT & 1 & - & can be generated & module port
\\ \\
......
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
\hline \hline
clk\_i & INPUT & 1 & - & Width of data registers & module port clk\_i & INPUT & 1 & - & Width of data registers & module port
\\ \\
rstn\_i & INPUT & 1 & - & Active low asyncronous reset. It... & module port rstn\_i & INPUT & 1 & - & Active low syncronous reset. It... & module port
\\ \\
enable\_i & INPUT & 1 & - & can be generated & module port enable\_i & INPUT & 1 & - & can be generated & module port
\\ \\
......
\newpage
\section{Overview}
\label{chapter1}
The SafePMU is capable of changing the execution of critical tasks, either when a control kernel uses its measurements to perform software control or through interrupt service routines signaled by the unit. Given this scenario, the unit shall be analyzed to detect possible failure modes. We focus our efforts on failures due to single event upsets (SEU) and single event transients (SET) that can be mitigated at RTL level. Additional measures can be taken at the physical level, such as the use of hardened memory cells on ASICs or periodic reconfiguration on FPGAs, but they shall be undertaken by the IP integrator on a project-by-project basis.\\
Failure modes and fault tolerance measures have been analyzed for each RTL file. Common considerations among files and features are described under the general section.\\
\begin{itemize}
	\item \textbf{pmu\_ahb.sv:} Interface with the AHB bus. Contains the PMU value and configuration registers, state machines for AHB control, and combinational logic to manage register updates.
	\item \textbf{PMU\_raw.sv:} Signal routing among instances, some signals with combinational logic for enables, the implementation of the self-test mode, and registered RDC and MCCU signals.
\item \textbf{PMU\_counters.sv:} Internally registered counter values, combinational logic for adders and external update control.
\item \textbf{crossbar.sv:} Externally driven muxes and registered outputs.
	\item \textbf{MCCU.sv:} Internally registered quotas. Capable of signaling interrupts.
	\item \textbf{PMU\_overflow.sv:} Mostly combinational with several internal registers. Capable of signaling interrupts.
\item \textbf{RDC.sv:} Mainly combinational logic but it has several internal registers. Capable of signaling interrupts.\\
\end{itemize}
The following sections describe possible failure modes and potential mitigations. Each solution will result in a tradeoff between performance, resources, ease of use, and development time. Thus, the recommended mitigations may change at a later date to match project goals.\\
\newpage
\section{Fail modes and proposed corrective actions}
\label{chapter2}
\subsection{General}
\begin{enumerate}
	\item \underline{Fail mode:} A single event transient on rstn\_i can upset the whole design. This applies to all modules whose resets are sensitive to the falling edge of the reset signal. \textbf{Very high priority}.\\
\underline{Corrective action:} Replace asynchronous reset with synchronous reset.\\
\\
	\item \underline{Fail mode:} Event generators (signal gathering on the SoC) have not been hardened. A transient event on a purely combinational CCS signal generator has minor consequences: counts may be off by one. CCS signals that depend on sequential logic should be analyzed in further detail, since upsets may cause prolonged misbehavior of the resulting event signal. \textbf{Medium priority}.\\
\underline{Corrective action:} Protect sequential logic with fault detection and user correction or hardware error correction and recovery. Transient errors on combinational logic are allowed due to their overall low impact, given the low probability of errors.
\end{enumerate}
\subsection{pmu\_ahb}
\begin{enumerate}
	\item \underline{Fail mode:} Failures on AHB external signals may cause misbehavior throughout the unit. Transient errors on data, address or control signals may cause misconfiguration, incorrect readings and invalid internal states. AHB external signals could be hardened by Cobham Gaisler. \textbf{Low priority}.\\
\underline{Corrective action:} The unit assumes good behavior, even after a fault, from external interfaces. Thus adequate fault detection or fault correction shall be provided by the SoC design.\\
\\
	\item \underline{Fail mode:} A single event upset (SEU) on the slv\_reg configuration registers (any feature) may cause complete unit failure. At least the registers in the ranges starting at BASE\_CFG, BASE\_MCCU\_CFG, BASE\_MCCU\_WEIGHTS and BASE\_CROSSBAR, up to their respective END addresses, shall be protected. Error detection is required. \textbf{High priority.}\\
	\underline{Corrective action:} In this case, either error detection or error correction is recommended. On a resource-restricted system, we could use a hash of the configuration registers that is recomputed at each cycle. If the hash changes at any time other than after an AHB write transaction, the IP can signal an error interrupt. When hardware resources are available, error correction is recommended for all configuration registers since it simplifies the use of the IP.\\
\\
	\item \underline{Fail mode:} Single event upsets (SEU) on the slv\_reg result registers have different consequences depending on the instant of the event and the state of the system. An upset during a write request on BASE\_COUNTERS, for instance, may have a critical effect since it affects overflow interrupts, while the same upset in any idle state may be harmless since it will be cleared in the next cycle by the refresh of the unit. If only upsets on writes are deemed dangerous, the issue can be handled by software forcing several reads after a write. The same applies to reads: temporal redundancy in software can mitigate the issue. Hardware solutions may be considered. \textbf{Priority medium}.\\
\underline{Corrective action:} It is recommended to perform a read after each write to the PMU in order to detect transient errors on writes.\\
\\
	\item \underline{Fail mode:} State machines depend on internally registered signals such as \textit{state} and \textit{address\_phase.select}. Upsets may result in misbehavior regarding the AHB protocol but also internal updates. The number of bits seems to be small, so hardware redundancy may be feasible. \textbf{Priority high}.\\
\underline{Corrective action:} Due to the small number of signals, it is recommended to triplicate the registers and implement a voting mechanism that allows error correction.\\
\end{enumerate}
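The hash-based detection proposed above for the slv\_reg configuration registers can be illustrated with a small behavioral model. This is only a sketch in Python, not RTL, and it makes several assumptions that are not part of the specification: the hash is a plain XOR fold of 32-bit words, and the names checksum, ConfigMonitor and ahb\_write are hypothetical.

```python
from functools import reduce


def checksum(regs):
    """XOR-fold all register values into one 32-bit word.

    Assumption: a simple XOR fold stands in for the "hash" of the text.
    It changes whenever a single bit of any register flips.
    """
    return reduce(lambda acc, r: acc ^ (r & 0xFFFFFFFF), regs, 0)


class ConfigMonitor:
    """Hypothetical model of the detector: the reference checksum is only
    accepted on an AHB write; any other change is reported as an error."""

    def __init__(self, regs):
        self.reference = checksum(regs)

    def step(self, regs, ahb_write):
        """Evaluate one clock cycle. Returns True if an upset is detected."""
        current = checksum(regs)
        error = (current != self.reference) and not ahb_write
        if ahb_write:
            # A legitimate write defines the new reference value.
            self.reference = current
        return error


regs = [0x00000001, 0x000000FF, 0x00000000]
monitor = ConfigMonitor(regs)
idle_ok = monitor.step(regs, ahb_write=False)        # nothing changed
regs[1] ^= 1 << 4                                    # inject a bit flip
upset_seen = monitor.step(regs, ahb_write=False)     # change without write
```

In this model the error stays asserted every cycle until the registers are rewritten, which is one possible way to drive an error interrupt. An XOR fold misses two flips in the same bit position of different registers, so a stronger code may be needed depending on the fault model.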
\subsection{PMU\_raw}
\begin{enumerate}
\item \underline{Fail mode:} \textit{MCCU\_enable\_int} and \textit{RDC\_enable\_int} are internally registered. Reset is active low. While active, the values are updated each cycle based on the corresponding \textit{regs\_i} values. Given that \textit{regs\_i} has error detection against permanent errors on higher levels, transient faults may cause the unit to be disabled for a single cycle. \textbf{Priority Low}.\\
\underline{Corrective action:} Since a transient error will be self-corrected in the following cycles, and the consequences of the failures are not deemed catastrophic, additional protection can be ignored for most of the systems. In extreme cases, delayed sampling of the bus could be used to detect and recover from transients.\\
\\
\item \underline{Fail mode:} \textit{MCCU\_rstn} and \textit{RDC\_rstn} are internally registered. Reset is active low. While active, the values are updated each cycle based on the corresponding \textit{regs\_i} values and current reset state. Transient errors on \textit{regs\_i} during the clock's positive edge can cause the propagation of unexpected resets. \textbf{Priority high}.\\
\underline{Corrective action:}
Internally registered signals shall be replicated. Protection against transients on regs\_i can be provided by hardware at the driver side, detecting mismatches between the past output and the current output during periods without external updates. Another solution would be to add redundancy bits on the driver side and compare them at the receiver. On the receiver end, time-delayed sampling could be implemented. Only one mechanism is required.\\
\\
\item \underline{Fail mode:} The self-test configuration feature is implemented as a combinational block within this unit. While permanent failures are addressed at the signal source, the feature may still be affected by transient errors. If \textit{selftest\_mode} is disturbed, all the input events may take incorrect values for a single cycle. \textbf{Priority Low}.\\
\underline{Corrective action:}
	Transient errors in these signals are a low priority since they will correct themselves. If transients need to be mitigated, error detection can be implemented at the driver side by checking that regs\_i remains stable unless a new AHB write to the corresponding slv\_reg registers occurs.
\\
\\
	\item \underline{Fail mode:} This module's primary purpose is signal routing, that is, point-to-point connections between the ports of \textit{PMU\_raw} and each of the PMU features. Given the combinational nature of the circuit, transient faults can occur. Such fail modes shall be considered and mitigated at the signals' destination.\\
\underline{Corrective action:} Consumers of PMU\_raw signals shall include transient error detection if needed. Detection/correction can be placed at the source or destination of the signals.\\
\\
\end{enumerate}
\subsection{PMU\_counters}
\begin{enumerate}
\item \underline{Fail mode:} \textit{Softrst\_i} transient errors can occur. The unit shall handle error detection internally. One error can reset all counters. \textbf{Priority Medium}.\\
\underline{Corrective action:} Provide redundant signals from the source of Softrst\_i and do error detection within the module.\\
\\
\item \underline{Fail mode:} \textit{en\_i} transient errors can occur. The unit shall handle error detection internally. One error can disable the counters for a single cycle. \textbf{Priority low}.\\
\underline{Corrective action:}
	Given that normal operation recovers after the transient, priority is low. Nevertheless, en\_i can be replicated at the source. Error correction mechanisms can be added if needed. \\
\\
	\item \underline{Fail mode:} \textit{we\_i} transient errors can cause the unit to miss a user update to the counter values. Counter values are internally registered as \textit{slv\_reg} and mirrored in the wrapper interface (\textit{pmu\_ahb.sv}). Missing a we\_i may cause metastability. If \textit{we\_i} is altered, the new value is not bypassed. In this scenario, the contents of the mirror and internal registers diverge. This failure mode is hard to detect since the unit will always swap two values around the mirror and the internal register. \textbf{Priority high}.\\
\underline{Corrective action:} Add a hardware check to detect if counter values decrease or increase by more than one without any reasonable cause such as reset, write, overflow. On a constrained system, error detection can be added by checking parity of the current value with a single-bit counter set and reset accordingly with the counter's initial values. As an alternative, extend the we\_i signal with error recovery mechanisms such as replication and voting.\\
\\
\end{enumerate}
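The hardware check suggested for the counters (flag a value that decreases, or increases by more than one, without a reset, write or overflow) can be modeled as follows. This is a minimal behavioral sketch in Python under stated assumptions: a counter advances by at most one per cycle, the width defaults to 32 bits, and the function name counter\_step\_ok and its parameters are hypothetical.

```python
def counter_step_ok(prev, curr, width=32, reset=False, write=False):
    """Return True if the step from prev to curr has a legitimate cause.

    Assumptions (not taken from the RTL): the counter holds or increments
    by exactly one each cycle, and wraps to zero on overflow.
    """
    if reset or write:
        # Soft reset or a user write may load an arbitrary value.
        return True
    mask = (1 << width) - 1
    # Modular difference: a wrap from the maximum value to 0 yields 1.
    delta = (curr - prev) & mask
    return delta in (0, 1)


normal_increment = counter_step_ok(5, 6)
upset_high_bit = counter_step_ok(5, 5 ^ (1 << 10))   # bit 10 flipped
wrap_on_overflow = counter_step_ok(0xFFFFFFFF, 0)
```

A check of this kind cannot distinguish an upset that happens to look like a legal increment (for example a flip of bit 0 from 0 to 1), so it complements rather than replaces replication of we\_i.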
\subsection{crossbar}
\begin{enumerate}
	\item \underline{Fail mode:} Transient errors on \textit{vector\_i} may cause routing faults. Input events may end up assigned to the incorrect output for a single cycle. \textbf{Priority low}.\\
\underline{Corrective action:}
	Since transients must occur at the clock's rising edge for the counters to register incorrect events, and the consequence is adding or dropping one event occurrence, priority is low. If needed, on resource-constrained systems a hash of vector\_i could be recorded and compared with the source register after each cycle. When an error is detected, the unit could signal an interrupt.\\
\\
\item \underline{Fail mode:} \textit{vector\_o} is internally registered. Values are updated at each cycle. Upsets may cause an incorrect output for a single cycle, but the failure will be cleared afterward. \textbf{Priority low}.\\
\underline{Corrective action:} Since vector\_o is updated at each cycle, no further action shall be taken to correct the upsets. Upsets may cause a single event to have an unreliable value for a single cycle. Software safety margins shall account for such small tolerances.\\
\\
\end{enumerate}
\subsection{MCCU}
\begin{enumerate}
	\item \underline{Fail mode:} Transient errors on \textit{enable\_i} may cause unintended updates of \textit{quota\_int} or missing updates. Transient errors may also propagate to \textit{interruption\_quota\_o}. \textbf{Priority medium}.\\
	\underline{Corrective action:} \textit{enable\_i} could be replicated or registered and compared with the source in the following cycle.\\
\\
	\item \underline{Fail mode:} Transient errors on \textit{events\_i} may cause \textit{events\_weights\_int} to have an incorrect value. This propagates to \textit{ccc\_suma\_int} and \textit{interruption\_quota\_o}. As long as failures are uncommon enough, faults can be absorbed by the margins implemented on the MCCU quota limits. \textbf{Priority Low}.\\
\underline{Corrective action:} Upsets may cause a single event to have an unreliable value for a single cycle. Software safety margins shall account for such small tolerances.\\
\\
	\item \underline{Fail mode:} \textit{quota\_i} can suffer from transient errors. If such an error happens while \textit{enable\_i} is low and \textit{update\_quota\_i} is high, \textit{quota\_int} will be misconfigured. Users can detect faults by reading \textit{quota\_o}. A transient error may cause incorrect interrupts. \textbf{Priority medium}.\\
	\underline{Corrective action:} Configuration registers shall be read back by software after a write to ensure correct configuration. \\
\\
\item \underline{Fail mode:} \textit{quota\_int} is an internal register. It is forwarded to top modules with \textit{quota\_o}. \textit{quota\_int} can suffer permanent faults that are not cleared automatically. This failure mode may result in interrupts not triggering or triggering early. \textbf{Priority High}.\\
	\underline{Corrective action:} Replicating the internal register \textit{quota\_int} has a low overhead. Two instances will provide error detection, but three instances are recommended, allowing for seamless recovery if a failure occurs.\\
\\
	\item \underline{Fail mode:} \textit{update\_quota\_i} transient errors can result in incorrect configurations due to unexpected or dropped updates. Misconfiguration affects \textit{quota\_int} and thus the resulting interrupts. \textbf{Low priority.}\\
\underline{Corrective action:} Software can read quota\_o values after each write to ensure no transients have occurred.\\
\\
	\item \underline{Fail mode:} \textit{quota\_o} is a wire that takes the value of \textit{quota\_int}. Given that \textit{quota\_int} is protected against permanent upsets, a transient error on the output line may cause incorrect readings for a single cycle. \textit{quota\_o} is not used as a control signal and does not affect interrupt generation. \textbf{Priority low}.\\
	\underline{Corrective action:} \textit{quota\_o} is signaled to the user-accessible registers to provide more information. Several readings could be performed in quick succession to determine if there was an update. Note that the values will be updated at each cycle if the unit is active. If transients over this signal are a real concern for a particular implementation, hardware error detection is recommended. \\
\\
	\item \underline{Fail mode:} \textit{events\_weights\_i} determines the contention impact of each MCCU input event. The source of this signal is the register bank in \textit{pmu\_ahb.sv}, and the source registers shall protect it against persistent errors. Transient errors could disturb the value of the weight for one signal. Such an event would cause quota mismatches of \textpm 128 cycles over the intended value for a single event upset. \textbf{Priority Low}.\\
	\underline{Corrective action:} Since the source register would have error detection and correction, transient errors would have a small effect. Software shall account for sporadic errors within the safety margins of the application. \\
\\
\item \underline{Fail mode:} \textit{ccc\_suma\_int} contains the addition of all active events at a particular cycle. The value is updated at every cycle based on the incoming events and weights. The value is used to determine the interrupt value and remaining quota. Errors could significantly change the available quota if the bit-flip occurs on the MSB or trigger unintended interrupts. The potential severity of the error depends on the number of input events and the register's width. \textbf{Priority High}.\\
\underline{Corrective action:} It is recommended to add error detection or correction by replication of the register.\\
\\
\end{enumerate}
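The replication recommended for \textit{quota\_int} and \textit{ccc\_suma\_int} (three copies with voting) relies on a bitwise 2-of-3 majority function. The following Python model is a sketch of that idea only. It is not derived from the voter modules added in this merge request, and the function name vote and its mismatch flag are hypothetical.

```python
def vote(a, b, c):
    """Bitwise 2-of-3 majority over three copies of a register.

    Returns (voted_value, mismatch), where mismatch is True when the
    three copies are not identical, i.e. an upset has been observed.
    """
    voted = (a & b) | (a & c) | (b & c)
    mismatch = not (a == b == c)
    return voted, mismatch


quota = 0xABCD
clean = vote(quota, quota, quota)
# One copy suffers a single bit flip: the vote still returns the original.
one_upset = vote(quota, quota ^ 0x10, quota)
# Two copies are hit in different bit positions: each bit still has a
# 2-of-3 majority, so the original value is recovered.
two_upsets = vote(quota ^ 0x1, quota ^ 0x2, quota)
```

With only two copies the mismatch flag is still available, which matches the detection-only option mentioned above, but the correct value can no longer be recovered.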
\subsection{PMU\_overflow}
\begin{enumerate}
	\item \underline{Fail mode:} \textit{softrst\_i} is signaled from a register outside the module. A permanent fault on this signal can render the unit disabled or raise unexpected interrupts. Permanent failures shall be prevented at the source register. \textbf{Priority High}.\\
\underline{Corrective action:} Source register shall provide error detection or correction based on its particular implementation.\\
\\
\item \underline{Fail mode:} \textit{softrst\_i} transient errors can clear the interruption vector if the error aligns with the clock's positive edge. \textbf{Priority Low}\\
	\underline{Corrective action:} Reset signals could add a redundancy bit to determine the intended value regardless of an upset. Error recovery is recommended.\\
\\
	\item \underline{Fail mode:} \textit{en\_i} is signaled from a register outside the module. A permanent fault on this signal can render the unit disabled or raise unexpected interrupts. Permanent failures shall be prevented at the source register. \textbf{Priority High.}\\
	\underline{Corrective action:} Enable signals could add a redundancy bit to determine the intended value regardless of an upset. Error recovery is recommended.\\
\\
	\item \underline{Fail mode:} \textit{en\_i} transient errors can cause glitches on interrupts, since \textit{en\_i} determines the value of \textit{unit\_disabled} and, consequently, the values of \textit{intr\_overflow\_o} and \textit{over\_intr\_vect\_o}. \textbf{Priority Low}.\\
	\underline{Corrective action:} Add a redundancy bit to determine the intended value regardless of an upset. Recovery is recommended over detection to avoid conflicts of priority between interrupts.\\
\\
	\item \underline{Fail mode:} \textit{counter\_regs\_i} can suffer from transient errors and, as a consequence, trigger or hide overflow signals. Most transient errors would result in harmless scenarios that correct themselves in the following cycles. Nevertheless, there is the potential to miss an overflow if the transient occurs while the counter reaches the maximum value; such a scenario may have severe effects. \textbf{Priority medium}.\\
\underline{Corrective action:}\\
\\
	\item \underline{Fail mode:} \textit{over\_intr\_mask\_i} is signaled from a register outside the module. A permanent fault on this signal can render the unit disabled or raise unexpected interrupts. \textbf{Priority high}.\\
\underline{Corrective action:} Permanent failures shall be prevented at the source register.\\
\\
\item \underline{Fail mode:} \textit{over\_intr\_mask\_i} transient errors can cause glitches on interrupts. Transient errors can change the values of \textit{past\_intr\_vect} (inducing a permanent error) by enabling overflow detection on signals that are not intended to be monitored. The transient shall occur at the clock's rising edge and trigger quota monitoring on a counter about to overflow. \textbf{Priority medium}.\\
	\underline{Corrective action:} The failure mode is considered unlikely, and results in a fail-safe scenario that could be handled by software. If needed, hardware error detection can be added to \textit{over\_intr\_mask\_i} by checking a hash of the signal against the source register.\\
\\
	\item \underline{Fail mode:} Overflow transient errors can cause glitches on interrupts. Transient errors can change the values of \textit{past\_intr\_vect} (inducing a permanent error) by recording unexpected interrupts on monitored counters. An error must align with a positive edge of the clock. \textbf{Priority medium}.\\
	\underline{Corrective action:} The overflow signal could be replicated and voted. Since it is one bit wide for each counter, the hardware cost should be acceptable.\\
\\
	\item \underline{Fail mode:} \textit{unit\_disabled} transient errors can cause glitches on interrupts, since it determines the values of \textit{intr\_overflow\_o} and \textit{over\_intr\_vect\_o} through combinational logic. \textbf{Priority low}.\\
\underline{Corrective action:} Interrupts remain high until they are cleared by software. Thus, a transient error may delay the actions of the processor by a cycle but will not cause significant consequences. No action is required.\\
\\
\item \underline{Fail mode:} \textit{intr\_overflow\_o} and \textit{over\_intr\_vect\_o} are susceptible to transients and can generate glitches on interrupt lines. \textbf{Priority low}.\\
\underline{Corrective action:} Interrupts remain high until they are cleared by software. Thus, a transient error may delay the actions of the processor by a cycle but will not cause significant consequences. No action is required.\\
\\
\end{enumerate}
\subsection{RDC}
\begin{enumerate}
	\item \underline{Fail mode:} \textit{enable\_i} is signaled from a register outside the module. A permanent fault on this signal can render the unit disabled or raise unexpected interrupts. Permanent failures shall be prevented at the source register. \textbf{Priority High}.\\
\underline{Corrective action:} Fault-tolerance shall be granted by the source of the signal.\\
\\
	\item \underline{Fail mode:} \textit{enable\_i} transient errors can cause permanent errors on \textit{interruption\_vector\_rdc\_o}, \textit{past\_interruption\_rdc\_o}, \textit{max\_value} and \textit{watermark\_int} if transients align with the clock. \textbf{Priority high}.\\
	\underline{Corrective action:} Enable signals could add a redundancy bit to determine the intended value regardless of an upset. Error recovery is recommended.\\
\\
\item \underline{Fail mode:} \textit{events\_i} transient errors can cause discrepancies on \textit{max\_value}. After a transient, depending on its nature, \textit{max\_value} can contain the value of two consecutive events or a fraction of the actual event. As a consequence, \textit{watermark\_int} and \textit{interruption\_vector\_int} may be disturbed. Transients must align with the positive edge of the clock. \textbf{Priority low}.\\
\underline{Corrective action:} Disturbances on the events would need to align with an exceedance of the event's max value to cause a problem. Most of the systems shall be able to accommodate such failure safely. Otherwise, events\_i shall be hardened at the source.\\
\\
	\item \underline{Fail mode:} \textit{events\_weights\_i}, given protection against permanent faults on the source register, can suffer transient errors that may produce glitches on \textit{interruption\_vector\_int}. \textbf{Priority low}.\\
\underline{Corrective action:} A glitch on events\_weights\_i can delay interrupts for one cycle or trigger an unexpected RDC interrupt. Overall the failure will be harmless, but it will require processing an additional interrupt. No corrective measures are required. \\
\\
	\item \underline{Fail mode:} \textit{interruption\_rdc\_o} can suffer transients that will generate a glitch on the interrupt line. \textbf{Priority low}.\\
\underline{Corrective action:}
Overall the failure will be harmless, but it will require processing an unexpected interrupt. No corrective measures are needed. \\
\\
	\item \underline{Fail mode:} \textit{interruption\_vector\_rdc\_o} can suffer from permanent faults. The current value of the signal depends on the previous one, so faults have the potential to remain present over time. \textbf{Priority high}.\\
\underline{Corrective action:} Given the size of interruption\_vector\_rdc\_o it is recommended to add triple redundancy to enable error recovery.\\
\\
	\item \underline{Fail mode:} \textit{watermark\_o} is a wire coming from the register \textit{watermark\_int}. Bit flips on this register can produce permanent faults if the resulting value is higher than \textit{max\_value}. This may lead to incorrect profiling. \textbf{Priority High}.\\
\underline{Corrective action:} Error detection by duplicating the watermark registers is recommended.\\
\\
\item \underline{Fail mode:} \textit{past\_interruption\_rdc\_o} is a registered signal, and holds the previous state of the RDC interrupt. Its content can be affected by transients on \textit{rstn\_i} and \textit{enable\_i}. The error could clear RDC interrupts without user notice. \textbf{Priority medium}.\\
\underline{Corrective action:} Given the small size, adding error correction with redundancy and a voting mechanism is recommended.\\
\\
	\item \underline{Fail mode:} \textit{max\_value} is an internal register. Its value depends on its own previous value, so it has the potential to hold permanent upsets. Such a fail mode could cause \textit{interruption\_rdc\_o} to become active and induce errors on the watermark register. \textbf{Priority high}.\\
\underline{Corrective action:} max\_value can only transition from its current value to the value plus one or to zero. So most upsets can be detected if a different transition is detected. Once the error is detected an interrupt can be signaled.\\
\\
\item \underline{Fail mode:} \textit{interruption\_vector\_int} is an internal wire. Upsets on the signal can induce permanent upsets on \textit{interruption\_vector\_rdc\_o} if they occur at the rising edge of the clock. \textbf{ Priority low}.\\
\underline{Corrective action:}
	Given a failure in this signal, the most likely scenario is to trigger an unexpected interrupt or to delay a legitimate interrupt by one cycle. Both results can be accommodated by most systems, and no further action is required.\\
\\
\end{enumerate}
\newpage
\section{Dependencies}
\label{chapter3}
\subsection{pmu\_ahb}
Correct behavior depends on external AHB signals, event generators and \textit{PMU\_raw} outputs.
\subsection{PMU\_raw}
It depends on \textit{pmu\_ahb}, and assumes that registers driving the inputs are fault-tolerant to upsets. Outputs of the module are affected by the correct behavior of \textit{PMU\_counters}, \textit{PMU\_overflow}, and \textit{MCCU}.
\subsection{PMU\_counters}
The module depends on the correctness of \textit{pmu\_ahb} configuration register \textit{slv\_reg} and glitchless propagation of all module inputs through \textit{PMU\_raw}.
\subsection{Crossbar}
The module depends on the correctness of \textit{pmu\_ahb} configuration register \textit{slv\_reg} and glitchless propagation of all module inputs through \textit{PMU\_raw}.
\subsection{MCCU}
This module depends on the correctness of \textit{pmu\_ahb} configuration register \textit{slv\_reg} and glitchless propagation of all module inputs through \textit{PMU\_raw}.
\subsection{PMU\_overflow}
This module depends on the correctness of \textit{pmu\_ahb} configuration register \textit{slv\_reg}. Glitchless propagation of all module inputs through \textit{PMU\_raw} and correct behavior of \textit{PMU\_counters} are assumed.
\subsection{RDC}
This module depends on the correctness of \textit{pmu\_ahb} configuration register \textit{slv\_reg} and glitchless propagation of all module inputs through \textit{PMU\_raw}.
\newpage
\section{Fault Tolerance IPs}
\label{chapter5}
\subsection{Parity encoder / decoder}
\subsection{Hamming encoder / decoder}
\subsection{Reed Solomon encoder / decoder}
\subsection{Triple simultaneous voter}
\subsection{Triple time delayed voter}
CC=pdflatex
all: spec
spec: main.tex 1-Section.tex 2-Section.tex 3-Section.tex 4-Section.tex 5-Section.tex 6-Section.tex 7-Section.tex 8-Section.tex
	$(CC) main.tex
clean:
	rm -f *.aux *.log *.blg *.bbl *.out
clear:
	rm -f *.aux *.log *.blg *.bbl *.out *.pdf
% License:
% CC BY-NC-SA 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%----------------------------------------------------------------------------------------
% PACKAGES AND OTHER DOCUMENT CONFIGURATIONS
%----------------------------------------------------------------------------------------
\documentclass[paper=a4, fontsize=11pt]{scrartcl} % A4 paper and 11pt font size
\usepackage[T1]{fontenc} % Use 8-bit encoding that has 256 glyphs
\usepackage{fourier} % Use the Adobe Utopia font for the document - comment this line to return to the LaTeX default
\usepackage[english]{babel} % English language/hyphenation
\usepackage{amsmath,amsfonts,amsthm} % Math packages
\usepackage{lipsum} % Used for inserting dummy 'Lorem ipsum' text into the template
\usepackage{caption}
\usepackage{subcaption}
\usepackage{graphicx}
\usepackage{float}
\usepackage{blindtext} %for enumerations
\usepackage[]{hyperref} %link color
%table layout to the right
%\usepackage[labelfont=bf]{caption}
%\captionsetup[table]{labelsep=space,justification=raggedright,singlelinecheck=off}
%\captionsetup[figure]{labelsep=quad}
\usepackage{sectsty} % Allows customizing section commands
\allsectionsfont{\centering \normalfont\scshape} % Make all sections centered, the default font and small caps
\usepackage{fancyhdr} % Custom headers and footers
\usepackage{register} % Register diagrams (register environment, \regfield)
\pagestyle{fancyplain} % Makes all pages in the document conform to the custom headers and footers
\fancyhead{} % No page header - if you want one, create it in the same way as the footers below
\fancyfoot[L]{} % Empty left footer
\fancyfoot[C]{} % Empty center footer
\fancyfoot[R]{\thepage} % Page numbering for right footer
\renewcommand{\headrulewidth}{0pt} % Remove header underlines
\renewcommand{\footrulewidth}{0pt} % Remove footer underlines
\setlength{\headheight}{13.6pt} % Customize the height of the header
\numberwithin{equation}{section} % Number equations within sections (i.e. 1.1, 1.2, 2.1, 2.2 instead of 1, 2, 3, 4)
\numberwithin{figure}{section} % Number figures within sections (i.e. 1.1, 1.2, 2.1, 2.2 instead of 1, 2, 3, 4)
\numberwithin{table}{section} % Number tables within sections (i.e. 1.1, 1.2, 2.1, 2.2 instead of 1, 2, 3, 4)
%\setlength\parindent{0pt} % Removes all indentation from paragraphs - comment this line for an assignment with lots of text
\usepackage{enumitem}
\setenumerate[1]{label=\thesubsection.\arabic*.}
\setenumerate[2]{label*=\arabic*.}
\setlength\parskip{4pt}
%----------------------------------------------------------------------------------------
% TITLE SECTION
%----------------------------------------------------------------------------------------
\newcommand{\horrule}[1]{\rule{\linewidth}{#1}} % Create horizontal rule command with 1 argument of height
\title{
\normalfont \normalsize
\horrule{0.5pt} \\[0.4cm] % Thin top horizontal rule
\huge SafePMU Fault Tolerance Measures\\ % The assignment title
\horrule{2pt} \\[0.5cm] % Thick bottom horizontal rule
}
\author{ Guillem Cabo Pitarch} % Your name
\date{\today} % Today's date or a custom date
\usepackage{listings}
\lstdefinelanguage{VHDL}{
morekeywords={
library,use,all,entity,is,port,in,out,end,architecture,of,
begin,and
},
morecomment=[l]--
}
\usepackage{textcomp}
\usepackage{xcolor}
\colorlet{keyword}{blue!100!black!80}
\colorlet{comment}{green!90!black!90}
\lstdefinestyle{vhdl}{
language = VHDL,
basicstyle = \ttfamily,
keywordstyle = \color{keyword}\bfseries,
commentstyle = \color{comment},
framexleftmargin = 15pt
}
\usepackage{caption}
\DeclareCaptionFont{white}{\color{white}}
\DeclareCaptionFormat{listing}{%
\parbox{\textwidth}{\colorbox{gray}{\parbox{\textwidth}{#1#2#3}}\vskip-4pt}}
\captionsetup[lstlisting]{format=listing,labelfont=white,textfont=white}
\lstset{frame=lrb,xleftmargin=\fboxsep,xrightmargin=-\fboxsep}
\begin{document}
%\nocite{*}
\maketitle % Print the title
\newpage
\tableofcontents
%----------------------------------------------------------------------------------------
% Section 1
%----------------------------------------------------------------------------------------
\input{1-Section.tex}
\input{2-Section.tex}
\input{3-Section.tex}
\input{4-Section.tex}
\input{5-Section.tex}
%----------------------------------------------------------------------------------------
% Section 2
%----------------------------------------------------------------------------------------
\end{document}
--Crossbar
\regfield{Output 6 [1:0]}{2}{30}{{00}}
\regfield{Output 5}{5}{25}{{00}}
\regfield{Output 4}{5}{20}{{00}}
\regfield{Output 3}{5}{15}{{00}}
\regfield{Output 2}{5}{10}{{00}}
\regfield{Output 1}{5}{5}{{00}}
\regfield{Output 0}{5}{0}{{00}}
\regfield{Output 12[3:0]}{4}{28}{{00}}
\regfield{Output 11}{5}{23}{{00}}
\regfield{Output 10}{5}{18}{{00}}
\regfield{Output 9}{5}{13}{{00}}
\regfield{Output 8}{5}{8}{{00}}
\regfield{Output 7}{5}{3}{{00}}
\regfield{Output 6 [4:2]}{3}{0}{{00}}
\regfield{Output 19 [0:0]}{1}{31}{{00}}
\regfield{Output 18}{5}{26}{{00}}
\regfield{Output 17}{5}{21}{{00}}
\regfield{Output 16}{5}{16}{{00}}
\regfield{Output 15}{5}{11}{{00}}
\regfield{Output 14}{5}{6}{{00}}
\regfield{Output 13}{5}{1}{{00}}
\regfield{Output 12[4:4]}{1}{0}{{00}}
\regfield{Reserved}{3}{29}{{x}}
\regfield{Output 24}{5}{24}{{00}}
\regfield{Output 23}{5}{19}{{00}}
\regfield{Output 22}{5}{14}{{00}}
\regfield{Output 21}{5}{9}{{00}}
\regfield{Output 20}{5}{4}{{00}}
\regfield{Output 19[4:1]}{4}{0}{{00}}
--OVERFLOW
\begin{register}{H}{Overflow interrupt enable mask}{0x064}
\label{over_cfg0}
\regfield{Reserved}{8}{24}{{x}}
\regfield{Input 23}{1}{23}{{00}}
\regfield{Input 22}{1}{22}{{00}}
\regfield{Input 21}{1}{21}{{00}}
\regfield{Input 20}{1}{20}{{00}}
\regfield{Input 19}{1}{19}{{00}}
\regfield{Input 18}{1}{18}{{00}}
\regfield{Input 17}{1}{17}{{00}}
\regfield{Input 16}{1}{16}{{00}}
\regfield{Input 15}{1}{15}{{00}}
\regfield{Input 14}{1}{14}{{00}}
\regfield{Input 13}{1}{13}{{00}}
\regfield{Input 12}{1}{12}{{00}}
\regfield{Input 11}{1}{11}{{00}}
\regfield{Input 10}{1}{10}{{00}}
\regfield{Input 9}{1}{9}{{00}}
\regfield{Input 8}{1}{8}{{00}}
\regfield{Input 7}{1}{7}{{00}}
\regfield{Input 6}{1}{6}{{00}}
\regfield{Input 5}{1}{5}{{00}}
\regfield{Input 4}{1}{4}{{00}}
\regfield{Input 3}{1}{3}{{00}}
\regfield{Input 2}{1}{2}{{00}}
\regfield{Input 1}{1}{1}{{00}}
\regfield{Input 0}{1}{0}{{00}}
\reglabel{Reset value}\regnewline
\end{register}
\begin{register}{H}{Overflow interrupt vector}{0x068}
\label{over_cfg1}
\regfield{Reserved}{8}{24}{{x}}
\regfield{Input 23}{1}{23}{{00}}
\regfield{Input 22}{1}{22}{{00}}
\regfield{Input 21}{1}{21}{{00}}
\regfield{Input 20}{1}{20}{{00}}
\regfield{Input 19}{1}{19}{{00}}
\regfield{Input 18}{1}{18}{{00}}
\regfield{Input 17}{1}{17}{{00}}
\regfield{Input 16}{1}{16}{{00}}
\regfield{Input 15}{1}{15}{{00}}
\regfield{Input 14}{1}{14}{{00}}
\regfield{Input 13}{1}{13}{{00}}
\regfield{Input 12}{1}{12}{{00}}
\regfield{Input 11}{1}{11}{{00}}
\regfield{Input 10}{1}{10}{{00}}
\regfield{Input 9}{1}{9}{{00}}
\regfield{Input 8}{1}{8}{{00}}
\regfield{Input 7}{1}{7}{{00}}
\regfield{Input 6}{1}{6}{{00}}