# ELEVEN WAYS TO BOOST YOUR SYNCHRONIZER

Salomon Beer, *Member*, IEEE, Ran Ginosar, *Senior Member*, IEEE Electrical Engineering Dept., Technion—Israel Institute of Technology, Haifa, Israel

Abstract— Synchronizers play an essential role in multiple clock domain systems-on-chip. The most common synchronizer consists of a series of pipelined flip-flops. Several factors influence the performance of synchronizers: circuit design, process technology and operating conditions. Global factors apply to the entire integrated circuit, while others can be adjusted for each individual synchronizer in the design. Guidelines are provided to improve synchronizers: Avoiding scan and reset, selecting minimum size flip-flop cells, minimizing routing, reducing jitter in coherent CDC, opting for HP process flavor and minimum  $V_{TH}$ , overprovisioning to account for variations, maximizing supply voltage and manipulating clock duty cycle.

*Index Terms*—Metastability, MTBF, multistage synchronizers, synchronization, synchronizer, tau effective.

#### I. INTRODUCTION

SYNCHRONIZERS play a key role in modern multiple-clock system-on-chip (SoC) designs [1]. Such designs contain thousands of clock domain crossings (CDCs) where the system is prone to metastability errors. To mitigate those failures and ensure reliable signal transfer across CDCs, synchronizers are used to convert domain timings. The type of synchronizer to be used for each CDC is determined by the specific properties of the two clock domains involved. Different classifications of CDCs have been studied. In [1][2][3][4] the classification of CDCs is based on their frequency and phase relations, such as mesochronous, plesiochronous and heterochronous CDCs. The latter group may be further subdivided into ratiochronous and non-ratiochronous [5][6] CDCs. When there is no frequency or phase relationship, the clock domains are assumed mutually asynchronous. A different classification is based on clock sources [7]. Clocks are classified as non-coherent when they are sourced from different references and coherent when they share a common reference clock. The latter is the case when several phase locked loops (PLLs) are sourced from the same oscillator. For each category, specialized synchronizers have been developed to exploit the CDC relationship and ensure correct operation while improving performance and reliability. In [8]-[12] synchronizers for mesochronous, plesiochronous and ratiochronous CDCs are proposed. The *N*-flip-flop synchronizer is usually employed in the asynchronous case [13]. The N-flip-flop synchronizer comprises a concatenated series of flip-flops as shown in Figure 1. This concatenated flip-flop structure can be used as a standalone solution, and it is also a central part of many other synchronizers, such as FIFO synchronizers [14]; it is therefore a critical building block that has been studied intensively.



Figure 1. A typical N-flip-flop synchronizer

The VLSI designer who is to use concatenated flip-flops in her circuit usually faces the question of how many stages to use in the N-flip-flop synchronizer. The designer who wishes to use flip-flops from a standard cell library would like to know which parameters influence the MTBF (mean time between failures) of the system before signing off the design. This knowledge is increasingly valuable in nanoscale SoC designs because several factors have emerged that challenge the reliability of synchronizers. In particular, the required number of synchronizers in a design is growing rapidly, and the variability of semiconductor parameters, as well as the sensitivity to operating conditions, has increased with scaling. Prediction of MTBF in CDCs depends on a variety of parameters, categorized as circuit parameters, process technology parameters and operating condition parameters, as shown in Figure 2. Circuit considerations include the number of stages needed in the synchronizer and the appropriate flip-flops to use in each stage.



Figure 2. Classification of factors affecting metastability. \* items are discussed in Sect. II; numbered ones are detailed in Sect. III. *Italicized* items are global factors; **bolded** ones are design guidelines.

Placement and routing of the flip-flops in the pipeline are also classified as circuit considerations. Process relates to the choice of technology node as well as the process family and variability of each node. The selection of the threshold voltage of the transistors can be considered a process property, but since modern technologies allow mixing different threshold levels in the same design, we consider it a circuit/process property and present it in the intersection of both areas. Operating conditions are the frequencies of the CDCs, supply voltages, duty cycle and temperature. Jitter is placed between process and operating conditions because it is affected by both. This classification can be further divided into factors affecting metastability in a global or a local way. Global factors affect all transistors in the design in the same way, while the effects of local factors may vary for different CDCs within the same IC. Local parameters are bolded in Figure 2 while global ones are italicized. For global parameters we provide analytical insight into how they affect metastability. For local parameters, we provide guidelines for choosing the flip-flops forming the synchronizer and techniques that either improve reliability or prevent errors.

All the parameters described above are essential for determining the settling time-constants  $(\tau)$  and the aperture width  $T_W$  of the flip-flops within the synchronizer. Previously, [15] presented guidelines for how not to build a synchronizer. A decade later and after many publications on the topic we present ground rules of how to improve the performance of N-flip-flop synchronizers. The paper summarizes publications and accumulated industry expertise which we believe is useful for designers in order to optimize the benefit from their synchronizers. Sections III.A.1-III.A.3, III.B.6-III.B.8 contain new results. Section III.A.4, III.A.5, III.C.9-III.C.11 are based on previous publications and are analyzed here from a design perspective.

The paper is organized as follows. In section II we present a framework of the synchronization problem and introduce a baseline circuit. In section III we provide eleven rules to improve the performance of synchronizers and in section IV we conclude the work.

#### II. SYNCHRONIZATION FRAMEWORK

In this section we describe the synchronization framework of the N-flip-flop synchronizer. The equations and derivations of this section form a common ground for subsequent sections.

As stated above, most synchronizers include an N-flip-flop synchronizer comprising a pipeline of flip-flops. These concatenated flip-flops are designed to reduce the probability of synchronization failure.

Generally, to reduce the probability of failures, the number of flip-flops in the pipe is increased. Increasing the number of stages increases resolution time which decreases the chance of metastable state at the subsequent logic. When the number of stages increases, latency through the pipeline increases reducing performance. Thus, latency is traded off for failure probability. Usually the probability of failure of the N-flip-flop synchronizer is measured by the mean time between failures (*MTBF*):

$$MTBF = \frac{e^{S/\tau}}{T_W \cdot F_C \cdot F_D} \tag{1}$$

where $F_C$ and $F_D$ are the clock and data transition frequencies, S is a pre-determined time allowed for metastability resolution, $\tau$ is the resolution time constant, and $T_W$ is a parameter describing a vulnerable time window which is determined experimentally. $T_W$ and $\tau$ are intrinsic circuit parameters which depend on the flip-flops used in the synchronizer and on the technology. The resolution time (S) is determined by the number of flip-flops in the synchronizer: the larger the number of flip-flops, the larger the resolution time allowed. Ignoring propagation and setup times, the resolution time is given by [14]

$$S = (N-1)T_C \tag{2}$$

where N is the number of flip-flop stages and  $T_C$  is the clock period of the receiving clock domain.
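As a short worked step, assuming (1) and (2) hold and that $\tau$, $T_W$, $F_C$ and $F_D$ are unchanged, adding one stage multiplies the MTBF by a fixed factor:

$$\frac{MTBF(N+1)}{MTBF(N)} = \frac{e^{N T_C/\tau}}{e^{(N-1) T_C/\tau}} = e^{T_C/\tau}$$

For instance, with the illustrative values $T_C = 2$ ns and $\tau = 50$ ps (assumptions chosen for the example, not measured values), each added stage improves MTBF by $e^{40} \approx 2.4 \times 10^{17}$.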

For each stage in the N-flip-flop synchronizer we consider a generalized flip-flop circuit, similar to the one shown in Figure 3. The circuit comprises a master and a slave latch. Each one of these latches is characterized by a resolution time constant  $\tau_i$  ( $i \in \{M, S\}$ ). The scheme in Figure 3 is an abstract scheme that serves as a framework and most flip-flop circuits are derivations of a similar form. We consider some of those derivations in the following sections.



Figure 3. Generalized Master-slave circuit

Based on the resolution time constant for each latch in a flip-flop, the overall effective resolution time constant for the flip-flop is given by [41]

$$\tau_{eff} = \left(\frac{\alpha}{\tau_M} + \frac{(1-\alpha)}{\tau_S}\right)^{-1} \tag{3}$$

where  $\alpha$  represents the duty cycle of the clock. Using this formula, a model for the resolution time constant of each latch can be obtained and then combined in (3). From small signal analysis,  $\tau_i$  ( $i \in \{M, S\}$ ) can be approximated by [14]:

$$\tau_i \propto \frac{C_Q}{g_m} \qquad i \in \{M, S\} \tag{4}$$

where $C_Q$ includes the gate and diffusion capacitances of the metastable synchronizer nodes ($Q_i, \overline{Q}_i$, $i \in \{M, S\}$) and the coupling capacitance between the gate and the source and drain of the transistors connected to the metastable nodes. $g_m$ is the transconductance of the transistors in the latch.

Near metastability, the transistors operate in the linear region, and hence the transconductance can be approximated by:

$$g_{m} = g_{mn} + g_{mp} = \left(\mu_{n}C_{ox}\frac{W_{n}}{L}\frac{1}{1+\sqrt{a}} + \mu_{p}C_{ox}\frac{W_{p}}{L}\frac{\sqrt{a}}{1+\sqrt{a}}\right)(V_{DD} - |V_{ThP}| - V_{ThN})^{\alpha} \tag{5}$$

where $a = \frac{\mu_n W_n}{\mu_p W_p}$, $V_{ThN}$ and $V_{ThP}$ are the threshold voltages of the *N* and *P* transistors respectively, $\mu_n$ and $\mu_p$ are the electron and hole mobilities, and the exponent $\alpha$ in (5) is the velocity saturation index [16] (not to be confused with the duty cycle $\alpha$ in (3)).

#### III. BOOSTING SYNCHRONIZERS

This section describes eleven methods to improve the performance of synchronizers. The methods are divided into three categories, circuit, process and operating conditions, with each section containing the boosting methods for each category as described in Figure 2. Minimum threshold voltage (#7) lies in between circuit and process categories and is described in the process sub-section. Jitter (#9) lies between process and operating conditions category and is described in the operating conditions sub-section. Temperature, included in operating conditions, cannot be directly manipulated by the designer and hence its impact is included in the process variation sub-section (#8). The influence of factors marked by \* in Figure 2 such as the number of stages, process node and frequencies are addressed in section II above. Since those parameters are included in (1), their influence on *MTBF* is straightforward.

## A. Boosting the Synchronizer Circuit

#### 1. No Scan/BIST in synchronizer flip-flops

Scan is used in design-for-test (DFT) circuits. The objective is to make testing easier by providing a simple way to set and observe every flip-flop in the integrated circuit. In general, a scan-enable pin is added to each flip-flop. When that signal is asserted, all flip-flops in the design are connected in a long shift register. For this purpose, additional transistors are added to the standard flip-flop circuit. One example of such a circuit is shown in Figure 4 [17].



Figure 4. Scan D-flip-flop with NOR gate

The scan path element receives its input either from the *D* input or from the previous scan element via *SDI*. It is controlled by the scan-enable signal *MODE*, and the NOR gate is transparent when scan is disabled (MODE=0). The scan flip-flop presented is only one example of many derivatives and topologies existing in industry applications and academic publications. Other variants of scan flip-flops are presented in [18],[19].

Most of these configurations have a negative effect on metastability resolution and increase $\tau$. In the circuit of Figure 4, the capacitance of the metastable node of the slave latch $(Q_S)$ is increased by the diffusion capacitances of the transmission gate (TG1), generating a higher $\tau$, according to (4). To demonstrate the effect quantitatively we simulated the circuits of Figure 3, Figure 4 and [18]. The transistors were sized to enable comparison of the different circuits, following sizing in commercial libraries. Table I shows the results of circuit simulations confirming the increase in $\tau$ for the scan flip-flops examined. The table includes simulations for $\tau$ of the master and slave latches and calculation of the effective $\tau$ of the flip-flop based on a 50% duty cycle, following [41]. All $\tau$ values are normalized to $\tau_M$ of the circuit in Figure 3, and are simulated in functional rather than scan mode. The different flip-flop circuits do not affect the regeneration nature of the master latch and hence $\tau_M$ is almost the same in all flip-flops. However, the slave latch is affected by the scan transmission gate and SDO inverter, significantly increasing both $\tau_S$ and the resulting effective $\tau_{eff}$ of the flip-flop. The increase in $\tau_S$ for the circuit of Figure 4 with respect to Figure 3 is about 24%, which induces a significant decrease in MTBF. For instance, for a CDC with $F_C = 500$ MHz, $F_D = 100$ MHz, using a two-flip-flop synchronizer with $\tau_{eff} = 55$ psec and $T_W = 30$ psec, MTBF reduces from almost 130 years (for Figure 3) to one month (for Figure 4), a reduction of three orders of magnitude.
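The MTBF figures quoted above can be checked with a minimal sketch of (1) and (2). It assumes that the 24% increase is applied directly to the 55 psec effective time constant, which is our reading of the example rather than something the text states; the function and constant names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
SECONDS_PER_DAY = 24 * 3600

def mtbf_seconds(tau, t_w, f_c, f_d, n_stages):
    """MTBF from (1), with S = (N - 1) * T_C from (2)."""
    t_c = 1.0 / f_c
    s = (n_stages - 1) * t_c
    return math.exp(s / tau) / (t_w * f_c * f_d)

F_C, F_D = 500e6, 100e6      # clock and data frequencies from the text, Hz
T_W = 30e-12                 # vulnerable window from the text, s
TAU_BASE = 55e-12            # effective tau of the baseline flip-flop, s
TAU_SCAN = TAU_BASE * 1.24   # assumption: 24% increase applied to tau

base = mtbf_seconds(TAU_BASE, T_W, F_C, F_D, n_stages=2)
scan = mtbf_seconds(TAU_SCAN, T_W, F_C, F_D, n_stages=2)

print(f"baseline flip-flop: {base / SECONDS_PER_YEAR:.0f} years")
print(f"scan flip-flop:     {scan / SECONDS_PER_DAY:.0f} days")
print(f"MTBF ratio:         {base / scan:.0f}")
```

Under these assumptions the baseline comes out at roughly 130 years and the scan variant at a little over a month, i.e. about three orders of magnitude apart.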

TABLE I NORMALIZED $\tau$ IN SCAN FLIP-FLOPS (SIMULATIONS, 65NM CMOS)

| CIRCUIT         | $\tau_M$ | $\tau_S$ | $\tau_{eff}$ |
|-----------------|----------|----------|--------------|
| Figure 3        | 1.00     | 1.07     | 1.04         |
| Figure 4 (scan) | 1.00     | 1.33     | 1.14         |
| [18] (scan)     | 1.00     | 1.22     | 1.10         |
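The effective values in the table can be cross-checked against (3). The following is a minimal sketch assuming a 50% duty cycle as stated in the text; the helper name `tau_eff` and the dictionary layout are our own.

```python
def tau_eff(tau_m, tau_s, duty=0.5):
    """Effective time constant from (3); duty is the clock duty cycle."""
    return 1.0 / (duty / tau_m + (1.0 - duty) / tau_s)

# (tau_M, tau_S, listed tau_eff) rows as given in Table I
TABLE_I = {
    "Figure 3": (1.00, 1.07, 1.04),
    "Figure 4 (scan)": (1.00, 1.33, 1.14),
    "[18] (scan)": (1.00, 1.22, 1.10),
}

for name, (tm, ts, listed) in TABLE_I.items():
    print(f"{name}: computed {tau_eff(tm, ts):.3f}, listed {listed:.2f}")
```

The computed values agree with the listed ones to within 0.01; the residual difference is presumably due to rounding of the tabulated latch values.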

In summary, the use of flip-flops with scan capabilities has many benefits for IC test and quality control. However those benefits usually increase the effective capacitance in metastable prone latches, increasing  $\tau$  and reducing synchronizer *MTBF*. Thus, we recommend avoiding the use of flip-flops with scan capabilities in synchronizers when possible.

#### 2. No Reset in synchronizer flip-flops

Another feature frequently available in flip-flops is asynchronous reset. Its main advantage is the ability to force the circuit into a known state in order to initialize hardware. Many different circuits implement reset. One implementation is presented in Figure 5. When the flip-flop is reset (RST = 0), the *N*-type transistor discharges node $\overline{Q_M}$, setting $\overline{Q_M} = 0$, $Q_M = 1$, and the node $Q_S$ is charged through the *P*-type transistor setting $\overline{Q_S} = 0$, and Q = 0. The two reset transistors add parasitic capacitances to nodes $\overline{Q_M}$ and $\overline{Q_S}$, and, following (4), increase $\tau$ for both the master and the slave latches of the flip-flop. Table II provides simulation results comparing $\tau$ for the circuits of Figure 3 and Figure 5. The results confirm the prediction that the reset transistors induce an increase in both $\tau_M$ and $\tau_S$. The overall increase is about 6% in the effective $\tau$, and should be accounted for by the designer. In other circuit implementations of asynchronous or synchronous reset, this increment may be more pronounced.



Figure 5. Flip-flop with asynchronous reset

TABLE II NORMALIZED $\tau$ IN ASYNCHRONOUS RESET FLIP-FLOP (SIMULATIONS, 65NM CMOS)

| CIRCUIT          | $\tau_M$ | $\tau_S$ | $\tau_{eff}$ |
|------------------|----------|----------|--------------|
| Figure 3         | 1.00     | 1.07     | 1.04         |
| Figure 5 (reset) | 1.06     | 1.17     | 1.11         |

In summary, the designer should weigh the use of asynchronous reset and evaluate its impact on synchronizer flip-flops. To achieve a minimum $\tau$, asynchronous reset should be avoided.

#### 3. Minimum flip-flop cell size

One of the challenges facing the design engineer is to determine which flip-flop cell size from the available library to use in the synchronizer.

According to (4), $\tau$ is affected by both the capacitance of the latch and its transconductance. Increasing gate size will increase both its capacitance and its $g_m$. To a first order approximation both changes cancel out and the value of $\tau$ remains unchanged. Second order effects, especially the external load connected to the latch, should be considered to determine appropriate sizing. If the VLSI designer can determine the size of each transistor inside the library flip-flop, then the loads on each latch should be chosen small in order to decrease $\tau$. For the circuit of Figure 3, that would imply reducing the size of transistors of transmission gate TG1 and inverter INVD for the master latch and TG1 and inverter INVQ for the slave latch. However, in general the designer cannot directly affect the sizing of the internal inverters in the flip-flop but has to choose from a pre-defined list of sizes that represent a general measure for the cell size. In most digital libraries, the differently sized flip-flop cells are optimized so that they handle different fan-out loads without drastically increasing the delay of the flip-flop. This is generally achieved by increasing the size of the output stage inverter (INVQ) for the different cell sizes. The internal portions of the flip-flops, however, are typically unchanged among these differently sized cells. Thus, the increased INVQ size dramatically loads the slave latch, increasing its $\tau$, when the flip-flop cell size is increased.

Figure 6 shows $\tau$ for different flip-flop sizes available in the simulated library. The library provides five sizing options for flip-flop cells: $\times 3$, $\times 5$, $\times 10$, $\times 15$ and $\times 20$, the only difference between the cells being the size of the INVQ transistors. $\tau_S$ increases almost linearly with the increase in cell size. Since $\tau_M$ is not affected by INVQ, the resulting $\tau$ based on (3) increases as well.

In summary, the use of the smallest available flip-flops in the library is encouraged in order to obtain minimum $\tau$ and maximum *MTBF*. This sometimes counter-intuitive guideline should be used for all library flip-flop cells in the synchronizer.

As a final remark, this guideline should be applied with care: even though the trend holds in all reviewed libraries, there may exist other libraries where flip-flop cell sizing behaves differently. Some due diligence is encouraged.



Figure 6. Normalized  $\tau_s$  and  $\tau_{eff}$  vs. cell sizes in library

#### 4. Minimum routing between flip-flops

To achieve desired MTBF values, high speed synchronizers are usually built as pipelines of N flip-flops (Figure 7). When propagation and setup times are included, the resolution time (S) in (1) is given by [14]:

$$S = (N - 1) \cdot (T_c - t_{CQ} - t_{pd} - t_{su}) \tag{6}$$

where $t_{CQ}$ is the clock-to-Q propagation delay of each flip-flop in the synchronizer, $t_{pd}$ is the routing delay to the next flip-flop in the pipeline, $t_{su}$ is the setup time and $T_c$ is the clock period ( $T_c = 1/f_c$ ). When $T_c$ is long compared to $t_{CQ}$, $t_{pd}$ and $t_{su}$, they can be neglected. However, when the receiving domain frequency is high, they should be taken into account. In order to increase *S* to the maximum possible value, the IC designer needs to reduce the routing delay $t_{pd}$ to a minimum. This can be achieved by imposing stringent constraints on these delays (Figure 7) and by placing the synchronizer flip-flops close together.
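To illustrate how routing delay enters the MTBF through (6) and (1), here is a minimal sketch. All numeric values are hypothetical and chosen only for the example; the function names are ours.

```python
import math

def resolution_time(n_stages, t_c, t_cq=0.0, t_pd=0.0, t_su=0.0):
    """Resolution time S from (6); with zero overheads it reduces to (2)."""
    return (n_stages - 1) * (t_c - t_cq - t_pd - t_su)

def mtbf_ratio(s_a, s_b, tau):
    """MTBF(s_a) / MTBF(s_b) from (1) when only S differs."""
    return math.exp((s_a - s_b) / tau)

# Hypothetical values in seconds (not taken from the simulations)
T_C, T_CQ, T_SU, TAU = 1e-9, 80e-12, 40e-12, 50e-12

s_tight = resolution_time(3, T_C, T_CQ, t_pd=20e-12, t_su=T_SU)   # short routes
s_loose = resolution_time(3, T_C, T_CQ, t_pd=120e-12, t_su=T_SU)  # long routes

print(f"S (tight routing) = {s_tight * 1e12:.0f} ps")
print(f"S (loose routing) = {s_loose * 1e12:.0f} ps")
print(f"MTBF gain from tighter routing: {mtbf_ratio(s_tight, s_loose, TAU):.1f}x")
```

Because S appears in the exponent of (1), removing 100 ps of routing delay per stage in this three-stage example changes MTBF by a factor of $e^{4}$, even though S itself changes by only about 13%.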



Figure 7. Placing and routing constraints for multistage synchronizer

#### 5. CDC coherency

When the two clocks are related in frequency or phase, the standard *N*-flip-flop pipeline synchronizer may result in much lower MTBF than predicted by (1).

When no frequency or phase relationship is assumed, the CDC is considered asynchronous and an *N*-flip-flop pipeline synchronizer is used. However, as shown in [42][43], the *MTBF* of a CDC can be worse than predicted by (1) when an asynchronous CDC is incorrectly assumed. Thus, understanding the nature of the CDC and selecting the appropriate synchronizer for it is critical.

A metastability event may occur when data and clock signals at the input of a flip-flop or latch toggle within a certain time window (W). If the system constraints do not allow data and clock to toggle within the W window, metastability is avoided. If the toggle occurs within the W window in every cycle, the probability of failure increases drastically. In [42][43] it is shown that the time differences between clock and data in coherent clock domains (where the two clocks are generated from a common source) can take only a discrete set of values. For example, when the frequencies of the two clock domains are $f_d = 125$ MHz and $f_c = 150$ MHz, the clock-data phase can take only five possible values, as shown in Figure 8. If the metastability window (blue) happens to fall in between phases, the probability of metastability is negligible. On the other hand, if the metastability window happens to overlap one of the phases (red), the probability of failure may be higher, and the MTBF may be lower, than predicted by (1). This is because (1) assumes a uniform distribution of the clock-data phase differences, while in coherent CDCs the phase distribution may be non-uniform, as shown in Figure 8. The exact form of the phase probability distribution in the case of coherent clock domains is further discussed in [43], along with mitigation techniques to improve MTBF in such cases.

In coherent clock domains, the position of the phase distribution relative to the metastability window defines the MTBF. Let $Q = f_d/\gcd(f_d, f_c)$ and let $\sigma$ be the clock jitter. When $T_c > 2Q\sigma$, the phase follows a non-uniform distribution with maxima and minima. This happens because the distance between the ideal phase positions ( $T_c/Q$ ) is larger than twice the standard deviation ( $\sigma$ ) of the jitter, so the maxima are well separated. Only when $T_c < 2Q\sigma$ can the phase be approximated by a continuous uniform distribution, and only then does (1) hold (Figure 9).
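The discrete phase positions and the uniformity criterion can be explored with a small sketch. It models ideal, jitter-free edges that start aligned at time zero, which is a simplifying assumption; the function names are ours and exact rational arithmetic is used to avoid rounding artifacts.

```python
from fractions import Fraction
from math import gcd

def phase_positions(f_d_hz, f_c_hz, n_edges=1000):
    """Ideal (jitter-free) data-edge phases within one receiving clock period."""
    t_c = Fraction(1, f_c_hz)
    t_d = Fraction(1, f_d_hz)
    return sorted({(k * t_d) % t_c for k in range(n_edges)})

def is_uniform_approx(f_d_hz, f_c_hz, sigma_s):
    """True when T_c < 2*Q*sigma, i.e. when (1) may be applied."""
    q = f_d_hz // gcd(f_d_hz, f_c_hz)
    return Fraction(1, f_c_hz) < 2 * q * Fraction(sigma_s)

F_D, F_C = 125_000_000, 150_000_000   # frequencies from the example, Hz
phases = phase_positions(F_D, F_C)
q = F_D // gcd(F_D, F_C)

print("Q =", q)
print("phases [ns]:", [round(float(p) * 1e9, 3) for p in phases])
print("uniform with 10 ps jitter?", is_uniform_approx(F_D, F_C, 10e-12))
```

For the 125 MHz and 150 MHz example this yields Q = 5 and five equally spaced phases, $T_c/Q$ apart, matching the five values mentioned in the text.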

In summary, formula (1) may not always apply and may not provide a lower bound on *MTBF*. In coherent CDC cases, special caution must be exercised when assessing MTBF. Specific measures for addressing this issue are described in [43].



Figure 8. Clock-data phase histogram for $f_d$ = 125 MHz and $f_c$ = 150 MHz



Figure 9. Clock-data phase probability density function diagram (a) for  $T_c > 2Q\sigma$ . (b)  $T_c < 2Q\sigma$ 

## B. Boosting the Process Technology

#### 6. Process flavor

The selection of the process technology to fabricate an IC has diverse criteria, power and performance being the most critical ones. Foundries provide a variety of process families tailored to different needs that are often denominated *process flavors*. The aim of this sub-section is to analyze the different flavors with respect to metastability performance.

Process flavors differ in terminology and type depending on the vendor. A popular classification divides the technology node into low-power (LP) and high-performance (HP) flavors. However, in modern technologies, more detailed classifications are available. In [21] two classifications are added, the low-power-high-k metal gates (HPL) and the high-performance-for-mobile (HPM) flavors. In [22] three flavors are available, LP, high-performance-low-power (HLP) and HPM, while in [23] the denominations are super-low-power (SLP), low-power-high-performance (LPH) and high-performance-plus (HPP).

Physical factors affecting the different flavors include nominal supply voltage, threshold voltage of the transistors, gate fabrication and stress memorization techniques (SMT). The exact 'ingredients' behind each process flavor depend on the foundry vendor and are a combination of the above mentioned factors. The exact proportion of each factor is usually a carefully guarded secret. An additional important factor that influences future designs is the use of non-planar transistor architectures. We now analyze how each factor affects metastability parameters.

The effect of supply voltage and threshold voltage on metastability is evaluated in separate sub-sections below since they can vary within the chip, due to multiple power domains on chip, or multi-threshold circuits within the same IC.

Gate fabrication refers to how the transistor gate and gate dielectric are generated. Oxynitride gate dielectrics [24] have been employed for many years, where the silicon oxide dielectric is infused with a small amount of nitrogen. The nitride content raises the dielectric constant and increases resistance against dopant diffusion through the gate dielectric [25]. In recent years, high-k dielectrics were introduced in conjunction with metal gates [26]. The exact material employed for the dielectric varies between foundries and is a topic of constant research.

The gate oxide in a MOSFET can be modeled as a parallel plate capacitor [28]:

$$C_{ox} = \frac{\kappa \,\varepsilon_0}{t} A \tag{7}$$

where A is the capacitor area, t is the thickness of the capacitor oxide insulator, $\varepsilon_0$ is the permittivity of free space and $\kappa$ is the relative dielectric constant of the utilized material. The value of $\kappa$ ranges from 3.9 in silicon dioxide to almost 80 for high-k materials [20]. The high permittivity ( $\kappa$ ) of high-k dielectrics allows the device engineer to achieve higher capacitance and current while keeping the dielectric thicker, significantly reducing gate leakage. From (4), (5) and (7) we conclude that using high-k dielectrics increases $C_{ox}$, increasing $g_m$ and reducing $\tau$.
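A minimal numeric sketch of (7) shows how a thicker high-k layer can still provide more capacitance than thin silicon dioxide. The two dielectric stacks below (thicknesses and the value $\kappa = 20$) are hypothetical and serve only as an illustration.

```python
EPS0 = 8.8541878128e-12  # permittivity of free space, F/m

def cox_per_area(kappa, thickness_m):
    """Parallel-plate capacitance per unit area from (7), in F/m^2."""
    return kappa * EPS0 / thickness_m

def equivalent_oxide_thickness(kappa, thickness_m, kappa_sio2=3.9):
    """SiO2 thickness that would give the same capacitance per area."""
    return thickness_m * kappa_sio2 / kappa

# Hypothetical stacks: 1.2 nm SiO2 versus 3.0 nm of a kappa = 20 dielectric
sio2 = cox_per_area(3.9, 1.2e-9)
highk = cox_per_area(20.0, 3.0e-9)
eot = equivalent_oxide_thickness(20.0, 3.0e-9)

print(f"SiO2   C/A = {sio2 * 100:.2f} uF/cm^2")
print(f"high-k C/A = {highk * 100:.2f} uF/cm^2")
print(f"high-k equivalent oxide thickness = {eot * 1e9:.3f} nm")
```

In this example the high-k layer is 2.5 times thicker yet provides about twice the capacitance per unit area, which is the mechanism behind the higher $C_{ox}$ and $g_m$ discussed above.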

Enhancement of channel mobility in high-k/metal gate transistors is achieved by channel strain engineering. In general, the application of tensile strain in the NMOS channel and compressive strain in the PMOS channel enhances device performance [29]. These stress memorization techniques affect mobility, which, by (4) and (5), affects $\tau$.

While high-k/metal gate technologies and strained silicon play a significant role in today's fabrication processes, evolution to non-planar transistor architectures is expected as scaling advances. Three-dimensional (3D) transistors, such as tri-gate [30][26] or FinFET [31], will become important for mitigating short channel effects and improving performance. The impact of 3D transistors on metastability should be evaluated by means of the effective $g_m$ and $C_Q$ in (4).

Table III shows simulations for a commercial 65nm process comparing LP and HP flavors. The comparison has been performed under the same supply voltage and standard threshold voltage conditions. The results are normalized to the HP value of $\tau$. The LP flavor shows a $\tau$ almost 3.5 times higher than HP. This enormous difference cannot be neglected, especially when migrating circuits among different technology flavors. If the designer can choose the process flavor, HP is preferred from a maximum MTBF point of view.

TABLE III $\tau$ VALUES FOR DIFFERENT PROCESS FLAVORS (SIMULATIONS, 65NM CMOS)

| FLAVOR | $\tau$ |
|--------|--------|
| LP     | 3.4    |
| HP     | 1.0    |

In summary, the choice of a process flavor has a large influence on metastability, as $\tau$ and *MTBF* are directly influenced by gate fabrication techniques and SMT. When the process flavor changes, $\tau$ should be reevaluated, and the number of stages in all synchronizers needs to be re-examined in order to maintain the desired *MTBF* as predicted by (1).

#### 7. Minimum threshold voltage ( $V_{TH}$ )

While traditionally a single level of transistor threshold voltage was available in the chip, as determined by the fabrication process, modern technologies offer a choice of several threshold voltages for different transistors in the same IC. Multi-threshold CMOS (MTCMOS) technology has emerged as an increasingly popular technique to reduce leakage power in high performance ICs [35][36]. The choice of which threshold voltage to use is a compromise between performance and power. Lowering the threshold voltage generates faster transistors (5) with higher leakage currents, while increasing the threshold reduces leakage but slows down the transistors. In modern technologies the choice of $V_{TH}$ is usually made among three to five predetermined values such as ultra-low, low, standard, high and ultra-high $V_{TH}$. As determined by (5), using low threshold transistors increases $g_m$ and reduces $\tau$. Simulations of such examples can be seen in Table IV. Three different threshold levels were simulated, generating different $\tau$ values. The lowest value is achieved for the lowest $V_{TH}$. In summary, for the flip-flops forming the synchronizer, transistors with minimum $V_{TH}$ are preferred in order to reduce $\tau$.

TABLE IV τ VALUES FOR DIFFERENT THRESHOLD VOLTAGES (SIMULATIONS, 65NM CMOS)

| $V_{TH}$ / FLAVOR | HP   | LP   |
|--------|------|------|
| LVT    | 1    | 2.4  |
| SVT    | 1.12 | 3.78 |
| HVT    | 1.53 | 10.9 |
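The trend can be illustrated qualitatively with the overdrive term of (5). The sketch below assumes that capacitance, mobility and sizing are identical across threshold options, so that $\tau \propto 1/g_m$ depends only on $(V_{DD} - |V_{ThP}| - V_{ThN})^{\alpha}$. The supply, threshold and $\alpha$ values are hypothetical, and the output is not expected to reproduce the simulated values in Table IV.

```python
def relative_tau(v_dd, v_thn, v_thp_abs, alpha, ref_overdrive):
    """tau relative to a reference device, with tau proportional to 1/g_m.

    Only the overdrive term of (5) is modeled; all other factors are
    assumed equal between the two devices and cancel out.
    """
    overdrive = v_dd - v_thp_abs - v_thn
    if overdrive <= 0 or ref_overdrive <= 0:
        raise ValueError("model requires V_DD > |V_ThP| + V_ThN")
    return (ref_overdrive / overdrive) ** alpha

# Hypothetical values (volts); symmetric N and P thresholds assumed
V_DD, ALPHA = 1.1, 1.3
THRESHOLDS = {"LVT": 0.30, "SVT": 0.35, "HVT": 0.42}

ref = V_DD - 2 * THRESHOLDS["LVT"]
taus = {name: relative_tau(V_DD, vt, vt, ALPHA, ref)
        for name, vt in THRESHOLDS.items()}

for name, value in taus.items():
    print(f"{name}: tau relative to LVT = {value:.2f}")
```

Even this crude model shows $\tau$ growing faster than linearly as the threshold approaches half the supply voltage, which is consistent with the direction of the simulated results.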

#### 8. Process variations

An important challenge facing the IC designer when considering synchronization is the number of flip-flop stages to use in order to achieve certain reliability (*MTBF*). The number of such stages ( $N_S$ ), following (1) and (6), is given by:

$$N_S = \left\lceil \frac{\tau \cdot \ln(MTBF \cdot f_c \cdot f_d \cdot T_W)}{T_c - t_{CQ} - t_{pd} - t_{su}} \right\rceil + 1 \tag{8}$$

However, the designer, who must also assure correct operation under process, supply voltage and junction temperature variations (*PVT*), needs to account for this variability when calculating the number of stages of the synchronizer. The need for over-provisioning in the number of stages comes at the cost of augmented latency and power. The challenge is to determine the minimum over-provisioning needed to provide acceptable *MTBF* with the minimum number of stages. Assuming that under worst-case (*wc*) *PVT* conditions the value of  $\tau$  becomes  $\tau^{wc}$  and the value of  $T_W$  becomes  $T_W^{wc}$ , the number of needed stages for wc ( $N_S^{wc}$ ) becomes:

$$N_{S}^{wc} = \frac{\tau^{wc}}{\tau^{nom}} (N_{S}^{nom} - 1) + 2 \tag{9}$$

The constant 2 in (9) is due to $\lceil x \rceil < x + 1$ for the ceiling operation in (8), so (9) is an upper bound on the required number of stages. The equation indicates that the number of stages needed in *wc* may be much higher than in nominal operation. In (9) we have neglected the influence of the change of $T_W$ in *wc PVT* since its effect is minor compared to the effect of $\tau$.
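The stage count of (8) and the worst-case estimate of (9) can be compared with a short sketch. The target MTBF, frequencies, delays and the factor of two between nominal and worst-case $\tau$ are all hypothetical; the brackets in (8) are interpreted as a ceiling, and $T_W$ is held constant as assumed in (9).

```python
import math

def stages_needed(mtbf_s, tau, t_w, f_c, f_d, t_cq=0.0, t_pd=0.0, t_su=0.0):
    """Number of stages N_S from (8)."""
    per_stage = 1.0 / f_c - t_cq - t_pd - t_su
    s_required = tau * math.log(mtbf_s * f_c * f_d * t_w)
    return math.ceil(s_required / per_stage) + 1

def stages_bound_wc(n_nom, tau_ratio):
    """Worst-case estimate (9); tau_ratio = tau_wc / tau_nom."""
    return tau_ratio * (n_nom - 1) + 2

# Hypothetical design point (SI units)
TARGET = 1000 * 365.25 * 24 * 3600        # 1000-year MTBF target, s
F_C, F_D, T_W = 1e9, 200e6, 30e-12
DELAYS = dict(t_cq=80e-12, t_pd=20e-12, t_su=40e-12)
TAU_NOM, TAU_WC = 50e-12, 100e-12

n_nom = stages_needed(TARGET, TAU_NOM, T_W, F_C, F_D, **DELAYS)
n_wc = stages_needed(TARGET, TAU_WC, T_W, F_C, F_D, **DELAYS)
bound = stages_bound_wc(n_nom, TAU_WC / TAU_NOM)

print(f"nominal stages:      {n_nom}")
print(f"worst-case stages:   {n_wc}")
print(f"estimate from (9):   {bound:.0f}")
```

In this example the direct worst-case evaluation of (8) stays below the estimate of (9), as expected of a conservative bound; when $\tau^{wc}$ is known, evaluating (8) directly avoids unnecessary over-provisioning.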

In modern technologies, process variation can be high, resulting in large $\tau$ variation [37][38][39]. Figure 10 shows $\tau$ simulations versus supply voltage for the fast-fast (FF), typical-typical (TT) and slow-slow (SS) process corners, which constitute $\pm 3\sigma$ deviations. The $\tau$ variability can reach several tens of percent. The ratio between $\tau_{FF}$ and $\tau_{TT}$ ranges from 0.4 to 0.7 for the supply voltages studied; at nominal $V_{DD}$, $\tau_{FF}$ is near half of $\tau_{TT}$ (Table V). On the other hand, the ratio between $\tau_{SS}$ and $\tau_{TT}$ is in the range of 1.5 to 2.2, as shown in Figure 11.

 $T_W$  simulations versus supply voltage for different process corners are shown in Figure 12. Note that simulation precision is limited. However, the figure demonstrates that  $T_W$ variations over process and supply voltage are bounded. Further,  $T_W$  influences *MTBF* only linearly (while  $\tau$  influence is exponential (1)). Hence, computing  $T_W$  at, e.g., nominal voltage and typical process corner and assuming twice that value as upper bound on  $T_W$  in (1) is acceptable.
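To see why a factor-of-two bound on $T_W$ is tolerable, the following worked step assumes the single-stage form $MTBF = e^{T/\tau}/(T_W f_c f_d)$, i.e. (10) with $N=1$, where $T$ is the available resolution time. The symbol $\Delta T$ is introduced here for illustration only: it is the extra resolution time that would restore the original *MTBF* after $T_W$ is doubled.

$$\frac{e^{(T+\Delta T)/\tau}}{2\,T_W f_c f_d} = \frac{e^{T/\tau}}{T_W f_c f_d} \;\Longrightarrow\; \Delta T = \tau \ln 2 \approx 0.69\,\tau$$

Under that assumption, doubling $T_W$ is offset by less than one additional $\tau$ of resolution time, whereas a comparable relative change in $\tau$ rescales the entire exponent.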

The influence of temperature on $\tau$ depends on the supply voltage and on the threshold voltage of the transistors. A complete study was presented in [40][33]. Both $\mu$ and $V_{TH}$ decrease with increasing temperature [44]; however, decreasing $\mu$ increases $\tau$ while decreasing $V_{TH}$ decreases $\tau$. When the impact of a change in $\mu$ on $\tau$ is larger than the impact of a change in $V_{TH}$, increasing temperature causes an increase in $\tau$. Conversely, when the impact of $V_{TH}$ dominates over that of $\mu$, increasing temperature causes a decrease in $\tau$. The dominant factor is determined by the ratio of the supply voltage to the threshold voltage. In modern technologies, where multiple supply voltages can be selected and $V_{DD}$ approaches the value of $2V_{TH}$, $\tau$ decreases when the temperature is increased. When the supply voltage is high and the threshold voltage is low, the trend is reversed and $\tau$ increases with temperature. Simulations of $\tau$ versus temperature for different supply voltages are shown in Figure 13, illustrating this effect.



Figure 10. Normalized  $\tau$  simulations for different process corners.



Figure 11.  $\tau/\tau_{TT}$  vs.  $V_{DD}/V_{DDnom}$  simulations for different process corners.



Figure 12. Normalized  $T_W$  simulations for different process corners



Figure 13. Normalized  $\tau$  simulations versus temperature for nominal and reduced supply voltage

The last *PVT* factor to consider, supply voltage, is examined in Section III.C.10 below. We note that when calculating the number of stages using (8), the designer must bear in mind that when the supply voltage is decreased, clock frequencies are usually decreased as well in order to meet critical path timing. As a consequence, the value of $T_c$ in (8) increases, reducing the number of stages needed in the worst case.

In summary, the number of stages in the synchronizer must account for worst case *PVT* variations using equation (9). A summary of  $\tau$  relations for different corners is shown in Table V.

TABLE V τ VALUES FOR DIFFERENT PROCESS CORNERS AT NOMINAL $V_{DD}$ (SIMULATIONS, 65NM CMOS)

| CORNER | τ    |
|--------|------|
| FF     | 0.57 |
| TT     | 1.0  |
| SS     | 1.81 |
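As an illustration of how (8), (9) and Table V fit together, the following minimal Python sketch computes the stage count for a nominal and a slow-corner $\tau$ and compares the result with the bound of (9). It assumes that the brackets in (8) denote rounding up. All numeric operating-point values (`TAU_NOM`, the target *MTBF*, frequencies, $T_W$ and the timing terms) are hypothetical and chosen only for illustration; only the ratio 1.81 is taken from Table V.

```python
import math


def stages(tau, mtbf, f_c, f_d, t_w, t_c, t_cq, t_pd, t_su):
    """Number of synchronizer stages per (8); brackets are read as rounding up."""
    resolve_time = t_c - t_cq - t_pd - t_su  # time left for resolution per cycle
    if resolve_time <= 0:
        raise ValueError("clock period leaves no time for metastability resolution")
    x = tau * math.log(mtbf * f_c * f_d * t_w) / resolve_time
    return math.ceil(x) + 1


def worst_case_bound(n_nominal, tau_ratio):
    """Right-hand side of (9); tau_ratio is tau_wc / tau_nom."""
    return tau_ratio * (n_nominal - 1) + 2


# Hypothetical operating point (illustrative values, not taken from the paper).
params = dict(mtbf=1e10, f_c=1e9, f_d=1e8, t_w=20e-12,
              t_c=1e-9, t_cq=50e-12, t_pd=100e-12, t_su=50e-12)
TAU_NOM = 30e-12   # assumed nominal tau in seconds
SS_RATIO = 1.81    # tau_SS / tau_TT at nominal VDD, from Table V

n_nom = stages(tau=TAU_NOM, **params)
n_wc = stages(tau=TAU_NOM * SS_RATIO, **params)
bound = worst_case_bound(n_nom, SS_RATIO)

print(f"nominal stages from (8):           {n_nom}")
print(f"slow-corner stages from (8):       {n_wc}")
print(f"worst-case bound from (9):         {bound:.2f}")
```

For these assumed values the direct evaluation of (8) at the slow corner stays below the bound of (9), which is the expected relation when (9) is read as a conservative estimate.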

# C. Boosting the synchronizer operating conditions

#### 9. Reduce jitter

Noise in ICs is manifested as jitter in signals. In coherent clock CDC, when jitter is present and normally distributed, $N(0, \sigma^2)$, Figure 8 turns into Figure 14 [43]. As seen above (sub-section A.5), the overlap between the metastability window and the phase peaks determines the probability of failure. The number of phase peaks $Q$ and the amount of jitter determine the spread of each peak and the nature of the overall phase distribution. When $T_c < 2Q\sigma$, the overall phase distribution can be approximated by a continuous uniform distribution and (1) holds. When $T_c > 2Q\sigma$, the distribution is non-uniform (Figure 14), presenting maxima and minima. In this case the amount of jitter is critical: when the metastability window falls in a valley of the distribution (blue window), reducing the jitter can make the probability of metastability almost negligible. This is exemplified in [43].

In summary, in coherent clock CDC, jitter should be minimized.
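The two regimes described above can be checked with a small Python sketch. The function names are hypothetical, and the jitter values used in the example are assumptions; $f_c$ = 150 MHz and $Q$ = 5 are taken from the caption of Figure 14. The text does not say how to treat the boundary case $T_c = 2Q\sigma$, so the sketch arbitrarily assigns it to the non-uniform regime.

```python
def sigma_threshold(f_c, q):
    """Jitter standard deviation at which T_c equals 2*Q*sigma."""
    return 1.0 / (f_c * 2 * q)


def phase_regime(f_c, q, sigma):
    """Classify the clock-data phase distribution by the T_c vs. 2*Q*sigma test."""
    t_c = 1.0 / f_c
    # Equality is not covered by the text; it is treated as non-uniform here.
    return "uniform" if t_c < 2 * q * sigma else "non-uniform"


F_C = 150e6  # receiving clock frequency, from the Figure 14 caption
Q = 5        # number of phase peaks, from the Figure 14 caption

print(f"threshold sigma: {sigma_threshold(F_C, Q):.3e} s")
for sigma in (1e-9, 1e-10):  # assumed jitter values, for illustration only
    print(f"sigma = {sigma:.1e} s -> {phase_regime(F_C, Q, sigma)}")
```

In the uniform regime (1) can be applied directly; in the non-uniform regime the position of the metastability window relative to the phase peaks matters, as analyzed in [43].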



Figure 14. Clock-data phase histogram for $f_d$ = 125 MHz and $f_c$ = 150 MHz, Q = 5, with Gaussian noise jitter

#### 10. Supply voltage

The selection of supply voltage is a significant system decision which directly determines the performance and power of a circuit. As shown in [32],[33], supply voltage has a substantial effect on $\tau$. Supply voltage may vary within an IC due to different on-chip power domains, and also dynamically, due to techniques such as dynamic voltage scaling and due to IR drops. Figure 15 presents simulations of normalized $\tau$ versus normalized supply voltage. When the voltage is reduced by 15% from nominal $V_{DD}$, $\tau$ increases to more than 3.5 times its nominal value. This can be explained by means of (5): when $V_{DD}$ is decreased, $g_m$ decreases, which, according to (4), increases $\tau$ [34]. Hence, when the supply voltage is decreased for power considerations, the repercussion on *MTBF* should be considered. In summary, the use of higher supply voltages is recommended when synchronization issues are critical.
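To give a feeling for what a 3.5-fold increase in $\tau$ means for reliability, the Python sketch below evaluates the single-stage relation $MTBF = e^{T/\tau}/(T_W f_c f_d)$ in $\log_{10}$ form. That form is assumed here (it is (10) with $N=1$), and every numeric value except the factor 3.5 is hypothetical. As a simplification, the resolution time $T$ is held fixed, although in practice the clock frequency is usually lowered together with the supply voltage.

```python
import math


def log10_mtbf(t_res, tau, t_w, f_c, f_d):
    """log10 of MTBF = exp(t_res / tau) / (t_w * f_c * f_d), in log10(seconds)."""
    return (t_res / tau) / math.log(10) - math.log10(t_w * f_c * f_d)


# Hypothetical operating point (illustrative values only).
T_RES = 0.8e-9    # time available for resolution, seconds
TAU_NOM = 10e-12  # assumed tau at nominal VDD, seconds
T_W = 20e-12      # metastability window, seconds
F_C = 1e9         # clock frequency, Hz
F_D = 1e8         # data rate, Hz

nominal = log10_mtbf(T_RES, TAU_NOM, T_W, F_C, F_D)
reduced = log10_mtbf(T_RES, 3.5 * TAU_NOM, T_W, F_C, F_D)  # tau x3.5 at VDD -15%
decades_lost = nominal - reduced

print(f"log10(MTBF) at nominal tau:  {nominal:.2f}")
print(f"log10(MTBF) at 3.5 x tau:    {reduced:.2f}")
print(f"orders of magnitude lost:    {decades_lost:.2f}")
```

Working in $\log_{10}$ avoids overflow for large exponents and makes the loss directly readable in orders of magnitude.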



Figure 15. Simulations of normalized  $\tau$  vs. normalized supply voltage.

#### 11. Duty cycle

Flip-flops used in synchronizers are made of two concatenated latches, as shown in Figure 3. Since (1) accounts for only one value of $\tau$, the question arises which value of $\tau$ should be used, the master's or the slave's. Recent advances in multistage metastability modeling have refined (1) into a generalized form for multistage flip-flops [41][45]:

$$MTBF(N) = \frac{1}{T_W(N) f_d f_c} \exp\left(\frac{NT}{\tau_N}\right) \tag{10}$$

where  $\tau_N$  is defined by:

$$\frac{1}{\tau_N} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\tau_{eff}(i)} \tag{11}$$

and $\tau_{eff}(i)$ is the effective $\tau$ of flip-flop *i*, defined in (3). From (10), (11) and (3), it is clear that the resolution time of both the master and the slave should be considered. Flip-flop designers should not improve the design of the master to the detriment of that of the slave, or vice versa.
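Equations (10) and (11) can be evaluated numerically as in the following Python sketch. The function names and all numeric values are hypothetical; $T$ is taken here to be the resolution time available per stage, and $T_W(N)$ is passed in as a plain number because its dependence on $N$ is not given in this section.

```python
import math


def tau_n(tau_eff):
    """tau_N per (11): 1/tau_N is the mean of 1/tau_eff(i) over the N flip-flops."""
    n = len(tau_eff)
    return n / sum(1.0 / t for t in tau_eff)


def log10_mtbf_multistage(tau_eff, t, t_w_n, f_d, f_c):
    """log10 of (10), computed in log form to avoid overflow of exp()."""
    n = len(tau_eff)
    exponent = n * t / tau_n(tau_eff)
    return exponent / math.log(10) - math.log10(t_w_n * f_d * f_c)


# Hypothetical three-stage synchronizer whose last flip-flop is slower.
taus = [10e-12, 10e-12, 20e-12]  # assumed tau_eff(i), seconds
print(f"tau_N = {tau_n(taus):.3e} s")
print(f"log10(MTBF) = "
      f"{log10_mtbf_multistage(taus, t=0.8e-9, t_w_n=20e-12, f_d=1e8, f_c=1e9):.1f}")
```

Since $N T / \tau_N = T \sum_i 1/\tau_{eff}(i)$, every flip-flop contributes its own term to the exponent, which is why no single stage should be neglected.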

An interesting aspect of (3) is its dependence on duty cycle. Since duty cycle can be changed after fabrication, it can be used retroactively to minimize $\tau_{eff}$, maximizing *MTBF*. Figure 16 shows $\tau_{eff}$ as a function of duty cycle for different ratios of $\tau_M$ and $\tau_S$, and Figure 17 shows *MTBF* as a function of duty cycle. The variations of *MTBF* due to duty cycle can span several orders of magnitude, motivating calibration of circuits after fabrication. All plots in Figure 16 are normalized to $\tau_M$, which remains unchanged in all cases. When $\tau_M = \tau_S$, the duty cycle does not influence the effective $\tau$, whose value is constant for every $\alpha$. When $\tau_S > \tau_M$, increasing the duty cycle reduces $\tau_{eff}$, while when $\tau_S < \tau_M$, increasing the duty cycle increases $\tau_{eff}$, so the duty cycle should be decreased instead. A difference between $\tau_M$ and $\tau_S$ is possible even when master and slave are designed with the same $\tau$, since a large mismatch may appear after fabrication due to process variations.

In summary, clock duty cycle may be adjusted to improve *MTBF*.
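The qualitative behaviour described above can be reproduced with a short Python sketch. Equation (3) is not reproduced in this part of the text, so the sketch assumes the weighting $1/\tau_{eff} = \alpha/\tau_M + (1-\alpha)/\tau_S$, with $\alpha$ the duty cycle. This form is an assumption chosen only because it matches the trends stated above (no dependence on $\alpha$ when $\tau_M = \tau_S$, and $\tau_{eff}$ decreasing with $\alpha$ when $\tau_S > \tau_M$); the definition in (3) takes precedence. The function names and the candidate duty cycles are hypothetical.

```python
def tau_eff(alpha, tau_m, tau_s):
    """ASSUMED form of the effective tau: 1/tau_eff = alpha/tau_M + (1-alpha)/tau_S."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("duty cycle must lie in [0, 1]")
    return 1.0 / (alpha / tau_m + (1.0 - alpha) / tau_s)


def best_duty_cycle(tau_m, tau_s, candidates):
    """Pick the candidate duty cycle that minimizes the assumed tau_eff."""
    return min(candidates, key=lambda a: tau_eff(a, tau_m, tau_s))


CANDIDATES = [0.3, 0.4, 0.5, 0.6, 0.7]  # hypothetical achievable duty cycles

# All values are normalized to tau_M = 1, as in Figure 16.
for ratio in (0.5, 1.0, 2.0):  # tau_S / tau_M
    row = ", ".join(f"{tau_eff(a, 1.0, ratio):.3f}" for a in CANDIDATES)
    print(f"tau_S/tau_M = {ratio}: tau_eff/tau_M = [{row}]")

print("best duty cycle when tau_S = 2 tau_M:  ", best_duty_cycle(1.0, 2.0, CANDIDATES))
print("best duty cycle when tau_S = 0.5 tau_M:", best_duty_cycle(1.0, 0.5, CANDIDATES))
```

Under this assumed form, a post-fabrication calibration would push the duty cycle toward the latch with the smaller $\tau$, within whatever range the clock generator allows.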



Figure 16. $\tau_{eff}/\tau_{M}$ vs. duty cycle for different $\tau_{S}/\tau_{M}$ ratios.



Figure 17. *MTBF* vs. duty cycle for different $\tau_{S}/\tau_{M}$ ratios.

#### IV. CONCLUSIONS

In this paper we presented guidelines and techniques to improve the performance of *N*-flip-flop synchronizers in order to achieve minimum $\tau$ and maximum *MTBF*. We described the tradeoff between capacitance and transconductance in the synchronizer nodes, which is useful for evaluating other aspects affecting metastability. The factors affecting metastability were divided into circuit, process and operating conditions, and boosting techniques were presented for each. We distinguish global factors that affect the complete IC from design guidelines that can be applied to individual synchronizers. From the global perspective, we advocate fabrication in high performance flavor technologies with the minimum threshold voltage allowed by the process. Increasing the supply voltage improves metastability resolution. The use of scan and reset flip-flops should be avoided when possible in order to improve reliability. The minimum flip-flop cell size should be selected to improve $\tau$. We have shown a formula to account for process variability and analyzed the minimal required amount of overprovisioning. Since the master and slave latches may have different $\tau$, overall *MTBF* is affected by the clock duty cycle. Clean power supplies and low jitter clock distribution networks provide better reliability in coherent clock domains, where several clocks are sourced from a common oscillator.

#### ACKNOWLEDGMENT

The work of Salomon Beer was supported in part by HPI institute for scalable computing.

#### REFERENCES

- [1] R. Ginosar, "Metastability and synchronizers: A tutorial," IEEE Design & Test of Computers, vol. 28, no. 5, pp. 23-35, 2011.
- [2] D.G. Messerschmitt, "Synchronization in Digital System Design," IEEE J. Selected Areas in Communications, 8(8):1404-1419, 1990.
- [3] P. Teehan, M. Greenstreet, G. Lemieux, "A Survey and Taxonomy of GALS Design Styles," IEEE Design & Test of Computers, 24(5):418-428, 2007.
- [4] W.J. Dally and J.W. Poulton. *Digital systems engineering*. Cambridge university press, 1998.
- [5] M.W. Heath, W.P. Burleson, and I.G. Harris, "Synchrotokens:A Deterministic GALS Methodology for Chip-Level Debug and Test," IEEE Trans. Computers, 54(12):1532-1546, 2005.
- [6] J. M. Chabloz and A. Hemani, "A Flexible Interface for Rationally Related Frequencies," ICCD, pp. 109-116, 2009.
- [7] S. Beer, R. Ginosar, R. Dobkin, Y. Weizman, "MTBF Estimation in Coherent Clock Domains," Proc. IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), pp.166,173, May 2013
- [8] U. Frank, T. Kapschitz and R. Ginosar, "A Predictive Synchronizer for Periodic Clock Domains," Formal Methods in System Design, 28(2):171-186, 2004.
- [9] R. Kol and R. Ginosar, "Adaptive Synchronization", ICCD, 1998.
- [10] W.K. Stewart, S.Ward, "A solution to a special case of Synchronization Problem," IEEE Trans. Comp., 37(1):123-125, 1988.
- [11] L.F.G. Sarmenta, G.A. Pratt, S.A. Ward, "Rational clocking," ICCD, 271-278, 1995.
- [12] W.J. Dally, S.G. Tell, "The Even/Odd Synchronizer: A Fast, All-Digital, Periodic Synchronizer," ASYNC, pp. 75-84, 2010.
- [13] L. Kleeman and A. Cantoni, "Metastable behavior in Digital Systems", IEEE Design & Test of Computers, 4(6), 4-19, 1987.
- [14] D. Kinniment, Synchronization and Arbitration in Digital Systems, Wiley 2007.
- [15] R. Ginosar, "Fourteen ways to fool your synchronizer," Proc. IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), pp. 89-96, 2003.
- [16] K. Kanda, K. Nose, H. Kawaguchi, and T. Sakurai, "Design Impact of Positive Temperature Dependence on Drain Current in Sub-1-V CMOS VLSIs", IEEE J. Solid-State Circuits, vol. 36, no. 10, October 2001.
- [17] S. Gerstendorfer and H. Wunderlich, "Minimized power consumption for scan-based BIST," Proceedings. International Test Conference, vol. 77, no. 88, 1999
- [18] S. Ganesan and S.P. Khatri "A Modified Scan-D Flip-flop Design to Reduce Test Power", 15th IEEE/TTTC International Test Synthesis Workshop (ITSW), 2008
- [19] N. Parimi and S. Xiaoling, "Design of a low-power D flip-flop for test-per-scan circuits," Electrical and Computer Engineering, Canadian Conference on, vol. 2, pp. 777-780, 2-5 May 2004.
- [20] G. D. Wilk, R. M. Wallace, and J. M. Anthony, "High-κ gate dielectrics: Current status and materials properties considerations", Journal of Applied Physics, vol. 89, no. 10, 2001.
- [21] TSMC 28nm document description http://www.tsmc.com/english/dedicatedFoundry/technology/28nm.htm
- [22] UMC 28nm document description http://www.umc.com/english/process/i.asp
- [23] Global foundries 28nm document description http://www.globalfoundries.com/technology/28nm.aspx
- [24] S. Chakraborty, T. Yoshida, T. Hashizume, H. Hasegawa, and T. Sakai, "Formation of ultrathin oxynitride layers on Si(100) by low-temperature electron cyclotron resonance N2O plasma oxynitridation process", Journal of Vacuum Science Technology. vol. 16, 1998.
- [25] Yugami, J.; Tsujikawa, S.; Tsuchiya, Ryuta; Saito, S.; Shimamoto, Yasuhiro; Torii, K.; Mine, T.; Onai, T., "Advanced oxynitride gate dielectrics for CMOS applications,", Extended Abstracts of International Workshop on Gate Insulator (IWGI), vol. 140, no 145, 2003
- [26] S. Datta, "Recent Advances in High Performance CMOS Transistors: From Planar to Non-Planar." Electrochemical Society Interface, 2013.
- [27] N. Agrawal; Y. Kimura, R. Arghavani; S. Datta, "Impact of Transistor Architecture (Bulk Planar, Trigate on Bulk, Ultrathin-Body Planar SOI) and Material (Silicon or III–V Semiconductor) on Variation for Logic and SRAM Applications," Electron Devices, IEEE Transactions on , vol.60, no.10, pp.3298,3304, Oct. 2013
- [28] Jan M. Rabaey. 1996. Digital Integrated Circuits: A Design Perspective. Prentice-Hall, Inc., Upper Saddle River, NJ, USA
- [29] K. Mistry, *et al.* "A 45nm Logic Technology with High-k+Metal Gate Transistors, Strained Silicon, 9 Cu Interconnect Layers, 193nm Dry Patterning, and 100% Pb-free Packaging," IEEE International Electron Devices Meeting, (IEDM), vol. 247, no.250, 2007.
- [30] B. S. Doyle, *et al.* "High performance fully-depleted tri-gate CMOS transistors," Electron Device Letters, IEEE, vol. 24, no.4, pp.263,265, 2003.
- [31] H.-S. P. Wong, D. J. Frank, and P. M. Solomon, "Device design considerations for double-gate, ground-plane, and single-gated ultra-thin SOI MOSFET's at the 25 nm channel length generation," in IEDM Tech. 1998, pp. 407–410, 1998.
- [32] J.Zhou, D.Kinniment, G.Russell, and A. Yakovlev, "Adapting synchronizers to the effects of on chip variability," Proc. IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2008.
- [33] S. Beer, R. Ginosar, J. Cox, T. Chaney, D. Zar, "Metastability challenges for 65nm and beyond; simulation and measurements," Design, Automation & Test in Europe Conference & Exhibition (DATE), pp.1297,1302, 2013
- [34] S. Beer, R. Ginosar, "Supply voltage and temperature variations in synchronization circuits," Technical Report, 2013
- [35] L. Wei *et al.*, "Design and optimization of dual-threshold circuits for low-voltage low-power applications," IEEE Trans. on VLSI Systems, pp. 16-24, 1999.
- [36] S. Sirichotiyakul *et al.*, "Stand-by Power Minimization through Simultaneous Threshold Voltage Selection and Circuit Sizing," Proceedings of DAC, vol. 43, no.41, 1999.
- [37] J.Zhou, D.Kinniment, G.Russell, and A. Yakovlev, "Adapting synchronizers to the effects of on chip variability," Proc. IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2008.
- [38] S. Nassif, K. Bernstein, D. Frank, A. Gattiker, W. Haensch, B. Ji, E. Nowak, D. Pearson and N.J Rohrer, "High Performance CMOS Variability in the 65nm Regime and Beyond," Electron Devices Meeting, IEEE International, vol. 10, no. 12, pp. 569,571, Dec. 2007.
- [39] International Technology Roadmap for Semiconductors (ITRS), 2006 update.
- [40] D. Kinniment, C. Dikes, K. Heron, G. Russell, and A. Yakovlev, "Measuring deep metastability and its effect on synchronizer performance," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 9, pp. 1028–1039, Sep. 2007.
- [41] S. Beer, J. Cox, T. Chaney, D. Zar, "MTBF Bounds for Multistage Synchronizers," Proc. IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), vol. 19, no. 22, pp.158-165, May 2013
- [42] A. Cantoni, J. Walker and T. Tomlin, "Characterization of a Flip-Flop Metastability Measurement System," IEEE Trans. Circuits and Systems 54(5):1032-1040, 2007.
- [43] S. Beer, R. Ginosar, R. Dobkin, Y. Weizman, "MTBF Estimation in Coherent Clock Domains," Proc. IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), pp.166,173, 2013.

- [44] D. Wolpert and P. Ampadu, "A Sensor to Detect Normal or Reverse Temperature Dependence in Nanoscale CMOS Circuits," IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, DFT '09, vol. 7, no. 9, pp. 193,201, Oct. 2009.
- [45] I.W. Jones, S. Yang and M. Greenstreet, "Synchronizer Behavior and Analysis," ASYNC 2009