# Design and analysis of 4-bit absolute-value comparator in 65nm technology using hybrid TG/CMOS

#### Yujie Wang

Department of Microelectronics, Northwestern Polytechnical University, 710072, Xi'an, China

#### wangyujie@nwpu.edu.cn

Abstract. Comparators are an important part of computer architecture, and rapid advances in semiconductor and electronics technology have placed higher demands on their performance. Optimizing a digital system for power consumption, delay, and other parameters involves several design levels. This paper designs a low-power, high-speed, and area-efficient 4-bit absolute-value comparator that uses a hybrid of static Complementary Metal-Oxide-Semiconductor (CMOS) and Transmission Gate (TG) structures. The design is optimized at the level of circuit blocks, logic gates, and transistors. The logic function is realized by transcoding followed by magnitude comparison, and all input signals are driven by two-stage inverters. The MUX and XOR gates are implemented with the TG structure, which simulation and analysis show to perform well. The transistor sizes and supply voltage of each logic gate are calculated and optimized in MATLAB by applying logical effort theory. The design uses a 65 nm technology, and transient simulation of the overall system in Cadence confirms its logic function with a delay of 0.83 ns and a power consumption of 49.6 µW, demonstrating the effectiveness of the design. This study, based on theoretical calculations and simulations, is a useful reference for the design and theoretical study of very-large-scale integrated (VLSI) circuits.

Keywords: CMOS technology, VLSI, surface area, comparator.

#### 1. Introduction

The comparator is an important arithmetic component in computer architecture and is widely used in digital modules. In recent years, the electronics industry has developed rapidly. Significant advances in integration and chip manufacturing technology have led to a growing need for fast, reliable, low-energy device designs. Reducing power consumption in digital systems requires optimization across all design levels, from high-level algorithms and gate-level logic structures down to transistor-level configurations and circuit implementation techniques. Various logic styles, such as Pass Transistor Logic (PTL), TG, and Gate Diffusion Input (GDI), have been developed to improve on traditional CMOS logic for specific logic gate structures. However, each of these approaches has its own trade-offs in power consumption, delay, and area. At the circuit level, combining these styles with static CMOS can significantly reduce power consumption, power-delay product, and occupied area. For instance, hybrid PTL/CMOS logic circuits have demonstrated power savings of over 60% compared with standard static CMOS circuits [1, 2].

This paper presents a four-bit absolute value comparator designed for the 65 nm technology node, achieving a critical-path delay of 0.83 ns and a power consumption of 49.6 µW at a VDD of 0.844 V. The rest of the paper is organized as follows. Section 2 reviews the background and basic functions of absolute value comparators and compares different implementations of basic logic gates in digital integrated circuits, including static CMOS, GDI, PTL, and TG. Section 3 explains the basic function of the four-bit absolute value comparator and gives the overall system structure, together with the principles of the transcoder and the three-bit magnitude comparator. The gate-level circuit is derived from the truth table. For different structures of complex logic gates, we measured parameters such as delay, power consumption, and swing, and after weighing the trade-offs chose a hybrid of static CMOS and TG to realize the overall circuit. In Section 4, the delay and power consumption of the critical path are calculated using logical effort theory, and the size of each stage and the supply voltage are optimized in MATLAB. In Section 5, the logic function, delay, and power consumption of the overall circuit are simulated using Cadence IC 6.1.8. Section 6 summarizes the findings, discusses the shortcomings of the design, and outlines follow-up work. This type of simulation-based study is beneficial for theory and lab courses related to VLSI circuit design.

#### 2. Literature Review of Four-bit Absolute Value Comparator Designs

Absolute value circuits have a wide range of applications in analog signal processing, such as comparing absolute values to control a clamping voltage for high-precision rectification, or accurate sampling in filters. These characteristics make them an essential component of new biomedical signal processing systems [3-6]. Absolute value circuits can also be applied in A/D conversion [7].

A magnitude comparator is a combinational logic circuit used to compare two binary numbers, with the output indicating whether one number is greater than, equal to, or less than the other. Because the digital magnitude comparator is a decision-making device, it forms an important part of many control applications, such as biometric authentication and password verification, analog-to-digital converters, address decoding in computers and microprocessor-based devices (selecting a particular input/output device for storing data), sensors (wherein the binary digits represent non-electrical quantities being measured, such as speed, temperature, or position), and comparison with a threshold value.

The most commonly used implementation of logic gates in digital integrated circuits is static CMOS, and Fig. 1 shows the NAND gate and NOR gate based on this structure, where A and B are the input signals and F is the output signal.



Figure 1. Static CMOS implementation of NAND and NOR gate (Photo/Picture credit: Original).

A two-bit magnitude comparator (MC) design using CMOS logic has been reported in [8, 9]. Static CMOS provides a full rail-to-rail output swing, but as the number of bits increases, the complex gates require more transistors, and their high input impedance leads to long delays.

PTL can realize the same logic function with fewer transistors and smaller interconnect effects. Many PTL circuit implementations have been proposed in [10, 11]. However, because of its threshold voltage loss, it is often necessary to add level-restoring circuits or to use special process transistors. The TG structure avoids the threshold loss through its complementary NMOS/PMOS pair, but it still depends on the pull-up and pull-down paths of the preceding stage for normal operation, which sometimes requires additional buffer units. Fig. 2 and Fig. 3 show an XOR gate realized with PTL and TG, respectively, where A and B are the inputs and F is the output.



Figure 2. PTL Implementation of XOR gate.



Figure 3. TG implementation of XOR gate (Photo/Picture credit: Original).

GDI is an inventive logic structure that is simple and can change its logic function according to how its terminals are biased, as shown in Fig. 4 and Table 1. A GDI implementation of a low-power combinational circuit has been discussed that greatly reduces the number of transistors, but unfortunately many of these functions are difficult to realize in a p-well process [11].



Figure 4. GDI basic circuit (Photo/Picture credit: Original).

| N | P | G | Out                  | Function |
|---|---|---|----------------------|----------|
| 0 | B | A | $\overline{A}B$      | F1       |
| B | 1 | A | $\overline{A} + B$   | F2       |
| 1 | B | A | $A + B$              | OR       |
| B | 0 | A | $AB$                 | AND      |
| C | B | A | $\overline{A}B + AC$ | MUX      |
| 0 | 1 | A | $\overline{A}$       | NOT      |

Table 1. GDI basic circuit logic functions.
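The rows of Table 1 can be checked with a small behavioral model. The sketch below is only an idealized switch-level view, assuming that the PMOS passes the P terminal to the output when G is 0 and the NMOS passes the N terminal when G is 1; threshold drops and bulk connections are ignored, and the function names `gdi_cell` and `check_table1` are illustrative, not taken from the cited work.

```python
def gdi_cell(n: int, p: int, g: int) -> int:
    """Idealized GDI cell (assumption): output follows P when G = 0, N when G = 1."""
    return n if g else p


def check_table1() -> bool:
    """Compare the idealized cell against every row of Table 1 for all inputs."""
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                not_a = 1 - a
                assert gdi_cell(0, b, a) == (not_a & b)                # F1
                assert gdi_cell(b, 1, a) == (not_a | b)                # F2
                assert gdi_cell(1, b, a) == (a | b)                    # OR
                assert gdi_cell(b, 0, a) == (a & b)                    # AND
                assert gdi_cell(c, b, a) == ((not_a & b) | (a & c))    # MUX
                assert gdi_cell(0, 1, a) == not_a                      # NOT
    return True


if __name__ == "__main__":
    print("Table 1 reproduced:", check_table1())
```

Under this idealization every function in Table 1 reduces to a 2:1 selection between the N and P terminals, which is why a single two-transistor cell covers all six rows.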

#### 3. Circuit design

#### 3.1. Overall design

The digital signal is output by the MCU and enters the absolute value comparison circuit, where it is compared with a threshold value; the threshold is programmable and can be selected by the MCU within a certain range. Since the input digital signal is in complementary-code (two's-complement) form, the circuit is composed of two parts, an absolute value converter and a magnitude comparator. The absolute value converter converts the two's-complement input into the corresponding absolute value and feeds it, in original-code (unsigned magnitude) form, into the magnitude comparator to obtain the final comparison result: the output is 1 if the absolute value is greater than the threshold and 0 if it is smaller than or equal to the threshold [12]. The overall circuit outputs the signal in the form of a complementary code, and outputs 0. The overall system block diagram is shown in Fig. 5.



Figure 5. Absolute comparator (Photo/Picture credit: Original).
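The intended input/output behavior of the whole system can be written down as a short reference model. This is a behavioral sketch only, assuming a 4-bit two's-complement input and a 3-bit unsigned threshold; the names `to_signed4` and `abs_compare` are illustrative, and the handling of the pattern 1000 (magnitude truncated to three bits) is an assumption chosen to stay within a 3-bit magnitude.

```python
def to_signed4(code: int) -> int:
    """Interpret a 4-bit pattern (0..15) as a two's-complement integer."""
    if not 0 <= code <= 0b1111:
        raise ValueError("code must be a 4-bit value")
    return code - 16 if code & 0b1000 else code


def abs_compare(code: int, threshold: int) -> int:
    """Return 1 if |code| > threshold, else 0 (threshold is 3-bit unsigned)."""
    if not 0 <= threshold <= 0b111:
        raise ValueError("threshold must be a 3-bit value")
    # Keep only three magnitude bits; 1000 (-8) therefore maps to 000 (assumption).
    magnitude = abs(to_signed4(code)) & 0b111
    return 1 if magnitude > threshold else 0


if __name__ == "__main__":
    for code in (0b0010, 0b0011, 0b1110, 0b1011):
        print(f"{code:04b} vs threshold 010 ->", abs_compare(code, 0b010))
```

Such a model is convenient as a golden reference when checking transient simulation waveforms against the expected logic function.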

#### 3.2. Absolute Converter

The converter first examines the most significant bit. If it is 0, the input is non-negative and is passed unchanged to the magnitude comparator. If it is 1, the input is negative: each bit is inverted and 1 is added to obtain a new binary number, whose last three bits are the absolute value. Fig. 6 shows this arithmetic for two negative inputs, -7 and -5.

|   | 1001 (-7)                  |   | 1011 (-5)                  |
|---|----------------------------|---|----------------------------|
|   | 0 1 1 0 (bits are flipped) |   | 0 1 0 0 (bits are flipped) |
| + | <u>0 0 0 1</u> (add 1)     | + | <u>0 0 0 1</u> (add 1)     |
|   | 0 1 1 1 (magnitude)        |   | 0 1 0 1 (magnitude)        |

Figure 6. Example of transcoding circuit operation (Photo/Picture credit: Original).

| $C_3$ | $C_2$ | $C_1$ | $C_0$ | $B_3$ | $B_2$ | $B_1$ | $B_0$ |
|-------|-------|-------|-------|-------|-------|-------|-------|
| 0                     | 0                     | 0                     | 0                     | 0                     | 0                     | 0                     | 0              |
| 0                     | 0                     | 0                     | 1                     | 0                     | 0                     | 0                     | 1              |
| 0                     | 0                     | 1                     | 0                     | 0                     | 0                     | 1                     | 0              |
| 0                     | 0                     | 1                     | 1                     | 0                     | 0                     | 1                     | 1              |
| 0                     | 1                     | 0                     | 0                     | 0                     | 1                     | 0                     | 0              |
| 0                     | 1                     | 0                     | 1                     | 0                     | 1                     | 0                     | 1              |
| 0                     | 1                     | 1                     | 0                     | 0                     | 1                     | 1                     | 0              |
| 0                     | 1                     | 1                     | 1                     | 0                     | 1                     | 1                     | 1              |
| 1                     | 0                     | 0                     | 0                     | 0                     | 0                     | 0                     | 0              |
| 1                     | 0                     | 0                     | 1                     | 0                     | 1                     | 1                     | 1              |
| 1                     | 0                     | 1                     | 0                     | 0                     | 1                     | 1                     | 0              |
| 1                     | 0                     | 1                     | 1                     | 0                     | 1                     | 0                     | 1              |
| 1                     | 1                     | 0                     | 0                     | 0                     | 1                     | 0                     | 0              |
| 1                     | 1                     | 0                     | 1                     | 0                     | 0                     | 1                     | 1              |
| 1                     | 1                     | 1                     | 0                     | 0                     | 0                     | 1                     | 0              |
| 1                     | 1                     | 1                     | 1                     | 0                     | 0                     | 0                     | 1              |

The input signal is $C_3 \sim C_0$, with a range from -7 to 7, and the output is $B_3 \sim B_0$. The truth table is shown in Fig. 7.

Figure 7. Truth table of transcoding circuits (Photo/Picture credit: Original).

Observing the truth table, we can see that the input is passed directly to the output when $C_3$ is 0 and is converted when $C_3$ is 1, so we realize this selection with a MUX.
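The MUX-based selection can be sketched at the bit level and compared with the truth table of Fig. 7. This is a minimal illustration rather than the gate network actually used; the function name `transcode` and the list `FIG7_OUTPUTS` (the $B_3 \sim B_0$ column of Fig. 7 read as integers) are introduced here for the example.

```python
def transcode(c: int) -> int:
    """Map a 4-bit two's-complement code C3..C0 to the output B3..B0 (B3 is 0)."""
    sign = (c >> 3) & 1                 # C3 drives the MUX select input
    passthrough = c & 0b0111            # C3 = 0: forward the low three bits
    negated = ((~c) + 1) & 0b0111       # C3 = 1: invert, add 1, keep three bits
    return negated if sign else passthrough


# Outputs B3..B0 of Fig. 7, listed for inputs 0000 through 1111.
FIG7_OUTPUTS = [0, 1, 2, 3, 4, 5, 6, 7, 0, 7, 6, 5, 4, 3, 2, 1]

if __name__ == "__main__":
    for c in range(16):
        print(f"{c:04b} -> {transcode(c):04b}")
```

Because the result is masked to three bits, the input 1000 yields 0000, which matches the corresponding row of the truth table.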

#### 3.3. Magnitude Comparator

In the three-bit magnitude comparison, the bits are compared sequentially starting from the most significant bit. If the input bit is greater than the corresponding threshold bit, the output is 1 immediately; if it is smaller, the output is 0; if the two are equal, the next most significant bit is compared.

Although the serial structure can reduce the overall power consumption of the circuit, the delay it incurs is too large, so we adopted a parallel structure to realize the magnitude comparator. The two design options are shown in Fig. 8 and Fig. 9.



Figure 8. Magnitude Comparator in serial structure (Photo/Picture credit: Original).



Figure 9. Magnitude Comparator in parallel structure (Photo/Picture credit: Original).
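The two organizations can be contrasted in a short behavioral sketch. The serial version follows the bit-by-bit procedure described above; the parallel version uses the textbook sum-of-products form for "greater than", which is an assumption here and not necessarily the exact gate network of Fig. 9. `b` denotes the 3-bit magnitude and `a` the 3-bit threshold; all function names are illustrative.

```python
def bits3(x: int):
    """Return the bits (x2, x1, x0) of a 3-bit value, MSB first."""
    return (x >> 2) & 1, (x >> 1) & 1, x & 1


def greater_serial(b: int, a: int) -> int:
    """Compare from the MSB down; the first differing bit decides the result."""
    for b_bit, a_bit in zip(bits3(b), bits3(a)):
        if b_bit != a_bit:
            return 1 if b_bit > a_bit else 0
    return 0  # all bits equal: not greater


def greater_parallel(b: int, a: int) -> int:
    """All bit positions are evaluated at once and combined in one OR."""
    b2, b1, b0 = bits3(b)
    a2, a1, a0 = bits3(a)
    e2 = 1 - (b2 ^ a2)          # XNOR: bit 2 equal
    e1 = 1 - (b1 ^ a1)          # XNOR: bit 1 equal
    g2 = b2 & (1 - a2)          # bit 2 of b greater
    g1 = b1 & (1 - a1)
    g0 = b0 & (1 - a0)
    return g2 | (e2 & g1) | (e2 & e1 & g0)


if __name__ == "__main__":
    ok = all(greater_serial(b, a) == greater_parallel(b, a) == int(b > a)
             for b in range(8) for a in range(8))
    print("serial and parallel agree on all 64 cases:", ok)
```

The serial form has a decision chain whose length grows with the bit position, whereas the parallel form trades extra gates for a fixed, shallow logic depth, which reflects the delay argument made above.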

#### 3.4. Structure of MUX and XOR

Static CMOS is suitable for simple gates, but its transistor count and delay are not ideal for the more complex XOR and MUX structures. TG and PTL are commonly used to implement such complex gates with better performance. The static CMOS structure of the XOR gate is shown in Fig. 10. Fig. 11 and Fig. 12 show the MUX implemented with the PTL and TG structures, respectively.



Figure 10. Static CMOS implementation of XOR gate (Photo/Picture credit: Original).



Figure 11. PTL implementation of MUX (Photo/Picture credit: Original).

Proceedings of the 2023 International Conference on Machine Learning and Automation DOI: 10.54254/2755-2721/39/20230599



Figure 12. TG implementation of MUX (Photo/Picture credit: Original).

Transient simulations were performed on the XOR gates of each structure, and their delay, power consumption, count of transistors, and voltage swing were measured. The results are given in Table 2.

| Parameter        | TG    | CMOS | PTL   |
|------------------|-------|------|-------|
| Swing (V)        | 0-1   | 0-1  | 0-0.7 |
| Delay (ps)       | 26.25 | 52.5 | 42.1  |
| Power (µW)       | 7.1   | 9.2  | 3.0   |
| Transistor count | 8     | 12   | 6     |

**Table 2.** Parameters of logic gates in different structures.

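It can be seen that TG offers the best balance of these metrics: it keeps the full 0-1 V swing and has the shortest delay, whereas PTL has the lowest power and transistor count but loses output swing. TG is therefore the preferred choice for implementing XOR and MUX.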

#### 4. Calculation and optimization

#### 4.1. Delay calculation

Logical effort is a theory that describes the properties of logic gates and their interactions in a logic chain, and provides a technique for minimizing delay. The total delay is obtained by summing the delays of the individual stages. The critical path is shown in Fig. 13.



Figure 13. Critical path of absolute value comparators (Photo/Picture credit: Original).

According to the definition, the delay of the critical path can be expressed as:

$$Delay = D \times t_{p_0} \tag{1}$$

$$D = \sum_{i=1}^{N} d_i \tag{2}$$

Here $t_{p_0}$ is the parasitic delay of the unit inverter, and the stage delay $d$ is given by (3):

$$d = g \cdot h + p \cdot \gamma \tag{3}$$

where  $\gamma$  is a scaling factor, related only to the process, reflecting the relationship between the input gate capacitance of a unit inverter and the intrinsic output capacitance:

$$\gamma = \frac{C_s}{C_p} \tag{4}$$

Here $h$ is the electrical fan-out, $p$ is the parasitic effort, and $g$ is the logical effort:

$$h = \frac{C_{out}}{C_{in}} \tag{5}$$

$$p = \frac{C_{par,gate}}{C_{par,INV}} \tag{6}$$

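The gate sizes at each stage that minimize the critical-path delay can be calculated from equations (7) to (12), where $b$ is the branching effort of a stage, $H$, $G$, and $B$ are the path electrical, logical, and branching efforts, $F$ is the total path effort, and $f$ is the stage effort at minimum delay:

$$b = \frac{C_{on-path} + C_{off-path}}{C_{on-path}} \tag{7}$$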

$$H = \frac{C_L}{C_{g1}} \tag{8}$$

$$G = \prod_{i=1}^{N} g_{i} \tag{9}$$

$$B = \prod_{i=1}^{N} b_i \tag{10}$$

$$F = GBH \tag{11}$$

$$f = \sqrt[N]{F} = gh \tag{12}$$
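The sizing procedure implied by the path-effort equations can be sketched as follows. The four-stage path, its logical efforts, parasitic terms, branching factors, load, and $\gamma$ are hypothetical textbook-style values chosen only for illustration; they are not the critical path of Fig. 13, and the helper names are not taken from the MATLAB scripts used in this work.

```python
import math


def path_effort(g, b, H):
    """Return (F, f): total path effort F = G*B*H and stage effort f = F**(1/N)."""
    G = math.prod(g)                  # path logical effort
    B = math.prod(b)                  # path branching effort
    F = G * B * H                     # total path effort
    return F, F ** (1.0 / len(g))     # equal effort per stage at minimum delay


def size_stages(g, b, c_load, f):
    """Work backwards from the load so that every stage bears effort f."""
    c_in = [0.0] * len(g)
    c_next = c_load
    for i in range(len(g) - 1, -1, -1):
        c_in[i] = g[i] * b[i] * c_next / f   # from f = g_i * (b_i * C_next / C_in)
        c_next = c_in[i]
    return c_in


def normalized_delay(p, f, gamma):
    """Sum of stage delays g_i*h_i + p_i*gamma, with g_i*h_i = f in every stage."""
    return sum(f + p_i * gamma for p_i in p)


# Hypothetical path: inverter, NAND2, NOR2, inverter (illustrative values only).
g = [1.0, 4.0 / 3.0, 5.0 / 3.0, 1.0]
p = [1.0, 2.0, 2.0, 1.0]
b = [1.0, 2.0, 1.0, 1.0]
c_first, c_load, gamma = 1.0, 20.0, 1.0

H = c_load / c_first
F, f = path_effort(g, b, H)
c_in = size_stages(g, b, c_load, f)
D = normalized_delay(p, f, gamma)

if __name__ == "__main__":
    print("stage effort f:", round(f, 3))
    print("input capacitances:", [round(c, 3) for c in c_in])
    print("normalized path delay D:", round(D, 3))
```

A useful sanity check on any such calculation is that the input capacitance computed for the first stage returns to the value assumed when forming $H$.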

#### 4.2. Power consumption calculation and optimization

The power consumption of the static gates in this design is divided into three parts. The first is the static power consumption, i.e., the power consumed by leakage current when there is no switching activity, which can be expressed by (13):

$$P_{static} = I_{static} \cdot V_{DD} \tag{13}$$

However, this part of the power consumption is very small compared with the total, so we ignore it. The direct-path (short-circuit) power consumption comes from the current that flows through the pull-up and pull-down paths simultaneously while a logic gate is switching, and can be expressed as:

$$P_{dp} = t_{sc} \cdot V_{DD} \cdot I_{peak} \cdot f = C_{sc} \cdot V_{DD}^{2} \cdot f \tag{14}$$

This design uses a supply voltage of at most 1 V, and the MOSFET threshold voltage is about 0.31 V, so most of the time only a sub-threshold current flows through the direct path; this part of the power consumption is therefore also negligible.

The dynamic power consumption is the energy consumption resulting from the charging and discharging of the capacitor at the output, which can be expressed as:

$$P_{dynamic} = (C_L + C_P) \cdot V_{DD}^{2} \cdot f \cdot \alpha_{0 \to 1} \tag{15}$$

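Here $\alpha_{0 \rightarrow 1}$ is the transition rate, i.e., the probability that the output is 0 in one cycle and 1 in the next. With $p_A$ and $p_B$ denoting the probabilities that inputs A and B are 1, the transition rate for a 2-input NAND gate is:

$$(p_A p_B)(1 - p_A p_B) \tag{16}$$

the transition rate for a 2-input NOR gate is:

$$(1-p_A)(1-p_B)\left[1-(1-p_A)(1-p_B)\right] \tag{17}$$

and the transition rate for a 2-input XOR gate is:

$$[1 - (p_A + p_B - 2p_A p_B)](p_A + p_B - 2p_A p_B) \tag{18}$$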
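Each transition rate is the product of the probability that the output is 0 and the probability that it is 1. The sketch below evaluates this product for the three gate types, assuming statistically independent inputs; the function names are illustrative.

```python
def alpha_from_p1(p1: float) -> float:
    """0 -> 1 transition rate of a node whose probability of being 1 is p1."""
    return (1.0 - p1) * p1


def alpha_nand2(p_a: float, p_b: float) -> float:
    return alpha_from_p1(1.0 - p_a * p_b)               # NAND output is 0 only if A = B = 1


def alpha_nor2(p_a: float, p_b: float) -> float:
    return alpha_from_p1((1.0 - p_a) * (1.0 - p_b))     # NOR output is 1 only if A = B = 0


def alpha_xor2(p_a: float, p_b: float) -> float:
    return alpha_from_p1(p_a + p_b - 2.0 * p_a * p_b)   # XOR output is 1 if inputs differ


if __name__ == "__main__":
    # Equiprobable inputs (p = 0.5), a common default assumption.
    print("NAND2:", alpha_nand2(0.5, 0.5))
    print("NOR2: ", alpha_nor2(0.5, 0.5))
    print("XOR2: ", alpha_xor2(0.5, 0.5))
```

For equiprobable inputs this gives 0.1875 for NAND2 and NOR2 and 0.25 for XOR2, which correspond to the values 0.188 and 0.250 that appear in the $\alpha$ rows of Tables 3 and 4.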

The sizes and power consumption of each stage at minimum delay are shown in Table 3.

| stage                 | 1     | 2      | 3     | 4     | 5      | 6      | 7     | 8      | 9      |
|-----------------------|-------|--------|-------|-------|--------|--------|-------|--------|--------|
| S                     | 1.000 | 3.070  | 9.440 | 6.830 | 16.780 | 17.180 | 8.450 | 14.830 | 13.020 |
| α                     | 0.250 | 0.250  | 0.188 | 0.234 | 0.228  | 0.250  | 0.109 | 0.235  | 0.243  |
| delay/t <sub>p0</sub> | 5.120 | 10.864 | 3.874 | 6.232 | 8.345  | 15.306 | 6.232 | 4.227  | 6.233  |
| energy/C              | 0.980 | 3.012  | 3.000 | 6.108 | 7.166  | 7.954  | 2.518 | 6.434  | 12.064 |

Table 3. Parameters of the levels corresponding to the minimum delay.

Based on the above results, the size and supply voltage of each stage were adjusted to minimize power consumption. To account for the impact of VDD on the delay, the delay is assumed to scale with VDD as:

$$Delay \sim \frac{V_{DD}}{\left(V_{DD} - V_T\right)^2} \tag{19}$$
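The effect of this scaling law can be illustrated numerically. The sketch below uses the threshold voltage of about 0.31 V quoted in Section 4.2 and the supply of 0.844 V adopted in this design; taking 1.0 V as the reference supply is an assumption made for the comparison, and the quadratic energy scaling follows from the $V_{DD}^{2}$ term in (15).

```python
def delay_factor(vdd: float, vt: float) -> float:
    """Relative delay according to Delay ~ VDD / (VDD - VT)^2."""
    if vdd <= vt:
        raise ValueError("model only applies for VDD above the threshold voltage")
    return vdd / (vdd - vt) ** 2


V_T = 0.31      # threshold voltage quoted in Section 4.2 (V)
V_REF = 1.0     # reference supply (assumption for this comparison)
V_OPT = 0.844   # supply voltage adopted in this design (V)

delay_ratio = delay_factor(V_OPT, V_T) / delay_factor(V_REF, V_T)
energy_ratio = (V_OPT / V_REF) ** 2   # dynamic energy scales with VDD squared

if __name__ == "__main__":
    print(f"delay at {V_OPT} V relative to {V_REF} V: {delay_ratio:.3f}x")
    print(f"dynamic energy relative to {V_REF} V:     {energy_ratio:.3f}x")
```

Under these assumptions, lowering the supply alone lengthens the delay by roughly 40% while cutting the dynamic energy by almost 30%; the remainder of the trade-off comes from resizing the stages.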

MATLAB was used to optimize the size of each stage and VDD. According to the calculation results, the total delay of the circuit was relaxed to 1.5 times the minimum delay; the resulting parameters of each stage are shown in Table 4.

| stage                 | 1     | 2     | 3     | 4     | 5      | 6      | 7     | 8     | 9      |
|-----------------------|-------|-------|-------|-------|--------|--------|-------|-------|--------|
| S                     | 1.000 | 1.301 | 1.750 | 1.250 | 3.000  | 5.000  | 1.750 | 1.767 | 3.752  |
| α                     | 0.250 | 0.250 | 0.188 | 0.234 | 0.228  | 0.250  | 0.109 | 0.235 | 0.243  |
| delay/t <sub>p0</sub> | 3.925 | 7.510 | 5.382 | 8.575 | 16.056 | 18.792 | 6.325 | 9.881 | 22.553 |
| energy/C              | 0.383 | 0.509 | 0.394 | 0.785 | 1.226  | 1.523  | 0.270 | 0.914 | 6.417  |

Table 4. Optimized parameters at all levels.

At this point, VDD is 0.844 V, and the energy consumption at a delay of 99.6 $t_{p_0}$ is 12.4C, which reduces the energy consumption by 75% at the cost of a 50% increase in delay.
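The headline figures can be cross-checked directly from the tabulated per-stage values. The sketch below sums the delay and energy rows of Table 3 and the energy row of Table 4; the variable names are illustrative.

```python
# Per-stage values copied from Table 3 (minimum-delay sizing).
DELAY_MIN = [5.120, 10.864, 3.874, 6.232, 8.345, 15.306, 6.232, 4.227, 6.233]
ENERGY_MIN = [0.980, 3.012, 3.000, 6.108, 7.166, 7.954, 2.518, 6.434, 12.064]

# Per-stage energy copied from Table 4 (optimized sizing and supply voltage).
ENERGY_OPT = [0.383, 0.509, 0.394, 0.785, 1.226, 1.523, 0.270, 0.914, 6.417]

total_delay_min = sum(DELAY_MIN)            # minimum path delay, in units of t_p0
delay_budget = 1.5 * total_delay_min        # relaxed delay target
total_energy_min = sum(ENERGY_MIN)          # in units of C
total_energy_opt = sum(ENERGY_OPT)          # in units of C
saving = 1.0 - total_energy_opt / total_energy_min

if __name__ == "__main__":
    print(f"minimum delay:  {total_delay_min:.3f} t_p0")
    print(f"1.5x budget:    {delay_budget:.3f} t_p0")
    print(f"energy before:  {total_energy_min:.3f} C")
    print(f"energy after:   {total_energy_opt:.3f} C")
    print(f"energy saving:  {100 * saving:.1f} %")
```

The sums give a minimum delay of about 66.4 $t_{p_0}$, a 1.5x budget of about 99.6 $t_{p_0}$, and an energy drop from about 49.2C to about 12.4C, i.e. a saving close to 75%.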

#### 5. Simulation analysis

To validate the theoretical calculations in Section 4, circuit simulation was conducted in a 65 nm CMOS process using Cadence IC 6.1.8. The transistor I-V characteristic curves for this process are shown in Fig. 14 and Fig. 15.



Figure 14. I-V characteristic curves of NMOS (Photo/Picture credit: Original).



Figure 15. I-V characteristic curves of PMOS (Photo/Picture credit: Original).

The transient simulation of the overall circuit was carried out at a supply voltage of 0.844 V and a temperature of  $25^{\circ}$ C. The overall system schematic is shown in Fig. 16.



Figure 16. 4-bit absolute value comparator schematic (Photo/Picture credit: Original).

When the threshold value $A_2A_1A_0$ is set to "010", the input and output waveforms reflecting the logic function of the whole system are shown in Fig. 17.



Figure 17. Transient simulation of 4-bit absolute comparator (Photo/Picture credit: Original).

According to the simulation results, the intrinsic delay of the minimum inverter in this process is 0.761 ps and the total delay of the overall system is 0.83 ns, which differs from the theoretical calculation by about 9%; the power consumption is 49.6 µW.

#### 6. Conclusion

To meet the requirements of computer systems for low latency and low power consumption, this paper designs a high-speed four-bit absolute value comparator at the 65 nm technology node, using a hybrid CMOS and TG structure with computational optimization of the size of each logic gate and of the supply voltage. A 75% power reduction is obtained at the expense of a 50% increase in delay. The power consumption and delay of the circuit are simulated in Cadence, and the final design has a delay of 0.83 ns and a power consumption of 49.6 µW at $25^{\circ}$C and a supply voltage of 0.844 V. The design performs well in terms of delay, power consumption, transistor count, and other metrics. Some problems remain: there is a race hazard in some of the signal transitions. In future work, the stability of the output signal will be improved by adjusting the logic structure or introducing selective pulses. This research, based on theoretical calculations and simulations, is useful for theoretical and experimental courses related to the design of very-large-scale integrated circuits.

#### References

- [1] Suri, L., Lamba, D., Kritarth, K., & Sharma, G. (2013). High performance and power efficient 32-bit carry select adder using hybrid PTL/CMOS logic style. International Multi-conference on Automation. IEEE.
- [2] Sharma, G., Nirmal, U., & Misra, Y. (2011). A Low Power 8-bit Magnitude Comparator with Small Transistor Count using Hybrid PTL/CMOS Logic.
- [3] Luan, Lili. (2017). Characterization and comparison of single-supply absolute value circuits. Electronic Fabrication (22), 2.
- [4] Li, Y.-J., Wu, Y.-W., Zhang, E.-S., & Liang, Y. (2017). Front-end sampling design of active power filter based on absolute value circuit. Yunnan Power Technology (Vol. 45, pp. 3).
- [5] Iranmanesh, S., Raikos, G., Jiang, Z., & Rodriguez-Villegas, E. (2016). CMOS implementation of a low power absolute value comparator circuit. New Circuits & Systems Conference. IEEE.

- [6] Kumngern, M. (2013). Absolute Value Circuit for Biological Signal Processing Applications. Proceedings of the 2013 4th International Conference on Intelligent Systems, Modelling and Simulation. IEEE.
- [7] Wu, Jian, Wu, W., & Jia, Qianwei. (2012). Application of absolute value circuits in analog-to-digital conversion. Automation Technology and Applications (8), 3.
- [8] Bhuyan, Muhibul & Riyadh, Md. Mubarak & Hossain, Md & Rahman, Md. (2020). Design and Simulation of a Low Power and High-Speed 4-Bit Magnitude Comparator Circuit using CMOS in DSch and Microwind. 20. 82-94.
- [9] Mukherjee, D. N., Panda, S., & Maji, B. (2017). Design of Low Power 12-bit Magnitude Comparator. International Conference on Devices for Integrated Circuit.
- [10] Weste, N., & Eshraghian, K. (1993). Principles of CMOS VLSI Design: Second Edition.
- [11] Chandrakasan, A., & Brodersen, R. Low-Power CMOS Design. IEEE Xplore.
- [12] Morgenshtein, A., Fish, A., & Wagner, I. A. (2002). Gate-diffusion input (GDI): a power-efficient method for digital combinatorial circuits. IEEE Educational Activities Department.