Research Journal of Engineering Sciences ___________________________________________ ISSN 2278 – 9472Vol. 3(2), 6-11, February (2014) Res. J. Engineering Sci. International Science Congress Association 6 A Novel NOC Architecture for SoC based Ultra Lightweight Crypto-Processor Using Present and Katan Algorithm T. Blesslin Sheeba and P.RangarajanSathyabama University, Chennai, INDIA Department of EEE, RMD Engineering College, Chennai, INDIAAvailable online at: www.isca.in, www.isca.me Received 24th September 2013, revised 14th January 2014, accepted 6th February 2014 Abstract The performance computation of the Crypto-processor implemented on SoC platform is research of interest now. The traditional buses or wires have the problem of scalability, complexity and timing, from here we target this point and present a novel Network On Chip (NOC) architecture to overcome the cons. Network On Chip (NOC) consists of storage and I/O resources interconnected by network of switches for network computation. Two ultra lightweight cryptographic algorithms are presented in this paper namely PRESENT and KATAN. It is developed using Altera Cyclone IV E. The NOC architecture consists of different topology, switching and routing techniques based efficiency requirements. Finally the computed efficiency for the processor running at 330 MHz and taking 5.047 sec for computation using 0.323 mm cell area in 180 nm technology. Keywords: Present, Katan, Network On Chip (NOC), System On Chip (SOC), Cyclone IV E. Introduction Several years back the cryptography is introduced as evolution of secure communication. That is data or information shared must be protected from the trespassers (i.e. from unwanted users). The cryptography is the process of encryption and decryption taken place at transmitter and receiver end respectively. Lot of algorithm has been proposed for encryption and decryption process. In recent year there are several Ultra lightweight Cryptography has been proposed in advance to lightweight cryptography. In contrast to software implementation which will take more resources and computational time the hardware implementation also proposed but it has the demerits of compactness, soothe concept of crypto-processor was evolved by implementing it in SoC. The concept of SoC is evolved as solution for implementation of more complex algorithm and design with more core processor. The rise of Network On Chip (NOC) is due to the growing complexity interconnection design and chip architecture2,3. The design and implementation of computational intensive cryptographic algorithm in Hardware is a tough task but the disadvantage of compatibility and flexibility is experienced. To succeed this problem the ultimate choice is a multi core platform called as SoC which will maximize the computational power and flexibility. The three conceptual thinking if we consider an SoC is design time decision, computation, storage and I/O. 64 core and 80 core have been in corporate in the single chip which is presented in the articles4,5 respectively, However this rapid increase in the number of core in SoC implementation will lead to severe system degradation caused by bus architecture for inter connections. If we consider the Crypto-processor implementation using SoC means the bus constraints will result message stalling and signal interference etc. Thus Network On chip (NOC) is preferred as global solution for intra SoC communication. The Network on Chip architecture is made of network adaptor, routing nodes and links. The network adaptor is to interface the core to NOC and decouple the core from communication, the routing node will route the data from source to destination according to the protocol, and finally the link will connect the different nodes. There are several parameters that will decide the computational efficiency of the processor with Network On Chip (NOC). If we design NOC architecture then we have to consider the topology, routing techniques and switching method which will contribute to the QoS of the architecture. Ultra Lightweight Cryptography Algorithm: Present: Algorithm: The plaintext P of 64-bit length is defined by b63 … 64 which is initial state. The three stages addRoundkey, sBoxLayer, pLayer of Present algorithm will change the initial state in every iteration i for 0 i 15. The operation of addRoundkey is that it will XOR the current state bi,63… bi,0 with round key of ith term as RK = rki,63 … rki,0 follows. The non-linear sBoxLayer which consists of 16 copies of a 4-bit S-box is the second stage. The S (w) is applied to each S-box for w15, … , w0 where w is defined by i = b4*i+3||b4*i+2|| b4*i+2 || b4*i+1|| b4*IThe 64 most significant bits (MSB) of the current state of the key register K is the current round key RKi . By shifting the key register K= K127126…k to the left by 61 bits and passing the left most 8-bit through two S-boxes of present we can generate the key for next round i+1. The 5 bit round counter is XORed Research Journal of Engineering Sciences________________________________________________________ ISSN 2278 – 9472 Vol. 3(2), 6-11, February (2014) Res. J. Engineering Sci. International Science Congress Association 7 with 5-bit k66656463. The round key bit RKi+1 = k127 … k64 is formed by resultant 64 of MSB. Architecture of Present: The scaling of 64-bit implementation to 16- bit implementation will lead to implementation will lead to area reduction. Further scaling will lead to degradation in throughput with small amount of area reduction due to complexity operation of permutation. The top level of datapath is shown in figure 1. Data storage: The shift register SR1 is stored with b63 …b0 states. In one clock cycle it performs a 16-bit circular left shift. The 16, 4-bit clock is combined and consider as 64-bit shift register. For the round operation the 16 MSB are tapped out of SR1. Figure-1 16-bit Datapath of Present S-box Implementation and Permutation layer: The round key RKi is XORed with incoming data as result of round operation initialization and stored in four S-boxes. The architecture utilizes four S-boxes for round operation and two for key scheduling. SR is responsible for implementation of permutation operation which perform 4-bit shift during round operation and 16-bit shift while copy the content to SR. The input to the block position 12, 8, 4 and 0 of SR2 is taken from 16-bit S-box. 16 MSB of SR1 is used to compute the 4-bit block of 15, 11, 7 and 3 during first clock cycle of round operation. The block 14, 10, 6 and 2 are computed by shifting the SR2 by 4-bits in the next subsequent cycle. To complete the round function the above operation is continued for another two cycle as a result 8 clock cycle for each round operation is achieved. Figure-2 Key Scheduling of Present Key Storage and Scheduling: 128-bit shift register is used to store the key which performs 16-bit circular left shift. During the process of first four clock cycle by tapping the 16 MSB from key and passing them to RKGen the key RK1 for first round is obtained. The three extra taps are placed as shown in figure 2 to shift the 61 bits by 16-bit. The register A is used to store the lost 3bits during first round. Register A and 13MSB from the key are passed to RKGen for subsequent round key. Two S-boxes were stored in RKGen foe S-box operation, to compute the XOR with round counter a 5-bit XOR is needed and to choose the appropriate bit for round key generation multiplexers are contributed. Katan: Katan family consists of three block ciphers with various block sizes: 32, 48 and 64 bits. All ciphers have 80-bit keys. Each of the Katan algorithms loads the data block into two internal shift registers L1 and L2. Using nonlinear functions, it performs the 254 rounds which form the registers feedback figure3. One of nonlinear functions uses specific irregular value (IR) in addition to several register bits. It depends on the round number. The requirements of katan are extremely low because of the following collection in factors: i. Katan uses shift registers, which can be implemented easily; feedback functions are very simple too though they provide required nonlinearity, ii. it processes small blocks of data – 32 to 64 bits, iii. Its internal state is small and its size is a little bit greater than the block size. Block cipher of katan can be used as a cryptographic kernel of mounting other kinds of cryptographic primitives over it. The set of cryptographic functions over katan was recently proposed in. This set includes: i. block cipher – katan algorithm itself, ii. Pseudo random number and stream cipher generator, iii. Hash function. Research Journal of Engineering Sciences ___________ Vol. 3(2), 6-11, February (2014) International Science Congress Association Figure-3 Round Function of Katan To minimize expenses, the hashing add - lightweight as possible. One of hash function with a thin hashing layer over the internal block cipher is which took part in the first stage of SHA - crunch versions is based on the double- pipe Merkle construction. The double- pipe version allows reaching higher cryptographic strength comparably to the main version with practically the same overheads . Using the compression function structure similar to crunch (st rengthened version) and the 64 katan64 block cipher, we can build a lightweight compression function figure-4. Figure- 4 KATAN 64- based Compression Function Compression function of double-pipe crunch every block of the message twic e: concatenated with Hi and H,i values. We slightly modified the structure of the compression function: as the block size of the KATAN64 cipher is relatively small, every block of the message is separated into two halves: M’i and M’’i, which is proc essed by the block ___________ ________________________________ ______ Association Round Function of Katan - on should be as lightweight as possible. One of hash function with a thin hashing layer over the internal block cipher is crunch algorithm, - 3 contest. One of pipe Merkle -Damgård pipe version allows reaching higher cryptographic strength comparably to the main version with . Using the compression function rengthened version) and the 64 -bit block cipher, we can build a lightweight compression based Compression Function crunch version encrypts e: concatenated with Hi and H,i values. We slightly modified the structure of the crunch compression function: as the block size of the KATAN64 cipher is relatively small, every block of the message is separated into essed by the block cipher in parallel. Final hash value is a result of the final transformation of HN and H’N values (last message block processing output values). Proposed System: SoC implementation of Crypto using NOC: The figure- 5 sows an ove lightweight crypto- processor implemented in System On Chip (SoC) using Nios II processor with NOC for interconnection of cores or different modules. The two algorithm presented here are present and katan . The algorithm was store which is used by processor for different encryption and decryption of data based on the application, the control logic for selecting the cryptographic techniques is stored in processor internal memory. The DMA controller is used to access the pro cessor during the traffic of packet occur. Figure - Crypto- Processor using NOC The figure 6 shows the general architecture for System on chip connected with certain protocol for routing, switching and topology called as NOC. Figure - General NOC Arc ______ _______ ISSN 2278 – 9472 Res. J. Engineering Sci. 8 cipher in parallel. Final hash value is a result of the final transformation of HN and H’N values (last message block Proposed System: SoC implementation of Crypto -Processor 5 sows an ove rall architecture for ultra processor implemented in System On Chip (SoC) using Nios II processor with NOC for interconnection of cores or different modules. The two algorithm presented here . The algorithm was store d in SRAM which is used by processor for different encryption and decryption of data based on the application, the control logic for selecting the cryptographic techniques is stored in processor internal memory. The DMA controller is used to access the cessor during the traffic of packet occur. - 5 Processor using NOC 6 shows the general architecture for System on chip connected with certain protocol for routing, switching and - 6 General NOC Arc hitecture Research Journal of Engineering Sciences ___________ Vol. 3(2), 6-11, February (2014) International Science Congress Association It is clear from the block that even an IP core also consists of different type’s module as shown. The key role of network On Chip comes here the placement of the block should be placed in such a way that routing of signal should be simple and sh not cause any propagation delay and the latency be maintained. The topology must be selected in a way so that more number of nodes can be inserted without interference of the signals and node failure will not cause packet error. Finally the switching should be chosen for effective data transfer. Hence the before said parameters like Topology, Routing, Switching for proposed Network On Chip is discussed in the next section. Proposed NOC Architecture: The below figure cryptographic process of data with NOC interconnect. Figure-7 Proposed NOC Architecture The present architecture is based on Packet switching for data cryptography. As shown in architecture it has EEs (Encryption element), DEs (Decryption Element), CPE (Ciphertext Processing Element), PPE (Plaintext Processing Element) are connected via Network On Chip for encryption and decryption of data sent to NOC, the number of EE and DE is not restricted for hardware implementation. From figure 7 it is clear that PPE at the r ight side of the architecture is responsible for two jobs, they are slicing the input plaintext into N units and header should be added to the pack for sending them to each EE then it receive the decrypted packets from DEs and remove the headers before sen ding it to NOC. On the other hand CPE will do the same process but with the Ciphertext. Generally the NOC architecture is made of certain protocol for effectiveness, in this paper we are considering very two important parameters namely topology and switc mesh topology and wormhole switching are implemented in our novel NOC. 2D-MESH Topology: In this types of topology each of the processing element are interconnected with one another. Every node in this set up not only sends the bit but also r the other Processing Element. The above figure Number of Processing Element (i.e. Encryption Element, Decryption Element) connected in a 2D- mesh network. The ___________ ________________________________ ______ Association It is clear from the block that even an IP core also consists of different type’s module as shown. The key role of network On Chip comes here the placement of the block should be placed in such a way that routing of signal should be simple and sh ould not cause any propagation delay and the latency be maintained. The topology must be selected in a way so that more number of nodes can be inserted without interference of the signals and node failure will not cause packet error. Finally the switching should be chosen for effective data transfer. Hence the before said parameters like Topology, Routing, Switching for proposed Network On Chip is discussed in the next section. The below figure 7 shows data with NOC interconnect. Proposed NOC Architecture The present architecture is based on Packet switching for data cryptography. As shown in architecture it has EEs (Encryption element), DEs (Decryption Element), CPE (Ciphertext Processing Element), PPE (Plaintext Processing Element) are connected via Network On Chip for encryption and decryption of data sent to NOC, the number of EE and DE is not restricted 7 it is clear that PPE ight side of the architecture is responsible for two jobs, they are slicing the input plaintext into N units and header should be added to the pack for sending them to each EE then it receive the decrypted packets from DEs and remove the headers ding it to NOC. On the other hand CPE will do the Generally the NOC architecture is made of certain protocol for effectiveness, in this paper we are considering very two important parameters namely topology and switc hing. The 2D- mesh topology and wormhole switching are implemented in our In this types of topology each of the processing element are interconnected with one another. Every node in this set up not only sends the bit but also r elay data from the other Processing Element. The above figure 8 shows the Number of Processing Element (i.e. Encryption Element, mesh network. The description of PEs and Des are explained in previous section. The added advantages of using the 2D data can be transmitted from different processing element simultaneously which is efficient in terms for crypto Modification of the topology in reconfigurable techniques is done easily. There is increase proficiency to find the isolation and detection of error. It is very secure since dedicated line is providedFigure - 2D- MeshTopology Wormhole Switching: The proposed NOC will rely on Packet Switching, so here we consider Wormhole Switching bec has the advantages like the complete paper need not to be stored in the switch while waiting for the header flits to be routed to next stage, it not only reduce the delay but also need small buffer space. Channel allocation and bandwidth are decoup The process of wormhole switching is shown in below figure Figure - Process of Wormhole Switching The main term which should be considered during Wormhole switching is FLITS (low ontrol Dig the process are broken and called as flits, the flits are arranged ______ _______ ISSN 2278 – 9472 Res. J. Engineering Sci. 9 description of PEs and Des are explained in previous section. advantages of using the 2D -mesh topology is that data can be transmitted from different processing element simultaneously which is efficient in terms for crypto -processor. Modification of the topology in reconfigurable techniques is increase proficiency to find the isolation and detection of error. It is very secure since dedicated line is - 8 MeshTopology The proposed NOC will rely on Packet Switching, so here we consider Wormhole Switching bec ause it has the advantages like the complete paper need not to be stored in the switch while waiting for the header flits to be routed to next stage, it not only reduce the delay but also need small buffer space. Channel allocation and bandwidth are decoup led. The process of wormhole switching is shown in below figure 9. - 9 Process of Wormhole Switching The main term which should be considered during Wormhole ontrol Dig its). The large packet in called as flits, the flits are arranged Research Journal of Engineering Sciences________________________________________________________ ISSN 2278 – 9472 Vol. 3(2), 6-11, February (2014) Res. J. Engineering Sci. International Science Congress Association 10 as Head flit, body flit followed by tail flit as shown fig. during the transmission of flit the whole body of flit which approached first is transmitted and then the next (i.e.) the interference of the flit is not allowed. It is simply for control the flow of plaintext or cipher text during encryption and decryption process. Here in crypto-processor the data (plaintext) from the EE is given to PPE and the data (ciphertext) from the DE is given to CPE by the NOC With the help of these types of flit controlled Packet switching. Results and Discussion The various result of the proposed NOC architecture for the crypto-processor implemented on SoC is discussed below. The table 1 shows the comparison for operation time based on the NOC architecture over the conventional techniques, and found to be more efficient as the time was reduced than the conventional one. Table-1 Operation Time Types Operation Time (Sec) Conventional Architecture 6.219 NOC Architecture 5.047 The table 2 provides the comparison chart for the proposed NOC Architecture computed over a different parameters namely technology size, cell area and data frequency. It is found that three different technologies have been taken In to account as 130 nm, 90nm and proposed NOC in 180 nm and we conclude as technology scale reduces the efficient in terms of area and frequency is increased. Table-2 Comparison Table for Proposed NOC NOC Prototype [8] [9] Proposed NOC Technology Size 130nm 90nm 180nm Total cells Area 0.260 mm 2 0.082 mm 2 0.323 mm 2 Data Frequency 500 MHz 500 MHz 330 MHz The computation of the latency is found and the comparison graph for the same is shown in figure 10. The graph shows that the latency of the packet was increased in the conventional architecture that should be less which is achieved in the proposed NOC architecture. Finally efficient computation of the topology used in the proposed architecture was done and the comparison was done between the mesh and 2D-mesh topology. The Gate count for 2D-mesh was found to be far less than the traditional topology, and dynamic power consumption is increased in the conventional topology and reduced in better manner in our proposed architecture. Table-3 Comparison Table for Topology using Cyclone 1V E FLITS Mesh 2D-Mesh Gate Count Dynamic Power (mW) Gate Count Dynamic Power (mW) 2 32750 11.53 18368 6.99 4 46598 20.40 28654 12.29 8 75382 36.88 47038 22.32 16 134479 69.45 85362 42.03 32 250559 134.36 163330 81.27 64 490820 261.65 316173 159.33 Figure-10 Comparison graph for Latency Research Journal of Engineering Sciences________________________________________________________ ISSN 2278 – 9472 Vol. 3(2), 6-11, February (2014) Res. J. Engineering Sci. International Science Congress Association 11 Conclusion We develop a Crypto-Processor implemented on System On Chip (SoC) using novel Network On Chip (NOC) which is implemented with 2D-Mesh Topology, Wormhole Switching, and effective routing techniques for Cryptographic application. Then demonstrate it performance base on the parameter like Latency, Throughput, Area, Frequency and technology. Finally we find the proposed approach is efficient in terms of cycle time, throughput and area. References 1.Advances in Ultralightweight Cryptography for Low-cost RFID Tags: Gossamer Protocol Pedro Peris-Lopez, Julio Cesar Hernandez-Castro, Juan M. E. Tapiador, and Arturo Ribagorda (2009)2.W.J. Dally and B. Towles, Route Packets, Not Wires: On-Chip Inteconnection Networks, in the 38th annual Design AutomationConference, 684-689 (2001)3.Guerrier P. and Greiner A., A Generic Architecture for On-Chip Packet-Switched Interconnections," in Design, Automation and Test in Europe (DATE), 250-256 (2000)4.S. Bell et al., "TILE64 Processor: A 64-Core SoC with Mesh Interconnect," Solid-State Circuits Conference, 2008. Digest of Technical Papers. IEEE International, 88-598 (2008)5.S. Vangal et al., An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS, Solid-State Circuits Conference, 2007. Digest of TechnicalPapers. IEEE International, 98-589 (2007)6.S. Panasenko and S. Smagin. Energy-efficient cryptography: application of KATAN. SoftCOM 2011, 19 International Conference on Software, Telecommunications and Computer Networks. Split – Hvar – Dubrovnik, September 15-17, 2011, Proceedings (SS2 – Special Session on Green Networking) (2011)7.E. Volte. CRUNCH. A SHA-3 Candidate. // Available at http://www.voltee.com – 27 February 2009 (2009) 8.E. Rijpkema, K. Goossens, A. Radulescu, J. Dielissen, J. van Meerbergen, P. Wielage and E. Waterlander, Trade-offs in the design of a router with both guaranteed and best-effort services for networks on chip, IEE Proc. Computers and Digital Techniques,150(5), 294-302 (2003)9.M. Panades, A. Greiner and A. Sheibanyrad, A Low Cost Network-on-Chip with Guaranteed Service Well Suited to the GALS Approach, Proc. the 1st Int’l Conf. and Workshop on Nano-Networks), 1-5, (2006)