ABSTRACT
Radio communication accounts for the highest energy consumption in a wireless sensor node. To reduce the energy and power, a round-robin algorithm is used. This paper describes the design and implementation of the newly proposed folded tree architecture. The folded tree operates in two phases, the trunk phase and the twig phase. The power is typically reduced by 60-70% compared with existing methods. Measurements of the silicon implementation show an improvement of 10-20× in energy compared with traditional microcontrollers found in sensor nodes. This technique is commonly used in wireless communication.
EXISTING SYSTEM
In the existing method a binary tree is used. The disadvantage of this method is that at any given time only one node acts as the root while the other nodes act as leaves, so only one data item is sent at a time. Hence both the power and the energy consumption are high, the time requirement is high, and the interconnect is complex. The proposed approach limits the power and energy: the time requirement is low, while the number of interconnection paths is increased. Therefore, a folded tree architecture is proposed to send the data using a wireless communication technique.
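The level-by-level funneling described above can be sketched in software (an illustrative model, not the paper's hardware): a binary-tree reduction combines leaf values pairwise until a single root value remains, and at each level only half of the nodes are active.

```python
# Illustrative binary-tree reduction: all data funnels toward one root,
# so per level only half of the remaining nodes do useful work.
def binary_tree_reduce(values, op=lambda a, b: a + b):
    """Reduce a list of leaf values level by level toward one root."""
    level = list(values)
    while len(level) > 1:
        # each parent combines two adjacent children
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd element passes up unchanged
        level = nxt
    return level[0]

print(binary_tree_reduce([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

For n leaves this takes log2(n) levels, which is why only one result emerges per traversal of the tree.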
EXISTING SYSTEM DRAWBACKS
The gate count is high, so the design is not area efficient.
The latency and critical path of the designs are very high.
PROPOSED SYSTEM:
•
In the proposed system we apply the round-robin algorithm to the input nodes. This overcomes the drawbacks of the existing system. In this method we create three tree-based inputs, and those three inputs are connected to the outputs in parallel using a mux structure.
•
At the same time we apply addressing in order to reduce the overhead.
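The round-robin arbitration of the three tree inputs can be sketched as follows (a behavioral model for illustration, not the RTL): a rotating priority pointer picks one requesting input per cycle to drive the shared mux, and the last winner gets the lowest priority on the next cycle.

```python
# Behavioral sketch of a round-robin arbiter for N inputs sharing one
# output mux. The pointer rotates so every requester gets fair access.
class RoundRobinArbiter:
    def __init__(self, n_inputs=3):
        self.n = n_inputs
        self.pointer = 0  # input index with the highest priority this cycle

    def grant(self, requests):
        """requests: list of bools, one per input. Returns granted index or None."""
        for offset in range(self.n):
            idx = (self.pointer + offset) % self.n
            if requests[idx]:
                # the winner becomes lowest priority next cycle
                self.pointer = (idx + 1) % self.n
                return idx
        return None  # no input is requesting

arb = RoundRobinArbiter()
print(arb.grant([True, True, True]))  # 0
print(arb.grant([True, True, True]))  # 1
print(arb.grant([True, True, True]))  # 2
```

With all three inputs requesting continuously, the grant sequence cycles 0, 1, 2, 0, …, so no input can starve the others.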
PROPOSED SYSTEM ADVANTAGES:
•
Lower area-time-power complexity compared with the existing designs.
•
Overall power is also lower compared with the existing design.
HARDWARE REQUIREMENT:
•
FPGA Spartan 3 / Spartan 3AN
SOFTWARE REQUIREMENTS:
•
ModelSim 6.4c
•
Xilinx ISE 9.1/13.2
Introduction & Problem Formulation
The transition from the 20th to the 21st century witnessed the emergence of low-cost, low-power, and miniature-sized electronics, enabling attractive solutions for numerous new application areas as well as facilitating the improvement of several existing ones. One
such example is the development of the Wireless Sensor Network (WSN), providing a
low-cost alternative to both manual monitoring solutions and traditional
infrastructure-based monitoring solutions, in which both the power and the data
are required to be transported over a physical media, such as cables. In
contrast to an infrastructure-based monitoring network, a WSN is comprised of
spatially distributed sensor nodes that, in addition to sensing the
environment, are capable of communicating wirelessly to transport the acquired
data to a desired destination. In addition, the wireless communication in a WSN
also provides the means to establish a self-organizing wireless monitoring
network. A WSN finds applications in numerous fields spanning from home automation to industrial monitoring and battlefield tracking [1]-[7].
Some of the notable applications include environmental monitoring [8]-[10],
fire detection in forests [11]-[13], structural health monitoring for buildings
and bridges [14]-[15], health care monitoring [16]-[17], industrial condition
monitoring [18]-[19], battlefield monitoring [20]-[22], precision agriculture
[23] and logistics monitoring [24].
In comparison to
infrastructure-based monitoring networks, the advantages associated with WSNs
are low-cost, ability to self-organize, scalability and ease of deployment
[25]-[28].
As the sensor nodes in a WSN communicate wirelessly and the means of energy (for example, batteries) are integrated within them, the cost associated with the development and maintenance of communication and power related infrastructure is low as compared to infrastructure-based networks. In addition, sensor nodes typically have a simple design that leads to low-cost development and thus enables the realization of cost-effective WSNs.
The ability of wireless sensor nodes to self-organize and establish a network
plays an important role in achieving highly scalable monitoring networks. In
contrast to infrastructure-based networks, the low-complexity and the
associated cost of adding new sensor nodes in a WSN provides the means to
achieve a highly scalable solution. The scalability enables applications in which sensor nodes may be added or removed without halting the operation of the network; in certain cases this characteristic may even be essential to achieving the desired goals. For example, in monitoring and tracking
applications consisting of non-stationary sensor nodes, some nodes may
collaborate in one network for a specific time or event and then leave that
network and join another nearby WSN.
In relation to
physical deployment, wireless sensor nodes provide high-flexibility as compared
to infrastructure-based monitoring solutions. For example, in addition to
regular deployment in which sensor nodes are placed according to pre-planned
fixed locations, wireless sensor nodes can also be deployed in a random manner.
Such flexibility could be highly desirable in certain applications, for
example, in order to detect and monitor fire in a forest; such nodes can be
dropped from an airplane. In addition, infrastructure-less operability of
wireless sensor networks enables the sensor nodes to be deployed in
hard-to-reach as well as non-stationary locations.
In
a wireless monitoring application that may monitor the desired parameters at a
few locations over a small geographical area or at hundreds of locations over a
large area, it is a wireless sensor node that plays a central role in realizing
monitoring solutions based on WSNs. A typical wireless sensor node is a compact
sized hardware unit that acquires desired data from its environment and
communicates wirelessly to other nodes in a network so as to relay that data or
extracted information to a central station. The architecture and data flow of a
typical wireless sensor node [29] is shown in Figure 1.1. Depending upon what
and how many parameters are to be monitored in an application, it may consist of one or more transducers that measure physical phenomena and produce equivalent electrical outputs, that is, in the form of an electrical current or voltage. The output
signal from a transducer, which is typically low in amplitude, is amplified
with the assistance of signal conditioning circuits so as to ensure that it
matches the requirements of the digitization circuits. In order to achieve cost-effective, high-spatial-resolution monitoring through densely deployed sensor nodes, which is one of the main objectives of WSNs [35]-[37], both the size and
the cost of a sensor node are desired to be as low as possible. The small size and low cost, in turn, set constraints on the size, type and capacity of the associated energy source, i.e., a battery or an alternative energy-scavenging unit that can be integrated in a wireless sensor node. Given the limited amount
of energy available in typical small size batteries and the low power-to-area
(or volume) ratio of alternate energy sources [38]-[41], a small size and
low-cost wireless sensor node is restricted to operate on a limited energy
budget.
In order to achieve a long
operation lifetime with a limited energy budget, a wireless sensor node is
typically designed using modules that offer ultra low-power characteristics and
enable efficient power management [34],[42]-[43]. A wireless sensor node built
on such low-power modules typically lacks high performance capabilities. For
example, low-power micro-controllers that are typically used in wireless sensor
nodes provide limited computational performance in terms of operating frequency
and on-chip memory [44]. Similarly, the low-power and low-cost wireless
transceivers provide limited communication data rates. Thus, a WSN, consisting
of typical wireless sensor nodes that are constrained by energy, processing,
and communication resources [44]-[45], is better suited to applications that
acquire data at a low sample rate and operate in an intermittent pattern
[27],[46]-[48]. In such applications, a small amount of data, acquired
periodically, or on the basis of an event at a sensor node is wirelessly
communicated.
As low-sample-rate applications require only small amounts of data to be communicated, which can easily be accomplished with the performance provided by low-power processing and communication units, the use of such low-power resources enables energy consumption to be minimized. Using such low-power resources for data-intensive monitoring applications, however, not only restricts the achievement of the desired monitoring goals but also leads to higher energy consumption. For example, low-power and
low-cost transceivers that are used in typical wireless sensor nodes lack the
required bandwidth to communicate the large amount of data across the network.
Even with intermittent monitoring, a large size memory is required to store the
raw data so as to transmit it using a low-data rate wireless transceiver. In
that case, the long transmission time causing high energy consumption leads to
a short operational lifetime. On the other hand, in an endeavour to suppress
the raw data by computing the useful information at each sensor node, the
processing performance and amount of memory in low-power micro-controllers that
are typically used in wireless sensor nodes is often insufficient to realize
applications with the large amount of data that is required to be processed
using complex processing algorithms. Improving the communication bandwidth and/or processing capabilities by integrating powerful resources at the sensor node level typically leads to higher power consumption and thus must be assessed from an energy consumption perspective.
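To make the trade-off above concrete, a back-of-the-envelope sketch can compare the transmit energy of sending raw data over a low-rate transceiver with processing locally and sending only a small result. All figures below (data sizes, 250 kbps rate, 60 mW transmit power, 10 mJ processing cost) are assumed for illustration, not measured values.

```python
# Transmit energy = time on air * radio transmit power.
def tx_energy_j(bits, rate_bps, tx_power_w):
    return (bits / rate_bps) * tx_power_w

RAW_BITS = 1_000_000     # e.g. a burst of high-rate samples (assumed)
RESULT_BITS = 1_000      # e.g. a handful of extracted features (assumed)
RATE = 250_000           # 250 kbps, a typical low-power radio data rate
TX_P = 0.06              # ~60 mW while transmitting (assumed)

raw = tx_energy_j(RAW_BITS, RATE, TX_P)
# local processing cost assumed to be 10 mJ for this sketch
processed = tx_energy_j(RESULT_BITS, RATE, TX_P) + 0.01
print(f"raw: {raw*1000:.0f} mJ, processed: {processed*1000:.1f} mJ")  # raw: 240 mJ, processed: 10.2 mJ
```

Even with a generous allowance for on-node computation, suppressing the raw data before transmission wins by more than an order of magnitude under these assumptions, which is the crux of the argument above.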
With
regards to the constraints of a wireless sensor node, as depicted in Figure
1.2, this thesis explores an energy efficient architecture that can fulfil the
processing and communication requirements of data and computation intensive
monitoring applications while enabling a practically feasible operational
lifetime.
1.1 Data
and Computation Intensive Monitoring Applications
In
principle, any monitoring application that requires the monitoring of
parameters for which required transducers are available can potentially be
realized with a WSN. A huge collection of transducers [49] exists that can be incorporated in a wireless sensor node and, therefore, the potential applications that could benefit from the adoption of WSNs are limited only by one's imagination. In this section, an overview of potential application areas for WSNs, followed by an application classification and a design space relating to data and computation intensive wireless monitoring applications, is provided.
1.1.1
Overview of applications domains of WSN
The
potential applications of WSNs are limitless. Nevertheless, the applicability
of WSNs can be briefly summarized in terms of existing real world application
fields.
1.1.1.1 Military
applications
In fact, the concept of the (distributed) sensor network (DSN) can be traced back to the late 1970s, when researchers working on a military project funded by DARPA identified the components associated with a DSN for monitoring applications
[50]. The practical utilization was demonstrated by employing a custom-built prototype system, relying mainly on acoustic sensors, for tracking aircraft flying at low altitude. Despite the successful demonstration, the technology at
that time was not considered ready to expand the concept of distributed sensor
network both in terms of a large scale and across different fields. However,
with the recent advances in technology, numerous military related applications,
which in a broader spectrum are those defining the monitoring and tracking of
enemy lines as well as the surveillance of one’s own territory, are being
realized with WSNs. Some of the examples in this domain include WSN-based
counter sniper system [51], surveillance [20], and tracking vehicles and
troops’ movement [21]-[22].
1.1.1.2 Environmental monitoring
With the emergence of low-power,
low-cost, and miniature size electronics, the prospective benefits associated
with DSNs have attracted both the research and business communities to extend the concept to numerous fields, including environmental
monitoring [8]-[11]. In comparison to traditional environmental monitoring, which relies on a small number of highly accurate but expensive sensing systems, a WSN enables cost-effective deployment on a large scale [52]. Besides monitoring
basic environmental parameters such as temperature, pollution, humidity, etc. to inform inhabitants about their surroundings, other applications in this
domain include monitoring of green house gases [53], habitat monitoring
[37],[54], forest fire detection [11]-[13], agriculture [23], etc.
1.1.1.3 Home
automation
In relation to
enabling smart homes, in which appliances are connected to a network so as to
manage them intelligently, WSNs are considered an attractive choice [55].
1.1.1.4 Remote health
monitoring
In relation to
personal health care applications [55], continuous monitoring of patients using
WSN [16]-[17], [56]-[58] could be used to detect emergency conditions early on
and thus provide the necessary treatment. In addition, such solutions can also perform additional activities such as reminding patients to take their medicines in a timely manner, managing their home appliances, etc. In comparison to keeping a patient in a hospital, a WSN could
potentially facilitate the patients in performing their regular activities at
home or work while being continuously examined.
1.1.1.5 Business
related applications
WSNs can be incorporated in different businesses so as to increase production as well as to improve management and delivery systems. Example applications in this regard include monitoring soil and crop related parameters in the agriculture business [23],[59], monitoring and tracking of livestock in farming [60]-[62], keeping track of both the quality and the geographical parameters of logistics [24], industrial monitoring and control, managing inventory related tasks [63], etc.
1.1.1.6
Industrial process management and control
There are a number of industrial applications [64]-[67] that can benefit from the use of WSNs.
Examples include real-time monitoring of operating machinery for possible
defects and/or performance degradation, monitoring different processes and
their states so as to better control them, enabling factory automation, process
control, monitoring of liquid and gas lines for possible leakages, monitoring of contaminated areas, and structural condition monitoring of buildings, mines, bridges, etc.
1.1.2
Classification
In order to assess the requirements and challenges associated with designing and deploying WSNs for applications of as diverse a nature as those mentioned above, it is important first to classify these applications by some measure of relevance so that a design space can be deduced for a given set of applications. In [1] and [68], on the basis of how a sensor node interacts with a data sink, the authors classified WSN applications into monitoring, tracking and event detection. Such an abstract-level classification does not provide the means to deduce an optimized design space for the set of applications addressed in this thesis.
Therefore, the potential applications of WSNs are classified in relation to
their (sensed) data size and computational complexity so as to explore an
energy efficient architecture for data and computation intensive applications.
For a diverse range of monitoring applications, as discussed in section 1.1.1,
a quantitative classification based on data size and computational complexity
is a challenging task. Instead, we opt to use a relative scale representing
low, medium, high and intensive data size and computational complexity.
1.1.2.1
Application with small data size and low computational complexity
A large number of monitoring applications require static sensor nodes, each of which monitors only a few parameters at a relatively low frequency. In addition, the acquired data, ranging from a couple of bytes to tens of bytes, are often transmitted without requiring any complex processing besides simple aggregation (addition/subtraction). Examples of such applications include general
environmental monitoring, monitoring of crop & soil in agriculture,
monitoring of green house gases, monitoring of different industrial processes,
etc.
1.1.2.2
Applications with medium to high data size and computational complexity
Applications such as habitat and logistics monitoring, in which sensor nodes may be carried from one place to another, require a sensor node to continuously discover its surrounding nodes so as to relay the acquired data. In such an application, the size and computational complexity of the data may often be low; however, the frequency of acquisition and transmission could be irregular, varying the size of the data to be handled at a given time. Apart from acquiring the desired data at a low sample rate, providing geographically related support, for example through the Global Positioning System (GPS), could increase the data size and the complexity of processing it [10].
In applications such as tracking vehicles and troops, surveillance, structural health monitoring, automation and control, etc., the factors that determine the data size and computational complexity include the types and number of sensors used, the rate at which the data from each sensor are acquired, and the level of information that needs to be extracted from the raw data. Depending upon the objectives and requirements of a given application scenario, the amount of data and the complexity involved in processing that data may be handled easily by a typical low-power and low-cost wireless sensor node. In other cases, high-performance processing resources and/or communication transceivers may be required to realize such applications. However, the applications that certainly involve a large amount of acquired data and require intensive processing, as explained in the following section, are dealt with in this thesis.
1.1.2.3
Applications with intensive data and computational complexity
Monitoring applications which either acquire scalar data at a high sample rate (e.g. more than a few kHz) or in which the data associated with each sample are large (e.g. an image frame) are the best candidates for data-intensive applications in relation to the limited-data-rate transceivers [69] that are typically used in wireless sensor nodes. In addition, applications requiring complex processing so as to extract useful information from the large amount of acquired data fall into the computationally intensive category. Examples of such applications include vibration-based structural and machinery health monitoring, image-based process monitoring, video-based surveillance, etc.
In order to perform quantitative analysis based on actual implementations of data and computation intensive applications, case studies based on applications of high practical importance, i.e. industrial condition monitoring, are conducted in this thesis. In industrial monitoring applications, the most widely accepted methods for analyzing the operating condition of machinery are based on vibration and oil analysis [70]-[73]. In relation to WSNs, vibration-based monitoring, in particular multi-axis and high-frequency monitoring, generates large amounts of data and requires intensive signal processing to analyze that data [74]-[79].
In relation to industrial condition monitoring, oil analysis enables the machinery's wear and tear to be assessed by accounting for the debris/residual particles that detach from the machinery and circulate in the oil [80]-[83]. With camera-based wireless sensor nodes, by capturing images of oil passing through a small window, such as one of glass, debris/residual as well as other foreign particles present in the oil can easily be detected [84]-[86]. In a similar manner to high-frequency vibration-based condition monitoring, image-based oil analysis generates a large amount of data and requires intensive processing to analyze it.
1.1.3 Design space for data and
computation intensive applications
The design aspects of a typical
WSN as listed in [88] are equally important in developing WSNs for data and
computation intensive monitoring applications. However, the scope and emphasis
of certain aspects that are of high importance to data and computation
intensive applications, especially with regards to industrial condition
monitoring, are discussed in the following.
1.1.3.1 Energy
As shown in Figure 1.2, the infrastructure-less and distributed nature of operation of a wireless sensor node, together with its small size and low cost, sets constraints on the type and capacity of the energy sources that can be integrated in a sensor node. Given the low area/volume-to-energy ratio, high cost, and dependency on an appropriate environment of energy-scavenging methods [38]-[41], wireless sensor nodes are typically required to rely on the limited amount of energy provided through integrated batteries.
From the perspective of a wireless sensor node's energy consumption, especially with regard to the applications mentioned in section 1.1.2.3, the responsible modules are the sensor(s), the processor and the communication transceiver. The specific requirements of an application may dictate the choice of the sensors to be used, thus leaving few options to optimize the energy consumption associated with them. However, the energy consumption associated with processing and communication must be taken into consideration. At the wireless sensor node architecture level, this requires choosing appropriate processing algorithms, integrating low-power but energy efficient modules, applying energy conserving techniques, operating nodes in a duty-cycled manner, etc. [34],[89].
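The duty-cycled operation mentioned above can be quantified with a simple model: the average power is the duty-weighted mix of active and sleep power, and the lifetime follows from the battery's energy. All power and battery figures below are assumed for illustration only.

```python
# Average power under duty cycling: P_avg = P_active*d + P_sleep*(1-d).
def avg_power_w(p_active, p_sleep, duty):
    return p_active * duty + p_sleep * (1.0 - duty)

# Lifetime in days from a battery energy budget in joules.
def lifetime_days(battery_j, p_avg_w):
    return battery_j / p_avg_w / 86_400

P_ACTIVE = 0.05   # 50 mW while sensing/processing/transmitting (assumed)
P_SLEEP = 20e-6   # 20 uW in sleep mode (assumed)
BATTERY = 2 * 1.5 * 2.6 * 3600  # two AA cells, ~2.6 Ah at 1.5 V, in joules

for duty in (0.01, 0.001):
    p = avg_power_w(P_ACTIVE, P_SLEEP, duty)
    print(f"duty {duty:.1%}: avg {p*1e6:.0f} uW, ~{lifetime_days(BATTERY, p):.0f} days")
```

Under these assumptions, dropping the duty cycle from 1% to 0.1% stretches the lifetime from under two years to over a decade, which is why duty cycling dominates the energy-conservation techniques listed above.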
1.1.3.2 Communication modality
Data-intensive applications such as those mentioned in section 1.1.2.3 generate a large amount of data, typically in the range of hundreds of kbps to tens of Mbps. The choice of wireless communication technology, such as radio or optical, to communicate such a large amount of data hugely impacts both the reliable transfer of data to a destination and the associated energy consumption [90]. The various sources of interference in an industrial environment can also affect the performance of wireless communication [91]-[92] and thus must be taken into consideration. In addition, the size of the network, the coverage area, and any additional costs associated with obtaining wireless communication services, for example a licence for the frequency spectrum or other paid services, must be accounted for.
1.1.3.3 Quality
of services (QoS)
In addition to application-dependent QoS requirements such as robustness, tamper/eavesdropping resistance and unobtrusiveness, data and computation intensive applications generally tend to have real-time constraints, which could add to the existing constraints on processing performance and/or communication throughput.
1.1.3.4
Operational lifetime
Mainly derived from the available amount of energy and its consumption for a given application, the operational lifetime highly influences the maintenance cost. With regard to industrial monitoring applications, sensor nodes may be deployed on continuously operating machinery as well as at non-stationary and hard-to-access locations, and replacing their batteries would thus require the suspension of an on-going activity. In order to reduce such a high maintenance cost, an operational lifetime of at least several years could be highly desirable.
1.1.3.5 Cost and
size
The infrastructure-less nature of WSNs generally enables a reduction in the cost associated with installing and maintaining infrastructure. The cost of the individual sensor nodes and the associated maintenance, for example replacing batteries, must, however, be taken into consideration. Therefore, for a widespread use of wireless sensor technology, it is important not only that the cost of the individual sensor nodes is reasonable for the given applications but also that a practically viable operational lifetime can be achieved.
In relation to industrial condition monitoring, in which sensor nodes might be deployed at space-constrained locations on the machinery, a small size and a low weight are desirable to enable effective monitoring of the desired locations without causing mechanical imbalances.
1.2 Problem
statement
A WSN, as an infrastructure-less network that requires no physical connectivity to enable communication and energy supply amongst its nodes, promises to enable cost-effective monitoring solutions both for existing wire-based monitoring applications and for those in which wired monitoring had not previously been feasible. However, the untethered energy supply, in conjunction with low cost and small size, led to the development of wireless sensor nodes with restricted energy, processing and communication resources, thus limiting WSNs to low-sample-rate intermittent monitoring applications.
In order to
analyze the resources of such wireless sensor nodes in relation to realizing
data and computation intensive monitoring applications, firstly, some of the
popular wireless sensor nodes are briefly described. Following on from that,
the problem description is continued.
1.2.1 An
overview of existing wireless sensor nodes
Starting from WeC, the first prototype sensor node, which was developed at UC Berkeley in 1998, a large number of wireless sensor nodes have been developed as a result of academic research and commercial activity. Some of the popular and relevant ones in this regard, as summarized in Table 1.1, are discussed in the following.
1.2.1.1 Mica
motes
The WeC integrated an 8-bit micro-controller, the AT90LS8535 from Atmel, that was operated at 4 MHz and had 512 B of internal Static Random Access Memory (SRAM). In order to perform wireless communication, the WeC was integrated with a radio transceiver, the TR1000, operating in the 915 MHz frequency band and providing a transmission rate of 10 kbps. Based on the development of the WeC, a number of sensor nodes with varying design choices relating to communication, micro-controller, memory, size, etc. were developed at UC Berkeley and by a commercial venture, Crossbow. These sensor nodes, such as Rene, Mica, Mica2, Mica2Dot, MicaZ and Telos, integrated 8/16-bit low-power micro-controllers with SRAM of up to 10 kB, and radio transceivers with a maximum data rate of 250 kbps. The detailed specifications for each of these can be found in Table 1.1. The typical power consumption of these nodes is less than 100 mW [95].
1.2.1.2 Eyes
Based on the 16-bit micro-controller MSP430 from Texas Instruments, the Eyes nodes [69] were developed by Infineon as part of a European Union funded project to enable energy-efficient sensor networks. The architecture of the EYES is quite similar to that of the Mica motes; however, the latest versions of the EYES motes, EYESIFXv1 and EYESIFXv2, were equipped with radio transceivers from Infineon that support data transmission rates of up to 64 kbps.
CHAPTER
II
RELATED
WORK
This thesis draws on and builds upon a variety of related and prior work in the areas of network-on-chip buffer design, virtual channel allocation (VA) and switch allocation (SA) in the router. Prior studies provide concepts and systems that will be implemented and further refined in the course of the proposed research.
A. Network-on-Chip Topology W. J. Dally introduced Network-on-Chip communication, and especially the 2D torus architecture, in [2]. Kumar et al. [5] used a 2-D tile-based architecture adopting a mesh-based topology. Mesh and torus are popular NoC topologies, and they have different features in terms of throughput, power consumption, and latency depending on the routing algorithms [6]. In addition, the SPIN network uses the fat-tree topology in [3], and an octagon topology is proposed in [7]. Each topology has its own characteristics. Among these topologies, many designers like to use the mesh topology because of its simplicity. B. Routing Strategy A packet is routed through the network according to a routing
strategy. The routing algorithms could be one of the following two strategies.
Deterministic routing, such as XY routing, is when the routes between given pairs of nodes are pre-programmed and thus always follow the same path between two nodes. Adaptive routing is when the path taken by a packet may depend on other packets, and each router must know the network traffic status in order to avoid congested regions in advance [2].
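Deterministic XY routing on a 2-D mesh can be sketched as follows (an illustrative model, not a particular router's RTL): the packet is routed fully along the X dimension first, then along Y, so a given source/destination pair always takes the same path.

```python
# XY routing on a 2-D mesh: exhaust the X offset before moving in Y.
def xy_route(src, dst):
    """src, dst: (x, y) router coordinates. Returns the list of hops taken."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because the dimension order is fixed, XY routing needs no knowledge of network traffic and is deadlock-free on a mesh, which is exactly the contrast with the adaptive routing described above.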
C. Dynamic Virtual Channel Allocation A generic router generally uses a statically allocated buffer, which can cause the Head-of-Line (HoL) blocking problem. [8] proposes buffer customization, which decreases the queue blocking probability in order to improve the network performance. A scheme called the dynamic virtual channel regulator (ViChaR) is proposed in [9]. In
the ViChaR, VCs are allocated dynamically, and the buffer allocation for each VC can differ depending on the network traffic. For example, many shallow VCs are more efficient under light traffic, while fewer, deeper VCs are more efficient under heavy traffic. In addition, an on-chip network router meeting the traffic demand must be designed with the least number of buffers, since the power consumption of the buffers dominates that of all the other logic such as the VA, SA, and crossbar [9]. ViChaR proposes a method of increasing buffer utilization and decreasing overall power consumption. According to [9], the area overhead and extra power consumption are trivial: there is a 4 percent reduction in logic area and a minimal 2 percent power increase compared to an equal-size generic buffer implementation. In particular, it reports a 25 percent increase in performance with the same amount of buffering [9]. A Dynamically
Allocated Multi-Queue (DAMQ) buffer architecture is presented in [10]. The DAMQ has a unified and dynamically allocated buffer structure. The concept of DAMQ is similar to that of ViChaR. DAMQ uses a fixed number of queues per input port, but this can cause the HoL blocking problem. Therefore, ViChaR assigns the buffer resources to each of the VCs according to the network traffic in order to solve the HoL problem. The Fully Connected Circular Buffer (FC-CB) is explained in [11]. FC-CB basically uses a Dynamically Allocated Fully Connected (DAFC) method [12] in order to provide flexibility under diverse traffic by adopting wormhole routing and virtual channels. Hence, the FC-CB provides a low average message latency and high throughput even under heavy traffic. However, the FC-CB structure has a fixed number of VCs and complex logic to control the circular buffer, and thus causes higher dynamic power consumption.
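The shared-pool idea behind these dynamic schemes can be sketched in a simplified model (a sketch in the spirit of ViChaR [9], not the published microarchitecture): a unified pool of flit slots is handed out to whichever VC needs one, instead of statically splitting the pool among a fixed set of VCs.

```python
# Simplified unified buffer pool: flit slots are allocated to VCs on
# demand, so a busy VC can use capacity an idle VC would have wasted.
class UnifiedBufferPool:
    def __init__(self, total_slots):
        self.free = total_slots
        self.per_vc = {}            # vc id -> slots currently held

    def alloc(self, vc):
        """Try to give one flit slot to `vc`; False if the pool is exhausted."""
        if self.free == 0:
            return False            # the flit must wait upstream
        self.free -= 1
        self.per_vc[vc] = self.per_vc.get(vc, 0) + 1
        return True

    def release(self, vc):
        """Return one of `vc`'s slots to the shared pool."""
        assert self.per_vc.get(vc, 0) > 0
        self.per_vc[vc] -= 1
        self.free += 1

pool = UnifiedBufferPool(total_slots=8)
for _ in range(6):                  # heavy traffic on a single VC
    pool.alloc(vc=0)
print(pool.per_vc[0], pool.free)    # 6 2
```

With a static split of 8 slots over 4 VCs, VC 0 would have stalled after 2 flits; the shared pool lets it absorb the burst, which is the buffer-utilization gain the dynamic schemes above exploit.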
CHAPTER III
BACKGROUND
This chapter provides background information on the Network-on-Chip research areas addressed throughout the thesis. On-chip networks share many concepts with the interconnection networks of traditional multiprocessor systems. Networks are typically categorized by four key properties: topology, switching technique, routing protocol, and flow control mechanism.
A.
Network Property
1.
Topology
Mesh and torus network topologies are regarded as the best choices for a NoC [2]. These two network topologies have the simplicity of a 2-D square structure. Figure 2 (a) shows a 2-D mesh network structure [5]. It is composed of a grid of horizontal and vertical lines with a router at each intersection. The mesh topology is the most widely used since the delay among routers can be predicted at a high level. A router address is computed from the number of horizontal nodes and the number of vertical nodes. The 2-D torus topology [2] is a donut-shaped structure formed by taking a 2-D mesh and connecting its opposite sides, as we can see in Figure 2 (b). This topology has twice the bisection bandwidth of a mesh network at the cost of doubled wire demand, but the nodes should be interleaved so that all inter-node links have the same length. In addition to the mesh and torus network
topologies,
a fat-tree structure [3] is used. In M-ary fat-tree structure, the number of
connections between nodes increases with a factor M towards the root of the
tree. By wisely choosing the fatness of links, the network can be tailored to
efficiently use any bandwidth. An octagon network was proposed by [7]. Eight
processors are
linked
by an octagonal ring. The delays between any two nodes are no more than two hops
within the local ring. The advantage of an octagon network has scalability. For
example, if a certain node can be operated as a bridge node, more Octagon
network can be added using this bridge node. Figure 2 (c) and (d) show binary
fat-tree and octagon topologies.
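The mesh-versus-torus trade-off above can be illustrated with a small sketch (Python, not from the thesis; the coordinate pairs and network size are assumed for the example):

```python
# Sketch (not from the thesis): hop distance between two routers in a
# 2-D mesh versus a 2-D torus of the same size. The wrap-around links
# of the torus shorten paths that would otherwise cross the whole mesh.

def mesh_hops(src, dst):
    (sx, sy), (dx, dy) = src, dst
    return abs(sx - dx) + abs(sy - dy)

def torus_hops(src, dst, width, height):
    (sx, sy), (dx, dy) = src, dst
    # In each dimension, take the shorter of the direct and wrap-around path.
    hx = min(abs(sx - dx), width - abs(sx - dx))
    hy = min(abs(sy - dy), height - abs(sy - dy))
    return hx + hy

# Corner-to-corner in a 4x4 network: 6 hops in the mesh, 2 in the torus.
print(mesh_hops((0, 0), (3, 3)))         # 6
print(torus_hops((0, 0), (3, 3), 4, 4))  # 2
```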
2.
Switching Technique
Switching mechanisms determine how network resources are allocated for data
transmission when an input channel is connected to the output channel selected
by the routing algorithm. There are four popular switching techniques:
store-and-forward, virtual cut-through, wormhole switching, and circuit
switching [13]. The first three are categorized as packet-switching methods. In
the store-and-forward switching method, the entire packet has to be stored in
the buffer when it arrives at an intermediate router. Only after the whole
packet has arrived can it be forwarded to a neighboring node that has enough
buffering space available to store the entire packet. This switching technique
therefore requires buffering space larger than the largest packet, which
increases the on-chip area. In addition to the area cost, it can cause large
latency because a packet cannot traverse to the next node until the whole packet
is stored. Figure 3 shows the store-and-forward switching technique and its flow
diagram.
In order to solve the long latency problem of the store-and-forward scheme,
virtual cut-through switching [14] stores a packet at an intermediate node only
if the next routers are busy; otherwise the node forwards the packet while it is
still receiving it. However, it still requires a lot of buffering space in the
worst case. Figure 4 shows the timing diagram for the virtual cut-through
switching method. The requirement for large buffering space can be removed using
the wormhole switching method [15]. In wormhole switching, packets are split
into flow control digits (flits) which are snaked along the route in a pipeline
fashion. Therefore, a router does not need large buffers for whole packets but
only small buffers for a few flits. A header flit builds the routing path that
allows the other data flits to traverse it. A disadvantage of wormhole switching
is that the length of the occupied path is proportional to the number of flits
in the packet. In addition, if the header flit is blocked by congestion, the
whole chain of flits stalls, which in turn blocks other flits. This can lead to
deadlock, where the network stalls because all buffers are full and a circular
dependency exists between nodes. The concept of virtual channels [15] was
introduced to provide deadlock-free routing in wormhole-switched networks by
splitting one physical channel into several virtual channels. Figure 5 shows the
concept of a virtual channel. For real-time streaming data, circuit switching
supports a reserved, point-to-point connection between a source node and a
target node. Circuit switching has two phases: circuit establishment and message
transmission. Before message transmission, a physical path from the source to
the destination is reserved: a header flit arrives at the destination node, and
an acknowledgement (ACK) flit is sent back to the source node. As soon as the
source node receives the ACK signal, it transmits the entire message at the full
bandwidth of the path. The circuit is released by the destination node or by a
tail flit. Even though circuit switching has the overhead of the circuit
connection and release phases, it can be used when a data stream is large enough
to amortize that overhead. Since most Network-on-Chip systems need little
buffering space and have low latency requirements, the wormhole switching method
with virtual channels is the most suitable switching method.
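The latency advantage of the pipelined techniques can be sketched with a simple unit-time model (an assumption for illustration, not a model from the thesis): one flit crosses one link per cycle, and router processing time is ignored.

```python
# Sketch (assumed unit-time model): cycles to deliver a packet of
# `flits` flits over `hops` routers at one flit per cycle per link.

def store_and_forward_cycles(hops, flits):
    # Each router must receive the entire packet before forwarding it.
    return hops * flits

def wormhole_cycles(hops, flits):
    # Flits stream through the routers in a pipeline fashion, so only
    # the head flit pays the full hop count.
    return hops + flits - 1

print(store_and_forward_cycles(4, 8))  # 32
print(wormhole_cycles(4, 8))           # 11
```

Under this model the store-and-forward latency grows with the product of hop count and packet length, while the wormhole latency grows only with their sum.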
3.
Routing Protocol
A routing protocol specifies how routers communicate with each other to diffuse
the information that allows them to select routes between any two nodes on a
network. In general, a routing protocol can be either deterministic or adaptive.
In deterministic routing, such as XY routing, the routes between given pairs of
nodes are pre-programmed and thus always follow the same path between two
nodes. This can cause congested regions in the network and poor utilization of
the network capacity. In adaptive routing, on the other hand, the path taken by
a packet may depend on other packets in order to improve performance and fault
tolerance. In an adaptive router, each router must know the network traffic
status in order to avoid congested regions in advance [16]. In addition, modules
which need heavy intercommunication should be placed close to each other to
minimize congestion. [16] states that adaptive routing can support higher
performance than the deterministic routing method while keeping the network
deadlock-free. However, higher performance requires a larger number of virtual
channels [17], which in turn can cause long latency because of design
complexity. Therefore, if the network traffic is not heavy and in-order packet
delivery is required, deterministic routing may be the better choice.
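The XY routing mentioned above can be sketched behaviorally as follows (Python; the port names and coordinate convention are assumptions for the example):

```python
# Sketch (hypothetical port names): deterministic XY routing. A packet
# travels fully along the X dimension first, then along Y, so a given
# source/destination pair always follows the same path.

def xy_route(cur, dst):
    (cx, cy), (dx, dy) = cur, dst
    if cx < dx:
        return "EAST"
    if cx > dx:
        return "WEST"
    if cy < dy:
        return "NORTH"
    if cy > dy:
        return "SOUTH"
    return "LOCAL"   # deliver to the attached processing element

print(xy_route((1, 1), (3, 2)))  # EAST  (X is corrected first)
print(xy_route((3, 1), (3, 2)))  # NORTH (then Y)
```

Because the output port depends only on the current and destination coordinates, no traffic information is needed, which is exactly why XY routing cannot steer around congested regions.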
4.
Flow Control Mechanism
Figure 6 shows the units of resource allocation. A message is a contiguous
group of bits that are delivered from a source node to a destination node. A
packet is the basic unit of routing, and a packet is divided into flits. A flit
(flow control digit) is the basic unit of bandwidth and storage allocation.
Flits therefore do not contain any routing or sequence information and have to
follow the route of the whole packet. A packet is composed of a head flit, body
flits (data flits), and a tail flit. The head flit allocates channel state for a
packet, and the tail flit de-allocates it. The typical flit size ranges from 16
bits to 512 bits. A phit (physical transfer digit) is the unit that can be
transferred across a channel in a single clock cycle. The typical phit size
ranges from 1 bit to 64 bits. Flow control can be examined together with the
switching technique.
The role of a flow control mechanism is to decide which data is serviced first
when a physical channel has many data to transfer. In the store-and-forward and
virtual cut-through switching methods, flow control is performed at the packet
level, which means that an entire packet is stored in buffers and forwarded to a
neighboring router that has buffers available. In the wormhole switching method,
the packet is split into flits and thus flow control executes at the flit level.
The first flit goes through the intermediate nodes to set up a path for the
following data flits, and a tail flit closes the path as it passes the
intermediate nodes. With virtual channels, a wormhole router can solve deadlock
problems [15]. Virtual channels share the same physical channel, but they are
logically separated with different input and output buffers.
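The message/packet/flit hierarchy above can be made concrete with a small sketch (Python; the function name and field layout are hypothetical, chosen only for illustration):

```python
# Sketch (assumed field layout): splitting a packet into head, body, and
# tail flits. Only the head flit carries routing information; body and
# tail flits simply follow the path the head flit set up.

def packetize(dest, payload_words):
    flits = [("HEAD", dest)]                   # head flit: allocates channel state
    for word in payload_words[:-1]:
        flits.append(("BODY", word))           # body flits: data only
    flits.append(("TAIL", payload_words[-1]))  # tail flit: de-allocates the channel
    return flits

flits = packetize(dest=(2, 3), payload_words=["d0", "d1", "d2"])
print(flits)
# [('HEAD', (2, 3)), ('BODY', 'd0'), ('BODY', 'd1'), ('TAIL', 'd2')]
```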
B.
Buffering in Packet Switches
In crossbar switch architecture, buffering is necessary to store packets,
because the packets which arrive at a node are unscheduled and must be
multiplexed according to control information. Three buffering cases occur in a
NoC router. First, an output port can receive only one packet at a time, so
when two packets arrive for the same output port at the same time, one must be
buffered. Second, when the next stage of the network is blocked, a packet in the
previous stage cannot be routed into the next router. Finally, when a packet has
to wait for arbitration to obtain a route path in the current router, the
current router must store the packet in a buffer. The buffer space can therefore
be located in three places: the Output Queue, the Input Queue, and the Central
Shared Queue.
1.
Output Queues
In buffer architecture, output queues can be used if the output buffers are
large enough to accept all input packets and the switch fabric runs at least N
times faster than the input lines in an N-by-N switch. However, since such
high-speed switch fabric is currently not available, and output queues would
need as many ports as the input lines can supply, output queue buffer
architecture incurs a large logic delay.
2.
Input Queues
Input buffers require only one input port in a packet switch because only one
packet can arrive at a time. Therefore, performance can be scaled up with many
input ports, which is why many researchers use input queue buffer architecture.
However, the input queue buffer architecture has the Head-of-Line (HoL)
blocking problem: while the packet at the head of a queue waits to get its
output port, another packet behind it cannot proceed to an idle output port.
HoL blocking significantly reduces throughput in a NoC. Figure 7 shows
Head-of-Line blocking in a NoC router environment.
3.
Shared Central Queues
All input ports and output ports can access the shared central buffer. For
example, if the number of input ports is N and the number of output ports is N,
the central buffer needs at least 2N ports to serve all input and output ports.
As N increases, the access time to the memory also increases, which brings
performance down, and this large access time is incurred whenever a packet is
transmitted. Hence, in addition to its implementation difficulties, the shared
central buffer also degrades performance because of its large access time [18].
C.
Diverse Input Port Buffers
1.
Single Input Queues
Figure 8 (A) shows that each input port has a single FIFO buffer and each FIFO
buffer receives packets from a previous node; the stored packets wait their
turn. In Figure 8 (A), the crossbar switch has 4 input and output ports [18].
2.
SAFC (Statically Allocated Fully Connected)
Each input port has a separate FIFO queue for every output port in order to
remove the Head-of-Line (HoL) blocking problem. Packets arriving at a
corresponding queue compete only for the same output, so the packet at the head
of a line cannot block other packets. Throughput in SAFC is higher than with a
single input queue since each input can transmit up to N packets at a time.
However, SAFC has several drawbacks. The first is inefficient buffer
utilization: the buffers are divided into four statically allocated queues, so
the available buffer space for each queue is only one quarter of the total, and
it is mandatory to route all packets in advance to know the destination output
port. The second disadvantage is design complexity, since separate buffers and
crossbars must be managed [18].
3.
SAMQ (Statically Allocated Multi-Queue)
SAMQ resolves the second disadvantage of SAFC, the design complexity. When
packets arrive at the buffers, one of the packets is forwarded to the crossbar,
so it is not necessary to manage N crossbars. However, SAMQ still has
inefficient buffer utilization [18]. Figure 8 (C) shows the SAMQ design.
4.
DAMQ (Dynamically Allocated Multi-Queue)
In the DAMQ scheme, each input port uses a single shared buffer space. To
increase buffer utilization, the queues within each input port are dynamically
allocated and maintained as linked lists. This dynamic buffer allocation
significantly increases buffer usage. The control logic consists of head and
tail pointers. Whenever a packet arrives, it is stored in the location indicated
by the head pointer, and while the packet is stored into free buffer space, its
destination output port number is determined. The tail pointer indicates the
location of the next packet to be sent to the crossbar.
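The dynamic allocation idea can be sketched as follows (a simplified Python illustration, not the cited design; the class and method names are hypothetical, and Python lists stand in for the hardware linked lists and pointers):

```python
# Sketch (simplified): a DAMQ-style buffer where one physical buffer
# pool is shared and each output port's queue is a list of slot
# indices. Free slots are tracked in a shared free list.

class DAMQBuffer:
    def __init__(self, slots):
        self.data = [None] * slots
        self.free = list(range(slots))   # indices of unused slots
        self.queues = {}                 # output port -> ordered slot indices

    def enqueue(self, out_port, flit):
        if not self.free:
            return False                 # whole pool exhausted: apply backpressure
        slot = self.free.pop(0)
        self.data[slot] = flit
        self.queues.setdefault(out_port, []).append(slot)
        return True

    def dequeue(self, out_port):
        q = self.queues.get(out_port)
        if not q:
            return None
        slot = q.pop(0)
        flit, self.data[slot] = self.data[slot], None
        self.free.append(slot)           # slot returns to the shared pool
        return flit

buf = DAMQBuffer(slots=4)
buf.enqueue("EAST", "f0")
buf.enqueue("WEST", "f1")
buf.enqueue("EAST", "f2")
print(buf.dequeue("EAST"))  # f0
print(buf.dequeue("WEST"))  # f1
```

Because every queue draws slots from the same pool, a busy output port can use buffer space that an idle port is not using, which is the utilization gain over the statically partitioned SAFC/SAMQ schemes.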
D.
Generic NoC Router
Generally, a NoC router has five input and output ports: one for the local
processing element (PE) and four for the directions North, South, West, and
East. Each router also has five components: the Routing Computation (RC) unit,
Virtual Channel Allocator (VA), Switch Allocator (SA), Flit Buffers (BUF), and
Crossbar, as we can see in Figure 9. When a header flit arrives at an internal
flit buffer, the RC unit directs the incoming flits to one of the physical
channels. The Virtual Channel Allocation unit receives credit information from
the neighboring routers, arbitrates among all the header flits that request the
same VCs, and then selects one of them according to the arbitration policy. The
header flit can thereby set up the path that the following data and tail flits
will traverse. The transmitting router sends control information to the
receiving router, and the receiving router may update the VC ID in its internal
buffer with this control information. The Switch Allocation (SA) unit arbitrates
among the waiting flits in all VCs that request the crossbar and allows only
one flit to get crossbar permission. The SA operation follows the VA stage,
since the flit data in the buffer comes from the previous router on the route.
The flit data then passes over the crossbar and can thus arrive at the
destination node.
ROUTER
IMPLEMENTATION
A.
Asynchronous Design
A Network-on-Chip router can be implemented with asynchronous design
methodologies, which use implicit or explicit data-valid signals instead of
clock signals. Asynchronous design has the following benefits [20]:
1.
No Clock Skew
The
first advantage is that asynchronous logic has no clock skew. Clock skew is a
phenomenon in synchronous circuits in which the clock signal arrives at
different components at different times. Therefore, asynchronous logic does not
have to worry about clock skew because it has no globally distributed clock.
2.
Low Power Consumption
The second advantage is low power consumption. Asynchronous logic switches only
when it is performing its own operation, while synchronous circuits are toggled
by the clock signal all the time, which causes dynamic power consumption.
3.
Low Global Wire Delay
A clock cycle of a synchronous circuit depends on the critical path delay.
Hence, setup and hold timing must be carefully controlled, and the logic on the
critical path must be optimized for the highest clock rate. In contrast, the
performance of an asynchronous circuit is determined by the speed of the circuit
path currently in operation.
4.
Automatic Adaptation to Variation
Combinational logic delay changes with variation in fabrication, temperature,
and power-supply voltage. A clock cycle must be chosen for the worst-case
variation, for example low voltage and high temperature. The delay of
asynchronous circuits, however, is determined by the current physical
conditions.
B.
An Arbiter Design
Many input ports acting as requestors want to access a common physical channel
resource. In this case, an arbiter is required to determine how the physical
channel is shared among the requestors. When designing arbitration logic,
several factors have to be considered.
1.
Basic Concept of Arbitration
If many flits arrive at the buffers from several virtual channels and these
flits are destined for one physical channel, an arbiter receives request signals
derived from buffer status, such as the FIFO empty and full signals. These FIFO
empty and full signals are generated by comparing the write and read pointers.
For example, if the write pointer has the same value as the read pointer, the
FIFO empty signal is asserted; if the write pointer has wrapped around and
caught up with the read pointer, the FIFO full signal is asserted. Figure 14
shows the general arbitration flow in a FIFO (First In First Out) based
request-and-grant system. When a new flit arrives at a FIFO, the write pointer
is incremented and a request signal is generated. The arbiter receives N request
signals and grants only one buffer, and this grant signal increments the read
pointer of the corresponding FIFO. This type of arbitration flow is used to
implement the VA and SA logic of a NoC router. Fairness is a key property of an
arbiter: a fair arbiter provides equal service to the different requests. In a
FIFO environment, requesters are served in the order in which they made their
requests. Even if an arbiter itself is fair, the system cannot be fair when the
traffic congestion in the NoC environment is not. Figure 13 shows an example of
arbitration with FIFOs.
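The pointer comparison can be sketched as follows (Python; the extra wrap bit on each pointer is a common hardware FIFO encoding and is an assumption here, since the thesis does not give the exact implementation):

```python
# Sketch (assumed pointer encoding): FIFO empty/full detection with an
# extra wrap bit, i.e. pointers count modulo 2*DEPTH. The FIFO is empty
# when the pointers match exactly, and full when they address the same
# slot but differ in the wrap bit (the writer has lapped the reader).

DEPTH = 4  # pointers take values 0 .. 2*DEPTH - 1

def fifo_empty(wr_ptr, rd_ptr):
    return wr_ptr == rd_ptr

def fifo_full(wr_ptr, rd_ptr):
    return (wr_ptr % DEPTH) == (rd_ptr % DEPTH) and wr_ptr != rd_ptr

print(fifo_empty(0, 0))  # True  (nothing written yet)
print(fifo_full(4, 0))   # True  (4 writes, 0 reads, depth 4)
print(fifo_full(2, 0))   # False (partially filled)
```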
2.
Fixed Priority Arbiter
One common arbitration scheme is the fixed priority arbiter. Each input port has
its own fixed priority level, and the arbiter grants the active request signal
with the highest priority. For instance, if request[0] has the highest priority
among the N requests and request[0] is active, it will be granted regardless of
the other request signals. If request[0] is not active, the request signal with
the next highest priority is granted. In other words, a lower-priority request
is served only if every higher-priority request is inactive or has already been
served. The fixed priority arbiter is therefore best used where there are only a
few requesters. The fixed priority arbiter can be implemented with a simple case
statement [21].
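The case-statement implementation cited from [21] is not reproduced here; a behavioral Python sketch of the same priority-encoder logic is:

```python
# Behavioral sketch of a fixed priority arbiter (the HDL version is a
# case statement over the request vector). `request` is a bit vector;
# bit 0 has the highest priority, and the grant is one-hot.

def fixed_priority_arbiter(request, n):
    for i in range(n):            # scan from highest priority downward
        if request & (1 << i):
            return 1 << i         # one-hot grant for the winner
    return 0                      # no active requests

print(bin(fixed_priority_arbiter(0b0110, 4)))  # 0b10: bit 1 beats bit 2
print(bin(fixed_priority_arbiter(0b1000, 4)))  # 0b1000: only bit 3 active
```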
3.
Round-Robin Arbiter
When a fixed priority arbiter is used in a NoC design, there is no limit to how
long a lower-priority request may wait until it receives a grant. In a
round-robin arbiter, every requester takes its turn in order, because a request
that has just been served is given the lowest priority on the next round of
arbitration. A pointer register keeps track of which request is next; hence, a
round-robin arbiter provides strong fairness [21]. Figure 14 shows an example of
fixed priority arbiter code. In general, a round-robin arbiter can be built from
N fixed priority arbiters. Figure 15 shows a muxed parallel priority arbiter.
The parallel priority arbiter uses N fixed priority arbiters in parallel for N
requesters; a shift module rotates the request vector before arbitration, so the
multiplexer receives all possible grant vectors for the round-robin arbiter, and
the pointer selects which of the intermediate grant vectors is actually used.
This design is fast, but its area is too large because of the N shift modules
and fixed priority arbiters when the number of requesters is large. For a better
area and timing trade-off, two fixed priority arbiters with a mask can be used.
In Figure 16, the lower arbiter receives all the request signals, while the
upper arbiter first masks the request signals according to the round-robin
pointer before one request signal is selected.
If the upper arbiter produces a grant signal, that grant signal is chosen by the
MUX.
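The masked two-arbiter scheme can be sketched behaviorally (Python; a simplification of the Figure 16 design, with the mask derived directly from the pointer):

```python
# Sketch of a masked round-robin arbiter: one fixed priority pass over
# the requests at or above the pointer (the "upper" arbiter), and a
# fallback pass over all requests (the "lower" arbiter) when the masked
# pass grants nothing.

def fixed_priority(request, n):
    for i in range(n):
        if request & (1 << i):
            return 1 << i
    return 0

def round_robin_arbiter(request, pointer, n):
    mask = ~((1 << pointer) - 1)            # keep bits >= pointer
    grant = fixed_priority(request & mask, n)
    if grant == 0:
        grant = fixed_priority(request, n)  # wrap around: consider all bits
    return grant

# pointer = 2: request bit 1 is masked off at first, so bit 3 wins;
# once only bit 1 is left, the wrap-around pass serves it.
print(bin(round_robin_arbiter(0b1010, 2, 4)))  # 0b1000
print(bin(round_robin_arbiter(0b0010, 2, 4)))  # 0b10
```

In hardware the two fixed priority arbiters evaluate in parallel and a MUX picks the masked grant when it is nonzero, which is what makes this cheaper than the N-arbiter parallel design.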
4.
Matrix Arbiter
A matrix arbiter implements a least-recently-served priority scheme by
maintaining a triangular array of state bits W[i][j] for all i < j, where
W[i][j] (row i, column j) indicates that request i takes priority over request
j. Only the upper triangular portion of the matrix needs to be maintained.
Figure 17 shows how a single grant output is generated for a 4-input arbiter
with matrix arbitration. This arbiter ensures that a grant is generated only if
no input request with a higher priority is asserted. The flip-flop matrix is
updated after each clock cycle to reflect the new request priorities [22]. The
matrix arbiter is a favorite arbiter for a small number of inputs because it is
fast, inexpensive to implement, and provides strong fairness.
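The matrix arbitration step can be sketched as follows (Python; for clarity the sketch stores the full matrix rather than only the upper triangle that the hardware keeps):

```python
# Sketch of matrix arbitration: W[i][j] = 1 means request i currently
# has priority over request j. A request is granted only if no other
# active request has priority over it; the winner then drops to the
# lowest priority, giving least-recently-served behavior.

def matrix_arbitrate(request, W, n):
    for i in range(n):
        if not request[i]:
            continue
        # granted only if no other active request j beats request i
        if all(not (request[j] and W[j][i]) for j in range(n) if j != i):
            for j in range(n):            # winner yields to everyone else
                if j != i:
                    W[i][j], W[j][i] = 0, 1
            return i
    return None

n = 3
W = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]                           # initial priority: 0 > 1 > 2
print(matrix_arbitrate([1, 1, 1], W, n))  # 0 wins first...
print(matrix_arbitrate([1, 1, 1], W, n))  # ...then 1, since 0 dropped to lowest
```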
5.
Crossbar Switch and Other Implementation