ABSTRACT
Radio communication accounts for the highest energy consumption in a wireless sensor node. To reduce the energy and power, a round-robin algorithm is used. This paper describes the design and implementation of the newly proposed folded tree architecture. The folded tree operates in two phases, the trunk phase and the twig phase. The power is typically reduced by 60-70% compared with existing methods. Measurements of the silicon implementation show an improvement of 10-20× in energy compared with traditional microcontrollers found in sensor nodes. This technique is commonly used in wireless communication.
EXISTING SYSTEM
In the existing method a binary tree is used. The disadvantage of this method is that at any given time only one node acts as the root while the other nodes act as leaves, so only one data item is sent at a time. Hence both the power and the energy consumption are high, the time requirement is high, and the interconnect is complex. The proposed approach limits the power and energy: the time requirement is low, while the number of interconnection paths is increased. Therefore, a folded tree architecture is proposed to send the data using a wireless communication technique.
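The level-by-level funneling described above can be sketched in software (an illustrative model, not the paper's hardware): a binary-tree reduction combines leaf values pairwise until a single root value remains, and at each level only half of the nodes are active.

```python
# Illustrative binary-tree reduction: all data funnels toward one root,
# so per level only half of the remaining nodes do useful work.
def binary_tree_reduce(values, op=lambda a, b: a + b):
    """Reduce a list of leaf values level by level toward one root."""
    level = list(values)
    while len(level) > 1:
        # each parent combines two adjacent children
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd element passes up unchanged
        level = nxt
    return level[0]

print(binary_tree_reduce([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

For n leaves this takes log2(n) levels, which is why only one result emerges per traversal of the tree.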
EXISTING SYSTEM DRAWBACKS
The gate count is high, so the design is not area efficient.
The latency and critical path of the designs are very high.
PROPOSED SYSTEM:
•
In the proposed system we apply the round-robin algorithm to the input nodes. This overcomes the drawbacks of the existing system. In this method we create three tree-based inputs, and those three inputs are connected to the outputs in parallel using a mux structure.
•
At the same time we apply addressing in order to reduce the overhead.
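The round-robin arbitration of the three tree inputs can be sketched as follows (a behavioral model for illustration, not the RTL): a rotating priority pointer picks one requesting input per cycle to drive the shared mux, and the last winner gets the lowest priority on the next cycle.

```python
# Behavioral sketch of a round-robin arbiter for N inputs sharing one
# output mux. The pointer rotates so every requester gets fair access.
class RoundRobinArbiter:
    def __init__(self, n_inputs=3):
        self.n = n_inputs
        self.pointer = 0  # input index with the highest priority this cycle

    def grant(self, requests):
        """requests: list of bools, one per input. Returns granted index or None."""
        for offset in range(self.n):
            idx = (self.pointer + offset) % self.n
            if requests[idx]:
                # the winner becomes lowest priority next cycle
                self.pointer = (idx + 1) % self.n
                return idx
        return None  # no input is requesting

arb = RoundRobinArbiter()
print(arb.grant([True, True, True]))  # 0
print(arb.grant([True, True, True]))  # 1
print(arb.grant([True, True, True]))  # 2
```

With all three inputs requesting continuously, the grant sequence cycles 0, 1, 2, 0, …, so no input can starve the others.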
PROPOSED SYSTEM ADVANTAGES:
•
Lower area-time-power complexity compared with the existing designs.
•
Overall power is also lower compared with the existing design.
HARDWARE REQUIREMENT:
•
FPGA Spartan 3 / Spartan 3AN
SOFTWARE REQUIREMENTS:
•
ModelSim 6.4c
•
Xilinx ISE 9.1/13.2
Introduction & Problem Formulation
The transition from the 20th to the 21st century witnessed the emergence of low-cost, low-power, and miniature-sized electronics, enabling attractive solutions for numerous new application areas as well as facilitating the improvement of several existing ones. One
such example is the development of the Wireless Sensor Network (WSN), providing a
low-cost alternative to both manual monitoring solutions and traditional
infrastructure-based monitoring solutions, in which both the power and the data
are required to be transported over a physical media, such as cables. In
contrast to an infrastructure-based monitoring network, a WSN is comprised of
spatially distributed sensor nodes that, in addition to sensing the
environment, are capable of communicating wirelessly to transport the acquired
data to a desired destination. In addition, the wireless communication in a WSN
also provides the means to establish a self-organizing wireless monitoring
network. A WSN finds applications in numerous fields spanning from home automation to industrial monitoring and battlefield tracking [1]-[7].
Some of the notable applications include environmental monitoring [8]-[10],
fire detection in forests [11]-[13], structural health monitoring for buildings
and bridges [14]-[15], health care monitoring [16]-[17], industrial condition
monitoring [18]-[19], battlefield monitoring [20]-[22], precision agriculture
[23] and logistics monitoring [24].
In comparison to
infrastructure-based monitoring networks, the advantages associated with WSNs
are low-cost, ability to self-organize, scalability and ease of deployment
[25]-[28].
As the sensor nodes in a WSN communicate wirelessly and the means of energy (for example, batteries) are integrated within them, the cost associated with the development and maintenance of communication and power related infrastructure is low as compared to infrastructure-based networks. In addition, sensor nodes typically have a simple design that leads to low-cost development and thus enables the realization of cost-effective WSNs.
The ability of wireless sensor nodes to self-organize and establish a network
plays an important role in achieving highly scalable monitoring networks. In
contrast to infrastructure-based networks, the low-complexity and the
associated cost of adding new sensor nodes in a WSN provides the means to
achieve a highly scalable solution. The scalability enables applications in which sensor nodes may be added or removed without halting the operation of the network; in certain cases this characteristic may even be essential to achieving the desired goals. For example, in monitoring and tracking
applications consisting of non-stationary sensor nodes, some nodes may
collaborate in one network for a specific time or event and then leave that
network and join another nearby WSN.
In relation to
physical deployment, wireless sensor nodes provide high-flexibility as compared
to infrastructure-based monitoring solutions. For example, in addition to
regular deployment in which sensor nodes are placed according to pre-planned
fixed locations, wireless sensor nodes can also be deployed in a random manner.
Such flexibility could be highly desirable in certain applications, for
example, in order to detect and monitor fire in a forest; such nodes can be
dropped from an airplane. In addition, infrastructure-less operability of
wireless sensor networks enables the sensor nodes to be deployed in
hard-to-reach as well as non-stationary locations.
In
a wireless monitoring application that may monitor the desired parameters at a
few locations over a small geographical area or at hundreds of locations over a
large area, it is a wireless sensor node that plays a central role in realizing
monitoring solutions based on WSNs. A typical wireless sensor node is a compact
sized hardware unit that acquires desired data from its environment and
communicates wirelessly to other nodes in a network so as to relay that data or
extracted information to a central station. The architecture and data flow of a
typical wireless sensor node [29] is shown in Figure 1.1. Depending upon what
and how many parameters are to be monitored in an application, it may consist of one or more transducers that measure physical phenomena and produce equivalent electrical outputs, that is, in the form of an electrical current or voltage. The output
signal from a transducer, which is typically low in amplitude, is amplified
with the assistance of signal conditioning circuits so as to ensure that it
matches the requirements of the digitization circuits. In order to achieve cost-effective, high-spatial-resolution monitoring through densely deployed sensor nodes, which is one of the main objectives of WSNs [35]-[37], both the size and
the cost of a sensor node are desired to be as low as possible. The small size and low cost, in turn, set constraints on the size, type and capacity of the associated energy source, i.e., a battery or an alternative energy-scavenging unit that can be integrated in a wireless sensor node. Given the limited amount
of energy available in typical small size batteries and the low power-to-area
(or volume) ratio of alternate energy sources [38]-[41], a small size and
low-cost wireless sensor node is restricted to operate on a limited energy
budget.
In order to achieve a long
operation lifetime with a limited energy budget, a wireless sensor node is
typically designed using modules that offer ultra low-power characteristics and
enable efficient power management [34],[42]-[43]. A wireless sensor node built
on such low-power modules typically lacks high performance capabilities. For
example, low-power micro-controllers that are typically used in wireless sensor
nodes provide limited computational performance in terms of operating frequency
and on-chip memory [44]. Similarly, the low-power and low-cost wireless
transceivers provide limited communication data rates. Thus, a WSN, consisting
of typical wireless sensor nodes that are constrained by energy, processing,
and communication resources [44]-[45], is better suited to applications that
acquire data at a low sample rate and operate in an intermittent pattern
[27],[46]-[48]. In such applications, a small amount of data, acquired
periodically, or on the basis of an event at a sensor node is wirelessly
communicated.
As low-sample-rate applications require only small amounts of data to be communicated, which can easily be accomplished with the performance provided by low-power processing and communication units, the use of such low-power resources enables energy consumption to be minimized. Using such low-power resources for data-intensive monitoring applications, however, not only restricts the achievement of the desired monitoring goals but also leads to higher energy consumption. For example, low-power and
low-cost transceivers that are used in typical wireless sensor nodes lack the
required bandwidth to communicate the large amount of data across the network.
Even with intermittent monitoring, a large size memory is required to store the
raw data so as to transmit it using a low-data rate wireless transceiver. In
that case, the long transmission time causing high energy consumption leads to
a short operational lifetime. On the other hand, in an endeavour to suppress
the raw data by computing the useful information at each sensor node, the
processing performance and amount of memory in low-power micro-controllers that
are typically used in wireless sensor nodes is often insufficient to realize
applications with the large amount of data that is required to be processed
using complex processing algorithms. Improving the communication bandwidth and/or processing capabilities by integrating powerful resources at the sensor node level typically leads to higher power consumption and thus must be assessed from an energy consumption perspective.
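To make the trade-off above concrete, a back-of-the-envelope sketch can compare the transmit energy of sending raw data over a low-rate transceiver with processing locally and sending only a small result. All figures below (data sizes, 250 kbps rate, 60 mW transmit power, 10 mJ processing cost) are assumed for illustration, not measured values.

```python
# Transmit energy = time on air * radio transmit power.
def tx_energy_j(bits, rate_bps, tx_power_w):
    return (bits / rate_bps) * tx_power_w

RAW_BITS = 1_000_000     # e.g. a burst of high-rate samples (assumed)
RESULT_BITS = 1_000      # e.g. a handful of extracted features (assumed)
RATE = 250_000           # 250 kbps, a typical low-power radio data rate
TX_P = 0.06              # ~60 mW while transmitting (assumed)

raw = tx_energy_j(RAW_BITS, RATE, TX_P)
# local processing cost assumed to be 10 mJ for this sketch
processed = tx_energy_j(RESULT_BITS, RATE, TX_P) + 0.01
print(f"raw: {raw*1000:.0f} mJ, processed: {processed*1000:.1f} mJ")  # raw: 240 mJ, processed: 10.2 mJ
```

Even with a generous allowance for on-node computation, suppressing the raw data before transmission wins by more than an order of magnitude under these assumptions, which is the crux of the argument above.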
With
regards to the constraints of a wireless sensor node, as depicted in Figure
1.2, this thesis explores an energy efficient architecture that can fulfil the
processing and communication requirements of data and computation intensive
monitoring applications while enabling a practically feasible operational
lifetime.
1.1 Data
and Computation Intensive Monitoring Applications
In
principle, any monitoring application that requires the monitoring of
parameters for which required transducers are available can potentially be
realized with a WSN. A huge collection of transducers [49] exists that can be incorporated in a wireless sensor node and, therefore, the potential applications that could benefit from the adoption of WSNs are limited only by one's imagination. In this section, an overview of potential application areas for WSNs, followed by an application classification and a design space relating to data and computation intensive wireless monitoring applications, is provided.
1.1.1
Overview of applications domains of WSN
The
potential applications of WSNs are limitless. Nevertheless, the applicability
of WSNs can be briefly summarized in terms of existing real world application
fields.
1.1.1.1 Military
applications
In fact, the concept of the (distributed) sensor network (DSN) can be traced back to the late 1970s, when researchers working on a military project funded by DARPA identified the components associated with a DSN for monitoring applications
[50]. The practical utilization was demonstrated by employing a custom-built prototype system, relying mainly on acoustic sensors, for tracking aircraft flying at low altitude. Despite the successful demonstration, the technology at
that time was not considered ready to expand the concept of distributed sensor
network both in terms of a large scale and across different fields. However,
with the recent advances in technology, numerous military related applications,
which in a broader spectrum are those defining the monitoring and tracking of
enemy lines as well as the surveillance of one’s own territory, are being
realized with WSNs. Some of the examples in this domain include WSN-based
counter sniper system [51], surveillance [20], and tracking vehicles and
troops’ movement [21]-[22].
1.1.1.2 Environmental monitoring
With the emergence of low-power,
low-cost, and miniature size electronics, the prospective benefits associated
with DSNs have attracted both the research and business communities to extend the concept to numerous fields, including environmental
monitoring [8]-[11]. In comparison to traditional environmental monitoring, which relies on a small number of highly accurate but expensive sensing systems, a WSN enables cost-effective deployment on a large scale [52]. Besides monitoring
basic environmental parameters such as temperature, pollution, humidity, etc. to inform inhabitants about their surroundings, other applications in this
domain include monitoring of green house gases [53], habitat monitoring
[37],[54], forest fire detection [11]-[13], agriculture [23], etc.
1.1.1.3 Home
automation
In relation to
enabling smart homes, in which appliances are connected to a network so as to
manage them intelligently, WSNs are considered an attractive choice [55].
1.1.1.4 Remote health
monitoring
In relation to
personal health care applications [55], continuous monitoring of patients using
WSN [16]-[17], [56]-[58] could be used to detect emergency conditions early on
and thus provide the necessary treatment. In addition, such solutions can also perform additional activities such as reminding patients to take their medicines in a timely manner, managing their home appliances, etc. In comparison to keeping a patient in a hospital, a WSN could
potentially facilitate the patients in performing their regular activities at
home or work while being continuously examined.
1.1.1.5 Business
related applications
WSNs can be incorporated in different businesses so as to increase production as well as to improve management and delivery systems. Example applications in this regard include monitoring soil and crop related parameters in the agriculture business [23],[59], monitoring and tracking of livestock in farming [60]-[62], keeping track of both the quality and the geographical parameters of logistics [24], industrial monitoring and control, managing inventory related tasks [63], etc.
1.1.1.6
Industrial process management and control
There are a number of industrial applications [64]-[67] that can benefit from the use of WSNs.
Examples include real-time monitoring of operating machinery for possible
defects and/or performance degradation, monitoring different processes and
their states so as to better control them, enabling factory automation, process
control, monitoring of liquid and gas lines for possible leakages, monitoring of contaminated areas, and structural condition monitoring of buildings, mines, bridges, etc.
1.1.2
Classification
In order to assess the requirements and challenges associated with designing and deploying WSNs for applications of as diverse a nature as those mentioned above, it is important first to classify these applications by some measure of relevance so that a design space can be deduced for a given set of applications. In [1] and [68], on the basis of how a sensor node interacts with a data sink, the authors classified WSN applications into monitoring, tracking and event detection. Such an abstract-level classification does not provide the means to deduce an optimized design space for the set of applications addressed in this thesis.
Therefore, the potential applications of WSNs are classified in relation to
their (sensed) data size and computational complexity so as to explore an
energy efficient architecture for data and computation intensive applications.
For a diverse range of monitoring applications, as discussed in section 1.1.1,
a quantitative classification based on data size and computational complexity
is a challenging task. Instead, we opt to use a relative scale representing
low, medium, high and intensive data size and computational complexity.
1.1.2.1
Application with small data size and low computational complexity
A large number of monitoring applications require static sensor nodes, each of which monitors only a few parameters at a relatively low frequency. In addition, the acquired data, ranging from a couple of bytes to tens of bytes, are often transmitted without requiring any complex processing besides simple aggregation (addition/subtraction). Examples of such applications include general
environmental monitoring, monitoring of crop & soil in agriculture,
monitoring of green house gases, monitoring of different industrial processes,
etc.
1.1.2.2
Applications with medium to high data size and computational complexity
Applications such as habitat and logistics monitoring, in which sensor nodes may be carried from one place to another, require a sensor node to continuously discover its surrounding nodes so as to relay the acquired data. In such an application, the size and computational complexity of the data may often be low; however, the frequency of acquisition and transmission could be irregular, varying the size of the data to be handled at a given time. Apart from acquiring the desired data at a low sample rate, providing geographically related support, for example through the Global Positioning System (GPS), could increase the data size and the complexity of processing it [10].
In applications such as tracking vehicles and troops, surveillance, structural health monitoring, automation and control, etc., the factors that determine the data size and computational complexity include the types and number of sensors used, the rate at which the data from each sensor are acquired, and the level of information that needs to be extracted from the raw data. Depending upon the objectives and requirements of a given application scenario, the amount of data and the complexity involved in processing that data may be handled easily by a typical low-power and low-cost wireless sensor node. In other cases, high-performance processing resources and/or communication transceivers may be required to realize such applications. However, the applications that certainly involve a large amount of acquired data and require intensive processing, as explained in the following section, are dealt with in this thesis.
1.1.2.3
Applications with intensive data and computational complexity
Monitoring applications which either acquire scalar data at a high sample rate (e.g. more than a few kHz) or in which the data associated with each sample are large (e.g. an image frame) are the best candidates for data-intensive applications in relation to the limited-data-rate transceivers [69] that are typically used in wireless sensor nodes. In addition, applications requiring complex processing so as to extract useful information from the large amount of acquired data fall into the computationally intensive category. Examples of such applications include vibration-based structural and machinery health monitoring, image-based process monitoring, video-based surveillance, etc.
In order to perform quantitative analysis based on actual implementations of data and computation intensive applications, case studies based on applications of high practical importance, i.e. industrial condition monitoring, are conducted in this thesis. In industrial monitoring applications, the most widely accepted methods for analyzing the operating condition of machinery are based on vibration and oil analysis [70]-[73]. In relation to WSNs, vibration-based monitoring, in particular multi-axis and high-frequency monitoring, generates large amounts of data and requires intensive signal processing to analyze that data [74]-[79].
In relation to industrial condition monitoring, oil analysis enables the machinery's wear and tear to be assessed by accounting for the debris/residual particles that detach from the machinery and circulate in the oil [80]-[83]. With camera-based wireless sensor nodes, by capturing images of oil passing through a small window, such as one of glass, debris/residual as well as other foreign particles present in the oil can easily be detected [84]-[86]. In a similar manner to high-frequency vibration-based condition monitoring, image-based oil analysis generates a large amount of data and requires intensive processing to analyze it.
1.1.3 Design space for data and
computation intensive applications
The design aspects of a typical
WSN as listed in [88] are equally important in developing WSNs for data and
computation intensive monitoring applications. However, the scope and emphasis
of certain aspects that are of high importance to data and computation
intensive applications, especially with regards to industrial condition
monitoring, are discussed in the following.
1.1.3.1 Energy
As shown in Figure 1.2, the infrastructure-less and distributed nature of operation of a wireless sensor node, together with its small size and low cost, sets constraints on the type and capacity of the energy sources that can be integrated in a sensor node. Given the low area/volume-to-energy ratio, high cost, and dependency on an appropriate environment of energy-scavenging methods [38]-[41], wireless sensor nodes are typically required to rely on the limited amount of energy provided through integrated batteries.
From the perspective of a wireless sensor node's energy consumption, especially with regard to the applications mentioned in section 1.1.2.3, the responsible modules are the sensor(s), the processor and the communication transceiver. The specific requirements of an application may dictate the choice of the sensors to be used, thus leaving few options to optimize the energy consumption associated with them. However, the energy consumption associated with processing and communication must be taken into consideration. At the wireless sensor node architecture level, this requires choosing appropriate processing algorithms, integrating low-power but energy efficient modules, applying energy conserving techniques, operating nodes in a duty-cycled manner, etc. [34],[89].
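The duty-cycled operation mentioned above can be quantified with a simple model: the average power is the duty-weighted mix of active and sleep power, and the lifetime follows from the battery's energy. All power and battery figures below are assumed for illustration only.

```python
# Average power under duty cycling: P_avg = P_active*d + P_sleep*(1-d).
def avg_power_w(p_active, p_sleep, duty):
    return p_active * duty + p_sleep * (1.0 - duty)

# Lifetime in days from a battery energy budget in joules.
def lifetime_days(battery_j, p_avg_w):
    return battery_j / p_avg_w / 86_400

P_ACTIVE = 0.05   # 50 mW while sensing/processing/transmitting (assumed)
P_SLEEP = 20e-6   # 20 uW in sleep mode (assumed)
BATTERY = 2 * 1.5 * 2.6 * 3600  # two AA cells, ~2.6 Ah at 1.5 V, in joules

for duty in (0.01, 0.001):
    p = avg_power_w(P_ACTIVE, P_SLEEP, duty)
    print(f"duty {duty:.1%}: avg {p*1e6:.0f} uW, ~{lifetime_days(BATTERY, p):.0f} days")
```

Under these assumptions, dropping the duty cycle from 1% to 0.1% stretches the lifetime from under two years to over a decade, which is why duty cycling dominates the energy-conservation techniques listed above.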
1.1.3.2 Communication modality
Data-intensive applications such as those mentioned in section 1.1.2.3 generate a large amount of data, typically in the range of hundreds of kbps to tens of Mbps. The choice of wireless communication technology, such as radio or optical, to communicate such a large amount of data hugely impacts both the reliable transfer of data to a destination and the associated energy consumption [90]. The various sources of interference in an industrial environment can also affect the performance of wireless communication [91]-[92] and thus must be taken into consideration. In addition, the size of the network, the coverage area, and any additional costs associated with obtaining wireless communication services, for example a licence for the frequency spectrum or other paid services, must be accounted for.
1.1.3.3 Quality
of services (QoS)
In addition to application-dependent QoS requirements such as robustness, tamper/eavesdropping resistance and unobtrusiveness, data and computation intensive applications generally tend to have real-time constraints, which could add to the existing constraints on processing performance and/or communication throughput.
1.1.3.4
Operational lifetime
Mainly derived from the available amount of energy and its consumption for a given application, the operational lifetime highly influences the maintenance cost. With regard to industrial monitoring applications, sensor nodes may be deployed on continuously operating machinery as well as at non-stationary and hard-to-access locations, and replacing their batteries would thus require the suspension of an on-going activity. In order to reduce such a high maintenance cost, an operational lifetime of at least several years could be highly desirable.
1.1.3.5 Cost and
size
The infrastructure-less nature of WSNs generally enables a reduction in the cost associated with installing and maintaining infrastructure. The cost of the individual sensor nodes and the associated maintenance, for example replacing batteries, must, however, be taken into consideration. Therefore, for a widespread use of wireless sensor technology, it is important not only that the cost of the individual sensor nodes is reasonable for the given applications but also that a practically viable operational lifetime can be achieved.
In relation to industrial condition monitoring, in which sensor nodes might be deployed at space-constrained locations on the machinery, a small size and a low weight are desirable to enable effective monitoring of the desired locations without causing mechanical imbalances.
1.2 Problem
statement
A WSN, as an infrastructure-less network that requires no physical connectivity to enable communication and energy supply amongst its nodes, promises to enable cost-effective monitoring solutions both for existing wire-based monitoring applications and for those in which wired monitoring had not previously been feasible. However, the untethered energy supply, in conjunction with low cost and small size, led to the development of wireless sensor nodes with restricted energy, processing and communication resources, thus limiting WSNs to low-sample-rate intermittent monitoring applications.
In order to
analyze the resources of such wireless sensor nodes in relation to realizing
data and computation intensive monitoring applications, firstly, some of the
popular wireless sensor nodes are briefly described. Following on from that,
the problem description is continued.
1.2.1 An
overview of existing wireless sensor nodes
Starting from WeC, the first prototype sensor node, which was developed at UC Berkeley in 1998, a large number of wireless sensor nodes have been developed as a result of academic research and commercial activity. Some of the popular and relevant ones in this regard, as summarized in Table 1.1, are discussed in the following.
1.2.1.1 Mica
motes
The WeC integrated an 8-bit micro-controller, the AT90LS8535 from Atmel, that was operated at 4 MHz and had 512 B of internal Static Random Access Memory (SRAM). In order to perform wireless communication, the WeC was integrated with a radio transceiver, the TR1000, operating in the 915 MHz frequency band and providing a transmission rate of 10 kbps. Based on the development of the WeC, a number of sensor nodes with varying design choices relating to communication, micro-controller, memory, size, etc. were developed at UC Berkeley and by a commercial venture, Crossbow. These sensor nodes, such as Rene, Mica, Mica2, Mica2Dot, MicaZ and Telos, integrated 8/16-bit low-power micro-controllers with SRAM of up to 10 kB, and radio transceivers with a maximum data rate of 250 kbps. The detailed specifications for each of these can be found in Table 1.1. The typical power consumption of these nodes is less than 100 mW [95].
1.2.1.2 Eyes
Based on the 16-bit micro-controller MSP430 from Texas Instruments, the Eyes nodes [69] were developed by Infineon as part of a European Union funded project to enable energy-efficient sensor networks. The architecture of the EYES is quite similar to that of the Mica motes; however, the latest versions of the EYES motes, EYESIFXv1 and EYESIFXv2, were equipped with radio transceivers from Infineon that support data transmission rates of up to 64 kbps.
CHAPTER
II
RELATED
WORK
This thesis draws on and builds upon a variety of related and prior work in the areas of network-on-chip buffer design, virtual channel allocation (VA) and switch allocation (SA) in the router. Prior studies provide concepts and systems that will be implemented and further refined in the course of the proposed research.
A. Network-on-Chip Topology W. J. Dally introduced Network-on-Chip communication, and especially the 2D torus architecture, in [2]. Kumar et al. [5] used a 2-D tile-based architecture adopting a mesh-based topology. Mesh and torus are popular NoC topologies, and they have different features in terms of throughput, power consumption, and latency depending on the routing algorithms [6]. In addition, the SPIN network uses the fat-tree topology in [3], and an octagon topology is proposed in [7]. Each topology has its own characteristics. Among these topologies, many designers like to use the mesh topology because of its simplicity. B. Routing Strategy A packet is routed through the network according to a routing
strategy. The routing algorithms could be one of the following two strategies.
Deterministic routing, such as XY routing, is when the routes between given pairs of nodes are pre-programmed and thus always follow the same path between two nodes. Adaptive routing is when the path taken by a packet may depend on other packets, and each router must know the network traffic status in order to avoid congested regions in advance [2].
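Deterministic XY routing on a 2-D mesh can be sketched as follows (an illustrative model, not a particular router's RTL): the packet is routed fully along the X dimension first, then along Y, so a given source/destination pair always takes the same path.

```python
# XY routing on a 2-D mesh: exhaust the X offset before moving in Y.
def xy_route(src, dst):
    """src, dst: (x, y) router coordinates. Returns the list of hops taken."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because the dimension order is fixed, XY routing needs no knowledge of network traffic and is deadlock-free on a mesh, which is exactly the contrast with the adaptive routing described above.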
C. Dynamic Virtual Channel Allocation A generic router generally uses a statically allocated buffer, which can cause the Head-of-Line (HoL) blocking problem. [8] proposes buffer customization, which decreases the queue blocking probability in order to improve the network performance. A scheme called the dynamic virtual channel regulator (ViChaR) is proposed in [9]. In
the ViChaR, VCs are allocated dynamically, and the buffer allocation for each VC can differ depending on the network traffic. For example, many shallow VCs are more efficient under light traffic, while fewer, deeper VCs are more efficient under heavy traffic. In addition, an on-chip network router meeting the traffic demand must be designed with the least number of buffers, since the power consumption of the buffers dominates that of all the other logic such as the VA, SA, and crossbar [9]. ViChaR proposes a method of increasing buffer utilization and decreasing overall power consumption. According to [9], the area overhead and extra power consumption are trivial: there is a 4 percent reduction in logic area and a minimal 2 percent power increase compared to an equal-size generic buffer implementation. In particular, it reports a 25 percent increase in performance with the same amount of buffering [9]. A Dynamically
Allocated Multi-Queue (DAMQ) buffer architecture is presented in [10]. The DAMQ has a unified and dynamically allocated buffer structure. The concept of DAMQ is similar to that of ViChaR. DAMQ uses a fixed number of queues per input port, but this can cause the HoL blocking problem. Therefore, ViChaR assigns the buffer resources to each of the VCs according to the network traffic in order to solve the HoL problem. The Fully Connected Circular Buffer (FC-CB) is explained in [11]. FC-CB basically uses a Dynamically Allocated Fully Connected (DAFC) method [12] in order to provide flexibility under diverse traffic by adopting wormhole routing and virtual channels. Hence, the FC-CB provides a low average message latency and high throughput even under heavy traffic. However, the FC-CB structure has a fixed number of VCs and complex logic to control the circular buffer, and thus causes higher dynamic power consumption.
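The shared-pool idea behind these dynamic schemes can be sketched in a simplified model (a sketch in the spirit of ViChaR [9], not the published microarchitecture): a unified pool of flit slots is handed out to whichever VC needs one, instead of statically splitting the pool among a fixed set of VCs.

```python
# Simplified unified buffer pool: flit slots are allocated to VCs on
# demand, so a busy VC can use capacity an idle VC would have wasted.
class UnifiedBufferPool:
    def __init__(self, total_slots):
        self.free = total_slots
        self.per_vc = {}            # vc id -> slots currently held

    def alloc(self, vc):
        """Try to give one flit slot to `vc`; False if the pool is exhausted."""
        if self.free == 0:
            return False            # the flit must wait upstream
        self.free -= 1
        self.per_vc[vc] = self.per_vc.get(vc, 0) + 1
        return True

    def release(self, vc):
        """Return one of `vc`'s slots to the shared pool."""
        assert self.per_vc.get(vc, 0) > 0
        self.per_vc[vc] -= 1
        self.free += 1

pool = UnifiedBufferPool(total_slots=8)
for _ in range(6):                  # heavy traffic on a single VC
    pool.alloc(vc=0)
print(pool.per_vc[0], pool.free)    # 6 2
```

With a static split of 8 slots over 4 VCs, VC 0 would have stalled after 2 flits; the shared pool lets it absorb the burst, which is the buffer-utilization gain the dynamic schemes above exploit.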
CHAPTER III
BACKGROUND
This chapter provides background information on the Network-on-Chip research areas addressed throughout the thesis. On-chip networks share many concepts with the interconnection networks of traditional multiprocessor systems. Networks are typically categorized by four key properties: topology, switching technique, routing protocol, and flow control mechanism.
A.
Network Property
1.
Topology
Mesh and torus network topologies are regarded as the best choices for a NoC [2]. These two network topologies have the simplicity of a 2-D square structure. Figure 2 (a) shows a 2-D mesh network structure [5]. It is composed of a grid of horizontal and vertical lines with a router at each intersection. The mesh topology is the most widely used since the delay among routers can be predicted at a high level. A router address is computed from the number of horizontal nodes and the number of vertical nodes. The 2-D torus topology [2] is a donut-shaped structure formed by taking a 2-D mesh and connecting its opposite sides, as we can see in Figure 2 (b). This topology has twice the bisection bandwidth of a mesh network at the cost of doubled wire demand, but the nodes should be interleaved so that all inter-node links have the same length. In addition to the mesh and torus network
topologies,
a fat-tree structure [3] is used. In M-ary fat-tree structure, the number of
connections between nodes increases with a factor M towards the root of the
tree. By wisely choosing the fatness of links, the network can be tailored to
efficiently use any bandwidth. An octagon network was proposed by [7]. Eight
processors are
linked
by an octagonal ring. The delays between any two nodes are no more than two hops
within the local ring. The advantage of an octagon network has scalability. For
example, if a certain node can be operated as a bridge node, more Octagon
network can be added using this bridge node. Figure 2 (c) and (d) show binary
fat-tree and octagon topologies.
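The mesh-versus-torus trade-off above can be illustrated with a small sketch (Python, not from the thesis; the coordinate pairs and network size are assumed for the example):

```python
# Sketch (not from the thesis): hop distance between two routers in a
# 2-D mesh versus a 2-D torus of the same size. The wrap-around links
# of the torus shorten paths that would otherwise cross the whole mesh.

def mesh_hops(src, dst):
    (sx, sy), (dx, dy) = src, dst
    return abs(sx - dx) + abs(sy - dy)

def torus_hops(src, dst, width, height):
    (sx, sy), (dx, dy) = src, dst
    # In each dimension, take the shorter of the direct and wrap-around path.
    hx = min(abs(sx - dx), width - abs(sx - dx))
    hy = min(abs(sy - dy), height - abs(sy - dy))
    return hx + hy

# Corner-to-corner in a 4x4 network: 6 hops in the mesh, 2 in the torus.
print(mesh_hops((0, 0), (3, 3)))         # 6
print(torus_hops((0, 0), (3, 3), 4, 4))  # 2
```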
2.
Switching Technique
Switching mechanisms determine how network resources are allocated for data
transmission when an input channel is connected to the output channel selected
by the routing algorithm. There are four popular switching techniques:
store-and-forward, virtual cut-through, wormhole switching, and circuit
switching [13]. The first three are categorized as packet-switching methods. In
the store-and-forward switching method, the entire packet has to be stored in
the buffer when it arrives at an intermediate router. Only after the whole
packet has arrived can it be forwarded to a neighboring node that has enough
buffering space available to store the entire packet. This switching technique
therefore requires buffering space larger than the largest packet, which
increases the on-chip area. In addition to the area cost, it can cause large
latency because a packet cannot traverse to the next node until the whole packet
is stored. Figure 3 shows the store-and-forward switching technique and its flow
diagram.
In order to solve the long latency problem of the store-and-forward scheme,
virtual cut-through switching [14] stores a packet at an intermediate node only
if the next routers are busy; otherwise the node forwards the packet while it is
still receiving it. However, it still requires a lot of buffering space in the
worst case. Figure 4 shows the timing diagram for the virtual cut-through
switching method. The requirement for large buffering space can be removed using
the wormhole switching method [15]. In wormhole switching, packets are split
into flow control digits (flits) which are snaked along the route in a pipeline
fashion. Therefore, a router does not need large buffers for whole packets but
only small buffers for a few flits. A header flit builds the routing path that
allows the other data flits to traverse it. A disadvantage of wormhole switching
is that the length of the occupied path is proportional to the number of flits
in the packet. In addition, if the header flit is blocked by congestion, the
whole chain of flits stalls, which in turn blocks other flits. This can lead to
deadlock, where the network stalls because all buffers are full and a circular
dependency exists between nodes. The concept of virtual channels [15] was
introduced to provide deadlock-free routing in wormhole-switched networks by
splitting one physical channel into several virtual channels. Figure 5 shows the
concept of a virtual channel. For real-time streaming data, circuit switching
supports a reserved, point-to-point connection between a source node and a
target node. Circuit switching has two phases: circuit establishment and message
transmission. Before message transmission, a physical path from the source to
the destination is reserved: a header flit arrives at the destination node, and
an acknowledgement (ACK) flit is sent back to the source node. As soon as the
source node receives the ACK signal, it transmits the entire message at the full
bandwidth of the path. The circuit is released by the destination node or by a
tail flit. Even though circuit switching has the overhead of the circuit
connection and release phases, it can be used when a data stream is large enough
to amortize that overhead. Since most Network-on-Chip systems need little
buffering space and have low latency requirements, the wormhole switching method
with virtual channels is the most suitable switching method.
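The latency advantage of the pipelined techniques can be sketched with a simple unit-time model (an assumption for illustration, not a model from the thesis): one flit crosses one link per cycle, and router processing time is ignored.

```python
# Sketch (assumed unit-time model): cycles to deliver a packet of
# `flits` flits over `hops` routers at one flit per cycle per link.

def store_and_forward_cycles(hops, flits):
    # Each router must receive the entire packet before forwarding it.
    return hops * flits

def wormhole_cycles(hops, flits):
    # Flits stream through the routers in a pipeline fashion, so only
    # the head flit pays the full hop count.
    return hops + flits - 1

print(store_and_forward_cycles(4, 8))  # 32
print(wormhole_cycles(4, 8))           # 11
```

Under this model the store-and-forward latency grows with the product of hop count and packet length, while the wormhole latency grows only with their sum.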
3.
Routing Protocol
A routing protocol specifies how routers communicate with each other to diffuse
the information that allows them to select routes between any two nodes on a
network. In general, a routing protocol can be either deterministic or adaptive.
In deterministic routing, such as XY routing, the routes between given pairs of
nodes are pre-programmed and thus always follow the same path between two
nodes. This can cause congested regions in the network and poor utilization of
the network capacity. In adaptive routing, on the other hand, the path taken by
a packet may depend on other packets in order to improve performance and fault
tolerance. In an adaptive router, each router must know the network traffic
status in order to avoid congested regions in advance [16]. In addition, modules
which need heavy intercommunication should be placed close to each other to
minimize congestion. [16] states that adaptive routing can support higher
performance than the deterministic routing method while keeping the network
deadlock-free. However, higher performance requires a larger number of virtual
channels [17], which in turn can cause long latency because of design
complexity. Therefore, if the network traffic is not heavy and in-order packet
delivery is required, deterministic routing may be the better choice.
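The XY routing mentioned above can be sketched behaviorally as follows (Python; the port names and coordinate convention are assumptions for the example):

```python
# Sketch (hypothetical port names): deterministic XY routing. A packet
# travels fully along the X dimension first, then along Y, so a given
# source/destination pair always follows the same path.

def xy_route(cur, dst):
    (cx, cy), (dx, dy) = cur, dst
    if cx < dx:
        return "EAST"
    if cx > dx:
        return "WEST"
    if cy < dy:
        return "NORTH"
    if cy > dy:
        return "SOUTH"
    return "LOCAL"   # deliver to the attached processing element

print(xy_route((1, 1), (3, 2)))  # EAST  (X is corrected first)
print(xy_route((3, 1), (3, 2)))  # NORTH (then Y)
```

Because the output port depends only on the current and destination coordinates, no traffic information is needed, which is exactly why XY routing cannot steer around congested regions.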
4.
Flow Control Mechanism
Figure 6 shows the units of resource allocation. A message is a contiguous
group of bits that are delivered from a source node to a destination node. A
packet is the basic unit of routing, and a packet is divided into flits. A flit
(flow control digit) is the basic unit of bandwidth and storage allocation.
Flits therefore do not contain any routing or sequence information and have to
follow the route of the whole packet. A packet is composed of a head flit, body
flits (data flits), and a tail flit. The head flit allocates channel state for a
packet, and the tail flit de-allocates it. The typical flit size ranges from 16
bits to 512 bits. A phit (physical transfer digit) is the unit that can be
transferred across a channel in a single clock cycle. The typical phit size
ranges from 1 bit to 64 bits. Flow control can be examined together with the
switching technique.
The role of a flow control mechanism is to decide which data is serviced first
when a physical channel has many data to transfer. In the store-and-forward and
virtual cut-through switching methods, flow control is performed at the packet
level, which means that an entire packet is stored in buffers and forwarded to a
neighboring router that has buffers available. In the wormhole switching method,
the packet is split into flits and thus flow control executes at the flit level.
The first flit goes through the intermediate nodes to set up a path for the
following data flits, and a tail flit closes the path as it passes the
intermediate nodes. With virtual channels, a wormhole router can solve deadlock
problems [15]. Virtual channels share the same physical channel, but they are
logically separated with different input and output buffers.
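The message/packet/flit hierarchy above can be made concrete with a small sketch (Python; the function name and field layout are hypothetical, chosen only for illustration):

```python
# Sketch (assumed field layout): splitting a packet into head, body, and
# tail flits. Only the head flit carries routing information; body and
# tail flits simply follow the path the head flit set up.

def packetize(dest, payload_words):
    flits = [("HEAD", dest)]                   # head flit: allocates channel state
    for word in payload_words[:-1]:
        flits.append(("BODY", word))           # body flits: data only
    flits.append(("TAIL", payload_words[-1]))  # tail flit: de-allocates the channel
    return flits

flits = packetize(dest=(2, 3), payload_words=["d0", "d1", "d2"])
print(flits)
# [('HEAD', (2, 3)), ('BODY', 'd0'), ('BODY', 'd1'), ('TAIL', 'd2')]
```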
B.
Buffering in Packet Switches
In crossbar switch architecture, buffering is necessary to store packets,
because the packets which arrive at a node are unscheduled and must be
multiplexed according to control information. Three buffering cases occur in a
NoC router. First, an output port can receive only one packet at a time, so
when two packets arrive for the same output port at the same time, one must be
buffered. Second, when the next stage of the network is blocked, a packet in the
previous stage cannot be routed into the next router. Finally, when a packet has
to wait for arbitration to obtain a route path in the current router, the
current router must store the packet in a buffer. The buffer space can therefore
be located in three places: the Output Queue, the Input Queue, and the Central
Shared Queue.
1.
Output Queues
In buffer architecture, output queues can be used if the output buffers are
large enough to accept all input packets and the switch fabric runs at least N
times faster than the input lines in an N-by-N switch. However, since such
high-speed switch fabric is currently not available, and output queues would
need as many ports as the input lines can supply, output queue buffer
architecture incurs a large logic delay.
2.
Input Queues
Input buffers require only one input port in a packet switch because only one
packet can arrive at a time. Therefore, performance can be scaled up with many
input ports, which is why many researchers use input queue buffer architecture.
However, the input queue buffer architecture has the Head-of-Line (HoL)
blocking problem: while the packet at the head of a queue waits to get its
output port, another packet behind it cannot proceed to an idle output port.
HoL blocking significantly reduces throughput in a NoC. Figure 7 shows
Head-of-Line blocking in a NoC router environment.
3.
Shared Central Queues
All input ports and output ports can access the shared central buffer. For
example, if the number of input ports is N and the number of output ports is N,
the central buffer needs at least 2N ports to serve all input and output ports.
As N increases, the access time to the memory also increases, which brings
performance down, and this large access time is incurred whenever a packet is
transmitted. Hence, in addition to its implementation difficulties, the shared
central buffer also degrades performance because of its large access time [18].
C.
Diverse Input Port Buffers
1.
Single Input Queues
Figure 8 (A) shows that each input port has a single FIFO buffer and each FIFO
buffer receives packets from a previous node; the stored packets wait their
turn. In Figure 8 (A), the crossbar switch has 4 input and output ports [18].
2.
SAFC (Statically Allocated Fully Connected)
Each input port has a separate FIFO queue for every output port in order to
remove the Head-of-Line (HoL) blocking problem. Packets arriving at a
corresponding queue compete only for the same output, so the packet at the head
of a line cannot block other packets. Throughput in SAFC is higher than with a
single input queue since each input can transmit up to N packets at a time.
However, SAFC has several drawbacks. The first is inefficient buffer
utilization: the buffers are divided into four statically allocated queues, so
the available buffer space for each queue is only one quarter of the total, and
it is mandatory to route all packets in advance to know the destination output
port. The second disadvantage is design complexity, since separate buffers and
crossbars must be managed [18].
3.
SAMQ (Statically Allocated Multi-Queue)
SAMQ resolves the second disadvantage of SAFC, the design complexity. When
packets arrive at the buffers, one of the packets is forwarded to the crossbar,
so it is not necessary to manage N crossbars. However, SAMQ still has
inefficient buffer utilization [18]. Figure 8 (C) shows the SAMQ design.
4.
DAMQ (Dynamically Allocated Multi-Queue)
In the DAMQ scheme, each input port uses a single shared buffer space. To
increase buffer utilization, the queues within each input port are dynamically
allocated and maintained as linked lists. This dynamic buffer allocation
significantly increases buffer usage. The control logic consists of head and
tail pointers. Whenever a packet arrives, it is stored in the location indicated
by the head pointer, and while the packet is stored into free buffer space, its
destination output port number is determined. The tail pointer indicates the
location of the next packet to be sent to the crossbar.
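The dynamic allocation idea can be sketched as follows (a simplified Python illustration, not the cited design; the class and method names are hypothetical, and Python lists stand in for the hardware linked lists and pointers):

```python
# Sketch (simplified): a DAMQ-style buffer where one physical buffer
# pool is shared and each output port's queue is a list of slot
# indices. Free slots are tracked in a shared free list.

class DAMQBuffer:
    def __init__(self, slots):
        self.data = [None] * slots
        self.free = list(range(slots))   # indices of unused slots
        self.queues = {}                 # output port -> ordered slot indices

    def enqueue(self, out_port, flit):
        if not self.free:
            return False                 # whole pool exhausted: apply backpressure
        slot = self.free.pop(0)
        self.data[slot] = flit
        self.queues.setdefault(out_port, []).append(slot)
        return True

    def dequeue(self, out_port):
        q = self.queues.get(out_port)
        if not q:
            return None
        slot = q.pop(0)
        flit, self.data[slot] = self.data[slot], None
        self.free.append(slot)           # slot returns to the shared pool
        return flit

buf = DAMQBuffer(slots=4)
buf.enqueue("EAST", "f0")
buf.enqueue("WEST", "f1")
buf.enqueue("EAST", "f2")
print(buf.dequeue("EAST"))  # f0
print(buf.dequeue("WEST"))  # f1
```

Because every queue draws slots from the same pool, a busy output port can use buffer space that an idle port is not using, which is the utilization gain over the statically partitioned SAFC/SAMQ schemes.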
D.
Generic NoC Router
Generally, a NoC router has five input and output ports: one for the local
processing element (PE) and four for the directions North, South, West, and
East. Each router also has five components: the Routing Computation (RC) unit,
Virtual Channel Allocator (VA), Switch Allocator (SA), Flit Buffers (BUF), and
Crossbar, as we can see in Figure 9. When a header flit arrives at an internal
flit buffer, the RC unit directs the incoming flits to one of the physical
channels. The Virtual Channel Allocation unit receives credit information from
the neighboring routers, arbitrates among all the header flits that request the
same VCs, and then selects one of them according to the arbitration policy. The
header flit can thereby set up the path that the following data and tail flits
will traverse. The transmitting router sends control information to the
receiving router, and the receiving router may update the VC ID in its internal
buffer with this control information. The Switch Allocation (SA) unit arbitrates
among the waiting flits in all VCs that request the crossbar and allows only
one flit to get crossbar permission. The SA operation follows the VA stage,
since the flit data in the buffer comes from the previous router on the route.
The flit data then passes over the crossbar and can thus arrive at the
destination node.
ROUTER
IMPLEMENTATION
A.
Asynchronous Design
A Network-on-Chip router can be implemented with asynchronous design
methodologies, which use implicit or explicit data-valid signals instead of
clock signals. Asynchronous design has the following benefits [20]:
1.
No Clock Skew
The
first advantage is that asynchronous logic has no clock skew. Clock skew is a
phenomenon in synchronous circuits in which the clock signal arrives at
different components at different times. Therefore, asynchronous logic does not
have to worry about clock skew because it has no globally distributed clock.
2.
Low Power Consumption
The second advantage is low power consumption. Asynchronous logic switches only
when it is performing its own operation, while synchronous circuits are toggled
by the clock signal all the time, which causes dynamic power consumption.
3.
Low Global Wire Delay
A clock cycle of a synchronous circuit depends on the critical path delay.
Hence, setup and hold timing must be carefully controlled, and the logic on the
critical path must be optimized for the highest clock rate. In contrast, the
performance of an asynchronous circuit is determined by the speed of the circuit
path currently in operation.
4.
Automatic Adaptation to Variation
Combinational logic delay changes with variation in fabrication, temperature,
and power-supply voltage. A clock cycle must be chosen for the worst-case
variation, for example low voltage and high temperature. The delay of
asynchronous circuits, however, is determined by the current physical
conditions.
B.
An Arbiter Design
Many input ports acting as requestors want to access a common physical channel
resource. In this case, an arbiter is required to determine how the physical
channel is shared among the requestors. When designing arbitration logic,
several factors have to be considered.
1.
Basic Concept of Arbitration
If many flits arrive at the buffers from several virtual channels and these
flits are destined for one physical channel, an arbiter receives request signals
derived from buffer status, such as the FIFO empty and full signals. These FIFO
empty and full signals are generated by comparing the write and read pointers.
For example, if the write pointer has the same value as the read pointer, the
FIFO empty signal is asserted; if the write pointer has wrapped around and
caught up with the read pointer, the FIFO full signal is asserted. Figure 14
shows the general arbitration flow in a FIFO (First In First Out) based
request-and-grant system. When a new flit arrives at a FIFO, the write pointer
is incremented and a request signal is generated. The arbiter receives N request
signals and grants only one buffer, and this grant signal increments the read
pointer of the corresponding FIFO. This type of arbitration flow is used to
implement the VA and SA logic of a NoC router. Fairness is a key property of an
arbiter: a fair arbiter provides equal service to the different requests. In a
FIFO environment, requesters are served in the order in which they made their
requests. Even if an arbiter itself is fair, the system cannot be fair when the
traffic congestion in the NoC environment is not. Figure 13 shows an example of
arbitration with FIFOs.
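The pointer comparison can be sketched as follows (Python; the extra wrap bit on each pointer is a common hardware FIFO encoding and is an assumption here, since the thesis does not give the exact implementation):

```python
# Sketch (assumed pointer encoding): FIFO empty/full detection with an
# extra wrap bit, i.e. pointers count modulo 2*DEPTH. The FIFO is empty
# when the pointers match exactly, and full when they address the same
# slot but differ in the wrap bit (the writer has lapped the reader).

DEPTH = 4  # pointers take values 0 .. 2*DEPTH - 1

def fifo_empty(wr_ptr, rd_ptr):
    return wr_ptr == rd_ptr

def fifo_full(wr_ptr, rd_ptr):
    return (wr_ptr % DEPTH) == (rd_ptr % DEPTH) and wr_ptr != rd_ptr

print(fifo_empty(0, 0))  # True  (nothing written yet)
print(fifo_full(4, 0))   # True  (4 writes, 0 reads, depth 4)
print(fifo_full(2, 0))   # False (partially filled)
```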
2.
Fixed Priority Arbiter
One common arbitration scheme is the fixed priority arbiter. Each input port has
its own fixed priority level, and the arbiter grants the active request signal
with the highest priority. For instance, if request[0] has the highest priority
among the N requests and request[0] is active, it will be granted regardless of
the other request signals. If request[0] is not active, the request signal with
the next highest priority is granted. In other words, a lower-priority request
is served only if every higher-priority request is inactive or has already been
served. The fixed priority arbiter is therefore best used where there are only a
few requesters. The fixed priority arbiter can be implemented with a simple case
statement [21].
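The case-statement implementation cited from [21] is not reproduced here; a behavioral Python sketch of the same priority-encoder logic is:

```python
# Behavioral sketch of a fixed priority arbiter (the HDL version is a
# case statement over the request vector). `request` is a bit vector;
# bit 0 has the highest priority, and the grant is one-hot.

def fixed_priority_arbiter(request, n):
    for i in range(n):            # scan from highest priority downward
        if request & (1 << i):
            return 1 << i         # one-hot grant for the winner
    return 0                      # no active requests

print(bin(fixed_priority_arbiter(0b0110, 4)))  # 0b10: bit 1 beats bit 2
print(bin(fixed_priority_arbiter(0b1000, 4)))  # 0b1000: only bit 3 active
```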
3.
Round-Robin Arbiter
When a fixed priority arbiter is used in a NoC design, there is no limit to how
long a lower-priority request may wait until it receives a grant. In a
round-robin arbiter, every requester takes its turn in order, because a request
that has just been served is given the lowest priority on the next round of
arbitration. A pointer register keeps track of which request is next; hence, a
round-robin arbiter provides strong fairness [21]. Figure 14 shows an example of
fixed priority arbiter code. In general, a round-robin arbiter can be built from
N fixed priority arbiters. Figure 15 shows a muxed parallel priority arbiter.
The parallel priority arbiter uses N fixed priority arbiters in parallel for N
requesters; a shift module rotates the request vector before arbitration, so the
multiplexer receives all possible grant vectors for the round-robin arbiter, and
the pointer selects which of the intermediate grant vectors is actually used.
This design is fast, but its area is too large because of the N shift modules
and fixed priority arbiters when the number of requesters is large. For a better
area and timing trade-off, two fixed priority arbiters with a mask can be used.
In Figure 16, the lower arbiter receives all the request signals, while the
upper arbiter first masks the request signals according to the round-robin
pointer before one request signal is selected.
If the upper arbiter produces a grant signal, that grant signal is chosen by the
MUX.
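The masked two-arbiter scheme can be sketched behaviorally (Python; a simplification of the Figure 16 design, with the mask derived directly from the pointer):

```python
# Sketch of a masked round-robin arbiter: one fixed priority pass over
# the requests at or above the pointer (the "upper" arbiter), and a
# fallback pass over all requests (the "lower" arbiter) when the masked
# pass grants nothing.

def fixed_priority(request, n):
    for i in range(n):
        if request & (1 << i):
            return 1 << i
    return 0

def round_robin_arbiter(request, pointer, n):
    mask = ~((1 << pointer) - 1)            # keep bits >= pointer
    grant = fixed_priority(request & mask, n)
    if grant == 0:
        grant = fixed_priority(request, n)  # wrap around: consider all bits
    return grant

# pointer = 2: request bit 1 is masked off at first, so bit 3 wins;
# once only bit 1 is left, the wrap-around pass serves it.
print(bin(round_robin_arbiter(0b1010, 2, 4)))  # 0b1000
print(bin(round_robin_arbiter(0b0010, 2, 4)))  # 0b10
```

In hardware the two fixed priority arbiters evaluate in parallel and a MUX picks the masked grant when it is nonzero, which is what makes this cheaper than the N-arbiter parallel design.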
4.
Matrix Arbiter
A matrix arbiter implements a least-recently-served priority scheme by
maintaining a triangular array of state bits W[i][j] for all i < j, where
W[i][j] (row i, column j) indicates that request i takes priority over request
j. Only the upper triangular portion of the matrix needs to be maintained.
Figure 17 shows how a single grant output is generated for a 4-input arbiter
with matrix arbitration. This arbiter ensures that a grant is generated only if
no input request with a higher priority is asserted. The flip-flop matrix is
updated after each clock cycle to reflect the new request priorities [22]. The
matrix arbiter is a favorite arbiter for a small number of inputs because it is
fast, inexpensive to implement, and provides strong fairness.
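The matrix arbitration step can be sketched as follows (Python; for clarity the sketch stores the full matrix rather than only the upper triangle that the hardware keeps):

```python
# Sketch of matrix arbitration: W[i][j] = 1 means request i currently
# has priority over request j. A request is granted only if no other
# active request has priority over it; the winner then drops to the
# lowest priority, giving least-recently-served behavior.

def matrix_arbitrate(request, W, n):
    for i in range(n):
        if not request[i]:
            continue
        # granted only if no other active request j beats request i
        if all(not (request[j] and W[j][i]) for j in range(n) if j != i):
            for j in range(n):            # winner yields to everyone else
                if j != i:
                    W[i][j], W[j][i] = 0, 1
            return i
    return None

n = 3
W = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]                           # initial priority: 0 > 1 > 2
print(matrix_arbitrate([1, 1, 1], W, n))  # 0 wins first...
print(matrix_arbitrate([1, 1, 1], W, n))  # ...then 1, since 0 dropped to lowest
```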
5.
Crossbar Switch and Other Implementation