
A Self-Healing Framework for WSNs: Detection and Recovery of Faulty Sensor Nodes and Unreliable Wireless Links
Sergio Diaz
Doctoral Candidate

Dr. Diego Méndez
Doctoral Advisor

Members of the Examining Committee:
Dr. Mario Schölzel
Dr. Pedro Wightman
Dr. Manuel Pérez
Outline
• Motivation and state of the art
• Dissertation proposal
• The self-healing framework
– Topology construction: DGHS
– Information collection - fault detection: ICI
– Fault recovery: CITT
• Contributions
• Conclusions and future work
Motivation
● Unstable wireless links [28][34]
– Unlicensed radio bands, low transmission power, interference, propagation problems
● Link quality changes over time
● Links are asymmetric
● Gray zone (links with intermittent connectivity)
● Node failures [40]
– Harsh environments → nodes prone to failure
– Component failures, nodes run out of energy
– On-site technical service is infeasible
Fundamental concepts

● Redundancy: dense deployments → use redundant links/nodes
– Alternate paths
– Backup nodes
● Self-organization:
– Tree, cluster
– No human intervention
– No previous topology knowledge
● Self-healing
– Detect failures
– Repair the network using link/node redundancies
State-of-the-art
● Self-organization
– Tree construction and maintenance (CTP[161], RPL[86], LIBP[109])
– Trickle algorithm [173]
● Sends control packets less frequently as the topology stabilizes
● Self-healing
– Classical approaches
● Heartbeats (periodic probes), timeouts (unresponsive nodes), thresholds (deteriorating metrics)
– Advanced approaches
● Probabilistic [136] [138], recognition of the healthy state [140] [141]
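The Trickle algorithm [173] sends control packets less often as the topology stabilizes. The following Python sketch shows the timer rules of RFC 6206, with several assumptions. The event-driven interface and all class and method names are hypothetical, and this is not Contiki's implementation. The timer also always starts at Imin; RFC 6206 lets it start anywhere in [Imin, Imax].

```python
import random
from dataclasses import dataclass, field


@dataclass
class TrickleTimer:
    """Simplified Trickle timer following the rules of RFC 6206 (illustrative sketch)."""
    i_min: float          # smallest interval Imin, in seconds
    i_max_doublings: int  # Imax expressed as the number of doublings of Imin
    k: int                # redundancy constant
    rng: random.Random = field(default_factory=random.Random)

    def __post_init__(self) -> None:
        self.interval = self.i_min
        self._start_interval()

    @property
    def i_max(self) -> float:
        return self.i_min * (2 ** self.i_max_doublings)

    def _start_interval(self) -> None:
        # New interval: reset the counter c and pick t uniformly in [I/2, I).
        self.counter = 0
        self.t = self.rng.uniform(self.interval / 2, self.interval)

    def hear_consistent(self) -> None:
        # A consistent transmission from a neighbor increments c.
        self.counter += 1

    def should_transmit(self) -> bool:
        # At time t, transmit only if fewer than k consistent messages were heard.
        return self.counter < self.k

    def interval_expired(self) -> None:
        # Topology looks stable: double I (capped at Imax) and start a new interval.
        self.interval = min(self.interval * 2, self.i_max)
        self._start_interval()

    def hear_inconsistent(self) -> None:
        # Inconsistency detected: fall back to Imin to react quickly.
        if self.interval > self.i_min:
            self.interval = self.i_min
            self._start_interval()
```

The interval grows while nothing changes, so beacon traffic decays geometrically in a stable network. A single inconsistency brings the rate back to its maximum.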
Dissertation proposal
● Research question: How can the resilience of WSNs to faulty sensor nodes and unreliable wireless links be improved without exhausting their limited resources?
● Objective: Improve the resilience of WSNs by detecting node/link failures and repairing the network topology while considering the limited resources of real sensor nodes.
The self-healing framework

Validation of the self-healing framework

● Operating system
– Contiki
● Real testbed
– TelosB/Tmote Sky, RE-Mote
● Emulation
– Cooja
● Advantage: the code is easily transferred from the emulator to the real testbed.
The self-healing framework

Topology Construction - DGHS
● Finds optimal routes (a minimum spanning tree) in a distributed and asynchronous manner.
● Fault detection and recovery
● Gallager-Humblet-Spira (GHS) algorithm [41]: defines and merges fragments
Advantages
● Distributed
● Asynchronous
● Deadlock-free
● Tolerates unpredictable delays
● Message-optimal
Disadvantages (theoretical assumptions)
● GHS does not:
– Tolerate packet loss
– Tolerate node failures
– Detect node/link failures
– Recover from failures
Topology Construction - DGHS

[Figure: scenarios with a faulty node and an unreliable wireless link]

● The main fragment searches for disconnected fragments
● Detection and recovery:
– Heartbeat packets detect disconnected fragments and initiate their merging
● Dynamic GHS builds a robust minimum spanning tree.
Topology Construction - DGHS
Mechanisms that make DGHS robust
● ReTX-ACK: retransmissions and acknowledgments ensure that lost packets are recovered.
● Heartbeat packets: detect node/link failures
● Initiate packets sent periodically: search for disconnected fragments and trigger the merging.
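The following Python sketch shows the heartbeat and ReTX-ACK mechanisms. It assumes a neighbor is declared failed after `timeout` seconds without a heartbeat, and that retransmissions stop after a fixed `max_retx`. Both parameters and all names are hypothetical, since the slides do not give DGHS's actual timeout or retry limit. `send` stands in for a link-layer transmission that reports whether an ACK arrived.

```python
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Flags a neighbor as failed when no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[int, float] = {}  # neighbor id -> time of last heartbeat

    def on_heartbeat(self, node_id: int, now: float) -> None:
        self.last_seen[node_id] = now

    def failed_neighbors(self, now: float) -> List[int]:
        # Neighbors silent for longer than the timeout are suspected failed.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


def send_with_retx(send: Callable[[int], bool], max_retx: int) -> int:
    """Stop-and-wait ReTX-ACK: retransmit until `send(attempt)` reports an ACK.

    Returns the number of attempts used, or -1 if every attempt went unacknowledged.
    """
    for attempt in range(1 + max_retx):
        if send(attempt):
            return attempt + 1
    return -1
```

A failed neighbor reported by the monitor is what would trigger the search for disconnected fragments. The retry limit bounds how much energy is spent on a link that has actually gone down.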
Pseudocode GHS
-------------------------------------
Initial process:
    send_initiate_packet
    Search_for_disconnected_fragments
-------------------------------------
Disconnected fragment found:
    Find_lowest_cost_edge
-------------------------------------
Lowest cost edge found:
    Merge_the_fragments_together
-------------------------------------

Pseudocode DGHS
-------------------------------------
Initial process:
    While (True) {
        send_initiate_heartbeat_packet (ReTX-ACK)
        Search_for_disconnected_fragments
    }
-------------------------------------
Disconnected fragment found:
    Find_lowest_cost_edge
-------------------------------------
Lowest cost edge found:
    Merge_the_fragments_together
-------------------------------------
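The fragment-merging steps in the pseudocode can be illustrated with a centralized Python simulation: each fragment finds its lowest-cost outgoing edge, then the fragments merge along it. This is a Borůvka-style sketch that runs all fragments in lockstep on one machine. It is not the distributed, message-passing GHS/DGHS protocol. Like GHS, it assumes distinct edge weights, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (node u, node v, edge weight)


def fragment_merging_mst(nodes: List[str], edges: List[Edge]) -> List[Edge]:
    # Every node starts as a single-node fragment (union-find parent map).
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(n: str) -> str:
        # Return the fragment identifier of node n (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree: List[Edge] = []
    while True:
        # Find_lowest_cost_edge: minimum-weight outgoing edge of each fragment.
        best: Dict[str, Edge] = {}
        for u, v, w in edges:
            fu, fv = find(u), find(v)
            if fu == fv:
                continue  # internal edge, not outgoing
            for f in (fu, fv):
                if f not in best or w < best[f][2]:
                    best[f] = (u, v, w)
        if not best:
            return tree  # a single fragment remains (or the graph is disconnected)
        # Merge_the_fragments_together along each selected edge.
        for u, v, w in best.values():
            fu, fv = find(u), find(v)
            if fu != fv:
                parent[fu] = fv
                tree.append((u, v, w))
```

For example, on four nodes A, B, C, D with invented weights ab = 1, ac = 4, bd = 2, cd = 3, the sketch keeps ab, bd and cd (total weight 6). DGHS's contribution is to rerun this search periodically with heartbeats, so fragments separated by a failure are found and merged again.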
Topology Construction - DGHS
● Emulation with 70 nodes

Topology Construction - DGHS
● Tree construction: DGHS consumes 9.2% less energy, and LIBP uses 25.6% more control packets
● Data collection: DGHS consumes 22.1% less energy
● DGHS converges 8.1% slower
● DGHS uses 23.2% more flash memory and 11.1% more RAM
Topology Construction - DGHS
● Conclusion:
– DGHS reduces the number of control messages and the energy consumption
– at the cost of a slight increase in memory size and convergence time

Problem | Outcome | Contribution
Find optimal routes in a distributed and asynchronous manner | Minimum spanning tree that minimizes the number of control packets and energy consumption | Extend and evaluate a GHS-inspired algorithm in a wireless network setting
Reconstruct the MST when there are node/link failures | Heartbeat packets that detect disconnected fragments and initiate their merging | (same contribution as above)
The self-healing framework

Information collection - fault detection: ICI
● Link quality constantly changes over time (e.g., Wi-Fi interference)
– Maintain an updated link status
– Do not exhaust the limited resources of the nodes
● Reuse metrics from the MAC layer → avoid the overhead of computing new parameters
MAC metrics: RSSI, LQI
Network metrics: one-hop latency, packet loss
Information collection - fault detection: ICI
[Figure: Scenario 1, interferer (Int) close to RX; Scenario 2, interferer (Int) close to TX; distances d1 and d2]

● Predictable and controllable source of interference [177]
– A CC2420 transceiver periodically turns its carrier on and off
● TX and RX use CSMA
Information collection - fault detection: ICI
[Figure: Scenario 1 (Int close to RX) and Scenario 2 (Int close to TX)]
Information collection - fault detection: ICI
[Figure: Scenario 1 (Int close to RX) and Scenario 2 (Int close to TX)]
Information collection - fault detection: ICI
: Linear combination

● Empirical cumulative distribution function (ecdf)
● Statistical distances [140]
– KS distance
– CM distance
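The Python sketch below shows how the two distances can be computed from raw samples of a link-quality metric. The CM function uses one common unnormalized form: the mean squared ecdf gap over the pooled sample, which is the textbook two-sample statistic without its nm/(n+m) factor. The `nearest_level` step, which matches a sample against reference samples recorded at known interference levels, is an assumption about how the distances are used. All names are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Dict, Sequence


def ecdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Empirical CDF: fraction of samples less than or equal to x."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


def ks_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Kolmogorov-Smirnov distance: max |F_a(x) - F_b(x)|.

    Both ecdfs are right-continuous step functions, so the maximum is reached at a sample point.
    """
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))


def cm_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cramér-von Mises-type distance: mean squared ecdf gap over the pooled sample."""
    fa, fb = ecdf(a), ecdf(b)
    pooled = list(a) + list(b)
    return sum((fa(x) - fb(x)) ** 2 for x in pooled) / len(pooled)


def nearest_level(sample: Sequence[float],
                  references: Dict[int, Sequence[float]],
                  distance: Callable[[Sequence[float], Sequence[float]], float] = ks_distance) -> int:
    """Hypothetical classifier: the reference interference level whose ecdf is closest."""
    return min(references, key=lambda level: distance(sample, references[level]))
```

KS reacts to the single largest gap between the two ecdfs. CM accumulates the gap over every pooled sample, which makes it less sensitive to one outlying step.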

Information collection - fault detection: ICI

● Estimate the level of interference with an error:


● 9.5% for CM distance

● 10.8% for KS distance 22


Information collection - fault detection: ICI
● Avoid computing new parameters in the network layer
– One-hop latency: requires time synchronization (overhead)
– Packet loss: requires sequence numbers (overhead)
● Find the equivalent metrics in the MAC layer (CSMA)
Network metric → MAC metric (CSMA)
One-hop latency → Backoff time
Packet loss → Percentage of packets dropped
Information collection - fault detection: ICI
● Both metrics (backoff time and percentage of packets dropped) deteriorate as the level of interference increases
[Figure: Scenario 1 (Int close to RX) and Scenario 2 (Int close to TX)]
Information collection - fault detection: ICI

● Exponentially weighted moving average (EWMA)
– Reduces the effect of stale samples
● Naive Bayes classifier
– Estimates the level of interference on the channel.
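The following Python sketch shows the EWMA smoothing followed by a naive Bayes classifier. It assumes the two features are the smoothed CSMA metrics (backoff time and percentage of packets dropped) and that a Gaussian naive Bayes model is used. The slides do not state which naive Bayes variant, smoothing factor or training data ICI uses. The value of `alpha`, the class labels (interference levels in percent) and the training values below are therefore hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def ewma(samples: Sequence[float], alpha: float) -> float:
    """EWMA: s_t = alpha * x_t + (1 - alpha) * s_{t-1}; old samples decay geometrically."""
    s = samples[0]
    for x in samples[1:]:
        s = alpha * x + (1 - alpha) * s
    return s


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian likelihood per feature and class."""

    def fit(self, X: List[Tuple[float, ...]], y: List[int]) -> "GaussianNaiveBayes":
        groups: Dict[int, List[Tuple[float, ...]]] = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        self.stats = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # Small floor keeps the variance strictly positive.
            variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                         for c, m in zip(cols, means)]
            self.stats[label] = (len(rows) / len(X), means, variances)
        return self

    def predict(self, features: Tuple[float, ...]) -> int:
        def log_posterior(label: int) -> float:
            prior, means, variances = self.stats[label]
            lp = math.log(prior)
            for x, m, v in zip(features, means, variances):
                lp += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            return lp
        # Class with the highest log posterior (features assumed independent).
        return max(self.stats, key=log_posterior)
```

Working in log space avoids underflow when many small likelihoods are multiplied. The model also stays tiny: one prior, mean and variance per feature and class, which suits a sensor node's memory.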

Information collection - fault detection: ICI

● Statistical distance
– Estimates the level of interference with an error of 9.5% (CM distance) and 10.8% (KS distance)
● Naive Bayes classifier
– Estimates the level of interference with an error of 4.6%
● Conclusion
– The naive Bayes classifier outperforms the statistical distance method [140]
Information collection - fault detection: ICI
ψ: ratio of packet loss to latency
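The following Python sketch computes ψ and turns it into a binary TX/RX decision. The decision rule is an assumption, not taken from the slides. With CSMA, an interferer near RX is not sensed by TX, giving short backoffs and many losses (high ψ). An interferer near TX makes TX back off, giving longer latency and fewer losses (low ψ). The threshold and function names are hypothetical and would need calibration on real measurements.

```python
def psi(packet_loss: float, latency: float) -> float:
    """ψ = packet loss / latency."""
    if latency <= 0:
        raise ValueError("latency must be positive")
    return packet_loss / latency


def locate_interferer(packet_loss: float, latency: float, threshold: float) -> str:
    """Hypothetical binary location rule based on ψ.

    High ψ (losses without long backoffs) -> interferer assumed close to RX;
    low ψ (long backoffs, fewer losses)    -> interferer assumed close to TX.
    """
    return "RX" if psi(packet_loss, latency) > threshold else "TX"
```

The decision is local, since both inputs are available at the transmitter from its own CSMA statistics.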

Information collection - fault detection: ICI
● Conclusion:
– ICI is able to:
● Estimate the level of interference in a wireless channel with an error of 4.6%
● Determine whether the source of interference is close to TX or RX

Problem | Outcome | Contribution
Estimate the level of interference while avoiding the overhead (synchronization, sequence numbers) | Level of interference with an error of 4.6% | Estimate the level of interference and compare the statistical distances method with the naive Bayes classifier
Determine locally whether the source of interference is close to the TX or RX | Binary location of the source of interference | Proposal of the coefficient ψ, the ratio of packet loss to latency
The self-healing framework

Fault recovery - CITT
● Find the paths with the lowest levels of interference in a distributed manner
● Dynamically change the forwarding paths according to the current level of interference
● Distance vector algorithm
● Exchange beacons, fill neighbor table

Neighbor | Distance
A | 10+10
B | 1+100
C | 20+20
Fault recovery - CITT
● Wn is the aggregated interference over the whole path
● The nodes have global information about the path

Neighbor | Distance
A | 10+10
B | 1+100
C | 20+20
Fault recovery - CITT
Real testbed: 10 nodes

For node 7:
Neighbor | Distance
4 | 50+1+1+1
6 | 1+50+50+1

● Find the path with the lowest total interference
● Pass the level of interference from the MAC layer to the network layer
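The route choice in these neighbor tables can be sketched as a distance-vector update in Python. Each entry stores the per-hop interference values of the path advertised in that neighbor's beacon, matching the "10+10" notation. The node forwards through the neighbor whose aggregated interference Wn is lowest. The data layout and function names are assumptions for illustration, and CITT may store only the aggregated sum.

```python
from typing import Dict, List

# Neighbor table: neighbor id -> per-hop interference values of the path
# advertised through that neighbor (first value = own link to the neighbor).
NeighborTable = Dict[str, List[float]]


def on_beacon(table: NeighborTable, neighbor: str,
              link_interference: float, advertised_hops: List[float]) -> None:
    """Distance-vector update: own link interference + neighbor's advertised path."""
    table[neighbor] = [link_interference] + list(advertised_hops)


def path_interference(hops: List[float]) -> float:
    """Wn: aggregated interference over the whole path to the sink."""
    return sum(hops)


def best_parent(table: NeighborTable) -> str:
    """Forward through the neighbor whose path has the lowest total interference."""
    return min(table, key=lambda n: path_interference(table[n]))
```

Choosing by the first hop alone would select B (1) or neighbor 6 (1). Summing over the whole path selects A (20) and neighbor 4 (53) instead. This is why the interference estimate from the MAC layer has to be propagated along the path.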
Fault recovery – CITT
● Compared against RPL [194, 195]

Power consumption | CITT | RPL
CPU (mW) | 0.21 | 0.06
TX (mW) | 3.21 | 0.20
RX (mW) | 1.20 | 1.51
LPM (µW) | 1.1 | 1.2
Fault recovery – CITT

• Data packets reach the sink 2.16 times faster with CITT.
• RPL loses 10.3% more data packets.
Fault recovery – CITT
● Conclusion:
– CITT is able to:
● Find the paths with the lowest levels of interference in a distributed manner
● Dynamically change the forwarding paths according to the current level of interference

Problem | Outcome | Contribution
Determine the paths with the lowest levels of interference in a distributed manner | Pass the estimation of the level of interference from the MAC to the network layer | CITT outperforms RPL regarding latency and packet reception rate
Contributions
Contribution | Paper | Type/Status
The self-healing framework | A review on self-healing and self-organizing techniques for wireless sensor networks. | Published/Journal
The self-healing framework | CITT - Construction of an interference-tolerant tree topology using cross-layer information. | Submitted/Journal
The self-healing framework | A multi-layer self-healing algorithm for WSNs. | Accepted/Journal
Estimation of the level of interference | ICI - Interference characterization and identification for WSN. | Published/Conference
Extend and evaluate a GHS-inspired algorithm in a wireless network setting | Dynamic minimum spanning tree construction and maintenance for Wireless Sensor Networks. | Accepted/Journal
Extend and evaluate a GHS-inspired algorithm in a wireless network setting | Dynamic Gallager-Humblet-Spira algorithm for wireless sensor networks. | Published/Conference
Minor | DACA - Disjoint path and clustering algorithm for self-healing WSN. | Published/Conference
Minor | Wireless technologies for pollution monitoring in large cities and rural areas. | Published/Conference
Conclusions

● We proposed and evaluated a conceptual framework that combines self-organizing and self-healing techniques
● DGHS reduces the number of control messages and the energy consumption at the cost of a slight increase in memory size and convergence time
● ICI estimates the level of interference in a wireless channel with an error of 4.6%
● CITT outperforms RPL regarding latency and packet reception rate
Future work

● Evaluate the framework in a larger testbed (100 nodes)
● Implement the Trickle algorithm, which reduces the number of control packets as the network stabilizes
● Merge the proposed techniques into a single approach
● Include other metrics in the evaluation, such as the time it takes to recover from a failure
References
[28] K. Pengwon, T. Komolmis, and P. Champrasert. Solving asymmetric link problems in WSNs using site link quality estimators and dual-tree topology. In 2016 13th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), pages 1–4, June 2016.

[34] A. Castagnetti, A. Pegatoquet, T. N. Le, and M. Auguin. A joint duty-cycle and transmission power management for energy harvesting WSN. IEEE Transactions on Industrial Informatics, 10(2):928–936, May 2014.

[40] Koen Langendoen, Aline Baggio, and Otto Visser. Murphy loves potatoes: Experiences from a pilot sensor network deployment in precision agriculture. In Proceedings of the 20th International Conference on Parallel and Distributed Processing, IPDPS’06, pages 174–174, Washington, DC, USA, 2006. IEEE Computer Society.

[41] R. G. Gallager, P. A. Humblet, and P. M. Spira. A distributed algorithm for minimum-weight spanning trees. ACM Trans. Program. Lang. Syst., 5(1):66–77, January 1983.

[86] Jean-Philippe Vasseur and Adam Dunkels. Chapter 17 - RPL routing in smart object networks. In Jean-Philippe Vasseur and Adam Dunkels, editors, Interconnecting Smart Objects with IP, pages 251–288. Morgan Kaufmann, Boston, 2010.

[109] Lutando Ngqakaza and Antoine Bagula. Least Path Interference Beaconing Protocol (LIBP): A Frugal Routing Protocol for The Internet-of-Things, pages 148–161. Springer International Publishing, Cham, 2014.

[136] Chafiq Titouna, Makhlouf Aliouat, and Mourad Gueroui. FDS: Fault detection scheme for wireless sensor networks. Wireless Personal Communications, 86(2):549–562, 2016.

[138] Zhenjiang Zhang, Tonghuan Liu, and Wenyu Zhang. Novel paradigm for constructing masses in Dempster-Shafer evidence theory for wireless sensor network’s multisource data fusion. Sensors, 14(4):7049–7065, 2014.

[140] Xiaohang Jin, Tommy W. S. Chow, Yi Sun, Jihong Shan, and Bill C. P. Lau. Kuiper test and autoregressive model-based approach for wireless sensor network fault diagnosis. Wireless Networks, 21(3):829–839, 2015.

[141] X. Miao, K. Liu, Y. He, Y. Liu, and D. Papadias. Agnostic diagnosis: Discovering silent failures in wireless sensor networks. In INFOCOM, 2011 Proceedings IEEE, pages 1548–1556, April 2011.

[161] Omprakash Gnawali, Rodrigo Fonseca, Kyle Jamieson, Maria Kazandjieva, David Moss, and Philip Levis. CTP: An efficient, robust, and reliable collection tree protocol for wireless sensor networks. ACM Trans. Sen. Netw., 10(1):16:1–16:49, December 2013.

[173] P. Levis, Thomas H. Clausen, Omprakash Gnawali, Jonathan Hui, and Jeong Gil Ko. The Trickle Algorithm. RFC 6206, March 2011.

[177] C. A. Boano, Z. He, Y. Li, T. Voigt, M. Zúñiga, and A. Willig. Controllable radio interference for experimental and testing purposes in wireless sensor networks. In 2009 IEEE 34th Conference on Local Computer Networks, pages 865–872, Oct 2009.

[194] S. Min, S. Chung, and Y. Ha. An improved mobility support mechanism for downward traffic in RPL. In 2018 Tenth International Conference on Ubiquitous and Future Networks (ICUFN), pages 311–313, July 2018.

[195] H. Kim, J. Paek, D. E. Culler, and S. Bahk. Do not lose bandwidth: Adaptive transmission power and multihop topology control. In 2017 13th International Conference on Distributed Computing in Sensor Systems (DCOSS), pages 99–108, June 2018.
DGHS – Performance Evaluation Setting

● We run 20 emulations with different random seeds and average the results.
● Standard deviation
ICI – Performance Evaluation Setting
● We collect LP measurements in a MySQL database for 2.5 hours for each level of interference
● Implicit Network Time Synchronization
– Broadcasts synchronization messages
– Reduces its periodicity in every iteration

CITT – Performance Evaluation Setting
● Platforms
– 9 Zolertia RE-Motes (TX power 7 dBm)
– 1 Advanticsys TelosB/Tmote Sky (TX power 0 dBm)
– 1 Texas Instruments SmartRF transceiver evaluation board to observe the interference pattern
● Source of interference
– We define the wave period to be 2 seconds
● If the carrier is on for 1 second, we say that the level of interference is 50%
● Rime: a set of lightweight communication primitives, such as:
– Anonymous broadcast, reliable unicast, and neighbor discovery, among others
● Messages of CITT
– Beacons
– Data
● IEEE 802.15.4 channel 15
– Channels 15, 20, 25 and 26 do not overlap with Wi-Fi channels
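In these experiments, the level of interference is the duty cycle of the interferer's carrier. Written out with T as the wave period and t_on as the time the carrier is on (symbols introduced here for illustration):

```latex
I = \frac{t_{\mathrm{on}}}{T} \times 100\,\%, \qquad
T = 2\ \mathrm{s},\; t_{\mathrm{on}} = 1\ \mathrm{s} \;\Rightarrow\; I = 50\,\%
```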
DGHS details

● GHS determines an upper bound on the total number of control


messages
– For a graph of N nodes and E edges the total number of control
messages is at most
● 5N log2N + 2E
● Message: a message contains at most one edge weight plus log2 8N
bits
● Complexity: O(N log N)
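Evaluating the bound shows how it scales. The following Python sketch assumes N and E are known; the function names are hypothetical.

```python
import math


def ghs_message_bound(n_nodes: int, n_edges: int) -> float:
    """Upper bound on the number of GHS control messages: 5 N log2(N) + 2E."""
    return 5 * n_nodes * math.log2(n_nodes) + 2 * n_edges


def ghs_message_overhead_bits(n_nodes: int) -> float:
    """Bits carried per message in addition to at most one edge weight: log2(8N)."""
    return math.log2(8 * n_nodes)
```

For instance, for the 70-node emulation with a hypothetical E = 200 links, the bound evaluates to about 2,545 messages. The N log N term dominates as the network grows, while the 2E term only grows linearly with the number of links.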

Tree Topology

[Figure: tree topology rooted at the sink with a blue branch (nodes 1, 2, 3) and a green branch (nodes 4, 5, 6)]
Topology

[Figure: four nodes A, B, C and D connected by edges ab, ac, bd and cd]
Statistical Distances

● Kolmogorov-Smirnov distance (KS distance)
– The KS distance is the maximum absolute difference between two empirical cumulative distribution functions (ecdfs)
● Cramér-von Mises distance (CM distance)
– The CM distance accumulates the squared difference between two ecdfs over all samples, an area-like measure of their discrepancy
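The two distances can be written formally as follows. F₁ and F₂ are ecdfs built from n and m samples, H is the ecdf of the pooled sample z₁, …, z_{n+m}, and the notation is introduced here for illustration. The classical two-sample CM test statistic additionally multiplies D_CM by nm/(n+m).

```latex
D_{\mathrm{KS}} = \max_{x} \left| F_1(x) - F_2(x) \right|
\qquad
D_{\mathrm{CM}} = \int \left( F_1(x) - F_2(x) \right)^2 \, dH(x)
               = \frac{1}{n+m} \sum_{i=1}^{n+m} \left( F_1(z_i) - F_2(z_i) \right)^2
```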

