The process and algorithm for integration of neural network and data aggregation in WSN

Development Process

The development process of the application proceeds as follows:

Node formation and Clustering

We formed nodes to implement the network; each node represents a wireless sensor unit in the real world. The network is built with the NetworkX package. NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and function of complex networks. With NetworkX you can load and store networks in standard and nonstandard data formats, generate many types of random and classic networks, analyze network structure, build network models, design new network algorithms, draw networks, and much more. Clustering is the process of grouping nodes according to their weights relative to other nodes. Within each cluster, a single node is selected as the cluster head.

The code contains several sections. It has a "generate_cluster()" and a "regenerate_cluster()" function, in which the nodes are formed and the corresponding edge weights are calculated. Together they hold all the information about the network. The first task is creating the clusters for the first time. The "generate_cluster()" function takes the graph information as a parameter and generates the clusters using two algorithms, DBSCAN and k-means clustering: DBSCAN determines how many clusters will be formed, and k-means clustering then generates that number of clusters. First, the node coordinates are calculated by the get_coordinate function, and the nodes are plotted in the network. The clustering algorithms then generate the clusters, which are stored in the final list. After the clusters are generated, the cluster head of each cluster is calculated and stored in the cluster_head list. The "regenerate_cluster()" function works the same way but updates the existing lists (i.e., cluster_head, weight_matrix, final). Node generation and clustering provide the following functionality:

  1. The Cluster class takes the cluster size as input and sets its size parameter accordingly. The generate_cluster() function takes the network graph information and builds the clusters using the following algorithm.

generate_cluster(Graph G):

  1. Get the coordinates of all the nodes

  2. Plot the coordinates in the NetworkX graph

  3. Get the distance between each pair of nodes

  4. Calculate the number of clusters through DBSCAN using the threshold value

  5. Form the clusters using k-means clustering

  6. Build the final list with the clusters and their node information
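The steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the small DBSCAN and k-means routines below are standard-library stand-ins for whatever library implementations the project uses, plotting (step 2) is omitted, and the parameter names eps and min_samples as well as the deterministic k-means initialisation are hypothetical choices, not taken from the project code.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan_cluster_count(points, eps, min_samples):
    """Run DBSCAN and return the number of clusters found (noise excluded)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster_id         # i is a core point: start a new cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # noise reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point too: expand
        cluster_id += 1
    return cluster_id

def kmeans(points, k, iterations=50):
    """Lloyd's k-means; returns k lists of point indices."""
    # Assumption: evenly spaced points serve as deterministic initial centroids.
    centroids = [points[i * len(points) // k] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(idx)
        new_centroids = []
        for c in range(k):
            if clusters[c]:
                members = [points[i] for i in clusters[c]]
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters

def generate_cluster(coords, eps, min_samples=2):
    """DBSCAN decides how many clusters to form; k-means forms them."""
    k = max(1, dbscan_cluster_count(coords, eps, min_samples))
    return kmeans(coords, k)  # plays the role of the "final" list
```

For example, generate_cluster([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], eps=1.5) groups the first three and the last three node indices into two clusters.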

  2. The regenerate_cluster() function regenerates the clusters when needed. It works like generate_cluster() but does not consider nodes that are in sleep mode.

  3. Cluster head selection is done using the following algorithm:


  1. Get the energy level of the nodes of the clusters

  2. Iterate through each cluster list

  3. Find the maximum energy node with respect to its latency in a cluster

  4. Set it as the cluster head and update the cluster head list
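A minimal sketch of the selection steps follows. The text does not say how energy and latency are combined, so the score energy / latency used here is an assumption made only for illustration; the dictionaries energy and latency (one positive value per node) are likewise hypothetical inputs.

```python
def select_cluster_heads(final, energy, latency):
    """Pick one head per cluster.

    final   -- list of clusters, each a list of node ids
    energy  -- dict: node id -> remaining energy
    latency -- dict: node id -> latency of that node within its cluster (> 0)
    """
    cluster_head = []
    for cluster in final:
        # Assumed score: more energy and less latency make a better head.
        head = max(cluster, key=lambda node: energy[node] / latency[node])
        cluster_head.append(head)
    return cluster_head
```

With this score, a node with slightly less energy can still be chosen if its latency is much lower than that of its neighbours.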


Simulation is carried out with the help of SymPy. SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python and depends only on the mpmath library.

The simulation includes the environment that simulates the network, together with the details of the clusters and the network. Its functionality includes selecting the nodes that take part in a transmission. A transmission can occur in one of the following ways:

  • Transmission between nodes of the same cluster

  • Transmission between nodes of different clusters

The functionality includes:

  1. Node selection for transmission

This is done by the startTransmission() function, which selects arbitrary nodes and the basic parameters between them. startTransmission() uses the getNode() and getCluster() functions to get the corresponding nodes.

  2. Checking the energy of the nodes:

This section comprises two parts: checking the nodes' energy and transmitting data between two nodes. If the energy of any node is below 20%, that node is put into sleep mode, the clusters are regenerated, and the nodes are reset to the new clusters.
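The energy check can be sketched as below. It assumes that the 20% threshold is measured against each node's full capacity, which the text does not state; the function name split_by_energy and the capacity dictionary are hypothetical.

```python
SLEEP_THRESHOLD = 0.20  # nodes below 20% go to sleep (capacity-relative: an assumption)

def split_by_energy(energy, capacity, threshold=SLEEP_THRESHOLD):
    """Return (active, sleeping) lists of node ids.

    energy   -- dict: node id -> remaining energy
    capacity -- dict: node id -> full energy capacity
    """
    active, sleeping = [], []
    for node, level in energy.items():
        if level < threshold * capacity[node]:
            sleeping.append(node)
        else:
            active.append(node)
    return active, sleeping
```

Only the active list would then be passed on when the clusters are regenerated.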

  3. Data transmission and weight update between nodes:

Data transmission consists of getting the nodes and starting the transmission. There are two types of transmission.

  • Same cluster transmission: The data is packed and transmitted inside the cluster using Dijkstra’s algorithm. Dijkstra’s algorithm gives the shortest path inside the cluster, and the transmission step is then iterated over the nodes on that path.

  • Different cluster transmission: The source and destination clusters are identified and the data is transmitted accordingly. The transmission consists of source cluster transmission, cluster head transmission, and destination cluster transmission. Source cluster transmission carries the data from the source node to the source cluster head. Cluster head transmission carries it from the source cluster head to the destination cluster head. Destination cluster transmission carries it from the destination cluster head to the destination node.

Weight update using neural network: The weights of the edges are updated when data is transmitted from a source node to a sink node. The update is done by the neural network using the Random Forest algorithm. In Random Forest, the weight update depends upon:

  • Distance between two nodes

  • Latency between two nodes

  • Energy of the source node (E1)

  • Energy of the sink node (E2)

  • Energy consumption of the source node (E1_consume)

  • Energy consumption of the sink node (E2_consume)

  • Data sent (bytes)

  • Data received (bytes)

  • Processing

The latency between two nodes depends upon the distance, the transmission rate, the size of the data (in bits), and the bandwidth. It is calculated using the following formula:

$$\text{Latency} = \frac{\text{Distance}}{\text{Bandwidth}} + \frac{1024 \times \text{Size}}{\text{Transmission rate}} \qquad (1)$$

The energy consumption of the source and sink nodes depends on the distance and the size of the data (in bits). It is calculated as:

$$\text{Energy\_consumption} = \text{Distance} \times \text{Size} \times c \qquad (2)$$

where c is a constant which varies for different devices.

Using Random Forest, the updated weight can be calculated as:

$$\text{Weight} = \frac{\text{Latency} \times \text{E1\_consume} \times \text{E2\_consume}}{\text{E1} \times \text{E2}} \times k \qquad (3)$$

where k is a constant used to find the updated weight.
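Formulas (1) to (3) can be written as small Python functions. This sketch assumes the fractions read as Distance / Bandwidth, (1024 × Size) / Transmission rate, and (Latency × E1_consume × E2_consume) / (E1 × E2), with k multiplying the last fraction; the function and parameter names are hypothetical.

```python
def latency(distance, bandwidth, size, transmission_rate):
    """Formula (1): propagation term plus transmission term."""
    return distance / bandwidth + 1024 * size / transmission_rate

def energy_consumption(distance, size, c):
    """Formula (2): c is a device-dependent constant."""
    return distance * size * c

def updated_weight(lat, e1, e2, e1_consume, e2_consume, k):
    """Formula (3), under the assumed reading of the fraction."""
    return lat * e1_consume * e2_consume / (e1 * e2) * k
```

For instance, latency(100, 50, 2, 1024) gives 100 / 50 + 1024 × 2 / 1024 = 4.0. Under this reading the weight grows with latency and with the energy consumed, and shrinks as the remaining energy of either node increases.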

Model training uses a dataset of one hundred records, which is used for prediction. The Model.pkl file holds the trained model that predicts the weight. The accuracy of the model is 95%, and it is expected to increase gradually over time.

The algorithm for the simulation is as follows:


  1. While true

  2.     Get two nodes randomly

  3.     Get the cluster of each of the two nodes

  4.     If the two nodes are in the same cluster

  5.         Find the shortest path between them

  6.         Data_transmit(source, destination)

  7.     Else

  8.         Data_transmit(source, source_cluster_head)

  9.         Data_transmit(source_cluster_head, destination_cluster_head)

  10.        Data_transmit(destination_cluster_head, destination)

  11. end
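The simulation loop above can be sketched as follows. The sketch makes some assumptions: the loop runs for a fixed number of rounds instead of forever, data_transmit is passed in as a callback, and the names plan_route, simulate, cluster_of, and cluster_head are hypothetical.

```python
import random

def plan_route(source, destination, cluster_of, cluster_head):
    """Return the (from, to) legs of one transmission.

    cluster_of   -- dict: node id -> cluster index
    cluster_head -- list: cluster index -> head node id
    """
    src_c, dst_c = cluster_of[source], cluster_of[destination]
    if src_c == dst_c:
        return [(source, destination)]
    legs = [
        (source, cluster_head[src_c]),
        (cluster_head[src_c], cluster_head[dst_c]),
        (cluster_head[dst_c], destination),
    ]
    # Drop empty legs, e.g. when the source is itself the cluster head.
    return [(a, b) for a, b in legs if a != b]

def simulate(rounds, nodes, cluster_of, cluster_head, data_transmit, seed=None):
    """Pick two random nodes per round and transmit along the planned legs."""
    rng = random.Random(seed)
    for _ in range(rounds):
        source, destination = rng.sample(nodes, 2)
        for a, b in plan_route(source, destination, cluster_of, cluster_head):
            data_transmit(a, b)
```

Separating route planning from transmission keeps the same-cluster and different-cluster cases in one place and lets the transmission step be swapped out for testing.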

Data_transmit(source, destination)

  1. Find the shortest path between source and destination

  2. Iterate through each consecutive pair of nodes on the path

  3.     If a node is below the energy threshold

  4.         Put the node in sleep mode

  5.         Regenerate the clusters

  6.     Else

  7.         Update the weight of the edge using the neural network

  8.         Change the parameters of the nodes
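A sketch of the Data_transmit steps is given below. It assumes the shortest path has already been computed and is passed in as a list of node ids, that transmission stops once the clusters are regenerated, and that "changing the parameters" means deducting a per-hop energy cost from both nodes. The callbacks hop_cost, update_weight, and regenerate_cluster are hypothetical placeholders for the project's own functions.

```python
def data_transmit(path, energy, threshold, hop_cost, update_weight, regenerate_cluster):
    """Walk the path hop by hop; return the set of nodes put to sleep.

    path      -- list of node ids from source to destination
    energy    -- dict: node id -> remaining energy (updated in place)
    threshold -- absolute energy level below which a node sleeps
    """
    sleeping = set()
    for a, b in zip(path, path[1:]):       # consecutive pairs on the path
        low = {n for n in (a, b) if energy[n] < threshold}
        if low:
            sleeping |= low                # put the weak node(s) in sleep mode
            regenerate_cluster(sleeping)   # rebuild clusters without them
            break                          # assumption: abort this transmission
        update_weight(a, b)                # e.g. predicted by the trained model
        cost = hop_cost(a, b)
        energy[a] -= cost                  # sender spends energy
        energy[b] -= cost                  # receiver spends energy
    return sleeping
```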

Dijkstra Shortest Path Algorithm

Dijkstra’s algorithm is one of the algorithms for finding the shortest path from a starting node to a target node in a weighted graph. The algorithm creates a tree of shortest paths from the starting node, i.e., the source, to all other nodes in the graph. Here, Dijkstra’s algorithm takes the weight matrix, the source node ID, and the destination node ID as input and returns the corresponding minimal path as a list of nodes. The data transmission process starts after that. Dijkstra’s algorithm works as an efficient traversal algorithm for the network described in this project.
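A minimal sketch of Dijkstra’s algorithm with the inputs described above (weight matrix, source node ID, destination node ID) is shown here. It assumes that a matrix entry of 0 or None means "no edge" and that an empty list is returned when the destination is unreachable; both conventions are assumptions, not taken from the project code.

```python
import heapq

def dijkstra(weight_matrix, source, destination):
    """Return the minimum-weight path from source to destination as a list of node ids."""
    n = len(weight_matrix)
    dist = [float("inf")] * n
    prev = [None] * n
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                 # stale queue entry
        if u == destination:
            break                    # shortest distance to destination is final
        for v, w in enumerate(weight_matrix[u]):
            if v == u or not w:      # assumption: 0 or None means no edge
                continue
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dist[destination] == float("inf"):
        return []                    # destination not reachable
    path = [destination]
    while path[-1] != source:        # walk predecessors back to the source
        path.append(prev[path[-1]])
    path.reverse()
    return path
```

The returned list can be consumed pairwise by the transmission step, one hop per consecutive pair of nodes.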


The integration part combines all the sections that were developed independently: network formation, cluster generation, defining the node parameters, creating the simulation environment, defining the weight matrix between the cluster heads of different clusters, defining the weight matrix of each cluster, and simulating the network environment. The integration uses all the Python packages required to implement the application. The integration code is the starting point of this project.