5 Different Types of Neural Networks

A Comprehensive Guide to Neural Networks |A mostly complete chart of Neural Networks explained with the architecture of different types of Neural Networks.


Last Updated: 11 Apr 2024 | BY Manika

-A mostly complete chart of neural networks is here- Understand the idea behind the neural network algorithm, the definition of a neural network, the mathematics behind the neural network algorithm, and the different types of neural networks to become a neural network pro.

Let's Have Some Fun Before That ...Game Time!

Instead of starting with a mostly complete neural network chart, let us play a fun game first. Below you'll find a mixture of red balls and black circles; your task is to count the number of balls of each color.

Image: A mixture of red balls and black circles.

Too easy, right? Well, for most humans, it is. But what if I wanted a computer to solve this task? Is that possible? It turns out it is. A similar problem, solved by one of the professors from Cornell University (CU), is now widely considered the first step towards Artificial Intelligence. In 1958, Frank Rosenblatt from CU successfully demonstrated that a computer could learn to separate cards marked on the left from cards marked on the right after 50 trials. Let us find out in the next section how exactly he did that.

What is a Perceptron?
- Mathematical Model of the Perceptron
What are Neural Networks?

What is a Perceptron?

The perceptron is one of the simplest binary classifiers; it separates two classes from each other by learning from their features. For example, consider the famous Iris Dataset, whose features are the widths and lengths of the sepals and petals of three classes of flowers: Iris setosa, Iris virginica, and Iris versicolor. The dataset was collected by Dr. Edgar Anderson and contains 150 instances, each with four measurements and the corresponding class of flower.


Image: Iris Flowers (left) and four parameters that form the features of Iris Dataset (right). Source: Freepik.com(left), Digital Image Processing Textbook ^[1]

To keep things simple, let us consider only two features- petal length (cm) and sepal length (cm) for two flowers Iris setosa and Iris versicolor. And if we plot these features on a graph, this is what it will look like:

Image: Plot of petal length (cm) and sepal length (cm) for Iris setosa and Iris versicolor.

Carefully observe the graph and note that we can easily separate the two flowers from each other based on the two characteristics. In other words, one can effortlessly draw a straight line between the two and set the threshold values for the two lengths for each flower. Perceptron solves this problem. It tries to come up with the required equation of a line. But how is that possible? We'll explore the answer to this now.


Mathematical Model of the Perceptron

In essence, a perceptron takes in the features of an instance (x = {x1, x2, x3, ..., xn}) from the dataset, multiplies each feature value by a weight (w = {w1, w2, w3, ..., wn}), sums the products, and adds a bias term (b). The result, h(x), is what the activation function uses to assign the instance to a class. Look at the figure below that will help you understand this better.

Image: Schematic of a perceptron.

The output of the function h(x) decides which class the instance belongs to. If the result is above zero, we say the instance x belongs to class A1. Otherwise, if the output is less than zero, it belongs to class A2. We can write this mathematically as,

$h\left ( x \right )= \sum^{n}_{i=1} w_ix_i+b= \begin{cases} >0, & \text{if }x\in A_1 \\ <0, & \text{if }x\in A_2 \end{cases}$

But, how does a perceptron learn these weights so that the instance is labeled with its correct class? We are now ready to answer this.


Algorithm: For simplicity, we treat the input features as a vector and append 1 to it at the end, so that the input is written as y = {y1, y2, y3, ..., yn, 1} and the weight vector becomes w = {w1, w2, w3, ..., wn, b}. We can now write the function h as -

$h\left ( y\right )= \sum^{n+1}_{i=1} w_iy_i= \begin{cases} >0, & \text{if }y\in A_1 \\ <0, & \text{if }y\in A_2 \end{cases}$

Here, the vectors y and w are called the augmented input vector and the augmented weight vector, respectively. Using this notation, the algorithm of a perceptron can be written as:

Start with a weight vector w¹ with arbitrary values. At the jth step, the training vector y^j is presented and the weight vector is updated using the following rules, where h is evaluated with the current weights w^j:

$\large 1) \ \text{If} \ y^{j}\in A_1 \ \text{and} \ h\left (y^{j} \right ) \ \leq 0, \ \text{then} \ w^{j+1}= w^{j} + \beta y^{j}$

$\large 2) \ \text {If} \ y^{j}\in A_2 \ \text{and} \ h\left (y^{j} \right ) \ \geq 0, \ \text{then} \ w^{j+1}= w^{j} -\beta y^{j}$

$\large 3) \ \text{Otherwise,} \ \ w^{j+1}= w^{j}$

where β> 0 represents a correction increment/the learning increment/the learning rate. The first two cases refer to the situation where the classes have been wrongly identified by the Perceptron. Thus, in this case, the weights will have to be updated. And, if the class has been identified correctly, the weights need not change as can be seen in the third step. These weights can then be used to plot the line that separates the two classes.

You may wonder how such a simple algorithm can give the correct answer always. Well, it cannot. The application of the Perceptron algorithm is limited to cases where the two classes can be separated linearly. That is, we only need to draw a line to separate the objects of two classes. And that’s the only case where this algorithm converges to give the correct weights.

Before we move on to the snippet of code that implements this algorithm, let us play a fun quiz.


Question: What was Frank Rosenblatt working on that led to the birth of the idea of a Perceptron?

Studying the way neurons in a human brain transfer information
Studying the decisions made in a fly's eye that determine its path of escape
Studying the behavior of a cat towards red and blue balls
Studying the response of a prey fish to predators

CODE:
Image: Code listing implementing the perceptron algorithm.

The code is simple and easy to understand. Read the comments for a better explanation.
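As a text companion to the update rules above, here is a minimal NumPy sketch of perceptron training. It is not necessarily identical to the listing referred to here: the function names `train_perceptron` and `predict`, the encoding of class A1 as +1 and class A2 as -1, the zero starting weights, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def train_perceptron(features, labels, beta=0.1, max_epochs=1000):
    """Train a perceptron with the three-case update rule.

    features: array of shape (m, n); labels: +1 for class A1, -1 for class A2.
    Returns the augmented weight vector (w1, ..., wn, b).
    """
    # Augment every input with a trailing 1 so the bias is the last weight.
    y = np.hstack([features, np.ones((features.shape[0], 1))])
    w = np.zeros(y.shape[1])  # arbitrary starting weights (zeros here)

    for _ in range(max_epochs):
        updated = False
        for y_j, label in zip(y, labels):
            h = np.dot(w, y_j)
            if label == 1 and h <= 0:      # case 1: A1 instance misclassified
                w = w + beta * y_j
                updated = True
            elif label == -1 and h >= 0:   # case 2: A2 instance misclassified
                w = w - beta * y_j
                updated = True
            # case 3: correctly classified, weights stay unchanged
        if not updated:                    # a full pass without corrections
            break
    return w


def predict(w, features):
    """Return +1 (class A1) where h > 0 and -1 (class A2) otherwise."""
    y = np.hstack([features, np.ones((features.shape[0], 1))])
    return np.where(y @ w > 0, 1, -1)


# Toy, linearly separable data (made-up values, not the Iris measurements).
X = np.array([[1.0, 1.5], [1.5, 1.0], [2.0, 1.8],
              [4.0, 4.5], [4.5, 5.0], [5.0, 4.2]])
t = np.array([-1, -1, -1, 1, 1, 1])

w = train_perceptron(X, t)
print("weights:", w)
print("predictions:", predict(w, X))
```

The first two entries of the returned vector are the feature weights and the last one is the bias, so the separating line is w[0]*x1 + w[1]*x2 + w[2] = 0.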

Test Yourself! Implement the above code on the two classes of Iris Dataset and classify them on the basis of sepal length and petal length. Also, don’t forget to use the weights to draw the line that separates the two classes on a graph.

We are now ready to move on to one of the most widely used algorithms: neural networks. This algorithm builds on the Perceptron algorithm that we just finished learning. If all this was a bit rigorous for you, please go grab a snack and reward yourself for coming this far.


What are Neural Networks?

Definition of Neural Network: Neural Network, as the name suggests, is a network of neurons where each neuron behaves like a perceptron that we just finished discussing. The algorithm is based upon the operations of a biological neural system. It aims at recognizing the pattern between the input features and the expected output by minimizing the error between the predicted outcome and the actual output.

Let us look at the various neural network layers used for different purposes.

Neural Network Layer Types

The four most common types of neural network layers are-

Fully connected layer- In a fully connected layer, each neuron is connected to every neuron in the previous and next layers, allowing for complex relationships between input and output.
Convolutional layer- Convolutional layers use filters to extract spatial features from input data, enabling effective pattern recognition in images or sequential data.
Deconvolutional layer- Deconvolutional layers, also known as transposed convolutional layers, work in the opposite direction to a convolutional layer: they upsample their input, for example to reconstruct an image-sized output from a compact feature map.
Recurrent layer- Recurrent layers incorporate memory units to process sequential data by capturing dependencies and interactions among previous inputs, making them suitable for tasks involving time series or natural language processing.

Neural Networks and Deep Learning

Deep Learning is a subfield of machine learning that consists of algorithms that mimic how a human brain functions. The basis of most such algorithms is the neural network (NN). The reason for its popularity is the large number of problems it has helped solve. From Face Recognition to Object Detection to Stock Prediction, NNs are at the heart of all such solutions. The applications of NNs are no longer limited to images or numbers. With the invention of architectures like LSTM and GRU, neural networks have expanded their applications to Natural Language Processing problems. So, what lies in a neural network algorithm? Continue to find out.


How Do Neural Networks Work?

Let us begin with the most common way of visualizing a neural network architecture, as shown in figure 1.

Figure 1: Architecture of a neural network.

A neural network takes a feature vector from the dataset as input, just like a perceptron. But unlike a perceptron, this algorithm works for more than two classes. Thus, it can have more than two outputs. Let us understand this algorithm step by step.

The first step begins at the input layer (Fig. 1), where the neural network receives the feature vector, x = {x1, x2, x3, ..., xn} from the dataset. Each orange-colored circle of Fig. 1 represents an element of this feature vector.
The next step involves connecting the input vector to all the neurons of the next layer. Each neuron of this layer receives a weighted sum of the input vector's elements along with a bias term. Mathematically, this would mean:

$\large \text {Input at} \ k^{th} \ \text{neuron}=\sum_{i=1}^{n} w_{ki} x_i + b_k$

The outcome is then passed through an activation function a(x) so that the output of each neuron is given by

$\large \text{Output of} \ k^{th} \ \text{neuron}, O_{k}= a\left ( \sum_{i=1}^{n} w_{ki} x_i+ b_k\right )$

Some of the popular activation functions are listed below

Image Source: Handbook of Neural Network Signal Processing ^[2]

Repeat step 2 for all the hidden layers, that is, the layers that lie between the input and output layers. The key point to remember is that the activation function need not be the same for every layer. In particular, depending on the problem at hand, the output layer usually has a different activation function, as the neurons of the output layer are responsible for assigning the feature vector to one of the expected classes.
The number of neurons in the output layer has to be the same as the number of expected classes, with each neuron representing one class. The neuron that generates the highest output value identifies the class of the input feature vector.
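The steps above can be sketched as a short forward pass in NumPy. This is an illustrative sketch under assumptions: the layer sizes, the random weights, the choice of tanh for the hidden layer and a sigmoid for the output layer, and the helper name `forward` are not taken from the article.

```python
import numpy as np


def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))


def forward(x, layers):
    """Propagate feature vector x through a list of (W, b, activation) layers."""
    output = x
    for W, b, activation in layers:
        # Neuron k receives sum_i W[k, i] * output[i] + b[k], then activates.
        output = activation(W @ output + b)
    return output


rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 4, 5, 3   # assumed sizes for illustration
layers = [
    (rng.normal(size=(n_hidden, n_features)), np.zeros(n_hidden), np.tanh),
    (rng.normal(size=(n_classes, n_hidden)), np.zeros(n_classes), sigmoid),
]

x = np.array([5.1, 3.5, 1.4, 0.2])          # one made-up feature vector
scores = forward(x, layers)
predicted_class = int(np.argmax(scores))    # neuron with the highest output
print(scores, predicted_class)
```

With untrained random weights the predicted class is meaningless; the point is only to show how the weighted sums, activations, and the final "highest output wins" rule fit together.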


Now that we have figured out how the output is evaluated, the remaining part to unravel is how the network learns the correct weights. For that, we first compute the error function using the output neurons, given by

$\large E=\sum_{i=1}^{N} E_i$

E_i is the error for a single pattern vector xᵢ and is defined as,

$\large E_i=\frac{1}{2}\sum_{j} \left ( o_j - z_j \right )^{2}$

where j = 1, 2, 3, …, N_L, and N_L is the number of different classes in the dataset; oj is the output value of the jth neuron of the output layer, and zj is the desired response for the jth neuron of the output layer. But, this is not the only error function in use today; a variety of other options is available.

Once the error is evaluated at the output, it needs to be minimized. And that will only become possible when the whole network has learned the correct weights. The error is propagated back to the previous layers to ensure the network learns the correct weights. We can understand how this works by considering the application of the gradient descent algorithm. The weights will adjust in proportion to the partial derivative of the error function. That is,

$\large \Delta w_{jn}^{(l)}=-\alpha \frac{\partial E_i}{\partial w_{jn}^{(l)}}$

where α represents the learning rate and the superscript (l) denotes the layer whose parameters are being considered.

After performing the necessary algebra, we end up with the following algorithm:
For any two layers l and l-1, the weights that connect the two layers are modified using

$\large \Delta w_{jn}^{(l)}=-\alpha \delta _{j}^{(l)} o_{n}^{(l-1)}$

where $o_{n}^{(l-1)}$ is the output of the nth neuron of layer l-1. If j denotes a neuron of the output layer (l=L), the parameter δ is evaluated as

$\large \delta _{j}^{(L)}=[o_{j}^{(L)}-z_{j}] {a}' (h_{j}^{(L)})$

$\large h_{j}^{(L)}=\sum_{n}w_{jn}^{(L)}o_{n}^{(L-1)}+ b_{j}^{(L)}$

If j denotes a neuron of a hidden layer l and p runs over the neurons of layer l+1, the parameter δ is evaluated as

$\large \delta _{j}^{(l)}={a}'(h_{j}^{(l)})\sum_{p}\delta _{p}^{(l+1)}w_{pj}^{(l+1)}$
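To make the update rules concrete, here is a minimal sketch of one gradient-descent step for a network with a single hidden layer. It assumes a sigmoid activation in both layers (so that a'(h) = a(h)(1 - a(h))), a single training pattern, and made-up sizes and values; the names `backprop_step` and `error` are hypothetical helpers, not part of any library.

```python
import numpy as np


def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))


def forward(x, W1, b1, W2, b2):
    o1 = sigmoid(W1 @ x + b1)      # hidden-layer outputs
    o2 = sigmoid(W2 @ o1 + b2)     # output-layer outputs
    return o1, o2


def error(x, z, W1, b1, W2, b2):
    """Squared error 0.5 * sum_j (o_j - z_j)^2 for one pattern."""
    _, o2 = forward(x, W1, b1, W2, b2)
    return 0.5 * np.sum((o2 - z) ** 2)


def backprop_step(x, z, W1, b1, W2, b2, alpha=0.5):
    """One gradient-descent update for a single pattern (x, z)."""
    o1, o2 = forward(x, W1, b1, W2, b2)
    # Output layer: delta_j = (o_j - z_j) * a'(h_j).
    delta2 = (o2 - z) * o2 * (1.0 - o2)
    # Hidden layer: delta_j = a'(h_j) * sum_p delta_p * w_pj.
    delta1 = o1 * (1.0 - o1) * (W2.T @ delta2)
    # Weight change: -alpha * delta_j * (output n of the previous layer).
    W2_new = W2 - alpha * np.outer(delta2, o1)
    b2_new = b2 - alpha * delta2
    W1_new = W1 - alpha * np.outer(delta1, x)
    b1_new = b1 - alpha * delta1
    return W1_new, b1_new, W2_new, b2_new


rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)
x, z = np.array([0.5, -1.0]), np.array([1.0, 0.0])   # made-up pattern

before = error(x, z, W1, b1, W2, b2)
for _ in range(200):
    W1, b1, W2, b2 = backprop_step(x, z, W1, b1, W2, b2)
after = error(x, z, W1, b1, W2, b2)
print("error before:", before, "error after:", after)
```

A practical implementation would loop over many patterns (or mini-batches) rather than repeating one pattern, but the per-layer delta computation stays the same.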

That's all. We are all set with the mathematics. Grab another snack to energize yourself for the next section.


Different Types of Neural Network Models

There are several types of neural networks available, some of which are-

Feedforward Neural Network
Convolutional Neural Network (CNN)
Recurrent Neural Network (RNN)
Long Short-Term Memory (LSTM) Network
Gated Recurrent Unit (GRU) Network
Autoencoder
Generative Adversarial Network (GAN)
Radial Basis Function Network (RBFN)
Artificial Neural Network
Self-Organizing Map (SOM)

Let us explore some of these neural networks in further detail-


Artificial Neural Network: The neural network that we explained in the previous section is often referred to as an Artificial Neural Network. We can thus skip this one, as we have discussed it already.
Radial Basis Functional Neural Network (RBFNN): A special class of neural network consisting of only three layers: input layer, hidden layer, and output layer. As is evident from the name, it uses Radial Basis Functions (RBFs) such as the Gaussian, thin plate spline, or multiquadric as the activation function of the hidden layer. Each hidden neuron responds to the distance between the input and a center, and the centers are often chosen with a clustering algorithm such as K-Means. It is used in situations where the instances are not linearly separable. The idea of using RBFs is to transform the variables into a higher dimension where the instances of our dataset become linearly separable. Here is what the architecture of an RBFNN looks like:

Image: Architecture of an RBFNN.

The training algorithm for an RBFNN is different from that of the ANN and requires a few more parameters besides the learning increment.
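The effect of the hidden layer can be illustrated with a small sketch. It assumes Gaussian RBFs with a common width, two hand-picked centers, and XOR-style toy data; none of these values come from the article, and `gaussian_rbf_features` is a hypothetical helper name. Only the hidden-layer transformation is shown, not the training of an RBFNN.

```python
import numpy as np


def gaussian_rbf_features(X, centers, width=1.0):
    """Map each row of X to its Gaussian RBF activation for every center."""
    # Squared Euclidean distance between every input and every center.
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))


# XOR-style data: no straight line separates the two classes in (x1, x2).
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])   # hand-picked for illustration

phi = gaussian_rbf_features(X, centers, width=1.0)
# In the transformed space, a threshold on phi1 + phi2 splits the classes.
scores = phi.sum(axis=1)
print(phi.round(3))
print(scores.round(3))
```

Here the transformed space happens to have the same number of dimensions as the input; with more centers than input features the mapping goes to a higher dimension, as described above.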

Convolutional Neural Network (CNN): As the name suggests, this neural network involves the convolution operation. This type of neural network has wide applications in Image Classification and Object Detection. It receives an image at the input and the features of the image are extracted through the convolution operation. The convolution operation is mathematically defined as:

$\large p\left ( x \right )=w * y=\sum_{a=-s}^{s}w(a)y(x-a)$

where y represents the input image vector, w represents the weights/filter/kernel, and s = (t-1)/2, where 1×t is the size of the kernel and t is odd.

As an example, consider the input vector y = [2, 1, 2, 3, 4, 6, 8, 1], indexed from y(0) to y(7), and the kernel w = [0, 1, 0, 0, 0], indexed from w(-2) to w(2).


Note that we have a problem if we start from the origin: at x = 0 the sum needs y(-1) and y(-2), which are not defined. The solution is the padding operation, which adds enough zeros to the ends of the input vector for the convolution operation to be defined.

Thus, the output in this case for x = 0 would be

$\large \begin{aligned} p(0) &=\sum_{a=-2}^{2}w(a)y(-a) \\ &=w(-2)y(2)+w(-1)y(1)+...+w(2)y(-2) \\ &= 0+1+0+0+0=1 \end{aligned}$
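The same calculation can be checked for every position x with a short sketch. It assumes the kernel is stored as [w(-2), ..., w(2)], the input is indexed from y(0), and s zeros are padded on each side; `conv1d_same` is a hypothetical helper written for this example.

```python
import numpy as np


def conv1d_same(y, w):
    """Evaluate p(x) = sum_{a=-s}^{s} w(a) * y(x - a) with zero padding."""
    s = (len(w) - 1) // 2                  # kernel holds w(-s), ..., w(s)
    padded = np.concatenate([np.zeros(s), y, np.zeros(s)])
    p = np.zeros(len(y))
    for x in range(len(y)):
        for a in range(-s, s + 1):
            # w(a) is stored at index a + s; y(x - a) at padded index x - a + s.
            p[x] += w[a + s] * padded[x - a + s]
    return p


y = np.array([2, 1, 2, 3, 4, 6, 8, 1], dtype=float)
w = np.array([0, 1, 0, 0, 0], dtype=float)

p = conv1d_same(y, w)
print(p)   # p[0] is the value worked out by hand above
```

Because only w(-1) is non-zero, each output p(x) simply picks up y(x+1), and the last output falls on the zero padding. NumPy's built-in `np.convolve(y, w, mode="same")` follows the same convention and can be used instead of the explicit loops.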

Image: Architecture of LeNet-5. ^[3]

Notice that the input to the network is an image. There are multiple convolution layers, denoted by C, and subsampling layers, denoted by S. The subsampling layers are simple layers that reduce the size of their input using operations such as the average or the maximum of four elements. This model, LeNet-5, was used by its authors to recognize handwritten and machine-printed characters. There can be many more exciting applications; for example, you can use it to identify your favorite cartoon.

Image source: seekpng.com

And if you don't get accurate results using LeNet-5, you may switch to more recent CNNs like AlexNet, VGG, ResNet, Inception, Xception, etc.

Recurrent Neural Networks (RNN): The word recurrent means "occurring often or repeatedly." The name suggests that there must be something like an operation happening many times or a repeated calculation. And that is indeed the case with RNN. In RNN, each output element is evaluated as a function of previous elements of the output. And, all the output elements are calculated by applying the same rule of updating the earlier outcomes. This is possible because layers of RNN are kind enough to allow weight-sharing. To understand this better, consider the figure below.

Image: Basic structure of an RNN, with a circular arrow feeding the output back to the network.

This figure sums up the basic idea of RNN. The input vector of specific dimensions is fed to the hidden layers, and the output is evaluated. However, there is also a circular arrow that points back at the input. This is referring to the fact that the output is being fed back to the network.

RNNs are used for processing sequential data. For example, in Natural Language Processing (NLP) applications, an RNN can predict the next word in a sentence while keeping in mind the sequence of words already entered. We see Google Keyboard helping us with this every day.

So, if there are four words in a sentence and we want to predict the fifth word, we can use an RNN. The network will unroll itself by producing four copies of its layers, one for each word. The words are, of course, first converted to vectors using techniques like word2vec, one-hot encoding, etc. The network starts by evaluating the first word, x1, at time t=1. After that, the output s1 is computed using an activation function. Next, at time t=2, the output s1 is fed back into the network together with the second word of the sentence. Again, the outcome is evaluated using an activation function, and so on. Notice that the weight parameters remain the same for all the calculations, which is what makes the behavior of the RNN recurrent. Note that the recurrence is with respect to time.
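The unrolling described above can be sketched in a few lines of NumPy. This is a simplified sketch that assumes a tanh activation, a zero initial state, random untrained weights, and random vectors standing in for the embedded words; the names `rnn_forward`, `U`, `W`, and `b` are chosen for illustration only.

```python
import numpy as np


def rnn_forward(inputs, U, W, b):
    """Return the state after each time step of a simple RNN."""
    state = np.zeros(W.shape[0])            # state before the first word
    states = []
    for x_t in inputs:                      # one step per word, in order
        # The same U, W and b are reused at every step (weight sharing).
        state = np.tanh(U @ x_t + W @ state + b)
        states.append(state)
    return np.stack(states)


rng = np.random.default_rng(0)
embedding_dim, hidden_dim = 3, 4            # assumed sizes
U = rng.normal(size=(hidden_dim, embedding_dim))   # input-to-hidden weights
W = rng.normal(size=(hidden_dim, hidden_dim))      # hidden-to-hidden weights
b = np.zeros(hidden_dim)

# Four made-up vectors standing in for an embedded four-word sentence.
sentence = rng.normal(size=(4, embedding_dim))
states = rnn_forward(sentence, U, W, b)
print(states.shape)
```

To predict the fifth word, the last state would typically be passed through an output layer that scores every word in the vocabulary; that part is left out here.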

After evaluating the final output, the loss function is evaluated, and the error is propagated back to update the weights. Many later architectures, such as Long Short-Term Memory networks (LSTM) and Gated Recurrent Units (GRU), are variants of the RNN, and early attention-based models have RNNs as a part of their architecture.

Autoencoders: These are a special kind of neural network that consists of three main parts: encoder, code, and decoder. For these networks, the target output is the same as the input. They compress the information received at the input into a lower-dimensional code, which they then use to rebuild the input. Both the encoder and decoder have an ANN-based architecture and are usually mirror images of each other. The idea of placing a code between the encoder and the decoder is that the network has to keep only the essential information, so a few changes can be introduced in the input vector and the same output can still be expected. It might seem odd at first, but imagine passing a noisy image at the input: a trained autoencoder will be able to return the picture without the noise. They are thus widely used for anomaly detection, data denoising, and dimensionality reduction.
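The encoder, code, and decoder idea can be sketched with a deliberately simplified model. The example below assumes a purely linear autoencoder (no activation functions), plain gradient descent, and made-up data that lies on a two-dimensional plane inside a four-dimensional space; practical autoencoders use nonlinear ANN layers, so treat this only as an illustration of the compress-then-rebuild structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 200 points in 4 dimensions that lie on a 2-D plane.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing

code_dim = 2
W_enc = 0.1 * rng.normal(size=(4, code_dim))   # encoder weights
W_dec = 0.1 * rng.normal(size=(code_dim, 4))   # decoder weights (mirrored shape)


def reconstruct(X, W_enc, W_dec):
    code = X @ W_enc             # compress to the lower-dimensional code
    return code, code @ W_dec    # rebuild the input from the code


def loss(X, W_enc, W_dec):
    """Mean over samples of half the squared reconstruction error."""
    _, recon = reconstruct(X, W_enc, W_dec)
    return 0.5 * np.mean(np.sum((recon - X) ** 2, axis=1))


initial_loss = loss(X, W_enc, W_dec)
learning_rate = 0.01
for _ in range(500):
    code, recon = reconstruct(X, W_enc, W_dec)
    residual = (recon - X) / len(X)
    grad_dec = code.T @ residual               # dLoss/dW_dec
    grad_enc = X.T @ (residual @ W_dec.T)      # dLoss/dW_enc
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc
final_loss = loss(X, W_enc, W_dec)
print("reconstruction loss:", initial_loss, "->", final_loss)
```

Note that the target of the reconstruction is the input itself, which is why no separate labels appear anywhere in the training loop.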


Mastering Neural Networks through Hands-On Projects

Congratulations! You are now done with learning about one of the most famous algorithms used by Data Scientists. But, as they say, knowledge is incomplete without action. It is thus important that you also explore relevant code that shows how to apply Neural Network algorithms to real-world problems. Too lazy to google for Neural Network project ideas? Don’t worry, we’ve got you covered with some innovative Neural Network Project Ideas that will add great value to your data science or machine learning portfolio.

References

[1] Gonzalez, R. C., & Woods, R. E. (2002). Digital Image Processing.
[2] Hu, Y. H., & Hwang, J. (2002). Handbook of Neural Network Signal Processing.
[3] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11), 2278-2324.


About the Author

Manika

Manika Nagpal is a versatile professional with a strong background in both Physics and Data Science. As a Senior Analyst at ProjectPro, she leverages her expertise in data science and writing to create engaging and insightful blogs that help businesses and individuals stay up-to-date with the
