MACHINE LEARNING RECIPES
DATA CLEANING PYTHON
DATA MUNGING
PANDAS CHEATSHEET
ALL TAGS
# How to use auto encoder for unsupervised learning models?

# How to use auto encoder for unsupervised learning models?

This Pytorch recipe trains an autoencoder neural net by compressing the MNIST handwritten digits dataset to only 3 features.

This recipe builds an autoencoder for compressing the number of features in the MNIST handwritten digits dataset.

For building an autoencoder, three components are used in this recipe :

- an encoding function,

- a decoding function,

- a loss function between the amount of information loss between the compressed representation of your data and the decompressed representation.

**What is an Auto encoder ?**

An auto encoder is used to encode features so that it takes up much less storage space but effectively represents the same data. It is a type of neural network that learns efficient data codings in an unsupervised way. The aim of an autoencoder is to learn a representation for a dataset, for dimensionality reduction, by ignoring signal "noise".

**What is Unsupervised learning model ?**

Unsupervised machine learning models infer patterns from a dataset without reference to known, or labeled, outcomes. Unlike supervised machine learning, unsupervised machine learning methods cannot be directly applied to a problem because you have no idea what the values for the output data might be, making it impossible for you to train the algorithm the way you normally would. Unsupervised learning can instead be used for discovering the underlying structure of the data.

In [103]:

```
import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.utils.data as Data
import torchvision
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
import numpy as np
%matplotlib inline
```

In [104]:

```
torch.manual_seed(1) # reproducible
```

Out[104]:

In [105]:

```
# Hyper Parameters
EPOCH = 10
BATCH_SIZE = 64
LR = 0.005 # learning rate
DOWNLOAD_MNIST = False
N_TEST_IMG = 5
```

In [106]:

```
# Mnist digits dataset
train_data = torchvision.datasets.MNIST(
root='./mnist/',
train=True,
# this is training data
transform=torchvision.transforms.ToTensor(),
# Converts a PIL.Image or numpy.ndarray to
# torch.FloatTensor of shape (C x H x W) and normalize in the range [0.0, 1.0]
download=DOWNLOAD_MNIST,
# download it if you don't have it
)
```

In [107]:

```
# plot one example
print(train_data.train_data.size()) # (60000, 28, 28)
print(train_data.train_labels.size()) # (60000)
plt.imshow(train_data.train_data[2].numpy(), cmap='gray')
plt.title('%i' % train_data.train_labels[2])
plt.show()
```

In [108]:

```
# Data Loader for easy mini-batch return in training, the image batch shape will be (50, 1, 28, 28)
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
```

In [109]:

```
class AutoEncoder(nn.Module):
def __init__(self):
super(AutoEncoder, self).__init__()
self.encoder = nn.Sequential(
nn.Linear(28*28, 128),
nn.Tanh(),
nn.Linear(128, 64),
nn.Tanh(),
nn.Linear(64, 12),
nn.Tanh(),
nn.Linear(12, 3), # compress to 3 features which can be visualized in plt
)
self.decoder = nn.Sequential(
nn.Linear(3, 12),
nn.Tanh(),
nn.Linear(12, 64),
nn.Tanh(),
nn.Linear(64, 128),
nn.Tanh(),
nn.Linear(128, 28*28),
nn.Sigmoid(), # compress to a range (0, 1)
)
def forward(self, x):
encoded = self.encoder(x)
decoded = self.decoder(encoded)
return encoded, decoded
```

In [110]:

```
autoencoder = AutoEncoder()
print(autoencoder)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=LR)
loss_func = nn.MSELoss()
# original data (first row) for viewing
view_data = Variable(train_data.train_data[:N_TEST_IMG].view(-1, 28*28).type(torch.FloatTensor)/255.)
```

In [111]:

```
for epoch in range(EPOCH):
for step, (x, y) in enumerate(train_loader):
b_x = Variable(x.view(-1, 28*28)) # batch x, shape (batch, 28*28)
b_y = Variable(x.view(-1, 28*28)) # batch y, shape (batch, 28*28)
b_label = Variable(y) # batch label
encoded, decoded = autoencoder(b_x)
loss = loss_func(decoded, b_y) # mean square error
optimizer.zero_grad() # clear gradients for this training step
loss.backward() # backpropagation, compute gradients
optimizer.step() # apply gradients
if step % 500 == 0 and epoch in [0, 5, EPOCH-1]:
print('Epoch: ', epoch, '| train loss: %.4f' % loss.data[0])
# plotting decoded image (second row)
_, decoded_data = autoencoder(view_data)
# initialize figure
f, a = plt.subplots(2, N_TEST_IMG, figsize=(5, 2))
for i in range(N_TEST_IMG):
a[0][i].imshow(np.reshape(view_data.data.numpy()[i],
(28, 28)), cmap='gray');
a[0][i].set_xticks(()); a[0][i].set_yticks(())
for i in range(N_TEST_IMG):
a[1][i].clear()
a[1][i].imshow(np.reshape(decoded_data.data.numpy()[i],
(28, 28)), cmap='gray')
a[1][i].set_xticks(()); a[1][i].set_yticks(())
plt.show(); #plt.pause(0.05)
```

This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

Data Science Project - Build a recommendation engine which will predict the products to be purchased by an Instacart consumer again.

Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

In this data science project in R, we are going to talk about subjective segmentation which is a clustering technique to find out product bundles in sales data.

In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.

The goal of this data science project is to build a predictive model and find out the sales of each product at a given Big Mart store.

In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.