How to create a word cloud in Python?

This recipe helps you understand how to create a word cloud in Python.

How to create a Wordcloud in Python

In this tutorial, let us understand how to generate word clouds in Python! Yes, you read that right. Word clouds!

A word cloud is a visualization technique for representing the frequency of words in a text, where the size of each word is proportional to how often it appears.

In order to work with word clouds in Python, we will first have to install a few libraries using pip. They are numpy (for array manipulation), pandas (for reading the data), pillow (for image handling), matplotlib (for generating plots) and finally wordcloud (for generating the word clouds).


    pip install numpy
    pip install pandas
    pip install pillow
    pip install matplotlib
    pip install wordcloud

In this tutorial, we will be using the Restaurant Reviews dataset from Kaggle. The dataset can be found here.

To begin with, we have to import the necessary libraries.


    # Importing Libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud, STOPWORDS

Now, let us read the data into a dataframe using pandas.


    # Importing Dataset
    df = pd.read_csv("Restaurant_Reviews.tsv", sep="\t")

The data contains two columns namely 'Review' and 'Liked'. The 'Review' column is the actual review written by the customer and the 'Liked' column is a binary variable which states whether or not the customer liked the food.
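
Before building the word cloud, it can help to take a quick look at the data. The snippet below is just a quick sanity check (assuming the dataframe df was loaded as shown above):

    # Quick sanity check on the data (assumes df was loaded as shown above)
    print(df.shape)                     # number of reviews and columns
    print(df.head())                    # first few reviews
    print(df["Liked"].value_counts())   # count of positive vs negative reviews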

The next step is to generate a text variable which contains all the reviews combined into a single string. This can be done using the join function available in Python.


    # Creating the text variable: combine all reviews into one string
    text = " ".join(review for review in df.Review)

Now, we have reached the important step of creating the wordcloud! We can generate the wordcloud in the following manner.


    # Generate word cloud
    word_cloud = WordCloud(
        width=3000,
        height=2000,
        random_state=1,
        background_color="salmon",
        colormap="Pastel1",
        collocations=False,
        stopwords=STOPWORDS,
    ).generate(text)

The WordCloud class provides a lot of parameters that we can tweak according to our needs. Let us understand a few of them; a short sketch showing how to tweak some of these follows the list.

  • width/height : To adjust the height and width of the wordcloud
  • random_state : To recreate the same plot every time we run the function. The random_state parameter has to be an integer value.
  • background_color : To set a background colour. The default value for this parameter is 'black'. Any standard colour name (such as 'salmon' or 'white') can be used.
  • colormap : To set the colour theme for the words. Any matplotlib colormap (such as 'Pastel1' or 'Set2') can be used. The default value is 'viridis'.
  • collocations : To include bigrams (pairs of words that frequently occur together) when set to True. The default value is True.
  • stopwords : To set the list of words that need to be eliminated. This list can include trivial words like this, that, is, was, the, etc. If this parameter is set to None, the function will fall back to its built-in STOPWORDS list.
  • max_font_size : To set the maximum font size of the largest word.
  • normalize_plurals : To keep or remove the trailing 's' from the words
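
As a rough sketch of how some of these parameters can be combined (the extra stop words and font-size cap below are illustrative assumptions, not values from the original recipe), assuming text is the combined review string created earlier:

    # Illustrative sketch: tweaking a few WordCloud parameters
    # The extra stop words and the max_font_size value are assumptions for demonstration
    custom_stopwords = set(STOPWORDS)
    custom_stopwords.update(["food", "place"])   # hypothetical domain-specific words to drop

    word_cloud_tuned = WordCloud(
        width=3000,
        height=2000,
        background_color="white",
        colormap="viridis",
        collocations=False,
        stopwords=custom_stopwords,
        max_font_size=200,        # cap the size of the largest word
        normalize_plurals=True,   # merge words that differ only by a trailing 's'
    ).generate(text)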

Now comes the last step, where we plot the generated word cloud using the imshow() function of matplotlib.


    # Display the generated Word Cloud
    plt.imshow(word_cloud)
    plt.axis("off")
    plt.show()
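
If you would also like to save the image to disk (a step not covered above), the WordCloud object provides a to_file method; the filename below is just an example:

    # Optional: save the generated word cloud as a PNG file
    # The filename "restaurant_wordcloud.png" is an arbitrary example
    word_cloud.to_file("restaurant_wordcloud.png")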

Complete Code
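
For reference, here are the snippets above assembled into a single script:

    # Complete code: word cloud from restaurant reviews
    import pandas as pd
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud, STOPWORDS

    # Read the reviews dataset
    df = pd.read_csv("Restaurant_Reviews.tsv", sep="\t")

    # Combine all reviews into a single string
    text = " ".join(review for review in df.Review)

    # Generate the word cloud
    word_cloud = WordCloud(
        width=3000,
        height=2000,
        random_state=1,
        background_color="salmon",
        colormap="Pastel1",
        collocations=False,
        stopwords=STOPWORDS,
    ).generate(text)

    # Display the generated word cloud
    plt.imshow(word_cloud)
    plt.axis("off")
    plt.show()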

Output: the word cloud image generated by the code above (not reproduced here).

Let us make this a little more interesting! So far, we saw how to generate the word cloud on a plain canvas. What if I told you that you can create these word clouds in different shapes? Sounds interesting, isn't it? Well, that is not difficult at all. All we have to do is pick an image of the shape that we would like the word cloud to take. For example, I have used an image saved as comment.png.

Let us make use of the pillow and numpy libraries to read this image and store it in a variable called mask.


    # Importing Libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud, STOPWORDS
    from PIL import Image
    import numpy as np

    # Import image to np.array
    mask = np.array(Image.open('comment.png'))

We can pass this mask as a parameter to the wordcloud function like so.


    # Generate word cloud
    word_cloud2 = WordCloud(
        width=3000,
        height=2000,
        random_state=123,
        background_color="purple",
        colormap="Set2",
        collocations=False,
        stopwords=STOPWORDS,
        mask=mask
    ).generate(text)

Complete Code
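
Putting the masked version together (the display step mirrors the one used for the plain word cloud earlier):

    # Complete code: word cloud in the shape of a mask image
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from PIL import Image
    from wordcloud import WordCloud, STOPWORDS

    # Read the reviews and combine them into a single string
    df = pd.read_csv("Restaurant_Reviews.tsv", sep="\t")
    text = " ".join(review for review in df.Review)

    # Read the mask image into a numpy array
    mask = np.array(Image.open("comment.png"))

    # Generate the word cloud constrained to the mask shape
    word_cloud2 = WordCloud(
        width=3000,
        height=2000,
        random_state=123,
        background_color="purple",
        colormap="Set2",
        collocations=False,
        stopwords=STOPWORDS,
        mask=mask
    ).generate(text)

    # Display the masked word cloud
    plt.imshow(word_cloud2)
    plt.axis("off")
    plt.show()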

Output: the word cloud rendered in the shape of the mask image (not reproduced here).

Ta-da! There you go! I hope you enjoyed reading this tutorial as much as I enjoyed writing it :)
