How to build text preprocessing pipelines with Dask?

This recipe helps you build text preprocessing pipelines with Dask

Recipe Objective.

`dask_ml.preprocessing` provides transformers in the same style as **scikit-learn**'s, which we can use in Pipelines to perform different types of data transformations as part of the model fitting process. These transformers work well on Dask collections (`dask.array`, `dask.dataframe`), NumPy arrays, and pandas dataframes.

Step 1- Importing Libraries.

!pip install dask-ml

from dask_ml.preprocessing import Categorizer, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
import pandas as pd
import dask.dataframe as dd

Step 2- Creating a DataFrame.

We will create a dataframe of features, x, and a matching series of labels, y, to fit in the pipeline. Both are converted to Dask collections with two partitions each.

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": ["a", "b", "c", "d", "e", "f"]})
x = dd.from_pandas(df, npartitions=2)
# y needs one label per row of df (six rows here)
y = dd.from_pandas(pd.Series([0, 1, 1, 0, 1, 0]), npartitions=2)

Step 3- Creating a pipeline.

We will create a pipeline that passes the data through Categorizer, OneHotEncoder, and LogisticRegression, in that order, and then fit it on x and y.

pipe = make_pipeline(
    Categorizer(),
    OneHotEncoder(),
    LogisticRegression(solver='lbfgs')
)
pipe.fit(x, y)

```
Pipeline(steps=[('categorizer', Categorizer()),
                ('onehotencoder', OneHotEncoder()),
                ('logisticregression', LogisticRegression())])
```
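To see what the two preprocessing steps contribute, the sketch below reproduces their effect with plain pandas rather than `dask_ml`. Here `astype("category")` stands in for the categorizing step and `pd.get_dummies` stands in for the encoding step. This is only an illustration of the idea on the small frame from Step 2, under the assumption that the text column `B` is the one being encoded; it is not the code that `dask_ml` runs internally.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": ["a", "b", "c", "d", "e", "f"]})

# Categorizing: convert the text column to a categorical dtype so the
# full set of categories is known up front.
categorized = df.assign(B=df["B"].astype("category"))
print(categorized.dtypes)

# Encoding: expand each category of B into its own indicator column.
encoded = pd.get_dummies(categorized, columns=["B"])
print(encoded.columns.tolist())
print(encoded.shape)
```

Knowing the categories up front matters for partitioned data: each partition may see only some of the values, yet every partition has to produce the same set of output columns.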
