Over 70% of the work you will do as a Data Scientist on any Data Science or Statistics project is cleaning your data and manipulating it to make it ready for modelling and analysis.
What is Data Cleaning?
At the start of a data science project, you will inherit multiple datasets from different teams and be asked to solve a specific business problem. Your solution may not need all the data you received: you might have to remove or modify columns, remove duplicate values, and deal with missing values and outliers. Sometimes you will also need to normalize or scale the data so that its values fit within a range.
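To make these steps concrete, here is a minimal sketch using pandas. The dataset, the column names (`customer_id`, `age`, `monthly_spend`, `internal_note`) and the `clean` helper are all hypothetical, invented for illustration. Filling missing values with the median, flagging outliers with the 1.5 × IQR rule, and min-max scaling are just one reasonable set of choices. The right ones depend on your data and your business problem.

```python
import pandas as pd

# Hypothetical raw data: names and values are made up for illustration.
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4, 5],
    "age": [34.0, 29.0, 29.0, None, 41.0, 38.0],
    "monthly_spend": [120.0, 80.0, 80.0, 95.0, 5000.0, 110.0],
    "internal_note": ["a", "b", "b", "c", "d", "e"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a few common cleaning steps (one possible approach)."""
    # Remove a column the analysis does not need.
    out = df.drop(columns=["internal_note"])

    # Remove exact duplicate rows.
    out = out.drop_duplicates()

    # Deal with missing values: here, fill missing ages with the median age.
    out = out.assign(age=out["age"].fillna(out["age"].median()))

    # Deal with outliers: keep rows within 1.5 * IQR of the quartiles.
    q1, q3 = out["monthly_spend"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = out["monthly_spend"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    out = out[in_range].reset_index(drop=True)

    # Scale to the range [0, 1] (min-max scaling).
    # Assumes the column is not constant, otherwise this divides by zero.
    lo, hi = out["monthly_spend"].min(), out["monthly_spend"].max()
    out["spend_scaled"] = (out["monthly_spend"] - lo) / (hi - lo)
    return out


cleaned = clean(raw)
print(cleaned)
```

On this toy data the duplicate row for customer 2 and the 5000.0 outlier are dropped, and the missing age is filled in. In a real project you would check each step against the data before applying it.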
This critical, time-consuming step is called data cleaning or data cleansing. As part of the data cleansing process you will also perform EDA (Exploratory Data Analysis), where you visualise the data using graphs and statistical functions to understand its underlying properties: mean, median, range, distribution etc.
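As a rough illustration of the statistical side of EDA, the sketch below computes the mean, median and range of a small, made-up `monthly_spend` series with pandas, and counts values per bin as a simple text stand-in for a histogram. The values and the bin edges are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical sample: values are made up for illustration.
spend = pd.Series([80.0, 95.0, 110.0, 120.0, 95.0], name="monthly_spend")

# Basic summary statistics.
summary = {
    "mean": spend.mean(),
    "median": spend.median(),
    "range": spend.max() - spend.min(),
}

# A rough view of the distribution: count values falling in each bin.
# The bin edges (50, 100, 150) are arbitrary choices for this example.
binned = pd.cut(spend, bins=[50, 100, 150])
distribution = binned.value_counts().sort_index()

print(summary)
print(distribution)
```

For a quick first look, `spend.describe()` returns several of these statistics in one call, and a histogram plot is the usual graphical counterpart to the bin counts.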
The Wikipedia article on data cleansing gives a good overview of data cleaning from a more generic perspective.