1: Autofit Columns and Rows
Begin data cleaning by auto-fitting columns and rows to make the data readable. This simple action ensures that all your data is visible, addressing any issues of data being cramped or cut off.
To make your data properly visible, select the whole worksheet by clicking this
triangle:
data:image/s3,"s3://crabby-images/88342/883424a883cffe23f3a4f76d3a572e63f8f65b1d" alt="Select whole worksheet"
Next, bring your cursor between any 2 column labels within your
dataset until it's converted into the double-headed arrow > double-click.
data:image/s3,"s3://crabby-images/eda21/eda211cf5613e5221ca95c1d5226739da48a1f41" alt="Double headed arrow cursor to autofit columns"
The same can be achieved using the keyboard shortcut CTRL+A to select the data and ALT,H,O,I to auto-fit column width.
Similarly, you can repeat the process to autofit the rows.
data:image/s3,"s3://crabby-images/a5c77/a5c77ab19bfb0e94a478eae0f86dde18425a5bb1" alt="autofit rows"
The keyboard shortcut for data auto-fitting row height is ALT,H,O,A.
While it's a quick and easy fix, it ensures that no data is overlooked. Once all your data is visible, move on to the next step.
2: Identify and Remove Duplicates
Duplicates are rarely intentional. You can use conditional formatting to highlight duplicates in your data, and then utilize the 'Remove Duplicates' feature under the Data tab to eliminate them, ensuring each row remains unique.
For example, in my dataset, I can quickly identify duplicates
in the ID column by going to the Home tab > Conditional Formatting > Highlight Cells Rules > Duplicate values.
data:image/s3,"s3://crabby-images/fff44/fff445ea56ddbc5d9c4ab698c20188542cf4cff7" alt="Conditional formatting to highlight duplicates"
This opens the Duplicate Values dialog box, allowing you to choose the highlight format.
data:image/s3,"s3://crabby-images/73f70/73f705640776a5f734d2602f78f04fe32ae87325" alt="Set custom format for duplicate values"
I'll go with the default settings but notice you can choose from these options or create your own custom format too.
Now I can see which rows have duplicate IDs at a glance.
data:image/s3,"s3://crabby-images/08c2c/08c2c994af76eda2c36452e0b121a4365861e326" alt="Custom formatting showing duplicate rows"
I can also easily remove the duplicates, and eliminate errors from my reports.
To do that, go to the
'Data' tab > 'Remove Duplicates':
data:image/s3,"s3://crabby-images/b8cce/b8cce8ce3ad30086789f4d4d7b6e58dd1030ba09" alt="Remove duplicates"
This will open the Remove Duplicates
dialog box which lets us choose the columns to compare.
In this case, I want to check for duplicate rows, so I will select all the columns in my table:
data:image/s3,"s3://crabby-images/7b00c/7b00c8790e4ccda1fcc7d8fca6e11cb506083d18" alt="Select all columns in table"
And you can see Excel does the heavy lifting by deleting all the duplicate entries, ensuring each row is unique.
data:image/s3,"s3://crabby-images/8a064/8a064e42ebaaa16fc979d37fae55963755adb82d" alt="Table with duplicates removed"
3: Trim Extra Spaces
Extra spaces can lead to inconsistencies. While some unwanted spaces are obvious and easy to find, others are not always visible.
For example, the extra spaces between Mike and Tyson, Anna and Belle and Peter and Parker are easily
visible however, the space in front of Mary Jane is not.
data:image/s3,"s3://crabby-images/3b2c0/3b2c06bd6da20e6a4322ce2600028b71c31cd82e" alt="Extra spaces highlighted in text"
Instead of spotting and removing these one by one, it's best to employ the TRIM function in an empty column beside your data.
For example, I will use the formula TRIM(C3) in cell I3:
data:image/s3,"s3://crabby-images/61018/61018c24203c6fc0c575656580fe794c403d10a0" alt="Use TRIM function to remove
extra spaces"
And then, copy it down to fix all the rows:
data:image/s3,"s3://crabby-images/07511/0751155283c919aec16a3bca258289135f5fa0e4" alt="Copy formula down to fix all rows"
Once I have trimmed the data, I can copy and paste it to the original column 'as values' via the Paste dropdown > Paste Values > Values:
data:image/s3,"s3://crabby-images/45b33/45b3310b46b1f165caad83f1c6586c4c7c4c31c1" alt=""
In Microsoft 365, you can also use the new keyboard shortcut CTRL+SHIFT+V to paste as values, while in older versions you can use ALT,E,S,V.
4: Eliminate
Blank Cells
Blank cells can interrupt data analysis. To avoid that, you can utilize the 'Go To Special' option under the Find & Select menu to quickly select and fill blank cells, maintaining data uniformity.
For example, some cells in my Region column are empty:
data:image/s3,"s3://crabby-images/2a220/2a2201a56867f4c746f9eaa20227eb9cd7388eb2" alt="Empty cells in column"
I will first select my data > Go to the 'Find & Select' menu > Select 'Go To Special':
data:image/s3,"s3://crabby-images/ada31/ada31aa8c637501d7f3420319f51b11011c0cc6a" alt="Go To Special"
This will open the 'Go To Special' dialog box in which I need to choose 'Blanks':
data:image/s3,"s3://crabby-images/21e9a/21e9a55acced28b4837300b63d18499591dda756" alt="Go To Special Dialog Box"
And within seconds, all the blank cells in the dataset get selected:
data:image/s3,"s3://crabby-images/bf620/bf620308b5a7d295fbf23c05d72f2c93b3bfe9ff" alt="Blank cells are selected"
I can simply fill a placeholder value as 'TBA' in all these cells.
To do that, I can type TBA in cell D8 and then press CTRL+ENTER to enter it in all selected cells in one go.
data:image/s3,"s3://crabby-images/70031/700313f057908c0c5f39ddd6cb034b408f081f20" alt="Enter value in all blank cells at once"
Alternatively, if I want, I can also copy the value from the cell above.
To do that, with the three empty cells selected, Type = in cell D8 > Press the up arrow > Press CTRL+ENTER.
This will fill the region of the cell above each of these blank cells, respectively.
data:image/s3,"s3://crabby-images/91289/9128947fd4d7c9f41e7010296fb2da83109951e2" alt="Copy value from cell above into blank cell"
5: Spell Check
Always run a spell check to correct typos and misspellings, a crucial step
for maintaining professionalism in your reports.
For example, in my dataset, I should run a spellcheck on columns D through F, excluding the name and numeric columns.
To do that, I will select these columns > Go to the 'Review' tab > Click 'Spelling'. Alternatively, I can also use the keyboard shortcut F7.
This will open the Spelling dialog box which goes through each word in the selected cells to find spelling errors.
data:image/s3,"s3://crabby-images/1e529/1e52975dc8b8826a83ce461102e611b95b263fcc" alt="Press F7 to open spell check"
Here, it's found a spelling error in the word 'Excelent', and suggests the correct spelling 'Excellent'.
We can either ignore it or change it. It's better to change it and by clicking Change All, we ensure that any other occurrence of this misspelling is also rectified.
We can also
select the 'Add to Dictionary' option for proper nouns such as Asgard that are not recognized English words if this is a common name we'll be using in other files.
data:image/s3,"s3://crabby-images/f4f7e/f4f7e0c1779eab98ef04cb7a8ae8a14bc51cfabc" alt="Add word to dictionary"
And we're good to go!
data:image/s3,"s3://crabby-images/bd269/bd269a06f00b6c8e50da1748924833263565eb9e" alt="Spell check complete"
6: Data Validation
While knowing data cleaning techniques is recommended, knowing how to prevent future errors is another skill to have in your Excel quiver.
Setting up data validation rules, like creating drop-down lists, is one of the top ways of preventing future data cleaning, as it enforces data integrity from the start.
For example, in my dataset, I can set up a dropdown list for the region column.
To do that, I'll select the region column > Go to the Data tab > Select Data Validation:
data:image/s3,"s3://crabby-images/3079d/3079d1b0ee5681b9b1ddb41806d49e3508466286" alt="Data Validation in Ribbon"
From the 'Allow' dropdown, I will select 'List':
data:image/s3,"s3://crabby-images/12f03/12f03f4dbca01369f27a87cdc3ca6a466cfe2254" alt="Choose List type of data validation"
For the source, I can either reference a range of cells that contain the values, or I can just type them in with a comma separator.
I'll type in North, South, East, West, Asgard:
data:image/s3,"s3://crabby-images/5129c/5129c2ebfc882eb6cdface16ae4f630d4678141a" alt="Enter source data for data validation list"
Now, whenever I will add new data, I don't need to type in the region, I can simply
choose from the dropdown list:
data:image/s3,"s3://crabby-images/33032/33032c4adad2559a45417b1a7306222784d582df" alt="Choose data from data validation list"
And if I enter something
not in the list, I get an error, preventing unwanted values and future data cleaning:
data:image/s3,"s3://crabby-images/392f2/392f2e77837d7a77bb100f47f5a154b80ab74bf6" alt="Error generated if incorrect value entered"
7: Use Tables
Another preventive data cleaning technique is using Excel tables.
Storing data in an Excel Table format can make data easier to manage, format, and analyze.
But before you can insert a table, make sure your data is in a tabular layout, and each column contains the same type of data.
In our example,
- We have columns for the date, ID, name, region, and so on
- Each row represents a unique record
- Column headers are also in a single row, i.e. they're not split across multiple rows leading to merged cells.
data:image/s3,"s3://crabby-images/b8c5e/b8c5ebbd578e98be6c97b73a4c56dd781b7b7bf7" alt="Tabular data layout"
To insert a table, use the keyboard shortcut CTRL+T. This will open the Create Table dialog box:
data:image/s3,"s3://crabby-images/a3795/a379579578252b9ce97d452e14b9eed29fa50030" alt="Insert Table"
As my table has headers, I will check the box and click OK.
On inserting a table, you can
instantly see the data is a lot easier to work with.
data:image/s3,"s3://crabby-images/2b68a/2b68adb0e46831db19325ee6a257723078e4102c" alt="Formatted table created"
- The headers are
clearly formatted to differentiate them from the data
- The rows are banded to allow you to easily glance across a row, which is super important in wide tables.
- On scrolling down, the headers automatically pin to the top of the sheet:
data:image/s3,"s3://crabby-images/80c69/80c69f5f77ac6bc9f95e5620adde816fe91755ee" alt="Table headers pinned at top as sheet scrolls"
- Each column has a filter button, allowing you to easily sort or filter your data:
data:image/s3,"s3://crabby-images/64f98/64f98ad836f9651aebafa7c892e990f43f0921d6" alt="Sort and filter table column"
- You can also choose from different styles on the Table Design contextual tab > Table Styles:
data:image/s3,"s3://crabby-images/269d4/269d47e89e1ebaa4a57ae86face49aebf60d450d" alt="Choose table style"
- And, you can add a total row and change the aggregation method as required:
data:image/s3,"s3://crabby-images/26bf9/26bf93cdd3fb5be8726d1179c18d3e5a2e57efbb" alt="Add total row to table with desired aggregation"
8: Handle Errors with IFERROR
Another great data cleaning technique involves using the IFERROR function to neatly manage and display errors in your dataset, preventing unsightly error messages from cluttering your reports.
For example, let's add a column in our dataset to calculate the Sales amount by multiplying Quantity and Price per Unit.
data:image/s3,"s3://crabby-images/1ca98/1ca9820d612ff7059844108cb14c366a7e74167e" alt="Add column to table"
Notice the formula refers to the columns by name instead of G2*H2? These are called table
structured references and Excel inserts them automatically when your data is formatted in an Excel table, and they make writing formulas super easy.
Also, when I press enter, the table copies the formula down the column for me, I don't even need to drag it down.
But wherever the price per unit is missing, the formula gives the #VALUE error.
Instead of displaying the error, we can return something else by wrapping the formula in the following IFERROR function:
=IFERROR( [@Quantity]*[@[Price Per Unit]], "Missing")
data:image/s3,"s3://crabby-images/b2e86/b2e862681a479e1ba42d2bbc4ce1a6d80bded8dd" alt="Use
IFERROR function to display useful message"
This will return 'Missing' in the rows with missing prices, instead of those ugly #VALUE errors, making your data look error-free.
Alternatively, if you don't want to mix data
types in a column (which is never a good idea in your source data), you can use the below formula that returns a blank value instead of 'Missing.'
=IFERROR( [@Quantity]*[@[Price Per Unit]] , '')
data:image/s3,"s3://crabby-images/6fe29/6fe297735a3ad45f9fc80c805bb9cbcb528eec0c" alt="Use IFERROR function to return blank
value"
Keep in mind that you can return anything in place of an error, even another formula.
9: Number Formats
While number formats are great for presenting your data, using
plain number formats during the analysis phase avoids unnecessary clutter and complexity in your dataset.
Here are the data cleaning steps for using the right number formats:
- Use 'General' or 'Number' formats, instead of 'Currency' or 'Accounting' formats during the analysis phase
- Always
use comma separators while working with large numbers
- Don't use the comma icon from the ribbon as it converts the number into the 'Accounting' format
data:image/s3,"s3://crabby-images/2102f/2102fd7fa29629897acc05095be02a85866837ce" alt="Don't use comma icon from Ribbon to set number format"
- Instead, Open the Format Cells dialog box using the shortcut key CTRL+1 > Select Number Format > Check the box for 'Use 1000 Separator'.
data:image/s3,"s3://crabby-images/ba686/ba6861aac02d6c2e79a337651f9efb24258926b9" alt="Open the Format Cells dialog and choose use 1000 separator"
- Also, if you have dates, format them to 'Short Date' if you don't need the time component.
data:image/s3,"s3://crabby-images/da8c6/da8c652959a7ede9df4502dda6845ff75b74769c" alt="Format dates to short date if time not needed"
10: Find & Replace
One of the most useful data cleaning techniques leverages the Find & Replace tool for bulk corrections
across your dataset to maintain data consistency.
For example, in our dataset, we can replace 'inf', because as we've seen with the IFERROR example, text in a numeric column can cause problems.
To bulk replace 'inf', go to the 'Home' tab > Find and Select > Replace:
data:image/s3,"s3://crabby-images/581f8/581f83831d0072e13238114a4d2cea8641045ba3" alt="Find and Select, Replace"
Or use the keyboard shortcut, CTRL+H.
This will open the Find and Replace Dialog
box.
Here, enter 'inf' in the 'Find What' field and the value you want to replace it with in the 'Replace With' field.
I'll leave 'Replace With' blank as I want to have an empty cell.
data:image/s3,"s3://crabby-images/917dc/917dc200a038a54ada771729748cb0129eef8e02" alt="Replace values with an empty cells"
With Options expanded, notice you can also replace formats and customize where and how the Find and Replace searches the value you want to replace.
Once you've selected the desired options, click 'Replace All', and you're good to go!
data:image/s3,"s3://crabby-images/0dcf9/0dcf9906657178d4719bbec519d93d399f121fb3" alt="Replace all values"
11: Remove
Gridlines
Finally, if you want to present your data in a report, removing gridlines can enhance the visual clarity, making your data stand out. The report below still has the gridlines detracting from the data:
data:image/s3,"s3://crabby-images/661c2/661c2f8022b5ad6df3f79eacf4ff137b6497f0e1" alt="Report with gridlines showing on
sheet"
We can remove gridlines easily via the View tab > Uncheck 'Gridlines':
data:image/s3,"s3://crabby-images/82071/82071249dc3bc0d98d7e37ab7e33d1086a96028c" alt="Uncheck
Gridlines on Ribbon to remove them"
And now our report looks clean and professional:
data:image/s3,"s3://crabby-images/d49c3/d49c332ef79a7c9ec97161d8864bdb0eaf8a3491" alt="Report
without gridlines looks better"
By following these steps, you can transform a dataset into a clean, efficient, and analysis-ready format. Remember, data cleaning is not just about aesthetics, it's about ensuring the accuracy and reliability of your data analysis processes.
Next Step : Automate Data Cleaning
These techniques are great for one-off data cleaning tasks, but if you find yourself repeating these steps each week or month before you can complete your analysis or reports, then check out the video below next on how you can automate data cleaning in Excel and get updates to your data with a single click.