Introduction to Data Cleaning with OpenRefine
Meg Miller - GIS & Data Visualization Librarian
slides: bit.ly/uml_openrefine
Outline
- Software overview
- Data cleaning defined
- Benefits
- Hands-on exercise
In short:
Open-source tool
Runs locally, in your web browser
Can be as simple or as complicated as you want
For data cleaning
Data cleaning: improving the overall quality of your data.
Main categories:
Resolving inconsistencies
Formatting
Null values
Resolving inconsistencies:
Variant spelling, inconsistent case, duplication...
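Example (a rough sketch in Python, not OpenRefine's own code): one way to spot variant spellings and inconsistent case is to reduce each value to a simplified key and group values that share a key. The idea is loosely modeled on fingerprint-style key clustering. The function names and sample values here are hypothetical.

```python
import string


def fingerprint(value: str) -> str:
    """Build a key that ignores case, punctuation, extra spaces and word order."""
    no_punct = value.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    # Split on whitespace, drop repeated words, sort so word order is irrelevant.
    return " ".join(sorted(set(no_punct.split())))


def cluster(values):
    """Group the original values by their fingerprint key."""
    groups = {}
    for v in values:
        groups.setdefault(fingerprint(v), []).append(v)
    return groups


# Hypothetical messy column: three variants of one name, plus one abbreviation.
names = [
    "University of Manitoba",
    "university of manitoba ",
    "Manitoba, University of",
    "U of M",
]

for key, members in cluster(names).items():
    print(key, "<-", members)
```

The first three values fall into one group. "U of M" does not, because an abbreviation cannot be matched by reformatting alone and still needs a human decision.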
Formatting:
Unit of measurement
Dates, text, numbers
Columns/rows
Format migration
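Example (a minimal sketch in Python, assuming only three date layouts and a feet-to-metres conversion): formatting fixes usually mean rewriting every value into a single agreed form. The list of accepted formats, the function names and the sample values are assumptions for illustration.

```python
from datetime import datetime

# Assumed input layouts; a real dataset may need others.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def feet_to_metres(feet):
    """Convert a length in feet (number or numeric text) to metres, 2 decimals."""
    return round(float(feet) * 0.3048, 2)


# Hypothetical messy values.
for raw in ["2019-03-05", "05/03/2019", "March 5, 2019", "n/a"]:
    print(repr(raw), "->", to_iso_date(raw))

print(feet_to_metres("10"), "m")
```

Values that match none of the assumed formats come back as None rather than a guess, so they can be reviewed by hand.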
Benefits:
Efficient research
Easier conversion
Secondary use
Hands-on: Cleaning some messy data