Introduction to Data Cleaning with OpenRefine



Meg Miller - GIS & Data Visualization Librarian


slides: bit.ly/uml_openrefine

Outline



  1. Software overview
  2. Data cleaning defined
  3. Benefits
  4. Hands-on exercise

What is OpenRefine?


In short:


Open-source tool

Runs in web browser

Can be as simple or as complex as you need

For data cleaning

What is data cleaning?


Improving the overall quality of your data.




Main categories:


Resolving inconsistencies

Formatting

Null values
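As a rough illustration of the "null values" category, here is a minimal Python sketch (not OpenRefine itself) that maps common placeholder strings to a single empty value. The set of markers and the helper name `normalize_null` are assumptions made for this example; a real dataset needs its own list.

```python
# Hypothetical list of strings that stand in for "no value" in a messy dataset.
NULL_MARKERS = {"", "na", "n/a", "null", "none", "-"}

def normalize_null(value):
    """Return None for anything that looks like a placeholder, else the value unchanged."""
    if value is None:
        return None
    # Compare on a trimmed, lowercased copy so "N/A " and "n/a" match.
    if value.strip().lower() in NULL_MARKERS:
        return None
    return value

row = ["Winnipeg", "N/A", "  ", "-", "42"]
cleaned = [normalize_null(v) for v in row]
```

After this step every missing entry is represented the same way, which makes counting or filtering blanks straightforward.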

Resolving Inconsistencies:



Variant spelling, inconsistent case, duplication...
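To make the idea concrete, here is a small Python sketch of grouping variant spellings by a normalized key. It is loosely inspired by key-collision clustering and is a simplification, not OpenRefine's actual implementation; the function names and sample values are assumptions for illustration.

```python
from collections import defaultdict

def fingerprint(value):
    """Build a comparison key: trim, lowercase, drop punctuation, sort unique words."""
    cleaned = "".join(
        ch for ch in value.strip().lower() if ch.isalnum() or ch.isspace()
    )
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)

def cluster(values):
    """Group the original values that share the same fingerprint."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return dict(groups)

# Hypothetical messy column values.
names = [
    "University of Manitoba",
    "university of manitoba ",
    "Manitoba, University of",
    "Univ. of Manitoba",
]
clusters = cluster(names)
```

The first three values collapse into one group because they differ only in case, spacing, punctuation, and word order. The abbreviated "Univ." stays separate, which is the kind of case that still needs a human decision.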

Formatting:


Unit of measurement

Dates, text, numbers

Columns/rows

Format migration
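As a sketch of what formatting fixes can look like in practice, the Python below converts dates written in a few styles to ISO 8601 and converts lengths to a single unit. The accepted date formats, the day-first reading of slash dates, and the helper names are all assumptions for this example, and month names are parsed as English.

```python
from datetime import datetime

# Assumed input styles; slash dates are read as day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")

def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

FEET_TO_METRES = 0.3048

def to_metres(value, unit):
    """Convert a length in metres or feet to metres."""
    unit = unit.strip().lower()
    if unit in ("m", "metre", "metres"):
        return float(value)
    if unit in ("ft", "foot", "feet"):
        return float(value) * FEET_TO_METRES
    raise ValueError(f"unsupported unit: {unit!r}")

dates = [to_iso_date(d) for d in ["2021-03-05", "05/03/2021", "March 5, 2021"]]
height_m = to_metres(10, "ft")
```

Returning None for an unrecognized date, rather than guessing, keeps ambiguous values visible so they can be reviewed by hand.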

Benefits:


Efficient research

Easier conversion

Secondary use

Hands-on: Cleaning some messy data


Workshop content

Questions



meg.miller@umanitoba.ca

slides: bit.ly/uml_openrefine