Thursday, June 20, 2013

Creating Better Visualizations aka Charts


I am no expert on visualizations, so I will give credit right away:
This post is basically the notes that I took for Introduction to Data Science by Bill Howe on http://coursera.org

We have all heard "A picture is worth a thousand words". Psychophysics says that the human visual system is the highest-bandwidth channel into the human brain. And so, a visualization is often the most effective way to present information to a person.

Process of creating a visualization:
  • Determine which columns represent nominal, ordinal, or quantitative data
  • Review the visual attributes and assign the perceptually most appropriate one to each column
  • Create a visualization which represents as many attributes as possible!

Data can be of the following types:
  • Nominal
                This type of data acts purely as a label (name, color, etc.) and nothing else.
                e.g. "Type of fruit": Apple, Orange, Guava, etc.
  • Ordinal
                Used to express ranking, quality, grades, etc.
                e.g. "Quality": A++, A+, A, A-
                A++ is definitely better than A+, but the scale doesn't tell you the magnitude of the difference.
  • Quantitative
    • Interval
                        Similar to ordinal, but the difference between values is also meaningful (date, latitude, longitude).
                        e.g. the 15th of a month is 5 days later than the 10th of the same month.
    • Ratio
                        Physical measurements (length, mass, etc.) with a true zero, so ratios are meaningful too.
                        All types of statistical inference can be drawn from such data.
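
To make the difference between these types concrete, here is a small Python sketch of which operations make sense for each one. The sample values and the rank numbers in GRADE_RANK are my own assumptions for illustration, not something from the course.

```python
from datetime import date

# Nominal: labels only, so equality (and counting) is the only meaningful operation.
fruits = ["Apple", "Orange", "Guava", "Apple"]
apple_count = sum(1 for fruit in fruits if fruit == "Apple")

# Ordinal: the order is meaningful, the distance between values is not.
# The rank numbers are an assumed encoding used only for sorting.
GRADE_RANK = {"A-": 0, "A": 1, "A+": 2, "A++": 3}
grades = ["A+", "A-", "A++", "A"]
best_to_worst = sorted(grades, key=GRADE_RANK.__getitem__, reverse=True)

# Interval: differences are meaningful, ratios are not
# (the 15th is 5 days after the 10th, but it is not "1.5 times" the 10th).
days_apart = (date(2013, 6, 15) - date(2013, 6, 10)).days

# Ratio: a true zero exists, so ratios are meaningful as well.
length_ratio = 10.0 / 5.0  # 10 m is twice as long as 5 m

print(apple_count, best_to_worst, days_apart, length_ratio)
```

Notice that subtracting two grades or dividing two dates would run into trouble (or need an arbitrary encoding), which is exactly the point of separating the types.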

Which visual attributes can be used for what type of data:

Visual Attribute   Nominal   Ordinal   Quantitative
Position
Size
Value
Texture                                x
Color                        x         x
Orientation                  x         x
Shape                        x         x

 ➨  recommended
 ⇒ not recommended
 x   prohibited
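
The prohibited entries can be written down as a small lookup, which is handy as a sanity check before picking an encoding. This is only a sketch: it assumes the x marks in each row belong to the rightmost columns (Quantitative first, then Ordinal), and the function names are hypothetical. It only models the prohibited entries, not the difference between recommended and not recommended.

```python
# Prohibited (visual attribute -> data types) pairs, read from the "x" marks above.
# Assumption: each row's x marks occupy its rightmost columns.
PROHIBITED = {
    "texture": {"quantitative"},
    "color": {"ordinal", "quantitative"},
    "orientation": {"ordinal", "quantitative"},
    "shape": {"ordinal", "quantitative"},
}

ATTRIBUTES = ["position", "size", "value", "texture", "color", "orientation", "shape"]
DATA_TYPES = {"nominal", "ordinal", "quantitative"}


def is_prohibited(attribute: str, data_type: str) -> bool:
    """Return True if the table marks this attribute/data type pair with an x."""
    attribute, data_type = attribute.lower(), data_type.lower()
    if attribute not in ATTRIBUTES:
        raise ValueError(f"unknown visual attribute: {attribute}")
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type}")
    return data_type in PROHIBITED.get(attribute, set())


def allowed_attributes(data_type: str) -> list:
    """List the visual attributes that are not prohibited for a data type."""
    return [a for a in ATTRIBUTES if not is_prohibited(a, data_type)]


print(allowed_attributes("quantitative"))
```

For quantitative data this leaves only position, size and value, which agrees with the idea that precise data needs the attributes we perceive most accurately.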

How accurately we perceive various visual attributes, in descending order:

Position         (Most Accurate)
Length
Angle/Slope
Area
Volume
Color/Texture (Least Accurate)
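
One way to use this ranking is to hand the most accurately perceived attribute to the most important column. Below is a minimal sketch of that idea. The function name and the column names are hypothetical, and the sketch ignores the data type of each column, so in practice you would still check every pairing against the table above.

```python
# Visual attributes from the ranking above, most accurately perceived first.
CHANNELS_BY_ACCURACY = [
    "position",
    "length",
    "angle/slope",
    "area",
    "volume",
    "color/texture",
]


def assign_channels(columns_by_importance):
    """Pair each column with an attribute; the most important column gets the most accurate one."""
    if len(columns_by_importance) > len(CHANNELS_BY_ACCURACY):
        raise ValueError("more columns than available visual attributes")
    return dict(zip(columns_by_importance, CHANNELS_BY_ACCURACY))


# Hypothetical columns, listed from most to least important.
assignment = assign_channels(["price", "weight", "mileage"])
print(assignment)
```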

Some examples of bad visualizations:
 


The top row of this chart represents quantitative data with color (which our table above prohibits). It makes more sense to represent quantitative data through position (as in the line chart), because quantitative data is precise and position is the attribute we perceive most accurately. It's very difficult to infer magnitude from color.



The visualization above is wrong because it represents nominal data through position, which implies some sort of ordering in the data when there is none. Perhaps a better visualization would be:







Putting it all together (we will try to show 7 attributes of a car in a single visualization):

