Class Exercise – Difference between Infographic and Data Visualization

I was thinking about the NM3229 course and realized that it primarily taught us how to design data visualizations and infographics. Before this course, I was under the impression that these 2 things were one and the same. Through this module, however, I have realized that there is a fundamental difference between the 2 concepts. This reminded me of the class exercise in which we discussed their differences, and I decided to blog about it, since learning the difference between infographic design and data visualization design has been one of my primary takeaways from the course.

Data visualizations, as the name suggests, are simply visual representations of data. They include graphs, charts, maps, pictures, etc. When someone looks at a data visualization, he/she is left to draw the relevant conclusions: the visualization simply depicts the data graphically and in an understandable fashion, and the task of deriving information and conclusions from it is left to the viewer.

Infographics, on the other hand, combine multiple visualizations, text, etc. to convey specific information to a reader. Infographics tell a story: by looking at an infographic, a reader can draw very clear conclusions about the data.

 

 

Assignment 3 – Comparison of Data Visualization Tools

The final individual assignment of NM3229 was to become familiar with the available data visualization tools and compare them. To explore the tools effectively, we were given data from the US food nutrient database (available at US Nutrient Database). We were told to come up with a motivating question about the data that could be answered with visualizations made in various data visualization tools. The data provided was HUGE. It was a challenging task to come up with an effective and relevant motivating question about the data and then to devise a visualization that could easily answer it. Across all 3 assignments, it has become evident that ideation is the most challenging stage of the design process. I spent some time thinking of relevant motivating questions about the provided data and came up with the following ideas:

  • Which energy drink provides the most energy to the drinker?
  • Which food category has the highest nutritional value?
  • What is the sugar content of popular desserts?
  • Which type of cheese is the healthiest?
  • How do different fast food chains compare in terms of nutritional content of food?

I felt that all these questions had interesting answers that could be described using relevant visualizations. I ultimately settled on the following question because it required a more specific dataset and was a unique idea that I was personally interested in:

Which type of cheese is the healthiest (based on a comparison of sugar and fat content)?

Now that my motivating question was fixed, I had to choose 3 data visualization tools to answer it with. The 3 tools I compared were:

  1. Google Fusion Tables
  2. Tableau Public
  3. Microsoft Excel

I chose these tools for the following reasons:

  • Google Fusion Tables – Google is one of the greatest technology companies and produces some of the most innovative and useful products. As most Google products are top class, I wanted to use Google’s data visualization software and see how it fared against its competitors.
  • Tableau Public – I had heard about Tableau from a friend who had used the tool before. It sounded like a useful and easy-to-use tool, so I wanted to try it out and explore it for myself.
  • Microsoft Excel – I have used Excel extensively for school projects and internships. I am comfortable with using it and hence wanted to see how it performed as a data visualization tool.

After deciding on my motivating question and choosing my data visualization tools, I began comparing the tools. The rest of this blog post covers some of my findings and thoughts about the 3 tools.

Google Fusion Tables

Google Fusion Tables is a web-based data visualization application aimed at gathering and visualizing large amounts of data. Though it is currently an experimental app run by Google, it has several features for making cool visualizations.

Fusion Tables has a nice user interface. However, the range of data sources you can connect to is limited. Common data sources such as Excel worksheets and text files can be connected easily, and it is also possible to import data from a Google spreadsheet.

Since my data was in an Excel worksheet, I was easily able to import it into Fusion Tables. After I chose attributes such as the table name and date of creation, the imported data was arranged as rows / cards. Like a Google document, Fusion Tables gives you the flexibility to filter your data based on various fields. For example, I could filter my data by type of cheese or by fat / sugar content.

Fusion Tables also lets you summarize data on various parameters, such as the number of occurrences of each value of a particular field. You can also summarize the data by displaying its average, maximum and minimum values.
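
For readers who want to try the same filtering and summarizing outside Fusion Tables, here is a minimal Python sketch using only the standard library. It is not how Fusion Tables works internally; the field names (`cheese_type`, `fat_g`, `sugar_g`) and the sample rows are hypothetical placeholders, not values from the nutrient database.

```python
from collections import Counter
from statistics import mean

# Hypothetical rows standing in for an imported worksheet.
# Names and numbers are placeholders, not real nutrient values.
rows = [
    {"cheese_type": "Cheese A", "fat_g": 20.0, "sugar_g": 1.0},
    {"cheese_type": "Cheese A", "fat_g": 30.0, "sugar_g": 0.0},
    {"cheese_type": "Cheese B", "fat_g": 10.0, "sugar_g": 2.0},
]


def filter_rows(rows, field, value):
    """Keep only the rows whose `field` equals `value`."""
    return [row for row in rows if row[field] == value]


def count_values(rows, field):
    """Count how many times each value of `field` occurs."""
    return Counter(row[field] for row in rows)


def summarize(rows, field):
    """Return the average, maximum and minimum of a numeric field."""
    values = [row[field] for row in rows]
    return {"average": mean(values), "max": max(values), "min": min(values)}


print(count_values(rows, "cheese_type"))  # Counter({'Cheese A': 2, 'Cheese B': 1})
print(summarize(filter_rows(rows, "cheese_type", "Cheese A"), "fat_g"))
# {'average': 25.0, 'max': 30.0, 'min': 20.0}
```
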
To depict data graphically, Fusion Tables offers several types of charts. I created the following visualization for my data:

[Image: Google Fusion Tables chart of my cheese data]

I could pick which data value I wanted represented in the chart (fat / sugar). I could also pick the maximum number of items I would like to display on the chart. The appearance of the chart could also be changed easily using the ‘change appearance’ tab. Using this feature, fonts, font sizes, axis titles and chart legends could be easily edited.

Tableau Public

Tableau Public is a very useful visualization tool. It has an easy-to-use interface and some very useful features. I particularly liked the following points about Tableau:

  • It is very easy to import data from external sources.
  • After importing, the data appears as ‘dimensions’ and ‘measures’. In Tableau, ‘dimensions’ are qualitative fields (non-numeric values such as names or categories), while ‘measures’ are quantitative values that can be measured and aggregated.
  • You can set ‘dimensions’ or ‘measures’ as the rows / columns of the visualization you would like to generate by dragging and dropping them.
  • You can choose the chart you would like from a variety of charts.
  • You can sort data in the chart created in ascending or descending order which makes trends more prominent.
  • You can create and use calculated fields (fields calculated from other fields).
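
To make the ‘dimensions’ / ‘measures’ split, calculated fields and sorting more concrete, here is a rough Python analogue. It is not Tableau’s API: it uses a simplified rule (numeric fields are measures, everything else is a dimension), and both the calculated field `fat_plus_sugar_g` and the sample records are hypothetical.

```python
# Hypothetical records; cheese names and values are placeholders.
records = [
    {"cheese_type": "Cheese A", "fat_g": 20.0, "sugar_g": 1.0},
    {"cheese_type": "Cheese B", "fat_g": 10.0, "sugar_g": 2.0},
    {"cheese_type": "Cheese C", "fat_g": 30.0, "sugar_g": 0.5},
]


def split_fields(record):
    """Simplified rule: numeric fields are measures, all others are dimensions."""
    measures = [
        name for name, value in record.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    ]
    dimensions = [name for name in record if name not in measures]
    return dimensions, measures


def add_calculated_field(records, name, formula):
    """Return copies of the records with an extra field computed by `formula`."""
    return [{**record, name: formula(record)} for record in records]


def sort_by(records, field, descending=True):
    """Sort records by one field, largest first by default."""
    return sorted(records, key=lambda record: record[field], reverse=descending)


print(split_fields(records[0]))  # (['cheese_type'], ['fat_g', 'sugar_g'])

# Hypothetical calculated field: total grams of fat plus sugar.
with_total = add_calculated_field(
    records, "fat_plus_sugar_g", lambda r: r["fat_g"] + r["sugar_g"]
)
for record in sort_by(with_total, "fat_plus_sugar_g"):
    print(record["cheese_type"], record["fat_plus_sugar_g"])
```

Because `add_calculated_field` returns copies, the original records stay untouched, much as a calculated field in Tableau is derived from, rather than written into, the source data.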

Some visualizations that I created around my data and motivating question:

[Image: Tableau visualization of fat content]

[Image: Tableau visualization of sugar content]

 

Microsoft Excel

Excel is one of the most commonly used data analysis tools, and one that I have used before. With the other visualization tools, I imported my data from Excel; this time, I used Excel itself to create the visualizations I wanted.
Excel, like the other tools, has features to filter, sort and select data. The Insert tab in Excel lets you insert charts based on selected data. A good thing about Excel is that it offers many different designs for different kinds of charts; for example, it has both 2D and 3D bar graphs. You can also make several design adjustments by changing the fonts, colours and styles of your chart.

A visualization I created around my data and motivating question:

[Image: Microsoft Excel visualization of my cheese data]

 

Comparison of tools

I enjoyed using all 3 visualization tools. They have clean and easy-to-use interfaces, and playing around with each tool for an hour is enough to get a basic understanding of its main features.
The tool which I believe has the most utility is Tableau. This is because:

  • You can import data from a wide variety of sources.
  • There is huge variety in the number of charts and visuals you can create.
  • Charts are easier to customize once created.
  • Tableau is well equipped to deal with huge volumes of data.

Fusion Tables is more basic, as it supports fewer data sources and chart types.
Excel has many of the features of Tableau. However, it cannot connect to as many data sources, and a map feature is not available in Excel.

Conclusions about data

The tools helped me discover that Ricotta cheese is the healthiest choice due to its relatively low sugar and fat content. This is a surprising revelation, as ricotta is considered a creamier cheese and is thought to be less healthy.
American cheese has the highest sugar content and Cream cheese has the highest fat content, so these 2 cheeses are less suitable for consumption in large quantities.

Primary takeaways from Assignment 3

My primary takeaways from assignment 3 are as follows:

  1. Familiarity with useful data visualization tools such as Tableau, Google Fusion Tables and Microsoft Excel.
  2. Insight into how to deal with huge volumes of data.
  3. How to create effective visualizations based on massive amounts of data.

Overall, assignment 3 was a great insight into data visualization with large amounts of data. It also improved my knowledge of data visualization tools.

 

Class Exercise – Flickr Thought Experiment

This week’s NM3229 lecture was pretty action-packed, as we had a guest lecture by Mr David Ayman Shamma, a senior research scientist at Yahoo! Research in the USA (his bio can be found at Bio). Mr Shamma gave a very interesting lecture on the methods used to analyze and visualize photographs uploaded to social media websites such as Flickr. He demonstrated some innovative techniques for analyzing the photos uploaded by various Flickr communities and for visualizing the data related to those photos.

The class exercise for the lecture was to think of a novel way to visualize the data associated with 1.2 million photos taken in Singapore. For each photo, the following information was available:

  1. Location
  2. Time at which photo is taken
  3. Time at which photo is uploaded to Flickr
  4. User who uploaded the photo

The class split into groups of around 3-4 members each. Every group was given 20 minutes to think of a motivating question to answer with its visualization and to make a rough sketch of the visualization.

My group noted that taking pictures of food is very popular in Singapore. Moreover, the locations where food pictures are taken, and the times of day at which they are taken, could reveal some of the most popular eating places in Singapore and the times of day when they are most crowded. The motivating question we decided to answer with our visualization was: What are the food habits of Singaporeans?

Our rough sketch of the visualization looked as follows:

[Image: my group's rough sketch of the visualization]

We conveyed the following information with this interactive visualization:

  1. Locations in Singapore where food photos have been taken. These are represented using a marker in a map of Singapore.
  2. Slider to depict time of day – You can slide to a particular time of day, and the photos taken at that time will appear at their corresponding locations. Colour-coded markers indicate different times of day; for example, a black marker indicates midnight.
  3. User track – by clicking on an individual marker, a connecting line between that photo and other photos uploaded by the same user appears helping you track where in Singapore a user is eating. It could provide interesting information about the eating habits of Singaporeans.
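
Behind the user-track interaction is a simple grouping step: pick every photo by the same user and order those photos by the time they were taken. A minimal Python sketch follows; the `Photo` fields mirror the four attributes listed earlier, but the field names, users and coordinates are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Photo:
    user: str
    location: tuple[float, float]  # (latitude, longitude)
    taken: datetime
    uploaded: datetime


def user_track(user, photos):
    """Locations of one user's photos, ordered by the time they were taken."""
    own = [photo for photo in photos if photo.user == user]
    return [photo.location for photo in sorted(own, key=lambda photo: photo.taken)]


# Hypothetical photos; users and coordinates are made up for illustration.
photos = [
    Photo("user1", (1.300, 103.850), datetime(2013, 5, 1, 19, 0), datetime(2013, 5, 1, 20, 0)),
    Photo("user2", (1.350, 103.900), datetime(2013, 5, 1, 12, 0), datetime(2013, 5, 2, 9, 0)),
    Photo("user1", (1.280, 103.840), datetime(2013, 5, 1, 8, 0), datetime(2013, 5, 1, 8, 30)),
]

# Clicking a user1 marker would draw a line through these points in order.
print(user_track("user1", photos))  # [(1.28, 103.84), (1.3, 103.85)]
```
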

Overall, Mr Shamma and the class appreciated our efforts. Mr Shamma mentioned that our visualization was comprehensive and conveyed sufficient meaningful information. His only recommendation was to limit the time slider to blocks of hours in a day (e.g. breakfast time: 7 am to 11 am, lunch time: 11.30 am to 2 pm, etc.) instead of having a position for every hour of the day.
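
Mr Shamma’s suggestion amounts to binning each photo’s time of day into a few named blocks instead of hourly steps. Here is a minimal Python sketch that uses only the two blocks he named; treating each block as start-inclusive and end-exclusive is an assumption of this sketch, and the remaining blocks would need to be defined before it could drive a real slider.

```python
from collections import Counter
from datetime import time

# Blocks taken from the feedback; the unspecified "etc." blocks (such as
# dinner) are deliberately left out rather than guessed.
MEAL_BLOCKS = [
    ("breakfast", time(7, 0), time(11, 0)),
    ("lunch", time(11, 30), time(14, 0)),
]


def meal_block(t):
    """Return the block containing time t (start inclusive, end exclusive), or None."""
    for label, start, end in MEAL_BLOCKS:
        if start <= t < end:
            return label
    return None


def photos_per_block(times):
    """Count photo timestamps per meal block, ignoring times outside every block."""
    blocks = (meal_block(t) for t in times)
    return Counter(block for block in blocks if block is not None)


print(meal_block(time(8, 15)))   # breakfast
print(meal_block(time(11, 15)))  # None (gap between breakfast and lunch)
print(photos_per_block([time(7, 30), time(12, 0), time(13, 45), time(20, 0)]))
```
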

The other groups in class also made interesting visualizations. Here is a list of their motivating questions and visualizations:

1. What are the popular hotspots visited in Singapore from 2004 – 2014?

[Image: sketch for question 1]

2. What is the difference between the time a photo is taken and the time it is uploaded to Flickr?

[Image: sketch for question 2]

3. What are good places in Singapore to take light photos?

[Image: sketch for question 3]

4. What are the most popular colours photographed in Singapore?

[Image: sketch for question 4]

5. What do Singaporeans do over the weekend?

[Image: sketch for question 5]
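
The second group’s question reduces to simple date arithmetic on two of the available attributes. As a minimal sketch (the function names and timestamps are hypothetical), the delay is the upload time minus the time taken, and a median is less sensitive than a mean to the occasional photo uploaded months later. Real data might also contain negative delays caused by incorrect camera clocks, which this sketch does not filter out.

```python
from datetime import datetime, timedelta
from statistics import median


def upload_delay(taken, uploaded):
    """Time elapsed between taking a photo and uploading it to Flickr."""
    return uploaded - taken


def median_delay_hours(pairs):
    """Median upload delay, in hours, over (taken, uploaded) pairs."""
    return median(upload_delay(taken, uploaded) / timedelta(hours=1)
                  for taken, uploaded in pairs)


# Hypothetical timestamps for three photos.
pairs = [
    (datetime(2013, 5, 1, 8, 0), datetime(2013, 5, 1, 9, 0)),   # 1 hour
    (datetime(2013, 5, 1, 8, 0), datetime(2013, 5, 1, 11, 0)),  # 3 hours
    (datetime(2013, 5, 1, 8, 0), datetime(2013, 5, 2, 8, 0)),   # 24 hours
]
print(median_delay_hours(pairs))  # 3.0
```
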

Overall, it was a really fun lecture and I was able to learn more about the analysis and visualization of data from photographs through the lecture and class exercise.

 

 

 

 

 

 

Assignment 2 – World University Rankings

Within a week of submitting the first draft of assignment 1, we were given the question for assignment 2. For assignment 2, we had to use the world university ranking data available on QS World University Rankings to create a useful infographic and data visualization. Our infographic and visualization had to tie into a common theme, and we were encouraged to think of a suitable question related to university rankings and try to answer it using both.

As in the case of assignment 1, the first challenge I faced was deciding what data to use from the QS World University Rankings. I also had to decide what theme I wanted to focus on and what question I was trying to answer with my work. Deciding the theme and question was extremely hard, as there were so many avenues I could explore. I could explore rankings country-wise or region-wise. Alternatively, I could take a more ‘causal’ approach by focusing on what has caused universities to be ranked where they are. A third approach would be to dig deeper into a particular parameter such as academic reputation, employer reputation, cost of living or student life.

Ultimately, I decided to focus on the top universities in a particular region or country. This would narrow down my data tremendously and let me explore various parameters of the top universities in depth. Having made that decision, I was unsure which region or country to pick. My primary contenders were Asia and the US. Asian schools have been climbing in the rankings and might show some interesting trends over the years. However, top US universities are ranked very high worldwide and might be more popular amongst prospective students.

After weighing the pros and cons of both options, I decided to concentrate on the top 10 US universities. Top US universities are very popular choices for students from all around the world because of their high international intake. Moreover, they offer very interesting courses and opportunities. I thought that my infographic could educate readers about the top 10 US universities and give them some interesting facts and figures that might be useful when picking a university.

Now that I had decided what data and theme I would focus on, I started thinking of potential ideas for my infographic. My simplest idea was to draw a map with the locations of the top 10 US universities marked on it. I also thought that the percentage of top schools on the east coast and west coast would be an interesting statistic to include. I wanted to compare parameters such as tuition fees, academic repute and employer repute of the top schools as well. However, due to lack of time, I was unable to include all the elements I wanted in my first draft. Hence, my first draft was incomplete and looked like this:

[Image: first draft of my Assignment 2 infographic (A2 draft_Dipika)]

As the draft was incomplete, I was unable to get a lot of concrete feedback for it. In general, people felt that the data I had represented so far was a bit disconnected and none of it was really very useful to prospective students. I had to find a better way to convey the information I was interested in sharing.

To me, thinking of a good design that is both aesthetically appealing and in accordance with design principles is the most challenging task. I was extremely confused about how to represent all the data I had in a clean and understandable way on the infographic. I looked for inspiration online and came across this infographic:

[Image: sample infographic that inspired my design]

The infographic and related article can be found at China Knowledge Revolution

I was impressed by the following elements of this infographic:

  1. A clear heading to indicate its purpose.
  2. Use of a map to depict locations of universities.
  3. Use of different kinds of charts, such as 3D line charts and progress bars, to indicate various statistics.

Overall, all the elements of the infographic tied into a common theme and told a coherent story. I wanted my infographic to depict my chosen central theme just as clearly. I considered all the information I wanted to convey, the best way to convey each piece and the elements I would need, and came up with the following list:

  1. Location of top 10 universities in US – best way to depict this is a map with the universities marked out.
  2. Ranking of the university in US.
  3. World ranking of the university.
  4. Top subjects of study in each university.
  5. Acceptance rate in each university – a good way to depict this is a status bar showing the percentage acceptance rate.
  6. Comparison of tuition fees of top 10 US universities – this could be done using a horizontal / vertical bar graph.
  7. Comparison of academic reputation of top 10 US universities – this could be done using a horizontal / vertical bar graph.
  8. Comparison of employer reputation of top 10 US universities – this could be done using a horizontal / vertical bar graph.

As with assignment 1, I decided to use the online tool Piktochart to put the various elements of the infographic together. My completed infographic for assignment 2 looked like this:

[Image: completed Assignment 2 infographic (A2_Dipika Suresh)]

I was pleased with my effort and felt that my infographic was very clear in its central theme. It provided a good insight into the top 10 US universities and helped answer questions students may have regarding tuition fees, acceptance, top subjects or academic repute.

The next step was to create a data visualization, again built around my central theme of the top 10 US universities. According to Wikipedia, data visualization is “information that has been abstracted in some schematic form, including attributes or variables for the units of information”. I decided to focus my visualization on how the world rankings of the top 10 US universities had changed over the past 10 years. This dataset included numerous data points and could be well represented using a line graph. I collected the rankings of the top 10 US universities from 2004 to 2014 from the QS World University Rankings website and put all my data into an Excel sheet. I then used Tableau Public, which lets you import data from Excel sheets, to load the rankings and was easily able to plot a line graph for each university. My completed visualization looked like this:

[Image: completed Assignment 2 visualization in Tableau (A2_Viz_Dipika Suresh)]

My primary takeaways from assignment 2 are:

  1. Use of tools like Illustrator, Piktochart and Tableau.
  2. Understanding the differences between a visualization and an infographic.
  3. A better understanding of how to design a good quality infographic.

I enjoyed the process of creating the infographic and visualization for assignment 2. I think I have been able to improve my designing skills and I hope they will continue to improve over the course of this module.

Assignment 1 – Milestones Project

The first assignment for my NM3229 data visualization course was to create a useful infographic based on data available in the Milestones Project.  The Milestones Project charts the progress of data visualization over several centuries. It focuses primarily on inventions in cartography, statistics, graphics and technology. It provides detailed information about each invention such as the name of the invention, inventor name, location of invention and year of invention. In a nutshell, the Milestones Project charts the entire history of data visualization all the way from BC times to the 21st century.

The Milestones Project is filled with a lot of data and the first challenge I faced was to decide on which data to focus on in my infographic. I wanted to narrow down to a few significant pieces of data. I ultimately decided to focus on inventions in data visualization technology as technological inventions seemed the most interesting to me. I also narrowed down my focus to inventions in data visualization technology over 4 centuries (1600s to 1900s).

My first idea for the infographic was to create the effect of a road with numerous ‘milestones’ along it to indicate inventions. Similar to the picture below, I wanted to draw a long road with milestones, each milestone carrying information related to an invention such as its name, year of invention and inventor name.

[Image: a road with milestones]

I started using Adobe Illustrator to create my infographic. I had never used Adobe Illustrator before, and it was pretty difficult getting used to the software. I watched a few online tutorials to build my Illustrator skills (the tutorials I used are available here – Illustrator Tutorials). Having established a basic foundation in Illustrator, I began drawing my infographic. However, I realized that my lack of Illustrator skills and prior design experience was a major roadblock. I was unable to convert my initial idea into a concise and clear infographic using Illustrator, so I decided to change it. Instead of creating a long winding road with milestones along the way, I decided to create crossroads like those depicted in the image below:

[Image: crossroads]

The crossroads divide the canvas into 4 sections, so I decided to use each section to represent a particular century (1600s to 1900s). I wanted to depict the inventions of each century very clearly, including information such as the name, year and location of each invention. I was able to achieve this using circular markers like the image below:

[Image: circular invention marker]

This circular marker, used to represent an invention, shows the name, year and location of the invention

Hence, my first draft for the infographic consisted of crossroads dividing the canvas into 4 sections, each section focused on a century. Inventions in a century were depicted using circular markers (like the one above) and had the name, location and year of invention. I had a key to clarify which location mapped to which colour. My first draft looked like this:

[Image: first draft of my Assignment 1 infographic]

The circular markers (in a particular section) are arranged in chronological order

I received a lot of mixed feedback regarding my first draft for assignment 1.

Here are some of the positive comments I received for my draft:

  1. Unique idea and design.
  2. Large amount of information conveyed in infographic.
  3. Suitable data-ink ratio (as per Tufte’s principles).

The primary criticisms I received for my draft were as follows:

  1. Infographic is not very intuitive and easy to understand.
  2. The order of markers in a section can be misleading.
  3. The numerous colours used to depict location are distracting.
  4. The use of crossroads does not very clearly depict the idea of milestones along a road.
  5. No graphs / statistics used to depict trends.

After going through all the feedback received from Ms Jing, classmates and friends, I decided that the concept of crossroads and milestones along a road was not easy to grasp or execute. It would make more sense to choose a simpler, more intuitive design to improve the readability of my infographic. Hence, I decided to completely change my design for the final submission.

I decided to depict the same data as in draft 1: technology inventions in data visualization from the 1600s to the 1900s. I liked the idea of giving each century its own section, so I divided my infographic into 4 rows, each representing a century. I also wanted a better way to depict an invention than the circular markers of draft 1, so I included a small picture of each invention along with its name, year of invention and inventor name, and depicted the location of invention using a tiny flag. My inventions looked something like this:

[Image: how an invention is depicted in the final design]

5 pieces of information regarding an invention are depicted

Apart from depicting notable inventions in a particular century, I wanted to depict some statistic or trend in my infographic. The 2 trends I decided to depict were:

  1. Increase in number of inventions over 4 centuries.
  2. Country-wise contributions to inventions.

I depicted the first piece of information using a line chart as line charts are a good way to depict a change in value over time. I observed that the number of inventions had risen from just 5 in the 1600s to 15 in the 1900s.

I depicted the country-wise contribution to inventions using a pie chart, which very clearly shows the percentage contribution of various countries to technological inventions. I observed that the US was the primary contributor to inventions.
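
Both trends come down to simple counting. The Python sketch below (with hypothetical years and placeholder country names, not the Milestones Project data) groups invention years by century for the line chart and turns per-country counts into the percentages a pie chart displays.

```python
from collections import Counter


def century_label(year):
    """Label a year by its century, e.g. 1786 -> '1700s'."""
    return f"{year // 100 * 100}s"


def inventions_per_century(years):
    """Count inventions per century, ready for a line chart."""
    return Counter(century_label(year) for year in years)


def percentage_shares(counts):
    """Convert per-country counts into percentages that sum to 100."""
    total = sum(counts.values())
    return {country: 100 * count / total for country, count in counts.items()}


# Hypothetical inputs for illustration only.
print(inventions_per_century([1601, 1650, 1786, 1900]))
print(percentage_shares({"Country X": 3, "Country Y": 1}))  # {'Country X': 75.0, 'Country Y': 25.0}
```

With the counts reported above, going from 5 inventions in the 1600s to 15 in the 1900s means the number of inventions per century tripled (15 / 5 = 3).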

I now had all the elements I needed for my final infographic, and what remained was putting everything together. Instead of using Illustrator for this, I decided to use an online tool called Piktochart, recommended to me by one of my classmates. Piktochart makes it easy to format and put together the various elements of an infographic.

After many days of hard work, I was finally done with my infographic. It looked like this:

[Image: final Assignment 1 infographic]

I had separate sections for centuries, notable inventions in a century, important information related to an invention and relevant trends and statistics. I felt like I had come a long way from when I first started working on assignment 1.

My primary takeaways from this assignment are:

  1. I have gained some knowledge on designing infographics.
  2. I have learnt how to use design tools such as Illustrator and Piktochart.

I am pretty satisfied with my final infographic and hopefully I will receive positive feedback for it!

Class Exercise – Singapore General Elections Tracker

It's my first ever blog post on my first ever blog, and this post is all about the first class of my first ever data visualization course at NUS. Talk about a whole lot of firsts! While first attempts can be extremely daunting, they are also exciting, enriching and enjoyable. That is why I am super excited about this blog and my data visualization class. Hopefully, by the end of the course, my blogging and data visualization skills will have improved by leaps and bounds.

As a student, I inevitably use infographics, pie charts, bar graphs, etc. to represent useful data in project reports. But the visuals I create are extremely simple and quite amateurish. There are so many incredibly talented people in the world who represent data in such creative ways. The visuals they create are imaginative, elegant and, most importantly, extremely understandable. This course should definitely help me enhance my data visualization skills. Perhaps by the end of the course, my handiwork will no longer be considered amateurish.

The first class of the course saw us discussing terms such as data, visualization and infographics. The highlight of the class for me was the TED talk we watched about a father trying to understand his son’s speech patterns. The most fascinating part of this talk was the variety of ways in which the data was visualized. Trails depicted household activities, and graphs showed where certain words were used most often, together revealing how a 2-year-old boy learned to speak English. The video was an insight into how huge volumes of data can be analyzed, categorized and represented in an understandable format.

Check the video out at: The birth of a word

Our first ever class assignment was to examine the Singapore 2011 general elections tracker and try to infer useful information from it. This seemed like a trivial task but was complex nonetheless, as the tracker was rather difficult to comprehend at first.

Here is what I inferred from the tracker:

1. The tracker represents the most popular terms, or the terms searched for most often, by the Singapore public in the days leading up to the 2011 general election.

2. Popularity of a term over time is depicted using color coded graphs.

3. Relationships between popular terms and related articles and tweets are depicted using color coded graphs.

There were many elements of the tracker that I was impressed with and a few elements that I did not like.

The positives:

1. I was impressed by the graphs used to depict trends in popularity.

[Image: popularity graphs in the tracker]

For example, in the image above, it is evident that the term PAP is very popular and has remained popular over the timeline of 5th May to 8th May. The usage of graphs made it very intuitive to understand how the popularity of a term had changed over time.

2. Headings such as ‘Running’, ‘Key Terms’ and ‘Latest’ gave a clear picture of the most recent terms.

3. Relationships between terms and related articles or tweets were very well defined.

4. The interactive user interface made it easy to click on a term and view its popularity + related tweets and articles.

[Image: the term ‘video’ in the tracker]

For example, the image above very clearly shows the change in popularity of the word ‘video’ on the left hand side and also depicts articles and tweets with the word ‘video’ on the right hand side.

5. The tracker represents a large amount of data with minimum graphics. It does not add any unnecessary elements just to look attractive.

The negatives:

1. The numbers put next to the key terms are not easily understandable.

[Image: numbers next to key terms in the tracker]

For example, in the image above, the meaning of the numbers 11, 2, 10 and 3 is not easily understandable. Only after a little exploration of the visualization does it become evident that the numbers depict changes in popularity over time. For example, on 7th May, WP was the 11th most popular term, but it rose to become the 2nd most popular term on 8th May.
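
One way to read those numbers is as daily popularity ranks, where a term’s movement is the difference between the previous day’s rank and the current one. The tracker does not document how it ranks terms, so the Python sketch below simply assumes ranks come from mention counts; the counts are hypothetical, while the 11th-to-2nd example is the WP case described above.

```python
def ranks_from_counts(counts):
    """Rank terms by mention count (1 = most popular); ties broken alphabetically."""
    ordered = sorted(counts, key=lambda term: (-counts[term], term))
    return {term: position for position, term in enumerate(ordered, start=1)}


def rank_change(previous_rank, current_rank):
    """Places gained since the previous day (negative if the term dropped)."""
    return previous_rank - current_rank


# Hypothetical mention counts for a single day.
print(ranks_from_counts({"PAP": 50, "WP": 30, "video": 10}))  # {'PAP': 1, 'WP': 2, 'video': 3}

# WP: 11th on 7th May, 2nd on 8th May.
print(rank_change(11, 2))  # 9
```
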

2. While the tracker claims to fairly depict the most shared content on social media, Twitter appears to be the only source that is well represented. If Twitter is the only social media source utilized, the tracker may not be a very fair representation of the truth.

3. The colors used for the graphs are misleading. It is not clear what purpose the color codes for the various graphs serve.

My suggestions for improvement:

1. Simpler color coding on the graphs. For example, green to indicate an increase in popularity and red to indicate a drop in popularity of a term.

2. More social media sources such as Facebook, Quora, Pinterest and Google+ for the data used in the visualization.

3. Use of horizontal bar graphs to more clearly indicate the popularity of a term or how many times the term has been shared or searched for.

[Image: suggested horizontal bar graph]

If you would like to check out the tracker, the link is: Singapore General Elections 2011

On the whole, the first class of data visualization was interesting and I am excited about what is in store for the rest of the semester!