“Dataviz”, “infographics”, “charts”, “graphs”, etc. We are very fond of these types of analyses and of their visual results, which are so widely shared on social networks. However, they are often unscientific, and therefore unusable and lacking in credibility. Professor OG explains the rules to follow in order to present a scientifically sound analysis.
Xavier: Hello Professor OG, could you walk us through the stages, from start to finish, that allow us to present a reliable study, whatever its format?
“There are two ways of doing this:
1/ Either I start with a hypothesis that I want to verify,
2/ Or I analyze data and look for results that could be of interest.
In the first case, where we want to verify a hypothesis (the hypothesis being the result we want to prove), it must be clearly stated. We then put in place a process that will allow us to confirm or contradict it:
1/ Start with a data set that is sufficiently representative of the population as a whole (in other words, one that reaches the minimum size beyond which adding more data no longer changes the result).
2/ Clean up the data set by removing outliers (values that are statistically very rare compared with the rest of the data).
3/ Compare the expected result with the actual result derived from the data.
4/ Draw a conclusion about the hypothesis. This can take many forms: you can, for example, fully accept or reject the hypothesis, but you can also narrow or broaden it according to the results obtained. The form matters less than remaining clear about what the study has actually proven.
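The four steps above can be sketched in Python. This is a minimal illustration on synthetic numbers, not Professor OG's method: the convergence rule in the hypothetical `is_large_enough` function (with its assumed `step` and `tolerance` parameters), the use of Tukey's interquartile-range fences to define outliers, and the rough two-standard-error comparison are all assumptions chosen for simplicity.

```python
import math
import random
import statistics


def is_large_enough(values, step=100, tolerance=0.01):
    """Step 1 (hypothetical criterion): treat the sample as large enough when
    the last `step` values move the mean by less than `tolerance` (relative)."""
    if len(values) < 2 * step:
        return False
    before = statistics.fmean(values[:-step])
    after = statistics.fmean(values)
    return abs(after - before) <= tolerance * abs(before)


def remove_outliers(values, k=1.5):
    """Step 2: drop values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def check_hypothesis(values, expected_mean):
    """Step 3: compare the expected mean with the observed one. A rough
    two-standard-error band stands in for a formal statistical test."""
    observed = statistics.fmean(values)
    std_error = statistics.stdev(values) / math.sqrt(len(values))
    return observed, abs(observed - expected_mean) <= 2 * std_error


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data: 2,000 draws centred on 50, plus two extreme values.
    data = [rng.gauss(50, 10) for _ in range(2000)] + [500.0, -400.0]
    print("large enough:", is_large_enough(data))
    cleaned = remove_outliers(data)
    observed, consistent = check_hypothesis(cleaned, expected_mean=50)
    print(f"observed mean {observed:.2f}; consistent with 50: {consistent}")
```

Step 4, accepting, rejecting, narrowing or broadening the hypothesis, remains a judgement made by reading these numbers; the code only supplies them.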
In the second case, we have no hypothesis to begin with, but by studying a data set and comparing how certain variables evolve in relation to others, we try to identify a correlation that could be of interest. From there, we can formulate a hypothesis and go back to the first case, taking great care to verify the results on a new data set.”
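The second case, together with the advice to verify results on a new data set, can be sketched as follows. This is an assumption-laden illustration rather than the procedure used in any study mentioned here: it measures correlation with Pearson's coefficient, uses an arbitrary 0.3 threshold for "of interest", runs on synthetic data, and the names `pearson` and `explore_then_confirm` are hypothetical.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences
    (undefined if either sequence is constant)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def explore_then_confirm(exploration, confirmation, threshold=0.3):
    """Look for a correlation in one data set of (x, y) pairs, then re-check
    it on a second, independent one.

    Returns (r_exploration, r_confirmation, confirmed). `threshold` is an
    arbitrary cut-off for calling a correlation "of interest"."""
    r_explore = pearson(*zip(*exploration))
    if abs(r_explore) < threshold:
        # Nothing interesting found: no hypothesis to carry forward.
        return r_explore, None, False
    r_confirm = pearson(*zip(*confirmation))
    # The hypothesis survives only if the new data shows the same direction
    # and at least the same threshold strength.
    same_sign = (r_confirm > 0) == (r_explore > 0)
    return r_explore, r_confirm, same_sign and abs(r_confirm) >= threshold


if __name__ == "__main__":
    rng = random.Random(1)

    def synthetic_sample(n):
        # Hypothetical records (x, y) where y depends on x plus noise.
        xs = [rng.uniform(0, 10) for _ in range(n)]
        return [(x, 2 * x + rng.gauss(0, 5)) for x in xs]

    print(explore_then_confirm(synthetic_sample(500), synthetic_sample(500)))
```

Keeping the confirmation data untouched until the hypothesis has been formed is what prevents a chance correlation in the first data set from being reported as a finding.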
Xavier: What basic rules allow a scientist such as yourself to present coherent results?
“1/ you need a sufficiently large and representative sample,
2/ you must not try to prove the hypothesis at all costs,
3/ you must test on two different samples,
4/ you must be sure to compare like with like,
5/ you must clearly distinguish what was stated at the beginning (axioms) from what is proven by the study.
Basically, the most important thing to remember is that you must remain honest and clearly set out the approach to the study.”
Xavier: If you were Joe Bloggs, what points would you need to look for in order to tell whether a study is credible?
“1/ you would need a description of the sample (e.g. size, observation period…),
2/ you would need to be able to identify the hypothesis (the basic assumptions) easily from the start,
3/ ideally, you would need to be able to follow the author’s approach from beginning to end. If the process is clear and can be reproduced, the study is more likely to be credible.”
Xavier: Could you recommend a recently published study?
“This Uber study by Ren Lu, which attempts to predict the final destination of Uber’s customers based on the place where they stopped: http://blog.uber.com/passenger-destinations”
Professor OG is Over-Graph’s analytics guardian. His team, the #DataTeam_OG, produced this study on Community Managers’ activity on Facebook & Twitter in July 2014, which has been shared thousands of times on social networks. The team also works on predicting the engagement generated by the Facebook posts and Tweets published by OG users, according to the time and date of publication.