The fallacy of web polls
The last few days have been full of politics and polls as we get closer to the European elections. Prompted by this climate and by the endless web polls running on European websites, it is time we set some things straight about statistics on websites.
Statistics is not far from what we do on this blog, where we talk about computer programming technologies. It is a tool we use often, but unlike web technologies (HTML, CSS, JavaScript, etc.) statistics has very deep roots, and programmers and webmasters often misjudge it as something simple and easy. Producing objective results takes educated, thoughtful and calm minds.
Before you perform any sort of statistical research you need to understand the two basic types of research: the census and the survey. A census studies the entire population, whereas a survey studies only a part of it, called the "sample". A census is more likely to be solid because you have results from the entire population. In a survey you need to pick your sample wisely, because people will then try to project its results onto the entire population.
For example, if you are researching a group of people where gender plays some role, and 50% of them are men and 50% are women, your sample should also consist of 50% men and 50% women. If your sample is not balanced, then after you are done collecting results you need to weight the answers so that the results reflect the percentages of men and women in the population.
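The adjustment described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a single yes/no question and a made-up sample of 80 men and 20 women; the function name `weighted_share` and all the numbers are hypothetical and not taken from any real survey.

```python
from collections import Counter

def weighted_share(responses, population_shares, answer):
    """Share of `answer` after weighting each group to its population share.

    responses: list of (group, answer) pairs from the sample
    population_shares: dict mapping group -> share of the population (sums to 1)
    """
    group_sizes = Counter(group for group, _ in responses)
    total = 0.0
    for group, share in population_shares.items():
        n = group_sizes[group]
        if n == 0:
            # A group with no respondents cannot be weighted at all.
            raise ValueError(f"no respondents in group {group!r}")
        hits = sum(1 for g, a in responses if g == group and a == answer)
        # Share of `answer` inside the group, scaled by the group's true size.
        total += share * hits / n
    return total

# Made-up, unbalanced sample: 80 men and 20 women.
responses = (
    [("man", "yes")] * 60 + [("man", "no")] * 20
    + [("woman", "yes")] * 5 + [("woman", "no")] * 15
)
# Assumed composition of the population: 50% men, 50% women.
population_shares = {"man": 0.5, "woman": 0.5}

raw = sum(1 for _, a in responses if a == "yes") / len(responses)
weighted = weighted_share(responses, population_shares, "yes")
print(f"raw: {raw:.3f}, weighted: {weighted:.3f}")
```

In this invented sample the raw count says 65% "yes", but once each gender counts for half of the result the estimate drops to 50%. Note that the correction is only possible because the composition of both the sample and the population is known.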
Another important factor is "bias". If you want good results from a sample, you should do everything possible to get people to answer your questions freely. That is why one basic rule in surveys is not to show anyone the results before they give their own answers. If you predispose them, their answers will have little or no value. If their answers have no value, your entire research will be misguided.
The questions and their wording are also a crucial factor. If you want good results you should be very careful about the way you formulate your questions. In most cases an answer is shaped by the predisposition that the previous questions have created.
Even when you do everything by the book, your research is still dependent on the sincerity of the people in your survey.
Now, let's get to the web polls we usually see on websites. Nothing of what was stated above is taken into account. Web polls almost never collect demographic data (e.g. whether the user is a man or a woman), so they have no idea of the composition of their sample. They usually don't even have statistics for the entire population, so their results cannot be projected onto it. Web polls often allow users to see the results before they vote, which introduces the "bias" described above. Questions and answers are usually written by people with no training in communication, so their wording is probably in bad shape too. One more strike against web polls is that they let the same people answer more than once, even if that takes voting from a different IP or with a different browser.
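To see how much repeat voting alone can distort a result, here is a small Python sketch. It assumes a hypothetical vote log in which every vote carries a reliable `voter_id`; a real web poll usually has no such thing, only an IP address or a cookie that one person can change at will. The names and votes are invented.

```python
from collections import Counter

# Hypothetical vote log: (voter_id, choice). "mallory" votes four times.
votes = [
    ("alice", "A"), ("bob", "B"), ("carol", "B"), ("dave", "A"),
    ("erin", "B"), ("frank", "B"),
    ("mallory", "A"), ("mallory", "A"), ("mallory", "A"), ("mallory", "A"),
]

def tally(votes):
    """Count every submitted vote, the way a naive web poll does."""
    return Counter(choice for _, choice in votes)

def tally_one_per_voter(votes):
    """Count only the first vote of each voter."""
    first_choice = {}
    for voter, choice in votes:
        first_choice.setdefault(voter, choice)  # ignore later votes
    return Counter(first_choice.values())

raw = tally(votes)
deduped = tally_one_per_voter(votes)
print("raw:", dict(raw))
print("one vote per person:", dict(deduped))
```

In this made-up log a single persistent voter is enough to turn a 3 to 4 loss for option A into a 6 to 4 win. The second tally works only because the log identifies each person, which is exactly the information a typical web poll lacks.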
Statistical research should be carried out by people who know how to do it. It should also be carried out by impartial people, otherwise the results can easily be skewed to mislead. The next time you see a web poll, you should probably ignore it.