The big challenge: Your data quality

Let’s say you have already recruited enough participants for your research. Fantastic, isn’t it? As impressive as that may seem, it does not mean your data is reliable just yet.

In this and the next part of our guide, we discuss which types of distortion can occur in survey data collected online and how to clean your dataset accordingly.

We have already covered response bias (link). Three further biases remain to be discussed:

Coverage bias

Coverage bias describes a situation in which the participants of your survey do not reflect the population you want to draw conclusions about.

An example? Inviting acquaintances and family to help with data collection may seem like an easy way out, but it ultimately spoils your data quality. Chances are that most of your friends share your age, class and occupational profile, as well as your preferences and tendencies, and so does your family. You therefore cannot make statements about the entire population if your survey is based only or mostly on their responses.
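One rough way to spot coverage bias is to compare the make-up of your sample with known figures for the population. The sketch below is only an illustration: the age groups, the population shares and the function name `coverage_gaps` are all made-up assumptions, and in practice you would take the reference shares from official statistics for your target population.

```python
from collections import Counter


def coverage_gaps(sample_groups, population_shares):
    """Return (sample share - population share) for each population group."""
    total = len(sample_groups)
    if total == 0:
        raise ValueError("the sample is empty")
    counts = Counter(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Hypothetical reference shares (not real statistics).
population_shares = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

# Hypothetical friends-and-family sample: mostly people of the same age.
sample = ["18-29"] * 8 + ["30-49"] * 2

gaps = coverage_gaps(sample, population_shares)
for group, gap in gaps.items():
    # A positive gap means the group is over-represented in the sample.
    print(f"{group}: {gap:+.2f}")
```

Large positive or negative gaps suggest that the sample over- or under-represents a group, which is exactly what happens when you recruit mainly from your own circle.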

Non-response bias

Think of non-response bias as the opposite of coverage bias. It’s not about who answered your survey; it’s about who didn’t.

A poorly constructed survey or a wrongly chosen target group may lead people to abstain from taking part. The problem arises when those abstainers fit your target group perfectly. Chances are that their opinions would have changed the outcome of your survey, but because they didn’t take part, you never find out.

Self-selection bias

Self-selection bias is a distortion that occurs when a person who does not fit your target group answers your survey anyway. Most online surveys rely almost exclusively on participants selecting themselves. If you include control items in your survey, such as age or gender questions, you will be able to spot many of these respondents, though not always all of them. And if your self-selected respondents differ in opinion from your target population, your dataset will be distorted.
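Checking a control item can be as simple as the sketch below. It assumes, purely for illustration, that each response is a dictionary with an `id` and a self-reported `age`, and that the target group is an age range; the field names and the function `outside_target` are hypothetical.

```python
def outside_target(responses, min_age, max_age):
    """Return the IDs of respondents whose stated age is missing or out of range."""
    flagged = []
    for response in responses:
        age = response.get("age")
        if age is None or not (min_age <= age <= max_age):
            flagged.append(response["id"])
    return flagged


# Hypothetical responses with an age control item.
responses = [
    {"id": "r1", "age": 24},
    {"id": "r2", "age": 61},    # outside the target range
    {"id": "r3", "age": None},  # control item left unanswered
]

flagged = outside_target(responses, min_age=18, max_age=35)
print(flagged)
```

A check like this only catches respondents who answer the control item honestly, which is why it will not find every mismatch.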

In addition to the aforementioned biases, data collected via online surveys poses other difficulties, such as high dropout rates and multiple responses from the same participant.
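Both problems are easy to measure once your export contains the right fields. The following sketch assumes each record has a `participant_id` and a `completed` flag; these field names are assumptions for illustration, and your survey tool may identify participants differently, or not at all.

```python
from collections import Counter


def dropout_rate(responses):
    """Share of started surveys that were not completed."""
    started = len(responses)
    if started == 0:
        return 0.0
    completed = sum(1 for r in responses if r["completed"])
    return 1 - completed / started


def duplicate_participants(responses):
    """IDs of participants who appear more than once in the dataset."""
    counts = Counter(r["participant_id"] for r in responses)
    return sorted(pid for pid, n in counts.items() if n > 1)


# Hypothetical export from a survey tool.
responses = [
    {"participant_id": "p1", "completed": True},
    {"participant_id": "p2", "completed": False},
    {"participant_id": "p1", "completed": True},
    {"participant_id": "p3", "completed": True},
]

rate = dropout_rate(responses)
duplicates = duplicate_participants(responses)
print(f"dropout rate: {rate:.0%}, duplicates: {duplicates}")
```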


So what can be done to achieve representative results?

Statistical practices

Statistical practices can help you achieve a more reliable dataset. Methods such as oversampling, multiple-site entry, quota sampling and weighting procedures help to minimize survey biases. We have an article about them coming soon!
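To give a feel for what a weighting procedure does, here is a minimal sketch of one common variant, post-stratification weighting, in which each group is weighted by its population share divided by its sample share. The groups, shares and ratings are invented for illustration, and the function names are our own; real weighting schemes are usually more involved.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight per group = population share / sample share."""
    total = len(sample_groups)
    counts = Counter(sample_groups)
    return {
        group: population_shares[group] / (count / total)
        for group, count in counts.items()
    }


def weighted_mean(values, groups, weights):
    """Mean of the values, with each value weighted by its group's weight."""
    total_weight = sum(weights[g] for g in groups)
    return sum(v * weights[g] for v, g in zip(values, groups)) / total_weight


# Hypothetical sample: three "young" respondents, one "old" respondent.
groups = ["young", "young", "young", "old"]
ratings = [4, 4, 4, 2]

# Hypothetical population in which both groups are equally large.
population_shares = {"young": 0.5, "old": 0.5}

weights = poststratification_weights(groups, population_shares)
unweighted = sum(ratings) / len(ratings)
weighted = weighted_mean(ratings, groups, weights)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

In this toy example the under-represented group counts for more after weighting, so the weighted average moves towards its answers. Weighting can only correct for groups that appear in your sample at all; a group with no respondents cannot be weighted up.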

Survey networks / online panels

Yet not all survey creators have the time and skills to apply these statistical practices to their research. That is where survey networks and online panels come in handy.

Thanks to a broad, heterogeneous and international user base, coverage bias among the survey participants is greatly reduced. The need to answer other surveys on the platform counteracts non-response bias and minimizes self-selection bias. Moreover, the incentive system reduces dropout rates, and multiple responses are prevented.

One of the leading survey networks is PollPool, which stands out from other platforms by being free and user-friendly. PollPool operates through a survey-exchange economy: the more surveys you answer, the more responses you receive.

Click for the next part of the guide --> Cleaning your survey data
