Data collection from human subjects is always a challenge. Now that everyone and their dog is on the internet, it has become possible to collect data online. Some researchers have investigated the feasibility of doing so and have concluded that it is a viable approach. These data can also be analysed to understand human learning, cognition, psychology, and possibly other topics of interest.
Most of these researchers seem to pay participants small sums in exchange for the data they produce. But an alternative exists when you consider what happens to all the data that users generate on interactive websites. Many companies mine it and sell it to advertisers, market researchers, and the like. Others use it to improve the user experience or to evaluate changes to the code base.
I’m interested in possible ways of providing useful services in exchange for (mostly) anonymised data which can then be used for research. Twitter, for example, has created one of the largest corpora of speech-like text in history, which everyone from computer scientists to linguists to political scientists is analysing.
Other examples include Coursera and Khan Academy, both of which collect data on human learning in exchange for a free education. Other sites, such as Human Benchmark, don’t even really offer a service, and yet manage to collect impressive data sets.
So, what I propose is a discussion about:
- what types of human data are interesting, but difficult to collect
- what kinds of services or formats could be used to entice people to produce that data
- why these types of services have succeeded or failed in the past
- what existing platforms/projects could be leveraged to facilitate data collection and service provision
I have some limited experience collecting data through websites, which has yielded interesting insights into vocabulary acquisition and rent pricing (yes, they are completely unrelated). I’m keen on pursuing this concept further to study the development of reading proficiency and speed in a second/foreign language.