Background
Data collection from human subjects is always a challenge. Now that everyone and their dog is on the internet, collecting data online has become an option, and researchers who have investigated the approach have concluded that it is feasible. These data can also be analysed to understand human learning, cognition, psychology, and possibly other topics of interest.
Most of these researchers seem to be paying participants small sums in exchange for the data they produce. But an alternative exists when you consider what happens to all the data that users generate on interactive websites. Many companies mine it and sell it to advertisers, market researchers, and the like. Others use it to improve the user experience or to evaluate changes to the code base.
Proposed discussion
I’m interested in possible ways of providing useful services in exchange for collecting (mostly) anonymised data which can then be used for research. Twitter, for example, has created one of the largest corpora of speech-like text in history, which everyone from computer scientists to linguists to political scientists is analysing.
Other examples include Coursera and Khan Academy, both of which collect data on human learning in exchange for a free education. Other sites, such as Human Benchmark, don’t even really offer a service, and yet manage to collect impressive data sets.
So, what I propose is a discussion about
- what types of human data are interesting, but difficult to collect
- what kinds of services or formats could be used to entice people to produce that data
- what are some reasons why these types of services have succeeded/failed in the past
- what existing platforms/projects could be leveraged to facilitate data collection and service provision
- etc.
Qualifications
I have some limited experience collecting data through websites which have yielded interesting insights into vocabulary acquisition and rent pricing (yes, they are completely unrelated). I’m keen on pursuing this concept further to study the development of reading proficiency and speed in a second/foreign language.
THATCamp misc notes
1 Session notes
1.1 what types of human data are interesting, but difficult to collect
Everything, basically, but especially data that can be analysed in its own right.
1.2 what kinds of services or formats could be used to entice people to produce that data
whatever it is, it needs to have a competitive or social aspect
1.3 what are some reasons why these types of services have succeeded/failed in the past
successful services need to enrich someone’s life
difficult to find motivating factor
hard to keep people coming back
takes time to build an audience
ensure it’s usable
start from an existing need/demand
have to have a way to filter out the garbage
1.3.1 Ethical issues
privacy policy may help, people seem quite open
ethical issues – who benefits? Ideally, the interests of both users and data collectors should be balanced, with an emphasis on users
be upfront and not sneaky
release data to public
don’t be like Google+, which claims to benefit the user but actually doesn’t
1.3.2 Be a member of the community
1.3.3 Help users be social with each other
1.4 what existing platforms/projects could be leveraged to facilitate data collection and service provision
mix and mash: open government initiative in NZ
five stars of open access
others?
1.5 problems with volunteer crowd sourcing and gamification
1.5.1 Not a representative group?
2 History example
How could this idea be used to generate new data that would be worthy of analysis by historians?
get children to record their grandparents’ stories and have contests