Gathering data

One of the cool things about doing a PhD in science is that the types of things you need to learn and the set of steps you go through are fairly similar to those other science (and some non-science) PhD students are going through, pretty much regardless of the actual subject and specific topic of your PhD. This can make for a great sense of belonging to a bigger group of people with common experiences, as well as providing opportunities for sharing and learning from each other’s experiences (online as well as IRL).

However, there is still a fair amount of variability and one of the ways in which individual experiences can differ drastically appears to be at one of the earlier stages, namely data collection/acquisition. This step in the process of a PhD can take up a really substantial chunk of time and energy. On the other hand, for some (arguably lucky) students it can be quite minimal, if they have access to an already existing data-set. Some subjects and topics are more suited to running your own individual experiments and collecting your own data, whereas other research questions might be impractical or even impossible to address within the scope of a PhD without access to a pre-existing database. In effect, the choice of topic for one’s PhD might automatically dictate which route (data collection or collaboration/use of other samples) would be ideal, if not necessarily available.

In the field of psychiatric genetics, there is something of a disagreement about the relative importance of quality vs. quantity. There is so much variation in DNA and human behaviour that large sample sizes are clearly necessary to rule out chance findings, but equally, thorough measurements of each individual are essential to adequately capture the psychiatric problems of interest. The need for both quality and quantity makes it nearly impossible to personally collect sufficient data to address even basic questions during a PhD in this field, hence the need for large data-sets. Although I count myself extremely lucky to have access to some data which will be indispensable for the analyses I have in mind, I’m now at the point where I’m about to set out to collect some other data which would also be really useful. This fills me with a sense of dread for two reasons:

1) as I am planning on collecting sensitive clinical information from people, I will need an ethics panel to approve the project. Having spoken to a number of friends and other researchers, I can tell this has the potential to be an exceedingly complicated and time-consuming process (see Dorothy Bishop’s post on the subject for a particularly tough account)

2) also, my target sample is a set of individuals who have already been seen once, and trying to track them down and get them to take part in a second piece of research may prove quite challenging. On top of that, those who refuse or are impossible to track down are likely to be the ones with the most severe problems (i.e. the ones I would most like to include!)

Although my concerns are real, I can’t help but feel that I have been spoiled so far by not having to spend my PhD time collecting data, and I know others who have had a much tougher time of it than I am likely to. There are some definite positives to collecting your own data though. I know of these to some extent first-hand, as before starting my PhD I worked as a research assistant collecting some of the data that I am now using for it. Firstly, the time and effort taken to collect the data, however tough at the time, makes you really appreciate what you have and makes you unlikely to take it for granted. Secondly, collecting the measures yourself really improves your understanding of what they are and how reliable they are.

The other potential advantage, depending on your viewpoint, is that spending a large amount of time collecting data means you have less time to do other things. The impression I get is that if you have access to data and are not collecting it yourself, you are expected to do more complicated analyses and to take on more work of other kinds. The combination of having access to some data and needing to collect some is potentially quite a good one, as you get the advantages and experience of both. I may change my mind though once I get started with the collection…

