Human brain activity is constantly changing and diverse, and a lack of sufficient metadata can make results difficult to interpret.
Anyone who has recorded and analyzed human EEG data for anything longer than one minute will know that our brain activity is an ever-changing landscape, moving in and out of different states and patterns. ‘Statistically significant’ differences in the same features of the EEG turn up between a large number of conditions and groupings. ERP amplitude will change in response to a myriad of cognitive tasks, and activity in the alpha or beta bands of the power spectrum will change in response to various tasks of attention or memory, or even consumption of common stimulants like tea, coffee and alcohol. Differences in these same features can correlate with measured personality characteristics, mood states or age. And there is considerable variation between people (see the Myth of the Average Brain). Conversely, baseline differences arising from mood, stimulant consumption, personality and other factors can also quite easily mask the effects of tasks, particularly in small datasets. This poses an enormous problem for how to interpret ‘statistically significant’ results on any single dimension, or for that matter, the lack of one.
Is a larger effect in alpha activity during a task a consequence of inadvertently selecting a certain personality type as subjects for the experiment? Did a reported increase in alpha activity on an attention task fail to show up consistently across studies because some of the subjects had had too much caffeine, or too little? Or perhaps the differences between age groups were on account of differences in lifestyle habits rather than aging, and the younger bunch had come into the experiment with far less sleep or had simply played too much Nintendo?
It is difficult to control for everything – there are simply too many variables – but as a field we can mitigate confounds in several ways and arrive at more meaningful and reliable results.
- Larger datasets
The vast majority of published EEG papers involve between 15 and 40 subjects. Often this arises from wanting to get a paper out as quickly as possible, with recordings concluded once a statistically significant result is reached. However, with smaller datasets the sample can be either too homogeneous (i.e. everyone you recruited was a friend who is much like you, and your results may not be generalizable) or too heterogeneous (you recruited from the general population and got a mix of people so different from one another on every dimension that these differences mask what you are looking for). (see Musing about Age with EEG as an example of getting to larger scale datasets)
Further, sample sizes at this scale are often not sufficient to tell you what the distribution across the population truly looks like. You may therefore inadvertently make entirely wrong assumptions about the distribution in your choice of statistical test. A small homogeneous sample may give you the sense that the distribution is normal (with just a couple of outliers that you might simply remove), whereas sampling at a larger scale may reveal a skewed or even long-tailed distribution (see here for an example). If this were the case, using a t-test, as is often done, would be entirely inappropriate and misleading.
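To make this concrete, here is a minimal simulation sketch in Python, with a lognormal distribution standing in for a hypothetical long-tailed EEG feature, showing how a small sample can fail to reveal skew that a larger sample makes obvious:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical long-tailed EEG feature (e.g., absolute alpha power),
# simulated here as a lognormal distribution.
for n in (20, 2000):
    sample = rng.lognormal(mean=0.0, sigma=0.6, size=n)
    stat, p = stats.shapiro(sample)  # Shapiro-Wilk test of normality
    verdict = "no evidence against normality" if p > 0.05 else "clearly non-normal"
    print(f"n = {n:4d}: skewness = {stats.skew(sample):5.2f}, "
          f"Shapiro-Wilk p = {p:.4g} ({verdict})")
```

At n = 2000 the skew is unmistakable, while at n = 20 a normality test may well lack the power to flag it, which is exactly how the wrong test gets chosen.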
With larger datasets (in the hundreds at least), uncertainty about the distribution, and therefore about the appropriate statistical test, can be overcome by resampling methods such as bootstrapping and permutation testing, which involve resampling or shuffling the data and reporting the probability of obtaining a particular result by chance.
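As an illustration, here is a minimal sketch of a two-sample permutation test (the data and variable names are hypothetical); because the null distribution is built by shuffling the data itself, no assumption is made about its underlying shape:

```python
import numpy as np

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Builds the null distribution by shuffling group labels, so no
    assumption is made about the shape of the underlying distribution.
    """
    rng = np.random.default_rng(seed)
    observed = np.mean(group_a) - np.mean(group_b)
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)

    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = pooled[:n_a].mean() - pooled[n_a:].mean()
        if abs(diff) >= abs(observed):
            count += 1

    # Probability of a difference at least this large arising by chance
    return (count + 1) / (n_permutations + 1)

# Hypothetical alpha-band power values (µV²) for two conditions
alpha_task = np.array([4.1, 5.3, 3.8, 6.2, 4.9, 5.5, 4.4])
alpha_rest = np.array([3.6, 4.0, 3.2, 4.8, 3.9, 4.1, 3.5])
print(f"p = {permutation_test(alpha_task, alpha_rest):.4f}")
```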
- Repeated sampling
While larger datasets give you more to work with statistically, they still do not resolve many of the real confounds, such as whether the effect is persistently significant and not dependent on the state of mind of the subjects at the time of the experiment. One way to overcome this is to repeat the experiment with the same set of subjects on different days to look at intra-person variability. Did the effect occur with the same direction and magnitude every time, or did it only show up sometimes? If it only showed up sometimes, it’s a sure sign that things other than what you are measuring (or think you are measuring) are at play.
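As a rough sketch of what this check could look like (the data here are hypothetical), one could compute the effect per subject per session and compare within-subject to between-subject spread:

```python
import numpy as np

# Hypothetical per-session effect (e.g., task-minus-rest alpha power, µV²)
# for 5 subjects, each recorded in 4 separate sessions.
# Rows = subjects, columns = sessions.
effects = np.array([
    [0.9,  1.1,  0.8, 1.0],
    [0.4, -0.1,  0.5, 0.2],
    [1.2,  1.3,  1.1, 1.4],
    [0.1,  0.6, -0.2, 0.3],
    [0.8,  0.9,  0.7, 1.0],
])

# Within-subject consistency: does the effect have the same sign
# in every session for a given subject?
sign_consistent = np.all(np.sign(effects) == np.sign(effects[:, [0]]), axis=1)
print("Subjects with a sign-consistent effect:", sign_consistent)

# Compare within-subject spread to between-subject spread. If the
# within-subject variance rivals the effect itself, state-dependent
# factors (mood, caffeine, sleep) are likely at play.
within_sd = effects.std(axis=1, ddof=1)
between_sd = effects.mean(axis=1).std(ddof=1)
print("Within-subject SD per subject:", np.round(within_sd, 2))
print("Between-subject SD of mean effects:", round(between_sd, 2))
```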
- Collecting more control data
Finally, the more control data we can collect, the more we can tease apart what is truly responsible for any ‘statistically significant’ effect, or what is masking one. It is good practice to collect basic demographics as well as information about the subject’s state at the time of recording: their mood and alertness, what they have consumed prior to the session, and any relevant clinical history or common current symptoms, such as headaches or stomach cramps, that can interfere with experimental performance. Many public datasets we have tried to work with – for example from PLOS articles or public repositories of EEG data – have very limited metadata, making interpretation extremely difficult. Often even basic factors such as age and gender are missing, and there is certainly no information about the person’s clinical history or mood disposition (maybe they are depressed, or just have a headache that day) or stimulant consumption (did they come to the experiment right after their morning coffee?).
Towards common metadata structures
Common metadata structures allow easier comparison across studies and meta-analyses, as well as straightforward aggregation of datasets, delivering far greater and more reliable insights. Here are forms we have created to collect basic subject information as well as information about the subject’s state at the time of recording; they can accompany any experimental design and provide a rich set of control data.
Subject Form: This form covers demographics and basic health information and is designed for relevance to a global population.
Session Form: This form covers the subject’s physical and mental state at the time of the recording along with information about consumption of stimulants or medication.
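To make this concrete, here is a hypothetical sketch of the kind of record these two forms could produce for a single recording; the field names and rating scales are our illustration, not the final Brainbase schema:

```python
# Hypothetical metadata records illustrating the kinds of fields the
# Subject and Session forms capture; names and scales are illustrative
# and not the final Brainbase schema.
subject_metadata = {
    "subject_id": "S001",
    "age": 27,
    "gender": "female",
    "handedness": "right",
    "clinical_history": ["migraine"],   # relevant diagnoses, if any
}

session_metadata = {
    "subject_id": "S001",
    "session_id": "S001_run2",
    "date": "2017-06-14",
    "hours_of_sleep": 5.5,
    "alertness": 3,                     # e.g., 1 (drowsy) to 5 (fully alert)
    "mood": 4,                          # e.g., 1 (low) to 5 (high)
    "stimulants": {"coffee_cups": 2, "alcohol_units": 0},
    "medications": [],
    "current_symptoms": ["mild headache"],
}
```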
These will form part of a standard set of forms available through Brainbase when we launch (stay tuned for more info; some labs are alpha testing it and it will be openly available in a couple of months!). Our hope is to create a consistent metadata structure that enables more extensive controls, easy aggregation of datasets, and the ability to find new insights that may not have been part of the specific experimental design. We invite you to use them and give us your feedback. Think we’ve left out anything critical? Leave a comment to let us know.