Category: methods

Selection bias and the perils of data science

According to the Guardian Data Blog, Obama is heading for electoral success, judging by an analysis of Twitter data. It’s all very nice to see mapped out, and the use of geocoding is cool (though possibly flawed), but underlying the approach is a massive potential for selection bias. The problem is quite simply this: if […]

Correlation vs Causation (part 1)

I’m a massive fan of the webcomic xkcd. Don’t be surprised if you find me using Randall Munroe’s creative output on a regular basis to help get my point across. It’s easy to find things that correlate in everyday life. Cold spells correlate with higher fuel bills. The start of the festival […]

On the limitations of binary measures

Scientists like to measure things. And they like to do it accurately. Striking a balance between real world variation and manageable data sets can be a challenge. Epidemiologists measure things that fall into three broad categories. Firstly, the presence of a disease or health state (the ‘outcome’). Secondly, the presence of factors thought to contribute […]