
Today I played around with a Gapminder data set, “Body Mass Index (BMI), women, Kg/m2”. I chose it because I was curious about what trends I could find in it. This year I lost more than 10 pounds, but then I stopped tracking my weight, so I think it has been slowly creeping back. Anyway, you can understand my fascination with BMI.

The data are available as an xls file with several tabs that document the data. I saved only the first tab, which holds the actual data: years and the corresponding average BMIs for different countries. There are 199 countries in this data set. (According to answers to a Quora question, the number of countries in the world varies.) The BMI values run from 1980 to 2008. After saving the first tab as a csv file in an appropriate folder on my computer, I loaded it into R using the “read.csv” function with the default arguments.

bmi_female <- read.csv('indicator_bmi_female.csv')

The resulting data frame has 199 observations (countries) and 30 variables: the first is “Country” and the rest are the 29 years, whose column names are character strings such as “X1980”.
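The “X” in front of each year comes from read.csv itself: with its default arguments it turns column headers into syntactically valid R names, so a header that starts with a digit gets an X prepended. Here is a minimal sketch of that behavior using a made-up two-row CSV; the country names and numbers are placeholders, not Gapminder values.

```r
# Toy CSV text standing in for the real file; all values are placeholders.
csv_text <- "Country,1980,1981\nA,20.1,20.2\nB,25.3,25.4"

# Same call as in the post, but reading from a string instead of a file.
toy <- read.csv(text = csv_text)

names(toy)          # "Country" "X1980" "X1981"
dim(toy)            # 2 rows, 3 columns
sapply(toy, class)  # class of each column's values
```

Running the same three inspection calls on bmi_female is a quick way to confirm the 199 by 30 shape before reshaping.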


After playing with the table for a while, I saw the need to convert it to a long-format data frame with three columns: the first for the country, the second for the year, and the third for the BMI value. I used the tidyr package for this.

library(tidyr)
bmi_female2 <- gather(bmi_female, "year", "bmi", 2:30)

The resulting data frame now has 5771 observations (199 countries × 29 years) and three variables.
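To see what the reshaping does on something small enough to check by eye, here is a sketch on a toy table. The data frame toy_wide and its values are invented for illustration, and the extra year_num column is my own optional addition, not something the analysis above relies on.

```r
library(tidyr)

# Toy wide table: 2 countries x 3 years (placeholder values).
toy_wide <- data.frame(
  Country = c("A", "B"),
  X1980 = c(20.1, 25.3),
  X1981 = c(20.2, 25.4),
  X1982 = c(20.3, 25.5)
)

# Same call shape as above: key column name, value column name, columns to stack.
toy_long <- gather(toy_wide, "year", "bmi", 2:4)

nrow(toy_long)  # 6, i.e. 2 countries * 3 years

# Optional: strip the leading "X" so the year can be used as a number.
toy_long$year_num <- as.integer(sub("^X", "", toy_long$year))
```

The row count follows the same arithmetic as the real data: number of countries times number of year columns.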


With the data in this shape, I can create my boxplots with shorter lines of code. Boxplots are better than histograms for showing the trend in female BMI over the years. For the following boxplot, I limited the x-axis to nine of the 29 available years.

library(ggplot2)
# alpha is a fixed setting, so it goes in geom_boxplot() rather than aes()
ggplot(data = bmi_female2, aes(x = year, y = bmi, color = year, fill = year)) +
  geom_boxplot(alpha = 0.5) +
  scale_x_discrete(limits = c("X1980", "X1982", "X1984", "X1988", "X1992", "X1996", "X2000", "X2004", "X2008"))
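Restricting the axis limits leaves the other years in the data, and ggplot2 typically warns about the rows it drops. One alternative, sketched here on a toy table in base R, is to subset the years before plotting; the same subset can then be used to put numbers next to the boxes. The names toy_long, keep_years and toy_subset are hypothetical, and the values are placeholders.

```r
# Toy long table in the same shape as bmi_female2 (placeholder values).
toy_long <- data.frame(
  Country = rep(c("A", "B"), times = 3),
  year = rep(c("X1980", "X1981", "X1982"), each = 2),
  bmi = c(20.1, 25.3, 20.2, 25.4, 20.3, 25.5)
)

# Keep only the years of interest before plotting.
keep_years <- c("X1980", "X1982")
toy_subset <- toy_long[toy_long$year %in% keep_years, ]

# Median BMI per kept year, a numeric companion to the boxplots.
medians <- tapply(toy_subset$bmi, toy_subset$year, median)
medians
```

Passing the subset to ggplot() instead of the full data frame should then give the same boxes without needing the limits argument.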


So one can see that, in general, female BMIs have been rising over the years. The histograms reveal that the female BMI distribution is somewhat bimodal. I’m now curious to see what the trend is in the corresponding male BMI values.