National Child Measurement Programme: using statistical and machine learning approaches

Title: National Child Measurement Programme: using statistical and machine learning approaches to inform public health practice

Supervisors: Oliver Mytton, Mario Cortina Borja

Project Description:
Background
The National Child Measurement Programme (NCMP) measures the height and weight of every child in Reception and Year 6 in England each year. It is a very large dataset, with over one million children measured each year since 2006/7.

It is primarily used for surveillance. However, the dataset can also be used to understand the effects of public health interventions, for example by correlating the implementation of interventions (like the Soft Drinks Industry Levy or Sure Start) with changes in childhood obesity trends, or by identifying outlying local authorities. To date, however, it has been underused as a research tool.

The aim of this PhD project is to explore the use of statistical and machine learning methods applied to this dataset to inform public health practice.

The PhD would be well suited to somebody with a strong interest in data science and public health.

The PhD will be in partnership with the Office for Health Improvement and Disparities, which manages the dataset.

Aims/Objectives
1.   To develop skills in the appropriate statistical analyses of a very large dataset and communication of findings to inform public health action.
2.   To select and apply the most appropriate statistical methods for identifying local authorities that are outliers with respect to trends or absolute levels of childhood obesity, after adjusting for socio-demographic factors (ethnicity, deprivation).
3.   To use modern statistical and machine learning methods leveraging variation in implementation (‘natural experiments’) to evaluate the impact of public health interventions (e.g. Sure Start, Free School Meals) on children’s health (i.e. weight status, height).

Methods
The PhD will consist of two discrete pieces of work using the NCMP dataset.
The first part will build on statistical and machine learning methods (e.g. latent class models, modern regression analysis, advanced visualisation techniques) to identify outliers and outlying groups of local authorities with respect to childhood obesity. This will characterise the differences in childhood obesity between areas, after accounting for socio-demographic factors.
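
As a purely illustrative sketch of what "outlying after accounting for socio-demographic factors" can mean, the Python snippet below regresses area-level obesity prevalence on a single deprivation score and flags areas whose residual is unusually large. The area names, the figures, the function name `flag_outlying_areas` and the threshold of two standard deviations are all hypothetical, and the project itself is expected to use richer methods (e.g. latent class models) on the real NCMP data.

```python
from statistics import mean, pstdev


def flag_outlying_areas(prevalence, deprivation, threshold=2.0):
    """Return areas whose prevalence is unusual given their deprivation score.

    Fits a one-covariate least-squares line and flags areas whose residual
    exceeds `threshold` standard deviations of all residuals.
    """
    names = sorted(prevalence)
    x = [deprivation[name] for name in names]
    y = [prevalence[name] for name in names]
    x_bar, y_bar = mean(x), mean(y)

    # Ordinary least squares slope and intercept for y = intercept + slope * x.
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    intercept = y_bar - slope * x_bar

    # Residual = observed prevalence minus prevalence expected from deprivation.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    scale = pstdev(residuals)
    return [name for name, r in zip(names, residuals) if abs(r) > threshold * scale]


# Hypothetical data: ten areas with deprivation scores 10, 15, ..., 55.
area_names = ["LA_" + letter for letter in "ABCDEFGHIJ"]
deprivation = {name: 10 + 5 * i for i, name in enumerate(area_names)}
# Prevalence (%) follows 15 + 0.2 * deprivation, except LA_E, which is 6 points higher.
prevalence = {name: 15 + 0.2 * deprivation[name] for name in area_names}
prevalence["LA_E"] += 6.0

outlying = flag_outlying_areas(prevalence, deprivation)
print(outlying)
```

A single-covariate linear fit is used here only to keep the example short. In practice the adjustment would involve several factors, and the outlier itself influences the fitted line, which is one reason more robust methods are of interest.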

The second part will model the association between children’s health and a public health intervention that has been delivered differently over time or across areas (i.e. a natural experiment). The intervention should be one that might plausibly affect the health of children aged 4/5 years and/or 10/11 years, as assessed by weight status and/or height. There is a range of interventions that could be considered, spanning individually focused delivery (e.g. health visiting services), school-level interventions (e.g. free school meals), population interventions specifically focused on health (e.g. restrictions on takeaway outlets near schools, the Soft Drinks Industry Levy), and non-health policies likely to affect children’s health (e.g. austerity). The student will use a range of data analytic methods to estimate the effects of the interventions on trends in childhood obesity, and will develop predictive models to construct future scenarios.
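
One common way to use variation in delivery over time or area is a difference-in-differences comparison, sketched below in Python as an illustration only. The records, the function name `difference_in_differences` and all figures are hypothetical. The estimate is interpretable as an intervention effect only under the assumption that exposed and comparison areas would otherwise have followed parallel trends. The methods actually used in the project remain to be chosen.

```python
from statistics import mean


def difference_in_differences(records):
    """Estimate an effect from (exposed, post_period, prevalence) records.

    Returns the change in mean prevalence in exposed areas minus the change
    in comparison areas over the same period.
    """
    def cell_mean(exposed, post_period):
        return mean(p for e, t, p in records if e == exposed and t == post_period)

    change_exposed = cell_mean(True, True) - cell_mean(True, False)
    change_comparison = cell_mean(False, True) - cell_mean(False, False)
    return change_exposed - change_comparison


# Hypothetical area-level obesity prevalence (%) before and after an intervention.
records = [
    (True, False, 20.0), (True, False, 22.0),    # exposed areas, before
    (True, True, 20.5), (True, True, 21.5),      # exposed areas, after
    (False, False, 19.0), (False, False, 21.0),  # comparison areas, before
    (False, True, 20.0), (False, True, 22.0),    # comparison areas, after
]

effect = difference_in_differences(records)
print(f"Estimated effect: {effect:+.1f} percentage points")
```

In this made-up example prevalence is flat in exposed areas while it rises by one percentage point in comparison areas, so the estimated effect is a one-point reduction relative to the comparison trend.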

Timeline
0-3 months: orientation, literature review and background reading
3-6 months: training in statistical programming/learning to use R
7-19 months: Project 1: Outliers: methods for detection and treatment
20-30 months: Project 2: Analysis of interventions’ effects
31-36 months: Synthesis and writing-up the PhD thesis

References
1.   Viner & Hargreaves. Trajectories of change in childhood obesity prevalence across local authorities: a latent class analysis. Journal of Public Health, 2018
2.   Mason et al. Impact of cuts to local government spending on Sure Start children’s centres on childhood obesity. JECH, 2021
3.   Rogers et al. Associations between trajectories of obesity prevalence in English primary school children and the soft drinks industry levy. PLOS Medicine, 2023
4.   Barnett & Lewis. Outliers in Statistical Data, 1994
5.   Efron & Hastie. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge University Press, 2016

Contact Information:
Oliver Mytton