This month we speak to Dr Brieuc Lehmann to find out more about Data Science for Health Equity (DSxHE), which aims to advance health equity through data science.
What is your role and what does it involve?
In my day job, I’m an Assistant Professor at the Department of Statistical Science here at UCL. I split my time between research and teaching undergrads and masters students across statistics and data science. I currently lead a research group of three fantastic PhD students and a postdoc, working on a variety of biomedical applications with a broad focus on fairness and health equity.
Outside of UCL, I’m the co-founder of Data Science for Health Equity (DSxHE), a cross-sector community of over 2000 academics, health and care professionals, policymakers and enthusiasts working to advance health equity through data science. DSxHE runs a range of activities, from workshops to webinars, on a variety of themes from participatory research to musculoskeletal health. As a community, DSxHE focuses on building connections between the data science and health equity worlds, raising awareness of how data science can be used to understand and reduce health inequalities, and improving practice at both the individual and institutional levels in how data-driven innovations are developed and deployed across the healthcare system. You can find out more here.
How are you improving the health of the public?
My group’s research centres around developing and applying statistical methodology to understand and mitigate biases due to the lack of diversity in today’s biomedical datasets. For example, genomic datasets are hugely skewed towards people of European ancestry. This means that models trained on these datasets tend to perform worse for people of non-European ancestry, which can have serious implications when these models are applied in clinical practice. Our goal is to both understand the limitations of these models and figure out how to best mitigate any differences in performance.
This type of research helps inform clinical decision making at both the individual and population levels. With more and more complex models being used in biomedical research or deployed in clinical practice, it’s crucial to know when we can rely on the models and when we should be cautious about their outputs.
What do you find most interesting or enjoyable about your work?
The variety: one of the best bits about being a statistician is the range of problems you get to work on. Since I began my research career 10 years ago, I’ve had the opportunity to work on methodology for neuroimaging data, genomic data, epidemiological data, health records, clinical trials… the list goes on. I love delving into each of these applications, trying to understand the common problems facing them and what makes each one distinct. That extra layer of abstraction that statistics offers is what I find most fascinating and exciting: for example, what is it about a method developed specifically for genomic data analysis that could be adapted to other applications?
Being involved with DSxHE means I get exposed to an even broader range of expertise than I would otherwise encounter through my day-to-day research, particularly from beyond academia. I’ve learnt so much through the different themes within DSxHE (currently Participatory Research, Mental Health, Social Justice, Musculoskeletal Health, Statistical Methods). Our fantastic organising team includes clinicians, communications experts, and academics from many different disciplines. Working with such a diverse and passionate set of volunteers from across the world is a huge added bonus that I find greatly rewarding and motivating.
How have cross-disciplinary collaborations shaped your research?
Interdisciplinary work has been fundamental for me. This started with my PhD, during which I was supervised by a statistician and a neuroscientist. Coming from a mathematics background, picking up some of the biological aspects was a real challenge and a steep learning curve. Fortunately, my supervisors were heroically patient with me and happy to answer my very, very basic questions.
During the pandemic, I was seconded to the ‘Alan Turing Institute - Royal Statistical Society Health Data Laboratory’. This was set up to support the UK Health Security Agency (UKHSA). We worked closely with colleagues from UKHSA to provide statistical modelling support to understand how SARS-CoV-2 was developing through the population. One of these projects aimed to estimate how many people had COVID-19 in different regions of the UK in a given week. We used this statistical methodology in some follow-up work to try and tease apart the roles of ethnicity and socioeconomic status and their association with COVID-19. These projects were hugely collaborative, relying on expertise from different aspects of statistics, epidemiology, public health, behavioural science and beyond.
These collaborations have really highlighted to me how different fields often face similar challenges, at least from a statistical perspective. In fact, this was one of the main motivations for setting up DSxHE, as a way to share learnings between disciplines and accelerate the uptake of the best data science practices across different parts of the healthcare ecosystem.
What advice would you offer to others interested in developing cross-disciplinary research?
Develop a common language! One of the hardest things I encounter in cross-disciplinary work is confusion in terminology. The same concepts are given different names in different fields; even worse is when the same names are given to totally different concepts. This kind of translation requires a lot of effort, for example, regular meetings with collaborators or attending domain-specific conferences, but it’s worth putting in the work early to avoid misunderstandings further down the line.
At DSxHE, we have a couple of projects in the pipeline to create tutorial-style materials on topics like algorithmic fairness and bias mitigation, aimed at healthcare professionals and non-academic data scientists working in healthcare. Our hope is that these resources will help speed up this translation process… stay tuned!
What's next on the research horizon for you?
My main aim is to make biomedical research more inclusive and effective for everyone. Right now, a lot of research studies don’t represent the full diversity of the population—they tend to leave out groups based on ethnicity, socioeconomic status, and other important factors. This can lead to findings that don’t work as well for everyone, especially those who’ve been historically underrepresented in research.
My goal is to fix that by developing new statistical tools with two main purposes. First, we need tools that can evaluate how diverse an existing dataset is and highlight which parts of the population are insufficiently represented. This means evaluating the accuracy of models trained on current data across the population and identifying where these models are performing worst. Second, we need tools to pinpoint how much more data is needed to bridge any existing gaps. This is similar to sample size calculations for clinical trials, i.e. how many people do you need to recruit into your study to ensure adequate statistical performance across the population. The overarching aim is to improve the way studies are designed to represent everyone, especially those historically underserved by biomedical research.
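To make the two kinds of tool described above a little more concrete, here is a minimal Python sketch. It is an illustration only, not the method used by the research group or by DSxHE. The function names (accuracy_by_group, worst_group, samples_needed), the group labels "A" and "B", and the toy data are all hypothetical. The sample size function uses the standard textbook normal-approximation formula for estimating a proportion to within a chosen margin of error, which is assumed here as a simple stand-in for the more involved calculations the answer alludes to.

```python
import math
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: (accuracy, n)} for each subgroup label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: (correct[g] / total[g], total[g]) for g in total}


def worst_group(per_group):
    """Return the label of the subgroup with the lowest accuracy."""
    return min(per_group, key=lambda g: per_group[g][0])


def samples_needed(expected_accuracy, margin, z=1.96):
    """Textbook sample size for estimating a proportion to within
    +/- margin at roughly 95% confidence (normal approximation).
    This is an assumed stand-in, not the author's method."""
    p = expected_accuracy
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Made-up toy data: group "B" is both smaller and less well predicted.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B"]

per_group = accuracy_by_group(y_true, y_pred, groups)
for group, (acc, n) in sorted(per_group.items()):
    print(f"group {group}: accuracy {acc:.2f} from {n} samples")
print("worst-performing group:", worst_group(per_group))
print("samples per group for a 5-point margin:", samples_needed(0.5, 0.05))
```

Reporting the number of samples alongside each accuracy matters: an estimate based on three people is far less reliable than one based on hundreds, which is exactly the gap the second kind of tool is meant to quantify.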
If you could make one change in the world today, what would it be?
With my statistician’s hat on, the main thing I would love to see is better statistical literacy across the board. A recent Royal Statistical Society survey found that about half of MPs were unable to answer a simple probability question. Now, MPs might not be a representative sample, and one might hope that the general population would do better… But given the increasing prominence of data throughout society, a better understanding of statistics is going to become more and more important.
This is particularly important in healthcare settings, where the stakes are so high. Being able to understand statistical results—like the effectiveness of treatments, the reliability of clinical trial outcomes, or the risk factors for diseases—is essential for providing the best care. It can also help patients understand the information they receive from medical tests and be best informed to make decisions regarding their own care. In short, better statistical literacy would help healthcare providers, researchers, policymakers, and patients to make more informed, transparent decisions, ultimately improving health outcomes.