About Me
Welcome to my personal page. I am a PhD student in Statistics at the University of Cambridge, where I explore innovative functional and spatial statistical approaches with an eye toward addressing public policy questions. My work is driven by a passion for making sense of complex data to create impactful insights.
Academic Background
I completed my undergraduate studies in Mathematics at the University of Cambridge, where I developed a keen interest in Statistics, which led me to specialise in the field for my postgraduate studies at the same institution. This passion continued to grow as I undertook a research internship on Bayesian inversion in positron emission tomography (PET) under the supervision of Sergio Bacallado, and a research dissertation in which I developed a novel methodology for excess death estimation, applied to the Covid-19 pandemic, utilising tools from robust statistics. This paved the way for my doctoral studies here at Cambridge.
Research Interests
My research currently focuses on the intersection of spatial statistics and functional data analysis. The former grapples with the problem of observations which are not independent but are instead correlated according to a spatial model, typically in two or three dimensions. This is a varied field which arose initially on an ad hoc basis, independently across many countries, although more recently the topic has been consolidated, in no small part thanks to the work of Cressie among many others. Meanwhile, functional data analysis, a relatively recent area of statistical research, focuses on the type of data object that is observed: instead of a numeric vector, each observation is a function, living in an infinite-dimensional Hilbert space. This causes many theoretical and computational issues, perhaps most strikingly the inability to invert the covariance operator. I have developed novel methodologies in this intersection in two application areas: Covid-19 incidence trajectories and house prices. The following paragraphs discuss the central concepts of my research in these applications.
Consider first Covid-19 incidence trajectories, in particular the positive test results for the virus between March and June 2020 for each of 380 local authorities in the United Kingdom. These form the functional data, which are correlated with each other by geographical proximity. The trajectories vary in amplitude, but perhaps of more interest is the relative earliness or lateness, and sharpness or flatness, of the waves in different authorities. Indeed, some visualisation quickly reveals that local authorities in London have early and sharp waves, whereas those in the East Midlands have much later and flatter ones. This phase variation can be understood through registration of the curves, which encodes the phase variation of each curve via an increasing bijection on [0,1], translating calendar time to progression through a typical wave. I have developed spatially aware methodologies (which take into account correlation between the observations to improve efficiency) for inference on these encodings (known as registration or warping functions). Moreover, it is clear in analysis that Euclidean distance between the local authorities is not necessarily the most appropriate metric for understanding correlation between the curves; instead, a methodology has been developed that uses driving distances and overcomes the theoretical issues that arise when the distance is not Euclidean. Finally, is London indeed different from areas outside London in terms of the shape or phase variation of the incidence trajectories? To answer this question, I developed a nonparametric, functional wombling methodology to analyse the boundary, which outperforms an established Bayesian approach in terms of the assumptions required, accuracy and computational cost. The answer is that there is indeed a notable boundary around London, unlike other regional boundaries in the UK.
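For readers who prefer a concrete picture of a warping function, here is a minimal sketch in Python. It is a toy illustration only, not the methodology described above: the bump-shaped `template`, the power-law family `power_warp` and the exponents 0.5 and 2.0 are all assumptions chosen for simplicity. Each warp is an increasing bijection of [0,1], and composing the same template wave with different warps moves its peak earlier or later in calendar time.

```python
import numpy as np


def template(s):
    """Hypothetical 'typical wave' on the progression scale s in [0, 1].

    A Gaussian-shaped bump peaking at s = 0.5 (an assumption for illustration).
    """
    return np.exp(-0.5 * ((s - 0.5) / 0.12) ** 2)


def power_warp(t, a):
    """Illustrative warping function h(t) = t**a.

    For any a > 0 this is an increasing bijection of [0, 1] onto itself.
    """
    return np.asarray(t, dtype=float) ** a


# Calendar time rescaled to [0, 1].
t = np.linspace(0.0, 1.0, 1001)

# Two hypothetical areas sharing one template wave but with different phase.
h_early = power_warp(t, 0.5)  # reaches mid-wave early in calendar time
h_late = power_warp(t, 2.0)   # reaches mid-wave late in calendar time

# Observed curve = template evaluated at the warped time.
early_curve = template(h_early)
late_curve = template(h_late)

# Calendar time at which each curve peaks.
peak_early = t[np.argmax(early_curve)]
peak_late = t[np.argmax(late_curve)]

print(f"early wave peaks at t = {peak_early:.3f}")
print(f"late wave peaks at t = {peak_late:.3f}")
```

In this toy setting the warps are known and the curves are built from them; registration is the reverse problem of recovering the warping functions from the observed curves.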
Now consider house sale prices since 1995 in the United Kingdom. These form a marked point process: the sales are points indexed by both space (via address) and time (date of sale), each marked by the price of the house. How have house prices changed over time at different locations? I have considered this question in two ways. First, these events can be thought of as draws from a latent random distribution at each point in space and time. By exploiting correlation across space and time, this distribution can be estimated at each point, for which a robust methodology has been developed. Then, how that distribution changes at each spatial location can be analysed, for example by functional regression techniques with splines of time as the covariate, while noting that distributions do not belong to a vector space because of the constraints they must satisfy (a density, for example, must be non-negative and integrate to one). This allows us to identify features of distributional changes, for example identifying gentrification not as a simplistic increase in house prices, but more specifically as a removal of cheaper housing stock from the market; in general, a more sophisticated understanding of house price changes can be inferred. Second, the house prices can be seen as noisy measurements of a mean house price process over time, and functions of house prices over time at each location can be inferred, again taking into account spatial correlation structures. Then, what has influenced the trajectory of house prices in different spatial locations can be considered. For example, beyond the standard upward trend (save during the Great Financial Crisis), how have local factors such as school Ofsted reports impacted house prices? This can also be done via functional regression, although it is made interesting by the fact that the covariate is a factor (Ofsted reports provide categorical evaluations of schools' performances), and that there is a temporal direction (an Ofsted report cannot affect house prices in the past).
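To make the gentrification point concrete, the following minimal Python sketch compares two toy scenarios on synthetic data. Everything in it is an assumption made for illustration: the log-normal "prices", the 15% uniform rise and the removal of the cheapest 20% of stock are invented, and the simple empirical quantiles stand in for, and are not, the robust spatio-temporal methodology described above. The aim is only to show that two changes with a similar effect on the median can look very different across the whole distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic log-normal "prices" for illustration only; this is not real data.
before = rng.lognormal(mean=12.0, sigma=0.5, size=100_000)

# Scenario A: every price rises by the same factor.
uniform_rise = before * 1.15

# Scenario B: the cheapest 20% of the stock leaves the market;
# all remaining prices are unchanged.
cutoff = np.quantile(before, 0.20)
cheap_removed = before[before > cutoff]

# Compare low, middle and high quantiles of each scenario with the baseline.
levels = np.array([0.1, 0.5, 0.9])
q_before = np.quantile(before, levels)
q_uniform = np.quantile(uniform_rise, levels)
q_removed = np.quantile(cheap_removed, levels)

ratio_uniform = q_uniform / q_before  # the same factor at every level
ratio_removed = q_removed / q_before  # largest at the low level, smallest at the high level

print("quantile levels:        ", levels)
print("ratio, uniform rise:    ", np.round(ratio_uniform, 3))
print("ratio, cheap stock gone:", np.round(ratio_removed, 3))
```

A summary based on the median alone would struggle to separate these two scenarios, whereas the quantile ratios differ clearly at the lower end of the market, which is the kind of feature a distribution-valued analysis is designed to pick up.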
Rowing
When I’m not immersed in research, my main focus is rowing. I learnt to row in 2018 as an undergraduate fresher at the University of Cambridge and have not stopped since. I became Captain of my College Boat Club, and now hold a position on the CUCBC, the organisation that regulates collegiate rowing on the Cam and runs the famous Bumps races. Indeed, I personally publish the Bumps programmes.