Analysis of Relational Data

Broadly speaking, relational data are observations and outcomes measured between pairs of individual units: people, schools, countries, and so forth. Their analysis is a field that includes (binary) social network analysis as a special case. I focus on methods for evaluating and predicting relations based on individual and relational characteristics. In particular, I use hierarchical/multilevel modelling and tools provided by Bayesian computational statistics to shed new light on old methods and models.

Projects and Applications:

  • Marginally Specified Hierarchical Models for Relational Data, with Joseph Blitzstein. This paper attempts to reconcile many separate methods for modelling networks, both binary and valued, by considering all interaction pairs to be independent, conditional on additional data. These additional data include latent variables for individuals, latent geometries, and predictors/covariates. The method is demonstrated on simple cases, including so-called small-world and scale-free systems, before expanding to links with non-binary outcomes.

  • Collaborations between U.S. Senators through Joint Press Releases, with Justin Grimmer. Senators identify working relationships by coauthoring press releases on various topics with other senators. After establishing implicit coauthorship in some press releases, and noting explicit coauthorship in others, we model the relationships according to two factors: how likely a pair of senators is to form a relationship, and the number of co-releases (possibly zero) the pair produces.

  • Ohmic Circuit Interpretations of Network Distance and Centrality. I use the analogy of electrical resistors to propose a measure of social conductance (or inverse social distance) that can account for parallel paths and valued ties. The analogy is then extended to yield measures of closeness and betweenness that better reflect the totality of social connection than shortest-path metrics, as well as to evaluate the effective strength of a tie in a social context. This approach has been considered for estimating a type of betweenness centrality in other works; the treatment I give is considerably more extensive.

  • The manual for the ElectroGraph package for R, which implements the methods in the previous paper. The package itself is available on CRAN.

  • The Thresholding Problem: Uncertainties Due To Dichotomization of Valued Ties, with Joseph Blitzstein. Because most tools in network analysis take binary ties for their input, a common technique is to dichotomize the data according to their positions with respect to a fixed threshold value. However, there are two issues to consider: how said fixed value should be chosen, and how the results of the analysis depend on the choice of threshold. I demonstrate consequences of each of these problems with respect to a set of commonly used generative network models; namely, the loss of efficiency relative to the original undichotomized data is considerably greater than in standard linear modelling.

  • The Effect of Censoring Out-Degree On Network Inferences, with Joseph Blitzstein. If the measurement of a network is hampered by the censoring of out-degree, as when a person is asked to "name their best friend", there are severe consequences for the measurement of network effects, let alone the autocorrelation of a person's own characteristics through time.

  • Hierarchical Models for Time-Dependent Networks (working paper). A key advantage of the integrated hierarchical modelling approach is its expandability. In particular, I demonstrate probit-based models for network tie evolution.

  • My dissertation, defended August 20, 2009.
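
As a small illustration of the resistor analogy in the Ohmic Circuit project above, the sketch below computes an effective conductance between two nodes of a small valued network. It is not the ElectroGraph implementation: the function name effective_conductance is hypothetical, tie weights are assumed to act as conductances, and the network is assumed to be connected and undirected.

```python
def effective_conductance(weights, s, t):
    """Effective conductance between nodes s and t (hypothetical helper).

    weights is a symmetric matrix of non-negative tie strengths, treated
    here as electrical conductances (an assumption of this sketch).
    """
    n = len(weights)
    # Graph Laplacian: total tie strength on the diagonal, minus weights elsewhere.
    lap = [[sum(weights[i][k] for k in range(n) if k != i) if i == j
            else -weights[i][j] for j in range(n)] for i in range(n)]

    # Ground node t (fix its potential at zero) by dropping its row and column.
    keep = [i for i in range(n) if i != t]
    a = [[lap[i][j] for j in keep] for i in keep]
    # Inject one unit of current at node s.
    b = [1.0 if i == s else 0.0 for i in keep]
    m = len(keep)

    # Solve a * v = b by Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("network must be connected")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]

    # Back substitution gives the node potentials.
    v = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(a[r][c] * v[c] for c in range(r + 1, m))
        v[r] = (b[r] - tail) / a[r][r]

    # With unit current, the potential at s equals the effective resistance.
    resistance = v[keep.index(s)]
    return 1.0 / resistance


# Triangle of unit-strength ties: a direct tie plus a two-step parallel path.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
print(effective_conductance(triangle, 0, 1))
```

In the triangle, the direct tie contributes a conductance of 1 and the two-step path contributes 1/2, so the effective conductance is 3/2 in exact arithmetic. A shortest-path distance would see only the direct tie and ignore the parallel route.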

Content copyright (c) 2009, Andrew C. Thomas.