Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Large Language Models (LLMs) struggle to perform such reasoning consistently. Here we propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LLM attention heads. First, we analyze the per-layer activations of GPT-2 models in response to single- and multi-hop prompts. We then propose a mechanism that allows users to inject pertinent prompt-specific information, which we refer to as "memories," at critical LLM locations during inference. By thus enabling the LLM to incorporate additional relevant information during inference, we enhance the quality of multi-hop prompt completions. We show empirically that a simple, efficient, and targeted memory injection into a key attention layer can often increase the probability of the desired next token in multi-hop tasks by up to 424%.
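The injection mechanism described above can be sketched with a PyTorch forward hook that adds a scaled "memory" vector to an attention block's output during inference. The toy module, dimensions, and scale below are illustrative assumptions, not the paper's actual GPT-2 setup:

```python
import torch
import torch.nn as nn

# Toy stand-in for one transformer attention block; the paper targets
# attention layers in GPT-2, but model loading is omitted here.
class TinyAttention(nn.Module):
    def __init__(self, d_model=8):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        return self.proj(x)

torch.manual_seed(0)
attn = TinyAttention()

# "Memory": an embedding of pertinent prompt-specific information, added
# to the attention output at a chosen layer (the scale is a tunable knob).
memory = torch.randn(8)
scale = 2.0

def inject_memory(module, inputs, output):
    # Fires after the attention block computes its output; returning a
    # tensor from a forward hook replaces the module's output.
    return output + scale * memory

handle = attn.register_forward_hook(inject_memory)
x = torch.randn(1, 4, 8)            # (batch, seq, d_model)
with torch.no_grad():
    injected = attn(x)
handle.remove()
with torch.no_grad():
    plain = attn(x)

# The injected run differs from the plain run by exactly scale * memory.
assert torch.allclose(injected - plain, scale * memory.expand_as(plain), atol=1e-5)
```

In a real model the hook would be registered on a specific attention layer chosen by the per-layer activation analysis, and the memory vector would be the embedding of a relevant token.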
ATTRIB
Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism
Mansi Sakarvadia, Arham Khan, Aswathy Ajith, and 5 more authors
2023
Accepted to the Workshop on Attributing Model Behavior at Scale (ATTRIB) @ NeurIPS.
Transformer-based Large Language Models (LLMs) are the state-of-the-art for natural language tasks. Much recent work has attempted to decode the internal mechanisms by which LLMs arrive at their final predictions for text completion tasks, including by reverse-engineering the role of linear layers. Yet little is known about the role of attention heads in producing the final token prediction. We propose the Attention Lens, a tool that enables researchers to translate the outputs of attention heads into vocabulary tokens via learned attention head-specific transformations called lenses. Preliminary findings from our trained lenses indicate that attention heads play highly specialized and specific roles in language models.
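A lens of this kind can be sketched as a learned per-head map from an attention head's output into vocabulary space. The shapes and the linear form below are illustrative placeholders; the paper's lenses are attention-head-specific learned transformations, and their training is omitted:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: head output dimension and vocabulary size are
# placeholders, not the values used in the paper.
d_head, vocab = 64, 100

# A "lens" translates one attention head's output into vocabulary-space
# logits; in practice it is trained so that its top tokens agree with
# the model's final next-token prediction.
lens = nn.Linear(d_head, vocab)

head_output = torch.randn(1, d_head)   # output of one attention head
logits = lens(head_output)             # (1, vocab) scores over tokens
top_tokens = logits.topk(5).indices    # token ids the head "writes" toward
```

Inspecting `top_tokens` across prompts is what reveals whether a given head plays a specialized role (e.g., consistently promoting a particular class of tokens).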
e-Science
Lazy Python Dependency Management in Large-Scale Systems
Alok Kamatar, Mansi Sakarvadia, Valerie Hayot-Sasson, and 2 more authors
In 2023 IEEE 19th International Conference on e-Science (e-Science), 2023
Python has become the language of choice for managing many scientific applications. However, when distributing a Python application, it is necessary that all application dependencies be distributed and available in the target execution environment. A specific consequence is that Python workflows suffer from slow scale-out due to the time required to import dependencies. We describe ProxyImports, a method to package and distribute Python dependencies in a lazy fashion while remaining transparent and easy to use. Using ProxyImports, Python packages are loaded only once (e.g., by a workflow head node) and are transferred asynchronously to compute nodes. We evaluate our implementation on the Perlmutter and Theta supercomputers and in an HPC cloud-bursting scenario. Our experiments show that ProxyImports significantly reduces the average time to import large modules across an HPC system and demonstrate that this method can be used easily to distribute user packages to cloud resources. We conclude that ProxyImports improves application runtime, reduces contention on metadata servers, and facilitates runtime portability of Python applications.
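The core idea of deferring an import until first use can be sketched with a minimal module proxy. This is a simplified illustration, not the ProxyImports implementation (which also packages dependencies once and ships them asynchronously to compute nodes):

```python
import importlib
import types

class LazyModule(types.ModuleType):
    """Defer the real import until an attribute is first accessed.

    A toy sketch in the spirit of lazy dependency loading; the import
    cost is paid on first attribute access rather than at startup.
    """
    def __init__(self, name):
        super().__init__(name)
        self._module = None

    def __getattr__(self, attr):
        # Called only when normal lookup fails, i.e., for attributes of
        # the real module; trigger the actual import on first use.
        if self._module is None:
            self._module = importlib.import_module(self.__name__)
        return getattr(self._module, attr)

json = LazyModule("json")            # no import cost paid yet
print(json.dumps({"lazy": True}))    # first access triggers the import
```

In a distributed workflow, the proxy would additionally resolve the module from a shared store populated by the head node, so each compute node avoids hitting the metadata server for every package file.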
BDCAT
Trillion Parameter AI Serving Infrastructure for Scientific Discovery: A Survey and Vision
Nathaniel Hudson, J. Gregory Pauloski, Matt Baughman, and 13 more authors
In IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT2023), 2023
Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are now entering an era of Trillion Parameter Models (TPM), i.e., models with more than a trillion parameters, such as Huawei's PanGu-Σ. We describe a vision for the ecosystem of TPM users and providers that caters to the specific needs of the scientific community. We then outline the significant technical challenges and open problems in system design for serving TPMs to enable scientific research and discovery. Specifically, we describe the requirements of a comprehensive software stack and the interfaces needed to support the diverse and flexible needs of researchers.
2020
MICCAI
Atypical Neonate Extra-axial CSF is Associated with Reduced Cognitive Development at Age 1 year (poster)
Mansi Sakarvadia, Rui Li, SunHyung Kim, and
6 more authors
Perinatal, Preterm and Pediatric Image Analysis workshop at the Medical Image Computing and Computer Assisted Intervention (MICCAI) conference, 2020
We aim to assess whether enlarged extra-axial cerebrospinal fluid (EA-CSF) volume at neonatal age is associated with a child's performance on the Mullen Scales of Early Learning (MSEL) at 12 and 24 months of age. 3T MRI scans were acquired from 651 infants at neonate age (20.8+/-8.9 postnatal days). EA-CSF and global tissue volumes were computed via a new tool called AutoEACSF. The MSEL was administered to these infants at 12 and 24 months, measuring ability in gross motor skills and in four domains that comprise an overall cognitive composite score: fine motor, visual reception, receptive language, and expressive language. General linear models including intracranial cavity volume, gestational age at birth, maternal education, and sex as covariates were employed. The subgroup of infants whose EA-CSF volumes measured in the top 5th percentile (i.e., 2 SDs above the mean; n=33) displayed significant negative correlations between elevated EA-CSF at neonatal age and expressive language (p=0.001) and cognitive composite scores (p=0.016) at 12 months. However, at 24 months of age, these associations were no longer significant. No significant associations were found for subjects with EA-CSF volumes below the top 10th percentile. This study finds that atypically high levels of EA-CSF volume shortly after birth are associated with lower expressive language and overall cognitive ability at 12 months of age. These results suggest that there may be a pathological threshold of high EA-CSF volume that could serve as an early biomarker of a child's reduced cognitive ability at 12 months.
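The analysis design (a linear model relating a cognitive score to EA-CSF volume while adjusting for covariates) can be sketched with ordinary least squares. All variable names and numbers below are synthetic placeholders, not the study's data:

```python
import numpy as np

# Synthetic stand-ins for the study's variables: an outcome score, the
# predictor of interest (EA-CSF volume), and two of the covariates.
rng = np.random.default_rng(0)
n = 33
ea_csf = rng.normal(50, 10, n)       # predictor of interest (mL)
icv = rng.normal(900, 50, n)         # covariate: intracranial cavity volume
gest_age = rng.normal(39, 1.5, n)    # covariate: gestational age (weeks)
score = 80 - 0.4 * ea_csf + 0.01 * icv + rng.normal(0, 2, n)

# Design matrix with an intercept column; fit by ordinary least squares.
X = np.column_stack([np.ones(n), ea_csf, icv, gest_age])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

# beta[1] is the EA-CSF effect after adjusting for the covariates;
# a significantly negative value would mirror the study's 12-month finding.
```

The study additionally adjusted for maternal education and sex, and assessed significance per domain; those steps are omitted here.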
We aim to assess if enlarged extra-axial cerebrospinal fluid (EA-CSF) volume at neonatal age is associated with a child’s performance on the Mullen Scales of Early Learning (MSEL) at 12 and 24 months of age. 3T MRI scans were acquired from 651 infants at neonate age (20.8+/-8.9 postnatal days). EA-CSF and global tissue volumes were computed via a new tool called AutoEACSF1. The MSEL was administered to these infants at 12 and 24 months, measuring ability in gross motor and four domains that comprise an overall cognitive composite score: fine motor, visual reception, receptive language, expressive language. General linear models including intracranial cavity volume, gestational age at birth, maternal education and sex as covariate were employed. The subgroup of infants whose EA-CSF volumes measured in the top 5th percentile (i.e., 2 SDs above the mean; n=33) displayed significant negative correlations between elevated EA-CSF at neonatal age and expressive language (p=0.001) and cognitive composite scores (p=0.016) at 12 months. However, at 24 months of age, these associations were no longer significant. No significant associations were found for subjects with EACSF volumes below the top 10th percentile. This study finds that atypically high levels of EA-CSF volume shortly after birth are associated with lower expressive language and overall cognitive ability at 12 months of age. These results suggest that there may be a pathological threshold of high EA-CSF volume that could serve as an early biomarker of a child’s reduced cognitive ability at 12 months.