Foundations of AI

https://allenai.org/foundations-of-ai

Understanding LLM training data

Large text corpora are the backbone of today’s language models, yet we have only a limited understanding of what they contain. Our research in this area seeks to uncover important facts about these popular corpora, including general statistics, quality, social factors, and potential contamination by evaluation data.

What's In My Big Data?
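
As a purely illustrative example of one such measurement, the sketch below estimates evaluation-data contamination via word-level n-gram overlap. The function names, the 13-gram window, and the overlap heuristic are all assumptions chosen for clarity, not the exact method used in What's In My Big Data?.

```python
from typing import Iterable, List, Set, Tuple

def ngrams(text: str, n: int = 13) -> Set[Tuple[str, ...]]:
    """Word-level n-grams of a document (empty if the document is shorter than n)."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(corpus: Iterable[str], eval_examples: List[str], n: int = 13) -> float:
    """Fraction of evaluation examples sharing at least one n-gram with the corpus.

    An illustrative overlap heuristic only; the window size and the
    any-overlap criterion are assumptions, not the paper's measurement.
    """
    corpus_grams: Set[Tuple[str, ...]] = set()
    for doc in corpus:
        corpus_grams |= ngrams(doc, n)
    hits = sum(1 for ex in eval_examples if ngrams(ex, n) & corpus_grams)
    return hits / len(eval_examples) if eval_examples else 0.0
```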


Retrieval-augmented generation

Retrieval-augmented generation (RAG) improves the responses of large language models by letting them consult an authoritative knowledge source beyond their training data. Our work in this field offers ways to make RAG more scalable and performant, and more respectful of data concerns such as copyright.
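
A minimal sketch of the RAG pattern follows, assuming hypothetical `retrieve` and `generate` callables; neither corresponds to a specific Ai2 system, and any dense or sparse retriever and any LLM API would slot into their places.

```python
from typing import Callable, List

def rag_answer(
    question: str,
    retrieve: Callable[[str, int], List[str]],  # hypothetical: returns top-k passages
    generate: Callable[[str], str],             # hypothetical: any LLM completion call
    k: int = 5,
) -> str:
    """Answer a question by prepending retrieved passages to the prompt."""
    passages = retrieve(question, k)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using only the passages below, citing them by number.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```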


Human-AI interaction

To realize the full promise of AI, it is critical to design user interfaces that support effective collaboration between AI systems and human users. This research area explores a variety of novel interfaces that maximize the helpfulness of AI in engaging with scientific literature, supporting better access to papers and enabling rapid, deep, interactive information gathering.


Theoretical insights about LLMs

While language models are somewhat opaque, theoretical techniques can nonetheless be applied to analyze their intrinsic capabilities and limitations, yielding important fundamental insights about such systems.


Intelligent language agents

Beyond answering questions, language models can act as intelligent agents, interacting autonomously with an external environment to perform complex tasks. Our research focuses on having such agents plan and learn in these environments so that they rapidly improve their performance.
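
One common shape for such an agent is a plan-act-observe loop, sketched below under stated assumptions: both callables are hypothetical stand-ins rather than an Ai2 method. `propose_action` would wrap an LLM prompt containing the goal and interaction history, and `environment_step` would execute the action (for example, a tool call) and report what happened.

```python
from typing import Callable, List, Tuple

def agent_loop(
    goal: str,
    propose_action: Callable[[str, List[Tuple[str, str]]], str],   # hypothetical LLM policy
    environment_step: Callable[[str], Tuple[str, bool]],           # hypothetical env: (observation, done)
    max_steps: int = 20,
) -> List[Tuple[str, str]]:
    """Run a plan-act-observe loop until the environment signals completion."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        action = propose_action(goal, history)
        observation, done = environment_step(action)
        history.append((action, observation))  # the trace the agent can learn from
        if done:
            break
    return history
```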


Systematic reasoning with language

While language models are innately good at question-answering, our research has developed new methods that enable them to reason systematically and arrive at conclusions in a sound, explainable way.
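
A hedged sketch of one generic approach in this spirit: building an explicit chain of verified inference steps rather than emitting a single opaque answer. Both callables are illustrative assumptions, not a specific Ai2 method; only steps that pass verification are kept, so the final chain is an inspectable argument.

```python
from typing import Callable, List

def reason_systematically(
    question: str,
    generate_step: Callable[[str, List[str]], str],  # hypothetical: LLM proposes the next inference
    verify_step: Callable[[List[str], str], bool],   # hypothetical: checks the step follows from prior ones
    max_steps: int = 8,
) -> List[str]:
    """Build a chain of verified inference steps, stopping at a conclusion."""
    chain: List[str] = []
    for _ in range(max_steps):
        step = generate_step(question, chain)
        if not verify_step(chain, step):
            continue  # discard steps that do not follow; try again
        chain.append(step)
        if step.lower().startswith("conclusion:"):
            break
    return chain
```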