LLM Training

Test Hugging Face Transformers for ML tasks: training, fine-tuning, etc.

If it works in a stable and predictable manner, then use it by default for those tasks.

Hugging Face Transformers

huggingface/transformers

hf.co/transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.

These models can be applied on:

📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.
🖼️ Images, for tasks like image classification, object detection, and segmentation.
🗣️ Audio, for tasks like speech recognition and audio classification.

Transformer models can also perform tasks on several modalities combined, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.

🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets, and then share them with the community on our model hub. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
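As a minimal sketch of that download-and-use flow via the pipeline API (the task's default checkpoint is used here; the example text is illustrative):

```python
from transformers import pipeline

# Downloads the task's default pretrained checkpoint on first use,
# then runs inference; pass model="..." to pin a specific checkpoint.
classifier = pipeline("sentiment-analysis")
print(classifier("Pretrained models are easy to try."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```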

🤗 Transformers is backed by the three most popular deep learning libraries (JAX, PyTorch, and TensorFlow) with seamless integration between them. It's straightforward to train your models with one before loading them for inference with another.
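A sketch of that interoperability, assuming a sequence-classification checkpoint and a TensorFlow install; the local path is illustrative:

```python
from transformers import (
    AutoModelForSequenceClassification,
    TFAutoModelForSequenceClassification,
)

# Save a (possibly fine-tuned) PyTorch model to a local directory ...
pt_model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
pt_model.save_pretrained("./my-model")  # illustrative path

# ... then load the same weights in TensorFlow for inference.
tf_model = TFAutoModelForSequenceClassification.from_pretrained(
    "./my-model", from_pt=True
)
```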

composer

https://github.com/mosaicml/composer

Composer is an open-source deep learning training library by MosaicML. Built on top of PyTorch, the Composer library makes it easier to implement distributed training workflows on large-scale clusters.

We built Composer to be optimized for scalability and usability, integrating best practices for efficient, multi-node training. By abstracting away low-level complexities like parallelism techniques, distributed data loading, and memory optimization, you can focus on training modern ML models and running experiments without slowing down.
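A minimal sketch of the Trainer API on a toy model, assuming a recent Composer version that exposes ComposerClassifier; the MLP and MNIST data are stand-ins, and distributed behavior comes from how the script is launched (e.g. via the composer launcher), not from this code:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from composer import Trainer
from composer.models import ComposerClassifier

# A small PyTorch model wrapped so Composer's Trainer can drive it.
net = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)
model = ComposerClassifier(module=net, num_classes=10)

train_dl = DataLoader(
    datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor()),
    batch_size=128,
)

# Trainer owns the training loop; with no optimizer given, Composer
# falls back to its default.
trainer = Trainer(model=model, train_dataloader=train_dl, max_duration="1ep")
trainer.fit()
```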

We recommend using Composer to speed up your experimentation workflow if you’re training neural networks of any size, including:

• Large Language Models (LLMs)
• Diffusion models
• Embedding models (e.g. BERT)
• Transformer-based models
• Convolutional Neural Networks (CNNs)

https://docs.mosaicml.com/en/latest/

gpt-neox

https://github.com/EleutherAI/gpt-neox

This repository records EleutherAI's library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and to accelerate research into large-scale training.

This library is in widespread use in academic, industrial, and government labs, including by researchers at Oak Ridge National Lab, CarperAI, Stability AI, Together.ai, Korea University, Carnegie Mellon University, and the University of Tokyo, among others. Uniquely among similar libraries, GPT-NeoX supports a wide variety of systems and hardware, including launching via Slurm, MPI, and the IBM Job Step Manager, and has been run at scale on AWS, CoreWeave, ORNL Summit, ORNL Frontier, LUMI, and others.

If you are not looking to train models with billions of parameters from scratch, this is likely the wrong library to use. For generic inference needs, we recommend using the Hugging Face transformers library instead, which supports GPT-NeoX models (see the sketch below).
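A sketch of loading an EleutherAI GPT-NeoX checkpoint through transformers; note the 20B model needs tens of gigabytes of memory, and the prompt is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# EleutherAI publishes GPT-NeoX checkpoints on the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")

inputs = tokenizer("GPT-NeoX is a library for", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```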


Allen AI suite

https://allenai.org/olmo


Pretrain models on custom datasets

  • nanotron

    • Minimalistic large language model 3D-parallelism training
    • Nanotron is a library for pretraining transformer models. It provides a simple and flexible API to pretrain models on custom datasets and is built with two principles in mind:
      • Simplicity: designed to be easy to use, with a simple and flexible API.
      • Performance: optimized for speed and scalability, using the latest techniques to train models faster and more efficiently.
  • trlx

    • Paper: trlX: A Framework for Large Scale Reinforcement Learning from Human Feedback
    • A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
    • trlX is a distributed training framework designed from the ground up to focus on fine-tuning large language models with reinforcement learning, using either a provided reward function or a reward-labeled dataset (see the sketch after this list).
    • 🧀 CHEESE: collect human annotations for your RL application with our human-in-the-loop data collection library.
      • Used for adaptive human-in-the-loop evaluation of language and embedding models.
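As referenced above, a minimal sketch of trlX's high-level entry point, following its README pattern; the base model and toy reward function are illustrative only:

```python
import trlx

# Fine-tune a small causal LM with RL against a toy reward:
# the count of the word "cats" in each generated sample.
# A real application would supply a learned reward model or
# a reward-labeled dataset instead.
trainer = trlx.train(
    "gpt2",
    reward_fn=lambda samples, **kwargs: [s.count("cats") for s in samples],
)
```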