[ MANIFEST · TRANSFORMERS ]

// 5 entries · in TRANSFORMERS

  2025 · 1 entry

  1. Python Project Management and Packaging: PEP 751 update and some of the remaining issues of packaging

  2024 · 3 entries

  2. Position Information in Transformer-Based Models: Exploring the main Methods and Approaches

    This article explains the main position encoding (PE) methods and the reasoning behind their design:
      - absolute PEs: learnable and sinusoidal
      - relative PEs: T5, ALiBi, FIRE
      - combining absolute and relative: RoPE
      - no position encoding at all

  3. Sparse Transformers

    This article takes a deep dive into Sparse Transformers, as introduced in the paper "Generating Long Sequences with Sparse Transformers". It focuses on the motivation and intuition behind the sparse factorizations, their theory, and proofs of their complexity.

  4. Decoder-only Language Models Architecture Evolution (Part I)

    This is the first in a series of articles on the evolution of LLM architectures. It takes a deep dive into the first three GPT models.

  2023 · 1 entry

  5. Transformers: Attention Is All You Need

    Explore the Transformer architecture as presented in the paper "Attention Is All You Need" by Vaswani et al. (2017). This article offers detailed code implementations and the mathematics behind each component of the model.