[ MANIFEST · TRANSFORMERS ]

// 5 entries · in TRANSFORMERS

2025 1 entry
// 2025.05.02 TRANSFORMERS
Python Project Management and Packaging: PEP 751 update and some of the remaining issues of packaging
2024 3 entries
// 2024.06.25 TRANSFORMERS
Position Information in Transformer-Based Models: Exploring the main Methods and Approaches
This article explains the main position encoding methods and the reasoning behind their design: absolute PEs (learnable and sinusoidal), relative PEs (T5, ALiBi, FIRE), RoPE, which combines both, and no position encoding.
// 2024.06.03 TRANSFORMERS
Sparse Transformers
This article takes a close look at Sparse Transformers as introduced in the paper "Generating Long Sequences with Sparse Transformers". The main points of interest are the motivation and intuition behind the sparse factorizations, their theory, and the complexity proofs.
// 2024.05.30 TRANSFORMERS
Decoder-only Language Models Architecture Evolution (Part I)
This is the first in a series of articles on the evolution of LLM architectures. It covers the first three GPT models in depth.
2023 1 entry
// 2023.12.24 TRANSFORMERS
Transformers: Attention Is All You Need
Explore the Transformer architecture as presented in the paper 'Attention Is All You Need' by Vaswani et al. (2017). This article offers detailed code implementations and mathematical insights for each component, providing a comprehensive understanding of the model.
