[ MANIFEST ]

// 12 entries · spanning 2023 / 2026

2026 2 entries
// 2026.07.20 DATA[NEW]
When the translator starts solving the problem
An evaluation of task execution failures, chunking, and translation quality when translating long reasoning traces.
- data curation
- evaluation
// 2026.06.01 POST-TRAINING
Anatomy of verl, the RL post-training framework I lived in
2025 3 entries
// 2025.09.23 PYTHON
Install flash-attn without crying while using uv
- uv
// 2025.07.19 PYTHON
A story of using langchain/langgraph
// 2025.05.02 TRANSFORMERS
Python Project Management and Packaging: PEP 751 update and some of the remaining issues of packaging
2024 6 entries
// 2024.11.20 PYTHON
A Comprehensive Guide to Python Project Management and Packaging: Concepts Illustrated with uv - Part II
// 2024.11.06 PYTHON
A Comprehensive Guide to Python Project Management and Packaging: Concepts Illustrated with uv - Part I
// 2024.10.30 PYTHON
Deploying a Streamlit app on AWS EC2 (with your own domain name)
// 2024.06.25 TRANSFORMERS
Position Information in Transformer-Based Models: Exploring the main Methods and Approaches
This article explains the main position encoding methods and how they were designed: learnable absolute and sinusoidal PE; relative PEs (T5, ALiBi, FIRE); RoPE, which combines both; and no position encoding.
// 2024.06.03 TRANSFORMERS
Sparse Transformers
This article delves into Sparse Transformers as introduced in the paper "Generating Long Sequences with Sparse Transformers". The main points of interest are the motivation and intuition behind the sparse factorizations, their theory, and the complexity proofs.
// 2024.05.30 TRANSFORMERS
Decoder-only Language Models Architecture Evolution (Part I)
This is the first in a series of articles on the evolution of LLM architectures. It dives deep into the first three GPT models.
2023 1 entry
// 2023.12.24 TRANSFORMERS
Transformers: Attention Is All You Need
Explore the Transformer architecture as presented in the paper 'Attention Is All You Need' by Vaswani et al. (2017). This article offers detailed code implementations and mathematical insights for each component, providing a comprehensive understanding of the model.