projects
GenLM (completed)
Genomic language model pretrained from scratch on 13 Drosophila species genomes — 12-layer dilated CNN with a custom character-level tokenizer, trained to predict variant effects across the genome.
ChottaLLM (completed)
Small-scale LLM built from scratch in PyTorch — character-level tokenizer, transformer architecture, trained on custom text corpora.
GPT from Scratch (completed)
GPT-style autoregressive language model implemented in PyTorch — full training loop, attention, positional encoding, and character-level generation.
DDP from Scratch (completed)
Distributed Data Parallel training implemented from first principles in PyTorch — gradient synchronization, process groups, and multi-GPU scaling without using the DDP wrapper.
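For readers unfamiliar with the gradient synchronization step named above, here is a minimal sketch of the idea in plain Python. It is not taken from the project: it simulates workers as lists in a single process rather than using `torch.distributed`, and the names `all_reduce_mean` and `sgd_step` are hypothetical helpers chosen for illustration. Each worker computes gradients on its own data shard, the gradients are averaged across workers, and every replica applies the same update so the model copies stay identical.

```python
from typing import List


def all_reduce_mean(per_worker_grads: List[List[float]]) -> List[float]:
    """Average gradients element-wise across simulated workers.

    This mimics the result of an all-reduce sum followed by division
    by the world size; no real inter-process communication happens here.
    """
    world_size = len(per_worker_grads)
    if world_size == 0:
        raise ValueError("need at least one worker")
    n_params = len(per_worker_grads[0])
    if any(len(g) != n_params for g in per_worker_grads):
        raise ValueError("all workers must hold gradients of the same shape")
    return [
        sum(g[i] for g in per_worker_grads) / world_size
        for i in range(n_params)
    ]


def sgd_step(params: List[float], grads: List[float], lr: float) -> List[float]:
    """Plain SGD update applied to one replica's parameters."""
    return [p - lr * g for p, g in zip(params, grads)]


# Two simulated workers start from identical parameters (assumed values).
replicas = [[1.0, 2.0], [1.0, 2.0]]
# Each worker sees a different data shard, so local gradients differ.
local_grads = [[0.5, -1.0], [1.5, 3.0]]

# Synchronize: every worker ends up holding the same averaged gradient.
synced = all_reduce_mean(local_grads)

# Every replica applies the same update, so the copies stay in lockstep.
replicas = [sgd_step(p, synced, lr=0.1) for p in replicas]
```

In a real multi-GPU setup the averaging would run as a collective operation across processes; the sketch only shows why averaging before the optimizer step keeps replicas in sync.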