bc
← tags

tag

#pytorch

posts
projects
GenLMcompleted

Genomic language model pretrained from scratch on 13 Drosophila species genomes — 12-layer dilated CNN with a custom character-level tokenizer, trained to predict variant effects across the genome.

ChottaLLMcompleted

Small-scale LLM built from scratch in PyTorch — character-level tokenizer, transformer architecture, trained on custom text corpora.

GPT-style autoregressive language model implemented in PyTorch — full training loop, attention, positional encoding, and character-level generation.

Distributed Data Parallel training implemented from first principles in PyTorch — gradient synchronization, process groups, and multi-GPU scaling without using the DDP wrapper.

Repository with all of the companion notebooks that focused on diving into ML concepts and ideas from website posts.