log

Public archive of articles in my readlist.

Database School/databaseschool.com

Database School is the best place to learn, understand, optimize, and confidently use databases.

Understanding Attention in LLMs/bartoszmilewski.co…

There are many excellent AI papers and tutorials that explain the attention pattern in Large Language Models. But this essentially simple pattern is often obscured by implementation details and optimizations. In this post I will try to cut to the essentials. In a nutshell, the…

How Free AI Video Upscaler uses WebGPU + WebCodecs to serve 250k MAU with zero server costs | web.dev/web.dev

Learn how Free AI Video Upscaler uses WebGPU and WebCodecs to serve 250k MAU with zero server costs.

Quantization from the ground up | ngrok blog/Sam Rose

A complete guide to what quantization is, how it works, and how it's used to compress large language models

by Sam Rose

WebCode: Search Evals for Coding Agents/Exa Labs

How do you measure retrieval quality for the increasingly complex ecosystem coding agents consume from?

by Exa Labs

How prompt caching works - Paged Attention and Automatic Prefix Caching plus practical tips/sankalp.bearblog.d…

A deep dive into prompt caching - practical tips to improve cache hits and how vLLM's paged attention enables KV-cache reuse across requests via automatic prefix-caching

Building a High-Performance Distributed Search System for 10,000 QPS./Devanshusharma

During my internship days at cal.com, I discovered a real passion for distributed systems, databases, and learning Golang. The …

by Devanshusharma

Boris Tane/boristane.com

Coding agents didn't make poor engineers dangerous. They made them unstoppable.