io-bound-data-processing Guide
When to use io-bound-data-processing
Use this skill for processing, transforming, or moving datasets that may exceed RAM on a single low-compute machine. It covers memory discipline (streaming, generators, dtype shrinkage), I/O access patterns (sequential vs random, mmap, async), data formats (Parquet vs CSV vs JSON, predicate pushdown), chunking and batching, spill-to-disk (external merge sort, DuckDB/Polars), pipelining (bounded queues, backpressure, checkpointing), codec selection (zstd/lz4/gzip), concurrency for I/O-bound workloads (asyncio, threads, prefetch), and observability (iowait vs CPU%, rows/sec, py-spy/strace). It triggers on phrases such as "process a large file", "stream this", "out-of-core", "OOM kill", or "this is slow", and on code showing `pd.read_csv` of multi-GB files, `requests.get(...).content` on big bodies, `BytesIO` on unbounded inputs, per-row INSERTs, sequential `requests.get` loops, or falling `tqdm` rates, even when I/O or memory isn't mentioned. It complements the computer-science-algorithms skill.
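Two of the anti-patterns listed above (whole-file reads and per-row INSERTs) share one standard-library fix: parse rows lazily with a generator, group them into fixed-size batches, and write each batch with a single `executemany` call inside one transaction. This is a minimal sketch, not code from the skill itself. The two-column `name,value` CSV layout, the `items` table, and the helper names `stream_rows`, `chunked`, and `load` are hypothetical, and SQLite stands in for whatever database is actually the target.

```python
import csv
import io
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, TextIO


def stream_rows(f: TextIO) -> Iterator[tuple[str, int]]:
    """Yield one parsed row at a time instead of loading the whole file."""
    reader = csv.reader(f)
    next(reader, None)  # skip the header row (assumed to exist)
    for name, value in reader:  # assumes exactly two columns per row
        yield name, int(value)


def chunked(rows: Iterable[tuple], size: int) -> Iterator[list[tuple]]:
    """Group an iterator into lists of at most `size` items."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def load(f: TextIO, conn: sqlite3.Connection, batch_size: int = 10_000) -> int:
    """Stream rows from `f` into the hypothetical `items` table; return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, value INTEGER)")
    total = 0
    for batch in chunked(stream_rows(f), batch_size):
        # One executemany and one commit per batch, not one INSERT per row.
        with conn:
            conn.executemany("INSERT INTO items VALUES (?, ?)", batch)
        total += len(batch)
    return total
```

Peak memory here scales with `batch_size` rather than with file size; raising it trades memory for fewer round trips and commits.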
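The spill-to-disk idea behind an external merge sort can be sketched the same way: sort bounded runs in memory, write each run to a temporary file, then lazily k-way merge the runs with `heapq.merge`. The sketch assumes every input line ends with a newline and that plain string ordering is the sort order you want. `external_sort` and `_write_run` are illustrative names. For real workloads, the DuckDB/Polars route mentioned above may be the better choice.

```python
import heapq
import os
import tempfile
from itertools import islice
from typing import Iterable, Iterator


def _write_run(sorted_lines: list[str], tmpdir: str) -> str:
    """Spill one already-sorted run to a temp file and return its path."""
    fd, path = tempfile.mkstemp(dir=tmpdir, text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(sorted_lines)
    return path


def external_sort(lines: Iterable[str], run_size: int = 100_000) -> Iterator[str]:
    """Sort newline-terminated lines holding at most ~run_size of them in memory."""
    with tempfile.TemporaryDirectory() as tmpdir:
        it = iter(lines)
        paths = []
        # Phase 1: cut the input into sorted runs and spill each run to disk.
        while run := list(islice(it, run_size)):
            run.sort()
            paths.append(_write_run(run, tmpdir))
        # Phase 2: k-way merge; heapq.merge pulls lines from each file lazily.
        files = [open(p) for p in paths]
        try:
            yield from heapq.merge(*files)
        finally:
            # Close the runs before the temp directory is cleaned up.
            for f in files:
                f.close()
```

The merge phase keeps only one pending line per run in memory, so with R runs the working set is roughly `run_size` lines during phase 1 and R lines during phase 2.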
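Bounded queues are what turn a pipeline's buffering into backpressure. When the consumer falls behind, `put` on a full `queue.Queue(maxsize=...)` blocks the producer instead of letting buffered items grow without limit. Below is a minimal two-thread sketch, assuming one producer and one consumer. `run_pipeline` is a hypothetical helper, and the sentinel object marks end-of-stream.

```python
import queue
import threading
from typing import Any, Callable, Iterable

_SENTINEL = object()  # end-of-stream marker


def run_pipeline(source: Iterable, work: Callable[[Any], Any], maxsize: int = 8) -> list:
    q: queue.Queue = queue.Queue(maxsize=maxsize)
    results: list = []

    def producer() -> None:
        for item in source:
            q.put(item)  # blocks while the queue is full: this is the backpressure
        q.put(_SENTINEL)

    def consumer() -> None:
        while (item := q.get()) is not _SENTINEL:
            results.append(work(item))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Error handling is deliberately omitted. If `work` raises, the consumer thread stops and the producer can block forever on a full queue, so a real pipeline needs a shutdown signal. Results are collected in a list only to keep the example checkable; a streaming pipeline would write them out as it goes.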
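For sequential `requests.get` loops, the usual remedy in I/O-bound code is bounded concurrency. The asyncio sketch below caps in-flight calls with a semaphore. `bounded_map` is an illustrative name, and `fake_fetch` is a hypothetical stand-in that sleeps instead of making a network request. In practice you would swap in a real async HTTP client.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def bounded_map(
    fn: Callable[[T], Awaitable[R]], items: Iterable[T], limit: int = 16
) -> list[R]:
    """Run fn over items with at most `limit` calls in flight; keep input order."""
    sem = asyncio.Semaphore(limit)

    async def one(item: T) -> R:
        async with sem:  # waits here once `limit` calls are already running
            return await fn(item)

    return await asyncio.gather(*(one(i) for i in items))


async def fake_fetch(url: str) -> int:
    # Hypothetical stand-in for an HTTP GET: simulate latency, return a "size".
    await asyncio.sleep(0.01)
    return len(url)
```

`asyncio.gather` returns results in input order, but it creates one task per item up front. For very long inputs, a fixed pool of worker tasks reading from a bounded `asyncio.Queue` keeps memory flat.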
How to use io-bound-data-processing
io-bound-data-processing is a Claude skill in the SKILL.md format. Add it to your Claude environment from the source repository below; it then activates as a user-invocable skill when your task matches its description.