io-bound-data-processing

Community

Processing, transforming, or moving datasets that may exceed RAM on a single low-compute box — covers memory discipline (streaming, generators, dtype shrinkage), I/O access patterns (sequential vs random, mmap, async), data formats (Parquet vs CSV vs JSON, predicate pushdown), chunking & batching, spill-to-disk (external merge sort, DuckDB/Polars), pipelining (bounded queues, backpressure, checkpointing), codec selection (zstd/lz4/gzip), concurrency for I/O-bound workloads (asyncio, threads, prefetch), and observability (iowait vs CPU%, rows/sec, py-spy/strace). Trigger on "process a large file", "stream this", "out-of-core", "OOM kill", "this is slow", or code with `pd.read_csv` of multi-GB files, `requests.get(...).content` on big bodies, `BytesIO` on unbounded inputs, per-row INSERTs, sequential `requests.get` loops, falling `tqdm` rates — even if I/O or memory isn't mentioned. Complement to computer-science-algorithms.
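As an illustration of the "streaming, generators" and "chunking & batching" items in the description, here is a minimal sketch of totaling one CSV column in bounded memory instead of loading the whole file. It is not taken from the skill itself; the names `iter_batches` and `sum_column`, the column name, and the batch size are hypothetical and chosen only for this example.

```python
import csv
from itertools import islice
from typing import Iterable, Iterator


def iter_batches(rows: Iterable, batch_size: int) -> Iterator[list]:
    """Yield lists of at most batch_size items without materializing the input."""
    it = iter(rows)
    # islice pulls only batch_size items at a time; an empty list ends the loop.
    while batch := list(islice(it, batch_size)):
        yield batch


def sum_column(path: str, column: str, batch_size: int = 10_000) -> float:
    """Stream a CSV file and total one numeric column.

    Hypothetical helper: at most one batch of rows is held in memory at a time.
    """
    total = 0.0
    with open(path, newline="") as f:
        reader = csv.DictReader(f)  # lazy: parses one row per iteration
        for batch in iter_batches(reader, batch_size):
            total += sum(float(row[column]) for row in batch)
    return total
```

Peak memory here is governed by `batch_size` rather than by file size, which is the property the description is after when it flags `pd.read_csv` on multi-GB files.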

Claude

147 stars · Updated 1 month ago

Allowed Tools

This skill does not declare a tool allowlist. The agent host applies whatever default tools are available at runtime.

Source

SKILL.md / Manifest

https://raw.githubusercontent.com/pproenca/dot-skills/master/skills/.experimental/io-bound-data-processing/SKILL.md

Registry

github (via claudemarketplaces.com)

Trust Score

53 (Fair)

Verification: 10/30
