1 comment

  • vertix 2 hours ago

    Hi HN, I’m Sergey, the author of pos3.

    I built this because I hit a wall with ML pipelines: I needed to feed S3 data into libraries that only understand local paths (like OpenCV's imread, pandas, or PyTorch), and I didn't want to rewrite all my I/O code around boto3 or s3fs.

    Unlike s3fs, which mounts S3 as a virtual filesystem (often slow for heavy random access), pos3 mirrors only the data you need to a local cache before your code block runs. Your script then reads and writes at native disk speed.

    It handles the diffing/syncing automatically using a context manager:

    ---

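    import pandas as pd
    import pos3

    with pos3.mirror():
        # download() mirrors the S3 prefix locally and returns a local path
        dataset = pos3.download("s3://bucket/dataset")
        df = pd.read_csv(dataset / "data.csv")

        # upload() returns a local directory that gets synced back to S3
        logs = pos3.upload("s3://bucket/output/", interval=30)
        df.to_csv(logs / "processed.csv")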
    ---
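
    To give a rough idea of what "diffing/syncing" means here, below is a simplified sketch. It is not pos3's actual code: sync_dir is a hypothetical helper, a plain local directory stands in for the bucket, and the "unchanged" check (same size, destination mtime not older) is just one common heuristic.

```python
import shutil
from pathlib import Path


def sync_dir(src: Path, dst: Path) -> list[str]:
    """Copy files from src to dst, skipping files that look unchanged.

    Hypothetical helper for illustration only. A file is treated as
    unchanged when the destination exists, has the same size, and its
    mtime (in whole seconds) is not older than the source's.
    Returns the relative paths that were actually copied.
    """
    copied = []
    for src_file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = src_file.relative_to(src)
        dst_file = dst / rel
        if dst_file.exists():
            s, d = src_file.stat(), dst_file.stat()
            if s.st_size == d.st_size and int(d.st_mtime) >= int(s.st_mtime):
                continue  # looks unchanged, skip the copy
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also preserves the mtime
        copied.append(rel.as_posix())
    return copied
```

    The first call copies everything, and later calls only copy files that are new or changed, which is why a warm cache makes repeated runs cheap.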

    It's open source (Apache 2.0). I'd love to hear your feedback, or how you've solved this differently!