Archive.org Scraping with Python


A Python script that creates a .csv of direct URLs, lets you trim its contents, and automates downloading one file at a time, all while working in the background and not acting too sus.

GitHub link: https://github.com/retrobuiltRyan/Archive.org-Web-scraping-with-Python/tree/main
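
The workflow described above (read a trimmed CSV of direct URLs, then fetch the files one at a time with a pause between requests) could be sketched roughly as follows. This is not the code from the linked repository. The function names, the one-URL-per-row CSV layout, and the 5-second default delay are all assumptions made for illustration.

```python
import csv
import time
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


def read_urls(csv_path):
    """Return the non-empty values from the first column of a CSV file.

    Assumed layout: one direct URL per row, no header row.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]


def local_name(url):
    """Derive a local file name from the last path segment of the URL."""
    return unquote(PurePosixPath(urlsplit(url).path).name)


def download_all(urls, dest_dir, delay_seconds=5.0, fetch=urllib.request.urlretrieve):
    """Download each URL in turn and return the paths that were written.

    Files already present in dest_dir are skipped. This check is by name
    only, so it does not detect a partial download left by an earlier run.
    The pause between requests (an assumed default) keeps the request rate low.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = dest / local_name(url)
        if target.exists():
            continue
        fetch(url, str(target))  # one file at a time, never in parallel
        saved.append(target)
        time.sleep(delay_seconds)
    return saved
```

A call such as `download_all(read_urls("urls.csv"), "downloads")` would then work through the list sequentially. Here `urls.csv` and `downloads` are placeholder names. The `fetch` parameter exists only so the download step can be swapped out, for example with a stub when testing.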

Comments

  1. @ChristopherCampNYC says:

    I simply used a Chrome extension. Way fewer clicks, no CSV, no Python environment.

  2. @Micro_Repairs says:

    Is there any reason not to use a BitTorrent client to do this? I don’t know how well seeded those files are, or whether the download speed is high enough for your use. But at least it avoids having to restart partial downloads, and it can verify the downloaded files.

  3. @HitchensTV says:

    Great tool, but I would really like to see the source code when visiting the repo, and not just a .zip with it packaged D:

    Makes me hesitate to grab it
