Archive.org Scraping with Python

Scraping

2025.05.31 2025.05.30

A Python script that creates a .csv of direct URLs, lets you trim the list, and automates downloading one file at a time, all while working in the background and not acting too sus.

GitHub link: https://github.com/retrobuiltRyan/Archive.org-Web-scraping-with-Python/tree/main
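The workflow described above (build a CSV of direct URLs, trim it, then download one file at a time with pauses) can be sketched roughly as follows. This is not the code from the repository, only a minimal illustration using the Python standard library. The function names, the CSV column names, and the 5-second delay are all assumptions made for the example; the `https://archive.org/download/<identifier>/<file name>` pattern is the usual form of an Archive.org direct link.

```python
import csv
import time
import urllib.parse
import urllib.request
from pathlib import Path

# Usual form of a direct link: https://archive.org/download/<identifier>/<file name>
BASE = "https://archive.org/download"


def direct_url(identifier, file_name):
    """Build a direct download URL, percent-encoding spaces and similar characters."""
    return f"{BASE}/{urllib.parse.quote(identifier)}/{urllib.parse.quote(file_name)}"


def write_url_csv(identifier, file_names, csv_path):
    """Write one row per file so the list can be trimmed by hand before downloading."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "url"])  # hypothetical column names
        for name in file_names:
            writer.writerow([name, direct_url(identifier, name)])


def read_url_csv(csv_path):
    """Read the (possibly trimmed) CSV back as a list of (file_name, url) pairs."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [(row["file_name"], row["url"]) for row in csv.DictReader(f)]


def download_all(csv_path, out_dir, delay_seconds=5.0):
    """Download files strictly one at a time, pausing between requests."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, url in read_url_csv(csv_path):
        target = out / Path(name).name
        if target.exists():
            continue  # skip files that are already present from an earlier run
        urllib.request.urlretrieve(url, str(target))
        time.sleep(delay_seconds)  # assumed delay; tune it to stay polite
```

A typical run would call `write_url_csv("some_item", names, "urls.csv")` with a placeholder identifier and file list, delete unwanted rows from `urls.csv` in a spreadsheet or text editor, and then call `download_all("urls.csv", "downloads")`. Note that the skip check only looks at whether a file exists, so a partially downloaded file would need to be deleted before rerunning.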

@ChristopherCampNYC says:

May 30, 2025 1:43 PM

I simply used a Chrome extension. Way fewer clicks, no CSV, no Python environment.

@Micro_Repairs says:

May 30, 2025 6:23 PM

Is there any reason not to use a BitTorrent client to do this? I don’t know how well seeded those files are, or whether the download speed is high enough for your use. But at least it avoids having to restart partial downloads, and it can verify the downloaded files.

@HitchensTV says:

May 30, 2025 7:51 PM

Great tool, but I would really like to see the source code when visiting the repo, and not just a .zip with it packaged D:

Makes me hesitate to grab it.
