Same as everyone else, curl + php or wget from the command line...
However, the other day I was scraping an ebook archive. I used curl + PHP to scrape the HTML pages and pull out the actual PDF URLs. Then in the next stage I downloaded all the PDFs using a shell script. I found that some hosts would answer wget with a 206 (Partial Content). Technically that is a success status for range requests rather than an error, but it still got in the way. I ended up having to use curl from the command line to get the goods.
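For anyone who wants the two stages in one place, here is a rough sketch done entirely in shell. It is not my exact script: the function names `extract_pdf_urls` and `download_all`, the file names in the usage comments, and the `example.com` URL are all made up for illustration, and it assumes the pages link to PDFs with absolute, double-quoted `href` attributes.

```shell
#!/bin/sh
# Sketch only: function names, file names and URLs are hypothetical.

# Stage 1: read HTML on stdin and print each absolute .pdf link once.
# Assumes links look like href="http://.../file.pdf" (double-quoted, absolute).
extract_pdf_urls() {
    grep -Eo 'href="https?://[^"]+\.pdf"' \
        | sed -e 's/^href="//' -e 's/"$//' \
        | sort -u
}

# Stage 2: download every URL listed in a file (one per line) with curl.
download_all() {
    list_file=$1
    out_dir=$2
    mkdir -p "$out_dir"
    while IFS= read -r url; do
        # Skip blank lines in the list.
        [ -n "$url" ] || continue
        # -f: treat HTTP errors as failures, -L: follow redirects,
        # --retry 3: retry transient failures, -o: choose the output file name.
        curl -fL --retry 3 -o "$out_dir/$(basename "$url")" "$url" \
            || echo "failed: $url" >&2
    done < "$list_file"
}

# Example usage (hypothetical URL and paths):
#   curl -fsL 'http://example.com/ebooks/index.html' | extract_pdf_urls > pdf_urls.txt
#   download_all pdf_urls.txt ./pdfs
```

Keeping the URL list in a file between the two stages makes it easy to rerun only the download step for the hosts that misbehave.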