GoFuckYourself.com - Adult Webmaster Forum - View Single Post

AutumnBH · 12-07-2012, 04:00 PM

Same as everyone else, curl + php or wget from the command line...

However, the other day I was scraping an ebook archive. I used curl + PHP to scrape the HTML pages and pull out the actual PDF URLs. Then in the next stage I downloaded all the PDFs using a shell script. I found that some hosts would return a 206 (Partial Content) when using wget. Strictly speaking 206 is the status for a range request rather than an error, but wget wasn't working for me on those hosts. I ended up having to use curl from the command line to get the goods.
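
For anyone wanting to do the same, here is a rough sketch of what the download stage could look like with curl. It is not the exact script I ran. The function names `pdf_name` and `fetch_pdfs`, the list file `urls.txt` and the `pdfs` directory are made up for illustration, and it assumes the scrape stage already wrote one PDF URL per line to a text file.

```shell
#!/bin/sh
# Sketch only: pdf_name and fetch_pdfs are hypothetical helper names.

# Derive a local filename from a URL: drop any query string,
# then keep the last path segment.
pdf_name() {
    url_no_query=${1%%\?*}
    basename "$url_no_query"
}

# Download every URL listed in file $1 (one per line) into directory $2.
# Returns non-zero if any download failed.
fetch_pdfs() {
    list=$1
    dest=${2:-.}
    mkdir -p "$dest" || return 1
    failed=0
    while IFS= read -r url; do
        [ -n "$url" ] || continue   # skip blank lines
        out="$dest/$(pdf_name "$url")"
        # -f   fail on HTTP error responses instead of saving the error page
        # -L   follow redirects
        # -sS  no progress meter, but still print errors
        # --retry 3  retry transient failures a few times
        if ! curl -fL -sS --retry 3 -o "$out" "$url" < /dev/null; then
            echo "failed: $url" >&2
            failed=$((failed + 1))
        fi
    done < "$list"
    [ "$failed" -eq 0 ]
}
```

With the functions loaded, something like `fetch_pdfs urls.txt pdfs` would pull everything into `pdfs/` and list any URLs that failed on stderr so they can be retried later.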
