04-08-2017, 10:59 AM
johnnyloadproductions
Account Shutdown
Join Date: Oct 2008
Location: Gone
Posts: 3,611
GFY parsed profile data dump (nothing sensitive, just for the curious), and how I did it.

There are no sigs or anything else that could be spammed in the database or CSV files at the download link. This is simply for those who are curious; here is an image of what to expect from the SQL and CSV files.

https://www.dropbox.com/sh/3shozxdxs...qRDg37Sda?dl=0

There are over 250k rows in uses_view.csv, so it might take a while to load.
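If a spreadsheet chokes on a file that size, streaming it with Python's `csv` module is one way to get quick numbers. This is a minimal sketch, not part of the original download: `summarize_rows` is a hypothetical helper, and since the post doesn't list the CSV's columns, any column name you pass (such as `location`) is only an example.

```python
import csv
from collections import Counter


def summarize_rows(lines, column=None):
    """Count CSV data rows (and optionally tally one column) while streaming.

    `lines` can be an open file object, so the whole file never has to sit
    in memory at once the way it does when a spreadsheet opens it.
    """
    rows = 0
    tally = Counter()
    # DictReader uses the first line as the header row.
    for row in csv.DictReader(lines):
        rows += 1
        if column is not None:
            # Hypothetical column name; the real headers are not documented.
            tally[row.get(column) or ""] += 1
    return rows, tally
```

To run it on the real file, pass a file object opened with `open("uses_view.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption about how the dump was exported.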

Profiles aren't like regular threads: I needed a valid logged-in session to visit a person's profile URL, so I used my own account. That means that, for the time being, just about every account from before 2014 has "johnnyloadproductions" listed as a recent visitor.

I used Python and the Selenium WebDriver, along with PyVirtualDisplay, to drive Iceweasel on a Raspberry Pi and collect the profile data. This ran in the background for about a month. A cron job would fire up the script, which would then direct Iceweasel (basically Firefox) to go to "profilexxxx.html".
GFY uses friendly URLs, but fortunately I could still bot through the profile numbers in order and go through them all.
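The numbered crawl can be sketched roughly like this. It is a minimal sketch under several assumptions, not the original script: it assumes `xxxx` in "profilexxxx.html" is just the decimal profile number, `base_url` is a placeholder rather than GFY's real address, and `fetch`, `crawl_batch`, and the `last_profile.json` checkpoint file are hypothetical names. Saving the next profile number after each page lets every cron run pick up where the previous one stopped.

```python
import json
import time
from pathlib import Path

# Hypothetical checkpoint file remembering where the last cron run stopped.
CHECKPOINT = Path("last_profile.json")


def profile_url(base_url, number):
    """Build a numbered profile URL following the "profilexxxx.html" pattern.

    Assumes xxxx is simply the decimal profile number; base_url is a
    placeholder, not the forum's actual address.
    """
    return f"{base_url.rstrip('/')}/profile{number}.html"


def load_checkpoint(path=CHECKPOINT, default=1):
    """Return the next profile number to visit (default on the first run)."""
    if path.exists():
        return json.loads(path.read_text())["next"]
    return default


def save_checkpoint(next_number, path=CHECKPOINT):
    path.write_text(json.dumps({"next": next_number}))


def crawl_batch(fetch, base_url, batch_size=100, delay=0.0, path=CHECKPOINT):
    """Visit the next `batch_size` profiles in numeric order, then stop.

    `fetch` is any callable that takes a URL and returns the page HTML.
    Progress is saved after every profile so the next cron run resumes.
    """
    start = load_checkpoint(path)
    pages = {}
    for number in range(start, start + batch_size):
        pages[number] = fetch(profile_url(base_url, number))
        save_checkpoint(number + 1, path)
        time.sleep(delay)  # pause between requests to go easy on the server
    return pages
```

In the original setup the page loading was done by Selenium, so `fetch` would be a small function that calls `driver.get(url)` and returns `driver.page_source`, with a PyVirtualDisplay `Display` started first so Iceweasel has a screen to draw on. Keeping `fetch` as a parameter also makes the loop easy to test without a browser.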

Writing a robust parser took several hours, but it ended up working pretty well. It's just Python with BeautifulSoup (an HTML parsing library) and PyMySQL to talk to a MySQL database.
Parsing everything took 8 hours.
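Here is a dependency-free sketch of the parse-and-store step. BeautifulSoup and PyMySQL are third-party packages, so this version swaps in Python's built-in `html.parser` and `sqlite3` instead; it is not the author's parser. The `<dt>`/`<dd>` profile layout, the `profiles` table, and its columns are hypothetical, chosen only to mirror fields like Join Date, Location, and Posts.

```python
import sqlite3
from html.parser import HTMLParser


class ProfileFieldParser(HTMLParser):
    """Collect <dt>label</dt><dd>value</dd> pairs from a profile page.

    The <dt>/<dd> layout is a hypothetical stand-in; real GFY markup may differ.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._tag = None    # "dt" or "dd" while inside one of them
        self._label = None  # last label seen, waiting for its value
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # ignore closing tags nested inside dt/dd, e.g. </b>
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if tag == "dt":
            self._label = text.rstrip(":")
        elif self._label is not None:
            self.fields[self._label] = text
            self._label = None
        self._tag = None


def parse_profile(html):
    parser = ProfileFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        "number INTEGER PRIMARY KEY, join_date TEXT, location TEXT, posts TEXT)"
    )
    return conn


def store_profile(conn, number, fields):
    # Parameterized query: values are never pasted into the SQL string.
    conn.execute(
        "INSERT OR REPLACE INTO profiles (number, join_date, location, posts) "
        "VALUES (?, ?, ?, ?)",
        (number, fields.get("Join Date"), fields.get("Location"), fields.get("Posts")),
    )
    conn.commit()
```

With PyMySQL the placeholders would be `%s` instead of `?`, and SQLite's `INSERT OR REPLACE` would become MySQL's `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`. With BeautifulSoup, the same pairing could be done with `soup.find_all("dt")` and each tag's `find_next_sibling("dd")`.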

I'm pretty sure someone has done this in the past for a webmaster spam campaign.

Bots are nice and can simplify tasks for you, so I thought I would share this information with all of you.
In general it's nice when a service has an API, but if it doesn't, you can do something similar to this to upload videos or images, make posts at scheduled times, bypass captchas, etc.