GFY parsed profile data dump (nothing sensitive, just for the curious), and how I did it.
There are no sigs or anything that could be spammed in the database or CSV files at the download link. This is simply for those who are curious; the image shows what to expect from the SQL and CSV files.
https://www.dropbox.com/sh/3shozxdxs...qRDg37Sda?dl=0
http://i.imgur.com/PehEayL.png

There are over 250k rows in uses_view.csv, so it might take a while to load.

Profiles aren't like general threads: I had to have a valid session, logged in with my account, to visit a person's profile URL. That means that, for the time being, just about every account from before 2014 lists "johnnyloadproductions" as a recent visitor.

I used Python and the Selenium WebDriver, along with pyvirtualdisplay, to drive Iceweasel on a Raspberry Pi and collect the profile data. This ran in the background for about a month. A cron job would start the script, which would then direct Iceweasel (basically Firefox) to go to "profilexxxx.html". GFY uses friendly URLs, but fortunately I could still bot through the profile numbers in order and visit them all.

Writing a robust parser took several hours but ended up working pretty well. It's just Python with BeautifulSoup (an HTML parsing library) and PyMySQL to talk to a MySQL database. Parsing took 8 hours.

I'm pretty sure someone has done this in the past for a webmaster spam campaign. Bots are nice and can simplify tasks for you, so I thought I would share this information with all of you. In general it's nice if a service has an API, but if it doesn't, you can do something similar to this to upload videos or images, make posts at scheduled times, bypass captchas, etc.
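For the curious, here is a minimal sketch of the crawl, parse and store loop described above. It rests on loud assumptions: the URL template and the `<dt>`/`<dd>` profile markup are hypothetical (the post doesn't show GFY's real URLs or HTML), the standard library's `html.parser` stands in for BeautifulSoup, and `sqlite3` stands in for MySQL/PyMySQL so the example runs on its own. Fetching is passed in as a function, so a Selenium driver or a stub can be plugged in.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical URL template: the post only says "profilexxxx.html",
# so the host and exact pattern here are placeholders.
PROFILE_URL = "https://forum.example.com/profile{num}.html"


def profile_urls(start, stop):
    """Yield (number, url) for profile numbers start..stop-1, in order."""
    for num in range(start, stop):
        yield num, PROFILE_URL.format(num=num)


class ProfileParser(HTMLParser):
    """Collect <dt>label</dt><dd>value</dd> pairs from a profile page.

    The dt/dd layout is an assumed markup for illustration only. Only the
    first text chunk inside each <dd> is kept.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._tag = None    # "dt" or "dd" while inside one, else None
        self._label = None  # most recent <dt> text awaiting its <dd>

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._tag is None:
            return
        if self._tag == "dt":
            self._label = text.rstrip(":")
        elif self._label is not None:  # inside <dd> with a pending label
            self.fields[self._label] = text
            self._label = None


def parse_profile(html):
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def init_db(conn):
    # sqlite3 stands in for MySQL here; with PyMySQL the placeholders
    # would be %s instead of ?.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles (num INTEGER, field TEXT, value TEXT)"
    )


def crawl(fetch, conn, start, stop):
    """Fetch, parse and store each profile in numeric order.

    fetch(url) must return the page HTML. With Selenium it could call
    driver.get(url) and then return driver.page_source.
    """
    init_db(conn)
    for num, url in profile_urls(start, stop):
        fields = parse_profile(fetch(url))
        if not fields:  # e.g. a login page or a deleted profile
            continue
        conn.executemany(
            "INSERT INTO profiles (num, field, value) VALUES (?, ?, ?)",
            [(num, key, value) for key, value in fields.items()],
        )
        conn.commit()
```

Storing one row per (profile, field) pair keeps the schema flexible when profiles expose different fields. Pages that yield no fields, such as a login prompt after the session expires, are skipped rather than stored. A real run would also want error handling and a delay between requests.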
I doubt they like people scraping the site and causing heavy loads ;)
What you are doing is a bit odd. All this drama so you can post your results on GFY and get 5 "interesting stats" replies? You are probably bored or trying to fine-tune your skills, but why not crawl and parse something that has some value? For example, some tube site, to discover the most common keywords/niches/paysites/models/etc. That data would be 1000x more valuable, and you could actually score a few bucks by selling it or just using the data yourself...
Those are all good suggestions, and something I could spin off or work on with several people. I'd be willing to work with people in the future in some kind of partnership for data and stat gathering. I like your posts, woj, even if I troll you some.
Dump that shit into Firebase; nobody wants to download a 6TB CSV.
interesting stats
Powered by vBulletin® Version 3.8.8
Copyright ©2000 - 2025, vBulletin Solutions, Inc.
©2000-, AI Media Network Inc