Discuss what's fucking going on, and which programs are best and worst. One-time "program" announcements from "established" webmasters are allowed. |
#1 |
Account Shutdown
Industry Role:
Join Date: Oct 2008
Location: Gone
Posts: 3,611
|
.jpeg, .png, .gif, .webm
I'm starting to realize the power of well-thought-out bots. Before anyone gets on me about being liberal with scraping and content: I thought it would be awesome to hash images in multiple formats and build a database of those hashes, then find out which images are popular (they would appear multiple times) and see whether they come from a sponsor of some kind. Google works this way with their image search, but this could be useful for affiliates and/or programs. Meaning? Perhaps those are good ones to promote. This is my bot-sourced, crowd-sourced way of telling what's hot without anyone directly telling me. Weeeeeeeeeee!!!!!!

Simple script. It's brute force, which means if a thread is seen again it will overwrite all the files again; it's probably pulled 200GB of files. Will modify in the future. Cool stuff, you are happy that I shared!!!!!

Other idea is setting up something where, if you have an image and can't quite figure out the rest of the series, you search the hash and it pulls up a thread that just may have those images. FUCKING AWESOME IDEAS!!!!
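For anyone wanting to try the hash-database idea, here is a minimal sketch in Python 3. It is not the poster's code. It assumes the files are already on disk under a hypothetical layout of `root/<thread_id>/<filename>`, and the function names (`file_digest`, `build_index`, `most_popular`, `threads_containing`) are made up for illustration. It uses SHA-256 from the standard library, so it only matches byte-identical copies; the same picture re-encoded in another format would need some kind of perceptual hash instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumption: index the extensions named in the thread title, plus ".jpg".
MEDIA_EXTENSIONS = {".jpeg", ".jpg", ".png", ".gif", ".webm"}


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root):
    """Map each digest to the set of thread folders that contain it.

    Assumed (hypothetical) layout: root/<thread_id>/<filename>.
    """
    index = defaultdict(set)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS:
            index[file_digest(path)].add(path.parent.name)
    return index


def most_popular(index, top=10):
    """Return (digest, thread_count) pairs, most widely reposted first."""
    ranked = sorted(index.items(), key=lambda item: len(item[1]), reverse=True)
    return [(digest, len(threads)) for digest, threads in ranked[:top]]


def threads_containing(index, path):
    """List the threads that hold a byte-identical copy of the given file."""
    return sorted(index.get(file_digest(path), set()))
```

`most_popular` covers the "what's hot" part and `threads_containing` covers the "find the rest of the series" part. An in-memory dict is only for illustration; a real run over a large download folder would probably want the index stored in a database.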
#2 |
confirmed loser
Industry Role:
Join Date: Jul 2012
Location: Florida
Posts: 1,092
|
Lapd.mobi seems to scrape 4chan threads. I searched an image and it came up with it.
|
#3 |
So Fecking Bummed
Industry Role:
Join Date: Aug 2014
Posts: 3,682
|
Thought you were going to share the bot.
Like your stuff though, intelligent. |
#4 | |
Fakecoin Investor
Industry Role:
Join Date: Jul 2012
Location: New Delhi, IN
Posts: 7,127
|
Quote:
__________________
WARNING: Stay Away From Marlboroack aka aka Brandon Ackerman
https://gfy.com/21169705-post8.html Donny Long is Felon, Stalker, Scammer & Coward http://www.ripoffreport.com/reports/...lon-int-761244 |
|
#5 |
Account Shutdown
Industry Role:
Join Date: Oct 2008
Location: Gone
Posts: 3,611
|
Here. You can run this on Digital Ocean or any PC in your house; you'll just need Python installed and a cron job to run it.
I have a cron job set up to run every 10 minutes. It parses the front-page threads, then follows them and scrapes the media in those threads.

Set up a cron job with crontab -e (or sudo crontab -e), then at the bottom of the file input:

    */10 * * * * python /home/4chan/main.py

Code:

    # Edit this file to introduce tasks to be run by cron.
    #
    # Each task to run has to be defined through a single line
    # indicating with different fields when the task will be run
    # and what command to run for the task
    #
    # To define the time you can provide concrete values for
    # minute (m), hour (h), day of month (dom), month (mon),
    # and day of week (dow) or use '*' in these fields (for 'any').
    #
    # Notice that tasks will be started based on the cron's system
    # daemon's notion of time and timezones.
    #
    # Output of the crontab jobs (including errors) is sent through
    # email to the user the crontab file belongs to (unless redirected).
    #
    # For example, you can run a backup of all your user accounts
    # at 5 a.m every week with:
    # 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
    #
    # For more information see the manual pages of crontab(5) and cron(8)
    #
    # m h  dom mon dow   command
    0 0 */2 * * python /home/craigslist/Rasp1/final_craigslist.py
    */10 * * * * python /home/4chan/main.py

Code:

    # Python 2 script (uses urllib2 and print statements).
    from bs4 import BeautifulSoup
    import urllib
    import urllib2
    import os
    import time

    start = time.time()


    def get_threads(url, board, dir):
        # Fetch the board front page and collect the reply links of its threads.
        response = urllib2.urlopen(url)
        soup = BeautifulSoup(response.read(), "lxml")
        links = soup.find_all("a", attrs={"class": "replylink"})
        for link in links:
            link_string = link['href']
            thread_id = link_string.split('/')
            print thread_id[1]
            # Open each thread and collect the links to its full-size media files.
            thread_url = "http://boards.4chan.org/" + dir + "/thread/" + thread_id[1]
            thread_response = urllib2.urlopen(thread_url)
            image_urls = BeautifulSoup(thread_response.read(), "lxml")
            images = image_urls.find_all("a", attrs={"class": "fileThumb"})
            # Change this to the path directory you want to save it to. This was for a usb drive.
            directory = os.path.dirname("/media/4chan/" + thread_id[1])
            if not os.path.exists(directory + "/thread/" + thread_id[1]):
                os.makedirs(directory + "/thread/" + thread_id[1])
            for image in images:
                string = image['href']
                # The file name is whatever follows the board segment of the URL.
                one = string.split('/b/')
                urllib.urlretrieve("http:" + image['href'], directory + "/thread/" + thread_id[1] + "/" + one[1])


    prepend = ["boards",]
    append = ['b',]
    for dir in append:
        for board in prepend:
            print board
            url = "http://{}.4chan.org/{}".format(board, dir)
            print "This is the directory: " + dir
            get_threads(url, board, dir)

    end = time.time()
    print(end - start)
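The first post mentions that the script is brute force and overwrites every file when a thread is seen again. One possible way around that, sketched here under assumptions, is to check the target path before fetching anything. The helper names `needs_download` and `pending_downloads` are hypothetical and not part of the posted script; the sketch does no network access and should run unchanged on Python 2 or 3.

```python
import os


def needs_download(target_path):
    """Return True when the file is missing or empty on disk."""
    return not os.path.exists(target_path) or os.path.getsize(target_path) == 0


def pending_downloads(pairs):
    """Filter (url, target_path) pairs down to those not saved yet."""
    return [(url, path) for url, path in pairs if needs_download(path)]
```

In the posted script, the natural place for such a check would be right before the `urllib.urlretrieve` call, so a repeated cron run only fetches files it does not already have. Treating zero-byte files as missing is a guess at how an interrupted download might look, not something the thread confirms.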
#6 |
Confirmed User
Industry Role:
Join Date: Jul 2013
Posts: 2,725
|
that's a big order of cheese pizza.
|
#7 |
So Fecking Bummed
Industry Role:
Join Date: Aug 2014
Posts: 3,682
|
I have Py installed, I will try to get it working if I have time.
|
#8 |
So Fucking Banned
Industry Role:
Join Date: May 2001
Location: Your mom's front hole
Posts: 40,906
|
|
#9 |
Confirmed User
Join Date: Sep 2009
Posts: 5,795
|
No shit. That's the last place I'd be scraping images from.
__________________
Get Paid Per Email Like The WEGCASH Days!!!! |
#10 |
So Fucking Banned
Industry Role:
Join Date: May 2001
Location: Your mom's front hole
Posts: 40,906
|
|
#11 |
So Fucking Banned
Industry Role:
Join Date: May 2001
Location: Your mom's front hole
Posts: 40,906
|
What happened to the bot?
|
#12 | |
Fakecoin Investor
Industry Role:
Join Date: Jul 2012
Location: New Delhi, IN
Posts: 7,127
|
Quote:
The bro thing to do now is put up a paywall and charge for those... list every adult producer as a 2257 page.
|
#13 |
Confirmed Fetishist
Industry Role:
Join Date: Mar 2005
Location: Fetishland
Posts: 11,522
|
nice