GoFuckYourself.com - Adult Webmaster Forum
Discuss what's fucking going on, and which programs are best and worst. One-time "program" announcements from "established" webmasters are allowed.
03-23-2016, 11:18 AM   #1
johnnyloadproductions
Account Shutdown
Join Date: Oct 2008
Location: Gone
Posts: 3,611
4chan /b/: 2 days, 56 GB, 121,830 media files.

.jpeg, .png, .gif, .webm

I'm starting to realize the power of well-thought-out bots.

Before anyone gets on me about being liberal with scraping and content: I thought it would be awesome to hash the images in multiple formats, build a database of those hashes, then find out which images are popular (they would appear multiple times) and see whether they come from a sponsor of some kind.
Google's image search works along similar lines, but this could be useful for affiliates and/or programs.
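A minimal sketch of the hash database idea, assuming the files are laid out the way the scraper below saves them (<root>/thread/<thread id>/<file>). It uses SHA-256 from Python's hashlib and a hypothetical SQLite table `media(digest, thread_id, path)`; "popular" is taken to mean "appears in the most distinct threads". An exact hash only matches byte-identical files, so the same picture re-encoded into another format or resized would need a perceptual hash instead, which this sketch does not attempt.

```python
import hashlib
import sqlite3
from pathlib import Path

# The extensions mentioned in the post, plus .jpg (assumed to be used as well).
MEDIA_EXTENSIONS = {".jpeg", ".jpg", ".png", ".gif", ".webm"}


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_directory(conn, root):
    """Hash every media file under root; the parent folder name is the thread id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS media "
        "(digest TEXT, thread_id TEXT, path TEXT PRIMARY KEY)"
    )
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS:
            # Re-indexing the same path replaces its old row instead of duplicating it.
            conn.execute(
                "INSERT OR REPLACE INTO media VALUES (?, ?, ?)",
                (sha256_of_file(path), path.parent.name, str(path)),
            )
    conn.commit()


def most_popular(conn, limit=10):
    """Digests seen in the most distinct threads: a rough 'what's hot' signal."""
    return conn.execute(
        "SELECT digest, COUNT(DISTINCT thread_id) AS threads FROM media "
        "GROUP BY digest ORDER BY threads DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

After a scrape, something like `conn = sqlite3.connect("media_hashes.db")`, `index_directory(conn, "/media/4chan")` and then `most_popular(conn)` would list the top digests with their thread counts; matching those back to a sponsor is still a manual step.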

Meaning? Perhaps it's a good idea to promote them?

This is my bot-sourced, crowd-sourced way of telling what's hot without anyone directly telling me. Weeeeeeeeeee!!!!!!

Simple script. It's brute force, which means that if a thread is seen again, all of its files are downloaded and overwritten again; it has probably pulled 200 GB of files. Will modify in the future.
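The overwrite problem could be handled by checking for the file before downloading. A minimal sketch, assuming a file's name on the board does not change once it is posted (so an existing non-empty file can be trusted). `download_if_missing` and the `.part` suffix are hypothetical names, and `fetch` is a parameter only so the function can be tried without network access.

```python
import os
import urllib.request


def download_if_missing(url, dest_path, fetch=urllib.request.urlretrieve):
    """Download url to dest_path unless a non-empty file is already there.

    Returns True if a download happened, False if it was skipped.
    """
    if os.path.exists(dest_path) and os.path.getsize(dest_path) > 0:
        return False
    # Write to a temporary name first, so an interrupted run never leaves a
    # half-written file that later runs would wrongly skip.
    tmp_path = dest_path + ".part"
    fetch(url, tmp_path)
    os.replace(tmp_path, dest_path)
    return True
```

In the scraper, the `urlretrieve(...)` call inside the image loop would become `download_if_missing(url, path)` with the same two arguments.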

Cool stuff, you are happy that I shared!!!!!

Another idea: set up something where, if you have an image and can't quite figure out the rest of the series, you search its hash and it pulls up a thread that may have those images.
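A sketch of that reverse lookup, assuming a hypothetical SQLite table `media(digest, thread_id, path)` filled with SHA-256 digests of every saved file. It finds the threads holding a byte-identical copy of the image, then lists the other files saved from those threads as candidates for the rest of the series. Like any exact hash, it will not match a resized or re-encoded copy.

```python
import hashlib
import sqlite3


def file_digest(path):
    """Hex SHA-256 of a file's bytes."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def threads_containing(conn: sqlite3.Connection, image_path):
    """Thread ids whose saved media includes a byte-identical copy of image_path."""
    rows = conn.execute(
        "SELECT DISTINCT thread_id FROM media WHERE digest = ? ORDER BY thread_id",
        (file_digest(image_path),),
    )
    return [row[0] for row in rows]


def rest_of_series(conn: sqlite3.Connection, image_path):
    """Other files saved from the same threads: candidates for the rest of the set."""
    digest = file_digest(image_path)
    rows = conn.execute(
        "SELECT DISTINCT m2.path FROM media AS m1 "
        "JOIN media AS m2 ON m1.thread_id = m2.thread_id "
        "WHERE m1.digest = ? AND m2.digest != ? ORDER BY m2.path",
        (digest, digest),
    )
    return [row[0] for row in rows]
```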

FUCKING AWESOME IDEAS!!!!

03-23-2016, 02:20 PM   #2
gnawledge
confirmed loser
Join Date: Jul 2012
Location: Florida
Posts: 1,092
Lapd.mobi seems to scrape 4chan threads. I searched for an image and it came up.
03-23-2016, 02:35 PM   #3
clickity click
So Fecking Bummed
Join Date: Aug 2014
Posts: 3,682
Thought you were going to share the bot.
Like your stuff though, intelligent.
03-23-2016, 02:36 PM   #4
xXXtesy10
Fakecoin Investor
Join Date: Jul 2012
Location: New Delhi, IN
Posts: 7,127
Quote:
Originally Posted by johnnyloadproductions View Post
.jpeg, .png, .gif, .webm

that's fucking awesome bro! now how much cp you got on your pc?
__________________
WARNING: Stay Away From Marlboroack aka aka Brandon Ackerman
https://gfy.com/21169705-post8.html
Donny Long is Felon, Stalker, Scammer & Coward
http://www.ripoffreport.com/reports/...lon-int-761244
03-23-2016, 03:08 PM   #5
johnnyloadproductions
Account Shutdown
Join Date: Oct 2008
Location: Gone
Posts: 3,611
Quote:
Originally Posted by clickity click View Post
Thought you were going to share the bot.
Here. You can run this on DigitalOcean or any PC in your house; you'll just need Python installed and a cron job.

I have a cron job set up to run every 10 minutes: it parses the front-page threads, then follows them and scrapes the media in those threads.

Set up a cron job:
crontab -e
or
sudo crontab -e

then add this line at the bottom of the file:
*/10 * * * * python /home/4chan/main.py
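Cron starts the job every 10 minutes whether or not the previous run has finished, so a slow scrape could overlap with the next one. One guard, sketched here assuming a Linux/Unix box (the fcntl module is not available on Windows), is an exclusive non-blocking lock taken at the start of main.py; `acquire_run_lock` and the lock path are hypothetical. Wrapping the cron command in util-linux's `flock -n <lockfile> <command>` does the same thing without touching the script.

```python
import fcntl

# Hypothetical lock location; any path writable by the cron user works.
LOCK_PATH = "/tmp/4chan_scraper.lock"


def acquire_run_lock(path=LOCK_PATH):
    """Return an open, locked file if no other run holds the lock, else None.

    Keep the returned handle open for the whole run; the lock is released
    when it is closed or when the process exits.
    """
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another run still holds the lock.
        handle.close()
        return None
    return handle
```

At the top of main.py, `lock = acquire_run_lock()` followed by an immediate exit when it is `None` would make an overlapping run quit instead of scraping the same threads twice.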

Code:
# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h  dom mon dow   command
0 0 */2 * * python /home/craigslist/Rasp1/final_craigslist.py
*/10 * * * * python /home/4chan/main.py


Code:
# Python 3 version; needs the beautifulsoup4 and lxml packages.
# If "python" on your system is Python 2, use "python3" in the cron line.
from bs4 import BeautifulSoup
import os
import time
import urllib.request

# Change this to the directory you want to save to. This was for a USB drive.
SAVE_ROOT = "/media/4chan"


def fetch(url):
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def get_threads(board):
    """Follow every thread linked from a board's front page and save its media."""
    soup = BeautifulSoup(fetch("http://boards.4chan.org/{}".format(board)), "lxml")

    # Reply links on the front page look like "thread/<id>"; keep each id once.
    thread_ids = []
    for link in soup.find_all("a", attrs={"class": "replylink"}):
        thread_id = link["href"].split("/")[1]
        if thread_id not in thread_ids:
            thread_ids.append(thread_id)

    for thread_id in thread_ids:
        print(thread_id)
        thread_url = "http://boards.4chan.org/{}/thread/{}".format(board, thread_id)
        thread_soup = BeautifulSoup(fetch(thread_url), "lxml")

        thread_dir = os.path.join(SAVE_ROOT, "thread", thread_id)
        os.makedirs(thread_dir, exist_ok=True)

        # "fileThumb" anchors hold protocol-relative links ("//host/<board>/<file>")
        # to the full-size media; the last path segment is used as the file name.
        for image in thread_soup.find_all("a", attrs={"class": "fileThumb"}):
            href = image["href"]
            filename = href.rsplit("/", 1)[-1]
            # Brute force: files from a thread seen before are downloaded and overwritten again.
            urllib.request.urlretrieve("http:" + href, os.path.join(thread_dir, filename))


start = time.time()
for board in ["b"]:
    print("Board: " + board)
    get_threads(board)
print(time.time() - start)
03-23-2016, 03:12 PM   #6
ITraffic
Confirmed User
Join Date: Jul 2013
Posts: 2,725
that's a big order of cheese pizza.
03-23-2016, 03:23 PM   #7
clickity click
So Fecking Bummed
Join Date: Aug 2014
Posts: 3,682
I have Python installed; I'll try to get it working if I have time.
03-23-2016, 11:03 PM   #8
OneHungLo
So Fucking Banned
Join Date: May 2001
Location: Your mom's front hole
Posts: 40,906
Quote:
Originally Posted by xXXtesy10 View Post
that's fucking awesome bro! now how much cp you got on your pc?
Inb4 the cops are knocking on johnny's door...again.
03-23-2016, 11:21 PM   #9
MrBottomTooth
Confirmed User
Join Date: Sep 2009
Posts: 5,795
No shit. That's the last place I'd be scraping images from.
03-23-2016, 11:35 PM   #10
OneHungLo
So Fucking Banned
Join Date: May 2001
Location: Your mom's front hole
Posts: 40,906
Quote:
Originally Posted by MrBottomTooth View Post
No shit. That's the last place I'd be scraping images from.
I wonder if Johnny was hangin at the local college library scraping /b/ @ 2am via a bought student id
03-24-2016, 10:47 AM   #11
OneHungLo
So Fucking Banned
Join Date: May 2001
Location: Your mom's front hole
Posts: 40,906
What happened to the bot?
03-24-2016, 10:49 AM   #12
xXXtesy10
Fakecoin Investor
Join Date: Jul 2012
Location: New Delhi, IN
Posts: 7,127
Quote:
Originally Posted by MrBottomTooth View Post
No shit. That's the last place I'd be scraping images from.
the bro thing to do now is put up a paywall and charge for those... list every adult producer on the 2257 page.
03-24-2016, 10:51 AM   #13
bns666
Confirmed Fetishist
Join Date: Mar 2005
Location: Fetishland
Posts: 11,522
nice
©2000-, AI Media Network Inc