Welcome to the GoFuckYourself.com - Adult Webmaster Forum forums.

04-10-2014, 07:34 PM   #1
johnnyloadproductions
Account Shutdown
 
Industry Role:
Join Date: Oct 2008
Location: Gone
Posts: 3,611
Forget mining bitcoin or any other crypto currency, mine GFY, with a Raspberry Pi?!?!?!?!

Over time I collected the "Fucking Around and Discussion" forum on here: raw scrapes with PHP cURL that amounted to over 150GB of text. Granted, every raw scrape included analytics code and other redundant code for a person's browser, but it's still a lot of text, more than Paul Markham could ever hope to type.

I've had a Pi computer for almost 2 weeks now and love it; I also have a BeagleBone Black.
I set out to make good use of my Pi (however impractical it may sound) to mine text. This little computer below only hums along at about 3 watts, excluding peripherals, and it can be overclocked from 700 MHz to 1 GHz.

I was originally going to use either Perl or Python to mine the data, but found I was running into too many issues to make it go smoothly. I set up preliminaries and debugging on my MacBook Air, and either the Air or the Pi would have issues running frameworks that relied on too many dependencies.
I was planning on using the Python framework Scrapy and had it installed and working on the Pi, but Scrapy only likes to pull URLs and I couldn't make an alias that pointed a localhost URL at an external drive, on either Apache or Nginx.

I returned to PHP, doink. Everything plugs and plays so well with the LAMP(HP) stack you just can't go wrong with it. I used the Simple HTML DOM Parser and have gotten underway with parsing; it does everything I need. Unfortunately the Pi is painfully slow, and millions of threads do add up. Luckily I can just set it up on a desk, temporarily plug a keyboard and monitor into it, run the script from the command line with php, and it'll just run indefinitely (assuming no errors are thrown). I can SSH into the thing, but the script will stop as soon as I terminate the session. ONLY 3 WATTS OF POWER!

http://simplehtmldom.sourceforge.net/
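
A rough sketch of the parsing step, written in Python's standard library rather than the PHP Simple HTML DOM Parser mentioned above. It assumes each post body sits in a div whose id starts with post_message_ followed by the numeric post ID; that prefix is an assumption about the saved pages, so check it against the real scrapes. Script and style content (the analytics code) is dropped along the way.

```python
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collect the text of every <div id="post_message_<n>"> on a page."""

    PREFIX = "post_message_"  # assumed id prefix, not confirmed by the thread

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.posts = {}        # post id -> list of text chunks
        self._current = None   # id of the post we are inside, if any
        self._depth = 0        # div nesting depth inside the current post
        self._skip = 0         # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
            return
        if tag == "br" and self._current is not None:
            self.posts[self._current].append(" ")  # keep words apart
            return
        if tag != "div":
            return
        if self._current is not None:
            self._depth += 1   # nested div, e.g. a quote box
            return
        elem_id = dict(attrs).get("id") or ""
        suffix = elem_id[len(self.PREFIX):]
        if elem_id.startswith(self.PREFIX) and suffix.isdigit():
            self._current = int(suffix)
            self._depth = 1
            self.posts[self._current] = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "div" and self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                self._current = None

    def handle_data(self, data):
        if self._current is not None and not self._skip:
            self.posts[self._current].append(data)


def extract_posts(html):
    """Return {post_id: text} with whitespace collapsed."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return {pid: " ".join("".join(chunks).split())
            for pid, chunks in parser.posts.items()}


if __name__ == "__main__":
    page = ('<script>var tracker = 1;</script>'
            '<div id="post_message_101">Mine GFY<br>with a <b>Pi</b></div>')
    print(extract_posts(page))
```

Since the script only reads saved files, it can be pointed at the scrapes on the external drive directly, with no web server alias needed.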

It's probably just a matter of storing each post by ID along with its author, compressing the message body, and putting it all in a simple MySQL database; then just running the right lookup query.
Then presenting the data, at least the initial queries, in a clean format, preferably with a good library like http://d3js.org/
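
A minimal sketch of that storage idea, in Python with SQLite standing in for MySQL so it runs without a server. The table name, column names, and helper functions are all made up for illustration. Bodies are compressed with zlib, and the per-year count is returned as a list of dicts that could be dumped to JSON for a charting library.

```python
import sqlite3
import zlib


def open_db(path=":memory:"):
    """Create the (hypothetical) posts table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               post_id   INTEGER PRIMARY KEY,
               author    TEXT NOT NULL,
               posted_at TEXT NOT NULL,
               body_z    BLOB NOT NULL
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_posts_author ON posts(author)")
    return conn


def store_post(conn, post_id, author, posted_at, body):
    """Insert or overwrite one post; posted_at is an ISO 8601 string."""
    blob = zlib.compress(body.encode("utf-8"))
    conn.execute(
        "INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?)",
        (post_id, author, posted_at, blob),
    )


def load_body(conn, post_id):
    """Return the decompressed body, or None if the ID is unknown."""
    row = conn.execute(
        "SELECT body_z FROM posts WHERE post_id = ?", (post_id,)
    ).fetchone()
    return None if row is None else zlib.decompress(row[0]).decode("utf-8")


def posts_per_year(conn):
    """Forum activity by year, ready to serialize as JSON."""
    rows = conn.execute(
        "SELECT substr(posted_at, 1, 4) AS year, COUNT(*) "
        "FROM posts GROUP BY year ORDER BY year"
    ).fetchall()
    return [{"year": int(year), "posts": count} for year, count in rows]


if __name__ == "__main__":
    db = open_db()
    store_post(db, 1, "someuser", "2014-04-10T19:34:00", "Mine GFY with a Pi")
    db.commit()
    print(load_body(db, 1), posts_per_year(db))
```

One trade-off to keep in mind: a compressed body cannot be searched with SQL text matching, so phrase lookups would need either an uncompressed copy or a separate search index.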



Any ideas of things to mine for other than forum activity over the years outright?
Mark my words, I will get this done as a side project, as I really like playing around with this thing and there is opportunity here.

I've already ordered 3 more Pis and will be building a Hadoop cluster for learning and taking it from there.


Many ideas already in the mix.
How about a Java client that lets you follow certain phrases, follow users, and see when someone has quoted you? Even trolls would find value in that.
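
The matching logic behind such a client could look roughly like this. It is sketched in Python rather than Java, and the Post and Watch types and their fields are invented for illustration; it assumes the parser has already worked out which authors each post quotes.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    post_id: int
    author: str
    body: str
    quoted_authors: list = field(default_factory=list)


@dataclass
class Watch:
    me: str                                       # the user being notified
    phrases: set = field(default_factory=set)     # phrases to follow
    users: set = field(default_factory=set)       # users to follow


def match_post(post, watch):
    """Return the reasons this post should trigger a notification."""
    if post.author == watch.me:
        return []  # never notify people about their own posts
    reasons = []
    body = post.body.lower()
    for phrase in sorted(watch.phrases):
        if phrase.lower() in body:  # plain case-insensitive substring match
            reasons.append(f"phrase:{phrase}")
    if post.author in watch.users:
        reasons.append(f"user:{post.author}")
    if watch.me in post.quoted_authors:
        reasons.append("quoted-you")
    return reasons


if __name__ == "__main__":
    watch = Watch(me="iwantchixx", phrases={"raspberry pi"})
    post = Post(4, "johnnyloadproductions", "My Raspberry Pi hums along",
                quoted_authors=["iwantchixx"])
    print(match_post(post, watch))
```

Substring matching is the simplest thing that works; a real client would probably want word boundaries or a proper search backend.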

Building a backend that uses Elasticsearch? Running multiple queries at once.

There's a lot of noise out there, but big data technology can be used to filter it; that's why I'm learning it. I see a lot of potential for doing work for midlevel businesses and other people out there.


Thoughts? Surely you have some, or at least something awful to say
04-10-2014, 07:41 PM   #2
iwantchixx
Too lazy to set a custom title
 
Industry Role:
Join Date: Oct 2002
Location: The Boonies
Posts: 12,860
Not sure if users or especially forum owners would like the board being scraped...
04-10-2014, 07:43 PM   #3
iwantchixx
Too lazy to set a custom title
 
Industry Role:
Join Date: Oct 2002
Location: The Boonies
Posts: 12,860
Still though, that 3 watt soak is very impressive.
04-10-2014, 07:46 PM   #4
johnnyloadproductions
Account Shutdown
 
Industry Role:
Join Date: Oct 2008
Location: Gone
Posts: 3,611
Quote:
Originally Posted by iwantchixx
Not sure if users or especially forum owners would like the board being scraped...
Tell that to hahahahahahahahahahahaha or thereisnomoneyinporn.com (when it was around).

I've already talked to a couple of board owners who are interested. Boards that aren't, well, they lose an edge.

I'd also be interested in providing exclusive data to owners that goes well beyond what simple JavaScript code on each page can do.

edit: (b*oard*tr*acker) hahahahahahahahahahahaha? WTF?