04-10-2014, 07:34 PM
Account Shutdown
Industry Role:
Join Date: Oct 2008
Location: Gone
Posts: 3,611
Forget mining bitcoin or any other crypto currency, mine GFY, with a Raspberry Pi?!?!?!?!
Over time I collected the "Fucking Around and Discussion" forum on here: raw scrapes with PHP cURL that amounted to over 150GB of text. Granted, every raw scrape included analytics code and other redundant code for a person's browser, but it's still a lot of text, more than Paul Markham could ever hope to type.
I've had a Pi computer for almost 2 weeks now and love it; I also have a BeagleBone Black.
I set out to make good use of my Pi (however impractical it may sound) to mine text. This little computer only hums along at about 3 watts, excluding peripherals, and it can be overclocked from 700 MHz to 1 GHz.

I was originally going to use either Perl or Python to mine the data, but found I was running into too many issues to make it go smoothly. I set up preliminaries and debugging on my MacBook Air, and either the Air or the Pi would have issues running frameworks that relied on too many dependencies.
I was planning on using the Python framework Scrapy and had it installed and working on the Pi, but Scrapy only likes to pull URLs, and I couldn't set up an alias that mapped a localhost URL to an external drive, on either Apache or Nginx.
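One way around the URL problem would be to skip the web server entirely and read the saved pages straight off the drive. A rough Python sketch, assuming the scrapes sit as .html files somewhere under one directory; the `iter_pages` name and the file extension are made up for illustration:

```python
from pathlib import Path


def iter_pages(root, pattern="*.html"):
    """Yield (path, html_text) for every saved page under root.

    Files are read one at a time, so memory use stays small even
    when the whole archive is far larger than the Pi's RAM.
    """
    for path in sorted(Path(root).rglob(pattern)):
        # errors="replace" keeps one badly encoded page from
        # killing a run that may take days.
        yield path, path.read_text(encoding="utf-8", errors="replace")
```

Pointing it at the mount point of the external drive and looping over the result would then stand in for the crawl.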
I returned to PHP, doink. Everything plugs and plays so well with the LAMP(HP) stack you just can't go wrong with it. I used the Simple HTML DOM Parser and have gotten underway with parsing; it does everything I need. Unfortunately the Pi is painfully slow, and millions of threads do add up. Luckily I can just set it up on a desk, temporarily plug a keyboard and monitor into it, and then run the script from the command line with php, and it'll just run indefinitely (assuming no errors are thrown). I can SSH into the thing, but the script will stop as soon as I terminate the session. ONLY 3 WATTS OF POWER!
http://simplehtmldom.sourceforge.net/
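For anyone curious what the parsing step amounts to, here is a rough sketch. It is in Python with the standard-library `html.parser` rather than PHP and Simple HTML DOM, and it assumes each message sits in a `<div id="post_message_N">` where N is the post ID. That markup is a guess for illustration, not something confirmed from the scraped pages.

```python
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collect the text of every <div id="post_message_N"> in a page."""

    def __init__(self):
        super().__init__()
        self.posts = {}        # post id -> message text
        self._current = None   # id of the post being read, if any
        self._depth = 0        # <div> nesting depth inside that post
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._current is not None:
            self._depth += 1   # nested div, e.g. a quote box
            return
        div_id = dict(attrs).get("id") or ""
        suffix = div_id[len("post_message_"):]
        if div_id.startswith("post_message_") and suffix.isdigit():
            self._current = int(suffix)
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag != "div" or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._chunks).split())
            self.posts[self._current] = text
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._chunks.append(data)


def extract_posts(html):
    """Return {post_id: message_text} for one scraped page."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return parser.posts
```

Anything outside a message div is ignored, so the analytics and other browser code in the raw scrapes gets dropped here.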
It's probably just a matter of storing each post according to ID, with author, and compressing the message body into a simple MySQL database; then just executing the right query lookup.
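A minimal sketch of that storage idea, with two swaps so it runs anywhere: Python's built-in `sqlite3` stands in for MySQL, and `zlib` does the compression. The table and column names are made up for illustration.

```python
import sqlite3
import zlib


def open_db(path=":memory:"):
    """Open the database and create the posts table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        " post_id INTEGER PRIMARY KEY,"
        " author TEXT NOT NULL,"
        " body BLOB NOT NULL)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS idx_posts_author ON posts (author)")
    return db


def save_post(db, post_id, author, message):
    """Store one post, keyed by ID, with the body zlib-compressed."""
    blob = zlib.compress(message.encode("utf-8"))
    db.execute(
        "INSERT OR REPLACE INTO posts (post_id, author, body) VALUES (?, ?, ?)",
        (post_id, author, blob),
    )


def posts_by_author(db, author):
    """Return [(post_id, message)] for one author, oldest ID first."""
    rows = db.execute(
        "SELECT post_id, body FROM posts WHERE author = ? ORDER BY post_id",
        (author,),
    )
    return [(pid, zlib.decompress(body).decode("utf-8")) for pid, body in rows]
```

One trade-off worth knowing: once the bodies are compressed, SQL can no longer search inside them, so lookups by ID or author are cheap, but phrase searches mean decompressing every row or keeping a separate index.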
Then presenting the data, at least the initial queries, in a clean format, preferably with a good library like: http://d3js.org/

Any ideas of things to mine for other than forum activity over the years outright?
Mark my words, I will get this done as a side project, as I really like playing around with this thing and there is opportunity here.
I've already ordered 3 more Pis and will be building a Hadoop cluster for learning and taking it from there.
Many ideas already in the mix.
How about a Java client that allows you to follow certain phrases, follow users, and see when someone has quoted you? Even trolls would find value in that.
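The matching logic behind such a client could be as small as the sketch below. It is in Python rather than Java, and it assumes each post has already been parsed into a dict with `id`, `author`, `body` and `quoted` (names of the people quoted in it); those field names and `find_alerts` itself are hypothetical.

```python
def find_alerts(posts, phrases=(), users=(), me=None):
    """Return (post_id, kind, detail) tuples for posts worth a look.

    kind is "phrase" (a followed phrase appears in the body),
    "user" (a followed user wrote the post) or "quoted_you"
    (the post quotes the account named by `me`).
    Matching is case-insensitive throughout.
    """
    wanted_phrases = [p.lower() for p in phrases]
    wanted_users = {u.lower() for u in users}
    alerts = []
    for post in posts:
        body = post["body"].lower()
        for phrase in wanted_phrases:
            if phrase in body:
                alerts.append((post["id"], "phrase", phrase))
        if post["author"].lower() in wanted_users:
            alerts.append((post["id"], "user", post["author"]))
        quoted = {q.lower() for q in post.get("quoted", ())}
        if me is not None and me.lower() in quoted:
            alerts.append((post["id"], "quoted_you", post["author"]))
    return alerts
```

Plain substring matching like this is fine for a handful of phrases; following many phrases across millions of posts is where a search backend would start to earn its keep.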
Building a backend that uses Elasticsearch? Running multiple queries at once.
There's a lot of noise out there, but big data technology can be used to filter it; that's why I'm learning it. I see a lot of potential for doing work for mid-level businesses and other people out there.
Thoughts? Surely you have some, or at least something awful to say 