Old 03-16-2009, 07:41 AM   #1
Wolfy
Confirmed User
 
Join Date: Dec 2003
Location: Wisconsin
Posts: 3,574
Need an email scraper

I need to contact a shitload of people - (read: CONTACT, not spam) - who all happen to be members of a site that lists their email addresses and phone numbers. Is there a program that will search the site and collect the email addresses for me?
Old 03-16-2009, 07:42 AM   #2
The Duck
Adult Content Provider
 
Join Date: May 2005
Location: Europe
Posts: 18,243
So you are going to write personal messages to each and every one of them? If not, it's spam.
Old 03-16-2009, 07:46 AM   #3
TeenCat
Too lazy to set a koala
 
Join Date: Jan 2007
Location: CZ/EU forever!
Posts: 16,139
If it's your site, you have access to the database and that's easy. If it's not your site, you are going to spam no matter what you call it.
Old 03-16-2009, 08:00 AM   #4
Wolfy
Confirmed User
 
Join Date: Dec 2003
Location: Wisconsin
Posts: 3,574
I'm not here for your opinion. I'm here to find a solution to my need.

If anyone knows of a program, feel free to email me - wolfyman at gmail.
Old 03-16-2009, 08:47 AM   #5
Killswitch - BANNED FOR LIFE
Guest
 
Posts: n/a
^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$
Old 03-16-2009, 08:50 AM   #6
Antonio
Too lazy to set a custom title
 
 
Join Date: Oct 2001
Location: Spartaaaaaaaaa
Posts: 14,136
Use HTTrack to download the site (grab HTML only), then run an email extractor on the HTML files. I'm sure there are better ways; google "email extractor" and see what you come up with.
Old 03-16-2009, 09:12 AM   #7
Wolfy
Confirmed User
 
Join Date: Dec 2003
Location: Wisconsin
Posts: 3,574
Quote:
Originally Posted by Killswitch View Post
^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$
Is that a search query? Where would I use that?

Google: "site:example.com ^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$" returned nothing, what am I missing?

Quote:
Originally Posted by Antonio View Post
use httrack to download the site (grab html only) then run an email extractor on the html files, I'm sure that there are better ways, google email extractor and see what you come up with
That's a start, thanks
Old 03-16-2009, 10:03 AM   #8
applebee
Confirmed User
 
Join Date: Aug 2006
Location: Online (duh)
Posts: 167
LOL. That's a regular expression. Why don't you just hire some cheap coder on Rent-a-Coder and have him write you a script which does it? Probably cost like $50 max.
Old 03-16-2009, 10:10 AM   #9
fris
Too lazy to set a custom title
 
Join Date: Aug 2002
Posts: 55,304
do you have access to the sql db? if so it would be only a few lines of code
Old 03-16-2009, 10:52 AM   #10
Killswitch - BANNED FOR LIFE
Guest
 
Posts: n/a
Quote:
Originally Posted by Wolfy View Post
Is that a search query? Where would I use that?

Google: "site:example.com ^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$" returned nothing, what am I missing?



That's a start, thanks
Regex to match email address.
Old 03-16-2009, 12:03 PM   #11
Wolfy
Confirmed User
 
Join Date: Dec 2003
Location: Wisconsin
Posts: 3,574
Killswitch, it doesn't appear to work using it in google - how would I use that "regex"?

Quote:
Your search - site:gfy.com ^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$ - did not match any documents.
Old 03-16-2009, 12:11 PM   #12
Killswitch - BANNED FOR LIFE
Guest
 
Posts: n/a
Quote:
Originally Posted by Wolfy View Post
Killswitch, it doesn't appear to work using it in google - how would I use that "regex"?
http://www.php.net/preg_match_all
Old 03-16-2009, 12:13 PM   #13
Libertine
sex dwarf
 
 
Join Date: May 2002
Posts: 17,860
Quote:
Originally Posted by Wolfy View Post
Is that a search query? Where would I use that?

Google: "site:example.com ^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$" returned nothing, what am I missing?
It's a regular expression. Used to identify patterns - in this case, email addresses.
Old 03-16-2009, 06:26 PM   #14
Wolfy
Confirmed User
 
Join Date: Dec 2003
Location: Wisconsin
Posts: 3,574
I've retrieved all the email addresses on the site, thanks to a cool mofo that I'll happily give credit to if he wants it.

Quote:
Originally Posted by Libertine View Post
It's a regular expression. Used to identify patterns - in this case, email addresses.
Ok, I get that part.

For the sake of knowledge only, since my mission has been accomplished - how do I apply that regex expression to a task like this? I don't have control of the site, so any php pages I would build would be outside of the domain and would not have access to any databases.

What am I missing?
Old 03-16-2009, 06:35 PM   #15
raven1083
Confirmed User
 
 
Join Date: Jul 2007
Posts: 7,687
i am sure you can buy it from internet classifieds.
Old 03-16-2009, 06:47 PM   #16
papill0n
Unregistered Abuser
 
Join Date: Oct 2007
Posts: 15,547
If what you're doing is not spam, then I don't know what is.
Old 03-16-2009, 06:50 PM   #17
GrouchyAdmin
Now choke yourself!
 
Join Date: Apr 2006
Posts: 12,085
I still prefer this regex.
Old 03-16-2009, 07:19 PM   #18
SmokeyTheBear
►SouthOfHeaven
 
 
Join Date: Jun 2004
Location: PlanetEarth MyBoardRank: GerbilMaster My-Penis-Size: extralarge MyWeapon: Computer
Posts: 28,609
Quote:
Originally Posted by RageCash-Ben View Post
If what you're doing is not spam, then I don't know what is.

I would have to see the site, but if you put your email on a public site, aren't you basically asking for unsolicited mail?

Kinda like putting your # on a bathroom stall, then saying every pervert who contacts you is "spamming".

The only exception to the rule, I would think, is if it's implied on the site that the mail is to be used for a specific purpose, or if it's whois info that wasn't posted by the user.
Old 03-16-2009, 07:23 PM   #19
Ozarkz
So Fucking Banned
 
Join Date: Jan 2009
Posts: 2,377
I need a poop scraper.
Old 03-16-2009, 07:26 PM   #20
Angry Jew Cat - Banned for Life
(felis madjewicus)
 
Join Date: Jul 2006
Location: In Mom & Dad's Basement
Posts: 20,368
were you able to sort the first site out?
Old 03-16-2009, 07:47 PM   #21
Killswitch - BANNED FOR LIFE
Guest
 
Posts: n/a
Quote:
Originally Posted by Wolfy View Post
I've retrieved all the email addresses on the site, thanks to a cool mofo that I'll happily give credit to if he wants it.



Ok, I get that part.

For the sake of knowledge only, since my mission has been accomplished - how do I apply that regex expression to a task like this? I don't have control of the site, so any php pages I would build would be outside of the domain and would not have access to any databases.

What am I missing?
First you use php to grab the source code of the page, then use the regex to browse that code to strip out the email addresses.
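A minimal sketch of the second step only, in Python rather than PHP (in PHP, preg_match_all plays the same role). It assumes the page source is already in a string; the sample text and the find_emails helper are made up for illustration. Note that the pattern as posted is anchored with ^ and $, so it only matches when the whole string is one address. To pull addresses out of a longer page, the anchors are dropped here, and the case-insensitive flag is added because the character classes are lowercase only.

```python
import re

# Same character classes as the pattern posted above, minus the ^ and $ anchors.
# The {2,4} limit is kept from the original, so longer TLDs would be cut short.
EMAIL_RE = re.compile(r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}", re.IGNORECASE)


def find_emails(text):
    """Return the unique addresses found in text, lowercased, in order of first appearance."""
    seen = []
    for match in EMAIL_RE.findall(text):
        addr = match.lower()
        if addr not in seen:
            seen.append(addr)
    return seen


# Hypothetical page source, standing in for whatever was downloaded.
sample = "<p>Contact: Alice@Example.com or bob@example.org. Again: alice@example.com</p>"
print(find_emails(sample))  # ['alice@example.com', 'bob@example.org']
```

That is also why pasting the pattern into Google returns nothing: a search engine treats it as literal text, not as a regular expression.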
Old 03-17-2009, 12:30 AM   #22
Libertine
sex dwarf
 
 
Join Date: May 2002
Posts: 17,860
Quote:
Originally Posted by Killswitch View Post
First you use php to grab the source code of the page, then use the regex to browse that code to strip out the email addresses.
To elaborate on what he said, what you'd normally do is set up a script that takes a URL as input and downloads whatever is at that URL (usually the homepage). It strips out all links using a regexp and saves those, then strips out all email addresses using another regexp and saves those too. Then it uses the links it found to pick new pages to repeat the process with: only those on the same domain if you're just getting all the email addresses on that site, or all of them if you just want to keep finding new email addresses on new sites forever.

Personally, I'd go for a language other than PHP for this, but really, it can be done in pretty much any programming language.

Set a bot like that loose on a big directory, and you'll eventually build up a list of millions of email addresses. Of course, others do the same thing as well, so the email addresses won't exactly be fresh.

Keep in mind that site owners might have email harvester traps, which generate a list of random invalid email addresses and generate dynamic links to themselves as well, ensuring that if your harvester bot isn't protected from them, it will keep getting new invalid email addresses from them forever.
Old 03-17-2009, 12:50 AM   #23
Angry Jew Cat - Banned for Life
(felis madjewicus)
 
Join Date: Jul 2006
Location: In Mom & Dad's Basement
Posts: 20,368
Perl allows you to be one lazy ass coder. Jump on CPAN and install the Net::Scan::Extract module

Code:
use Net::Scan::Extract qw( :all );

my @emails = Extract_Email($source);
print "$_\n" for @emails;
fetch the source code of the page in question, then scan it using that code. done and done.