GoFuckYourself.com - Adult Webmaster Forum

Wolfy 03-16-2009 07:41 AM

Need an email scraper
 
I need to contact a shitload of people - (read: CONTACT, not spam) - who all happen to be members of a site that lists their email addresses and phone numbers. Is there a program that will search the site and collect the email addresses for me?

The Duck 03-16-2009 07:42 AM

so you are going to write personal messages to each and every single one of them? If not, it's spam :)

TeenCat 03-16-2009 07:46 AM

if it's your site, you have access to the database and that's easy. if it's not your site, you are going to spam no matter what you call it

Wolfy 03-16-2009 08:00 AM

I'm not here for your opinion. I'm here to find a solution to my need.

If anyone knows of a program, feel free to email me - wolfyman at gmail.

Killswitch - BANNED FOR LIFE 03-16-2009 08:47 AM

^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$

Antonio 03-16-2009 08:50 AM

use httrack to download the site (grab html only), then run an email extractor on the html files. I'm sure there are better ways; google "email extractor" and see what you come up with

Wolfy 03-16-2009 09:12 AM

Quote:

Originally Posted by Killswitch (Post 15634995)
^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$

Is that a search query? Where would I use that?

Google: "site:example.com ^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$" returned nothing, what am I missing?

Quote:

Originally Posted by Antonio (Post 15635002)
use httrack to download the site (grab html only), then run an email extractor on the html files. I'm sure there are better ways; google "email extractor" and see what you come up with

That's a start, thanks :thumbsup

applebee 03-16-2009 10:03 AM

LOL. That's a regular expression. Why don't you just hire some cheap coder on Rent-a-Coder and have him write you a script which does it? Probably cost like $50 max.

fris 03-16-2009 10:10 AM

do you have access to the sql db? if so it would be only a few lines of code

Killswitch - BANNED FOR LIFE 03-16-2009 10:52 AM

Quote:

Originally Posted by Wolfy (Post 15635067)
Is that a search query? Where would I use that?

Google: "site:example.com ^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$" returned nothing, what am I missing?



That's a start, thanks :thumbsup

Regex to match email address.

Wolfy 03-16-2009 12:03 PM

Killswitch, it doesn't appear to work using it in google - how would I use that "regex"?

Quote:

Your search - site:gfy.com ^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$ - did not match any documents.

Killswitch - BANNED FOR LIFE 03-16-2009 12:11 PM

Quote:

Originally Posted by Wolfy (Post 15635736)
Killswitch, it doesn't appear to work using it in google - how would I use that "regex"?

http://www.php.net/preg_match_all

Libertine 03-16-2009 12:13 PM

Quote:

Originally Posted by Wolfy (Post 15635067)
Is that a search query? Where would I use that?

Google: "site:example.com ^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$" returned nothing, what am I missing?

It's a regular expression. Used to identify patterns - in this case, email addresses.
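
A regular expression is handed to a program or script, not to a search engine, which is why the Google query returned nothing. As a minimal sketch (in Python rather than the PHP function linked above; the function name `looks_like_email` and the example.com addresses are made up for illustration), this is how the pattern from this thread could be applied to a single string. Note that `^` and `$` tie the pattern to the whole string, `[a-z]` needs a case-insensitive flag to accept capitals, and `{2,4}` rejects longer top-level domains.

```python
import re

# The pattern from the thread, unchanged. ^ and $ anchor it to the whole string,
# so it tests "is this string an address?" rather than "does it contain one?".
ANCHORED = re.compile(r"^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$", re.IGNORECASE)

def looks_like_email(text):
    """Return True when the entire string matches the thread's pattern."""
    return ANCHORED.match(text) is not None

print(looks_like_email("someone@example.com"))       # whole string is an address
print(looks_like_email("mail someone@example.com"))  # extra text, so the anchors reject it
print(looks_like_email("someone@example.museum"))    # TLD longer than 4 letters is rejected
```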

Wolfy 03-16-2009 06:26 PM

I've retrieved all the email addresses on the site, thanks to a cool mofo that I'll happily give credit to if he wants it.

Quote:

Originally Posted by Libertine (Post 15635790)
It's a regular expression. Used to identify patterns - in this case, email addresses.

Ok, I get that part.

For the sake of knowledge only, since my mission has been accomplished - how do I apply that regex expression to a task like this? I don't have control of the site, so any php pages I would build would be outside of the domain and would not have access to any databases.

What am I missing?

raven1083 03-16-2009 06:35 PM

i am sure you can buy it from internet classifieds.

papill0n 03-16-2009 06:47 PM

if what you're doing is not spam then I don't know what is :1orglaugh

GrouchyAdmin 03-16-2009 06:50 PM

I still prefer this regex.

SmokeyTheBear 03-16-2009 07:19 PM

Quote:

Originally Posted by RageCash-Ben (Post 15637644)
if what you're doing is not spam then I don't know what is :1orglaugh


i would have to see the site, but if you put your email on a public site, aren't you basically asking for unsolicited mail?

Kinda like putting your # on a bathroom stall, then saying every pervert who contacts you is "spamming"

the only exception to the rule, i would think, would be if it's implied on the site that the email is to be used for a specific purpose, or if it's whois info that wasn't posted by the user

Ozarkz 03-16-2009 07:23 PM

I need a poop scraper.

Angry Jew Cat - Banned for Life 03-16-2009 07:26 PM

were you able to sort the first site out?

Killswitch - BANNED FOR LIFE 03-16-2009 07:47 PM

Quote:

Originally Posted by Wolfy (Post 15637590)
I've retrieved all the email addresses on the site, thanks to a cool mofo that I'll happily give credit to if he wants it.



Ok, I get that part.

For the sake of knowledge only, since my mission has been accomplished - how do I apply that regex expression to a task like this? I don't have control of the site, so any php pages I would build would be outside of the domain and would not have access to any databases.

What am I missing?

First you use php to grab the source code of the page, then you run the regex over that source to pull out the email addresses.
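
A rough sketch of the second step only, in Python rather than PHP. The assumptions here: the page source is already in a string (the `sample` text is invented and stands in for whatever the first step downloaded), the helper name `find_emails` is hypothetical, and the `^` and `$` anchors are dropped from the thread's pattern so it can match addresses that sit inside a larger text.

```python
import re

# Same character classes as the thread's regex, minus the ^ and $ anchors,
# so it can match addresses embedded in a larger text.
EMAIL_IN_TEXT = re.compile(r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}", re.IGNORECASE)

def find_emails(page_source):
    """Return the unique addresses found in page_source, in first-seen order."""
    seen = []
    for match in EMAIL_IN_TEXT.findall(page_source):
        address = match.lower()  # treat Alice@... and alice@... as the same address
        if address not in seen:
            seen.append(address)
    return seen

# Hypothetical page source, used here instead of a real download.
sample = '<p>Contact <a href="mailto:Alice@example.com">alice@example.com</a> or bob@example.org.</p>'
print(find_emails(sample))
```

One caveat with the unanchored form: because of `{2,4}`, an address with a longer top-level domain would be cut off after four letters instead of being skipped.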

Libertine 03-17-2009 12:30 AM

Quote:

Originally Posted by Killswitch (Post 15637784)
First you use php to grab the source code of the page, then you run the regex over that source to pull out the email addresses.

To elaborate on what he said, what you'd normally do is set up a script that takes a URL as input, downloads whatever is at that URL (usually the homepage), extracts all links using a regexp and saves them, then extracts all email addresses using another regexp and saves those too. It then uses the links it found to pick new pages to repeat the process with: only those on the same domain if you're just getting the email addresses on that site, or all of them if you want to keep finding new email addresses on new sites forever.

Personally, I'd go for a language other than php for this, but really, it can be done in pretty much any programming language.

Set a bot like that loose on a big directory, and you'll eventually build up a list of millions of email addresses. Of course, others do the same thing as well, so the email addresses won't exactly be fresh.

Keep in mind that site owners might have email harvester traps, which generate a list of random invalid email addresses and generate dynamic links to themselves as well, ensuring that if your harvester bot isn't protected from them, it will keep getting new invalid email addresses from them forever.

Angry Jew Cat - Banned for Life 03-17-2009 12:50 AM

Perl allows you to be one lazy ass coder. Jump on CPAN and install the Net::Scan::Extract module

Code:

use Net::Scan::Extract qw( :all );

my @emails = Extract_Email($source);
print "$_\n" for @emails;

fetch the source code of the page in question, then scan it using that code. done and done.


All times are GMT -7.