Welcome to the GoFuckYourself.com - Adult Webmaster Forum forums.

12-30-2005, 03:23 AM   #1
More Booze
Confirmed User
 
Join Date: Mar 2004
Posts: 5,116
A program that can extract data from a list of URLs?

Anyone know a program that can extract data from a list of URLs? I want to be able to set where in the document it should start to grab data. Example:

Start from: <a href="
Stop at: ">

Is there any program that can do that?

12-30-2005, 05:05 AM   #2
u-Bob
there's no $$$ in porn
 
Join Date: Jul 2005
Location: icq: 195./568.-230 (btw: not getting offline msgs)
Posts: 33,063
perl is your friend.
12-30-2005, 05:44 AM   #3
tgpmakers
Confirmed User
 
Join Date: Feb 2004
Location: United Kingdom
Posts: 575
Ya, I was gonna say Perl as well. Check out m{href\=\"(.*?)\"} or the module HTML::TokeParser. Have fun!!!
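
For anyone who would rather try that pattern in PHP than Perl, something along these lines should work. This is only a rough sketch: the function name extract_hrefs and the sample string are made up for illustration, and a regex like this will miss links that use single quotes or no quotes at all.

```php
<?php
// Hypothetical helper (not from the thread): collect every href value
// using the same non-greedy pattern as the Perl match above.
function extract_hrefs(string $html): array
{
    // (.*?) captures as little as possible up to the next double quote;
    // the trailing "i" makes HREF= match as well.
    preg_match_all('/href="(.*?)"/i', $html, $matches);

    // $matches[1] holds the text captured by the first group.
    return $matches[1];
}

$sample = '<a href="testurl1.com">crap<a HREF="testurl2.com">more';
print_r(extract_hrefs($sample));
```

The non-greedy (.*?) matters here: a plain (.*) would run on to the last double quote on the line instead of stopping at the first one.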
12-30-2005, 05:55 AM   #4
Adultnet
Confirmed User
 
Join Date: Sep 2003
Posts: 8,713
Regex is your friend..
12-30-2005, 06:02 AM   #5
MGPspots
Confirmed User
 
Join Date: Jan 2005
Posts: 422
Quote:
Originally Posted by Adultnet
Regex is your friend..
Yep, doesn't matter what language you use. You're looking for an implementation of regex (regular expressions). Regex can be tricky at first, but most common needs can just be googled.
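
To tie this back to the original question (start at one marker, stop at another), here is a rough PHP sketch that builds the regex from two plain strings. The name extract_between and its parameters are made up for this example, and it assumes both markers are non-empty. preg_quote is there so that characters like <, > or = in the markers are matched literally.

```php
<?php
// Hypothetical helper: return everything found between $start and $stop.
function extract_between(string $text, string $start, string $stop): array
{
    // preg_quote escapes regex metacharacters in the markers; its second
    // argument also escapes the "/" delimiter used below.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($stop, '/') . '/is';

    // "i" ignores case, "s" lets "." match newlines so a match can span lines.
    preg_match_all($pattern, $text, $matches);

    return $matches[1];
}

$page = "<A HREF=\"testurl1.com\">crap\n<a href=\"testurl2.com\">more";
print_r(extract_between($page, '<a href="', '">'));
```

The same function works for any other pair of markers, for example extract_between($page, '<title>', '</title>').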
12-30-2005, 06:08 AM   #6
More Booze
Confirmed User
 
Join Date: Mar 2004
Posts: 5,116
I could program something in Visual Basic... but I know there have to be some applications out there already for this. (And I'm also lazy.)
12-30-2005, 06:41 AM   #7
Doc911
Confirmed User
 
Join Date: Feb 2004
Location: If i was up your ass you'd know
Posts: 3,695
Perl was designed to do exactly that. PHP is pretty good for doing it as well.
12-30-2005, 09:48 AM   #8
sarettah
see you later, I'm gone
 
Join Date: Oct 2002
Posts: 14,111
Here is a way to do it without regex:

<?php

// buffer is a variable to hold the data we are working on
$buffer='';

// set vars for the beginning of what we want to parse and the end of what we want to parse
$begin_pattern='<a href="';
$end_pattern='">';

// set up var for data being extracted. This could be an array or string to write to a file whatever
// here I am just using it to echo the data extracted
$dataout='';

// set file2read to point at the path and file that the list is stored in
$file2read='testfile.txt';

// open the file
$filein=fopen('testfile.txt','r');

// suck the entire file into a variable
while (!feof($filein)){
    $buffer=$buffer . fgets($filein);
}

// close the file
fclose($filein);

// check to make sure we got something out of the file
if ($buffer>''){

    // do this while any occurrences of the beginning pattern are still in the data
    while( substr_count(strtolower($buffer),$begin_pattern)>0 ){
        // trim the data to just past the next beginning pattern occurrence
        $buffer=substr($buffer, strpos(strtolower($buffer),$begin_pattern)+strlen($begin_pattern));

        // find the next end pattern; stop if the last link is never closed
        $end_pos=strpos($buffer,$end_pattern);
        if ($end_pos===false){
            break;
        }

        // pull the data in from where we trimmed the data to the occurrence of the next end pattern
        $dataout=substr($buffer,0,$end_pos);

        // trim the buffer by the length of the data we pulled
        $buffer=substr($buffer,strlen($dataout));

        // output the data we pulled - could go into an array here or write it to a file whatever
        echo $dataout . '<br>';
    }
}

?>

takes a file that looks like this:

<a href="testurl1.com">crapcrapcrap<a href="testurl2.com">morecrapmorecrap<a href="testurl3.com">yesevenmore<a href="testurl4.com">awholelottacrap<a href="testurl5.com"><a href="testurl6.com"><a href="testurl7.com"><a href="testurl8.com"><a href="testurl9.com"><a href="testurl10.com"><a href="testurl11.com"><a href="testurl12.com"><a href="testurl13.com"><a href="testurl14.com"><a href="testurl15.com"><a href="testurl16.com"><a href="testurl17.com"><a href="testurl18.com"><a href="testurl19.com"><a href="testurl20.com"><a href="testurl21.com"><a href="testurl22.com"><a href="testurl23.com"><a href="testurl24.com"><a href="testurl25.com">

and outputs it like this:

testurl1.com
testurl2.com
testurl3.com
testurl4.com
testurl5.com
testurl6.com
testurl7.com
testurl8.com
testurl9.com
testurl10.com
testurl11.com
testurl12.com
testurl13.com
testurl14.com
testurl15.com
testurl16.com
testurl17.com
testurl18.com
testurl19.com
testurl20.com
testurl21.com
testurl22.com
testurl23.com
testurl24.com
testurl25.com
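
The same non-regex idea can also be wrapped in a reusable function, which makes it easier to run over a whole list of pages. This is a rough sketch and a variation on the code above, not the same code: find_between is a made-up name, both patterns are assumed to be non-empty, and it walks the string with an offset (stripos/strpos) instead of trimming the buffer each time.

```php
<?php
// Hypothetical helper: collect everything between $begin and $end without regex.
// Assumes $begin and $end are non-empty strings.
function find_between(string $text, string $begin, string $end): array
{
    $found = [];
    $offset = 0;

    // stripos is the case-insensitive strpos; the third argument says where to start looking
    while (($start = stripos($text, $begin, $offset)) !== false) {
        $start += strlen($begin);

        // stop if the end pattern never shows up again
        $stop = strpos($text, $end, $start);
        if ($stop === false) {
            break;
        }

        $found[] = substr($text, $start, $stop - $start);
        $offset = $stop + strlen($end);
    }

    return $found;
}

// $pages stands in for the contents of each page in your list
$pages = [
    '<a href="testurl1.com">crap<A HREF="testurl2.com">more',
    '<a href="testurl3.com">unfinished<a href="broken',
];
foreach ($pages as $page) {
    echo implode("\n", find_between($page, '<a href="', '">')), "\n";
}
```

To work from an actual list of URLs, each page could be loaded with file_get_contents(), which can fetch a URL when allow_url_fopen is enabled in php.ini.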
12-30-2005, 10:58 AM   #9
woj
<&(©¿©)&>
 
Join Date: Jul 2002
Location: Chicago
Posts: 47,882
if you want a custom solution, icq: 33375924
12-30-2005, 11:27 AM   #10
More Booze
Confirmed User
 
Join Date: Mar 2004
Posts: 5,116
Wow, thank you VERY much sarettah! =))
12-30-2005, 11:41 AM   #11
sarettah
see you later, I'm gone
 
Join Date: Oct 2002
Posts: 14,111
Quote:
Originally Posted by More Booze
Wow, thank you VERY much sarettah! =))

You're welcome.

However, looking back, I have a code error in there from when I was making it "friendly".

where I have:
// open the file
$filein=fopen('testfile.txt','r');


Change it to:

// open the file
$filein=fopen($file2read,'r');


12-30-2005, 11:43 AM   #12
BigCashCrew
Registered User
 
Join Date: Aug 2005
Posts: 3,570
hey sweet i'm going to play with this too