GoFuckYourself.com - Adult Webmaster Forum


More Booze 12-30-2005 03:23 AM

A program that can extract data from a list of URLs?
 
Anyone know a program that can extract data from a list of URLs? I want to be able to set where in the document it should start to grab data. Example:

Start from: <a href="
Stop at: ">

Is there any program that can do that?


u-Bob 12-30-2005 05:05 AM

perl is your friend.

tgpmakers 12-30-2005 05:44 AM

Ya, I was gonna say Perl as well. Check out m{href\=\"(.*?)\"} or the HTML::TokeParser module. Have fun!!!
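For anyone who would rather stay in PHP, here is a rough sketch of the same regex idea as the Perl pattern above. The function name extract_hrefs and the sample string are made up for illustration; preg_match_all is the only library call used. It assumes the links are written with double quotes, like in the original question.

```php
<?php
// Hypothetical helper (not from the thread): returns every href value found in $html.
function extract_hrefs(string $html): array
{
    // Lazily capture everything between href=" and the next double quote.
    // The i flag lets HREF= match as well as href=.
    preg_match_all('/href="(.*?)"/i', $html, $matches);

    // $matches[1] holds the text captured by the first (and only) group.
    return $matches[1];
}

// Made-up sample input, just to show the call.
$sample = '<a href="testurl1.com">crap<a href="testurl2.com">morecrap';
print_r(extract_hrefs($sample));
```

A regex like this is fine for quick grabbing from pages you control. For messy real-world HTML a proper parser is usually the safer choice.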

Adultnet 12-30-2005 05:55 AM

Regex is your friend..:)

MGPspots 12-30-2005 06:02 AM

Quote:

Originally Posted by Adultnet
Regex is your friend..:)

yep, doesn't matter what language you use. You're looking for an implementation of Regex (regular expressions). Regex can be tricky at first, but most common needs can just be googled.

More Booze 12-30-2005 06:08 AM

I could program something in Visual Basic... but I know there have to be some applications out there already for this. (and I'm also lazy)

Doc911 12-30-2005 06:41 AM

Perl was designed to do exactly that. PHP is pretty good for doing it as well.

sarettah 12-30-2005 09:48 AM

Here is a way to do it without regex:

<?php

// buffer is a variable to hold the data we are working on
$buffer='';

// set vars for the beginning of what we want to parse and the end of what we want to parse
$begin_pattern='<a href="';
$end_pattern='">';

// set up var for data being extracted. This could be an array or string to write to a file whatever
// here I am just using it to echo the data extracted
$dataout='';

// set file2read to point at the path and file that the list is stored in
$file2read='testfile.txt';

// open the file
$filein=fopen('testfile.txt','r');

// suck the entire file into a variable
while (!feof($filein)){
$buffer=$buffer . fgets($filein);
}

// close the file
fclose($filein);

// check to make sure we got something out of the file
if ($buffer!=''){

    // do this while any occurrences of the beginning pattern are still in the data
    while( substr_count(strtolower($buffer),$begin_pattern)>0 ){
        // trim the data to just past the next beginning pattern occurrence
        $buffer=substr($buffer, strpos(strtolower($buffer),$begin_pattern)+strlen($begin_pattern));

        // pull the data in from where we trimmed the data to the occurrence of the next end pattern
        $dataout=substr($buffer,0,strpos($buffer,$end_pattern));

        // trim the buffer by the length of the data we pulled
        $buffer=substr($buffer,strlen($dataout));

        // output the data we pulled - could go into an array here or write it to a file whatever
        echo $dataout . '<br>';
    }
}

?>

takes a file that looks like this:

<a href="testurl1.com">crapcrapcrap<a href="testurl2.com">morecrapmorecrap<a href="testurl3.com">yesevenmore<a href="testurl4.com">awholelottacrap<a href="testurl5.com"><a href="testurl6.com"><a href="testurl7.com"><a href="testurl8.com"><a href="testurl9.com"><a href="testurl10.com"><a href="testurl11.com"><a href="testurl12.com"><a href="testurl13.com"><a href="testurl14.com"><a href="testurl15.com"><a href="testurl16.com"><a href="testurl17.com"><a href="testurl18.com"><a href="testurl19.com"><a href="testurl20.com"><a href="testurl21.com"><a href="testurl22.com"><a href="testurl23.com"><a href="testurl24.com"><a href="testurl25.com">

and outputs it like this:

testurl1.com
testurl2.com
testurl3.com
testurl4.com
testurl5.com
testurl6.com
testurl7.com
testurl8.com
testurl9.com
testurl10.com
testurl11.com
testurl12.com
testurl13.com
testurl14.com
testurl15.com
testurl16.com
testurl17.com
testurl18.com
testurl19.com
testurl20.com
testurl21.com
testurl22.com
testurl23.com
testurl24.com
testurl25.com
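If you want the same start/stop idea as the script above in a reusable form, something like this might work. It is only a sketch: extract_between is a made-up name, it returns an array instead of echoing, and it bails out when a start marker has no matching stop marker (that part is my assumption about the desired behavior, not something from the script).

```php
<?php
// Hypothetical helper (not sarettah's code): collects every piece of $text
// that sits between $start and $stop. The start marker is matched
// case-insensitively, the stop marker exactly.
function extract_between(string $text, string $start, string $stop): array
{
    // Empty markers would never advance the search, so refuse them.
    if ($start === '' || $stop === '') {
        return [];
    }

    $found  = [];
    $offset = 0;

    // Walk forward through the text instead of trimming the buffer each time.
    while (($from = stripos($text, $start, $offset)) !== false) {
        $from += strlen($start);            // first character after the start marker
        $to = strpos($text, $stop, $from);  // next stop marker after that point
        if ($to === false) {
            break;                          // unterminated match, stop looking
        }
        $found[] = substr($text, $from, $to - $from);
        $offset  = $to + strlen($stop);     // continue after the stop marker
    }

    return $found;
}

// Made-up sample input, just to show the call.
print_r(extract_between('<a href="testurl1.com">crap<a href="testurl2.com">', '<a href="', '">'));
```

Keeping an offset instead of repeatedly cutting the buffer avoids copying the whole string on every match, which only starts to matter on big files.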

woj 12-30-2005 10:58 AM

if you want a custom solution, icq: 33375924

More Booze 12-30-2005 11:27 AM

Wow, thank you VERY much sarettah! =))

sarettah 12-30-2005 11:41 AM

Quote:

Originally Posted by More Booze
Wow, thank you VERY much sarettah! =))


You're welcome.

However, looking back, I have a code error in there from when I was making it "friendly".

Where I have:
// open the file
$filein=fopen('testfile.txt','r');


Change it to:

// open the file
$filein=fopen($file2read,'r');


:)
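The original question was about a list of URLs rather than one local file, so here is a rough sketch of how that could look. Everything named here is hypothetical: extract_from_urls, the $fetch callback, and the urls.txt file in the commented usage lines. Passing 'file_get_contents' as the fetcher assumes allow_url_fopen is turned on in your PHP setup.

```php
<?php
// Hypothetical helper: $urls is an array of addresses and $fetch is any callable
// that returns the page body for a URL, or false if it could not be fetched.
function extract_from_urls(array $urls, string $start, string $stop, callable $fetch): array
{
    // Build a lazy pattern from the literal markers, escaping regex metacharacters.
    // i = case-insensitive, s = let the dot match newlines too.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($stop, '/') . '/is';

    $results = [];
    foreach ($urls as $url) {
        $page = $fetch($url);
        if ($page === false) {
            $results[$url] = [];   // fetch failed, record an empty list for this URL
            continue;
        }
        preg_match_all($pattern, $page, $matches);
        $results[$url] = $matches[1];
    }

    return $results;
}

// Usage sketch, left commented out because it needs network access
// and a urls.txt file (one URL per line) that you would have to supply:
// $urls = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// print_r(extract_from_urls($urls, '<a href="', '">', 'file_get_contents'));
```

Taking the fetcher as a parameter means you can swap in curl or a cached copy later without touching the extraction part.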

BigCashCrew 12-30-2005 11:43 AM

hey sweet :) i'm going to play with this too


All times are GMT -7.