A program that can extract data from a list of URLs?
Anyone know a program that can extract data from a list of URLs? I want to be able to set where in the document it should start to grab data. Example:
Start from: <a href="
Stop at: ">
Is there any program that can do that? |
Perl is your friend.
|
Ya, I was gonna say Perl as well. Check out m{href\=\"(.*?)\"} or the HTML::TokeParser module. Have fun!!!
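For anyone who would rather stay in PHP (the language used later in this thread), the same non-greedy pattern can be tried with preg_match_all. This is only a rough sketch: the function name extract_hrefs and the sample string are made up for illustration, and it assumes the href values are always wrapped in double quotes.

```php
<?php
// Hypothetical helper (not from the thread): collect every href="..." value
// found in a chunk of HTML, using the non-greedy pattern suggested above.
function extract_hrefs(string $html): array
{
    // (.*?) is non-greedy, so each match stops at the first closing quote.
    // The "i" flag lets HREF / Href match as well.
    preg_match_all('/href="(.*?)"/i', $html, $matches);

    // $matches[1] holds the text captured by the first group, in order.
    return $matches[1];
}

// Made-up sample input, just to show the call.
$sample = '<a href="testurl1.com">one</a> <A HREF="testurl2.com">two</a>';
print_r(extract_hrefs($sample));
```

A regex like this is fine for quick link grabbing, but it will miss single-quoted or unquoted attributes, which is where a real HTML parser earns its keep.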
|
Regex is your friend..:)
|
I could program something in Visual Basic... but I know there have to be some applications out there already for this. (And I'm also lazy.)
|
Perl was designed to do exactly that. PHP is pretty good for doing it as well.
|
Here is a way to do it without regex:
<?php
// buffer is a variable to hold the data we are working on
$buffer='';

// set vars for the beginning of what we want to parse and the end of what we want to parse
$begin_pattern='<a href="';
$end_pattern='">';

// set up var for data being extracted. This could be an array or string to write to a file, whatever
// here I am just using it to echo the data extracted
$dataout='';

// set file2read to point at the path and file that the list is stored in
$file2read='testfile.txt';

// open the file
$filein=fopen('testfile.txt','r');

// suck the entire file into a variable
while (!feof($filein)){
    $buffer=$buffer . fgets($filein);
}

// close the file
fclose($filein);

// check to make sure we got something out of the file
if ($buffer>''){
    // do this while any occurrences of the beginning pattern are still in the data
    while( substr_count(strtolower($buffer),$begin_pattern)>0 ){
        // trim the data to just past the next beginning pattern occurrence
        $buffer=substr($buffer, strpos(strtolower($buffer),$begin_pattern)+strlen($begin_pattern));
        // pull the data in from where we trimmed the data to the occurrence of the next end pattern
        $dataout=substr($buffer,0,strpos($buffer,$end_pattern));
        // trim the buffer by the length of the data we pulled
        $buffer=substr($buffer,strlen($dataout));
        // output the data we pulled - could go into an array here or write it to a file, whatever
        echo $dataout . '<br>';
    }
}
?>

takes a file that looks like this:

<a href="testurl1.com">crapcrapcrap<a href="testurl2.com">morecrapmorecrap<a href="testurl3.com">yesevenmore<a href="testurl4.com">awholelottacrap<a href="testurl5.com"><a href="testurl6.com"><a href="testurl7.com"><a href="testurl8.com"><a href="testurl9.com"><a href="testurl10.com"><a href="testurl11.com"><a href="testurl12.com"><a href="testurl13.com"><a href="testurl14.com"><a href="testurl15.com"><a href="testurl16.com"><a href="testurl17.com"><a href="testurl18.com"><a href="testurl19.com"><a href="testurl20.com"><a href="testurl21.com"><a href="testurl22.com"><a href="testurl23.com"><a href="testurl24.com"><a href="testurl25.com">

and outputs it like this:

testurl1.com
testurl2.com
testurl3.com
testurl4.com
testurl5.com
testurl6.com
testurl7.com
testurl8.com
testurl9.com
testurl10.com
testurl11.com
testurl12.com
testurl13.com
testurl14.com
testurl15.com
testurl16.com
testurl17.com
testurl18.com
testurl19.com
testurl20.com
testurl21.com
testurl22.com
testurl23.com
testurl24.com
testurl25.com
|
if you want a custom solution, icq: 33375924
|
Wow, thank you VERY much sarettah! =))
|
You're welcome. However, looking back, I have a code error in there from when I was making it "friendly", where I have:

// open the file
$filein=fopen('testfile.txt','r');

Change it to:

// open the file
$filein=fopen($file2read,'r');

:) |
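If it helps anyone, the begin/end scan from the script above could also be wrapped in a function and pointed at a list of URLs, which is what the original question asked for. Treat this as a sketch under assumptions rather than a finished tool: the names extract_between and extract_from_url_list are made up, the list is assumed to be a text file with one URL per line, and fetching pages with file_get_contents only works if allow_url_fopen is enabled on your PHP install.

```php
<?php
// Hypothetical helper: return every substring found between $begin and $end.
function extract_between(string $text, string $begin, string $end): array
{
    $found = [];
    if ($begin === '' || $end === '') {
        return $found; // empty markers would never advance the scan
    }

    $offset = 0;
    // stripos() is case-insensitive, so <A HREF=" is matched as well.
    while (($start = stripos($text, $begin, $offset)) !== false) {
        $start += strlen($begin);
        $stop = stripos($text, $end, $start);
        if ($stop === false) {
            break; // begin marker with no end marker after it: stop scanning
        }
        $found[] = substr($text, $start, $stop - $start);
        $offset = $stop + strlen($end);
    }

    return $found;
}

// Hypothetical wrapper. Assumed input: a text file with one URL per line.
function extract_from_url_list(string $listFile, string $begin, string $end): array
{
    $results = [];
    $urls = file($listFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    foreach ($urls ?: [] as $url) {
        $url = trim($url);
        // Needs allow_url_fopen for http:// URLs; failed fetches are skipped.
        $page = @file_get_contents($url);
        if ($page !== false) {
            $results[$url] = extract_between($page, $begin, $end);
        }
    }

    return $results;
}
```

Calling extract_from_url_list('urls.txt', '<a href="', '">') would give back an array keyed by URL, each entry holding whatever sat between the two markers on that page (urls.txt is just a placeholder name). Unlike the script above, this version keeps a moving offset instead of repeatedly trimming the buffer, and it stops cleanly when a begin marker has no matching end marker.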
hey sweet :) i'm going to play with this too
|