Old 07-31-2009, 08:20 PM   #1
Killswitch - BANNED FOR LIFE
Guest
 
Posts: n/a
What the fuck is wrong with my regex?

Code:
|<a\s[^>]*href\s*=\s*(\"??)'.$pData['profile_url'].'\\1[^>]*>'.$aData['link_backlink'].'<\/a>
Pretty much, the pattern is populated from the arrays to find a link in a page's source that has at least the href attribute for the URL, plus the anchor text. Any other attributes should be ignored, but for some reason it's just not working correctly.
Old 07-31-2009, 08:46 PM   #2
calmlikeabomb
Confirmed User
 
Join Date: May 2004
Location: SW Palm Bay, Florida
Posts: 1,323
Try this, assuming I understand the question

Code:
<?php

	// eregi() was removed in PHP 7; preg_match() with the /i flag is the equivalent
	preg_match('/href="([^"]+)"[^>]*>([^<]+)/i', $page->source, $page->links);
	print_r($page->links);

?>
__________________
subarus.

Last edited by calmlikeabomb; 07-31-2009 at 08:49 PM..
Old 07-31-2009, 11:00 PM   #3
Killswitch - BANNED FOR LIFE
Guest
 
Posts: n/a
That doesn't work.

Pretty much it's looking for <a href="http://somesite.com">something</a>. Sounds easy, but it also needs to return true if there are other attributes in the a tag.
Old 08-01-2009, 05:05 AM   #4
fris
Too lazy to set a custom title
 
Join Date: Aug 2002
Posts: 55,232
Code:
<?php

$link = '<a href="http://www.google.com" id="external">google</a>';
preg_match_all("/<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU",$link,$matches);
print_r($matches[2]);

?>
__________________
Since 1999: 69 Adult Industry awards for Best Hosting Company and professional excellence.


Old 08-01-2009, 11:30 AM   #5
fris
Too lazy to set a custom title
 
Join Date: Aug 2002
Posts: 55,232
bump for kill switch
Old 08-01-2009, 12:15 PM   #6
Killswitch - BANNED FOR LIFE
Guest
 
Posts: n/a
Quote:
Originally Posted by fris View Post
That's the same thing I have except that looks for all links, not ones with the url and anchor specified by the array.

Thanks anyway..

Bump.
Old 08-01-2009, 12:18 PM   #7
CyberHustler
So Fucking Banned
 
Join Date: Feb 2006
Posts: 25,214
Bump
Old 08-01-2009, 12:23 PM   #8
nation-x
Confirmed User
 
Join Date: Mar 2004
Location: Rock Hill, SC
Posts: 5,370
Here is a way to do it without regex
Code:
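$dom = new DOMDocument();
@$dom->loadHTML($html); // $html holds the page source; @ hides warnings from malformed markup

$xpath = new DOMXPath($dom);

// if the page declares a <base href>, remember it for resolving relative links
$baseNodes = $xpath->evaluate("//base/@href");
if ($baseNodes->length == 1) {
	$baseUrl = rtrim($baseNodes->item(0)->nodeValue, '/');
}

// every <a> element in the document, as a DOMNodeList
$hrefs = $xpath->evaluate("//a");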
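The snippet stops once the anchor nodes are collected. Here is a minimal sketch of how the loop might continue, assuming the goal is the one from the original question (find an anchor with a given href and text). The sample markup, $wantUrl and $wantText are made up for illustration, and the base handling only covers root-relative links.

```php
<?php
// Hypothetical input; in practice $html would be the fetched page source.
$html = '<html><head><base href="http://example.com/"></head><body>'
      . '<a rel="nofollow" href="/about" title="x">About us</a>'
      . '<a href="http://www.google.com">google</a>'
      . '</body></html>';
$wantUrl  = 'http://example.com/about'; // assumed target URL
$wantText = 'About us';                 // assumed anchor text

$dom = new DOMDocument();
@$dom->loadHTML($html); // @ hides warnings from sloppy markup
$xpath = new DOMXPath($dom);

// Same <base> lookup as in the post.
$baseUrl = '';
$baseNodes = $xpath->evaluate('//base/@href');
if ($baseNodes->length == 1) {
	$baseUrl = rtrim($baseNodes->item(0)->nodeValue, '/');
}

$found = false;
foreach ($xpath->evaluate('//a[@href]') as $a) {
	$href = $a->getAttribute('href');
	// Naive resolution: only root-relative hrefs ("/path") get the base prefixed.
	if ($baseUrl !== '' && strpos($href, '/') === 0 && strpos($href, '//') !== 0) {
		$href = $baseUrl . $href;
	}
	if ($href === $wantUrl && trim($a->textContent) === $wantText) {
		$found = true;
		break;
	}
}
var_dump($found);
```

Other attributes on the tag (rel, title, id) never enter the comparison, which is the main reason to prefer the DOM over a regex here.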
Old 08-01-2009, 12:25 PM   #9
KillerK
Confirmed User
 
Join Date: May 2008
Posts: 3,406
PHP Code:
$link = '<a href="http://www.google.com">google</a>';
preg_match('/a href="([^"]*?)">([^"]*?)<\/a>/', $link, $matches);
print_r($matches);

should print

Array
(
    [0] => a href="http://www.google.com">google</a>
    [1] => http://www.google.com
    [2] => google
)
That help at all?
Old 08-01-2009, 12:32 PM   #10
Killswitch - BANNED FOR LIFE
Guest
 
Posts: n/a
Quote:
Originally Posted by KillerK View Post
That help at all?
Yeah works perfect if it's just <a href="http://www.google.com">google</a> and not if it was <a rel="nofollow" href="http://www.google.com" title="google">google</a>

What I need it to do is find any a tag with the specified href and anchor, ignoring other attributes, and return true if the a tag has both the anchor and the href.
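One way to express that requirement with a regex, as a rough sketch rather than a drop-in fix: run the URL and the anchor through preg_quote() so characters like . and ? are taken literally, and let [^>]* absorb whatever other attributes are there. The helper name hasLink() and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: true if $html has an <a> tag whose href is exactly
// $url and whose link text is exactly $anchor; other attributes are ignored.
function hasLink($html, $url, $anchor) {
	$u = preg_quote($url, '~');    // escape regex metacharacters and the ~ delimiter
	$t = preg_quote($anchor, '~');
	// (["\']) captures the opening quote, \1 requires the same closing quote.
	$pattern = '~<a\s(?:[^>]*\s)?href\s*=\s*(["\'])' . $u . '\1[^>]*>\s*' . $t . '\s*</a>~i';
	return preg_match($pattern, $html) === 1;
}

$html = '<p><a rel="nofollow" href="http://www.google.com" title="google">google</a></p>';

var_dump(hasLink($html, 'http://www.google.com', 'google')); // href and anchor both match
var_dump(hasLink($html, 'http://www.google.com', 'yahoo'));  // anchor differs
var_dump(hasLink($html, 'http://www.google.co', 'google'));  // URL differs
```

The i flag makes the whole comparison case-insensitive, including the URL and anchor; drop it if case matters.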
Old 08-01-2009, 01:31 PM   #11
ProG
Confirmed User
 
Join Date: Apr 2009
Posts: 1,319
I don't entirely understand what you are trying to do but maybe this will help... :shrug:

Code:
$links  = '<a rel="nofollow" href="http://www.google.com/" id="extra">google</a>\r\n';
$links  .= '<a rel="nofollow" href="http://www.yahoo.com/" id="extra">yahoo</a>\r\n';
$links  .= '<a rel="nofollow" href=\'http://www.msn.com/\' id="extra">msn</a>\r\n';
$links  .= '<a href="http://www.bing.com/" id="extra">bing</a>\r\n';
$links  .= '<a href="http://www.ask.com/" id="extra">ask</a>\r\n';

$uri = 'www.bing.com';
$back = 'bing';

preg_match_all("/<a\s[^>]*href=([\"\']??)(http:\/\/{$uri}*?)([\"\']??)[^>]*>({$back})<\/a>/siU", $links, $matches);
print_r( $matches );
__________________
History will be kind to me for I intend to write it.
Old 08-01-2009, 02:07 PM   #12
nation-x
Confirmed User
 
Join Date: Mar 2004
Location: Rock Hill, SC
Posts: 5,370
Quote:
Originally Posted by ProG View Post
I don't entirely understand what you are trying to do but maybe this will help... :shrug:

That throws an error for me.
Quote:
Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '/'
This is what I came up with and it works... but it uses a loop.

Code:
<?php
$url = 'http://www.crazyfilth.com';
$anchor_text = 'Crazy Porn';
$html = file_get_contents('http://www.filthdump.com');

echo checkUrl($url, $anchor_text, $html);

function checkUrl($url, $anchor_text, $html) {
	$found = false;
	$dom = new domDocument(); 
	@$dom->loadHTML($html); 
	$anchors = $dom->getElementsByTagName('a'); 
	foreach ($anchors as $anchor) { 
		 $found_url = $anchor->getAttribute('href'); 
		 $urltext = trim($anchor->nodeValue);
		 if (($found_url == $url) && ($anchor_text == $urltext))  {
				return true;
		 }
	}
	return false;
}
?>

Last edited by nation-x; 08-01-2009 at 02:10 PM..
Old 08-01-2009, 02:11 PM   #13
ProG
Confirmed User
 
Join Date: Apr 2009
Posts: 1,319
Quote:
Originally Posted by nation-x View Post
That throws an error for me.
I don't get that error but it would mean that a / was specified in the $uri

You could either put the http:// part in the regexp like I did, or replace all / with \/
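The warning comes from the regex delimiter: once $uri contains a /, PCRE takes it as the end of the pattern and reads whatever follows as modifiers. A small sketch of the difference, using preg_quote() with the delimiter as its second argument instead of a manual replace; the sample link is made up.

```php
<?php
$link = '<a href="http://www.bing.com/" id="extra">bing</a>';
$uri  = 'http://www.bing.com/';

// Unescaped: the first "/" in "http://" ends the pattern early, so PHP
// warns about an unknown modifier and preg_match() returns false.
$bad = @preg_match("/href=\"{$uri}\"/", $link);

// preg_quote() escapes regex metacharacters plus the given delimiter.
$safe = preg_quote($uri, '/');
$good = preg_match("/href=\"{$safe}\"/", $link);

var_dump($bad);  // false: the pattern failed to compile
var_dump($good); // 1: the pattern compiled and matched
```

preg_quote() also escapes the dots, so www.bing.com no longer matches something like wwwXbing.com by accident.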
Old 08-01-2009, 02:11 PM   #14
who
So Fucking Banned
 
Join Date: Aug 2003
Location: ICQ #23642053
Posts: 19,593
echo "I love google"; //for extra PR
Old 08-01-2009, 02:19 PM   #15
nation-x
Confirmed User
 
Join Date: Mar 2004
Location: Rock Hill, SC
Posts: 5,370
Quote:
Originally Posted by ProG View Post
I don't get that error but it would mean that a / was specified in the $uri

You could either put the http:// part in the regexp like I did, or replace all / with \/
aaah... yeah I see.
Old 08-01-2009, 02:39 PM   #16
Killswitch - BANNED FOR LIFE
Guest
 
Posts: n/a
Quote:
Originally Posted by ProG View Post
I don't entirely understand what you are trying to do but maybe this will help... :shrug:

I get the same error as nation-x, I had to remove the http:\/\/ from the regex as it's already in the database as having it, but I even str_replace'd it and still get the error.
Old 08-01-2009, 03:02 PM   #17
ProG
Confirmed User
 
Join Date: Apr 2009
Posts: 1,319
Quote:
Originally Posted by Killswitch View Post
I get the same error as nation-x, I had to remove the http:\/\/ from the regex as it's already in the database as having it, but I even str_replace'd it and still get the error.
Hm.. this works for me, did you double escape so that both \/ show?

Code:
$uri = 'http://www.bing.com/';
$uri = str_replace( '/', '\\/', $uri );
$back = 'bing';

preg_match_all("/<a\s[^>]*href=([\"\']??)({$uri}*?)([\"\']??)[^>]*>({$back})<\/a>/siU", $links, $matches);
print_r( $matches );
Old 08-01-2009, 03:04 PM   #18
Killswitch - BANNED FOR LIFE
Guest
 
Posts: n/a
Thanks for all the help, nation-x got me going with his function and it works perfectly.
Old 08-01-2009, 03:09 PM   #19
nation-x
Confirmed User
 
Join Date: Mar 2004
Location: Rock Hill, SC
Posts: 5,370
Here is the final version for anyone that might need something similar... we found that there was a small issue with urls that had an ending slash... fixed.
Code:
<?php
$url = 'http://www.crazyfilth.com';
$anchor_text = 'Porn Videos';
$html = file_get_contents('http://aisle69.com/');

echo checkUrl($url, $anchor_text, $html);

// Returns true if $html contains an <a> whose href (minus one trailing slash)
// equals $url and whose trimmed text equals $anchor_text.
function checkUrl($url, $anchor_text, $html) {
	$dom = new DOMDocument();
	@$dom->loadHTML($html); // @ hides warnings from malformed markup
	$anchors = $dom->getElementsByTagName('a');
	foreach ($anchors as $anchor) {
		// strip a single trailing slash so "http://site.com/" matches "http://site.com"
		$found_url = preg_replace('{/$}', '', $anchor->getAttribute('href'));
		$urltext = trim($anchor->nodeValue);
		if (($found_url == $url) && ($anchor_text == $urltext)) {
			return true;
		}
	}
	return false;
}
?>
Old 08-01-2009, 03:10 PM   #20
nation-x
Confirmed User
 
Join Date: Mar 2004
Location: Rock Hill, SC
Posts: 5,370
stupid gfy board :P double post
Old 08-01-2009, 03:15 PM   #21
ProG
Confirmed User
 
Join Date: Apr 2009
Posts: 1,319
It's a good function for sure; I only see one issue.

If the link has any extra info it isn't going to match. For example:

Code:
$url = 'http://www.crazyfilth.com/';
$anchor_text = 'Porn Videos';
$html = '<a href="http://www.crazyfilth.com/?PHPSESSID=777" id="extra">Porn Videos</a>';
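One possible way around that, sketched under the assumption that two links should count as the same when scheme, host and path agree and the query string can be ignored: compare the pieces returned by parse_url() rather than the raw strings. sameTarget() is a made-up helper name.

```php
<?php
// Hypothetical helper: compares two URLs by scheme, host and path only,
// ignoring the query string, the fragment and a trailing slash.
function sameTarget($a, $b) {
	$norm = function ($url) {
		$p = parse_url($url);
		if ($p === false || !isset($p['host'])) {
			return null; // relative or unparsable URL: treat as no match
		}
		$scheme = isset($p['scheme']) ? strtolower($p['scheme']) : 'http';
		$path   = isset($p['path']) ? rtrim($p['path'], '/') : '';
		return $scheme . '://' . strtolower($p['host']) . $path;
	};
	$na = $norm($a);
	$nb = $norm($b);
	return $na !== null && $na === $nb;
}

// Session ID in the query string is ignored.
var_dump(sameTarget('http://www.crazyfilth.com/', 'http://www.crazyfilth.com/?PHPSESSID=777'));
// A different path still counts as a different link.
var_dump(sameTarget('http://www.crazyfilth.com/', 'http://www.crazyfilth.com/videos'));
```

In checkUrl() the comparison of $found_url against $url could then be swapped for a call to sameTarget(), if ignoring the query string is acceptable for the check being done.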
Old 12-25-2009, 11:02 AM   #22
fris
Too lazy to set a custom title
 
Join Date: Aug 2002
Posts: 55,232
Quote:
Originally Posted by Killswitch View Post
Yeah works perfect if it's just <a href="http://www.google.com">google</a> and not if it was <a rel="nofollow" href="http://www.google.com" title="google">google</a>

What I need it to do is find any a tag with the specified href and anchor, ignoring other attributes, and return true if the a tag has both the anchor and the href.
I know this is an old post but was doing something like this recently.

Code:
<?php

$content = file_get_contents('test.html');

$regex = "/<a.*? href=(\"|')(.*?)(\"|').*?>(.*?)<\/a>/i";

if (preg_match_all($regex,$content,$matches,PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // echo $match[0]; // full link including href
        // echo $match[1]; // type of opening quote
        // echo $match[2]; // url
        // echo $match[3]; // type of closing quote
        // echo $match[4]; // link text
    }
}

?>
example urls that will work

Quote:
<a href="http://www.google.com" rel="external">google</a>
<a href='http://www.live.com' id="#links">links</a><br/><p></p>
<a class="links" href="http://www.google.com">google! google!</a>
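One thing to watch with this pattern: .*? can run past a >, so when an anchor without an href comes first, the full match can start inside that earlier tag. A hedged sketch of a tighter variant that keeps the attribute scan inside a single tag; the sample markup is made up, and the capture groups are renumbered because the closing quote becomes a backreference.

```php
<?php
$content = '<a name="top">top</a> <a class="links" href="http://www.google.com">google! google!</a>';

// Pattern from the post.
$loose = "/<a.*? href=(\"|')(.*?)(\"|').*?>(.*?)<\/a>/i";
// Variant: [^>]* cannot cross a tag boundary, and the backreference requires
// the closing quote to be the same character as the opening one.
$tight = "/<a\s(?:[^>]*\s)?href=(\"|')(.*?)\\1[^>]*>(.*?)<\/a>/i";

preg_match($loose, $content, $m1);
preg_match($tight, $content, $m2);

echo $m1[0], "\n";                 // starts at the earlier <a name="top"> tag
echo $m2[0], "\n";                 // only the tag that carries the href
echo $m2[2], ' => ', $m2[3], "\n"; // url => link text
```

The URL and link text come out the same either way; only the full match in index 0 differs, which matters if the matched tag is later replaced or removed.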
Old 12-25-2009, 11:10 AM   #23
fatfoo
ICQ:649699063
 
Join Date: Mar 2003
Posts: 27,763
This is too complicated for me.

Merry Christmas

Happy New Year
Old 12-25-2009, 12:39 PM   #24
Killswitch - BANNED FOR LIFE
Guest
 
Posts: n/a
Nice info Fris, Merry Christmas.
Old 12-25-2009, 12:52 PM   #25
calmlikeabomb
Confirmed User
 
Join Date: May 2004
Location: SW Palm Bay, Florida
Posts: 1,323
Good call on the follow up. I've since started using Simple HTML DOM, a PHP class that uses jQuery-style selectors.

http://simplehtmldom.sourceforge.net/

So the original solution to this thread for accessing all anchor tag "hrefs" can be accomplished like this:

Code:
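// Load the library (simple_html_dom.php comes with the download above)
include 'simple_html_dom.php';

// Create DOM from URL or file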
$html = file_get_html('http://www.google.com/');

// Find & print all link hrefs
foreach($html->find('a') as $element) echo $element->href . '<br>';
It can also be done using an OO style. See the docs for more info. This is a sexy class. Merry XMAS!
Old 12-25-2009, 12:56 PM   #26
Killswitch - BANNED FOR LIFE
Guest
 
Posts: n/a
Awesome reply Levi, Merry Christmas to you too man!