GoFuckYourself.com - Adult Webmaster Forum - What the fuck is wrong with my regex?

https://gfy.com/showthread.php?t=919195

Killswitch - BANNED FOR LIFE

07-31-2009 08:20 PM

What the fuck is wrong with my regex?

Code:

|<a\s[^>]*href\s*=\s*(\"??)'.$pData['profile_url'].'\\1[^>]*>'.$aData['link_backlink'].'<\/a>

Basically, the pattern is built from the array values to find a link in a page's source that has at least the href attribute for the URL and the anchor text; any other attributes are ignored. But for some reason it's just not working correctly.
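A likely culprit in the pattern above is that `$pData['profile_url']` and `$aData['link_backlink']` are concatenated into the regex without escaping, so characters such as `.` and `?` in a URL act as regex metacharacters, and a URL with a query string, for example, will not match itself. Below is a minimal sketch of the same idea with both values passed through `preg_quote()`. The function name `hasBacklink` is hypothetical, and the `i` modifier makes the URL and anchor comparison case-insensitive as well as the tag names.

```php
<?php
// Hypothetical helper: true if $html contains an <a> tag whose href is
// exactly $url and whose text is exactly $anchor; other attributes
// (rel, title, id, ...) may appear before or after href.
function hasBacklink(string $html, string $url, string $anchor): bool
{
    // preg_quote() escapes regex metacharacters such as . ? + ( ) in the
    // values; the second argument also escapes the '~' delimiter.
    $u = preg_quote($url, '~');
    $a = preg_quote($anchor, '~');

    $pattern = '~<a\s[^>]*?href\s*=\s*(["\']?)' . $u . '\1'  // href value, same quote on both sides
             . '(?:\s[^>]*)?>'                               // optional attributes after href
             . '\s*' . $a . '\s*</a>~i';                     // anchor text, then </a>

    return preg_match($pattern, $html) === 1;
}

var_dump(hasBacklink(
    '<a rel="nofollow" href="http://www.google.com" title="google">google</a>',
    'http://www.google.com',
    'google'
)); // bool(true)
```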

calmlikeabomb

07-31-2009 08:46 PM

Try this, assuming I understand the question :)

Code:

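<?php
// eregi() was deprecated in PHP 5.3 and removed in PHP 7; preg_match_all()
// with the /i modifier is the equivalent and returns every match, not just the first.
preg_match_all('/href="([^"]+)"[^>]*>([^<]+)/i', $page->source, $page->links);
print_r($page->links);
?>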

Killswitch - BANNED FOR LIFE

07-31-2009 11:00 PM

That doesn't work.

Basically, it's looking for <a href="http://somesite.com">something</a>. Sounds easy, but it also has to return true if there are other attributes in the a tag.

fris	08-01-2009 05:05 AM

Code:

<?php
$link = '<a href="http://www.google.com" id="external">google</a>';
preg_match_all("/<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU", $link, $matches);
print_r($matches[2]);
?>

fris	08-01-2009 11:30 AM

bump for kill switch

Killswitch - BANNED FOR LIFE

08-01-2009 12:15 PM

Quote:

Originally Posted by fris (Post 16132202)

That's basically what I have, except yours matches every link, not just the one with the URL and anchor specified by the array.

Thanks anyway.

Bump.

CyberHustler

08-01-2009 12:18 PM

:thumbsup: :thumbsup: Bump

nation-x

08-01-2009 12:23 PM

Here is a way to do it without regex

Code:

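$dom = new DOMDocument();
@$dom->loadHTML($html); // @ hides warnings about malformed HTML

$xpath = new DOMXPath($dom);

// Read <base href="..."> if the page has one (useful for resolving relative links)
$baseUrl = '';
$baseNodes = $xpath->evaluate("//base/@href");
if ($baseNodes->length == 1) {
    $baseUrl = rtrim($baseNodes->item(0)->nodeValue, '/');
}

// All <a> elements; print each href with its link text
$hrefs = $xpath->evaluate("//a");
foreach ($hrefs as $a) {
    echo $a->getAttribute('href'), ' => ', trim($a->nodeValue), "\n";
}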
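Building on the DOMXPath approach above, the check for one specific link can be pushed into the XPath query itself instead of a PHP loop. This is a sketch, not nation-x's code: `xpathLiteral` and `hasLinkXPath` are hypothetical helpers. XPath 1.0 has no escape character for quotes, hence the `concat()` fallback, and `normalize-space(.)` trims the link text and collapses internal whitespace before comparing.

```php
<?php
// Hypothetical helper: wrap a PHP string as an XPath 1.0 string literal.
// XPath 1.0 cannot escape quotes, so a value containing both ' and "
// is rebuilt with concat().
function xpathLiteral(string $s): string
{
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';
    }
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";
    }
    return 'concat("' . str_replace('"', '", \'"\', "', $s) . '")';
}

// Hypothetical helper: true if $html has an <a> whose href is exactly $url
// and whose whitespace-normalised text is exactly $anchor.
function hasLinkXPath(string $html, string $url, string $anchor): bool
{
    if ($html === '') {
        return false; // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ hides warnings about sloppy real-world HTML
    $xpath = new DOMXPath($dom);

    $query = sprintf(
        '//a[@href=%s and normalize-space(.)=%s]',
        xpathLiteral($url),
        xpathLiteral($anchor)
    );
    return $xpath->query($query)->length > 0;
}

var_dump(hasLinkXPath(
    '<a rel="nofollow" href="http://www.google.com" title="google">google</a>',
    'http://www.google.com',
    'google'
)); // bool(true)
```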

KillerK

08-01-2009 12:25 PM

PHP Code:

$link = '<a href="http://www.google.com">google</a>';
preg_match('/a href="([^"]*?)">([^"]*?)<\/a>/', $link, $matches);
print_r($matches);

should print

Array
(
    [0] => a href="http://www.google.com">google</a>
    [1] => http://www.google.com
    [2] => google
)

That help at all?

Killswitch - BANNED FOR LIFE

08-01-2009 12:32 PM

Quote:

Originally Posted by KillerK (Post 16133102)

Yeah, it works perfectly if it's just <a href="http://www.google.com">google</a>, but not if it's <a rel="nofollow" href="http://www.google.com" title="google">google</a>.

What I need it to do is find any a tag with the specified href and anchor text, ignore any other attributes, and return true if the tag has both the href and the anchor.

ProG	08-01-2009 01:31 PM

I don't entirely understand what you are trying to do but maybe this will help... :shrug:

Code:

// "\r\n" has to be in double quotes to be a real line break; in single quotes it is literal text
$links  = '<a rel="nofollow" href="http://www.google.com/" id="extra">google</a>' . "\r\n";
$links .= '<a rel="nofollow" href="http://www.yahoo.com/" id="extra">yahoo</a>' . "\r\n";
$links .= '<a rel="nofollow" href=\'http://www.msn.com/\' id="extra">msn</a>' . "\r\n";
$links .= '<a href="http://www.bing.com/" id="extra">bing</a>' . "\r\n";
$links .= '<a href="http://www.ask.com/" id="extra">ask</a>' . "\r\n";

$uri = 'www.bing.com';
$back = 'bing';

// {$uri} goes in as-is: a quantifier right after it (e.g. *?) would only apply to its last character
preg_match_all("/<a\s[^>]*href=([\"\']??)(http:\/\/{$uri})([\"\']??)[^>]*>({$back})<\/a>/siU", $links, $matches);
print_r( $matches );

nation-x

08-01-2009 02:07 PM

Quote:

Originally Posted by ProG (Post 16133237)

That throws an error for me.

Quote:

Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '/'

This is what I came up with and it works... but it uses a loop.

Code:

<?php
$url = 'http://www.crazyfilth.com';
$anchor_text = 'Crazy Porn';
$html = file_get_contents('http://www.filthdump.com');

echo checkUrl($url, $anchor_text, $html);

function checkUrl($url, $anchor_text, $html) {
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $anchors = $dom->getElementsByTagName('a');
    foreach ($anchors as $anchor) {
        $found_url = $anchor->getAttribute('href');
        $urltext = trim($anchor->nodeValue);
        if (($found_url == $url) && ($anchor_text == $urltext)) {
            return true;
        }
    }
    return false;
}
?>

ProG	08-01-2009 02:11 PM

Quote:

Originally Posted by nation-x (Post 16133308)

That throws an error for me.

I don't get that error, but it would mean that a / is in $uri.

You could either put the http:// part in the regexp itself, as I did, or replace every / with \/.
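Instead of a manual str_replace, `preg_quote()` can do the "replace every / with \/" step: passing the delimiter as its second argument escapes it, and `.` and the other regex metacharacters get escaped too, which the str_replace approach misses. A small sketch; the sample URLs are only illustrative.

```php
<?php
// A URL containing the '/' delimiter and '.' metacharacters
$uri = 'http://www.bing.com/';

// Second argument = the pattern delimiter, so every '/' becomes '\/'
$quoted = preg_quote($uri, '/');

// Safe to drop into a '/'-delimited pattern: no "Unknown modifier" warning
$pattern = '/^' . $quoted . '$/';

var_dump(preg_match($pattern, 'http://www.bing.com/'));  // int(1)
var_dump(preg_match($pattern, 'http://wwwXbing.com/'));  // int(0): '.' is now literal
```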

who	08-01-2009 02:11 PM

echo "I love google"; //for extra PR

nation-x

08-01-2009 02:19 PM

Quote:

Originally Posted by ProG (Post 16133320)

I don't get that error, but it would mean that a / is in $uri.

You could either put the http:// part in the regexp itself, as I did, or replace every / with \/.

aaah... yeah I see.

Killswitch - BANNED FOR LIFE

08-01-2009 02:39 PM

Quote:

Originally Posted by ProG (Post 16133237)

I get the same error as nation-x. I had to remove the http:\/\/ from the regex because the URLs in the database already include it, but even after escaping the slashes with str_replace I still get the error.

ProG	08-01-2009 03:02 PM

Quote:

Originally Posted by Killswitch (Post 16133383)

I get the same error as nation-x. I had to remove the http:\/\/ from the regex because the URLs in the database already include it, but even after escaping the slashes with str_replace I still get the error.

Hmm, this works for me. Did you double-escape so that both characters of \/ show up?

Code:

$uri = 'http://www.bing.com/';
$uri = str_replace( '/', '\\/', $uri );
$back = 'bing';

// {$uri} goes in as-is: a quantifier right after it (e.g. *?) would only apply to its last character
preg_match_all("/<a\s[^>]*href=([\"\']??)({$uri})([\"\']??)[^>]*>({$back})<\/a>/siU", $links, $matches);
print_r( $matches );

Killswitch - BANNED FOR LIFE

08-01-2009 03:04 PM

Thanks for all the help. nation-x got me going with his function, and it works perfectly.

nation-x

08-01-2009 03:09 PM

Here is the final version for anyone who needs something similar. We found a small issue with URLs that had a trailing slash; that's fixed now.

Code:

<?php
$url = 'http://www.crazyfilth.com';
$anchor_text = 'Porn Videos';
$html = file_get_contents('http://aisle69.com/');

// var_dump shows bool(true)/bool(false); echo would print "1" or nothing
var_dump(checkUrl($url, $anchor_text, $html));

function checkUrl($url, $anchor_text, $html) {
    // Drop one trailing slash on both sides so "http://x.com/" matches "http://x.com"
    $url = preg_replace('{/$}', '', $url);
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ suppresses warnings from malformed HTML
    $anchors = $dom->getElementsByTagName('a');
    foreach ($anchors as $anchor) {
        $found_url = preg_replace('{/$}', '', $anchor->getAttribute('href'));
        $urltext = trim($anchor->nodeValue);
        if (($found_url == $url) && ($anchor_text == $urltext)) {
            return true;
        }
    }
    return false;
}
?>

nation-x

08-01-2009 03:10 PM

stupid gfy board :P double post

ProG	08-01-2009 03:15 PM

It's a good function for sure :) I only see one issue.

If the link's URL has any extra info, such as a query string, it isn't going to match. For example:

Code:

$url = 'http://www.crazyfilth.com/';
$anchor_text = 'Porn Videos';
$html = '<a href="http://www.crazyfilth.com/?PHPSESSID=777" id="extra">Porn Videos</a>';
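One way around that, sketched here as an assumption rather than anything posted in the thread, is to compare only the parts of the URL that identify the page: scheme, host, port and path, ignoring the query string, the fragment, host case and a trailing slash. `sameLinkTarget` is a hypothetical name; its result could stand in for the `==` URL comparison inside `checkUrl`. The trade-off is that links whose query string matters (an affiliate ID, say) would also count as matches.

```php
<?php
// Hypothetical helper: true when two URLs point at the same scheme, host,
// port and path. Query string, fragment, host case and a trailing '/'
// are ignored, so "http://site.com/?PHPSESSID=777" equals "http://site.com".
function sameLinkTarget(string $a, string $b): bool
{
    $normalize = function (string $url): ?string {
        $parts = parse_url($url);
        if ($parts === false || !isset($parts['host'])) {
            return null; // relative or malformed URL: treat as no match
        }
        $scheme = strtolower($parts['scheme'] ?? 'http');
        $host   = strtolower($parts['host']);
        $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
        $path   = rtrim($parts['path'] ?? '', '/');
        return $scheme . '://' . $host . $port . $path;
    };

    $na = $normalize($a);
    return $na !== null && $na === $normalize($b);
}

var_dump(sameLinkTarget('http://www.crazyfilth.com/', 'http://www.crazyfilth.com/?PHPSESSID=777')); // bool(true)
```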

fris	12-25-2009 11:02 AM

Quote:

Originally Posted by Killswitch (Post 16133114)

I know this is an old thread, but I was doing something like this recently.

Code:

<?php
$content = file_get_contents('test.html');

$regex = "/<a.*? href=(\"|')(.*?)(\"|').*?>(.*?)<\/a>/i";

if (preg_match_all($regex, $content, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // echo $match[0]; // full link including href
        // echo $match[1]; // type of opening quote
        // echo $match[2]; // url
        // echo $match[3]; // type of closing quote
        // echo $match[4]; // link text
    }
}
?>

Example links that will work:

Quote:

<a href="http://www.google.com" rel="external">google</a>
<a href='http://www.live.com' id="#links">links</a><br/><p></p>
<a class="links" href="http://www.google.com">google! google!</a>
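To get back to the original question with fris's pattern, the matches can be filtered for one specific URL and anchor text. `findLink` is a hypothetical wrapper that reuses fris's regex unchanged (group 2 is the URL, group 4 the link text) and compares exactly, so the trailing-slash and query-string caveats discussed earlier still apply. `strip_tags` lets link text wrapped in tags such as `<b>` compare cleanly, and because the pattern has no `s` modifier, a link whose tag or text spans a line break is not matched.

```php
<?php
// Hypothetical wrapper around fris's regex: returns the full <a>...</a>
// tag whose URL and link text match exactly, or null if none does.
function findLink(string $content, string $url, string $anchor): ?string
{
    $regex = "/<a.*? href=(\"|')(.*?)(\"|').*?>(.*?)<\/a>/i";
    if (!preg_match_all($regex, $content, $matches, PREG_SET_ORDER)) {
        return null;
    }
    foreach ($matches as $match) {
        // $match[2] = url, $match[4] = link text (may contain inner tags)
        if ($match[2] === $url && trim(strip_tags($match[4])) === $anchor) {
            return $match[0];
        }
    }
    return null;
}

$sample = '<a href="http://www.google.com" rel="external">google</a>' . "\n"
        . '<a href=\'http://www.live.com\' id="#links">links</a><br/><p></p>';
var_dump(findLink($sample, 'http://www.live.com', 'links') !== null); // bool(true)
```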

fatfoo

12-25-2009 11:10 AM

This is too complicated for me.

Merry Christmas

Happy New Year

Killswitch - BANNED FOR LIFE

12-25-2009 12:39 PM

Nice info Fris, Merry Christmas.

calmlikeabomb

12-25-2009 12:52 PM

Good call on the follow-up. I've since started using Simple HTML DOM, a PHP class that uses jQuery-style selectors.

http://simplehtmldom.sourceforge.net/

So the original task in this thread, getting the href of every anchor tag, can be done like this:

Code:

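// Simple HTML DOM is a separate download; include its file first (path assumed)
include_once 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find & print all link hrefs
foreach ($html->find('a') as $element) echo $element->href . '<br>';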

It can also be done using an OO style. See the docs for more info. This is a sexy class. Merry XMAS!

Killswitch - BANNED FOR LIFE

12-25-2009 12:56 PM

Awesome reply Levi, Merry Christmas to you too man!

All times are GMT -7.