GoFuckYourself.com - Adult Webmaster Forum


Killswitch - BANNED FOR LIFE 07-31-2009 08:20 PM

What the fuck is wrong with my regex?
 
Code:

|<a\s[^>]*href\s*=\s*(\"??)'.$pData['profile_url'].'\\1[^>]*>'.$aData['link_backlink'].'<\/a>
Pretty much what it's doing is being populated from the arrays to find a link in the page source that has at least the href attribute with the URL, plus the anchor text. It should ignore any other attributes, but for some reason it's just not working correctly.
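
A possible culprit, offered as a guess rather than a diagnosis: the pattern opens with a `|` delimiter but never closes it, and the URL and anchor text are concatenated in unescaped, so any `.` or `?` in them is read as regex syntax. The sketch below closes the delimiter and runs both values through preg_quote(). The hasBacklink() name and the sample values are made up for illustration; they stand in for $pData['profile_url'] and $aData['link_backlink'].

```php
<?php
// Hypothetical helper: true if $source has an <a> tag with this exact href and anchor text.
function hasBacklink($source, $url, $anchor) {
    // preg_quote() escapes regex metacharacters plus the chosen delimiter (second argument).
    $url    = preg_quote($url, '|');
    $anchor = preg_quote($anchor, '|');

    // <a, any attributes, href with an optional quote, the same quote again (\1),
    // any remaining attributes, then the anchor text and the closing tag.
    $pattern = '|<a\s[^>]*href\s*=\s*(["\']?)' . $url . '\1[^>]*>\s*' . $anchor . '\s*</a>|i';

    return preg_match($pattern, $source) === 1;
}

// Made-up sample page source with extra attributes around the href.
$source = '<a rel="nofollow" href="http://www.example.com/?id=1" title="x">My Site</a>';

var_dump(hasBacklink($source, 'http://www.example.com/?id=1', 'My Site')); // bool(true)
var_dump(hasBacklink($source, 'http://www.example.com/', 'My Site'));      // bool(false)
```

The `[^>]*` on either side of href is what lets rel, title and friends through; the `i` modifier also makes the anchor text case-insensitive, which may or may not be wanted.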

calmlikeabomb 07-31-2009 08:46 PM

Try this, assuming I understand the question :)

Code:

<?php

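        // eregi() was removed in PHP 7; preg_match() with the i modifier is the closest replacement.
        // $page is assumed to be an object holding the fetched HTML in $page->source.
        preg_match('/href="([^"]+)"[^>]*>([^<]+)/i', $page->source, $page->links);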
        print_r($page->links);

?>


Killswitch - BANNED FOR LIFE 07-31-2009 11:00 PM

That doesn't work.

Pretty much it's looking for <a href="http://somesite.com">something</a>. Sounds easy, but it also has to return true if there are other attributes in the a tag.

fris 08-01-2009 05:05 AM

Code:

<?php

$link = '<a href="http://www.google.com" id="external">google</a>';
preg_match_all("/<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU",$link,$matches);
print_r($matches[2]);

?>


fris 08-01-2009 11:30 AM

bump for kill switch

Killswitch - BANNED FOR LIFE 08-01-2009 12:15 PM

Quote:

Originally Posted by fris (Post 16132202)
Code:

<?php

$link = '<a href="http://www.google.com" id="external">google</a>';
preg_match_all("/<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU",$link,$matches);
print_r($matches[2]);

?>


That's the same thing I have except that looks for all links, not ones with the url and anchor specified by the array.

Thanks anyway..

Bump.

CyberHustler 08-01-2009 12:18 PM

:thumbsup:thumbsup Bump

nation-x 08-01-2009 12:23 PM

Here is a way to do it without regex
Code:

$dom = new DOMDocument();
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

$baseNodes = $xpath->evaluate("//base/@href");
if ($baseNodes->length == 1) {
        $baseUrl = rtrim($baseNodes->item(0)->nodeValue, '/');
}

$hrefs = $xpath->evaluate("//a");
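
The snippet above stops once it has selected every <a> node. One way it could be finished, as a rough sketch only: loop over the nodes and collect each href with its link text. The sample $html string and the $links array are assumptions added for illustration.

```php
<?php
// Made-up sample markup; in practice $html would be the fetched page source.
$html = '<html><body><a rel="nofollow" href="http://www.google.com" title="google">google</a></body></html>';

$dom = new DOMDocument();
// @ suppresses the warnings libxml raises on sloppy real-world HTML.
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// Collect [href, anchor text] pairs for every <a> that has an href attribute.
$links = array();
foreach ($xpath->query('//a[@href]') as $a) {
    $links[] = array($a->getAttribute('href'), trim($a->nodeValue));
}

print_r($links);
```

Other attributes such as rel or title never get in the way here, because the parser hands back each attribute separately.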


KillerK 08-01-2009 12:25 PM

PHP Code:

$link = '<a href="http://www.google.com">google</a>';
preg_match('/a href="([^"]*?)">([^"]*?)<\/a>/', $link, $matches);
print_r($matches);

should print

Array
(
    [0] => a href="http://www.google.com">google</a>
    [1] => http://www.google.com
    [2] => google
)


That help at all?

Killswitch - BANNED FOR LIFE 08-01-2009 12:32 PM

Quote:

Originally Posted by KillerK (Post 16133102)
PHP Code:

$link = '<a href="http://www.google.com">google</a>';
preg_match('/a href="([^"]*?)">([^"]*?)<\/a>/', $link, $matches);
print_r($matches);

should print

Array
(
    [0] => a href="http://www.google.com">google</a>
    [1] => http://www.google.com
    [2] => google
)


That help at all?

Yeah works perfect if it's just <a href="http://www.google.com">google</a> and not if it was <a rel="nofollow" href="http://www.google.com" title="google">google</a>

What I need it to do is find any a tag with the specified href and anchor, ignoring other attributes, and return true if the a tag has both the anchor and the href.

ProG 08-01-2009 01:31 PM

I don't entirely understand what you are trying to do but maybe this will help... :shrug:

Code:

$links  = '<a rel="nofollow" href="http://www.google.com/" id="extra">google</a>\r\n';
$links  .= '<a rel="nofollow" href="http://www.yahoo.com/" id="extra">yahoo</a>\r\n';
$links  .= '<a rel="nofollow" href=\'http://www.msn.com/\' id="extra">msn</a>\r\n';
$links  .= '<a href="http://www.bing.com/" id="extra">bing</a>\r\n';
$links  .= '<a href="http://www.ask.com/" id="extra">ask</a>\r\n';

$uri = 'www.bing.com';
$back = 'bing';

preg_match_all("/<a\s[^>]*href=([\"\']??)(http:\/\/{$uri}*?)([\"\']??)[^>]*>({$back})<\/a>/siU", $links, $matches);
print_r( $matches );


nation-x 08-01-2009 02:07 PM

Quote:

Originally Posted by ProG (Post 16133237)
I don't entirely understand what you are trying to do but maybe this will help... :shrug:

Code:

$links  = '<a rel="nofollow" href="http://www.google.com/" id="extra">google</a>\r\n';
$links  .= '<a rel="nofollow" href="http://www.yahoo.com/" id="extra">yahoo</a>\r\n';
$links  .= '<a rel="nofollow" href=\'http://www.msn.com/\' id="extra">msn</a>\r\n';
$links  .= '<a href="http://www.bing.com/" id="extra">bing</a>\r\n';
$links  .= '<a href="http://www.ask.com/" id="extra">ask</a>\r\n';

$uri = 'www.bing.com';
$back = 'bing';

preg_match_all("/<a\s[^>]*href=([\"\']??)(http:\/\/{$uri}*?)([\"\']??)[^>]*>({$back})<\/a>/siU", $links, $matches);
print_r( $matches );


That throws an error for me.
Quote:

Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '/'
This is what I came up with and it works... but it uses a loop.

Code:

<?php
$url = 'http://www.crazyfilth.com';
$anchor_text = 'Crazy Porn';
$html = file_get_contents('http://www.filthdump.com');

echo checkUrl($url, $anchor_text, $html);

function checkUrl($url, $anchor_text, $html) {
        $found = false;
        $dom = new domDocument();
        @$dom->loadHTML($html);
        $anchors = $dom->getElementsByTagName('a');
        foreach ($anchors as $anchor) {
                $found_url = $anchor->getAttribute('href');
                $urltext = trim($anchor->nodeValue);
                if (($found_url == $url) && ($anchor_text == $urltext))  {
                                return true;
                }
        }
        return false;
}
?>


ProG 08-01-2009 02:11 PM

Quote:

Originally Posted by nation-x (Post 16133308)
That throws an error for me.

I don't get that error but it would mean that a / was specified in the $uri

You could either put the http:// part in the regexp like I did, or replace all / with \/
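
To spell that out: with `/` as the delimiter, the first unescaped `/` inside the interpolated URL ends the pattern and whatever follows is read as modifiers, hence "Unknown modifier '/'". Instead of replacing slashes by hand, preg_quote() with the delimiter as its second argument should cover the slashes and the dots in one go. A small sketch using the bing sample; $quotedUri, $quotedBack and $pattern are just illustrative names.

```php
<?php
$links = '<a href="http://www.bing.com/" id="extra">bing</a>';

$uri  = 'http://www.bing.com/';
$back = 'bing';

// Escape regex metacharacters and the "/" delimiter in both values.
$quotedUri  = preg_quote($uri, '/');
$quotedBack = preg_quote($back, '/');

// Optional quote around the URL, matched again by \1; other attributes are skipped by [^>]*.
$pattern = '/<a\s[^>]*href=(["\']?)' . $quotedUri . '\1[^>]*>' . $quotedBack . '<\/a>/i';

$count = preg_match_all($pattern, $links, $matches);
echo $count, "\n"; // 1
```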

who 08-01-2009 02:11 PM

echo "I love google"; //for extra PR

nation-x 08-01-2009 02:19 PM

Quote:

Originally Posted by ProG (Post 16133320)
I don't get that error but it would mean that a / was specified in the $uri

You could either put the http:// part in the regexp like I did, or replace all / with \/

aaah... yeah I see.

Killswitch - BANNED FOR LIFE 08-01-2009 02:39 PM

Quote:

Originally Posted by ProG (Post 16133237)
I don't entirely understand what you are trying to do but maybe this will help... :shrug:

Code:

$links  = '<a rel="nofollow" href="http://www.google.com/" id="extra">google</a>\r\n';
$links  .= '<a rel="nofollow" href="http://www.yahoo.com/" id="extra">yahoo</a>\r\n';
$links  .= '<a rel="nofollow" href=\'http://www.msn.com/\' id="extra">msn</a>\r\n';
$links  .= '<a href="http://www.bing.com/" id="extra">bing</a>\r\n';
$links  .= '<a href="http://www.ask.com/" id="extra">ask</a>\r\n';

$uri = 'www.bing.com';
$back = 'bing';

preg_match_all("/<a\s[^>]*href=([\"\']??)(http:\/\/{$uri}*?)([\"\']??)[^>]*>({$back})<\/a>/siU", $links, $matches);
print_r( $matches );


I get the same error as nation-x. I had to remove the http:\/\/ from the regex since the URLs in the database already include it, but even after str_replace'ing the slashes I still get the error.

ProG 08-01-2009 03:02 PM

Quote:

Originally Posted by Killswitch (Post 16133383)
I get the same error as nation-x. I had to remove the http:\/\/ from the regex since the URLs in the database already include it, but even after str_replace'ing the slashes I still get the error.

Hm.. this works for me, did you double escape so that both \/ show?

Code:

$uri = 'http://www.bing.com/';
$uri = str_replace( '/', '\\/', $uri );
$back = 'bing';

preg_match_all("/<a\s[^>]*href=([\"\']??)({$uri}*?)([\"\']??)[^>]*>({$back})<\/a>/siU", $links, $matches);
print_r( $matches );


Killswitch - BANNED FOR LIFE 08-01-2009 03:04 PM

Thanks for all the help, nation-x got me going with his function and it works perfectly.

nation-x 08-01-2009 03:09 PM

Here is the final version for anyone that might need something similar... we found that there was a small issue with urls that had an ending slash... fixed.
Code:

<?php
$url = 'http://www.crazyfilth.com';
$anchor_text = 'Porn Videos';
$html = file_get_contents('http://aisle69.com/');

echo checkUrl($url, $anchor_text, $html);

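function checkUrl($url, $anchor_text, $html) {
        $dom = new DOMDocument();
        // Suppress the warnings libxml raises on malformed real-world HTML
        @$dom->loadHTML($html);
        $anchors = $dom->getElementsByTagName('a');
        // Strip one trailing slash from the wanted URL too, so either form matches
        $url = preg_replace('{/$}', '', $url);
        foreach ($anchors as $anchor) {
                $found_url = preg_replace('{/$}', '', $anchor->getAttribute('href'));
                $urltext = trim($anchor->nodeValue);
                // Both the href and the visible link text have to match
                if (($found_url == $url) && ($anchor_text == $urltext))  {
                                return true;
                }
        }
        return false;
}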
?>


nation-x 08-01-2009 03:10 PM

stupid gfy board :P double post

ProG 08-01-2009 03:15 PM

It's a good function for sure :) I only see one issue..

If the link has any extra info it isn't going to match. For example:

Code:

$url = 'http://www.crazyfilth.com/';
$anchor_text = 'Porn Videos';
$html = '<a href="http://www.crazyfilth.com/?PHPSESSID=777" id="extra">Porn Videos</a>';
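
If links like that should still count, one option is to compare a reduced form of both URLs instead of the raw strings. This is only a sketch under that assumption: the normalizeUrl() helper is hypothetical, and dropping the scheme, query string and fragment is a policy choice that may be too loose for some sites.

```php
<?php
// Hypothetical helper: reduce a URL to lowercase host plus path without a trailing slash,
// dropping the scheme, query string and fragment.
function normalizeUrl($url) {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null; // relative or malformed URL
    }
    $path = isset($parts['path']) ? rtrim($parts['path'], '/') : '';
    return strtolower($parts['host']) . $path;
}

$url  = 'http://www.crazyfilth.com/';
$href = 'http://www.crazyfilth.com/?PHPSESSID=777';

var_dump(normalizeUrl($url) === normalizeUrl($href)); // bool(true)
```

Inside checkUrl() the same comparison could replace the `$found_url == $url` test.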


fris 12-25-2009 11:02 AM

Quote:

Originally Posted by Killswitch (Post 16133114)
Yeah works perfect if it's just <a href="http://www.google.com">google</a> and not if it was <a rel="nofollow" href="http://www.google.com" title="google">google</a>

What I need it to do is find any a tag with the specified href and anchor, ignoring other attributes, and return true if the a tag has both the anchor and the href.

I know this is an old post but was doing something like this recently.

Code:

<?php

$content = file_get_contents('test.html');

$regex = "/<a.*? href=(\"|')(.*?)(\"|').*?>(.*?)<\/a>/i";

if (preg_match_all($regex,$content,$matches,PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // echo $match[0]; // full link including href
        // echo $match[1]; // type of opening quote
        // echo $match[2]; // url
        // echo $match[3]; // type of closing quote
        // echo $match[4]; // link text
    }
}

?>

example urls that will work

Quote:

<a href="http://www.google.com" rel="external">google</a>
<a href='http://www.live.com' id="#links">links</a><br/><p></p>
<a class="links" href="http://www.google.com">google! google!</a>

fatfoo 12-25-2009 11:10 AM

This is too complicated for me.

Merry Christmas

Happy New Year

Killswitch - BANNED FOR LIFE 12-25-2009 12:39 PM

Nice info Fris, Merry Christmas.

calmlikeabomb 12-25-2009 12:52 PM

Good call on the follow-up. I've since started using SIMPLE HTML DOM, a PHP class that uses jQuery-style selectors.

http://simplehtmldom.sourceforge.net/

So the original solution to this thread for accessing all anchor tag "hrefs" can be accomplished like this:

Code:

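// Load the Simple HTML DOM library (assumes simple_html_dom.php is on the include path)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');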

// Find & print all link hrefs
foreach($html->find('a') as $element) echo $element->href . '<br>';

It can also be done using an OO style. See the docs for more info. This is a sexy class. Merry XMAS!

Killswitch - BANNED FOR LIFE 12-25-2009 12:56 PM

Awesome reply Levi, Merry Christmas to you too man!


All times are GMT -7.