What the fuck is wrong with my regex?

  • Killswitch - BANNED FOR LIFE
    • Jul 2026

    #1

    What the fuck is wrong with my regex?

    Code:
    |<a\s[^>]*href\s*=\s*(\"??)'.$pData['profile_url'].'\\1[^>]*>'.$aData['link_backlink'].'<\/a>
    Pretty much what it does: the pattern is populated from the arrays to find a link in a page's source code that has at least the href attribute for the URL and the given anchor text. It ignores any other attributes, but for some reason it's just not working correctly.
  • calmlikeabomb
    Confirmed User
    • May 2004
    • 1323

    #2
    Try this, assuming I understand the question

    Code:
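    <?php
    
    	// eregi() was removed in PHP 7; preg_match() with the i flag is the case-insensitive equivalent.
    	preg_match('/href="([^"]+)"[^>]*>([^<]+)/i', $page->source, $page->links);
    	print_r($page->links);
    
    ?>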
    Last edited by calmlikeabomb; 07-31-2009, 07:49 PM.
    subarus.

    • Killswitch - BANNED FOR LIFE

      #3
      That doesn't work.

      Pretty much, it's looking for <a href="http://somesite.com">something</a>. Sounds easy, but it should also return true if there are other attributes in the a tag.

      • fris
        Too lazy to set a custom title
        • Aug 2002
        • 55679

        #4
        Code:
        <?php
        
        $link = '<a href="http://www.google.com" id="external">google</a>';
        preg_match_all("/<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU",$link,$matches);
        print_r($matches[2]);
        
        ?>
        Since 1999: 69 Adult Industry awards for Best Hosting Company and professional excellence.

        • fris
          Too lazy to set a custom title
          • Aug 2002
          • 55679

          #5
          bump for kill switch

          • Killswitch - BANNED FOR LIFE

            #6
            Originally posted by fris
            Code:
            <?php
            
            $link = '<a href="http://www.google.com" id="external">google</a>';
            preg_match_all("/<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU",$link,$matches);
            print_r($matches[2]);
            
            ?>
            That's the same thing I have, except that it looks for all links, not just the ones with the URL and anchor specified by the array.

            Thanks anyway..

            Bump.

            • CyberHustler
              Masterbaiter
              • Feb 2006
              • 28739

              #7
              Bump
              “If you can convince the lowest white man he’s better than the best colored man, he won’t notice you’re picking his pocket. Hell, give him somebody to look down on, and he’ll empty his pockets for you.”

              • nation-x
                Confirmed User
                • Mar 2004
                • 5370

                #8
                Here is a way to do it without regex
                Code:
                $dom = new DOMDocument();
                @$dom->loadHTML($html);
                
                $xpath = new DOMXPath($dom);
                
                $baseNodes = $xpath->evaluate("//base/@href");
                if ($baseNodes->length == 1) {
                	$baseUrl = rtrim($baseNodes->item(0)->nodeValue, '/');
                }
                
                $hrefs = $xpath->evaluate("//a");
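                The snippet above stops at selecting the nodes. A minimal sketch of the next step, reading each anchor's href and text, might look like this. The helper name findAnchors() and the sample markup are hypothetical, not part of the original post.

                ```php
                <?php
                // Sketch: list every anchor's href and text with DOMXPath.
                // findAnchors() is a hypothetical helper name.
                function findAnchors(string $html): array {
                    $dom = new DOMDocument();
                    // @ hides the warnings that sloppy real-world markup triggers.
                    @$dom->loadHTML($html);
                    $xpath = new DOMXPath($dom);

                    $links = [];
                    // Select only anchors that actually carry an href attribute.
                    foreach ($xpath->query('//a[@href]') as $anchor) {
                        $links[] = [
                            'href' => $anchor->getAttribute('href'),
                            'text' => trim($anchor->textContent),
                        ];
                    }
                    return $links;
                }

                // Made-up sample input with extra attributes around the href.
                $sample = '<p><a rel="nofollow" href="http://www.google.com" title="google">google</a></p>';
                print_r(findAnchors($sample));
                ```

                Because the parser handles attribute order and quoting, the extra rel and title attributes need no special treatment here.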

                • KillerK
                  Confirmed User
                  • May 2008
                  • 3406

                  #9
                  PHP Code:
                  $link = '<a href="http://www.google.com">google</a>';
                  preg_match('/a href="([^"]*?)">([^"]*?)<\/a>/', $link, $matches);
                  print_r($matches);
                  
                  should print
                  
                  Array
                  (
                      [0] => a href="http://www.google.com">google</a>
                      [1] => http://www.google.com
                      [2] => google
                  ) 
                  
                  That help at all?

                  • Killswitch - BANNED FOR LIFE

                    #10
                    Originally posted by KillerK
                    PHP Code:
                    $link = '<a href="http://www.google.com">google</a>';
                    preg_match('/a href="([^"]*?)">([^"]*?)<\/a>/', $link, $matches);
                    print_r($matches);
                    
                    should print
                    
                    Array
                    (
                        [0] => a href="http://www.google.com">google</a>
                        [1] => http://www.google.com
                        [2] => google
                    ) 
                    
                    That help at all?
                    Yeah, it works perfectly if it's just <a href="http://www.google.com">google</a>, but not if it's <a rel="nofollow" href="http://www.google.com" title="google">google</a>.

                    What I need it to do is find any a tag with the specified href and anchor text, ignore its other attributes, and return true if the a tag has both the anchor and the href.
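                    For anyone following along, the requirement above can be sketched as a single pattern. This is only an illustration under stated assumptions: the href value is quoted, the link text contains no nested tags, and the function name hasBacklink() and the sample values are hypothetical.

                    ```php
                    <?php
                    // Sketch: true if $html contains an <a> tag with exactly this href and
                    // anchor text, whatever other attributes the tag carries.
                    // hasBacklink() is a hypothetical name.
                    function hasBacklink(string $html, string $url, string $anchor): bool {
                        $pattern = '/<a\s(?:[^>]*?\s)?href\s*=\s*(["\'])' // optional attributes, then href= and a quote
                                 . preg_quote($url, '/')                  // URL matched literally
                                 . '\1[^>]*>\s*'                          // same closing quote, rest of the tag
                                 . preg_quote($anchor, '/')               // anchor text matched literally
                                 . '\s*<\/a>/i';
                        return preg_match($pattern, $html) === 1;
                    }

                    $html = '<a rel="nofollow" href="http://www.google.com" title="google">google</a>';
                    var_dump(hasBacklink($html, 'http://www.google.com', 'google')); // bool(true)
                    var_dump(hasBacklink($html, 'http://www.google.com', 'yahoo'));  // bool(false)
                    ```

                    Unquoted href values and markup nested inside the link text are not handled by this sketch.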

                    • ProG
                      Confirmed User
                      • Apr 2009
                      • 1319

                      #11
                      I don't entirely understand what you are trying to do but maybe this will help... :shrug:

                      Code:
                      $links  = '<a rel="nofollow" href="http://www.google.com/" id="extra">google</a>\r\n';
                      $links  .= '<a rel="nofollow" href="http://www.yahoo.com/" id="extra">yahoo</a>\r\n';
                      $links  .= '<a rel="nofollow" href=\'http://www.msn.com/\' id="extra">msn</a>\r\n';
                      $links  .= '<a href="http://www.bing.com/" id="extra">bing</a>\r\n';
                      $links  .= '<a href="http://www.ask.com/" id="extra">ask</a>\r\n';
                      
                      $uri = 'www.bing.com';
                      $back = 'bing';
                      
                      preg_match_all("/<a\s[^>]*href=([\"\']??)(http:\/\/{$uri}*?)([\"\']??)[^>]*>({$back})<\/a>/siU", $links, $matches);
                      print_r( $matches );
                      History will be kind to me for I intend to write it.

                      • nation-x
                        Confirmed User
                        • Mar 2004
                        • 5370

                        #12
                        Originally posted by ProG
                        I don't entirely understand what you are trying to do but maybe this will help... :shrug:

                        Code:
                        $links  = '<a rel="nofollow" href="http://www.google.com/" id="extra">google</a>\r\n';
                        $links  .= '<a rel="nofollow" href="http://www.yahoo.com/" id="extra">yahoo</a>\r\n';
                        $links  .= '<a rel="nofollow" href=\'http://www.msn.com/\' id="extra">msn</a>\r\n';
                        $links  .= '<a href="http://www.bing.com/" id="extra">bing</a>\r\n';
                        $links  .= '<a href="http://www.ask.com/" id="extra">ask</a>\r\n';
                        
                        $uri = 'www.bing.com';
                        $back = 'bing';
                        
                        preg_match_all("/<a\s[^>]*href=([\"\']??)(http:\/\/{$uri}*?)([\"\']??)[^>]*>({$back})<\/a>/siU", $links, $matches);
                        print_r( $matches );
                        That throws an error for me.
                        Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '/'
                        This is what I came up with and it works... but it uses a loop.

                        Code:
                        <?php
                        $url = 'http://www.crazyfilth.com';
                        $anchor_text = 'Crazy Porn';
                        $html = file_get_contents('http://www.filthdump.com');
                        
                        // echo prints 1 when the link is found and nothing otherwise.
                        echo checkUrl($url, $anchor_text, $html);
                        
                        function checkUrl($url, $anchor_text, $html) {
                        	$dom = new DOMDocument();
                        	// @ hides the warnings that malformed HTML triggers in loadHTML().
                        	@$dom->loadHTML($html);
                        	$anchors = $dom->getElementsByTagName('a');
                        	foreach ($anchors as $anchor) {
                        		$found_url = $anchor->getAttribute('href');
                        		$urltext = trim($anchor->nodeValue);
                        		// Both the href and the anchor text must match exactly.
                        		if (($found_url == $url) && ($anchor_text == $urltext)) {
                        			return true;
                        		}
                        	}
                        	return false;
                        }
                        ?>
                        Last edited by nation-x; 08-01-2009, 01:10 PM.

                        • ProG
                          Confirmed User
                          • Apr 2009
                          • 1319

                          #13
                          Originally posted by nation-x
                          That throws an error for me.
                          I don't get that error but it would mean that a / was specified in the $uri

                          You could either put the http:// part in the regexp like I did, or replace all / with \/
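                          A sketch of that escaping step using PHP's preg_quote() instead of a manual replace, assuming the pattern keeps / as its delimiter. The warning appears because the first unescaped / inside the interpolated URL is read as the end of the pattern, and the character after it as a modifier. The sample $uri and $html values below are made up for illustration.

                          ```php
                          <?php
                          // Sketch: escape a URL before dropping it into a /-delimited pattern.
                          // The sample values are hypothetical.
                          $uri  = 'http://www.bing.com/';
                          $html = '<a href="http://www.bing.com/" id="extra">bing</a>';

                          // The second argument tells preg_quote() to escape the delimiter too.
                          $escaped = preg_quote($uri, '/');
                          echo $escaped, "\n";

                          $pattern = '/<a\s[^>]*href=["\']' . $escaped . '["\'][^>]*>bing<\/a>/i';
                          var_dump(preg_match($pattern, $html)); // int(1)
                          ```

                          preg_quote() also escapes the dots in the host name, which a plain replace of / does not.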

                          • who
                            So Fucking Banned
                            • Aug 2003
                            • 19593

                            #14
                            echo "I love google"; //for extra PR

                            • nation-x
                              Confirmed User
                              • Mar 2004
                              • 5370

                              #15
                              Originally posted by ProG
                              I don't get that error but it would mean that a / was specified in the $uri

                              You could either put the http:// part in the regexp like I did, or replace all / with \/
                              aaah... yeah I see.

                              • Killswitch - BANNED FOR LIFE

                                #16
                                Originally posted by ProG
                                I don't entirely understand what you are trying to do but maybe this will help... :shrug:

                                Code:
                                $links  = '<a rel="nofollow" href="http://www.google.com/" id="extra">google</a>\r\n';
                                $links  .= '<a rel="nofollow" href="http://www.yahoo.com/" id="extra">yahoo</a>\r\n';
                                $links  .= '<a rel="nofollow" href=\'http://www.msn.com/\' id="extra">msn</a>\r\n';
                                $links  .= '<a href="http://www.bing.com/" id="extra">bing</a>\r\n';
                                $links  .= '<a href="http://www.ask.com/" id="extra">ask</a>\r\n';
                                
                                $uri = 'www.bing.com';
                                $back = 'bing';
                                
                                preg_match_all("/<a\s[^>]*href=([\"\']??)(http:\/\/{$uri}*?)([\"\']??)[^>]*>({$back})<\/a>/siU", $links, $matches);
                                print_r( $matches );
                                I get the same error as nation-x. I had to remove the http:\/\/ from the regex because the URLs in the database already include it, but even after I str_replace'd the slashes I still get the error.

                                • ProG
                                  Confirmed User
                                  • Apr 2009
                                  • 1319

                                  #17
                                  Originally posted by Killswitch
                                  I get the same error as nation-x. I had to remove the http:\/\/ from the regex because the URLs in the database already include it, but even after I str_replace'd the slashes I still get the error.
                                  Hm.. this works for me, did you double escape so that both \/ show?

                                  Code:
                                  $uri = 'http://www.bing.com/';
                                  $uri = str_replace( '/', '\\/', $uri );
                                  $back = 'bing';
                                  
                                  preg_match_all("/<a\s[^>]*href=([\"\']??)({$uri}*?)([\"\']??)[^>]*>({$back})<\/a>/siU", $links, $matches);
                                  print_r( $matches );

                                  • Killswitch - BANNED FOR LIFE

                                    #18
                                    Thanks for all the help, nation-x got me going with his function and it works perfectly.

                                    • nation-x
                                      Confirmed User
                                      • Mar 2004
                                      • 5370

                                      #19
                                      Here is the final version for anyone that might need something similar... we found that there was a small issue with urls that had an ending slash... fixed.
                                      Code:
                                      <?php
                                      // Note: $url is expected without a trailing slash (see the comparison below).
                                      $url = 'http://www.crazyfilth.com';
                                      $anchor_text = 'Porn Videos';
                                      $html = file_get_contents('http://aisle69.com/');
                                      
                                      // echo prints 1 when the link is found and nothing otherwise.
                                      echo checkUrl($url, $anchor_text, $html);
                                      
                                      function checkUrl($url, $anchor_text, $html) {
                                      	$dom = new DOMDocument();
                                      	// @ hides the warnings that malformed HTML triggers in loadHTML().
                                      	@$dom->loadHTML($html);
                                      	$anchors = $dom->getElementsByTagName('a');
                                      	foreach ($anchors as $anchor) {
                                      		// Strip one trailing slash so "http://site.com/" equals "http://site.com".
                                      		$found_url = preg_replace('{/$}', '', $anchor->getAttribute('href'));
                                      		$urltext = trim($anchor->nodeValue);
                                      		if (($found_url == $url) && ($anchor_text == $urltext)) {
                                      			return true;
                                      		}
                                      	}
                                      	return false;
                                      }
                                      ?>

                                      • nation-x
                                        Confirmed User
                                        • Mar 2004
                                        • 5370

                                        #20
                                        stupid gfy board :P double post

                                        • ProG
                                          Confirmed User
                                          • Apr 2009
                                          • 1319

                                          #21
                                          It's a good function for sure; I only see one issue.

                                          If the link's href carries any extra info, such as a query string, it isn't going to match. For example:

                                          Code:
                                          $url = 'http://www.crazyfilth.com/';
                                          $anchor_text = 'Porn Videos';
                                          $html = '<a href="http://www.crazyfilth.com/?PHPSESSID=777" id="extra">Porn Videos</a>';
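                                          One possible way to tolerate this, sketched under the assumption that matching on scheme, host and path is acceptable for the use case. sameTarget() is a hypothetical helper, not part of the function above, and defaulting a missing scheme to http is likewise an assumption.

                                          ```php
                                          <?php
                                          // Sketch: compare two URLs by scheme, host and path only, so a query
                                          // string such as ?PHPSESSID=777 or a trailing slash does not prevent a match.
                                          // sameTarget() is a hypothetical helper name.
                                          function sameTarget(string $a, string $b): bool {
                                              $normalize = function (string $url): ?string {
                                                  $parts = parse_url($url);
                                                  if ($parts === false || !isset($parts['host'])) {
                                                      return null; // no host found; treat as no match
                                                  }
                                                  $scheme = strtolower($parts['scheme'] ?? 'http'); // assumed default
                                                  $host   = strtolower($parts['host']);
                                                  $path   = rtrim($parts['path'] ?? '', '/');
                                                  return $scheme . '://' . $host . $path;
                                              };
                                              $left  = $normalize($a);
                                              $right = $normalize($b);
                                              return $left !== null && $left === $right;
                                          }

                                          var_dump(sameTarget('http://www.crazyfilth.com/', 'http://www.crazyfilth.com/?PHPSESSID=777')); // bool(true)
                                          ```

                                          Whether ignoring the query string is correct depends on the site: some pages are distinguished only by their query parameters, so this trades strictness for tolerance.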

                                          • fris
                                            Too lazy to set a custom title
                                            • Aug 2002
                                            • 55679

                                            #22
                                            Originally posted by Killswitch
                                            Yeah, it works perfectly if it's just <a href="http://www.google.com">google</a>, but not if it's <a rel="nofollow" href="http://www.google.com" title="google">google</a>.

                                            What I need it to do is find any a tag with the specified href and anchor text, ignore its other attributes, and return true if the a tag has both the anchor and the href.
                                            I know this is an old post but was doing something like this recently.

                                            Code:
                                            <?php
                                            
                                            $content = file_get_contents('test.html');
                                            
                                            $regex = "/<a.*? href=(\"|')(.*?)(\"|').*?>(.*?)<\/a>/i";
                                            
                                            if (preg_match_all($regex,$content,$matches,PREG_SET_ORDER)) {
                                                foreach ($matches as $match) {
                                                    // echo $match[0]; // full link including href
                                                    // echo $match[1]; // type of opening quote
                                                    // echo $match[2]; // url
                                                    // echo $match[3]; // type of closing quote
                                                    // echo $match[4]; // link text
                                                }
                                            }
                                            
                                            ?>
                                            example urls that will work

                                            <a href="http://www.google.com" rel="external">google</a>
                                            <a href='http://www.live.com' id="#links">links</a><br/><p></p>
                                            <a class="links" href="http://www.google.com">google! google!</a>

                                            • fatfoo
                                              ICQ:649699063
                                              • Mar 2003
                                              • 27763

                                              #23
                                              This is too complicated for me.

                                              Merry Christmas

                                              Happy New Year

                                              • Killswitch - BANNED FOR LIFE

                                                #24
                                                Nice info Fris, Merry Christmas.

                                                • calmlikeabomb
                                                  Confirmed User
                                                  • May 2004
                                                  • 1323

                                                  #25
                                                  Good call on the follow-up. I've since started using SIMPLE HTML DOM; it's a PHP class that uses jQuery-style selectors.

                                                  http://simplehtmldom.sourceforge.net/

                                                  So the original solution to this thread for accessing all anchor tag "hrefs" can be accomplished like this:

                                                  Code:
                                                  // Create DOM from URL or file
                                                  $html = file_get_html('http://www.google.com/');
                                                  
                                                  // Find & print all link hrefs
                                                  foreach($html->find('a') as $element) echo $element->href . '<br>';
                                                  It can also be done using an OO style. See the docs for more info. This is a sexy class. Merry XMAS!

                                                  • Killswitch - BANNED FOR LIFE

                                                    #26
                                                    Awesome reply Levi, Merry Christmas to you too man!
