GoFuckYourself.com - Adult Webmaster Forum

regex to match a class element (https://gfy.com/showthread.php?t=1060442)

fris 03-08-2012 03:42 PM

regex to match a class element
 
I'm looking to match an h1 tag with a class, but also match whatever other classes come after that class, so for this tag:

<h1 class="video-title one two clearfloat">get this</h1>

/<h1 class="video-title .*?\">(.*?)<\/h1>/

is this correct?

nm_ 03-08-2012 04:30 PM

i suck at remembering regex, so i always use DOM XPath queries instead. you could find the tag with a query something like this:

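<?php

// $html holds the page markup; this one-line sample is only a placeholder
$html = '<h1 class="classnamehere">get this</h1>';

$dom = new DOMDocument();

$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// exact comparison: only matches when the class attribute is exactly "classnamehere"
$selectH1 = $xpath->query('//h1[@class="classnamehere"]');

?>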

just came up with that right now, so dunno if that's the exact syntax, but i prefer traversing html w/ dom rather than regex :)

this is a good xpath query guide: http://www.earthinfo.org/xpaths-with-php-by-example/

Barry-xlovecam 03-08-2012 05:01 PM

\"video-title

Why 03-08-2012 05:43 PM

i know regex well enough to know thats not correct, but i dont know regex well enough to help you either. :( sorry

raymor 03-08-2012 06:19 PM

Quote:

Originally Posted by fris (Post 18812142)
I'm looking to match an h1 tag with a class, but also match whatever other classes come after that class, so for this tag:

<h1 class="video-title one two clearfloat">get this</h1>

/<h1 class="video-title .*?\">(.*?)<\/h1>/

is this correct?

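You probably should be using a parser, because your regexes are going to get out of hand as the project grows. That said, the . will also match > , so even the lazy .*? can run past the end of the tag when the first "> it finds is further along. A negated character class can't do that, and once you use one the ? after the * is no longer needed. So you're looking at something like: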


/<h1 class="video-title([^>]*)\">([^<]*)<\/h1>/
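A minimal sketch of the difference between the two patterns. The markup here is a made-up case (an unquoted id attribute plus a nested span), not something from the thread. On the original test string both patterns behave the same; this only shows what happens when the tag is less tidy.

```php
<?php
// Hypothetical markup: extra unquoted attribute and a nested span.
$html = '<h1 class="video-title one" id=t><span class="x">get this</span></h1>';

// Original pattern: the lazy .*? keeps expanding until the first "> it can find,
// which here is the end of the span tag, not the end of the h1 tag.
$lazy = '/<h1 class="video-title .*?">(.*?)<\/h1>/';

// Negated classes: [^>]* cannot leave the opening tag, [^<]* cannot enter a child tag.
$negated = '/<h1 class="video-title([^>]*)">([^<]*)<\/h1>/';

$lazyHit = preg_match($lazy, $html, $m1);
$negatedHit = preg_match($negated, $html, $m2);

var_dump($lazyHit, $m1[1] ?? null); // matches, but the capture includes "</span>"
var_dump($negatedHit);              // no match at all on this markup
```

Neither result is what you want on markup like this: the lazy version returns the wrong text and the negated version returns nothing. That is the usual argument for a parser once the HTML stops being predictable.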

Brujah 03-08-2012 08:11 PM

Like post #2, I prefer to use DOM for this type of thing, but something else I tend to do is use negated character classes in my regex, like this, where $html contains the HTML string:

Code:

if ( preg_match( '/<h1 class="video-title[^"]+">([^<]+)/', $html, $m ) )
{
        // var_dump($m);
        $h1 = $m[1];
}

edit: Ah I see ray had a similar thought.

fris 03-08-2012 08:22 PM

Quote:

Originally Posted by raymor (Post 18812352)
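You probably should be using a parser, because your regexes are going to get out of hand as the project grows. That said, the . will also match > , so even the lazy .*? can run past the end of the tag when the first "> it finds is further along. A negated character class can't do that, and once you use one the ? after the * is no longer needed. So you're looking at something like: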


/<h1 class="video-title([^>]*)\">([^<]*)<\/h1>/

ya its only one regex in the script.

Brujah 03-08-2012 08:26 PM

I didn't even think to test fris' regex, but it works in this example too

Brujah 03-08-2012 08:33 PM

I want to expand on #2 post, so you can see a working example.

Code:


$html = '<h1 class="video-title one two clearfloat">get this</h1>';

$dom = new DOMDocument();

$dom->loadHTML( $html );

$xpath = new DOMXpath( $dom );

$h1 = $xpath->query( '//h1[contains(@class,"video-title")]' );

var_dump( $h1->item(0)->nodeValue );

DOM is so much better for parsing/scraping once you figure out how to use XPath queries. I had to use contains(@class,"video-title") in the above query because the class attribute holds more than just 'video-title', so an exact @class="video-title" comparison would not match.
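One thing to watch with contains(): it is a plain substring test, so it also matches class names that merely start with or include "video-title". A common XPath 1.0 workaround is to pad the normalized class list with spaces and look for the whole token. The sketch below assumes a made-up second h1 with a look-alike class, purely to show the difference.

```php
<?php
// Hypothetical markup: the first h1 only has a look-alike class name.
$html = '<h1 class="video-title-small">skip this</h1>'
      . '<h1 class="one video-title two">get this</h1>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Substring test: also hits "video-title-small".
$loose = $xpath->query('//h1[contains(@class,"video-title")]');

// Whole-token test: wrap the class list in spaces and search for " video-title ".
$strict = $xpath->query(
    '//h1[contains(concat(" ", normalize-space(@class), " "), " video-title ")]'
);

var_dump($loose->length);               // both h1 elements
var_dump($strict->item(0)->nodeValue);  // only the real video-title h1
```

If the page never uses look-alike class names, the simpler contains() query is fine.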

fris 03-09-2012 09:25 AM

Quote:

Originally Posted by Brujah (Post 18812497)
I want to expand on #2 post, so you can see a working example.

Code:


$html = '<h1 class="video-title one two clearfloat">get this</h1>';

$dom = new DOMDocument();

$dom->loadHTML( $html );

$xpath = new DOMXpath( $dom );

$h1 = $xpath->query( '//h1[contains(@class,"video-title")]' );

var_dump( $h1->item(0)->nodeValue );

DOM is so much better for parsing/scraping once you figure out how to use XPath queries. I had to use contains(@class,"video-title") in the above query because the class attribute holds more than just 'video-title', so an exact @class="video-title" comparison would not match.

i actually timed it, the preg match is faster
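The thread doesn't show how the timing was done, so the following is only a rough sketch of one way to compare the two approaches with microtime(). The repeat count and the choice of regex are assumptions, and the numbers will vary by machine and PHP version.

```php
<?php
$html = '<h1 class="video-title one two clearfloat">get this</h1>';
$runs = 1000; // arbitrary repeat count, chosen only for illustration

// Time the regex approach.
$start = microtime(true);
for ($i = 0; $i < $runs; $i++) {
    preg_match('/<h1 class="video-title[^"]+">([^<]+)<\/h1>/', $html, $m);
    $viaRegex = $m[1];
}
$regexSeconds = microtime(true) - $start;

// Time the DOM approach, including parsing, since a scraper pays that cost too.
$start = microtime(true);
for ($i = 0; $i < $runs; $i++) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $viaDom = $xpath->query('//h1[contains(@class,"video-title")]')->item(0)->nodeValue;
}
$domSeconds = microtime(true) - $start;

printf("regex: %.4fs, DOM: %.4fs for %d runs\n", $regexSeconds, $domSeconds, $runs);
```

On a one-line string this mostly measures the cost of building a DOMDocument each time. If several values are pulled from the same page, the document only needs to be parsed once.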

Brujah 03-09-2012 11:56 AM

Quote:

Originally Posted by fris (Post 18813361)
i actually timed it, the preg match is faster

I can believe it. The DOM method does a lot more, but it's definitely overkill when all you need is a simple one-off match.

fris 03-09-2012 02:21 PM

ya only 1 regex was needed

Mr Cheeks 03-09-2012 02:37 PM

also try http://www.rubular.com/

mikke 03-09-2012 03:19 PM

Quote:

/<h1 class="video-title\s+(.*?)">(.*?)<\/h1>/i
try this one..

Brujah 03-09-2012 03:38 PM

These regex examples also rely on video-title being the first class immediately after the quote mark. The DOM example is smart enough that it doesn't matter where video-title is located in the class order.
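If the regex route has to cope with video-title appearing anywhere in the class list, one option is to capture the whole class attribute and check the class names in PHP rather than in the pattern. This is a sketch only: h1TextByClass is a hypothetical helper, not something from the thread, and it still assumes double-quoted attributes and no nested tags inside the h1.

```php
<?php
// Hypothetical helper: returns the text of the first h1 carrying $class, or null.
function h1TextByClass(string $html, string $class): ?string
{
    // Capture the full class attribute and the text content of every h1.
    $pattern = '/<h1\b[^>]*\sclass="([^"]*)"[^>]*>([^<]*)<\/h1>/i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $match) {
        // Split the attribute on whitespace and compare whole class names.
        $classes = preg_split('/\s+/', trim($match[1]));
        if (in_array($class, $classes, true)) {
            return $match[2];
        }
    }
    return null;
}

$html = '<h1 class="one two video-title clearfloat">get this</h1>';
var_dump(h1TextByClass($html, 'video-title'));
```

Comparing whole class names also avoids matching look-alikes such as video-title-small.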

raymor 03-09-2012 03:55 PM

Quote:

Originally Posted by mikke (Post 18814018)
/<h1 class="video-title\s+(.*?)">(.*?)<\/h1>/i

try this one..

See my post for four reasons that's wrong.

fris 03-09-2012 04:12 PM

Quote:

Originally Posted by Brujah (Post 18814043)
These regex examples also rely on video-title being the first class immediately after the quote mark. The DOM example is smart enough that it doesn't matter where video-title is located in the class order.

ya it is the first one

fris 03-10-2012 08:22 AM

Quote:

Originally Posted by raymor (Post 18814061)
See my post for four reasons that's wrong.

i used this one, it works correctly in php.

fris 03-10-2012 08:28 AM

Code:

<?php

$html = '<h1 class="video-title one two clearfloat">get this</h1>';

$raymor_regex = '/<h1 class="video-title([^>]*)\">([^<]*)<\/h1>/';
$brujah_regex = '/<h1 class="video-title[^"]+">([^<]+)/';
$mikke_regex = '/<h1 class="video-title\s+(.*?)">(.*?)<\/h1>/i';

preg_match($raymor_regex,$html,$raymor);
preg_match($brujah_regex,$html,$brujah);
preg_match($mikke_regex,$html,$mikke);

echo "raymor \n\n";
print_r($raymor);

echo "brujah \n\n";
print_r($brujah);

echo "mikke \n\n";
print_r($mikke);

these are the results

Code:


raymor

Array
(
    [0] => <h1 class="video-title one two clearfloat">get this</h1>
    [1] =>  one two clearfloat
    [2] => get this
)
brujah

Array
(
    [0] => <h1 class="video-title one two clearfloat">get this
    [1] => get this
)
mikke

Array
(
    [0] => <h1 class="video-title one two clearfloat">get this</h1>
    [1] => one two clearfloat
    [2] => get this
)


fris 03-10-2012 08:32 AM

modified brujah's with ending </h1>

Code:

/<h1 class="video-title[^"]+">([^<]+)<\/h1>/

returns 2 elements: the full match, and the text between the tags

Code:

Array
(
    [0] => <h1 class="video-title one two clearfloat">get this</h1>
    [1] => get this
)
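One edge case with this pattern: [^"]+ needs at least one character after video-title, so it will not match a tag whose class attribute is exactly "video-title". A sketch of the difference, using a made-up input where video-title is the only class:

```php
<?php
// Hypothetical input: video-title is the only class on the tag.
$html = '<h1 class="video-title">get this</h1>';

$plus = '/<h1 class="video-title[^"]+">([^<]+)<\/h1>/'; // one or more extra characters
$star = '/<h1 class="video-title[^"]*">([^<]+)<\/h1>/'; // zero or more extra characters

$plusHit = preg_match($plus, $html, $m1);
$starHit = preg_match($star, $html, $m2);

var_dump($plusHit);                 // no match
var_dump($starHit, $m2[1] ?? null); // match, captures the heading text
```

For the markup in this thread the class list always has more entries, so the + version works as posted. The * version is only needed if the extra classes might be absent.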



All times are GMT -7.