Old 03-08-2012, 03:42 PM   #1
fris
Too lazy to set a custom title
 
Join Date: Aug 2002
Posts: 55,372
regex to match a class element

I'm looking to match an h1 tag by its class, and also match whatever other classes come after it. So for this tag:

<h1 class="video-title one two clearfloat">get this</h1>

I have this regex:

/<h1 class="video-title .*?\">(.*?)<\/h1>/

Is this correct?
__________________
Since 1999: 69 Adult Industry awards for Best Hosting Company and professional excellence.


WP Stuff
Old 03-08-2012, 04:30 PM   #2
nm_
Confirmed User
 
Join Date: May 2011
Location: San Diego
Posts: 328
I suck at remembering regex, so I always use DOM with XPath queries instead. You could find the tag with a query something like this:

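<?php

// $html is assumed to hold the page markup you want to search
$dom = new DOMDocument();

$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// exact comparison: only matches when the class attribute is exactly "classnamehere"
$selectH1 = $xpath->query('//h1[@class="classnamehere"]');

?>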

I just came up with that, so I'm not sure that's the exact syntax, but I prefer traversing HTML with DOM rather than regex.

This is a good XPath query guide: http://www.earthinfo.org/xpaths-with-php-by-example/
Old 03-08-2012, 05:01 PM   #3
Barry-xlovecam
It's 42
 
Join Date: Jun 2010
Location: Global
Posts: 18,083
\"video-title
Old 03-08-2012, 05:43 PM   #4
Why
MFBA
 
Join Date: Mar 2003
Location: PNW
Posts: 7,230
I know regex well enough to know that's not correct, but I don't know regex well enough to help you either. Sorry.
Old 03-08-2012, 06:19 PM   #5
raymor
Confirmed User
 
Join Date: Oct 2002
Posts: 3,745
Quote:
Originally Posted by fris View Post
im looking to match a h1 tag with a class, but also match after that class so

<h1 class="video-title one two clearfloat">get this</h1>

/<h1 class="video-title .*?\">(.*?)<\/h1>/

is this correct?
You probably should be using a parser, because your regexes are going to get out of hand as the project grows. That said, the .* can also match > and ", so you want a negated character class there, and the ? only makes the * lazy; it does not keep the match inside the tag. So you're looking at something like:


/<h1 class="video-title([^>]*)\">([^<]*)<\/h1>/
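For reference, a minimal sketch of how the three quantifier styles discussed here behave in PHP's PCRE functions. The two-heading $html string is an assumption made up for illustration (it is not from the thread); with a single h1 all three patterns would capture the same text.

```php
<?php
// Two h1 tags in one string, so the quantifiers behave visibly differently.
// This input is a made-up example, not markup from the thread.
$html = '<h1 class="video-title one">first</h1><h1 class="video-title two">second</h1>';

// Greedy: .* takes as much as it can, so the match runs on to the LAST </h1>.
preg_match('/<h1 class="video-title .*">(.*)<\/h1>/', $html, $greedy);

// Lazy: .*? takes as little as it can, so the match stops at the FIRST </h1>.
preg_match('/<h1 class="video-title .*?">(.*?)<\/h1>/', $html, $lazy);

// Negated classes: [^"]* cannot pass a quote and [^<]* cannot pass a tag.
preg_match('/<h1 class="video-title[^"]*">([^<]*)<\/h1>/', $html, $negated);

// $greedy[1] is "second"; $lazy[1] and $negated[1] are both "first".
var_dump($greedy[1], $lazy[1], $negated[1]);
```

The lazy and negated versions agree on this input. The negated classes are the stricter of the two, because they can never run past the closing quote or into a nested tag.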
Old 03-08-2012, 08:11 PM   #6
Brujah
Beer Money Baron
 
Join Date: Jan 2001
Location: brujah / gmail
Posts: 22,157
Like post #2, I prefer to use DOM for this type of thing, but something else I tend to do is use negated character classes in my regex, like this, where $html contains the HTML string:

Code:
if ( preg_match( '/<h1 class="video-title[^"]+">([^<]+)/', $html, $m ) )
{
        // var_dump($m);
        $h1 = $m[1];
}
edit: Ah I see ray had a similar thought.
Last edited by Brujah; 03-08-2012 at 08:12 PM..
Old 03-08-2012, 08:22 PM   #7
fris
Too lazy to set a custom title
 
Join Date: Aug 2002
Posts: 55,372
Quote:
Originally Posted by raymor View Post
You probably should be using a parser, because your regexes are going to get out of hand as the project grows. That said, the .* can also match > and ", so you want a negated character class there, and the ? only makes the * lazy; it does not keep the match inside the tag. So you're looking at something like:


/<h1 class="video-title([^>]*)\">([^<]*)<\/h1>/
Ya, it's only one regex in the script.
Old 03-08-2012, 08:26 PM   #8
Brujah
Beer Money Baron
 
Join Date: Jan 2001
Location: brujah / gmail
Posts: 22,157
I didn't even think to test fris's regex, but it works in this example too.
Old 03-08-2012, 08:33 PM   #9
Brujah
Beer Money Baron
 
Join Date: Jan 2001
Location: brujah / gmail
Posts: 22,157
I want to expand on post #2 so you can see a working example.

Code:
$html = '<h1 class="video-title one two clearfloat">get this</h1>';

$dom = new DOMDocument();

$dom->loadHTML( $html );

$xpath = new DOMXpath( $dom );

$h1 = $xpath->query( '//h1[contains(@class,"video-title")]' );

var_dump( $h1->item(0)->nodeValue );
DOM is so much better for parsing/scraping once you figure out how to use XPath queries. I had to use contains(@class,"video-title") in the above query because the class attribute contains more than just "video-title", so an exact @class="video-title" comparison would not match.
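One caveat worth sketching: contains() is a plain substring test, so it would also accept a class such as "video-title-small". A common XPath 1.0 idiom pads the class list with spaces and looks for the whole token. The second heading and its class name below are assumptions invented for the demonstration.

```php
<?php
// Only the first heading carries the exact class token "video-title".
// The "video-title-small" heading is a made-up example for contrast.
$html = '<h1 class="clearfloat video-title">get this</h1>'
      . '<h1 class="video-title-small">not this</h1>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Pad the normalized class list with spaces, then search for " video-title "
// so that only a whole class token can match.
$query = '//h1[contains(concat(" ", normalize-space(@class), " "), " video-title ")]';
$nodes = $xpath->query($query);

foreach ($nodes as $node) {
    echo $node->nodeValue, "\n";
}
```

With the simpler contains(@class,"video-title") query, both headings in this sample would be returned.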
Old 03-09-2012, 09:25 AM   #10
fris
Too lazy to set a custom title
 
Join Date: Aug 2002
Posts: 55,372
Quote:
Originally Posted by Brujah View Post
I want to expand on #2 post, so you can see a working example.

Code:
$html = '<h1 class="video-title one two clearfloat">get this</h1>';

$dom = new DOMDocument();

$dom->loadHTML( $html );

$xpath = new DOMXpath( $dom );

$h1 = $xpath->query( '//h1[contains(@class,"video-title")]' );

var_dump( $h1->item(0)->nodeValue );
DOM is so much better for parsing/scraping once you figure out how to use xpath queries. I had to use "contains(@class,"video-title")" in the above query because the class actually contains more than just 'video-title'
I actually timed it; the preg_match is faster.
Old 03-09-2012, 11:56 AM   #11
Brujah
Beer Money Baron
 
Join Date: Jan 2001
Location: brujah / gmail
Posts: 22,157
Quote:
Originally Posted by fris View Post
i actually timed it, the preg match is faster
I can believe it. The DOM method does a lot more, but it's definitely overkill for a simple one-off match.
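The thread does not show how the timing was done, so the following is only a rough sketch of one way to compare the two approaches. The function names title_by_regex, title_by_dom and time_it and the iteration count are all hypothetical, and the nullable return types assume PHP 7.1 or later. Absolute numbers will vary by machine and PHP version.

```php
<?php
$html = '<h1 class="video-title one two clearfloat">get this</h1>';
$runs = 2000; // arbitrary iteration count, chosen only for illustration

// Regex approach, based on the negated-class pattern from the thread.
function title_by_regex(string $html): ?string
{
    return preg_match('/<h1 class="video-title[^"]*">([^<]+)<\/h1>/', $html, $m) ? $m[1] : null;
}

// DOM/XPath approach, based on the example from the thread.
function title_by_dom(string $html): ?string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//h1[contains(@class,"video-title")]');
    return $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
}

// Call $fn on $html $runs times and return the elapsed wall-clock seconds.
function time_it(callable $fn, string $html, int $runs): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn($html);
    }
    return microtime(true) - $start;
}

printf("regex: %.4f s\n", time_it('title_by_regex', $html, $runs));
printf("dom:   %.4f s\n", time_it('title_by_dom', $html, $runs));
```

The DOM version builds a full document tree on every call, which is the extra work referred to above.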
Old 03-09-2012, 02:21 PM   #12
fris
Too lazy to set a custom title
 
Join Date: Aug 2002
Posts: 55,372
ya only 1 regex was needed
Old 03-09-2012, 02:37 PM   #13
Mr Cheeks
Confirmed User
 
 
Join Date: Apr 2002
Posts: 901
also try http://www.rubular.com/
Old 03-09-2012, 03:19 PM   #14
mikke
Confirmed User
 
Join Date: Jan 2010
Location: Europe
Posts: 1,327
Quote:
/<h1 class="video-title\s+(.*?)">(.*?)<\/h1>/i
try this one..
__________________
icq: 395 294 346
http://www.adultsubmitter.eu - submit any adult site to 20 directories from 1 form!
now 20 domains!
http://www.porndeals.eu http://www.ebonybangbros.com
Old 03-09-2012, 03:38 PM   #15
Brujah
Beer Money Baron
 
Join Date: Jan 2001
Location: brujah / gmail
Posts: 22,157
These regex examples also rely on video-title being the first class immediately after the opening quote mark. The DOM example doesn't care where video-title is located in the class order.
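If the regex route is kept, one way to drop the "first class" requirement is to capture the whole class attribute and check the tokens in PHP. This is only a sketch: h1_text_by_class is a hypothetical helper name, it assumes double-quoted attributes and no nested tags inside the h1, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: return the text of the first h1 whose class list
// contains $class as a whole token, wherever it sits in the attribute.
function h1_text_by_class(string $html, string $class): ?string
{
    // Capture the full class attribute and the text of every h1 that has one.
    $pattern = '/<h1\b[^>]*\sclass="([^"]*)"[^>]*>([^<]*)<\/h1>/i';
    if (!preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        return null;
    }
    foreach ($matches as $m) {
        // Split the class list on whitespace and compare whole tokens.
        $classes = preg_split('/\s+/', trim($m[1]));
        if (in_array($class, $classes, true)) {
            return $m[2];
        }
    }
    return null;
}

echo h1_text_by_class('<h1 class="one video-title two">get this</h1>', 'video-title'), "\n";
```

Comparing whole tokens also avoids false hits on longer class names that merely start with video-title.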
Old 03-09-2012, 03:55 PM   #16
raymor
Confirmed User
 
Join Date: Oct 2002
Posts: 3,745
Quote:
Originally Posted by mikke View Post
/<h1 class="video-title\s+(.*?)">(.*?)<\/h1>/i

try this one..
See my post for four reasons that's wrong.
__________________
For historical display only. This information is not current:
support@bettercgi.com ICQ 7208627
Strongbox - The next generation in site security
Throttlebox - The next generation in bandwidth control
Clonebox - Backup and disaster recovery on steroids
Old 03-09-2012, 04:12 PM   #17
fris
Too lazy to set a custom title
 
Join Date: Aug 2002
Posts: 55,372
Quote:
Originally Posted by Brujah View Post
These regex examples also rely on video-title to be the first class immediately after the quote mark, the DOM example is smart enough that it doesn't matter where video-title is located in the class order.
ya it is the first one
Old 03-10-2012, 08:22 AM   #18
fris
Too lazy to set a custom title
 
Join Date: Aug 2002
Posts: 55,372
Quote:
Originally Posted by raymor View Post
See my post for four reasons that's wrong.
i used this one, it works correctly in php.
Old 03-10-2012, 08:28 AM   #19
fris
Too lazy to set a custom title
 
Join Date: Aug 2002
Posts: 55,372
Code:
<?php

$html = '<h1 class="video-title one two clearfloat">get this</h1>';

$raymor_regex = '/<h1 class="video-title([^>]*)\">([^<]*)<\/h1>/';
$brujah_regex = '/<h1 class="video-title[^"]+">([^<]+)/';
$mikke_regex = '/<h1 class="video-title\s+(.*?)">(.*?)<\/h1>/i';

preg_match($raymor_regex,$html,$raymor);
preg_match($brujah_regex,$html,$brujah);
preg_match($mikke_regex,$html,$mikke);

echo "raymor \n\n";
print_r($raymor);

echo "brujah \n\n";
print_r($brujah);

echo "mikke \n\n";
print_r($mikke);
These are the results:

Code:
raymor

Array
(
    [0] => <h1 class="video-title one two clearfloat">get this</h1>
    [1] =>  one two clearfloat
    [2] => get this
)
brujah

Array
(
    [0] => <h1 class="video-title one two clearfloat">get this
    [1] => get this
)
mikke

Array
(
    [0] => <h1 class="video-title one two clearfloat">get this</h1>
    [1] => one two clearfloat
    [2] => get this
)
Old 03-10-2012, 08:32 AM   #20
fris
Too lazy to set a custom title
 
Join Date: Aug 2002
Posts: 55,372
I modified Brujah's regex to include the closing </h1>:

Code:
/<h1 class="video-title[^"]+">([^<]+)<\/h1>/
It returns two array entries: the full match and the text between the tags.

Code:
Array
(
    [0] => <h1 class="video-title one two clearfloat">get this</h1>
    [1] => get this
)
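A small sketch of how this final pattern might be wrapped for use in a script. The function name video_title is hypothetical, and the entity decoding and trimming are additions that the thread does not mention. One thing to be aware of: [^"]+ needs at least one character after video-title, so a tag whose class attribute is exactly "video-title" will not match.

```php
<?php
// Hypothetical helper around the final pattern from the thread.
function video_title(string $html): ?string
{
    $pattern = '/<h1 class="video-title[^"]+">([^<]+)<\/h1>/';
    // preg_match returns 1 on a match, 0 on no match, and false on error.
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    // Decode entities such as &amp; and trim surrounding whitespace
    // (an assumption about what the caller wants, not part of the thread).
    return trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
}

var_dump(video_title('<h1 class="video-title one two clearfloat">get this</h1>'));
var_dump(video_title('<h1 class="video-title">get this</h1>')); // NULL: nothing follows video-title
```

Swapping [^"]+ for [^"]* would accept the bare class as well.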