Posts Tagged ‘regexp’

Regular expressions with PHP

by Robert Basic on September 22nd, 2008

I just want to write some real examples. These regexps are for PHP’s PCRE library, and they always will be, ’cause I plan to write several posts on this topic. Here’s a good PHP PCRE cheat sheet; it’s an excellent resource for regexps. If you know nothing about regexps, read this Wiki page first.

Regexps for <a> tags

A common case: you have the source of a web page and you want to parse out all the links from it.
An anchor tag looks something like this:

<a href="http://example.com/" title="Some website">Website</a>

It can also have more attributes, like class, target, etc. Knowing how it’s built up, we can start writing a pattern, depending on what we want.
Here are some examples; some explanations are in the comments:

<?php
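// Regexp examples for <a> tags
// $string holds the HTML source to search; this sample value is only
// a placeholder, replace it with the source of a real page
$string = '<a href="http://example.com/" title="Some website">Website</a>';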

/**
* Match the whole tag, the href value and the link text.
* $matches_comb[0] contains the whole <a> tag
* $matches_comb[1] contains what's inside the "href" attribute
* $matches_comb[2] contains what's after <a> and before </a>
* Modifiers: "i" ignores case, "U" makes the quantifiers lazy,
* "x" ignores literal whitespace in the pattern.
* With the "s" modifier the dot also matches newlines, so <a> tags
* broken across several lines are matched; without it, only <a> tags
* that fit on one line are matched.
*/
preg_match_all(
    '#<a\s.*href=["\'](.*)["\'].*>(.*)</a>#isxU',
    $string,
    $matches_comb
);

/**
* Match only what's inside the href attributes...
*/
preg_match_all(
    '#<a\s.*href=["\'](.*)["\'].*>.*</a>#isxU',
    $string,
    $matches_href
);

/**
* Match only what's inside the href attributes,
* and only when the value starts with http://
* $matches_href_http[0] contains the whole <a> tags,
* $matches_href_http[1] contains exactly what we need
*/
preg_match_all(
    '#<a\s.*href=["\'](http://.*)["\'].*>.*</a>#isxU',
    $string,
    $matches_href_http
);

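/**
* Match all email addresses in mailto: links
* [^"]* stops at the closing quote, so the match cannot run on
* to a later quote on the same line
* $matches_emails[1] contains the addresses
*/
preg_match_all(
    '#"mailto:([^"]*)"#',
    $string,
    $matches_emails
);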

?>
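
To see what actually lands in the result arrays, here is a minimal sketch that runs the first pattern against a small, made-up HTML snippet. The `$html` content and the variable names `$with_s` and `$without_s` are assumptions for illustration, not taken from a real page. It also shows what changes when the "s" modifier is dropped.

```php
<?php
// Made-up sample input; the second link is split across two lines.
$html = '<p><a href="http://example.com/" title="Some website">Website</a></p>' . "\n"
      . '<p><a class="ext"' . "\n"
      . '   href="http://example.org/">Other</a></p>';

// Same pattern as above: i = case-insensitive, s = dot matches newlines,
// U = quantifiers are lazy by default. The x modifier is left out here
// because the pattern contains no whitespace or comments to ignore.
preg_match_all('#<a\s.*href=["\'](.*)["\'].*>(.*)</a>#isU', $html, $with_s);

// Without "s" the dot stops at a newline, so the second link is skipped.
preg_match_all('#<a\s.*href=["\'](.*)["\'].*>(.*)</a>#iU', $html, $without_s);

print_r($with_s[1]);    // href values found with the "s" modifier
print_r($with_s[2]);    // link texts found with the "s" modifier
print_r($without_s[1]); // href values found without the "s" modifier
```

With this input, `$with_s[1]` holds both URLs, while `$without_s[1]` holds only the first one, because the second tag contains a newline before its href attribute.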

Play around with these patterns, see what each part does, and experiment; that’s the best way to learn regexps.
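
As one such experiment, the sketch below tightens the first pattern: negated character classes (`[^>]*`) keep a match from running past the end of the opening tag, and a backreference (`\1`) requires the closing quote to be the same as the opening one. This is only a possible variation under those assumptions, not a full HTML parser; the sample `$html` and the names `$loose` and `$tight` are made up for illustration.

```php
<?php
// Made-up sample: an anchor without href, followed by a real link
// whose URL contains an apostrophe.
$html = '<a name="top">Top</a> <a href="http://example.com/it\'s" class="x">Link</a>';

// Pattern from the post: the lazy ".*" after "<a\s" may run past the
// end of the first tag until it finds an href further along.
preg_match_all('#<a\s.*href=["\'](.*)["\'].*>(.*)</a>#isU', $html, $loose);

// Tighter variation:
//   [^>]*   stays inside the opening tag
//   (["\']) captures the opening quote, \1 requires the same closing quote
//   (.*?)   is explicitly lazy, so the U modifier is not needed
preg_match_all(
    '#<a\s[^>]*href=(["\'])(.*?)\1[^>]*>(.*?)</a>#is',
    $html,
    $tight
);

print_r($loose[1]);  // href captured by the original pattern
print_r($tight[2]);  // href captured by the tighter pattern
print_r($tight[3]);  // link text captured by the tighter pattern
```

On this input the original pattern starts its match at the `name` anchor and cuts the URL off at the apostrophe, while the tighter one skips the anchor without an href and returns the full href value.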
Do you have more regexps for links, or better ones than these?
Happy hacking!

Tags: example, pcre, php, regex, regexp.
Categories: Development, Programming.