==================
Regular Expression
==================

.. contents:: Table of Contents
    :backlinks: none

Compare HTML tags
-----------------

+------------+--------------+--------------+
| tag type   | format       | example      |
+============+==============+==============+
| all tag    | <[^>]+>      | <br />, <a>  |
+------------+--------------+--------------+
| open tag   | <[^/>][^>]*> | <a>, <table> |
+------------+--------------+--------------+
| close tag  | </[^>]+>     | </p>, </a>   |
+------------+--------------+--------------+
| self close | <[^/>]+/>    | <br />       |
+------------+--------------+--------------+

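
.. code-block:: python

    >>> import re

    # open tag
    >>> re.search(r'<[^/>][^>]*>', '<table>') is not None
    True
    >>> re.search(r'<[^/>][^>]*>', '</table>') is not None
    False

    # close tag
    >>> re.search(r'</[^>]+>', '</table>') is not None
    True

    # self close
    >>> re.search(r'<[^/>]+/>', '<br />') is not None
    True
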
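
The three patterns can also be combined into a small classifier. The sketch below is illustrative only: ``classify_tag`` is a hypothetical helper, not part of ``re``, and it assumes each input is a single, complete tag. It uses ``fullmatch`` and tests the self-closing pattern first, because ``<br />`` also matches the open-tag pattern.

```python
import re

# The open, close, and self-close patterns, compiled once for reuse.
OPEN_TAG = re.compile(r'<[^/>][^>]*>')
CLOSE_TAG = re.compile(r'</[^>]+>')
SELF_CLOSE_TAG = re.compile(r'<[^/>]+/>')


def classify_tag(tag):
    """Return 'self close', 'close', 'open', or None for one tag string.

    classify_tag is a hypothetical helper written for this example.
    """
    # Check self-closing first: '<br />' also matches OPEN_TAG.
    if SELF_CLOSE_TAG.fullmatch(tag):
        return 'self close'
    if CLOSE_TAG.fullmatch(tag):
        return 'close'
    if OPEN_TAG.fullmatch(tag):
        return 'open'
    return None


for tag in ['<table>', '</table>', '<br />', '<a href="#label">', 'plain text']:
    print(tag, '->', classify_tag(tag))
```

Because ``[^/>]+`` stops at the first ``/``, a self-closing tag whose attributes contain a slash, such as ``<img src="/x" />``, falls through to the open-tag branch. For real HTML, a parser such as the standard library's ``html.parser`` is likely to be more robust than these patterns.
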
``re.findall()`` match string
-----------------------------

.. code-block:: python

    # split a string into words
    >>> import re
    >>> source = "Hello World Ker HAHA"
    >>> re.findall(r'\w+', source)
    ['Hello', 'World', 'Ker', 'HAHA']

    # parse the python.org home page (Python 3)
    >>> from urllib.request import urlopen
    >>> s = urlopen('https://www.python.org')
    >>> html = s.read().decode('utf-8')  # decode bytes so str patterns apply
    >>> s.close()
    >>> print("open tags")
    open tags
    >>> re.findall(r'<[^/>][^>]*>', html)[0:2]
    ['', '