==================
Regular Expression
==================

.. contents:: Table of Contents
    :backlinks: none

Compare HTML tags
-----------------

+------------+--------------+--------------+
| tag type   | format       | example      |
+============+==============+==============+
| all tag    | <[^>]+>      | <br />, <a>  |
+------------+--------------+--------------+
| open tag   | <[^/>][^>]*> | <a>, <table> |
+------------+--------------+--------------+
| close tag  | </[^>]+>     | </p>, </a>   |
+------------+--------------+--------------+
| self close | <[^/>]+/>    | <br />       |
+------------+--------------+--------------+

.. code-block:: python

    >>> import re

    # open tag: '<' followed by anything except '/' or '>'
    >>> re.search(r'<[^/>][^>]*>', '<table>') is not None
    True
    >>> re.search(r'<[^/>][^>]*>', '<a href="#label">') is not None
    True
    >>> re.search(r'<[^/>][^>]*>', '<img src="/img">') is not None
    True
    >>> re.search(r'<[^/>][^>]*>', '</table>') is not None
    False

    # close tag: starts with '</'
    >>> re.search(r'</[^>]+>', '</table>') is not None
    True

    # self close: ends with '/>'
    >>> re.search(r'<[^/>]+/>', '<br />') is not None
    True

``re.findall()`` match string
-----------------------------

.. code-block:: python

    # split a string into words
    >>> import re
    >>> source = "Hello World Ker HAHA"
    >>> re.findall(r'\w+', source)
    ['Hello', 'World', 'Ker', 'HAHA']

    # parse the python.org home page (Python 3: urlopen lives in urllib.request)
    >>> import urllib.request
    >>> s = urllib.request.urlopen('https://www.python.org')
    >>> html = s.read().decode('utf-8')  # decode bytes so a str pattern can match
    >>> s.close()
    >>> print("open tags")
    open tags
    >>> re.findall(r'<[^/>][^>]*>', html)[0:2]
    ['', '