RegEx (regular expression) is a sequence of characters that describes a text pattern, used for searching and validating text. It is powerful but hard to read. Most RegEx rules are the same in all programming languages, but each language has important exceptions.

Token | Meaning |
---|---|
`.` | Any character except a new line |
`\d` | Decimal digit (0-9) |
`\D` | Not a digit |
`\w` | Word character (a-z, A-Z, 0-9, _) |
`\W` | Not a word character |
`\s` | White space (space, tab, new line) |
`\S` | Not a white space |
`\b` | Word boundary |
`\B` | Not a word boundary |
`^` | Beginning of a string |
`$` | End of a string |
`[...]` | Any one of the listed characters |
`[^...]` | Any one character that is not listed |
`\|` | Or (alternation) |
`(...)` | Group (also captures the matched text) |
`*` | 0 or more times |
`+` | 1 or more times |
`?` | 0 or 1 time (optional) |
`{2}` | Exactly 2 times |
`{1,3}` | Between 1 and 3 times |
`i` | Modifier: case-insensitive search |
`m` | Modifier: multiline, so ^ and $ also match at the start and end of each line |
`u` | Modifier: pattern and subject are treated as UTF-8 |
`. [ ] { } ( ) \ ^ $ \| ? * + /` | Characters that need to be escaped with a backslash to match them literally (/ only when it is the pattern delimiter) |
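
A short PHP sketch that exercises some of the quantifiers and modifiers from the table. The sample strings are illustrative, and the UTF-8 example assumes the PHP file itself is saved as UTF-8:

```php
<?php
// ? = 0 or 1, * = 0 or more, + = 1 or more
$optional   = preg_match('/^colou?r$/', 'color');   // 1: the "u" may be missing
$zeroOrMore = preg_match('/^ab*c$/', 'ac');         // 1: zero "b" is allowed
$oneOrMore  = preg_match('/^ab+c$/', 'ac');         // 0: at least one "b" is needed
$exact      = preg_match('/^\d{2}$/', '2021');      // 0: exactly 2 digits are required

// i: case-insensitive
$caseless = preg_match('/^hello$/i', 'HeLLo');      // 1

// m: ^ and $ also match at every line break, so each line is tested separately
$lines = preg_match_all('/^\d+$/m', "12\nab\n345"); // 2 ("12" and "345")

// u: the subject is read as UTF-8, so . is one character instead of one byte
$utf8  = preg_match('/^.{7}$/u', 'количка');        // 1: 7 letters
$bytes = preg_match('/^.{7}$/', 'количка');         // 0: the word is 14 bytes long

echo "$optional $zeroOrMore $oneOrMore $exact $caseless $lines $utf8 $bytes" . PHP_EOL;
```

The last pair shows why the u modifier matters for non-Latin text: without it, PCRE counts bytes, not characters.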

A popular online tester is https://regex101.com/, where you can test a regex in the flavors of several programming languages.
Another way in PHP is to use the function preg_match():
preg_match($pattern, $subject, &$matches, $flags, $offset)
The last three arguments are optional. preg_match() returns 1 if the pattern matches, 0 if it does not, and false if an error occurs.
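A minimal sketch of the optional arguments, assuming PHP 7.4 or later. PREG_OFFSET_CAPTURE is a real PHP flag that turns each match into a [text, byte offset] pair, and $offset starts the search at a given byte. The subject string and variable names are only illustrative:

```php
<?php
$subject = 'abc 123 def 456';

// Only the required arguments: 1 if the pattern matches, 0 if not
$found = preg_match('/\d+/', $subject);

// $matches receives the first match; PREG_OFFSET_CAPTURE adds its byte offset
preg_match('/\d+/', $subject, $matches, PREG_OFFSET_CAPTURE);
// $matches[0] is ['123', 4]

// $offset = 8 starts the search at byte 8, so the first number is skipped
preg_match('/\d+/', $subject, $later, 0, 8);
// $later[0] is '456'

print_r($matches);
print_r($later);
```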
Popular pattern examples from CodeIgniter (libraries, Form_validation):

`/^[\-+]?[0-9]+$/` | function integer($str) |
---|---|
`/^$/` | Beginning and end of the pattern |
`/^[\-+]$/` | One plus or one minus sign |
`/^[\-+]?$/` | The sign appears zero or one times (optional) |
`/^[\-+]?[0-9]$/` | One digit: 0, 1, … or 9 |
`/^[\-+]?[0-9]+$/` | The digit is repeated one or more times |

`/^[\-+]?[0-9]+\.[0-9]+$/` | function decimal($str) |
---|---|
`/^$/` | Beginning and end of the pattern |
`/^[\-+]?$/` | One plus or one minus sign, zero or one times (optional) |
`/^[\-+]?[0-9]+$/` | Integer part: digits 0 to 9, mandatory, repeated one or more times |
`/^[\-+]?[0-9]+\.$/` | Decimal point, mandatory, escaped as `\.` |
`/^[\-+]?[0-9]+\.[0-9]+$/` | Decimal part: digits 0 to 9, repeated one or more times |

`/^[a-z0-9_-]+$/i` | function alpha_dash($str) |
---|---|
`/^$/` | Beginning and end of the pattern |
`/^[a-z0-9_-]/` | One character: a letter a to z, a digit, an underscore or a minus (a - at the end of [ ] is literal) |
`/^[a-z0-9_-]+$/` | Repeated one or more times |
`/^[a-z0-9_-]+$/i` | Case-insensitive, so A to Z is allowed too |
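
To try the three patterns outside CodeIgniter, a small sketch can wrap them in plain functions. The names isInteger(), isDecimal() and isAlphaDash() are hypothetical and are not CodeIgniter's API; only the patterns come from the tables above:

```php
<?php
// Hypothetical helpers that reuse the patterns from the tables above
function isInteger(string $str): bool
{
    return preg_match('/^[\-+]?[0-9]+$/', $str) === 1;
}

function isDecimal(string $str): bool
{
    return preg_match('/^[\-+]?[0-9]+\.[0-9]+$/', $str) === 1;
}

function isAlphaDash(string $str): bool
{
    return preg_match('/^[a-z0-9_-]+$/i', $str) === 1;
}

var_dump(isInteger('-42'));            // bool(true)
var_dump(isInteger('4.2'));            // bool(false): the point is not allowed
var_dump(isDecimal('3.14'));           // bool(true)
var_dump(isDecimal('3'));              // bool(false): the decimal part is mandatory
var_dump(isAlphaDash('User_name-01')); // bool(true)
var_dump(isAlphaDash('User name'));    // bool(false): the space is not allowed

// Without the D modifier, $ also matches just before a final "\n"
var_dump(isInteger("42\n"));                        // bool(true)
var_dump(preg_match('/^[\-+]?[0-9]+$/D', "42\n")); // int(0)
```

The last two lines show a PCRE detail worth knowing: if a trailing newline must be rejected, add the D modifier (or use \z instead of $).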

Suppose you need to validate a text field that may contain lowercase or uppercase letters, digits, spaces, question marks, commas, points, dashes and exclamation marks.
Step | Pattern |
---|---|
First step: the characters that need no escaping | `[a-zA-Z0-9! ,]` |
Second step: add the escaped characters `\?`, `\.` and `\-` (inside [ ] only the - strictly needs it, but escaping ? and . is harmless) | `[a-zA-Z0-9! ,\?\.\-]` |
Third step: anchor the beginning and end of the pattern, one or more characters | `/^[a-zA-Z0-9! ,\?\.\-]+$/` |
Maximum length of 255 characters, an empty field is allowed | `/^[a-zA-Z0-9! ,\?\.\-]{0,255}$/` |

Test PHP code:
```php
<?php
$str = 'Hello! How are you? One, two, three. 2021-11-08';
$pattern = '/^[a-zA-Z0-9! ,\?\.\-]{0,255}$/';
echo 'preg_match: ' . preg_match($pattern, $str) . PHP_EOL; // preg_match: 1

$matches = array();
echo 'preg_match and array: ' . preg_match($pattern, $str, $matches) . PHP_EOL; // 1
print_r($matches); // [0] => the whole string

// HTML tags contain <, > and /, which the pattern does not allow
$str = '<h1>Heading 1 </h1>';
echo 'preg_match, array, html: ' . preg_match($pattern, $str, $matches) . PHP_EOL; // 0
print_r($matches); // empty array
?>
```
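The {0,255} quantifier can be checked at its boundaries. A minimal sketch that builds the test strings with str_repeat():

```php
<?php
$pattern = '/^[a-zA-Z0-9! ,\?\.\-]{0,255}$/';

// {0,255}: an empty string and exactly 255 allowed characters both pass
$empty = preg_match($pattern, '');
$max = preg_match($pattern, str_repeat('a', 255));

// 256 characters exceed the upper limit
$tooLong = preg_match($pattern, str_repeat('a', 256));

echo "$empty $max $tooLong" . PHP_EOL; // 1 1 0
```

Without the u modifier the limit counts bytes, which here is the same as characters because the pattern allows only ASCII.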
Pattern | Description | Example match |
---|---|---|
`(http[s]?)` | http or https | http, https |
`(http[s]?):\/\/` | Adding :// (the slashes are escaped because / is the delimiter) | http://, https:// |
`(http[s]?):\/\/([a-z0-9-]+)` | Domain name: letters, digits, dash | http://dir, http://kkk-oop |
`^(http[s]?):\/\/([a-z0-9-]+)\.([a-z]{2,6})$` | Escaped dot and one domain type | http://dir.bg |
`^(http[s]?):\/\/([a-z0-9-]+)(\.([a-z]{2,6})){1,2}$` | Dot and one or two domain types | https://hh-uu.co.uk |
`/^(http[s]?):\/\/([a-z0-9-]+)(\.[a-z]{2,6})(\.[a-z]{2,6})?$/` | Correct division into groups | https://hh-uu.co.uk |
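
Two details in the table are easy to miss: a repeated group such as (\.([a-z]{2,6})){1,2} keeps only its last repetition in the match array, and an unescaped . matches any character. A sketch that shows both (the subject strings are illustrative):

```php
<?php
$subject = 'https://hh-uu.co.uk';

// A group repeated with {1,2} keeps only its last repetition
preg_match('/^(http[s]?):\/\/([a-z0-9-]+)(\.([a-z]{2,6})){1,2}$/', $subject, $repeated);
// $repeated[3] is '.uk' and $repeated[4] is 'uk'; '.co' is lost

// Two separate groups keep both domain types
preg_match('/^(http[s]?):\/\/([a-z0-9-]+)(\.[a-z]{2,6})(\.[a-z]{2,6})?$/', $subject, $separate);
// $separate[3] is '.co' and $separate[4] is '.uk'

// An unescaped . matches any character, so "dirxbg" passes as a domain
$loose = preg_match('/^(http[s]?):\/\/([a-z0-9-]+).([a-z]{2,6})$/', 'http://dirxbg');   // 1
$strict = preg_match('/^(http[s]?):\/\/([a-z0-9-]+)\.([a-z]{2,6})$/', 'http://dirxbg'); // 0

print_r($repeated);
print_r($separate);
```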
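
The code for the test:
```php
<?php
$pattern = '/^(http[s]?):\/\/([a-z0-9-]+)(\.[a-z]{2,6})(\.[a-z]{2,6})?$/';
$str = 'https://hh-uu.co.uk';
echo 'preg_match and array: ' . preg_match($pattern, $str, $matches) . PHP_EOL; // 1
print_r($matches);
// Array ( [0] => https://hh-uu.co.uk [1] => https [2] => hh-uu [3] => .co [4] => .uk )
?>
```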
Each pair of parentheses adds one element to the match array, so the domain name and each domain type end up in separate groups. This also makes it easy to replace values group by group.
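A sketch of replacing by group with preg_replace(), where $1 to $4 in the replacement string refer to the groups of the URL pattern. The input URLs and the new domain name "example" are illustrative:

```php
<?php
$pattern = '/^(http[s]?):\/\/([a-z0-9-]+)(\.[a-z]{2,6})(\.[a-z]{2,6})?$/';

// Force https; the optional group $4 did not match for "dir.bg", so it becomes ''
$secure = preg_replace($pattern, 'https://$2$3$4', 'http://dir.bg');
echo $secure . PHP_EOL; // https://dir.bg

// Replace only the domain name, keeping the scheme and the domain types
$renamed = preg_replace($pattern, '$1://example$3$4', 'https://hh-uu.co.uk');
echo $renamed . PHP_EOL; // https://example.co.uk
```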
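The function idn_to_ascii() converts an internationalized domain name to ASCII (Punycode). This matters for validation, because the URL pattern above accepts only the ASCII letters a to z. The function is part of the intl extension: https://www.php.net/manual/en/function.idn-to-ascii.php
```php
<?php
echo idn_to_ascii('количка.info') . PHP_EOL; // xn--80apfbdr9d.info
?>
```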
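A sketch of converting first and validating afterwards, assuming PHP 7.4 or later. Because idn_to_ascii() needs the intl extension, the code checks function_exists() and leaves $converted as null when intl is missing. The variable $host is illustrative:

```php
<?php
$pattern = '/^(http[s]?):\/\/([a-z0-9-]+)(\.[a-z]{2,6})(\.[a-z]{2,6})?$/';
$host = 'количка.info';

// The Cyrillic letters are not in [a-z0-9-], so the raw URL fails
$raw = preg_match($pattern, 'https://' . $host); // 0

// idn_to_ascii() returns false on failure; the result uses only a-z, 0-9 and -
if (function_exists('idn_to_ascii')) {
    $ascii = idn_to_ascii($host);
    $converted = $ascii === false ? 0 : preg_match($pattern, 'https://' . $ascii); // 1
} else {
    $converted = null; // intl extension is not installed
}

var_dump($raw, $converted);
```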
From the URL regex it is easy to build an email regex:
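Pattern | Description | Example match |
---|---|---|
`([a-z0-9_\.\-]+)@` | User name (letters, digits, _, . and -) followed by @ | jhon_123.smith |
`^([a-z0-9_\.\-]+)@([a-z0-9-]+)(\.[a-z]{2,6})(\.[a-z]{2,6})?$` | User name, @, then the domain part of the URL pattern | jhon_123.smith@hh-uu.co.uk |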

In this case the match array is: ( [0] => Jhon_123.smith@hh-uu.co.uk [1] => Jhon_123.smith [2] => hh-uu [3] => .co [4] => .uk ). The full PHP code is below:
```php
<?php
// i modifier: case-insensitive, so the capital J in "Jhon" matches [a-z]
$pattern = '/^([a-z0-9_\.\-]+)@([a-z0-9-]+)(\.[a-z]{2,6})(\.[a-z]{2,6})?$/i';
$str = 'Jhon_123.smith@hh-uu.co.uk';
echo preg_match($pattern, $str, $matches) . PHP_EOL; // 1
print_r($matches);
?>
```
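The pattern is a simplification. A sketch that runs it over a few illustrative addresses, with the expected results stored in an array, shows what it accepts and what it rejects. Note that real addresses may contain characters such as +, which this pattern does not allow:

```php
<?php
$pattern = '/^([a-z0-9_\.\-]+)@([a-z0-9-]+)(\.[a-z]{2,6})(\.[a-z]{2,6})?$/i';

// Illustrative addresses => expected preg_match() result
$expected = [
    'Jhon_123.smith@hh-uu.co.uk' => 1, // user name, domain, two domain types
    'info@dir.bg'                => 1, // one domain type
    'no-at-sign.bg'              => 0, // the @ is missing
    'a@b.c'                      => 0, // the domain type needs 2 to 6 letters
    'john+tag@example.com'       => 0, // + is not in [a-z0-9_\.\-]
];

$results = [];
foreach (array_keys($expected) as $email) {
    $results[$email] = preg_match($pattern, $email);
    echo $email . ': ' . $results[$email] . PHP_EOL;
}
```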
With the match array it is easy to replace values in groups and to calculate statistics. Several other functions offer similar possibilities:
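- preg_match_all() returns all matches, not only the first one.
- preg_split() splits a string by a pattern, so several different separator characters can be used at once; flags such as PREG_SPLIT_NO_EMPTY control the result.
- preg_filter() works like preg_replace(), but for an array of subjects it returns only the elements that match the pattern (with the replacement applied).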
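A sketch of the three functions on illustrative data (the sample text and domain list are made up for this example):

```php
<?php
$text = 'Orders: 12 apples, 7 pears;; 30 plums';

// preg_match_all(): returns the number of matches and fills $all with every match
$count = preg_match_all('/\d+/', $text, $all);
// $count is 3, $all[0] is ['12', '7', '30']

// preg_split(): split on a comma, semicolon or colon followed by optional spaces;
// PREG_SPLIT_NO_EMPTY drops the empty piece between ";;"
$parts = preg_split('/[,;:]\s*/', $text, -1, PREG_SPLIT_NO_EMPTY);
// ['Orders', '12 apples', '7 pears', '30 plums']

// preg_filter(): replaces like preg_replace(), but drops subjects without a match
$domains = ['dir.bg', 'hh-uu.co.uk', 'localhost'];
$filtered = preg_filter('/\.([a-z]{2,6})$/', '.example', $domains);
// ['dir.example', 'hh-uu.co.example']; "localhost" is removed

print_r($all[0]);
print_r($parts);
print_r($filtered);
```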
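Ctype functions are an alternative that avoids regex, but only for simple character checks. For example, ctype_alpha() returns true only if every character is a letter (A-Z or a-z in the default C locale); a digit makes it return false. To allow both letters and digits, use ctype_alnum(). https://www.php.net/manual/en/function.ctype-alpha.php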
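A sketch comparing ctype_alpha() and ctype_alnum() with equivalent patterns, assuming the default C locale. The D modifier keeps $ from accepting a trailing newline, which the ctype functions would reject. The input strings are illustrative:

```php
<?php
$inputs = ['HelloWorld', 'Hello123', 'Hello World', ''];

// For each input: [ctype_alpha, letters regex, ctype_alnum, letters-and-digits regex]
$table = [];
foreach ($inputs as $s) {
    $table[$s] = [
        ctype_alpha($s),
        preg_match('/^[a-zA-Z]+$/D', $s) === 1,
        ctype_alnum($s),
        preg_match('/^[a-zA-Z0-9]+$/D', $s) === 1,
    ];
}

var_export($table);
```

Both approaches reject the empty string: ctype functions return false for it, and + requires at least one character.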
Regex can validate input data from users, and it is often combined with PHP filters (filter_var()) that validate or sanitize the input.
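A sketch of both roles, reusing patterns from this article. FILTER_VALIDATE_REGEXP checks a value against a regex, and FILTER_SANITIZE_NUMBER_INT removes every character except digits, + and -. The input strings are illustrative:

```php
<?php
// Validate: FILTER_VALIDATE_REGEXP returns the value if it matches, otherwise false
$options = ['options' => ['regexp' => '/^[a-zA-Z0-9! ,\?\.\-]{0,255}$/']];
$valid = filter_var('Hello! How are you?', FILTER_VALIDATE_REGEXP, $options);
$invalid = filter_var('<h1>Heading 1 </h1>', FILTER_VALIDATE_REGEXP, $options);

// Sanitize first, then validate with the integer pattern
$clean = filter_var('+12abc3', FILTER_SANITIZE_NUMBER_INT); // '+123'
$isInteger = preg_match('/^[\-+]?[0-9]+$/', $clean);        // 1

var_dump($valid, $invalid, $clean, $isInteger);
```

Sanitizing before validating silently changes the input ("abc" disappears above), so for some fields it may be better to reject the value instead.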