
PHP RegEx – Cheat Sheet and Popular Examples

RegEx (regular expression) is a set of rules that describes a text pattern and is used for searching and validating text. It is powerful but hard to read. Most regex rules are the same in all programming languages, but every language has some important exceptions.

Regex Cheat Sheet

. Any character except the new line
\d Decimal digit (0-9)
\D Not a digit (0-9)
\w Word character (a-z, A-Z, 0-9, _)
\W Not a word character
\s White space (space, tab, new line)
\S Not a white space
\b Word boundary
\B Not a word boundary
^ Beginning of a string
$ End of a string
[...] Any one of the listed characters
[^...] Any one character that is not listed
| Or
(...) Group
+ 1 or more
* 0 or more
? 0 or 1 (optional)
{2} Exact number
{1,3} Between 1 and 3
i Case insensitive search
m Multiline search
u For UTF-8 encoded patterns
. [ ] ( ) { } \ ^ $ | ? * + Characters that need to be escaped (and the pattern delimiter, usually /)
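
A few of the tokens above in action. This is only a small sketch: the sample strings and variable names are assumptions chosen for illustration.

```php
<?php
// \d+ : one or more digits; preg_match() stops at the first match.
$hasDigits = preg_match('/\d+/', 'Order 66', $m);     // 1, $m[0] is '66'

// \b is a word boundary, so 'cat' is not found inside 'concatenate'.
$wholeWord = preg_match('/\bcat\b/', 'concatenate');  // 0

// The ? after u makes that letter optional.
$optional = preg_match('/^colou?r$/', 'color');       // 1

// {2} is an exact count, {1,3} is a range.
$exact = preg_match('/^\d{2}$/', '123');              // 0, there are three digits
$range = preg_match('/^\d{1,3}$/', '123');            // 1

// i: case insensitive; m: ^ and $ also match at every line break.
$insensitive = preg_match('/^hello$/i', 'HeLLo');     // 1
$multiline = preg_match_all('/^\w+$/m', "one\ntwo");  // 2

// preg_quote() escapes the special characters of arbitrary text.
$quoted = preg_quote('a.b?');                         // 'a\.b\?'
```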

How to check Regex?

A popular online tool is https://regex101.com/ where you can test a regex for different programming languages.

Another way in PHP is to use the function preg_match:
preg_match($pattern, $subject, &$matches, $flags, $offset). The last three arguments are optional. The function returns 1 if the pattern matches, 0 if it does not, and false on error.
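
The optional arguments can be sketched as follows. The subject string and the variable names are assumptions made up for this example.

```php
<?php
$subject = 'id=15; id=27';

// Basic call: $matches[0] is the whole match, $matches[1] the first group.
$found = preg_match('/id=(\d+)/', $subject, $matches);
// $found is 1, $matches is ['id=15', '15']

// PREG_OFFSET_CAPTURE turns every entry into a [text, byte offset] pair.
preg_match('/id=(\d+)/', $subject, $withOffsets, PREG_OFFSET_CAPTURE);
// $withOffsets[1] is ['15', 3]

// The fifth argument starts the search at the given byte offset.
preg_match('/id=(\d+)/', $subject, $second, 0, 5);
// $second[1] is '27'
```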

Regex is easy to understand with examples

Popular pattern examples from the CodeIgniter form_validation library

Integer number

/^[\-+]?[0-9]+$/        function integer($str)
------------------------------------------------------
/^$/                    Start and end of the string
/^[\-+]$/               One plus or one minus sign
/^[\-+]?$/              The sign is optional (zero or one time)
/^[\-+]?[0-9]$/         One digit, 0 to 9
/^[\-+]?[0-9]+$/        The digit is repeated one or more times
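
The integer pattern can be wrapped in a small check like the one below. This is a minimal sketch, not the library's own code: the function name is_integer_string is hypothetical.

```php
<?php
// Hypothetical helper (the name is an assumption): true when $str is an
// optionally signed run of digits.
function is_integer_string(string $str): bool
{
    return preg_match('/^[\-+]?[0-9]+$/', $str) === 1;
}

var_dump(is_integer_string('42'));   // bool(true)
var_dump(is_integer_string('-7'));   // bool(true)
var_dump(is_integer_string('4.2'));  // bool(false)
var_dump(is_integer_string(''));     // bool(false)
```

Note that $ also matches just before a final new line, so "42\n" passes this check. Adding the D modifier to the pattern is one way to reject it.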
            

Decimal number

/^[\-+]?[0-9]+\.[0-9]+$/    function decimal($str)
------------------------------------------------------
/^$/                        Start and end of the string
/^[\-+]?$/                  One plus or one minus sign,
                            optional (zero or one time)
/^[\-+]?[0-9]+$/            Integer part: digits 0 to 9, mandatory,
                            repeated one or more times
/^[\-+]?[0-9]+\.$/          Decimal point, mandatory, escaped as \.
/^[\-+]?[0-9]+\.[0-9]+$/    Decimal part: digits 0 to 9, mandatory,
                            repeated one or more times
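
To see which inputs the decimal pattern accepts, it can be run over a few samples. This is only a sketch: the sample values and the $results array are assumptions for illustration.

```php
<?php
$pattern = '/^[\-+]?[0-9]+\.[0-9]+$/';

// Sample inputs (made up for this example).
$samples = ['3.14', '-0.5', '10', '.5', '5.', '1.2.3'];

$results = [];
foreach ($samples as $sample) {
    // Both the integer part and the decimal part are mandatory.
    $results[$sample] = preg_match($pattern, $sample) === 1;
}

foreach ($results as $sample => $ok) {
    echo str_pad($sample, 6) . ($ok ? 'valid' : 'invalid') . PHP_EOL;
}
```

Only 3.14 and -0.5 are reported as valid: 10 has no decimal point, .5 has no integer part, 5. has no decimal part, and 1.2.3 has two points.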
            

One or more letters, digits, minus or underscore

/^[a-z0-9_-]+$/i    function alpha_dash($str)
------------------------------------------------------
/^$/                Start and end of the string
/^[a-z0-9_-]$/      One letter a to z, digit, underscore, or minus
/^[a-z0-9_-]+$/     Repeated one or more times
/^[a-z0-9_-]+$/i    Case insensitive
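
A possible use of this pattern is checking a user name or a URL slug. The sketch below assumes a hypothetical function name, is_alpha_dash.

```php
<?php
// Hypothetical helper (the name is an assumption). The minus is the last
// character in the class, so it is a literal minus and not a range.
function is_alpha_dash(string $str): bool
{
    return preg_match('/^[a-z0-9_-]+$/i', $str) === 1;
}

var_dump(is_alpha_dash('user_name-01')); // bool(true)
var_dump(is_alpha_dash('User'));         // bool(true), because of the i modifier
var_dump(is_alpha_dash('two words'));    // bool(false), space is not allowed
```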
            

Text field example

Suppose you need to validate a text field that may contain letters (lowercase or uppercase), digits, spaces, and the following punctuation: question mark, comma, period, dash, and exclamation mark.

Step 1: the characters that need no escaping: [a-zA-Z0-9! ,]
Step 2: add the escaped characters: [a-zA-Z0-9! ,\?\.\-]
Step 3: add the start and end anchors and allow one or more characters: /^[a-zA-Z0-9! ,\?\.\-]+$/
If the text may be at most 255 characters long and an empty field is acceptable: /^[a-zA-Z0-9! ,\?\.\-]{0,255}$/
Inside square brackets only the dash strictly needs the backslash here; escaping ? and . is harmless.

PHP test code:

<?php
$str = 'Hello! How are you? One, two, three. 2021-11-08';
$pattern = '/^[a-zA-Z0-9! ,\?\.\-]{0,255}$/';
echo 'preg_match:  ' . preg_match($pattern, $str) . PHP_EOL; // preg_match:  1
$matches = array();
echo 'preg_match and array:  ' . preg_match($pattern, $str, $matches) . PHP_EOL; // 1
print_r($matches); // element [0] holds the whole matched string
$str = '<h1>Heading 1 </h1>';
echo 'preg_match, array, html:  ' . preg_match($pattern, $str, $matches) . PHP_EOL; // 0
print_r($matches); // empty array, because < > and / are not allowed
?>
            

Using groups: an example for a valid URL

(http[s]?)                                                    Match http or https
(http[s]?):\/\/                                               Add ://. Match http:// https://
(http[s]?):\/\/([a-z0-9-]+)                                   Domain name: letters, digits, dash.
                                                              Match http://dir http://kkk-oop
^(http[s]?):\/\/([a-z0-9-]+)\.([a-z]{2,6})$                   Escaped dot and one domain type. Match http://dir.bg
^(http[s]?):\/\/([a-z0-9-]+)(\.([a-z]{2,6})){1,2}$            Dot and one or two domain types. Match https://hh-uu.co.uk
/^(http[s]?):\/\/([a-z0-9-]+)(\.[a-z]{2,6})(\.[a-z]{2,6})?$/  Better split into groups. Match https://hh-uu.co.uk
            

Test code:

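<?php
$pattern = '/^(http[s]?):\/\/([a-z0-9-]+)(\.[a-z]{2,6})(\.[a-z]{2,6})?$/';
$str = 'https://hh-uu.co.uk';
echo 'preg_match and array:  ' . preg_match($pattern, $str, $matches) . PHP_EOL; // 1
print_r($matches); // ( [0] => https://hh-uu.co.uk [1] => https [2] => hh-uu [3] => .co [4] => .uk )
?>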

The match array now has more elements, because the domain name and each domain type are captured in separate groups. This also makes it easy to do replacements by group.
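
Replacement by group could look like the sketch below. It assumes the URL pattern with the dots escaped as \. and uses made-up variable names. The named-group variant is an addition for illustration, not part of the patterns above.

```php
<?php
$pattern = '/^(http[s]?):\/\/([a-z0-9-]+)(\.[a-z]{2,6})(\.[a-z]{2,6})?$/';
$url = 'http://hh-uu.co.uk';

// $2, $3 and $4 in the replacement refer to the captured groups.
// Here the scheme is forced to https and the domain groups are kept.
$secure = preg_replace($pattern, 'https://$2$3$4', $url);
// $secure is 'https://hh-uu.co.uk'

// Named groups give the match array readable keys.
$named = '/^(?<scheme>https?):\/\/(?<name>[a-z0-9-]+)(?<tld>(\.[a-z]{2,6}){1,2})$/';
preg_match($named, $url, $m);
// $m['scheme'] is 'http', $m['name'] is 'hh-uu', $m['tld'] is '.co.uk'
```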

If the domain name is not in ASCII

The function idn_to_ascii (part of the intl extension) converts a domain name to ASCII. https://www.php.net/manual/en/function.idn-to-ascii.php

<?php
echo idn_to_ascii('количка.info').PHP_EOL; //xn--80apfbdr9d.info
 ?>      

Valid email

It is easy to build an email regex from the URL regex:

([a-z0-9_\.\-]+)@                                              Name before the @ sign. Match jhon_123.smith@
^([a-z0-9_\.\-]+)@([a-z0-9-]+)(\.[a-z]{2,6})(\.[a-z]{2,6})?$   Match jhon_123.smith@hh-uu.co.uk
            

The match array in this case is:
( [0] => Jhon_123.smith@hh-uu.co.uk [1] => Jhon_123.smith [2] => hh-uu [3] => .co [4] => .uk )

The full PHP code is below:

<?php
$pattern = '/^([a-z0-9_\.\-]+)@([a-z0-9-]+)(\.[a-z]{2,6})(\.[a-z]{2,6})?$/i'; // i: case insensitive
$str = 'Jhon_123.smith@hh-uu.co.uk';
echo preg_match($pattern, $str, $matches) . PHP_EOL; // 1
print_r($matches);
?>
            

Other functions

With a match array it is easy to replace values by group and to compute statistics. Several other functions offer similar capabilities.

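preg_match_all() returns all matches, not only the first one.

preg_split() splits a string by a regex, so several different separator characters can be used at once. Flags adjust what is returned.

preg_filter() works like preg_replace() but returns only the elements of the input array that matched the pattern, after replacement. To get the matching elements unchanged, use preg_grep().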
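
These functions can be compared side by side in a short sketch. The input strings and variable names are assumptions made up for this example.

```php
<?php
$text = 'apple, banana;cherry  date';

// preg_match_all() returns the number of matches; $all[0] holds them.
$count = preg_match_all('/[a-z]+/', $text, $all);
// $count is 4, $all[0] is ['apple', 'banana', 'cherry', 'date']

// preg_split() splits on any run of commas, semicolons or white space.
// PREG_SPLIT_NO_EMPTY drops empty pieces from the result.
$parts = preg_split('/[,;\s]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
// $parts is ['apple', 'banana', 'cherry', 'date']

$items = ['a1', 'bb', 'c3'];

// preg_grep() keeps the matching elements with their original keys.
$matching = preg_grep('/\d/', $items);        // [0 => 'a1', 2 => 'c3']

// preg_filter() replaces, then returns only the elements that matched.
$replaced = preg_filter('/\d/', '#', $items); // [0 => 'a#', 2 => 'c#']
```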

Ctype functions are an alternative to regex, but only for simple strings. For example, ctype_alpha accepts only uppercase and lowercase Latin letters, while ctype_alnum also accepts digits. https://www.php.net/manual/en/function.ctype-alpha.php
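
A quick sketch of the ctype checks next to an equivalent regex. The sample strings are assumptions for illustration.

```php
<?php
$lettersOnly = ctype_alpha('Hello');    // true
$withDigit = ctype_alpha('Hello1');     // false, digits are not letters
$alnum = ctype_alnum('Hello1');         // true, letters and digits
$digits = ctype_digit('2021');          // true
$empty = ctype_alpha('');               // false for an empty string

// The same letters-only check written as a regex.
$byRegex = preg_match('/^[a-zA-Z]+$/', 'Hello') === 1; // true
```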

Regex can validate input data from users and is often combined with PHP filters that sanitize the input.
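
As a rough sketch of combining the two, the built-in filter_var function can validate or sanitize a value alongside a regex check. The sample values and variable names are assumptions, and the email pattern is the one from this article with the dots escaped.

```php
<?php
$email = 'Jhon_123.smith@hh-uu.co.uk';
$pattern = '/^([a-z0-9_\.\-]+)@([a-z0-9-]+)(\.[a-z]{2,6})(\.[a-z]{2,6})?$/i';

// Validation by regex and by the built-in email filter.
$byRegex = preg_match($pattern, $email) === 1;
$byFilter = filter_var($email, FILTER_VALIDATE_EMAIL); // the string if valid, else false

// Validation filters return the converted value, or false when invalid.
$age = filter_var('42', FILTER_VALIDATE_INT);     // int 42
$notInt = filter_var('4.2', FILTER_VALIDATE_INT); // false

// Sanitizing filters remove the characters that are not allowed.
$digits = filter_var('abc-12x3', FILTER_SANITIZE_NUMBER_INT); // '-123'
```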