Regular Expression Tutorial for Beginners - Complete Regex Guide
What Is a Regular Expression (Regex)?
A regular expression (regex) is a pattern-matching notation used to search, match, and manipulate text. It is a core tool for developers, data analysts, and anyone who processes text.
Basic Regex Syntax
Literal Characters
The simplest regex pattern matches exact characters:
Pattern: hello
Matches: "hello" in "hello world"
Metacharacters
Special characters with special meanings:
. - Matches any single character except newline
^ - Matches start of string
$ - Matches end of string
* - Matches 0 or more repetitions
+ - Matches 1 or more repetitions
? - Matches 0 or 1 repetition
| - Alternation (OR)
() - Grouping
[] - Character class
\ - Escape character
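A minimal Python sketch of a few of these metacharacters, using the standard `re` module. The sample strings and variable names are illustrative assumptions, not part of the tutorial.

```python
import re

# "." matches any single character except a newline,
# so "a\nc" is not matched.
dot_matches = re.findall(r"a.c", "abc adc a\nc")

# "|" is alternation: match either side.
pets = re.findall(r"cat|dog", "hotdog and cat")

# Unescaped, "." is a wildcard; escaped with "\", it is a literal dot.
wildcard_dot = re.findall(r"3.14", "3.14 3x14")
literal_dot = re.findall(r"3\.14", "3.14 3x14")

print(dot_matches, pets, wildcard_dot, literal_dot)
```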
Common Regex Patterns
Email Validation
Pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Examples:
user@example.com
john.doe@company.co.uk
invalid@email (not matched)
@example.com (not matched)
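The pattern can be tried against the examples with a short Python sketch. The helper name `is_valid_email` is a hypothetical choice for illustration. The pattern itself is a pragmatic check, not a full implementation of the email address standard.

```python
import re

# The email pattern from the tutorial, compiled once for reuse.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def is_valid_email(address: str) -> bool:
    """Return True if the address matches the tutorial's pattern."""
    return EMAIL_RE.match(address) is not None

for candidate in ["user@example.com", "john.doe@company.co.uk",
                  "invalid@email", "@example.com"]:
    print(candidate, is_valid_email(candidate))
```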
Phone Number (US Format)
Pattern: ^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$
Examples:
(123) 456-7890
123-456-7890
123.456.7890
1234567890
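Because the pattern has three capturing groups, it can also normalize the accepted formats into one. The sketch below assumes a hypothetical helper `normalize_phone` and an arbitrary target format of `NNN-NNN-NNNN`.

```python
import re
from typing import Optional

# The US phone pattern from the tutorial; groups capture area code,
# exchange, and line number.
PHONE_RE = re.compile(r"^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$")

def normalize_phone(raw: str) -> Optional[str]:
    """Return the number as NNN-NNN-NNNN, or None if it does not match."""
    match = PHONE_RE.match(raw)
    if match is None:
        return None
    area, exchange, line = match.groups()
    return f"{area}-{exchange}-{line}"

for raw in ["(123) 456-7890", "123-456-7890", "123.456.7890", "1234567890"]:
    print(raw, "->", normalize_phone(raw))
```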
URL Validation
Pattern: ^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b
Examples:
https://example.com
http://www.example.com
https://sub.example.com/path
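A short Python sketch of this pattern in use. Note that the pattern as written has no end anchor, so it checks the scheme and host and stops before any path. The helper name `match_url_prefix` is an assumption made for this example.

```python
import re
from typing import Optional

# The URL pattern from the tutorial. It is anchored at the start only,
# so it matches the scheme and host and ignores whatever follows.
URL_RE = re.compile(
    r"^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b"
)

def match_url_prefix(text: str) -> Optional[str]:
    """Return the matched scheme-and-host prefix, or None if no match."""
    match = URL_RE.match(text)
    return match.group() if match else None

for url in ["https://example.com", "http://www.example.com",
            "https://sub.example.com/path", "ftp://example.com"]:
    print(url, "->", match_url_prefix(url))
```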
Password Strength
Pattern: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
Requirements:
- At least 8 characters
- One uppercase letter
- One lowercase letter
- One number
- One special character (@$!%*?&)
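Each `(?=...)` lookahead in the pattern checks one requirement without consuming characters, and the final character class limits which characters are allowed at all. A minimal Python sketch follows; the helper name and sample passwords are made up for illustration.

```python
import re

# The password pattern from the tutorial: four lookaheads, then
# at least 8 characters drawn only from the allowed set.
PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_strong_password(password: str) -> bool:
    """Return True if the password satisfies every requirement."""
    return PASSWORD_RE.match(password) is not None

print(is_strong_password("Passw0rd!"))   # meets every requirement
print(is_strong_password("password1!"))  # no uppercase letter
print(is_strong_password("Passw0rd#"))   # "#" is not an allowed character
```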
Character Classes
Predefined sets of characters:
\d - Any digit (0-9)
\D - Any non-digit
\w - Any word character (a-z, A-Z, 0-9, _)
\W - Any non-word character
\s - Any whitespace (space, tab, newline)
\S - Any non-whitespace
Custom Character Classes
[abc] - Matches a, b, or c
[^abc] - Matches any character except a, b, or c
[a-z] - Matches any lowercase letter
[A-Z] - Matches any uppercase letter
[0-9] - Matches any digit
[a-zA-Z] - Matches any letter
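The predefined and custom classes can be compared side by side in Python. This is a small sketch with made-up sample strings.

```python
import re

# \d matches one digit at a time.
digits = re.findall(r"\d", "Order #42: 3 apples, 7 pears")

# A custom class: any one of the listed vowels.
vowels = re.findall(r"[aeiou]", "regex")

# A negated class: remove everything that is not a lowercase letter.
letters_only = re.sub(r"[^a-z]", "", "a1-b2_c3")

# \w includes the underscore but not the hyphen.
words = re.findall(r"\w+", "snake_case, kebab-case")

# \s covers spaces, tabs, and newlines alike.
fields = re.split(r"\s+", "a b\tc\nd")

print(digits, vowels, letters_only, words, fields)
```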
Quantifiers
Specify how many times a pattern should match:
{n} - Exactly n times
{n,} - n or more times
{n,m} - Between n and m times
* - 0 or more times (same as {0,})
+ - 1 or more times (same as {1,})
? - 0 or 1 time (same as {0,1})
Pattern: \d{3}-\d{2}-\d{4}
Matches: 123-45-6789 (SSN format)
Pattern: colou?r
Matches: "color" and "colour"
Pattern: \w+@\w+\.\w+
Matches: basic email format
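The quantifier examples above can be checked with a few lines of Python. The input strings are illustrative assumptions.

```python
import re

# {n}: exact counts, as in the SSN-style pattern.
ssn_ok = re.fullmatch(r"\d{3}-\d{2}-\d{4}", "123-45-6789") is not None

# ?: the "u" is optional.
spellings = re.findall(r"colou?r", "color colour colr")

# {n,m}: greedy, so it takes up to 3 digits when available.
runs = re.findall(r"\d{2,3}", "1 22 333 4444")

# +: one or more word characters on each side of "@" and ".".
emails = re.findall(r"\w+@\w+\.\w+", "mail bob@example.com now")

print(ssn_ok, spellings, runs, emails)
```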
Anchors and Boundaries
^ - Start of string/line
$ - End of string/line
\b - Word boundary
\B - Non-word boundary
Pattern: ^Hello
Matches: "Hello" only at start of string
Pattern: world$
Matches: "world" only at end of string
Pattern: \bcat\b
Matches: "cat" as a whole word, not in "concatenate"
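A short Python sketch of anchors and boundaries. The test strings are assumptions chosen to show both matching and non-matching cases.

```python
import re

# ^ only matches at the start of the string.
at_start = re.search(r"^Hello", "Hello world") is not None
not_at_start = re.search(r"^Hello", "Say Hello") is not None

# $ only matches at the end of the string.
at_end = re.search(r"world$", "hello world") is not None

# \b requires a word boundary on each side, so "concatenate"
# and "bobcat" are skipped.
whole_words = re.findall(r"\bcat\b", "cat concatenate bobcat cat.")

# \B requires the opposite: "cat" strictly inside a longer word.
inside_words = re.findall(r"\Bcat\B", "concatenate cat")

print(at_start, not_at_start, at_end, whole_words, inside_words)
```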
Groups and Capturing
Capturing Groups
Pattern: (\d{3})-(\d{3})-(\d{4})
String: 123-456-7890
Capture Group 1: 123
Capture Group 2: 456
Capture Group 3: 7890
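In Python, the captured groups are available on the match object and can be reused in a replacement string. The sketch below uses the pattern and string from above. The reformatted output style is an arbitrary choice for illustration.

```python
import re

pattern = r"(\d{3})-(\d{3})-(\d{4})"
match = re.match(pattern, "123-456-7890")

# group(0) is the whole match; groups() returns the captured parts.
whole = match.group(0)
parts = match.groups()

# \1, \2, \3 in the replacement refer back to the captured groups.
reformatted = re.sub(pattern, r"(\1) \2-\3", "123-456-7890")

print(whole, parts, reformatted)
```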
Non-Capturing Groups
Pattern: (?:https?|ftp)://\w+
Matches: text starting with http://, https://, or ftp:// followed by word characters; the scheme is grouped but not captured
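The difference between the two group types shows up in Python's `re.findall`, which returns the group contents when a pattern has a capturing group and the full match otherwise. This is a small sketch with a made-up input string.

```python
import re

text = "see http://example and ftp://files"

# Non-capturing group: findall returns the full matches.
full_matches = re.findall(r"(?:https?|ftp)://\w+", text)

# Capturing group: findall returns only what the group captured.
schemes_only = re.findall(r"(https?|ftp)://\w+", text)

print(full_matches, schemes_only)
```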
Lookahead and Lookbehind
Positive Lookahead (?=...)
Pattern: \d+(?= dollars)
Matches: "100" in "100 dollars" but not in "100 euros"
Negative Lookahead (?!...)
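Pattern: \d+\b(?! dollars)
Matches: "100" in "100 euros" but not in "100 dollars"
Note: Without the \b, \d+(?! dollars) backtracks and matches "10" in "100 dollars", because "10" is followed by "0" rather than " dollars".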
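A Python sketch of both lookaheads. It also shows a backtracking subtlety: a negative lookahead placed directly after a greedy `\d+` can still succeed on a shorter run of digits, so the last line adds `\b` to force the whole number to be considered. The sample text is an assumption.

```python
import re

text = "100 dollars, 200 euros"

# Positive lookahead: digits that are followed by " dollars".
followed = re.findall(r"\d+(?= dollars)", text)

# Negative lookahead directly after \d+: the engine backtracks to "10",
# which is followed by "0", so the lookahead succeeds on a partial number.
partial = re.findall(r"\d+(?! dollars)", "100 dollars")

# Adding \b prevents stopping in the middle of the number.
not_followed = re.findall(r"\d+\b(?! dollars)", text)

print(followed, partial, not_followed)
```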
Common Use Cases
Data Validation
- Email addresses
- Phone numbers
- Credit card numbers
- Social Security Numbers
- ZIP codes
Text Processing
- Find and replace text
- Extract data from logs
- Parse CSV/JSON data
- Clean and format text
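As a sketch of two of these text-processing tasks in Python, the example below extracts fields from log lines and cleans up whitespace. The log format, sample lines, and names are hypothetical, invented only for this illustration.

```python
import re

# Hypothetical log format: date, time, level, message.
log = (
    "2024-01-15 10:32:01 ERROR Disk full\n"
    "2024-01-15 10:32:05 INFO Retry ok"
)

# With re.MULTILINE, ^ and $ match at each line, so findall
# returns one tuple of captured fields per log line.
LOG_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.*)$", re.MULTILINE
)
entries = LOG_RE.findall(log)
errors = [message for _, _, level, message in entries if level == "ERROR"]

# Clean and format: collapse runs of whitespace into single spaces.
cleaned = re.sub(r"\s+", " ", "  too   many\tspaces ").strip()

print(entries, errors, cleaned)
```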
Web Scraping
- Extract URLs from HTML
- Parse HTML tags
- Extract specific data patterns
Regex Flags/Modifiers
i - Case-insensitive matching
g - Global search (find all matches)
m - Multiline mode (^ and $ match at line breaks)
s - Dot matches newline
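Flag syntax varies by engine. As a sketch, this is how the same behaviors look in Python's `re` module, where flags are constants and there is no `g` flag: `re.search` returns the first match and `re.findall` returns all of them. The sample strings are assumptions.

```python
import re

# i: case-insensitive matching.
case_insensitive = re.findall(r"hello", "Hello HELLO hello", flags=re.IGNORECASE)

# m: with re.MULTILINE, ^ matches at the start of every line.
lines = "one two\nthree four"
first_words_default = re.findall(r"^\w+", lines)
first_words_multiline = re.findall(r"^\w+", lines, flags=re.MULTILINE)

# s: with re.DOTALL, "." also matches a newline.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"a.b", "a\nb", flags=re.DOTALL)

# g: first match versus all matches.
first_only = re.search(r"\d+", "1 22 333").group()
all_matches = re.findall(r"\d+", "1 22 333")

print(case_insensitive, first_words_multiline, first_only, all_matches)
```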
Best Practices
- Start Simple - Build complex patterns from simple ones
- Test Thoroughly - Test with various inputs including edge cases
- Use Comments - Many regex engines support comments for documentation
- Avoid Overly Greedy Matching - Use lazy quantifiers (*?, +?) when a greedy quantifier would match too much
- Escape Special Characters - Use backslash to match literal metacharacters
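Three of these practices can be sketched in Python: comments via `re.VERBOSE`, lazy quantifiers, and escaping with `re.escape`. The date pattern and sample strings are illustrative assumptions.

```python
import re

# Use Comments: re.VERBOSE ignores whitespace in the pattern
# and treats "#" to end of line as a comment.
DATE_RE = re.compile(
    r"""
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
    """,
    re.VERBOSE,
)
date_parts = DATE_RE.fullmatch("2024-01-31").groups()

# Greedy versus lazy: .* runs to the last </b>, .*? stops at the first.
html = "<b>one</b> <b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)
lazy = re.findall(r"<b>.*?</b>", html)

# Escape Special Characters: unescaped, "+" is a quantifier here.
unescaped = re.search(r"1+1=2", "1+1=2")
escaped = re.search(re.escape("1+1=2"), "1+1=2")

print(date_parts, greedy, lazy, unescaped, escaped)
```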
Common Pitfalls
- Catastrophic Backtracking - Nested or overlapping quantifiers, such as (a+)+, can make a backtracking engine take exponential time on input that almost matches
- Overly Broad Patterns - Too permissive patterns match unwanted text
- Missing Anchors - Patterns may match partial strings unintentionally
- Not Escaping Special Characters - Causes unexpected behavior
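A few of these pitfalls can be reproduced in Python with small inputs. This is a sketch, and the sample strings are assumptions. The nested-quantifier pattern is run only on a tiny string here, since long near-matching inputs are what make it slow.

```python
import re

# Missing anchors: search() accepts a 5-digit run inside a longer string,
# while fullmatch() requires the entire string to match.
unanchored = re.search(r"\d{5}", "abc123456")
anchored = re.fullmatch(r"\d{5}", "123456")

# Overly broad: ".+@.+" accepts almost anything that contains "@".
too_broad = re.fullmatch(r".+@.+", "not an email @ all")

# Nested quantifiers: (a+)+$ accepts the same strings as a+$,
# but the flat form avoids the backtracking blow-up.
nested = re.search(r"(a+)+$", "aaa")
flat = re.search(r"a+$", "aaa")

print(unanchored.group(), anchored, too_broad is not None)
```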
Quick Reference Cheat Sheet
| Pattern | Description | Example |
|---|---|---|
| . | Any character | a.c matches "abc", "adc" |
| \d | Digit | \d{3} matches "123" |
| \w | Word character | \w+ matches "hello" |
| ^ | Start of string | ^Hello matches start |
| $ | End of string | bye$ matches end |
| * | 0 or more | ab*c matches "ac", "abc", "abbc" |
| + | 1 or more | ab+c matches "abc", "abbc" |
| ? | 0 or 1 | colou?r matches "color", "colour" |