Regular Expression Tutorial for Beginners - Complete Regex Guide

What is Regular Expression (Regex)?

Regular Expression (Regex) is a powerful pattern-matching language used to search, match, and manipulate text. It's essential for developers, data analysts, and anyone working with text processing.

Basic Regex Syntax

Literal Characters

The simplest regex pattern matches exact characters:

  Pattern: hello
    Matches: "hello" in "hello world"
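As a minimal sketch of literal matching, the snippet below uses Python's standard re module. The variable names are illustrative and not part of the tutorial.

```python
import re

# re.search scans the string and returns the first match, or None.
match = re.search(r"hello", "hello world")

found_text = match.group(0)  # the matched substring
found_span = match.span()    # (start, end) positions of the match

print(found_text, found_span)  # hello (0, 5)
```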

Metacharacters

Metacharacters are characters that carry a special meaning inside a pattern:

  • . - Matches any single character except newline
  • ^ - Matches start of string
  • $ - Matches end of string
  • * - Matches 0 or more repetitions of the preceding element
  • + - Matches 1 or more repetitions of the preceding element
  • ? - Matches 0 or 1 repetition of the preceding element
  • | - Alternation (OR)
  • () - Grouping
  • [] - Character class
  • \ - Escape character
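A few of the metacharacters above can be tried out as follows. This is a small Python sketch with made-up sample strings, meant only to show the behavior of each symbol.

```python
import re

# "." matches any single character except a newline.
dot_hits = re.findall(r"c.t", "cat cot c\nt")       # ['cat', 'cot']

# "*" allows zero b's, "+" requires at least one.
star_hits = re.findall(r"ab*c", "ac abc abbc")      # ['ac', 'abc', 'abbc']
plus_hits = re.findall(r"ab+c", "ac abc abbc")      # ['abc', 'abbc']

# "\" escapes a metacharacter so it matches literally.
escaped_hits = re.findall(r"3\.14", "3.14 3x14")    # ['3.14']
unescaped_hits = re.findall(r"3.14", "3.14 3x14")   # ['3.14', '3x14']
```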

Common Regex Patterns

Email Validation

Pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

  Examples:
    user@example.com
    john.doe@company.co.uk
    invalid@email (not matched)
    @example.com (not matched)
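The email pattern above could be wrapped in a small helper like the one below. The function name is_valid_email is an assumption made for this sketch. It uses re.fullmatch because Python's $ also matches just before a trailing newline.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def is_valid_email(text: str) -> bool:
    """Return True if the whole string matches the email pattern."""
    return EMAIL_RE.fullmatch(text) is not None

for candidate in ["user@example.com", "john.doe@company.co.uk",
                  "invalid@email", "@example.com"]:
    print(candidate, is_valid_email(candidate))
```

A pattern like this checks only the shape of the address, not whether the mailbox exists.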

Phone Number (US Format)

Pattern: ^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$

  Examples:
    (123) 456-7890
    123-456-7890
    123.456.7890
    1234567890
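Because the phone pattern captures the three digit groups, it can also normalize input to a single format. The helper name normalize_phone and the chosen output format are assumptions for illustration.

```python
import re
from typing import Optional

PHONE_RE = re.compile(r"^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$")

def normalize_phone(text: str) -> Optional[str]:
    """Return the number as 123-456-7890, or None if it does not match."""
    m = PHONE_RE.fullmatch(text)
    if m is None:
        return None
    area, prefix, line = m.groups()
    return f"{area}-{prefix}-{line}"

print(normalize_phone("(123) 456-7890"))  # 123-456-7890
```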

URL Validation

Pattern: ^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b

  Examples:
    https://example.com
    http://www.example.com
    https://sub.example.com/path
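The URL pattern has no $ anchor, so it matches the scheme and host at the start of the string and leaves any path unmatched. The sketch below, with an assumed helper name match_url_prefix, shows what the pattern actually consumes.

```python
import re
from typing import Optional

URL_RE = re.compile(
    r"^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b"
)

def match_url_prefix(text: str) -> Optional[str]:
    """Return the part of text matched by the URL pattern, or None."""
    m = URL_RE.match(text)
    return m.group(0) if m else None

print(match_url_prefix("https://sub.example.com/path"))  # https://sub.example.com
print(match_url_prefix("ftp://example.com"))             # None
```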

Password Strength

Pattern: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

  Requirements:
    - At least 8 characters
    - One uppercase letter
    - One lowercase letter
    - One number
    - One special character (@$!%*?&)
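Each (?=...) in the password pattern is a lookahead that checks one requirement without consuming characters. The final character class then limits which characters are allowed at all. A minimal sketch, with the helper name is_strong_password assumed for the example:

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_strong_password(text: str) -> bool:
    """Return True if text satisfies every requirement in the pattern."""
    return PASSWORD_RE.fullmatch(text) is not None

print(is_strong_password("Passw0rd!"))  # True
print(is_strong_password("password"))   # False: no uppercase, digit, or special
print(is_strong_password("Passw0rd#"))  # False: "#" is not in the allowed set
```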

Character Classes

Predefined sets of characters:

  • \d - Any digit (0-9)
  • \D - Any non-digit
  • \w - Any word character (a-z, A-Z, 0-9, _)
  • \W - Any non-word character
  • \s - Any whitespace (space, tab, newline)
  • \S - Any non-whitespace
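The shorthand classes can be tried on a sample string as sketched below. The sample text is invented. In Python 3, \d and \w also match non-ASCII digits and letters in text strings unless the re.ASCII flag is passed.

```python
import re

text = "Order 66, shipped\tMay 4"

digits = re.findall(r"\d+", text)  # runs of digits
words = re.findall(r"\w+", text)   # runs of word characters
pieces = re.split(r"\s+", text)    # split on any run of whitespace

print(digits)  # ['66', '4']
print(words)   # ['Order', '66', 'shipped', 'May', '4']
print(pieces)  # ['Order', '66,', 'shipped', 'May', '4']
```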

Custom Character Classes

  [abc]     - Matches a, b, or c
  [^abc]    - Matches any character except a, b, or c
  [a-z]     - Matches any lowercase letter
  [A-Z]     - Matches any uppercase letter
  [0-9]     - Matches any digit
  [a-zA-Z]  - Matches any letter
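Custom classes combine single characters, ranges, and negation. The following sketch uses invented sample strings to show each form.

```python
import re

vowels = re.findall(r"[aeiou]", "regex")       # listed characters only
non_vowels = re.findall(r"[^aeiou]", "regex")  # "^" inside [] negates the class
hex_runs = re.findall(r"[0-9a-fA-F]+", "ff 1A zz 7g")  # several ranges combined

print(vowels)      # ['e', 'e']
print(non_vowels)  # ['r', 'g', 'x']
print(hex_runs)    # ['ff', '1A', '7']
```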

Quantifiers

Specify how many times a pattern should match:

  • {n} - Exactly n times
  • {n,} - n or more times
  • {n,m} - Between n and m times
  • * - 0 or more times (same as {0,})
  • + - 1 or more times (same as {1,})
  • ? - 0 or 1 time (same as {0,1})

  Pattern: \d{3}-\d{2}-\d{4}
    Matches: 123-45-6789 (SSN format)

  Pattern: colou?r
    Matches: "color" and "colour"

  Pattern: \w+@\w+\.\w+
    Matches: basic email format
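The quantifier examples can be checked with a short Python sketch. The test strings are made up for illustration.

```python
import re

# {n}: exact counts. fullmatch requires the whole string to fit the pattern.
ssn_ok = re.fullmatch(r"\d{3}-\d{2}-\d{4}", "123-45-6789") is not None
ssn_short = re.fullmatch(r"\d{3}-\d{2}-\d{4}", "123-45-678") is not None

# ?: the "u" is optional, but at most one is allowed.
spellings = re.findall(r"colou?r", "color colour colouur")

# {n,m}: between two and four digits, delimited by word boundaries.
two_to_four = re.findall(r"\b\d{2,4}\b", "1 22 333 4444 55555")

print(ssn_ok, ssn_short)  # True False
print(spellings)          # ['color', 'colour']
print(two_to_four)        # ['22', '333', '4444']
```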

Anchors and Boundaries

  • ^ - Start of string/line
  • $ - End of string/line
  • \b - Word boundary
  • \B - Non-word boundary

  Pattern: ^Hello
    Matches: "Hello" only at start of string

  Pattern: world$
    Matches: "world" only at end of string

  Pattern: \bcat\b
    Matches: "cat" as a whole word, not in "concatenate"
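Anchors and boundaries match positions rather than characters. This can be seen in a small sketch with illustrative sample strings:

```python
import re

at_start = re.search(r"^Hello", "Hello world") is not None  # True
not_at_start = re.search(r"^Hello", "Say Hello") is not None  # False
at_end = re.search(r"world$", "Hello world") is not None    # True

# \b requires a word boundary on each side, so "concatenate" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concatenate cat.")

# \B requires the opposite: "cat" must sit inside a longer word.
inner = re.findall(r"\Bcat\B", "cat concatenate")

print(whole_words)  # ['cat', 'cat']
print(inner)        # ['cat']
```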

Groups and Capturing

Capturing Groups

  Pattern: (\d{3})-(\d{3})-(\d{4})
    String: 123-456-7890

    Capture Group 1: 123
    Capture Group 2: 456
    Capture Group 3: 7890
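Captured groups can be read back from the match object or reused in a replacement string. The sketch below assumes Python's re module, and the surrounding sentence text is invented.

```python
import re

PHONE_GROUPS = r"(\d{3})-(\d{3})-(\d{4})"

m = re.search(PHONE_GROUPS, "Call 123-456-7890 today")
groups = m.groups()   # all captured groups as a tuple
area_code = m.group(1)  # groups are numbered from 1

# \1, \2, \3 in the replacement refer back to the captured groups.
reformatted = re.sub(PHONE_GROUPS, r"(\1) \2-\3", "123-456-7890")

print(groups)       # ('123', '456', '7890')
print(reformatted)  # (123) 456-7890
```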

Non-Capturing Groups

  Pattern: (?:https?|ftp)://\w+
    Matches: "http://example", "https://example", or "ftp://example"; the scheme is grouped for the alternation but not captured
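The difference between the two group types is easy to see with Python's re.findall. It returns the group's text when the pattern has one capturing group, and the whole match when it has none. The sample sentence is made up.

```python
import re

text = "see http://a and ftp://b"

# Capturing group: findall returns only what the group captured.
with_capture = re.findall(r"(https?|ftp)://\w+", text)

# Non-capturing group: findall returns the whole match.
without_capture = re.findall(r"(?:https?|ftp)://\w+", text)

print(with_capture)     # ['http', 'ftp']
print(without_capture)  # ['http://a', 'ftp://b']
```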

Lookahead and Lookbehind

Positive Lookahead (?=...)

  Pattern: \d+(?= dollars)
    Matches: "100" in "100 dollars" but not in "100 euros"

Negative Lookahead (?!...)

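  Pattern: \d+\b(?! dollars)
    Matches: "100" in "100 euros" but not in "100 dollars"
    Note: without the \b, the engine backtracks and matches "10" in "100 dollars", because the text after "10" is "0 dollars", which does not start with " dollars".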
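A short Python sketch of both lookaheads follows. The negative case adds \b after \d+ so the engine cannot backtrack to a shorter number that passes the check. Since this section's title also mentions lookbehind, one (?<=...) example is included. The sample strings are invented.

```python
import re

# Positive lookahead: digits must be followed by " dollars".
pos_hit = re.findall(r"\d+(?= dollars)", "100 dollars")   # ['100']
pos_miss = re.findall(r"\d+(?= dollars)", "100 euros")    # []

# Negative lookahead without \b backtracks to a shorter number.
neg_naive = re.findall(r"\d+(?! dollars)", "100 dollars")  # ['10']

# With \b the whole number must end before the lookahead is tested.
neg_hit = re.findall(r"\d+\b(?! dollars)", "100 euros")     # ['100']
neg_miss = re.findall(r"\d+\b(?! dollars)", "100 dollars")  # []

# Positive lookbehind: digits must be preceded by a dollar sign.
behind = re.findall(r"(?<=\$)\d+", "cost: $42 or 17")       # ['42']
```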

Common Use Cases

Data Validation

  • Email addresses
  • Phone numbers
  • Credit card numbers
  • Social Security Numbers
  • ZIP codes

Text Processing

  • Find and replace text
  • Extract data from logs
  • Parse CSV/JSON data
  • Clean and format text
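Two of the text-processing tasks above, extracting data from logs and find-and-replace cleanup, might look like the sketch below. The log lines and their format are hypothetical and were invented for this example.

```python
import re

# Hypothetical log lines, invented for this example.
log_lines = [
    "2024-05-01 12:00:01 ERROR disk full",
    "2024-05-01 12:00:02 INFO  retrying",
    "2024-05-01 12:00:03 ERROR timeout",
]

# Groups: date, time, level, then the rest of the line as the message.
LOG_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+)\s+(.*)$")

error_messages = []
for line in log_lines:
    m = LOG_RE.match(line)
    if m and m.group(3) == "ERROR":
        error_messages.append(m.group(4))

# Find and replace: collapse runs of whitespace into a single space.
cleaned = re.sub(r"\s+", " ", "too    many \t spaces").strip()

print(error_messages)  # ['disk full', 'timeout']
print(cleaned)         # too many spaces
```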

Web Scraping

  • Extract URLs from HTML
  • Parse HTML tags
  • Extract specific data patterns

Regex Flags/Modifiers

  • i - Case-insensitive matching
  • g - Global search (find all matches)
  • m - Multiline mode (^ and $ match at the start and end of each line, not only of the whole string)
  • s - Dot matches newline
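The single-letter flags above follow the JavaScript and PCRE convention. In Python's re module the equivalents are re.IGNORECASE, re.MULTILINE, and re.DOTALL. There is no g flag, because re.findall and re.finditer already return every match. A small sketch with an invented sample text:

```python
import re

text = "Error: disk full\nerror: retry\nOK"

# i: ignore case.
any_case = re.findall(r"error", text, flags=re.IGNORECASE)

# m: ^ matches at the start of every line instead of only the string start.
first_words = re.findall(r"^\w+", text, flags=re.MULTILINE)
first_word_only = re.findall(r"^\w+", text)

# s: "." also matches the newline between "full" and "error".
dot_default = re.search(r"full.error", text) is not None
dot_all = re.search(r"full.error", text, flags=re.DOTALL) is not None

print(any_case)              # ['Error', 'error']
print(first_words)           # ['Error', 'error', 'OK']
print(first_word_only)       # ['Error']
print(dot_default, dot_all)  # False True
```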

Best Practices

  1. Start Simple - Build complex patterns from simple ones
  2. Test Thoroughly - Test with various inputs including edge cases
  3. Use Comments - Many regex engines support comments for documentation
  4. Avoid Unintended Greedy Matching - Quantifiers are greedy by default; use lazy quantifiers (*?, +?) when you need the shortest match
  5. Escape Special Characters - Use backslash to match literal metacharacters
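Practices 3 to 5 can be illustrated with a short Python sketch. It assumes Python's re.VERBOSE flag for inline comments and re.escape for escaping. The sample strings and the name DATE_RE are invented for the example.

```python
import re

# Comments: re.VERBOSE ignores whitespace and allows "#" comments in a pattern.
DATE_RE = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.VERBOSE)
date_parts = DATE_RE.fullmatch("2024-01-31").groups()

# Greedy vs. lazy: ".+" grabs as much as possible, ".+?" as little as possible.
html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.+>", html)
lazy = re.findall(r"<.+?>", html)

# Escaping: re.escape makes every metacharacter in a literal string safe.
literal = "1+1=2"
escaped_found = re.search(re.escape(literal), "so 1+1=2 holds") is not None
unescaped_found = re.search(literal, "so 1+1=2 holds") is not None

print(date_parts)  # ('2024', '01', '31')
print(greedy)      # ['<b>bold</b> and <i>italic</i>']
print(lazy)        # ['<b>', '</b>', '<i>', '</i>']
print(escaped_found, unescaped_found)  # True False
```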

Common Pitfalls

  • Catastrophic Backtracking - Nested or overlapping quantifiers such as (a+)+ can take exponential time on input that almost matches
  • Overly Broad Patterns - Too permissive patterns match unwanted text
  • Missing Anchors - Patterns may match partial strings unintentionally
  • Not Escaping Special Characters - Causes unexpected behavior
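Two of these pitfalls are sketched below in Python. The ZIP-code patterns and the names ZIP_LOOSE, ZIP_STRICT, RISKY, and SAFER are assumptions made for illustration. The risky pattern is run only on short input, because long non-matching input is what triggers the slowdown.

```python
import re

# Missing anchors: the loose pattern accepts any string containing five digits.
ZIP_LOOSE = re.compile(r"\d{5}")
ZIP_STRICT = re.compile(r"^\d{5}$")

loose_accepts = ZIP_LOOSE.search("id-1234567") is not None    # True (unintended)
strict_accepts = ZIP_STRICT.search("id-1234567") is not None  # False
strict_ok = ZIP_STRICT.search("12345") is not None            # True

# Catastrophic backtracking: (a+)+ can split a run of a's in exponentially
# many ways, and all of them are tried when the match fails.
RISKY = re.compile(r"^(a+)+$")
# A single quantifier accepts exactly the same strings without the blow-up.
SAFER = re.compile(r"^a+$")

same_on_match = bool(RISKY.match("aaaa")) == bool(SAFER.match("aaaa"))
same_on_fail = bool(RISKY.match("aaab")) == bool(SAFER.match("aaab"))
```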

Quick Reference Cheat Sheet

  Pattern   Description      Example
  .         Any character    a.c matches "abc", "adc"
  \d        Digit            \d{3} matches "123"
  \w        Word character   \w+ matches "hello"
  ^         Start of string  ^Hello matches start
  $         End of string    bye$ matches end
  *         0 or more        ab*c matches "ac", "abc", "abbc"
  +         1 or more        ab+c matches "abc", "abbc"
  ?         0 or 1           colou?r matches "color", "colour"