Regular Expressions
Regular expressions (regex) are powerful patterns used for matching and manipulating text. They're essential for validation, searching, and text processing. Let's master this important skill!
Regex Syntax and Patterns
Regular expressions in JavaScript can be created in two ways:
// 1. Literal notation (preferred for static patterns)
const regex1 = /pattern/flags;
// 2. Constructor notation (for dynamic patterns)
const regex2 = new RegExp('pattern', 'flags');
// Basic examples
const simplePattern = /hello/;
const caseInsensitive = /hello/i;
const userName = 'alice'; // example value; escape regex metacharacters in real user input
const dynamicPattern = new RegExp(userName, 'i');
// Testing a pattern
const text = 'Hello World';
console.log(/hello/i.test(text)); // true
console.log(simplePattern.test(text)); // false (case-sensitive)
// Matching a pattern
const match = text.match(/hello/i);
console.log(match); // ['Hello', index: 0, input: 'Hello World', groups: undefined]
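The constructor form treats its string as regex syntax, so characters such as . or + in a dynamic value keep their special meaning. Here is a minimal sketch of one way to handle that, assuming a hypothetical helper named escapeRegExp and an example input value (neither is part of the lesson's code):

```javascript
// Hypothetical helper: backslash-escape every character that is special in a regex.
function escapeRegExp(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = 'a.b'; // assumed user-supplied value

const unsafe = new RegExp(userInput);             // "." still means "any character"
const safe = new RegExp(escapeRegExp(userInput)); // "." now means a literal dot

console.log(unsafe.test('axb')); // true
console.log(safe.test('axb'));   // false
console.log(safe.test('a.b'));   // true
```

Escaping first is usually the safer default whenever the pattern text comes from outside your own code.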
Character Classes and Quantifiers
Character classes define sets of characters to match:
Character Classes:
// Dot (.) - Matches any character except line terminators (\n, \r, \u2028, \u2029)
/h.t/ // Matches: hat, hot, hit, h9t, etc.
// Character set [abc] - Matches any single character in the set
/[aeiou]/ // Matches any vowel
/[0-9]/ // Matches any digit
/[a-z]/ // Matches any lowercase letter
/[A-Z]/ // Matches any uppercase letter
/[a-zA-Z]/ // Matches any letter
// Negated set [^abc] - Matches any character NOT in the set
/[^0-9]/ // Matches any non-digit
/[^aeiou]/ // Matches any character that is not a lowercase vowel (consonants, but also digits, spaces, uppercase letters)
// Shorthand character classes
/\d/ // Digit [0-9]
/\D/ // Non-digit [^0-9]
/\w/ // Word character [a-zA-Z0-9_]
/\W/ // Non-word character [^a-zA-Z0-9_]
/\s/ // Whitespace (space, tab, newline, and other Unicode whitespace)
/\S/ // Non-whitespace
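The shorthand classes are ASCII-oriented, which is easy to overlook. The sketch below assumes an ES2018+ engine, where Unicode property escapes such as \p{L} are available with the u flag. The sample word and variable names are illustrative only:

```javascript
const word = 'caf\u00e9'; // "café", written with an escape for the precomposed é

// \w only covers [a-zA-Z0-9_], so the accented letter is rejected.
const asciiWord = /^\w+$/.test(word);

// \p{L} (any letter) and \p{N} (any number) require the u flag.
const unicodeWord = /^[\p{L}\p{N}_]+$/u.test(word);

// A negated set matches anything outside the set, including digits.
const digitIsNonVowel = /[^aeiou]/.test('7');

console.log(asciiWord);       // false
console.log(unicodeWord);     // true
console.log(digitIsNonVowel); // true
```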
Quantifiers:
// * - Zero or more times
/ab*c/ // Matches: ac, abc, abbc, abbbc
// + - One or more times
/ab+c/ // Matches: abc, abbc, abbbc (NOT ac)
// ? - Zero or one time (optional)
/colou?r/ // Matches: color, colour
// {n} - Exactly n times
/\d{3}/ // Matches 3 consecutive digits: 123 (unanchored, so it also matches inside 12345)
// {n,} - n or more times
/\d{3,}/ // Matches 3 or more digits: 123, 1234, 12345
// {n,m} - Between n and m times
/\d{3,5}/ // Matches 3 to 5 digits: 123, 1234, 12345
// Practical examples
const phonePattern = /\d{3}-\d{3}-\d{4}/; // 123-456-7890
const zipPattern = /\d{5}(-\d{4})?/; // 12345 or 12345-6789
const emailPattern = /\w+@\w+\.\w+/; // simple email pattern
Tip: Quantifiers are greedy by default (match as much as possible). Add ? after a quantifier to make it lazy (match as little as possible): *?, +?, {n,m}?
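To make the greedy and lazy behavior concrete, here is a small sketch using an assumed sample string. The same pattern is run with and without the trailing ?:

```javascript
const html = '<b>one</b> and <b>two</b>';

// Greedy: .* runs to the end, then backtracks to the LAST </b>.
const greedy = html.match(/<b>.*<\/b>/)[0];

// Lazy: .*? expands one character at a time and stops at the FIRST </b>.
const lazy = html.match(/<b>.*?<\/b>/)[0];

console.log(greedy); // '<b>one</b> and <b>two</b>'
console.log(lazy);   // '<b>one</b>'
```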
Anchors and Boundaries
Anchors specify position in the string rather than matching characters:
// ^ - Start of string/line
/^Hello/ // Matches 'Hello' only at the start
/^\d{3}/ // Matches 3 digits at the start
// $ - End of string/line
/world$/ // Matches 'world' only at the end
/\d{3}$/ // Matches 3 digits at the end
// \b - Word boundary
/\bcat\b/ // Matches 'cat' but not 'category' or 'concatenate'
/\bJava\b/ // Matches 'Java' but not 'JavaScript'
// \B - Non-word boundary
/\Bcat/ // Matches 'scat' and 'concatenate' but not 'cat'
// Examples
const text1 = 'The cat sat on the mat';
console.log(/cat/.test(text1)); // true
console.log(/^cat/.test(text1)); // false (cat not at start)
console.log(/mat$/.test(text1)); // true (mat at end)
console.log(/\bcat\b/.test(text1)); // true (cat is a word)
console.log(/\bcat\b/.test('category')); // false (cat not a word)
// Practical example: validate complete string
const usernameRegex = /^[a-z0-9_]{3,16}$/;
console.log(usernameRegex.test('john_doe')); // true
console.log(usernameRegex.test('ab')); // false (too short)
console.log(usernameRegex.test('John_Doe')); // false (has uppercase)
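Anchors are what turn a "contains" check into a "whole string" check. The following sketch compares an unanchored and an anchored pattern, and shows why the m flag is usually unwanted for validation. The variable names are illustrative:

```javascript
const containsThreeDigits = /\d{3}/;    // true if 3 digits appear anywhere
const exactlyThreeDigits = /^\d{3}$/;   // true only if the whole string is 3 digits
const perLineThreeDigits = /^\d{3}$/m;  // with m, ^ and $ also match at line breaks

console.log(containsThreeDigits.test('12345'));   // true
console.log(exactlyThreeDigits.test('12345'));    // false
console.log(exactlyThreeDigits.test('123'));      // true
console.log(perLineThreeDigits.test('abc\n123')); // true (second line matches)
console.log(exactlyThreeDigits.test('abc\n123')); // false
```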
Groups and Capturing
Groups allow you to treat multiple characters as a single unit:
// Capturing groups ( )
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const dateString = '2024-01-15';
const match = dateString.match(datePattern);
console.log(match[0]); // '2024-01-15' (full match)
console.log(match[1]); // '2024' (first group - year)
console.log(match[2]); // '01' (second group - month)
console.log(match[3]); // '15' (third group - day)
// Using groups with replace
const formatted = dateString.replace(datePattern, '$2/$3/$1');
console.log(formatted); // '01/15/2024'
// Named capturing groups
const namedPattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const namedMatch = dateString.match(namedPattern);
console.log(namedMatch.groups.year); // '2024'
console.log(namedMatch.groups.month); // '01'
console.log(namedMatch.groups.day); // '15'
// Non-capturing groups (?: )
const nonCapturing = /(?:Mr|Ms|Mrs)\.\s(\w+)/;
const name = 'Mr. Smith';
const nameMatch = name.match(nonCapturing);
console.log(nameMatch[0]); // 'Mr. Smith'
console.log(nameMatch[1]); // 'Smith' (only this is captured)
// Alternation with groups
const filePattern = /\.(jpg|jpeg|png|gif)$/i;
console.log(filePattern.test('photo.jpg')); // true
console.log(filePattern.test('image.PNG')); // true
console.log(filePattern.test('doc.pdf')); // false
// Backreferences - reference earlier captures
const duplicatePattern = /\b(\w+)\s+\1\b/i; // Matches repeated words, ignoring case
console.log(duplicatePattern.test('hello hello')); // true
console.log(duplicatePattern.test('hello world')); // false
// Find repeated words in text
// (\b prevents partial hits such as 'is is' inside 'this is')
const text = 'This this is a test test.';
const duplicates = text.match(/\b(\w+)\s+\1\b/gi);
console.log(duplicates); // ['This this', 'test test']
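When a string contains several matches and you need the captured groups of each one, String.prototype.matchAll (ES2020) with a global pattern is one option. This is a minimal sketch with an assumed sample string:

```javascript
const log = 'Created 2024-01-15, updated 2024-03-02';
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g; // g is required by matchAll

const dates = [];
for (const m of log.matchAll(isoDate)) {
  // Copy the named groups into a plain object for each match.
  dates.push({ ...m.groups });
}

console.log(dates);
// [ { year: '2024', month: '01', day: '15' },
//   { year: '2024', month: '03', day: '02' } ]
```

Unlike match with the g flag, which returns only the full matched strings, matchAll keeps the groups for every match.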
Lookahead and Lookbehind
Lookarounds allow you to match patterns based on what comes before or after, without including it in the match:
// Positive lookahead (?= )
// Matches if followed by pattern
const pricePattern = /\d+(?= dollars)/;
'Price: 50 dollars'.match(pricePattern); // ['50']
'Price: 50 euros'.match(pricePattern); // null
// Negative lookahead (?! )
// Matches if NOT followed by pattern
// (the \d alternative stops the engine from backtracking to '5' and matching there)
const notDollarPattern = /\d+(?!\d| dollars)/;
'50 euros'.match(notDollarPattern); // ['50']
'50 dollars'.match(notDollarPattern); // null (50 is followed by ' dollars')
// Positive lookbehind (?<= )
// Matches if preceded by pattern
const afterDollarPattern = /(?<=\$)\d+/;
'Price: $50'.match(afterDollarPattern); // ['50']
'Price: 50'.match(afterDollarPattern); // null
// Negative lookbehind (?<! )
// Matches if NOT preceded by pattern
// (\d in the lookbehind stops a match from starting at '0' in '$50')
const notAfterDollarPattern = /(?<![$\d])\d+/;
'Price: 50'.match(notAfterDollarPattern); // ['50']
'Price: $50'.match(notAfterDollarPattern); // null
// Practical example: Password validation
// At least 8 chars, must have uppercase, lowercase, and digit
const passwordPattern = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/;
console.log(passwordPattern.test('Password123')); // true
console.log(passwordPattern.test('password123')); // false (no uppercase)
console.log(passwordPattern.test('PASSWORD123')); // false (no lowercase)
console.log(passwordPattern.test('Password')); // false (no digit)
console.log(passwordPattern.test('Pass123')); // false (too short)
Note: Lookbehind assertions (?<= and ?<!) were added in ES2018 and may not be supported in older browsers. Always check compatibility for your target environment.
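Because lookarounds match positions rather than characters, they can also be used to insert text without consuming anything. As an illustration, here is a sketch of thousands-separator formatting, assuming the input is a plain string of digits. The function name is hypothetical:

```javascript
// Hypothetical helper: insert a comma at every position that is
// followed by a multiple of three digits and then no further digit.
function addThousandsSeparators(digits) {
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

console.log(addThousandsSeparators('1234567')); // '1,234,567'
console.log(addThousandsSeparators('1234'));    // '1,234'
console.log(addThousandsSeparators('123'));     // '123'
```

The \B keeps a comma from being inserted at the very start of the number.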
Flags
Flags modify how the regex engine interprets the pattern:
// g - Global: find all matches (not just first)
const textWithNumbers = 'I have 2 cats and 3 dogs';
console.log(textWithNumbers.match(/\d/)); // ['2'] (first match only)
console.log(textWithNumbers.match(/\d/g)); // ['2', '3'] (all matches)
// i - Case insensitive
console.log(/hello/i.test('HELLO')); // true
console.log(/hello/.test('HELLO')); // false
// m - Multiline: ^ and $ match line boundaries
const multilineText = `Line 1
Line 2
Line 3`;
console.log(multilineText.match(/^Line/g)); // ['Line'] (only first)
console.log(multilineText.match(/^Line/gm)); // ['Line', 'Line', 'Line']
// s - Dotall: . matches newlines
const textWithNewline = 'Hello\nWorld';
console.log(/Hello.World/.test(textWithNewline)); // false
console.log(/Hello.World/s.test(textWithNewline)); // true
// u - Unicode: treat the pattern and input as code points instead of UTF-16 code units
const emoji = '😀';
console.log(/^.$/.test(emoji)); // false (emoji is 2 code units)
console.log(/^.$/u.test(emoji)); // true (with Unicode support)
// y - Sticky: match from lastIndex position
const stickyRegex = /\d+/y;
const numbers = '123 456 789';
stickyRegex.lastIndex = 0;
console.log(stickyRegex.exec(numbers)); // ['123']
stickyRegex.lastIndex = 4;
console.log(stickyRegex.exec(numbers)); // ['456']
// Combining flags
const pattern = /hello/gi; // Global and case-insensitive
const text = 'Hello hello HELLO';
console.log(text.match(pattern)); // ['Hello', 'hello', 'HELLO']
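One side effect of the g (and y) flag is that the regex object remembers where the last match ended in its lastIndex property. This matters when a global regex is reused with test() or exec(). The sketch below uses an assumed sample string to show it:

```javascript
const hasDigit = /\d/g;

const first = hasDigit.test('a1');  // finds '1', lastIndex becomes 2
const second = hasDigit.test('a1'); // resumes at index 2, finds nothing, lastIndex resets to 0
const third = hasDigit.test('a1');  // starts from 0 again

console.log(first, second, third); // true false true

// Without the g flag there is no state, so the result is stable.
const hasDigitStateless = /\d/;
console.log(hasDigitStateless.test('a1'), hasDigitStateless.test('a1')); // true true
```

For simple yes/no checks, dropping the g flag avoids the issue.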
Common Regex Patterns
Here are frequently used patterns for common validation tasks:
// Email validation (basic)
const emailRegex = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
console.log(emailRegex.test('user@example.com')); // true
// Phone number (US format)
const phoneRegex = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;
console.log(phoneRegex.test('(123) 456-7890')); // true
console.log(phoneRegex.test('123-456-7890')); // true
console.log(phoneRegex.test('1234567890')); // true
// URL validation
const urlRegex = /^https?:\/\/[\w\-]+(\.[\w\-]+)+[/#?]?.*$/;
console.log(urlRegex.test('https://example.com')); // true
console.log(urlRegex.test('http://sub.example.com/path')); // true
// Credit card number (basic check)
const cardRegex = /^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$/;
console.log(cardRegex.test('1234-5678-9012-3456')); // true
// Hex color code
const hexRegex = /^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$/;
console.log(hexRegex.test('#FF5733')); // true
console.log(hexRegex.test('#F57')); // true
// IPv4 address (format only; out-of-range octets such as 999 still pass)
const ipRegex = /^(\d{1,3}\.){3}\d{1,3}$/;
console.log(ipRegex.test('192.168.1.1')); // true
// Date (YYYY-MM-DD)
const dateRegex = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;
console.log(dateRegex.test('2024-01-15')); // true
// Strong password
// Min 8 chars, uppercase, lowercase, digit, special char
const strongPassRegex = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;
console.log(strongPassRegex.test('Secure@123')); // true
// Username (alphanumeric, underscore, 3-16 chars)
const usernameRegex = /^[a-zA-Z0-9_]{3,16}$/;
console.log(usernameRegex.test('john_doe123')); // true
// HTML tag (simple cases only; [^<>]* avoids the nested quantifier ([^<]+)*, which can backtrack heavily)
const tagRegex = /<([a-z]+)([^<>]*)(?:>(.*)<\/\1>|\s+\/>)/;
console.log(tagRegex.test('<div class="test">content</div>')); // true
Warning: These are simplified patterns for demonstration. Production email, URL, and credit card validation should use more robust patterns or dedicated validation libraries.
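As one example of what "more robust" can mean, the basic IPv4 pattern accepts any one-to-three-digit group. A stricter sketch is shown below, assuming octets must be 0-255 with no leading zeros. The names octet and strictIpRegex are illustrative, not part of the lesson's code:

```javascript
// Assumed building block: one octet in the range 0-255 without leading zeros.
const octet = '(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)';

// Three "octet." groups followed by a final octet, anchored at both ends.
const strictIpRegex = new RegExp(`^(?:${octet}\\.){3}${octet}$`);

console.log(strictIpRegex.test('192.168.1.1'));     // true
console.log(strictIpRegex.test('255.255.255.255')); // true
console.log(strictIpRegex.test('256.1.1.1'));       // false (256 is out of range)
console.log(strictIpRegex.test('999.999.999.999')); // false
```

Building the pattern from a named piece keeps the repeated octet rule readable.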
Practical Examples
Let's see regex in action with real-world use cases:
// 1. Extract all URLs from text
function extractUrls(text) {
const urlPattern = /https?:\/\/[\w\-]+(\.[\w\-]+)+[/#?]?[^\s]*/gi;
return text.match(urlPattern) || [];
}
const text = 'Visit https://example.com or http://test.org for info';
console.log(extractUrls(text));
// ['https://example.com', 'http://test.org']
// 2. Format phone numbers
function formatPhone(phone) {
const cleaned = phone.replace(/\D/g, ''); // Remove non-digits
const match = cleaned.match(/^(\d{3})(\d{3})(\d{4})$/);
if (match) {
return `(${match[1]}) ${match[2]}-${match[3]}`;
}
return phone;
}
console.log(formatPhone('1234567890')); // (123) 456-7890
console.log(formatPhone('123-456-7890')); // (123) 456-7890
// 3. Sanitize input (remove special characters)
function sanitizeInput(input) {
return input.replace(/[^a-zA-Z0-9\s]/g, '');
}
console.log(sanitizeInput('Hello<script>alert("XSS")</script>World'));
// HelloscriptalertXSSscriptWorld
// 4. Validate credit card with Luhn algorithm check
function validateCard(cardNumber) {
const cleaned = cardNumber.replace(/[\s-]/g, ''); // Remove spaces and dashes
// Check format
if (!/^\d{13,19}$/.test(cleaned)) {
return false;
}
// Luhn algorithm
let sum = 0;
let isEven = false;
for (let i = cleaned.length - 1; i >= 0; i--) {
let digit = parseInt(cleaned[i], 10);
if (isEven) {
digit *= 2;
if (digit > 9) digit -= 9;
}
sum += digit;
isEven = !isEven;
}
return sum % 10 === 0;
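}
console.log(validateCard('4111 1111 1111 1111')); // true (a widely used test number)
console.log(validateCard('1234 5678 9012 3456')); // false (fails the Luhn check)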
// 5. Highlight search terms
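function highlightText(text, searchTerm) {
// Escape regex metacharacters so terms like 'c++' are matched literally
const escaped = searchTerm.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
const regex = new RegExp(`(${escaped})`, 'gi');
return text.replace(regex, '<mark>$1</mark>');
}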
console.log(highlightText('JavaScript is great', 'javascript'));
// <mark>JavaScript</mark> is great
// 6. Parse CSV line (simplified: empty fields are skipped)
function parseCSV(line) {
const pattern = /("(?:[^"]|"")*"|[^,]+)(?:,|$)/g;
const fields = [];
let match;
while ((match = pattern.exec(line)) !== null) {
fields.push(match[1].replace(/^"|"$/g, '').replace(/""/g, '"'));
}
return fields;
}
console.log(parseCSV('John,Doe,"123 Main St, Apt 4",Boston'));
// ['John', 'Doe', '123 Main St, Apt 4', 'Boston']
// 7. Extract hashtags
function extractHashtags(text) {
return text.match(/#[a-zA-Z0-9_]+/g) || [];
}
console.log(extractHashtags('Love #JavaScript and #WebDev!'));
// ['#JavaScript', '#WebDev']
Practice Exercise:
Create a function to validate and format a date string:
// Challenge: Create a date validator
// Accept: MM/DD/YYYY, MM-DD-YYYY, YYYY-MM-DD
// Output: Standardized YYYY-MM-DD format
// Return null if invalid
function parseDate(dateString) {
// US format: MM/DD/YYYY or MM-DD-YYYY (\2 requires the same separator both times)
const usPattern = /^(0?[1-9]|1[0-2])([\/-])(0?[1-9]|[12]\d|3[01])\2(\d{4})$/;
const usMatch = dateString.match(usPattern);
if (usMatch) {
const month = usMatch[1].padStart(2, '0');
const day = usMatch[3].padStart(2, '0');
const year = usMatch[4];
return `${year}-${month}-${day}`;
}
// ISO format: YYYY-MM-DD
const isoPattern = /^(\d{4})-(0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])$/;
const isoMatch = dateString.match(isoPattern);
if (isoMatch) {
const year = isoMatch[1];
const month = isoMatch[2].padStart(2, '0');
const day = isoMatch[3].padStart(2, '0');
return `${year}-${month}-${day}`;
}
return null;
}
// Test
console.log(parseDate('12/25/2024')); // 2024-12-25
console.log(parseDate('3-5-2024')); // 2024-03-05
console.log(parseDate('2024-1-15')); // 2024-01-15
console.log(parseDate('invalid')); // null
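The patterns in this exercise check only the shape of the input, so a string like 02/31/2024 has a valid format even though that day does not exist. One possible follow-up step, sketched here with a hypothetical helper named isRealDate, is to round-trip the normalized value through Date and compare the parts:

```javascript
// Hypothetical helper: checks that a 'YYYY-MM-DD' string names a real calendar day.
// Note: years 0000-0099 are rejected because Date.UTC maps them to 1900-1999.
function isRealDate(isoString) {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoString);
  if (!m) return false;

  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);

  // Date rolls invalid days over (Feb 31 becomes Mar 2), so a mismatch means "not real".
  const date = new Date(Date.UTC(year, month - 1, day));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isRealDate('2024-02-29')); // true (leap year)
console.log(isRealDate('2023-02-29')); // false
console.log(isRealDate('2024-02-31')); // false
console.log(isRealDate('not-a-date')); // false
```

Regex handles the format and the Date comparison handles the calendar rules, which keeps the pattern itself simple.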
Summary
In this lesson, you learned:
- Regular expression syntax and creation methods
- Character classes, quantifiers, and anchors
- Groups, capturing, and backreferences
- Lookahead and lookbehind assertions
- Regex flags (g, i, m, s, u, y)
- Common patterns for validation
- Practical real-world regex applications
Next Up: In the next lesson, we'll explore Performance Optimization techniques to make your JavaScript code faster and more efficient!