Advanced JavaScript (ES6+)

Regular Expressions

Regular expressions (regex) are powerful patterns used for matching and manipulating text. They're essential for validation, searching, and text processing. Let's master this important skill!

Regex Syntax and Patterns

Regular expressions in JavaScript can be created in two ways:

// 1. Literal notation (preferred for static patterns)
//    Syntax: /pattern/flags
const regex1 = /hello/i;

// 2. Constructor notation (for dynamic patterns)
//    Syntax: new RegExp('pattern', 'flags')
const regex2 = new RegExp('hello', 'i');

// Basic examples
const simplePattern = /hello/;
const caseInsensitive = /hello/i;
const userName = 'alice'; // example value; real user input should have regex metacharacters escaped
const dynamicPattern = new RegExp(userName, 'i');

// Testing a pattern
const text = 'Hello World';
console.log(/hello/i.test(text)); // true
console.log(simplePattern.test(text)); // false (case-sensitive)

// Matching a pattern
const match = text.match(/hello/i);
console.log(match); // ['Hello', index: 0, input: 'Hello World', groups: undefined]
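When a pattern is built with the constructor from user input, any regex metacharacters in that input are treated as syntax rather than literal text. The sketch below shows one common workaround. `escapeRegExp` is a hypothetical helper name and `userInput` is an assumed example value; neither comes from the lesson.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter
function escapeRegExp(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = 'a.b'; // assumed example input containing a metacharacter

const unescaped = new RegExp(userInput);
const escaped = new RegExp(escapeRegExp(userInput));

console.log(unescaped.test('aXb')); // true  (the dot matched 'X')
console.log(escaped.test('aXb'));   // false (the dot is now literal)
console.log(escaped.test('a.b'));   // true
```

Escaping keeps the dynamic pattern limited to a literal search, which is usually what a search box or filter field needs.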

Character Classes and Quantifiers

Character classes define sets of characters to match:

Character Classes:

// Dot (.) - Matches any character except line terminators
/h.t/ // Matches: hat, hot, hit, h9t, etc.

// Character set [abc] - Matches any single character in the set
/[aeiou]/ // Matches any lowercase vowel
/[0-9]/ // Matches any digit
/[a-z]/ // Matches any lowercase letter
/[A-Z]/ // Matches any uppercase letter
/[a-zA-Z]/ // Matches any letter

// Negated set [^abc] - Matches any character NOT in the set
/[^0-9]/ // Matches any non-digit
/[^aeiou]/ // Matches any character that is not a lowercase vowel (consonants, digits, spaces, ...)

// Shorthand character classes
/\d/ // Digit [0-9]
/\D/ // Non-digit [^0-9]
/\w/ // Word character [a-zA-Z0-9_]
/\W/ // Non-word character [^a-zA-Z0-9_]
/\s/ // Whitespace (space, tab, newline)
/\S/ // Non-whitespace

Quantifiers:

// * - Zero or more times
/ab*c/ // Matches: ac, abc, abbc, abbbc

// + - One or more times
/ab+c/ // Matches: abc, abbc, abbbc (NOT ac)

// ? - Zero or one time (optional)
/colou?r/ // Matches: color, colour

// {n} - Exactly n times
/\d{3}/ // Matches exactly 3 digits: 123

// {n,} - n or more times
/\d{3,}/ // Matches 3 or more digits: 123, 1234, 12345

// {n,m} - Between n and m times
/\d{3,5}/ // Matches 3 to 5 digits: 123, 1234, 12345

// Practical examples (unanchored, so they also match inside longer strings)
const phonePattern = /\d{3}-\d{3}-\d{4}/; // 123-456-7890
const zipPattern = /\d{5}(-\d{4})?/; // 12345 or 12345-6789
const emailPattern = /\w+@\w+\.\w+/; // simple email pattern
Tip: Quantifiers are greedy by default (match as much as possible). Add ? after a quantifier to make it lazy (match as little as possible): *?, +?, {n,m}?
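To make the greedy versus lazy distinction concrete, here is a small sketch. The `html` sample string is an assumed example.

```javascript
const html = '<b>one</b> and <b>two</b>'; // assumed sample input

// Greedy: .* runs as far as it can, so the match ends at the LAST '</b>'
const greedy = html.match(/<b>.*<\/b>/)[0];

// Lazy: .*? expands only as far as needed, so the match ends at the FIRST '</b>'
const lazy = html.match(/<b>.*?<\/b>/)[0];

console.log(greedy); // '<b>one</b> and <b>two</b>'
console.log(lazy);   // '<b>one</b>'
```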

Anchors and Boundaries

Anchors specify position in the string rather than matching characters:

// ^ - Start of string/line
/^Hello/ // Matches 'Hello' only at the start
/^\d{3}/ // Matches 3 digits at the start

// $ - End of string/line
/world$/ // Matches 'world' only at the end
/\d{3}$/ // Matches 3 digits at the end

// \b - Word boundary
/\bcat\b/ // Matches 'cat' but not 'category' or 'concatenate'
/\bJava\b/ // Matches 'Java' but not 'JavaScript'

// \B - Non-word boundary
/\Bcat/ // Matches 'scat' and 'concatenate' but not 'cat'

// Examples
const text1 = 'The cat sat on the mat';
console.log(/cat/.test(text1)); // true
console.log(/^cat/.test(text1)); // false (cat not at start)
console.log(/mat$/.test(text1)); // true (mat at end)
console.log(/\bcat\b/.test(text1)); // true (cat is a word)
console.log(/\bcat\b/.test('category')); // false (cat not a word)

// Practical example: validate complete string
const usernameRegex = /^[a-z0-9_]{3,16}$/;
console.log(usernameRegex.test('john_doe')); // true
console.log(usernameRegex.test('ab')); // false (too short)
console.log(usernameRegex.test('John_Doe')); // false (has uppercase)

Groups and Capturing

Groups allow you to treat multiple characters as a single unit:

// Capturing groups ( )
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const dateString = '2024-01-15';
const match = dateString.match(datePattern);
console.log(match[0]); // '2024-01-15' (full match)
console.log(match[1]); // '2024' (first group - year)
console.log(match[2]); // '01' (second group - month)
console.log(match[3]); // '15' (third group - day)

// Using groups with replace
const formatted = dateString.replace(datePattern, '$2/$3/$1');
console.log(formatted); // '01/15/2024'

// Named capturing groups
const namedPattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const namedMatch = dateString.match(namedPattern);
console.log(namedMatch.groups.year); // '2024'
console.log(namedMatch.groups.month); // '01'
console.log(namedMatch.groups.day); // '15'

// Non-capturing groups (?: )
const nonCapturing = /(?:Mr|Ms|Mrs)\.\s(\w+)/;
const name = 'Mr. Smith';
const nameMatch = name.match(nonCapturing);
console.log(nameMatch[0]); // 'Mr. Smith'
console.log(nameMatch[1]); // 'Smith' (only this is captured)

// Alternation with groups
const filePattern = /\.(jpg|jpeg|png|gif)$/i;
console.log(filePattern.test('photo.jpg')); // true
console.log(filePattern.test('image.PNG')); // true
console.log(filePattern.test('doc.pdf')); // false

// Backreferences - reference earlier captures
const duplicatePattern = /(\w+)\s\1/; // Matches repeated words
console.log(duplicatePattern.test('hello hello')); // true
console.log(duplicatePattern.test('hello world')); // false

// Find repeated words in text
// \b keeps the match on whole words (otherwise 'this is' yields 'is is');
// the i flag lets 'This' match 'this'
const text = 'This this is a test test.';
const duplicates = text.match(/\b(\w+)\s\1\b/gi);
console.log(duplicates); // ['This this', 'test test']
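Named groups also work when a string contains several matches. The sketch below uses `String.prototype.matchAll` to collect every date and the `$<name>` replacement syntax to reorder them. The `log` sample text and the variable names are assumptions made for illustration.

```javascript
const log = 'Released 2024-01-15, patched 2024-03-02'; // assumed sample text
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;

// matchAll requires the g flag and yields one match object per occurrence
const dates = [...log.matchAll(isoDate)].map((m) => m.groups);
console.log(dates.map((d) => `${d.year}/${d.month}`)); // ['2024/01', '2024/03']

// Named groups can be referenced in a replacement string as $<name>
const usStyle = log.replace(isoDate, '$<month>/$<day>/$<year>');
console.log(usStyle); // 'Released 01/15/2024, patched 03/02/2024'
```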

Lookahead and Lookbehind

Lookarounds allow you to match patterns based on what comes before or after, without including it in the match:

// Positive lookahead (?= )
// Matches if followed by pattern
const pricePattern = /\d+(?= dollars)/;
'Price: 50 dollars'.match(pricePattern); // ['50']
'Price: 50 euros'.match(pricePattern); // null

// Negative lookahead (?! )
// Matches if NOT followed by pattern
// The \b anchors stop the engine from backtracking to a shorter number:
// without them, '50 dollars' would still match '5'
const notDollarPattern = /\b\d+\b(?! dollars)/;
'50 euros'.match(notDollarPattern); // ['50']
'50 dollars'.match(notDollarPattern); // null (50 is followed by ' dollars')

// Positive lookbehind (?<= )
// Matches if preceded by pattern
const afterDollarPattern = /(?<=\$)\d+/;
'Price: $50'.match(afterDollarPattern); // ['50']
'Price: 50'.match(afterDollarPattern); // null

// Negative lookbehind (?<! )
// Matches if NOT preceded by pattern
// Excluding a preceding digit as well prevents a match starting mid-number:
// with /(?<!\$)\d+/ alone, '$50' would still match '0'
const notAfterDollarPattern = /(?<![$\d])\d+/;
'Price: 50'.match(notAfterDollarPattern); // ['50']
'Price: $50'.match(notAfterDollarPattern); // null

// Practical example: Password validation
// At least 8 chars, must have uppercase, lowercase, and digit
const passwordPattern = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/;
console.log(passwordPattern.test('Password123')); // true
console.log(passwordPattern.test('password123')); // false (no uppercase)
console.log(passwordPattern.test('PASSWORD123')); // false (no lowercase)
console.log(passwordPattern.test('Password')); // false (no digit)
console.log(passwordPattern.test('Pass123')); // false (too short)
Note: Lookbehind assertions (?<= and ?<!) are relatively new and may not be supported in older browsers. Always check compatibility for your target environment.
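One way to handle engines without lookbehind is to detect support at runtime and fall back to a capturing group. This is a minimal sketch under that assumption; `supportsLookbehind` and `extractDollarAmount` are hypothetical helper names. The lookbehind pattern is built with the `RegExp` constructor so that the file still parses on engines that reject the literal syntax.

```javascript
// Hypothetical helper: true if this engine can compile a lookbehind assertion
function supportsLookbehind() {
  try {
    new RegExp('(?<=a)b'); // throws SyntaxError where lookbehind is unsupported
    return true;
  } catch (err) {
    return false;
  }
}

// Hypothetical helper: returns the digits that directly follow a '$', or null
function extractDollarAmount(text) {
  if (supportsLookbehind()) {
    const m = text.match(new RegExp('(?<=\\$)\\d+'));
    return m ? m[0] : null;
  }
  // Fallback: consume the '$' and read the digits from capture group 1
  const m = text.match(/\$(\d+)/);
  return m ? m[1] : null;
}

console.log(extractDollarAmount('Price: $50')); // '50'
console.log(extractDollarAmount('Price: 50'));  // null
```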

Flags

Flags modify how the regex engine interprets the pattern:

// g - Global: find all matches (not just first)
const textWithNumbers = 'I have 2 cats and 3 dogs';
console.log(textWithNumbers.match(/\d/)); // ['2'] (first match only)
console.log(textWithNumbers.match(/\d/g)); // ['2', '3'] (all matches)

// i - Case insensitive
console.log(/hello/i.test('HELLO')); // true
console.log(/hello/.test('HELLO')); // false

// m - Multiline: ^ and $ match line boundaries
const multilineText = `Line 1
Line 2
Line 3`;
console.log(multilineText.match(/^Line/g)); // ['Line'] (only first)
console.log(multilineText.match(/^Line/gm)); // ['Line', 'Line', 'Line']

// s - Dotall: . matches newlines
const textWithNewline = 'Hello\nWorld';
console.log(/Hello.World/.test(textWithNewline)); // false
console.log(/Hello.World/s.test(textWithNewline)); // true

// u - Unicode: enable Unicode support
const emoji = '😀';
console.log(/^.$/.test(emoji)); // false (emoji is 2 code units)
console.log(/^.$/u.test(emoji)); // true (with Unicode support)

// y - Sticky: match from lastIndex position
const stickyRegex = /\d+/y;
const numbers = '123 456 789';
stickyRegex.lastIndex = 0;
console.log(stickyRegex.exec(numbers)); // ['123']
stickyRegex.lastIndex = 4;
console.log(stickyRegex.exec(numbers)); // ['456']

// Combining flags
const pattern = /hello/gi; // Global and case-insensitive
const text = 'Hello hello HELLO';
console.log(text.match(pattern)); // ['Hello', 'hello', 'HELLO']
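A regex with the g (or y) flag is stateful: `test()` and `exec()` resume from its `lastIndex` property. Reusing such a regex across different strings can therefore give surprising results, as this sketch shows. The variable names are assumptions for illustration.

```javascript
const hasDigit = /\d/g;

console.log(hasDigit.test('a1')); // true  (match found at index 1)
console.log(hasDigit.lastIndex);  // 2     (next search starts here)
console.log(hasDigit.test('b2')); // false (search began at index 2, past the digit)
console.log(hasDigit.lastIndex);  // 0     (reset after the failed match)

// Without g or y, test() always starts from index 0
const hasDigitStateless = /\d/;
console.log(hasDigitStateless.test('a1')); // true
console.log(hasDigitStateless.test('b2')); // true
```

For plain yes/no validation, leaving the g flag off avoids this state entirely.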

Common Regex Patterns

Here are frequently used patterns for common validation tasks:

// Email validation (basic)
const emailRegex = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
console.log(emailRegex.test('user@example.com')); // true

// Phone number (US format)
const phoneRegex = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;
console.log(phoneRegex.test('(123) 456-7890')); // true
console.log(phoneRegex.test('123-456-7890')); // true
console.log(phoneRegex.test('1234567890')); // true

// URL validation
const urlRegex = /^https?:\/\/[\w\-]+(\.[\w\-]+)+[/#?]?.*$/;
console.log(urlRegex.test('https://example.com')); // true
console.log(urlRegex.test('http://sub.example.com/path')); // true

// Credit card number (basic check)
const cardRegex = /^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$/;
console.log(cardRegex.test('1234-5678-9012-3456')); // true

// Hex color code
const hexRegex = /^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$/;
console.log(hexRegex.test('#FF5733')); // true
console.log(hexRegex.test('#F57')); // true

// IPv4 address (format only: does not check that each octet is 0-255)
const ipRegex = /^(\d{1,3}\.){3}\d{1,3}$/;
console.log(ipRegex.test('192.168.1.1')); // true

// Date (YYYY-MM-DD; does not check the number of days in each month)
const dateRegex = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;
console.log(dateRegex.test('2024-01-15')); // true

// Strong password
// Min 8 chars, uppercase, lowercase, digit, special char
const strongPassRegex = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;
console.log(strongPassRegex.test('Secure@123')); // true

// Username (alphanumeric, underscore, 3-16 chars)
const usernameRegex = /^[a-zA-Z0-9_]{3,16}$/;
console.log(usernameRegex.test('john_doe123')); // true

// HTML tag
const tagRegex = /<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)/;
console.log(tagRegex.test('<div class="test">content</div>')); // true
Warning: These are simplified patterns for demonstration. Production email, URL, and credit card validation should use more robust patterns or dedicated validation libraries.
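As one illustration of what "more robust" can mean, the regex can check the shape of the input while ordinary code checks the values. The sketch below applies that idea to IPv4 addresses; `isValidIPv4` is a hypothetical helper, and it still accepts octets with leading zeros such as '01'.

```javascript
// Hypothetical helper: shape check by regex, range check by arithmetic
function isValidIPv4(address) {
  const match = address.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!match) return false;
  // match[1] through match[4] hold the four octets as strings
  return match.slice(1).every((octet) => Number(octet) <= 255);
}

console.log(isValidIPv4('192.168.1.1'));     // true
console.log(isValidIPv4('999.999.999.999')); // false (octets out of range)
console.log(isValidIPv4('1.2.3'));           // false (only three octets)
```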

Practical Examples

Let's see regex in action with real-world use cases:

// 1. Extract all URLs from text
function extractUrls(text) {
  const urlPattern = /https?:\/\/[\w\-]+(\.[\w\-]+)+[/#?]?[^\s]*/gi;
  return text.match(urlPattern) || [];
}
const text = 'Visit https://example.com or http://test.org for info';
console.log(extractUrls(text)); // ['https://example.com', 'http://test.org']

// 2. Format phone numbers
function formatPhone(phone) {
  const cleaned = phone.replace(/\D/g, ''); // Remove non-digits
  const match = cleaned.match(/^(\d{3})(\d{3})(\d{4})$/);
  if (match) {
    return `(${match[1]}) ${match[2]}-${match[3]}`;
  }
  return phone;
}
console.log(formatPhone('1234567890')); // (123) 456-7890
console.log(formatPhone('123-456-7890')); // (123) 456-7890

// 3. Sanitize input (remove special characters)
function sanitizeInput(input) {
  return input.replace(/[^a-zA-Z0-9\s]/g, '');
}
console.log(sanitizeInput('Hello<script>alert("XSS")</script>World'));
// HelloscriptalertXSSscriptWorld

// 4. Validate credit card with Luhn algorithm check
function validateCard(cardNumber) {
  const cleaned = cardNumber.replace(/\s/g, '');
  // Check format
  if (!/^\d{13,19}$/.test(cleaned)) {
    return false;
  }
  // Luhn algorithm: double every second digit from the right
  let sum = 0;
  let isEven = false;
  for (let i = cleaned.length - 1; i >= 0; i--) {
    let digit = parseInt(cleaned[i], 10);
    if (isEven) {
      digit *= 2;
      if (digit > 9) digit -= 9;
    }
    sum += digit;
    isEven = !isEven;
  }
  return sum % 10 === 0;
}

// 5. Highlight search terms
function highlightText(text, searchTerm) {
  // Escape regex metacharacters so the term is matched literally
  const escaped = searchTerm.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(`(${escaped})`, 'gi');
  return text.replace(regex, '<mark>$1</mark>');
}
console.log(highlightText('JavaScript is great', 'javascript'));
// <mark>JavaScript</mark> is great

// 6. Parse CSV line (simplified: empty fields are skipped)
function parseCSV(line) {
  const pattern = /("(?:[^"]|"")*"|[^,]+)(?:,|$)/g;
  const fields = [];
  let match;
  while ((match = pattern.exec(line)) !== null) {
    // Strip surrounding quotes, then unescape doubled quotes
    fields.push(match[1].replace(/^"|"$/g, '').replace(/""/g, '"'));
  }
  return fields;
}
console.log(parseCSV('John,Doe,"123 Main St, Apt 4",Boston'));
// ['John', 'Doe', '123 Main St, Apt 4', 'Boston']

// 7. Extract hashtags
function extractHashtags(text) {
  return text.match(/#[a-zA-Z0-9_]+/g) || [];
}
console.log(extractHashtags('Love #JavaScript and #WebDev!'));
// ['#JavaScript', '#WebDev']

Practice Exercise:

Create a function to validate and format a date string:

// Challenge: Create a date validator
// Accept: MM/DD/YYYY, MM-DD-YYYY, YYYY-MM-DD
// Output: Standardized YYYY-MM-DD format
// Return null if invalid
function parseDate(dateString) {
  // US format: MM/DD/YYYY or MM-DD-YYYY
  const usPattern = /^(0?[1-9]|1[0-2])[\/-](0?[1-9]|[12]\d|3[01])[\/-](\d{4})$/;
  const usMatch = dateString.match(usPattern);
  if (usMatch) {
    const month = usMatch[1].padStart(2, '0');
    const day = usMatch[2].padStart(2, '0');
    const year = usMatch[3];
    return `${year}-${month}-${day}`;
  }

  // ISO format: YYYY-MM-DD
  const isoPattern = /^(\d{4})-(0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])$/;
  const isoMatch = dateString.match(isoPattern);
  if (isoMatch) {
    const year = isoMatch[1];
    const month = isoMatch[2].padStart(2, '0');
    const day = isoMatch[3].padStart(2, '0');
    return `${year}-${month}-${day}`;
  }

  return null;
}

// Test
console.log(parseDate('12/25/2024')); // 2024-12-25
console.log(parseDate('3-5-2024')); // 2024-03-05
console.log(parseDate('2024-1-15')); // 2024-01-15
console.log(parseDate('invalid')); // null
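The patterns in this exercise check only the shape of the date, so a string such as 02/31/2024 still passes. A possible extension, sketched here, is to confirm the calendar date with the built-in `Date` object once the string is in YYYY-MM-DD form. `isRealDate` is a hypothetical helper name, not part of the exercise.

```javascript
// Hypothetical helper: expects a string already normalized to YYYY-MM-DD
function isRealDate(isoDate) {
  const match = isoDate.match(/^(\d{4})-(\d{2})-(\d{2})$/);
  if (!match) return false;
  const [year, month, day] = match.slice(1).map(Number);
  // Date.UTC rolls impossible days over into the next month
  // (Feb 30 becomes Mar 1), so a round-trip comparison exposes them
  const date = new Date(Date.UTC(year, month - 1, day));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isRealDate('2024-02-29')); // true  (2024 is a leap year)
console.log(isRealDate('2023-02-29')); // false (2023 is not)
console.log(isRealDate('2024-04-31')); // false (April has 30 days)
```

Note that years below 100 are rejected by this sketch, because `Date.UTC` maps them to 1900 through 1999.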

Summary

In this lesson, you learned:

  • Regular expression syntax and creation methods
  • Character classes, quantifiers, and anchors
  • Groups, capturing, and backreferences
  • Lookahead and lookbehind assertions
  • Regex flags (g, i, m, s, u, y)
  • Common patterns for validation
  • Practical real-world regex applications
Next Up: In the next lesson, we'll explore Performance Optimization techniques to make your JavaScript code faster and more efficient!