Git Internals and Troubleshooting

In this lesson, we'll dive deep into Git's internal mechanics, explore plumbing commands, and learn how to troubleshoot and recover from common Git problems. Understanding Git internals will help you debug issues and recover from seemingly catastrophic situations.

Understanding Git Plumbing Commands

Git has two types of commands: porcelain (user-friendly) and plumbing (low-level). Plumbing commands let you inspect and manipulate Git's internal structures directly.

    # View the type of a Git object
    git cat-file -t <hash>

    # View the content of a Git object
    git cat-file -p <hash>

    # List every object reachable from any ref
    git rev-list --objects --all

    # Show the content of the index
    git ls-files -s

Tip: Plumbing commands are rarely needed for daily work, but they're invaluable for understanding how Git works and debugging complex issues.
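
To see these commands against real objects, the following sketch builds a throwaway repository and walks from a commit to its tree to a blob. The temporary directory, the file name hello.txt, and the identity settings are placeholders chosen for illustration, not part of the lesson.

```shell
# Build a throwaway repository (all names below are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Add hello.txt"

# Resolve the commit, its root tree, and the blob behind hello.txt.
commit=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')
blob=$(git rev-parse HEAD:hello.txt)

# Each hash names an object of a different type.
commit_type=$(git cat-file -t "$commit")
tree_type=$(git cat-file -t "$tree")
blob_type=$(git cat-file -t "$blob")
echo "$commit_type $tree_type $blob_type"

# Pretty-print each object: commit metadata, directory listing, file content.
git cat-file -p "$commit"
git cat-file -p "$tree"
blob_content=$(git cat-file -p "$blob")
echo "$blob_content"
```

Following the chain commit, tree, blob by hand is a quick way to confirm that a commit is a snapshot that points at content, not a diff.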

Git Filesystem Check

The git fsck command verifies the integrity of your Git repository and can help detect corruption:

    # Check repository integrity
    git fsck

    # Check with verbose output
    git fsck --full --verbose

    # List objects that are not reachable from any ref
    git fsck --unreachable

    # List dangling objects (unreachable and not referenced by any other object)
    git fsck --dangling

Common output types:

  • dangling commit: A commit that's not reachable from any branch
  • dangling blob: File content that's not part of any commit
  • dangling tree: A directory structure that's not part of any commit
  • unreachable: An object that cannot be reached from any reference

Important: Dangling objects are normal after rebasing or amending commits. They'll be cleaned up automatically by Git's garbage collection.
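
A dangling object is easy to produce on purpose, which makes the fsck output less alarming when it shows up for real. This sketch, run in a throwaway repository with placeholder names, writes a blob straight into the object store without staging or committing it and then asks fsck about it.

```shell
# Throwaway repository (directory, file names, and identity are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "tracked" > a.txt
git add a.txt
git commit -q -m "Initial commit"

# Write a blob into the object store without adding it to the index or a commit.
orphan=$(echo "never committed" | git hash-object -w --stdin)

# Nothing references the blob, so fsck lists it as dangling.
report=$(git fsck --dangling 2>/dev/null)
echo "$report"
```

The repository is still healthy here: fsck is only reporting that the object exists and nothing points to it.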

Recovering Deleted Branches

Accidentally deleted a branch? Don't panic! Git rarely loses data immediately. Use git reflog to recover:

    # View the reflog (history of HEAD movements)
    git reflog

    # Show the reflogs of all references
    git reflog show --all

    # Recreate the deleted branch
    git branch recovered-branch <commit-hash>

    # Or recreate the branch and switch to it in one step
    git checkout -b recovered-branch <commit-hash>

Example recovery scenario:

    # You deleted feature-branch by mistake
    git branch -D feature-branch

    # View reflog to find the branch tip
    git reflog
    # Output: a1b2c3d HEAD@{2}: commit: Add new feature

    # Recover the branch
    git branch feature-branch a1b2c3d

    # Verify recovery
    git log feature-branch

Warning: Reflog entries expire after 90 days by default (30 days for unreachable commits). Don't wait too long to recover deleted branches!
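
The recovery scenario can be rehearsed end to end in a scratch repository. The sketch below is one way to do it: it looks up the lost commit by its reflog message with git log -g instead of reading the reflog by eye. The repository location, file names, and the variable tip (kept only to check the result) are assumptions made for the example.

```shell
# Scratch repository (directory, file names, and identity are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Do some work on a feature branch.
git checkout -q -b feature-branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add new feature"
tip=$(git rev-parse HEAD)   # remembered only to verify the recovery below

# Delete the branch "by mistake".
git checkout -q main
git branch -q -D feature-branch

# Walk HEAD's reflog and pick the entry recorded for that commit.
lost=$(git log -g --format='%H %gs' |
  grep 'commit: Add new feature' |
  head -n 1 |
  cut -d' ' -f1)

# Recreate the branch at the recovered commit and verify.
git branch feature-branch "$lost"
git --no-pager log --oneline feature-branch
```

Searching by reflog message works here because HEAD's reflog records every commit made while the branch was checked out, even after the branch itself is gone.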

Dealing with Corrupted Repositories

Repository corruption is rare but can happen due to disk errors, power failures, or incomplete operations. Here's how to handle it:

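    # Step 1: Check for corruption
    git fsck --full

    # Step 2: If corruption is found, clone from the remote (safest option)
    git clone <remote-url> recovered-repo

    # Step 3: If there is no remote, move the damaged pack out of
    # .git/objects/pack (here to ../corrupted-pack-file), because unpack-objects
    # skips objects the repository already has. Then unpack whatever is readable;
    # -r keeps going past corrupt entries
    git unpack-objects -r < ../corrupted-pack-file

    # Step 4: Rebuild the index (run from the top of the working tree)
    rm .git/index
    git reset

    # Step 5: Re-check, then clean up and re-compress
    git fsck --full
    git gc --aggressive --prune=now
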
Best Practice: Always maintain remote backups. A corrupted local repository can be easily replaced by cloning from GitHub or another remote.
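
Rebuilding the index (Step 4) is the one step that is safe to try on a healthy repository. The following sketch deletes the index in a scratch repository and restores it from HEAD; the directory and file names are placeholders.

```shell
# Scratch repository (directory, file names, and identity are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "content" > a.txt
git add a.txt
git commit -q -m "Initial commit"

# Simulate a lost or damaged index.
rm .git/index
before=$(git status --porcelain)   # Git now reports spurious changes

# A mixed reset repopulates the index from HEAD without touching the files.
git reset -q
after=$(git status --porcelain)    # clean again

echo "before: $before"
echo "after: $after"
```

Because git reset only rewrites the index, uncommitted edits in the working tree survive this procedure.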

Handling Large Files

Git is optimized for text files, not large binary files. Here's how to handle large files effectively:

    # Check repository size
    git count-objects -vH

    # Find the 20 largest blobs in history
    git rev-list --objects --all |
      git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
      sed -n 's/^blob //p' |
      sort --numeric-sort --key=2 |
      tail -20

    # Remove a large file from history (DANGER: rewrites every affected commit)
    # Note: git filter-branch is slow, and its own documentation recommends other tools
    git filter-branch --tree-filter 'rm -f large-file.zip' HEAD

    # Better: Use BFG Repo-Cleaner
    bfg --delete-files large-file.zip
    git reflog expire --expire=now --all
    git gc --prune=now --aggressive

Critical: Removing files from history rewrites commits and requires force-pushing. Coordinate with your team before doing this!
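
To check that the size-ranking pipeline behaves as described before pointing it at a real repository, it can be run against a scratch repository with one deliberately large file. In this sketch the file names small.txt and big.bin and the 200000-byte size are arbitrary choices for the demonstration.

```shell
# Scratch repository (directory, file names, and identity are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "small" > small.txt
head -c 200000 /dev/zero > big.bin   # a 200000-byte stand-in for a large asset
git add small.txt big.bin
git commit -q -m "Add files"

# Same pipeline as in the lesson, reduced to the single largest blob.
# Each surviving line has the form: <hash> <size-in-bytes> <path>
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort -n -k2 |
  tail -n 1)

largest_size=$(echo "$largest" | cut -d' ' -f2)
largest_path=$(echo "$largest" | cut -d' ' -f3)
echo "$largest_path is $largest_size bytes"
```

The reported size is the uncompressed blob size, so it can be much larger than the space the object takes up inside a packfile.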

Preventing large file issues with Git LFS:

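    # Set up Git LFS for your user account (the git-lfs package must already be installed)
    git lfs install

    # Track large files (the patterns are written to .gitattributes)
    git lfs track "*.psd"
    git lfs track "*.mp4"

    # Stage .gitattributes so the tracking rules are shared with the team
    git add .gitattributes

    # List the files stored in LFS
    git lfs ls-files

    # View LFS status
    git lfs status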

Repository Maintenance

Regular maintenance keeps your repository healthy and performant:

    # Run garbage collection
    git gc

    # Aggressive garbage collection (slower, more thorough;
    # --prune=now deletes unreachable objects immediately)
    git gc --aggressive --prune=now

    # Optimize repository
    git repack -a -d --depth=250 --window=250

    # Remove unreferenced objects older than 2 weeks
    git prune --expire 2.weeks.ago

    # Expire reflog entries older than 30 days
    git reflog expire --expire=30.days --all

What does garbage collection do?

  ✓ Compresses loose objects into packfiles
  ✓ Removes unreachable objects once they are older than the grace period (two weeks by default)
  ✓ Optimizes packfiles for efficiency
  ✓ Packs references and expires old reflog entries
  ✓ Reduces repository size
  ✓ Improves performance

Automatic GC: Git automatically runs garbage collection periodically. Manual GC is only needed for maintenance or after major history rewrites.
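
The effect of garbage collection on loose objects can be observed with git count-objects. This is a small sketch in a scratch repository; the file name notes.txt and the three-commit history are made up for the demonstration.

```shell
# Scratch repository (directory, file names, and identity are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# Three commits, each of which adds a new commit, tree, and blob object.
for i in 1 2 3; do
  echo "revision $i" > notes.txt
  git add notes.txt
  git commit -q -m "Revision $i"
done

# "count" is the number of loose objects; "in-pack" is the number in packfiles.
loose_before=$(git count-objects -v | sed -n 's/^count: //p')

git gc -q

loose_after=$(git count-objects -v | sed -n 's/^count: //p')
packed_after=$(git count-objects -v | sed -n 's/^in-pack: //p')

echo "loose before gc: $loose_before"
echo "loose after gc:  $loose_after"
echo "packed after gc: $packed_after"
```

On a repository this small the size difference is negligible; the point is only to see the objects move from loose files into a pack.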

Debugging with GIT_TRACE

Enable tracing to see what Git is doing under the hood. This is invaluable for debugging performance or connection issues:

    # Trace basic Git commands
    GIT_TRACE=1 git status

    # Trace performance (timing information)
    GIT_TRACE_PERFORMANCE=1 git log

    # Trace packfile access
    GIT_TRACE_PACK_ACCESS=1 git fetch

    # Trace HTTP(S) traffic handled by curl
    GIT_TRACE_CURL=1 git push

    # Trace setup (repository and working tree discovery)
    GIT_TRACE_SETUP=1 git status

    # Combine multiple traces
    GIT_TRACE=1 GIT_TRACE_PERFORMANCE=1 GIT_CURL_VERBOSE=1 git clone <url>

Example: Debugging slow fetch operations:

    GIT_TRACE_PERFORMANCE=1 GIT_TRACE_PACK_ACCESS=1 git fetch origin

    # The output includes "performance:" lines that report the time spent
    # in each operation, which helps identify bottlenecks
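
Trace output goes to stderr by default, which makes it awkward to save or share. Setting a trace variable to an absolute file path makes Git append the messages to that file instead. The sketch below shows this for GIT_TRACE in a scratch repository; the repository and the log file location are placeholders.

```shell
# Scratch repository and a separate log file (both locations are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q

trace_file=$(mktemp)   # mktemp prints an absolute path, which GIT_TRACE requires

# Trace messages are appended to the file instead of being written to stderr.
GIT_TRACE="$trace_file" git status >/dev/null

cat "$trace_file"
```

The same path convention applies to the other trace variables, so a slow fetch can be captured to a file and inspected afterwards.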

Common Troubleshooting Scenarios

Problem 1: "fatal: bad object HEAD"

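    # Solution: Restore the HEAD reference (assuming the branch is named main)
    echo "ref: refs/heads/main" > .git/HEAD

    # HEAD must contain the symbolic reference "ref: refs/heads/main",
    # not the commit hash stored in .git/refs/heads/main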
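
For reference, a healthy HEAD on a checked-out branch is a one-line symbolic reference, not a commit hash. This sketch shows what the file contains in a scratch repository, damages it on purpose, and repairs it. The branch name main and the file names are assumptions for the example.

```shell
# Scratch repository (directory, file names, and identity are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "content" > a.txt
git add a.txt
git commit -q -m "Initial commit"

# What a healthy HEAD looks like on disk.
original=$(cat .git/HEAD)
echo "$original"

# Simulate damage, then repair by writing the symbolic reference back.
printf 'garbage\n' > .git/HEAD
printf 'ref: refs/heads/main\n' > .git/HEAD

# Git resolves HEAD to the branch again.
restored=$(git symbolic-ref HEAD)
echo "$restored"
```

The repair is done with a plain file write because Git may refuse to recognize the directory as a repository at all while HEAD is unreadable.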

Problem 2: Detached HEAD state

    # You're in detached HEAD and want to keep your commits
    git branch temp-branch
    git checkout main
    git merge temp-branch

    # Or create a branch at the current commit and switch to it
    git checkout -b new-branch
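
The first variant can be tried safely in a scratch repository. In this sketch a commit is made while HEAD is detached, saved with a temporary branch, and merged into main; the file names are placeholders.

```shell
# Scratch repository (directory, file names, and identity are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Detach HEAD and commit; no branch points at this new commit yet.
git checkout -q --detach
echo "experiment" > exp.txt
git add exp.txt
git commit -q -m "Experiment on detached HEAD"
detached_tip=$(git rev-parse HEAD)

# Give the commit a name, then bring it into main.
git branch temp-branch
git checkout -q main
git merge -q temp-branch
main_tip=$(git rev-parse main)

echo "detached commit: $detached_tip"
echo "main now at:     $main_tip"
```

Creating the branch before switching away matters: once HEAD moves, the detached commit is reachable only through the reflog.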

Problem 3: "fatal: refusing to merge unrelated histories"

    # Allow merging unrelated histories
    git pull origin main --allow-unrelated-histories

    # Or fetch first and merge explicitly
    git fetch origin
    git merge origin/main --allow-unrelated-histories
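
The error and its fix can be reproduced with two scratch repositories that share no commits. The sketch below is illustrative only: the directory names first and second and the file names are made up.

```shell
# Two independent scratch repositories (names and identity are illustrative).
work=$(mktemp -d)
cd "$work"
for name in first second; do
  git init -q "$name"
  git -C "$name" symbolic-ref HEAD refs/heads/main
  git -C "$name" config user.name "Example User"
  git -C "$name" config user.email "user@example.com"
  echo "$name" > "$name/$name.txt"
  git -C "$name" add "$name.txt"
  git -C "$name" commit -q -m "Initial commit in $name"
done

cd first
git fetch -q ../second main

# Without the flag, Git refuses because the histories share no common commit.
if git merge -q FETCH_HEAD 2>/dev/null; then
  plain_merge=accepted
else
  plain_merge=refused
fi

# With the flag, the two root commits are joined by a merge commit.
git merge -q --allow-unrelated-histories -m "Merge unrelated history" FETCH_HEAD
commit_count=$(git rev-list --count HEAD)

echo "plain merge: $plain_merge"
echo "commits after merge: $commit_count"
```

The flag is an explicit opt-in because joining unrelated histories is usually a sign of pulling from the wrong remote.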

Problem 4: Index lock file exists

    # Another Git process crashed, leaving a lock file
    rm .git/index.lock

    # Then retry your operation
    git add .

Warning: Only remove index.lock if you're certain no other Git process is running. Removing it during an active operation can cause corruption.

Advanced Recovery Techniques

When standard recovery methods fail, try these advanced techniques:

    # Write dangling objects to .git/lost-found/
    git fsck --lost-found

    # Review the recovered commits
    for commit in .git/lost-found/commit/*; do
      echo "Commit: $(basename "$commit")"
      git show "$(basename "$commit")"
    done

    # Restore a lost commit by merging it into the current branch
    git merge <commit-hash>

    # Or create a branch from it
    git branch recovered <commit-hash>
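
To rehearse this on a repository where nothing is actually lost, a commit with no branch and no reflog entry can be created with the plumbing command git commit-tree. The sketch below does that in a scratch repository and then recovers the commit; the commit message and branch name are placeholders.

```shell
# Scratch repository (directory, file names, and identity are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "content" > a.txt
git add a.txt
git commit -q -m "Initial commit"

# Create a commit object that no branch or reflog points to.
lost=$(git commit-tree 'HEAD^{tree}' -p HEAD -m "Lost work")

# fsck writes an entry for each dangling commit under .git/lost-found/commit/.
git fsck --lost-found >/dev/null 2>&1
ls .git/lost-found/commit/

# Give the commit a branch so it is reachable again.
git branch recovered "$lost"
recovered_subject=$(git log -1 --format=%s recovered)
echo "$recovered_subject"
```

Unlike the reflog approach, this works even when the commit was never checked out, because fsck scans the object store directly.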

Repository Health Checklist

Run these commands periodically to ensure repository health:

    # 1. Check for corruption
    git fsck --full

    # 2. Verify remote connectivity
    git remote -v
    git ls-remote origin

    # 3. Check repository size
    git count-objects -vH

    # 4. Verify configuration
    git config --list

    # 5. Check disk usage
    du -sh .git

    # 6. Run maintenance
    git gc --auto

    # 7. Verify branch tracking
    git branch -vv

Practice Exercise:

Scenario: Practice repository recovery and maintenance

  1. Create a test branch and make some commits
  2. Delete the branch and recover it using reflog
  3. Run git fsck to check repository health
  4. Enable GIT_TRACE to debug a git status command
  5. Run git gc to optimize your repository
  6. Check repository size before and after gc

Summary

In this lesson, you learned:

  • Git plumbing commands for low-level repository inspection
  • Using git fsck to verify repository integrity
  • Recovering deleted branches with git reflog
  • Handling corrupted repositories and recovery techniques
  • Managing large files and using Git LFS
  • Repository maintenance with git gc and optimization
  • Debugging Git operations with GIT_TRACE variables
  • Common troubleshooting scenarios and solutions
Next Up: In the next lesson, we'll explore Git best practices to maintain clean, secure, and efficient repositories!