Git Internals and Troubleshooting
In this lesson, we'll dive deep into Git's internal mechanics, explore plumbing commands, and learn how to troubleshoot and recover from common Git problems. Understanding Git internals will help you debug issues and recover from seemingly catastrophic situations.
Understanding Git Plumbing Commands
Git has two types of commands: porcelain (user-friendly) and plumbing (low-level). Plumbing commands let you inspect and manipulate Git's internal structures directly.
# View the type of a Git object
git cat-file -t <hash>
# View the content of a Git object
git cat-file -p <hash>
# List every object reachable from any ref
git rev-list --objects --all
# Show the content of the index
git ls-files -s
Tip: Plumbing commands are rarely needed for daily work, but they're invaluable for understanding how Git works and debugging complex issues.
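To see these commands on a real object, here is a minimal sketch that writes a blob with git hash-object (another plumbing command) and reads it back. The temporary directory and the sample text are illustrative assumptions, not part of the lesson's repository.

```shell
# Work in a throwaway repository so nothing real is touched
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Write a blob into the object database and capture its hash
hash=$(echo "hello plumbing" | git hash-object -w --stdin)

# Inspect the object with the commands from this section
type=$(git cat-file -t "$hash")      # object type
content=$(git cat-file -p "$hash")   # object content

echo "$hash is a $type containing: $content"
```

The same two cat-file calls work on commits and trees, which makes them a handy way to walk from a commit to its tree to the blobs inside it.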
Git Filesystem Check
The git fsck command verifies the integrity of your Git repository and can help detect corruption:
# Check repository integrity
git fsck
# Check with verbose output
git fsck --full --verbose
# Find unreachable objects (not reachable from any ref)
git fsck --unreachable
# Find dangling objects (unreachable and not referenced by any other object; reported by default)
git fsck --dangling
Common output types:
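dangling commit: A commit that is not reachable from any ref and is not the parent of any other commit
dangling blob: File content that no tree references, such as a file that was staged but never committed
dangling tree: A directory snapshot that no commit or other tree references
unreachable: Any object that cannot be reached from a reference (dangling objects are a subset of these)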
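Important: Dangling objects are normal after rebasing or amending commits. Git's garbage collection removes them automatically once they are older than the prune grace period (two weeks by default) and no reflog entry still points to them.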
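A quick way to see a dangling object is to stage a file and then stage a different version before committing. The sketch below does that in a throwaway repository; the file name, commit messages, and identity are illustrative assumptions.

```shell
# Throwaway repository with a local identity for the demo commits
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# Stage a draft, then overwrite it before committing
echo "draft" > notes.txt
draft_hash=$(git hash-object notes.txt)   # hash the draft blob will have
git add notes.txt
echo "final" > notes.txt
git add notes.txt
git commit -q -m "Update notes"

# The draft blob was written by "git add" but no tree references it
report=$(git fsck 2>/dev/null)
echo "$report"
```

The report should list the draft as a dangling blob, which is harmless and would eventually be pruned by garbage collection.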
Recovering Deleted Branches
Accidentally deleted a branch? Don't panic! Git rarely loses data immediately. Use git reflog to recover:
# View the reflog (history of HEAD movements)
git reflog
# Find the commit where your branch was
git reflog show --all
# Recreate the deleted branch
git branch recovered-branch <commit-hash>
# Or create the branch and switch to it in one step
git checkout -b recovered-branch <commit-hash>
Example recovery scenario:
# You deleted feature-branch by mistake
git branch -D feature-branch
# View reflog to find the branch tip
git reflog
# Output: a1b2c3d HEAD@{2}: commit: Add new feature
# Recover the branch
git branch feature-branch a1b2c3d
# Verify recovery
git log feature-branch
Warning: Reflog entries expire after 90 days by default (30 days for unreachable commits). Don't wait too long to recover deleted branches!
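The recovery scenario above can be rehearsed end to end in a scratch repository. This is a sketch under assumptions: the repository, file name, and identity are made up for the demo, and the lost commit is located by grepping the reflog for its commit message.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Do some work on a feature branch and remember where it ended
git checkout -q -b feature-branch
echo "feature" >> app.txt
git commit -q -am "Add new feature"
tip=$(git rev-parse HEAD)

# Delete the branch "by mistake"
git checkout -q main
git branch -q -D feature-branch

# The HEAD reflog still records the commit; take the hash from the matching entry
lost=$(git reflog | grep 'commit: Add new feature' | head -n 1 | cut -d' ' -f1)
git branch feature-branch "$lost"

restored=$(git rev-parse feature-branch)
echo "feature-branch restored at $restored"
```

In a real repository you would usually read the reflog by eye rather than grep it, since you may not remember the exact commit message.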
Dealing with Corrupted Repositories
Repository corruption is rare but can happen due to disk errors, power failures, or incomplete operations. Here's how to handle it:
# Step 1: Check for corruption
git fsck --full
# Step 2: If corruption found, try recovering
# Clone from remote (safest option)
git clone <remote-url> recovered-repo
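# Step 3: If no remote, salvage objects from a damaged packfile
# Move the pack out of .git/objects/pack first: unpack-objects skips objects the repository already has
mv .git/objects/pack/<pack-name>.pack ../corrupted-pack-file
# -r keeps going past corrupt entries and recovers as many objects as possible
git unpack-objects -r < ../corrupted-pack-file
# Step 4: Rebuild the index (run from the repository root, not from inside .git)
rm .git/index
git reset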
# Step 5: Clean up and re-compress
git gc --aggressive --prune=now
Best Practice: Always maintain remote backups. A corrupted local repository can be replaced by cloning from GitHub or another remote, but commits that were never pushed exist only in the damaged copy, so push regularly.
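Step 4 is safe to try on its own, because the index can always be regenerated from HEAD. The following sketch deletes the index in a scratch repository and rebuilds it; the file names and identity are illustrative assumptions.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "one" > a.txt
echo "two" > b.txt
git add a.txt b.txt
git commit -q -m "Initial commit"

# Simulate a lost or damaged index
rm .git/index
broken=$(git status --porcelain | wc -l)   # every file now looks deleted and untracked

# Rebuild the index from HEAD; the working tree is left untouched
git reset -q
status=$(git status --porcelain)

echo "entries reported before rebuild: $broken"
echo "status after rebuild: '${status}'"
```

Note that git reset rebuilds the index from the last commit, so anything that was staged but not committed has to be staged again.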
Handling Large Files
Git is optimized for text files, not large binary files. Here's how to handle large files effectively:
# Check repository size
git count-objects -vH
# Find large files in history
git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
sed -n 's/^blob //p' |
sort --numeric-sort --key=2 |
tail -20
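# Remove a large file from history (DANGER! Rewrites every commit on the current branch; the Git documentation now recommends git filter-repo over filter-branch)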
git filter-branch --tree-filter 'rm -f large-file.zip' HEAD
# Better: Use BFG Repo-Cleaner
bfg --delete-files large-file.zip
git reflog expire --expire=now --all
git gc --prune=now --aggressive
Critical: Removing files from history rewrites commits and requires force-pushing. Coordinate with your team before doing this!
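To check what the "find large files" pipeline prints, you can run it against a scratch repository with one deliberately large file. This is a sketch: big.bin, small.txt, and the 200000-byte size are made-up test data, and the short sort flags are used for portability.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "small" > small.txt
head -c 200000 /dev/zero > big.bin   # 200000 bytes of zeros
git add small.txt big.bin
git commit -q -m "Add test files"

# Same pipeline as above, keeping only the largest blob
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort -n -k 2 |
  tail -n 1)

echo "$largest"   # columns: object hash, size in bytes, path
```

The size column is the uncompressed object size, so a file that compresses well may take far less space on disk than this number suggests.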
Preventing large file issues with Git LFS:
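# Enable Git LFS (the git-lfs extension must already be installed)
git lfs install
# Track large files (patterns are written to .gitattributes, which must be committed)
git lfs track "*.psd"
git lfs track "*.mp4"
git add .gitattributes
# List files currently stored in LFS
git lfs ls-files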
# View LFS status
git lfs status
Repository Maintenance
Regular maintenance keeps your repository healthy and performant:
# Run garbage collection
git gc
# Aggressive garbage collection (slower, more thorough)
git gc --aggressive --prune=now
# Optimize repository
git repack -a -d --depth=250 --window=250
# Remove unreferenced objects older than 2 weeks
git prune --expire 2.weeks.ago
# Clean up reflog
git reflog expire --expire=30.days --all
What does garbage collection do?
✓ Compresses loose objects into packfiles
✓ Removes unreachable objects older than the prune grace period
✓ Optimizes packfiles for efficiency
✓ Packs refs and expires old reflog entries
✓ Reduces repository size
✓ Improves performance
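Automatic GC: Git runs git gc --auto after operations that can create many loose objects, and it only does real work once a threshold is passed (gc.auto, 6700 loose objects by default). Manual GC is only needed for maintenance or after major history rewrites.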
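The effect of garbage collection is easy to observe with git count-objects. The sketch below makes a few commits in a scratch repository and compares the loose object count before and after git gc; the file name and identity are illustrative assumptions.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Each commit adds a blob, a tree, and a commit object as loose files
for i in 1 2 3; do
  echo "version $i" > file.txt
  git add file.txt
  git commit -q -m "Commit $i"
done

loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs=$(git count-objects -v | awk '/^packs:/ {print $2}')

echo "loose objects: $loose_before before gc, $loose_after after gc"
echo "packfiles after gc: $packs"
```

Because every object here is reachable, gc moves all of them into a packfile; in a repository with recent unreachable objects, some would stay loose until the grace period passes.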
Debugging with GIT_TRACE
Enable tracing to see what Git is doing under the hood. This is invaluable for debugging performance or connection issues:
# Trace basic Git commands
GIT_TRACE=1 git status
# Trace performance (timing information)
GIT_TRACE_PERFORMANCE=1 git log
# Trace packfile operations
GIT_TRACE_PACK_ACCESS=1 git fetch
# Trace HTTP(S) transfers (requests, headers, and data sent through libcurl)
GIT_TRACE_CURL=1 git push
# Trace setup (config reading, repository discovery)
GIT_TRACE_SETUP=1 git status
# Combine multiple traces
GIT_TRACE=1 GIT_TRACE_PERFORMANCE=1 GIT_CURL_VERBOSE=1 git clone <url>
Example: Debugging slow fetch operations:
GIT_TRACE_PERFORMANCE=1 GIT_TRACE_PACK_ACCESS=1 git fetch origin
# Output shows timing for each operation:
# 0.001234 pack-objects.c:2345 performance: ... spent ...
# Helps identify bottlenecks
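Trace output goes to stderr when the variable is set to 1, which can be noisy. Setting a trace variable to an absolute file path makes Git append the trace to that file instead. A minimal sketch, assuming a scratch repository and a hypothetical trace.log file name:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# An absolute path tells Git to append trace messages to that file
trace_file="$repo/trace.log"
GIT_TRACE="$trace_file" git status > /dev/null

# Show the first few trace lines that were captured
head -n 3 "$trace_file"
```

Writing to a file keeps the command's normal output clean and gives you something to attach to a bug report.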
Common Troubleshooting Scenarios
Problem 1: "fatal: bad object HEAD"
# Solution: Restore HEAD as a symbolic reference to your branch
git symbolic-ref HEAD refs/heads/main
# If Git no longer recognizes the repository, write the file directly
echo "ref: refs/heads/main" > .git/HEAD
Problem 2: Detached HEAD state
# You're in detached HEAD and want to keep the commits you made there
git branch temp-branch
git checkout main
git merge temp-branch
# Or create branch directly and switch
git checkout -b new-branch
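The detached HEAD rescue can be practiced safely in a scratch repository. This sketch assumes made-up file and branch names and uses a fast-forward-only merge so the result is predictable.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Detach HEAD and commit; no branch points at this new commit
git checkout -q --detach
echo "experiment" >> app.txt
git commit -q -am "Experiment on detached HEAD"
detached_tip=$(git rev-parse HEAD)

# Give the commit a name, then bring it into main
git branch temp-branch
git checkout -q main
git merge -q --ff-only temp-branch
merged=$(git rev-parse main)

echo "main now points at $merged"
```

Creating the branch before switching away is the important step: once a name points at the commit, it can no longer be lost to garbage collection.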
Problem 3: "fatal: refusing to merge unrelated histories"
# Allow merging unrelated histories
git pull origin main --allow-unrelated-histories
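# During a rebase no flag is needed: git rebase has no --allow-unrelated-histories
# option and replays your commits onto the new base even without a shared ancestor
git rebase origin/main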
Problem 4: Index lock file exists
# Another Git process crashed, leaving lock file
rm .git/index.lock
# Then retry your operation
git add .
Warning: Only remove index.lock if you're certain no other Git process is running. Removing it during an active operation can cause corruption.
Advanced Recovery Techniques
When standard recovery methods fail, try these advanced techniques:
# Write dangling objects to .git/lost-found/ so they can be inspected
git fsck --lost-found
# Review the recovered commits (each file is named after a commit hash):
for commit in .git/lost-found/commit/*; do
  hash=$(basename "$commit")
  echo "Commit: $hash"
  git show "$hash"
done
# Restore a lost commit
git merge <commit-hash>
# Or create a branch from it
git branch recovered <commit-hash>
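This technique matters most when the reflog can no longer help. The sketch below simulates that case in a scratch repository by deleting a branch and expiring the reflog, then recovers the commit through git fsck --lost-found. Branch names, file names, and the identity are illustrative assumptions.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Commit on a side branch, then lose both the branch and the reflog
git checkout -q -b lost-work
echo "important" >> app.txt
git commit -q -am "Important work"
lost=$(git rev-parse HEAD)
git checkout -q main
git branch -q -D lost-work
git reflog expire --expire=now --all

# fsck reports the commit as dangling and records it under .git/lost-found/commit/
git fsck --lost-found > /dev/null 2>&1
ls .git/lost-found/commit/

# Recreate a branch at the recovered commit
git branch recovered "$lost"
recovered=$(git rev-parse recovered)
```

Recovery only works until garbage collection prunes the object, so avoid running git gc --prune=now while you are still searching for lost work.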
Repository Health Checklist
Run these commands periodically to ensure repository health:
# 1. Check for corruption
git fsck --full
# 2. Verify remote connectivity
git remote -v
git ls-remote origin
# 3. Check repository size
git count-objects -vH
# 4. Verify configuration
git config --list
# 5. Check disk usage
du -sh .git
# 6. Run maintenance
git gc --auto
# 7. Verify branch tracking
git branch -vv
Practice Exercise:
Scenario: Practice repository recovery and maintenance
- Create a test branch and make some commits
- Delete the branch and recover it using reflog
- Run git fsck to check repository health
- Enable GIT_TRACE to debug a git status command
- Run git gc to optimize your repository
- Check repository size before and after gc
Summary
In this lesson, you learned:
- Git plumbing commands for low-level repository inspection
- Using git fsck to verify repository integrity
- Recovering deleted branches with git reflog
- Handling corrupted repositories and recovery techniques
- Managing large files and using Git LFS
- Repository maintenance with git gc and optimization
- Debugging Git operations with GIT_TRACE variables
- Common troubleshooting scenarios and solutions
Next Up: In the next lesson, we'll explore Git best practices to maintain clean, secure, and efficient repositories!