

Overview

Git repositories are not well suited to large files, and GitHub limits file sizes within repositories. You get a warning when you push a file larger than 50MB, and a push containing a file larger than 100MB is rejected outright.

But even if these limits didn’t exist, versioning large files would be very impractical.

Why versioning large files poses challenges

A fundamental concept of versioning is that repositories retain every iteration of each file, ensuring a comprehensive historical record. However, this comes with a trade-off. Cloning repositories that include multiple versions of large files can rapidly consume disk space and slow down fetches.
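The storage cost described above can be observed in a throwaway repository. The following is a minimal sketch, assuming git is installed; the file name data.bin, the 1 MiB size, and the placeholder commit identity are illustrative choices, not part of any real project.

```shell
#!/bin/sh
# Sketch: commit two versions of a 1 MiB file and measure how much history Git keeps.
# Assumptions: git is installed; data.bin and the 1 MiB size are arbitrary examples.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

commit_random_version() {
    # Random bytes do not compress, so each version adds about 1 MiB to the history.
    head -c 1048576 /dev/urandom > data.bin
    git add data.bin
    git -c user.name="Example" -c user.email="example@example.com" \
        commit -q -m "$1"
}

commit_random_version "Version 1"
commit_random_version "Version 2"

# The working tree holds one copy of data.bin, but .git stores both versions.
history_kb=$(du -sk .git | cut -f1)
echo "Size of .git after two versions: ${history_kb} KB"
```

Every additional version of the file adds roughly its full size again, and anyone who clones the repository downloads all of it.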

This building block explains a solution to this: Git Large File Storage (LFS)! It is structured into the following sections:

  • Git LFS
    • Installation
    • Explicit tracking
    • Resuming workflow
  • Advanced use cases
  • Storing extremely large files

Git LFS

Before delving into Git LFS, you should ask yourself whether storing large files is necessary in the first place. Often, these large files are generated on the basis of existing data and code and hence can be reconstructed using existing files.

However, sometimes you wish to store raw data with moderate file sizes (between 5MB and 50MB). For this, use Git LFS, an open-source Git extension designed to address the intricacies of handling large files.

In short, Git LFS allows you to version large files while saving disk space and cloning time, using the same Git workflow that you’re used to. It does not keep all your project’s data locally. It only provides the version you need in your checked-out revision.

Tip

How Git LFS works

When you mark a file as an LFS file, the extension replaces the large file in your repository with a small text pointer. The actual files (and all their versions) live on the LFS remote server, and only the versions you have pulled are kept in a local cache. When you check out a revision, the pointer is replaced with the real file, so only the version you requested is stored locally.
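To make the idea of a pointer concrete, the sketch below prints the three lines that a Git LFS pointer contains for a given file: a version line, the SHA-256 of the content, and the size in bytes. The helper name make_pointer and the sample file are hypothetical; the sketch assumes sha256sum (Linux) or shasum (macOS) is available.

```shell
#!/bin/sh
# Sketch: print the text of an LFS pointer for a given file.
# make_pointer is a hypothetical helper, not a Git LFS command.
make_pointer() {
    file=$1
    if command -v sha256sum >/dev/null 2>&1; then
        oid=$(sha256sum "$file" | cut -d ' ' -f 1)
    else
        oid=$(shasum -a 256 "$file" | cut -d ' ' -f 1)
    fi
    # Size of the real content in bytes (tr strips padding that some wc versions add).
    size=$(wc -c < "$file" | tr -d ' ')
    printf 'version https://git-lfs.github.com/spec/v1\n'
    printf 'oid sha256:%s\n' "$oid"
    printf 'size %s\n' "$size"
}

# Throwaway sample file standing in for a large asset.
sample=$(mktemp)
printf 'hello\n' > "$sample"
make_pointer "$sample"
```

With Git LFS installed, git lfs pointer --file=<path> should produce the same kind of output; the hand-written version is only meant to show how little the repository itself has to store.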

Summary

The benefits of Git LFS:

  • Facilitates versioning of large files without bloating your local repository.
  • Speeds up cloning and fetching, because only the file versions you need are downloaded.
  • Maintains the same Git workflow that you are used to.

Tip

Watch this informative video for a brief explanation of how Git LFS works.

Installation

  • Install Git LFS with Homebrew, or download it directly from the Git LFS website:

    brew install git-lfs

  • If you used Homebrew, go to the next step. If you downloaded it directly, open the terminal and change the current working directory to the downloaded and unzipped Git LFS folder. Then, install:

    ./install.sh

  • Once installed, set up LFS for your user account:

    git lfs install

  • If it was successful, you should see the message Git LFS initialized.
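If you are unsure whether the installation worked, a quick check like the following can help. It is a minimal sketch; lfs_status is a hypothetical helper name, and it only tests whether the git-lfs binary can be found on your PATH.

```shell
#!/bin/sh
# Sketch: report whether the git-lfs binary is on PATH.
# lfs_status is a hypothetical helper name.
lfs_status() {
    if command -v git-lfs >/dev/null 2>&1; then
        # "git lfs version" prints the installed release.
        echo "installed: $(git lfs version)"
    else
        echo "missing: install Git LFS, then run 'git lfs install'"
    fi
}

lfs_status
```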

Explicit tracking

Git LFS doesn’t autonomously manage files: you must explicitly tell it which files to track.

  • To track a specific file, use:

    git lfs track "largefile.png"
    

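  • Alternatively, to track multiple files of a specific type:

    git lfs track "*.png"

Warning

Always enclose file names and patterns in quotes! Otherwise, the shell may expand a pattern such as *.png into the matching file names before Git LFS sees it.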
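The effect of the quotes can be seen without Git LFS at all. The sketch below counts how many arguments a command receives for an unquoted and a quoted pattern; the directory and the files a.png and b.png are throwaway examples, and count_args is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: show how the shell treats an unquoted versus a quoted pattern.
# The directory and the files a.png and b.png are throwaway examples.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.png b.png

count_args() {
    # Print how many arguments the shell passed in.
    echo "$#"
}

unquoted=$(count_args *.png)    # the shell expands the pattern to a.png b.png
quoted=$(count_args "*.png")    # the pattern arrives as one literal argument

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
# prints: unquoted: 2 argument(s), quoted: 1 argument(s)
```

Without quotes, git lfs track would receive the individual file names that exist right now instead of the pattern, so PNG files added later would not be covered.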

Resuming workflow

Now you can resume your usual Git workflow. Just make sure to add the .gitattributes file to the repository too, since it stores the tracking rules.

git add .gitattributes

Simply add your file(s), commit, and push as you’d normally do!

git add largefile.png
git commit -m "Add large file"
git push origin master
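Before pushing, you may want to confirm that a file is really covered by an LFS rule. The sketch below asks Git which filter applies to a path. It assumes git is installed and writes the .gitattributes rule by hand so that it runs even without Git LFS; the file name notes.txt is a hypothetical counter-example.

```shell
#!/bin/sh
# Sketch: confirm that a path is covered by an LFS rule in .gitattributes.
# Assumptions: git is installed; the rule is written by hand so the example runs without Git LFS.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# This is the kind of line that `git lfs track "*.png"` adds to .gitattributes.
echo '*.png filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# Ask Git which filter applies to each path (the files do not need to exist).
png_filter=$(git check-attr filter -- largefile.png)
txt_filter=$(git check-attr filter -- notes.txt)
echo "$png_filter"    # largefile.png: filter: lfs
echo "$txt_filter"    # notes.txt: filter: unspecified
```

In your own repository, running git check-attr filter -- <path> on a file gives the same kind of answer based on the rules you have tracked.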

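To clone a repository, use the regular git clone command: with Git LFS installed, it also downloads the LFS files for the checked-out revision (the older git lfs clone command is deprecated). Before working on the repository, run git lfs pull to fetch any LFS files for the current revision that are missing locally:

git clone {url}
git lfs pull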
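After cloning, a file that is unexpectedly tiny may still be an unresolved pointer, for example when the repository was cloned before Git LFS was installed. The sketch below checks this by looking at the first line of a file. The helper is_lfs_pointer, the sample files, and the all-zero oid are hypothetical and only serve the demonstration.

```shell
#!/bin/sh
# Sketch: tell whether a checked-out file is still an LFS pointer rather than real content.
# is_lfs_pointer is a hypothetical helper; it only inspects the first line of the file.
is_lfs_pointer() {
    first_line=$(head -n 1 "$1" 2>/dev/null)
    [ "$first_line" = "version https://git-lfs.github.com/spec/v1" ]
}

# Two throwaway files: a fake pointer (dummy all-zero oid) and ordinary content.
work=$(mktemp -d)
{
    echo 'version https://git-lfs.github.com/spec/v1'
    echo 'oid sha256:0000000000000000000000000000000000000000000000000000000000000000'
    echo 'size 12345'
} > "$work/pointer.bin"
printf 'real file content\n' > "$work/real.bin"

for f in "$work/pointer.bin" "$work/real.bin"; do
    if is_lfs_pointer "$f"; then
        echo "$(basename "$f"): pointer, run 'git lfs pull' to fetch the content"
    else
        echo "$(basename "$f"): actual content"
    fi
done
```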

Advanced use cases

For advanced scenarios, consider external LFS servers and storage options.

  • GitHub provides a Git LFS server that implements the Git LFS API. You can set it up so that your binary files are uploaded to a server that you administer. However, at the time of writing, it is not in a “production ready state” and is recommended for testing only.

  • If you’d like to go serverless and back up these files on external services like Amazon S3, you can use one of the Git LFS implementations, like this one for AWS S3. Bear in mind, however, that these are external open-source implementations that have not been verified by GitHub.

Tip

You can follow this tutorial by Atlassian for more advanced use cases, like moving an LFS repository between hosts or deleting local LFS files.

Storing extremely large files

If you’re hitting the limits of Git LFS, or only want to store one version of a file, object storage (such as AWS S3) may be a better way to handle large files.

Summary

Managing large files within Git repositories can be a challenge due to file size limitations and the storage demands of versioning. Git LFS presents an elegant solution by allowing you to incorporate large files while maintaining a lean local repository. With explicit tracking, effortless integration, and advanced options, Git LFS offers a versatile approach to handling large assets within your Git workflow.

Contributed by Andrea Antonacci