Theoretical Session — Notes

Intro

  • [ ] Git?
  • [ ] GitHub, GitLab?
  • [ ] GitKraken, etc.?

We'll try to cover all of these — make sure you understand the differences between them.

First: Git.


Git

Back in time

It's August 18, 1989. Belgian act Technotronic just dropped the single Pump Up the Jam. (put on the song for a few seconds and ask: "ok, captivated yet?")

People dress funny or are undressed, oil is cheap. Your name is Neo, you work in a software company and you produce code on a "Project". You write code.

How does a project evolve and grow back then?

You work, you write code, and once in a while you save the modifications made to one or several files with Ctrl+S. That saves the state of the project at that specific moment, and every state the project was in before is now lost. You take a snapshot in time of your project.

(Fig. 1) — folder icon marked "start" on the left, dots leading to the same folder marked "end", labeled "modifications" — a line cuts across representing hitting "save" and throwing everything before it in the trash

But — what if?

Instead of saving an endpoint, taking a snapshot, you save the modifications themselves. You don't save the state your project is in at a certain point in time — you save how your project changed over time, from A to B.

(Fig. 1, continued) — same drawing but no line cutting across — hitting "save" now saves the modifications themselves, circled

I won't spend time convincing you this is a far better way of saving your work. Not because you should be smart enough to figure it out yourself (it's not that easy), but because if the whole world is doing it this way, there must be a good reason, right? Still, trying to understand why is great for motivation and deeper understanding, so here's an exercise: what can you do in 2026 with git that you could not do in 1989?

THIS IS GIT. It's that simple.

(Fig. 2)

Before — every save overwrites the previous state, only the final snapshot survives:

●            ●            ■
─────────────────────────────────────▶ time
(save)       (save)       (final)

Git — every save records the change, the full history is preserved:

gitGraph
    commit id: "change 1"
    commit id: "change 2"
    commit id: "change 3"

So Git is about how you save your project over time, locally on your computer — offline, on your machine, no internet needed whatsoever. Saving stuff like that is called version control (or versioning).
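To make "saving the modifications themselves" a bit more concrete, here is a minimal Python sketch. It is a toy model of the idea only, not a description of how Git stores things on disk. The file contents and the function names (record_change, apply_change, rebuild) are made up for illustration.

```python
from difflib import SequenceMatcher

def record_change(old, new):
    """Keep only the parts that differ between two versions (lists of lines)."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

def apply_change(old, change):
    """Replay a recorded change on top of an older version."""
    result = list(old)
    # Work from the end of the file so earlier line numbers stay valid.
    for start, end, new_lines in reversed(change):
        result[start:end] = new_lines
    return result

# Three successive states of one hypothetical file.
v1 = ["import math", "print(math.pi)"]
v2 = ["import math", "radius = 2", "print(math.pi * radius ** 2)"]
v3 = ["import math", "radius = 3", "print(math.pi * radius ** 2)"]

# "Saving" keeps the first state once, then only the changes.
changes = [record_change(v1, v2), record_change(v2, v3)]

def rebuild(version_index):
    """Rebuild any past state by replaying changes from the start."""
    state = v1
    for change in changes[:version_index]:
        state = apply_change(state, change)
    return state

print(rebuild(1))  # the middle state, which a plain Ctrl+S would have lost
```

The point of the sketch: because every change is kept, every intermediate state can be rebuilt, not just the last one.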

  • ──────> is a commit
  • A branch is a commit history — several commits ordered in time

A commit gets applied to a project, to some files. It has a unique ID, useful to go back in time, undo modifications, check how things were 5 years ago, etc.

And the sum of commits builds up the branch:

(Fig. 3)

● ──────> ●                              commit A
● ──────> ● ──────> ●                   commit A + commit B
● ──────> ● ──────> ● ──────> ●         commit A + commit B + commit C = a branch
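
The picture above can be sketched in a few lines of Python: each commit remembers its parent, and a branch is what you get by walking back from the latest commit. This is a simplified illustration. The Commit class and the way the ID is computed here are assumptions for the example; real Git hashes more than this (the files, the author, the date).

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional["Commit"] = None

    @property
    def id(self) -> str:
        # Toy ID: a hash of the parent's ID plus this commit's message.
        parent_id = self.parent.id if self.parent else ""
        text = f"{parent_id}\n{self.message}"
        return hashlib.sha1(text.encode("utf-8")).hexdigest()[:7]

def branch_history(tip: Optional[Commit]) -> list:
    """Walk back from the latest commit and return messages, oldest first."""
    messages = []
    while tip is not None:
        messages.append(tip.message)
        tip = tip.parent
    return messages[::-1]

a = Commit("commit A")
b = Commit("commit B", parent=a)
c = Commit("commit C", parent=b)

print(branch_history(c))  # the whole branch, rebuilt from its last commit
print(c.id)               # a short, unique-looking ID for commit C
```

Because each ID depends on the parent's ID, the ID of the last commit pins down the whole history behind it.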

When to commit? What to commit?

Imagine your project is at point A:

(Fig. 4)

flowchart LR
    A["A (start, year 1)"] -->|"500k files modified over 5 years"| B["B (end, year 6)"]

If those five years of work are saved as one single commit, you can only jump from A to B and from B to A. Everything in between is lost.

So: you should commit more often!

Ok but when?

A commit is a bunch of modifications — as soon as a few modifications make sense together for you as a human, you commit. This is a key concept:

Granularity — the more often you commit, the better.

This is a first guideline.

Think of it like a dot plot: just a few dots and it looks discrete and ugly. The more dots you add, the more continuous it feels — getting close to a real continuous function.

(Fig. 5) — two timelines side by side (a week: Mon → Fri). Left: a few dots, sparse commits. Right: many dots, dense commits. Separated by a ≤ sign. The more commits, the more continuous — the better.


Collaboration

2 people are now working on the same project: Alice and Bob.

(Fig. 6)

gitGraph
    commit id: "C1 (shared)"
    commit id: "C2 (shared)"
    commit id: "C3 (shared)"
    branch bob
    checkout bob
    commit id: "Bob's commit"
    commit id: "Bob's commit "
    checkout main
    commit id: "Alice's commit"
    commit id: "Alice's commit "

Up until ● (now), they have the same git commit history — the branch is the same on Alice's machine and on Bob's.

But from that point on, they make different modifications, so their commit histories diverge: past that point, Alice's branch and Bob's branch are no longer the same.

We represent this like:

(Fig. 7)

gitGraph
    commit
    commit
    commit
    branch alice
    checkout alice
    commit id: "Alice's commit"
    checkout main
    branch bob
    commit id: "Bob's commit"

People can now work on the same project, even on parts of the code that depend on each other, while working independently on different branches, that is, on different commit histories.
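
Here is a small Python sketch of what "diverging from a shared point" means. It assumes a toy repository where each commit simply points to its parent; the commit names (C1, A1, B1, ...) are hypothetical. Git itself offers a command for this kind of question (git merge-base); the code below only illustrates the idea.

```python
# Each commit points to its parent; None marks the very first commit.
parents = {
    "C1": None, "C2": "C1", "C3": "C2",  # shared history
    "A1": "C3", "A2": "A1",              # Alice's commits
    "B1": "C3",                          # Bob's commit
}

def history(tip):
    """Walk back from a branch tip to the first commit (newest first)."""
    commits = []
    while tip is not None:
        commits.append(tip)
        tip = parents[tip]
    return commits

def divergence_point(tip_a, tip_b):
    """Most recent commit that both branches have in common."""
    seen = set(history(tip_a))
    for commit in history(tip_b):
        if commit in seen:
            return commit
    return None

print(divergence_point("A2", "B1"))  # the last commit Alice and Bob share
```

Everything before the divergence point is identical on both machines; everything after it belongs to one branch only.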

(you can start to see the logo used in every YouTube tutorial on git taking shape, right?)

Multiple branches together form a git repository:

(Fig. 8)

gitGraph
    commit
    commit
    commit
    commit
    commit
    branch feature-a
    checkout feature-a
    commit
    checkout main
    commit
    branch feature-b
    commit


Merging

Now a new question arises: cool, collaboration — but how do we mix Alice's work with Bob's at some point? We probably want people's work on the "main" branch and make their modifications "official".

At some point, you're going to want to bring the modifications from divergent branches back to the "main" one — from Alice's, from Bob's.

You're going to want to merge branches: merge one branch into another.

Two major situations:

  • Alice and Bob touched totally different things: different files, or different parts of the same files. No problem. A commit is a bunch of modifications applied to files, so when you merge Alice's branch into main, and then Bob's into main, no commits will conflict. Git can take the main branch from the divergence point to the end of both branches without conflicts.

  • Alice and Bob both modified the same things — same functions, for example. It happens. And there's still no problem — relax, it's fine, you're fine. But you have a conflict. Git is telling you: "I tried to apply two different commits but couldn't, because I don't know what the end state should be — do I apply Alice's? Bob's? One then the other? The other way around? Something completely different?"

Don't worry if you don't fully get this yet. It's something you'll understand much better in the practical session, or once it's happened to you a few times. But here's the key: you're going to have to edit the files manually to decide what the final state should be. Keep Alice's, discard Bob's? The other way around? Apply Alice's then Bob's if possible? A clever mix of the two?

Note: this doesn't only happen between two developers. You, all alone, can have two branches, want to merge one into the other, and end up with conflicts to sort out between you and yourself, and no therapist can help you there. But you're gonna be fine, I promise.
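
The two situations above can be sketched in Python. This is a deliberately simplified model: it compares whole files, whereas Git can also merge changes made to different parts of the same file. The merge function, the file names and the contents are all hypothetical.

```python
def merge(base, ours, theirs):
    """Three-way merge of {filename: content} states.

    base is the state at the divergence point, ours and theirs are the
    two branch ends. Returns (merged_files, conflicting_filenames).
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:        # both sides agree (or nobody touched the file)
            result = o
        elif o == b:      # only "theirs" changed it
            result = t
        elif t == b:      # only "ours" changed it
            result = o
        else:             # both changed it, differently: a human must decide
            conflicts.append(name)
            continue
        if result is not None:
            merged[name] = result
    return merged, conflicts

base = {"plot.py": "v0", "stats.py": "v0"}

# Situation 1: Alice and Bob touched different files.
alice = {"plot.py": "alice's version", "stats.py": "v0"}
bob = {"plot.py": "v0", "stats.py": "bob's version"}
print(merge(base, alice, bob))

# Situation 2: both modified the same file.
bob_too = {"plot.py": "bob's version", "stats.py": "v0"}
print(merge(base, alice, bob_too))
```

In the second call, plot.py comes back in the conflict list: the code has no way to know which final state is wanted, which is exactly the decision left to you.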


Branch hierarchy — a guideline

I recommend:

  • main: clean, tested, validated
  • dev-yourname: branched out from main, your personal workspace
  • dev-yourname-feature: one branch per feature, branched out from dev-yourname

(Fig. 9)

gitGraph
    commit id: "C1"
    commit id: "C2"
    branch dev-yourname
    checkout dev-yourname
    commit id: "D1"
    commit id: "D2"
    branch dev-yourname-feature
    checkout dev-yourname-feature
    commit id: "F1"
    commit id: "F2"

So — this is great, people can collaborate! But we can do even better.


GitHub / GitLab

Now, how do we share a git project? I'll give you a hint: not Snapchat, not Instagram, sorry.

Yes, it's 2026 — via internet/cloud services. This is GitHub, GitLab, and others. You can see those as cloud service providers, and even as a code-sharing social network — but it's waaaayy more than that when you get to really advanced features: CI/CD, testing and deploying code, etc.

So you have your git project. It's local, offline:

(Fig. 10)

flowchart TB
    subgraph local["💻 Your machine (offline)"]
        LG[("git repo")]
    end
    subgraph cloud["☁️ GitLab (online)"]
        RG[("remote repo")]
    end
    subgraph others["👥 Others"]
        C1["coworker"]
        C2["anyone"]
    end
    LG <-->|"git push / git pull"| RG
    RG --> C1
    RG --> C2

Now: if you ever do sudo rm -rf /* (but don't — really, don't, ever. I did NOT tell you to do that. Seriously kids, don't do it at home...), you lose everything locally. BUT things are saved on the cloud — cool, you actually didn't lose anything (RIP Ana's code...). Safe, we happy :)

So you copy, but it's not a dumb/simple copy — it's way smarter than that.

Now if you make modifications locally on a branch, the cloud copy's branch — the remote branch — diverges from your local one. So you need to explicitly tell git that it should update the remote branch.

For that, you use a remote — a connection between your local/offline git project and your remote one. You apply the commits that were applied to your local branch to your remote branch. We call this pushing your commits via a remote to your remote branch.

Same here — very difficult to understand clearly just from that sentence. It'll be much easier in practice, and the practice will help you make sense of it.
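
In the meantime, here is a toy Python model of what pushing does, assuming a branch is just a list of commit IDs, oldest first. The push function and the commit names are hypothetical. The real mechanics are more involved, but the core idea is the same: the remote branch receives the commits it is missing.

```python
def push(local_branch, remote_branch):
    """Send to the remote branch the commits it does not have yet.

    Both branches are lists of commit IDs, oldest first. The push is only
    accepted if the remote history is the beginning of the local one.
    """
    shared = len(remote_branch)
    if local_branch[:shared] != remote_branch:
        raise ValueError("remote has commits you don't have locally: pull first")
    new_commits = local_branch[shared:]
    remote_branch.extend(new_commits)
    return new_commits

local = ["C1", "C2", "C3", "C4"]   # you committed C3 and C4 offline
remote = ["C1", "C2"]              # the cloud copy is still behind

print(push(local, remote))  # the commits that were sent
print(remote == local)      # both branches now hold the same history
```

If someone else had pushed in the meantime, the remote list would no longer be the start of yours, and the sketch refuses the push, which mirrors why Git sometimes asks you to pull before pushing.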

Ok, now: it's actually more than just a backup on the cloud. It's a powerful, smart copy of your local git project, and on top of that you can share your remote GitLab project with other people who have GitLab accounts. So now:

(see Fig. 10 above)

And that's essentially what GitHub and GitLab are: a place to host your git project online, share it, and collaborate with others.


A note on naming conventions

None of this has fixed names — but the community has settled on conventions that everyone follows, so you'll see the same names everywhere:

Branches:

  • main: the principal branch. Used to be called master (you'll still see that in older repos and tutorials).
  • dev-yourname, feature/my-feature, fix/some-bug: there are different styles, but the idea is always the same, one branch per person or per feature, named clearly.

Remotes:

  • origin: by convention, the remote pointing to your copy of the project (your fork, or the repo you cloned).
  • upstream: by convention, the remote pointing to the original repo, the one you forked from.

These are just names. You could call them anything. But everyone uses origin and upstream, so you should too — it makes things instantly readable to anyone who looks at your repo.


What goes into git — .gitignore and .git/info/exclude

Here's the actual rule — not just a rule of thumb:

Git tracking is for human-readable content.

Everything you put in git should be something a human can open, read, and understand. A quick overview:

Track            | Don't track
-----------------|------------------------------------------------------------------------
Source code      | Compiled binaries, executables
Scripts          | Generated files: plots, PDFs, reports
Config files     | Installed dependencies and caches (node_modules/, venv/, __pycache__/)
Markdown, text   | Images
                 | Secrets, credentials, API keys: never, ever

Why does the human-readable rule matter? Because git tracks modifications.

A diff on a Python script shows you exactly what changed, line by line — you can read it, review it, understand it. A diff on a PNG or a PDF is binary gibberish. Git can store it, but it can't make sense of it and you can't review it.

And every new version of that binary adds its full weight to the repo — git can't delta-compress it the way it can text. Your repo bloats, cloning gets slow, your whole history becomes heavy to carry around. It's bad for you, bad for your collaborators, and bad for the tool.
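
To see what a readable, line-by-line diff looks like, here is a short Python sketch using the standard difflib module. The two versions of the script and the file labels are made up for the example.

```python
import difflib

# Two versions of a hypothetical analysis script, as lists of lines.
before = [
    "import numpy as np",
    "x = np.linspace(0, 1, 10)",
    "print(x.mean())",
]
after = [
    "import numpy as np",
    "x = np.linspace(0, 1, 100)",
    "print(x.mean())",
]

diff = list(difflib.unified_diff(
    before, after,
    fromfile="analysis.py (before)",
    tofile="analysis.py (after)",
    lineterm="",
))
print("\n".join(diff))
```

The output marks the old line with a minus sign and the new line with a plus sign, and leaves the untouched lines as context. That is the kind of change a human can review; there is no equivalent for a PNG.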

But there's a deeper reason, especially for scientists:

You might think: "I want to keep my plots — if I store them in the repo I won't have to regenerate them later." That's a natural instinct.

But think about what happens in year 3 of your PhD, when you're writing your manuscript. You will want to regenerate those plots — different size, different dimensions, different resolution, different style, a different subset of data. Something will always change.

And the plot stored in git becomes dead weight: you're not using it, you can't modify it, and you've been dragging it around in the repo for years while it slows down every clone, every pull, every operation on the repo.

What you actually want to track, store, and keep is the code and the data that generate the plot — not the plot itself. That's the living, modifiable, reviewable thing. The plot is just an output.

So: no images, no plots, no PDFs, no executables. Keep git light and human-readable.

How to tell git what to ignore:

Create a .gitignore file at the root of your repo:

*.pdf
*.png
*.exe
build/
__pycache__/
.env

This file is committed and pushed — it's shared with everyone on the project, so nobody accidentally commits the wrong things.
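
To get a feel for how those patterns behave, here is a rough Python sketch that mimics them. It is an approximation for illustration only: real .gitignore matching has more rules (negation with !, anchored paths, ** and so on), and the is_ignored function and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The patterns from the .gitignore example above.
PATTERNS = ["*.pdf", "*.png", "*.exe", "build/", "__pycache__/", ".env"]

def is_ignored(path, patterns=PATTERNS):
    """Roughly decide whether a file path would be ignored."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore anything inside a folder with that name.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatchcase(parts[-1], pattern):
            # File pattern: compare against the file name only.
            return True
    return False

for path in ["src/analysis.py", "figures/fig1.png", "build/output/app", ".env"]:
    print(path, "ignored" if is_ignored(path) else "tracked")
```

Running a few of your own project's paths through something like this is a quick way to check that the list covers what your project actually produces.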

There's also .git/info/exclude — same syntax, but purely local. Never committed, never shared. Useful for personal things you want to ignore without imposing them on the team: editor temp files, a local notes file, etc. In this very repo, CLAUDE.md is excluded that way — it's my local context file and doesn't belong in the shared history.

Templates exist online — gitignore.io and GitHub's collection are good starting points for common languages and frameworks. But writing one yourself is a worthwhile exercise: it forces you to think through what your project actually produces.


SmartGit / GitKraken / Git extensions in VSCode: GitLens, Git Graph

There are a bunch of modern tools you can use to have a better understanding, reading, and visualization of git/GitLab-related stuff. Just a few things about some of them:

  • SmartGit: paid, starting at around $59/year per user for a commercial license (perpetual licenses also available). With that price tag, you can tell it's aimed at people doing very complex, powerful things with git who need a clear view of it. That's what SmartGit sells.

  • GitKraken — very powerful, feature-rich Git GUI. More accessible than SmartGit but still aimed at advanced users. Freemium model — the free tier covers basic use, paid tiers unlock more power and team features.

  • VSCode extensions:

      • Git Graph: recommended for beginners. Simple, free, does one thing: gives you a clean visual view of your branch history and state. Nothing more, nothing less. Perfect to start with.
      • GitLens: more advanced. Covers everything from everyday use to powerful features, all directly in your editor. Has a solid free tier, but aimed at users who want to go further.