Cladogram Builder

Build a cladogram from DNA sequence data. This tool walks through the same workflow professional biologists use: fetch sequences from the NCBI GenBank database, align them, calculate genetic differences, and construct an evolutionary tree. All you need is a web browser; there is nothing to install.

Step 1

Choose species and a gene region

To build a cladogram we need the same gene from each species so we can compare differences. The default example uses five Eucalyptus species and the internal transcribed spacer (ITS) region, a non-coding stretch of DNA that sits between the ribosomal RNA genes and is cut out when the ribosomal RNA is processed (a little like an intron). Because it evolves quickly, ITS is useful for distinguishing closely related plant species.

Gene / region search term

Other options to try: rbcL (plants), matK (plants), cytochrome b (animals), 16S ribosomal RNA (bacteria), COI (animals).

Species to compare

The scientific name is used to search NCBI. The display name is what appears on your tree.

How the search works: For each species we query NCBI GenBank using "Species name"[Title] AND your gene term. If multiple matching records exist, the first one is used.
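The tool's request code isn't shown, so here is a minimal sketch of how such a search could be built, assuming it goes through NCBI's public E-utilities ESearch endpoint with `db=nucleotide`. The helper names `build_search_term` and `build_esearch_url`, and the example species, are hypothetical; `retmax=1` mirrors "the first one is used".

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_term(species: str, gene: str) -> str:
    """Combine a species name and a gene term as described above (hypothetical helper)."""
    return f'"{species}"[Title] AND {gene}'


def build_esearch_url(species: str, gene: str) -> str:
    """Return an ESearch URL asking for the ID of the first matching record."""
    params = {
        "db": "nucleotide",  # search the nucleotide (GenBank) database
        "term": build_search_term(species, gene),
        "retmax": 1,         # keep only the first matching record
        "retmode": "json",
    }
    # urlencode percent-encodes the quotes and brackets and turns spaces into '+'.
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical example species and gene term.
print(build_search_term("Eucalyptus globulus", "ITS"))  # → "Eucalyptus globulus"[Title] AND ITS
print(build_esearch_url("Eucalyptus globulus", "ITS"))
```

ESearch only returns record IDs (in the JSON response under `esearchresult.idlist`); a follow-up EFetch request with `rettype=fasta` is the usual way to retrieve the sequence itself.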

Step 2

Fetched DNA sequences (FASTA)

These are the actual gene sequences pulled live from the NCBI GenBank database. Each entry starts with a header line beginning with > that contains the accession number (the record's unique ID in the database), followed by a long string of A, T, G, and C, the four DNA bases. Notice how similar, but not identical, they look.
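A minimal sketch of how FASTA text like this can be split into separate records, assuming the accession is the first word after `>` on each header line. `parse_fasta` is a hypothetical helper, and the toy records below are made up, not real GenBank entries.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {accession: sequence} (hypothetical helper).

    Assumes the accession is the first whitespace-separated word after '>'.
    Sequence lines are joined and upper-cased.
    """
    records: dict[str, str] = {}
    accession = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header: store the previous record first.
            if accession is not None:
                records[accession] = "".join(chunks).upper()
            header = line[1:].split()
            accession = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)  # sequence data can span several lines
    if accession is not None:
        records[accession] = "".join(chunks).upper()
    return records


toy = ">SEQ1 toy record\nACGT\nTTGA\n>SEQ2 another toy record\nacgattga\n"
print(parse_fasta(toy))
```

Real GenBank headers usually carry a versioned accession (for example `accession.version`) followed by a description; this sketch keeps the whole first word, version suffix included, as the key.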

Step 3

Multiple sequence alignment

The sequences are now lined up base-by-base. Gaps (-) are inserted where a base appears in some species but not others — this happens because of insertions and deletions (indels) during evolution. An * below a column means every species has the same base there (a conserved position). Columns without an asterisk show variation between species — that's the raw data we'll use to build the tree.
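The asterisk row can be reproduced in a few lines. This minimal sketch, using a hypothetical `conservation_line` helper, stars a column only when every sequence has the same base and none has a gap; it assumes upper-case sequences and only reproduces the `*` marker described above.

```python
def conservation_line(aligned: list[str]) -> str:
    """Return '*' under each fully conserved column and ' ' elsewhere.

    Columns that contain a gap ('-') are never starred.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):  # walk the alignment column by column
        bases = set(column)
        if len(bases) == 1 and "-" not in bases:
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


toy_alignment = ["ACGT-A", "ACCTGA", "ACGTGA"]
for row in toy_alignment:
    print(row)
print(conservation_line(toy_alignment))
```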

What's happening: The alignment is performed by Clustal Omega at the EMBL-EBI in the UK — the same professional tool that researchers use for published phylogenetic studies. Your sequences are sent to their server, aligned using progressive alignment with a guide tree, and the result is sent back.

A = adenine, T = thymine, G = guanine, C = cytosine, - = gap (insertion/deletion), * = conserved column
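Clustal Omega's own method (a fast guide tree plus profile hidden Markov model alignment) is too involved for a short example. As a simpler stand-in, this sketch shows how gaps get placed when just two sequences are aligned, using the classic Needleman-Wunsch dynamic-programming algorithm; progressive aligners repeat a pairwise step of this kind, roughly speaking, in the order set by the guide tree. The function name and the scores (match +1, mismatch -1, gap -2) are arbitrary choices for illustration, not Clustal Omega's parameters.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str]:
    """Globally align two sequences; return them with '-' gaps inserted."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # a[i-1] against a gap
                score[i][j - 1] + gap,             # b[j-1] against a gap
            )

    # Trace back from the bottom-right corner to rebuild one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGTA", "ACTA"))
```

With these scores, one gap plus four matches beats forcing mismatches, so the shorter sequence receives a single `-` where the deletion most likely happened.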

Step 4

Genetic difference matrix

Now we count the differences between every pair of aligned sequences. This is the heart of the molecular clock idea: more differences mean more time since the species shared a common ancestor (assuming mutations accumulate at a roughly steady rate). The smallest numbers indicate the most closely related pairs of species.

Note: only positions where both sequences have an actual base (not a gap) are compared. This way, a short or partial sequence doesn't get unfairly penalised for missing data — only real base differences count.
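A minimal sketch of the counting rule in the note, assuming gaps are written as `-` and all sequences come from the same alignment. `count_differences` and `difference_matrix` are hypothetical names; ambiguity codes such as `N` are treated as ordinary bases here, which the tool itself may handle differently.

```python
from itertools import combinations


def count_differences(seq1: str, seq2: str) -> tuple[int, int]:
    """Count mismatches between two aligned sequences, skipping gap positions.

    Returns (differences, positions_compared).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must come from the same alignment")
    differences = compared = 0
    for base1, base2 in zip(seq1, seq2):
        if base1 == "-" or base2 == "-":
            continue  # a gap in either sequence: this position is not compared
        compared += 1
        if base1 != base2:
            differences += 1
    return differences, compared


def difference_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], int]:
    """Difference count for every unordered pair of species."""
    matrix = {}
    for name1, name2 in combinations(aligned, 2):
        diff, _ = count_differences(aligned[name1], aligned[name2])
        matrix[(name1, name2)] = diff
    return matrix


# Toy alignment with hypothetical species names.
toy = {"sp1": "ACGT-A", "sp2": "ACCTGA", "sp3": "ACGTGA"}
matrix = difference_matrix(toy)
closest = min(matrix, key=matrix.get)  # pair with the fewest differences
print(matrix)
print(closest)
```

The gap between `sp1` and the others is skipped rather than counted, so `sp1` and `sp3` come out with zero differences even though their sequences are not identical strings.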

Your turn: Looking at the difference matrix, which two species do you think are the most closely related? (They should appear as sister taxa on the cladogram.)


Step 5

Evolutionary tree

The tree is built by Clustal Omega using the neighbour-joining method. You can toggle between two views: cladogram view (all branches the same length, showing only the order of divergence) and phylogram view (branch lengths proportional to genetic distance, showing how much evolution has happened).
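The tree itself is computed on the Clustal Omega server, but the textbook neighbour-joining algorithm is short enough to sketch. This version, with a hypothetical `neighbour_joining` function, repeatedly joins the pair that minimises the Q criterion, then updates distances to the new internal node. Neighbour-joining produces an unrooted tree, so the sketch finishes with three branches meeting at one node; where a drawing places the root is a separate choice.

```python
def neighbour_joining(names: list[str], dist: list[list[float]]) -> str:
    """Build an unrooted neighbour-joining tree; return it as a Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)             # labels grow into Newick subtrees as taxa merge
    d = [row[:] for row in dist]    # work on a copy of the distance matrix
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Find the pair (i, j) with the smallest Q = (n-2)*d(i,j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node.
        li = d[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        new_node = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        # Distances from the new parent to every remaining node.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [new_node]
    # Three nodes left: join them at one central node.
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({nodes[0]}:{la:g},{nodes[1]}:{lb:g},{nodes[2]}:{lc:g});"


# A small toy distance matrix for five taxa.
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbour_joining(names, dist))
```

Ties in Q are broken by taking the first pair found, and real implementations also have to decide what to do with the occasional negative branch length; this sketch leaves both as they fall.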

Reading the tree: Each tip (the end of a terminal branch) is one species. Each node (where branches join) represents a hypothetical common ancestor. The root is the most recent common ancestor of everything in the tree. Two species whose shared node is close to the tips are more closely related than two species that only join further down, nearer the root.
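To make "sister taxa" concrete, here is a minimal sketch that stores a rooted tree as nested Python tuples (a hypothetical representation: each tuple is a node, each string a tip) and lists every pair of tips that join directly at the same node. The species names are placeholders.

```python
from typing import Union

# Hypothetical tree representation: a tip is a string, a node is a tuple of subtrees.
Tree = Union[str, tuple]


def sister_pairs(tree: Tree) -> list[tuple[str, str]]:
    """Return every pair of tips that are each other's closest relatives."""
    if isinstance(tree, str):
        return []  # a single tip has no sister within itself
    pairs = []
    tips = [child for child in tree if isinstance(child, str)]
    if len(tree) == 2 and len(tips) == 2:
        pairs.append((tips[0], tips[1]))  # two tips joined at the same node
    for child in tree:
        pairs.extend(sister_pairs(child))
    return pairs


toy_tree = (("sp1", "sp2"), ("sp3", ("sp4", "sp5")))
print(sister_pairs(toy_tree))
```

Here `sp3` has no sister tip: its closest relative is the whole `(sp4, sp5)` group, which shares one common ancestor with it.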

Cladogram vs phylogram — what's the difference?

A cladogram shows only the branching order of evolution — branch lengths carry no information. A phylogram additionally makes branch lengths proportional to genetic change (or time), so longer branches mean more evolution has happened along that lineage.
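A minimal sketch contrasting the two views, assuming a hypothetical `Node` class in which each node stores the length of the branch leading to it. With lengths (phylogram) the root-to-tip distance adds up branch lengths; without them (cladogram) every branch counts as one step, so only the branching order matters. The branch lengths below are made-up difference counts.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    length: float = 0.0  # length of the branch leading to this node
    children: list["Node"] = field(default_factory=list)


def tip_depths(node: Node, use_lengths: bool, depth: float = 0.0) -> dict[str, float]:
    """Distance from the root to each tip.

    use_lengths=True sums branch lengths (phylogram view);
    use_lengths=False counts every branch as 1 (cladogram view).
    """
    if not node.children:
        return {node.name: depth}
    result = {}
    for child in node.children:
        step = child.length if use_lengths else 1.0
        result.update(tip_depths(child, use_lengths, depth + step))
    return result


# Toy tree: (sp1, sp2) are sisters, sp3 branches off at the root.
root = Node(children=[
    Node(length=2, children=[Node("sp1", 1), Node("sp2", 5)]),
    Node("sp3", 4),
])
print(tip_depths(root, use_lengths=True))   # phylogram: sp2's lineage changed most
print(tip_depths(root, use_lengths=False))  # cladogram: only branching order
```

Both views agree that `sp1` and `sp2` are sisters; only the phylogram reveals that far more change accumulated along the branch leading to `sp2`.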