v1.5.0

change log

Models

One of the major updates in v1.5.0 is the integration of AlphaFold v2.3.1 into ColabFold. This introduces a new fine-tuned model from DeepMind for multimer modeling, which we enable by default.

--model-type=auto specifies which model to use.
- If auto, alphafold2_ptm is selected for monomer inputs, and alphafold2_multimer_v3 is selected for complex (multimer) inputs.
- Bonus: all models can be used for either monomer or multimer prediction.
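
As a rough illustration of the "auto" rule described above, the selection can be sketched in a few lines of Python. The function name `resolve_model_type` and the `is_complex` argument are hypothetical and are not part of the ColabFold API; only the two model names come from these notes.

```python
def resolve_model_type(model_type: str, is_complex: bool) -> str:
    """Resolve "auto" to a concrete model name; pass anything else through.

    Hypothetical helper for illustration only, not ColabFold's own code.
    """
    if model_type == "auto":
        # Complex (multimer) inputs get the multimer model, monomers get ptm.
        return "alphafold2_multimer_v3" if is_complex else "alphafold2_ptm"
    # Any explicitly chosen model is used as given, for monomers or multimers.
    return model_type


if __name__ == "__main__":
    print(resolve_model_type("auto", is_complex=True))
    print(resolve_model_type("auto", is_complex=False))
```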

bfloat16

bfloat16 is now enabled by default for both monomer and multimer models. For GPUs that have bfloat16 support, this should significantly reduce the VRAM used and make the computation at least 2X faster. Besides bfloat16, the other change is fused triangle attention. Together, these changes should allow inference on much larger proteins. (Note: due to slight numerical differences in computation, this may change the results slightly for low-confidence models.)

Recycles

For multimer modeling, the AF2Complex authors have shown that increasing the number of recycles can help dramatically. For multimers, the maximum number of recycles was therefore increased from 3 to 20!

--num-recycle= specifies the number of recycles to run.
--recycle-early-stop-tolerance= specifies when to stop.
- The tolerance is defined as the RMSD (difference in distance matrices, in angstroms) between consecutive recycles. If it drops below the specified value, recycling terminates.
- If not specified, --num-recycle=20 and --recycle-early-stop-tolerance=0.5 are used for alphafold2_multimer_v3, and --num-recycle=3 and --recycle-early-stop-tolerance=0.0 are used for alphafold2_ptm.
--save-recycles saves the models generated at every recycle.
- --save-all does the same, but also saves all the intermediate outputs between recycles as a pickle file.
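
The early-stop rule above can be sketched as follows. This is a minimal pure-Python illustration, not ColabFold's implementation: it assumes one representative atom per residue given as (x, y, z) tuples, and `step` is a hypothetical stand-in for one recycle of the model.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances for a list of (x, y, z) points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_matrix_rmsd(prev_coords, curr_coords):
    """Root-mean-square difference between two distance matrices (angstroms)."""
    prev_d = distance_matrix(prev_coords)
    curr_d = distance_matrix(curr_coords)
    n = len(prev_coords)
    sq_sum = sum(
        (prev_d[i][j] - curr_d[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return math.sqrt(sq_sum / (n * n))


def run_recycles(step, coords, num_recycle=20, tolerance=0.5):
    """Apply `step` up to num_recycle times, stopping early once the change
    between consecutive recycles drops below `tolerance`.

    Returns (final_coords, recycles_run). `step` is a hypothetical callable.
    """
    for recycle in range(1, num_recycle + 1):
        new_coords = step(coords)
        change = distance_matrix_rmsd(coords, new_coords)
        coords = new_coords
        if change < tolerance:
            return coords, recycle
    return coords, num_recycle
```

With a tolerance of 0.0 the comparison `change < tolerance` can never be true, so in this sketch all requested recycles are always run, which matches the alphafold2_ptm defaults listed above.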

Sampling

Although the ability to subsample MSAs and enable dropout has been available in the advanced notebook since day one, recent community efforts have shown these options to be useful, so we now support them in the main notebook. See: AFsample, Alamo et al. and Wayment-Steele et al.

--random-seed= Specify random seed.
--num-seeds= Number of seeds to try.
- Will iterate from range(random_seed, random_seed+num_seeds)
--use-dropout Activate dropouts during inference to sample from the uncertainty of the models.
--max-seq Number of sequence clusters to use.
--max-extra-seq Number of extra sequences to use.
- These two options were previously set by --max-msa="max-seq:max-extra-seq", but are now split up to be more user-friendly.
- Reducing either option will make the model less certain about its prediction and, when combined with random seeds, may allow sampling of alternative conformations.
- --disable-cluster-profile: for multimers, we find that reducing the number of sequence clusters (max-seq) results in poor model quality due to more diverse profiles. Disabling cluster profiles appears to fix this issue! We suggest using this flag in combination with --max-seq when introducing uncertainty in multimer sampling.
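
Two of the sampling options above are easy to illustrate in Python: which seeds are tried, and how an old-style --max-msa value maps onto the two new options. Both helper names below are hypothetical and exist only for this sketch; they are not ColabFold functions.

```python
def seeds_to_run(random_seed: int, num_seeds: int) -> list:
    """Seeds tried for a job: range(random_seed, random_seed + num_seeds)."""
    return list(range(random_seed, random_seed + num_seeds))


def split_max_msa(max_msa: str) -> tuple:
    """Split an old-style "max-seq:max-extra-seq" string into two integers.

    Hypothetical migration helper; the new flags take each value separately.
    """
    max_seq, max_extra_seq = (int(part) for part in max_msa.split(":"))
    return max_seq, max_extra_seq


if __name__ == "__main__":
    print(seeds_to_run(random_seed=5, num_seeds=3))
    max_seq, max_extra_seq = split_max_msa("32:64")
    print(f"--max-seq {max_seq} --max-extra-seq {max_extra_seq}")
```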

Other

--num-relax= Specify the number of top models to relax.
--amber By default, this flag triggers relaxation of ALL models.
--recompile-padding= Now accepts an integer that specifies how much to pad each input by, instead of a factor. Padding is now only used if more than one input is provided for "batch" computation.
--stop-at-score=[0,100] As soon as one of the recycles or models or random seeds reaches the specified score, the job will terminate.
- The metric used can be specified by the --rank=[auto,plddt,multimer,ptm,iptm] flag. For "auto", "multimer" is used for complexes and "plddt" is used for monomers. "multimer" metric is computed as 80*iptm + 20*ptm. Note, all metrics are now on a scale of 0 to 100.
--save-all outputs a pickled file at each recycle, saving all the results as a dictionary of numpy arrays. This includes the single and pair representations. (If you only want to save the single or pair representations, you can use the old flags --save-single-representations and/or --save-pair-representations.)
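
The scoring behind --rank and --stop-at-score described above can be sketched as below. This is an illustration under stated assumptions, not ColabFold's code: the function names are hypothetical, pLDDT is assumed to be supplied on a 0 to 100 scale, and pTM and ipTM are assumed to be supplied as raw values between 0 and 1 that are multiplied by 100 when used on their own.

```python
def resolve_rank(rank: str, is_complex: bool) -> str:
    """Resolve "auto" to "multimer" for complexes and "plddt" for monomers."""
    if rank == "auto":
        return "multimer" if is_complex else "plddt"
    return rank


def rank_score(metric: str, plddt: float, ptm: float, iptm: float = 0.0) -> float:
    """Return the ranking score on a 0-100 scale.

    Assumes plddt is already 0-100 and ptm/iptm are raw values in [0, 1].
    """
    if metric == "plddt":
        return plddt
    if metric == "ptm":
        return 100 * ptm
    if metric == "iptm":
        return 100 * iptm
    if metric == "multimer":
        # Formula stated in these notes: 80*iptm + 20*ptm.
        return 80 * iptm + 20 * ptm
    raise ValueError(f"unknown metric: {metric}")


def should_stop(score: float, stop_at_score: float) -> bool:
    """True once a recycle, model, or seed reaches the requested score."""
    return score >= stop_at_score
```

For example, under these assumptions a complex with ipTM 0.75 and pTM 0.5 gets a "multimer" score of 80*0.75 + 20*0.5 = 70, so a job run with --stop-at-score=70 would terminate at that point.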

Bugfixes

ipTM and pTM scores were computed incorrectly when padding was used, because the padded region was included in the computation. This only affects local users, as padding was disabled in the Colab notebook. Since inputs were padded by a factor of at most 1.1, this likely didn't have a big effect on the scores. The model quality/ranking is unaffected.
If you used the monomer model (alphafold2_ptm) option for modeling complexes, the first full-length sequence was not defined.

How do I run ColabFold v1.5.0?

See notebook and instructions to run locally.

I don't like these changes... How do I run the old ColabFold v1.4.0?

See notebook
