VEP: Improve documentation on SV support (#799)
Co-authored-by: Syed Nakib Hossain <snhossain@ebi.ac.uk>
nuno-agostinho and nakib103 authored Jul 2, 2024
1 parent e85c335 commit b39c6a6
Showing 3 changed files with 176 additions and 63 deletions.
9 changes: 8 additions & 1 deletion docs/htdocs/info/docs/tools/vep/script/index.html
@@ -94,8 +94,10 @@ <h1 id="contents"><img src="/i/16/documentation.png"/> Documentation contents</h
<li><a href="vep_download.html#download">Download</a></li>
<li><a href="vep_download.html#new">What's new in release [[SPECIESDEFS::ENSEMBL_VERSION]]</a></li>
<li><a href="vep_download.html#installer">Installation</a></li>
<li><a href="vep_download.html#macos">Using VEP in macOS</a></li>
<li><a href="vep_download.html#windows">Using VEP in Windows</a></li>
<li><a href="vep_download.html#docker">Docker</a></li>
<li><a href="vep_download.html#singularity">Singularity</a></li>
</ul>

<br />
@@ -148,7 +150,10 @@ <h1 id="contents"><img src="/i/16/documentation.png"/> Documentation contents</h
<img src="/i/16/user.png"/> <a href="vep_example.html" class="notext" style="font-size:16px;font-weight:bold">Examples & use cases</a>
<ul>
<li><a href="vep_example.html#examples">Example commands</a></li>
<li><a href="vep_example.html#gnomade">gnomAD exomes and genomes</a></li>
<li><a href="vep_example.html#gnomad">gnomAD</a></li>
<li><a href="vep_example.html#gerp">Conservation scores</a></li>
<li><a href="vep_example.html#dbNSFP">dbNSFP</a></li>
<li><a href="vep_example.html#StructVar">Structural variants</a></li>
<li><a href="vep_example.html#citations">Citations and VEP users</a></li>
</ul>
</div>
@@ -162,6 +167,8 @@ <h1 id="contents"><img src="/i/16/documentation.png"/> Documentation contents</h
<li><a href="vep_other.html#pick">Summarising annotation</a></li>
<li><a href="vep_other.html#hgvs">HGVS notations</a></li>
<li><a href="vep_other.html#refseq">RefSeq transcripts</a></li>
<li><a href="vep_other.html#colocated">Colocated variants</a></li>
<li><a href="vep_other.html#shifting">Normalising consequences</a></li>
</ul>

<br />
84 changes: 78 additions & 6 deletions docs/htdocs/info/docs/tools/vep/script/vep_example.html
@@ -272,25 +272,97 @@ <h2 id="StructVar">Structural Variants</h2>

<h4>Prediction process</h4>
<ul>
<li> The INFO keys 'END' or 'SVLEN' are present, the proportion of any overlapping feature covered by the variant is calculated
<li> If the SVTYPE or ALT is 'DEL', the variant tested for feature ablation/ truncation
<li> If the SVTYPE or ALT is 'DUP', the variant tested for feature amplification
<li> If the SVTYPE or ALT is 'INS' or 'DUP', the variant tested for feature elongatation
<li> SVTYPE is used in preference to ALT to derive the variant type of an SV with 'CN*' alleles
<li> If the INFO keys <code>END</code> or <code>SVLEN</code> are present, the proportion of any overlapping feature covered by the variant is calculated</li>
<li> The alternative allele (or <code>SVTYPE</code> in older VCF files) defines the type of structural variant; some types of structural variants are tested for specific consequences:</li>

<table class="ss">
<tr>
<th>Structural variant type</th>
<th>Abbreviation</th>
<th>Specific consequences</th>
</tr>
<tr class="bg1">
<td>Insertion</td>
<td>INS</td>
<td>Feature elongation</td>
</tr>
<tr class="bg2">
<td>Deletion</td>
<td>DEL</td>
<td>Feature truncation</td>
</tr>
<tr class="bg1">
<td>Duplication</td>
<td>DUP</td>
<td>Feature amplification/elongation</td>
</tr>
<tr class="bg2">
<td>Inversion</td>
<td>INV</td>
<td><i>Not tested for any specific consequence</i></td>
</tr>
<tr class="bg1">
<td>Copy number variation</td>
<td>CNV</td>
<td>Feature amplification/elongation (if copy number is 2) or truncation (if copy number is 0)</td>
</tr>
<tr class="bg2">
<td>Breakpoint variant</td>
<td>BND</td>
<td>Feature truncation</td>
</tr>
</table>
</ul>
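<p>The table above can be read as a simple lookup. The sketch below is illustrative only: the function and dictionary names are hypothetical and are not part of VEP, and the consequence labels are copied from the table rather than from VEP's output vocabulary. It assumes subtypes such as <code>INS:ME:ALU</code> are handled like their parent type, and that copy numbers other than 0 and 2 (which the table does not cover) yield no specific consequence.</p>

```python
# Hypothetical lookup mirroring the table above; not VEP's implementation.
SV_CONSEQUENCES = {
    "INS": ["feature elongation"],
    "DEL": ["feature truncation"],
    "DUP": ["feature amplification", "feature elongation"],
    "INV": [],  # not tested for any specific consequence
    "BND": ["feature truncation"],
}


def specific_consequences(sv_type, copy_number=None):
    """Return the consequence labels the table lists for a structural variant type."""
    # Assumption: subtypes such as INS:ME:ALU are treated like the parent type.
    base = sv_type.split(":")[0].upper()
    if base == "CNV":
        if copy_number == 0:
            return ["feature truncation"]
        if copy_number == 2:
            return ["feature amplification", "feature elongation"]
        # Assumption: the table does not describe other copy numbers.
        return []
    return list(SV_CONSEQUENCES.get(base, []))


print(specific_consequences("DEL"))
print(specific_consequences("CNV", copy_number=0))
```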

<h5> Insertions and deletions</h5>

<ul>
<li> Supports <a href="/info/docs/tools/vep/vep_formats.html#sv">mobile element insertions/deletions</a>, including ALU, HERV, LINE1 and SVA elements
<ul>
<li> Currently, mobile element variants are treated like any other insertion/deletion
</ul>
</ul>

<h5> Breakpoint variants</h5>

<ul>
<li> Supports chromosome synonyms in breakends (such as <code>chr4</code> and <code>NC_000004.12</code>)
<li> Processes <a href="/info/docs/tools/vep/vep_formats.html#sv">single breakends and multiple, comma-separated alternative breakends</a>
<li> Consequences are reported for each breakend; for instance, for a VCF input like <code>1 7936271 . N N[12:58877476[,N[X:10932343[</code>, it will report the consequences for each of the 3 breakends:
<ul>
<li> <code>N[12:58877476[</code>: consequences for the first alternative breakend near chr12:58877476
<li> <code>N[X:10932343[</code>: consequences for the second alternative breakend near chrX:10932343
<li> <code>N.</code>: consequences for the reference breakend near chr1:7936271 (represented as detailed in the <a href="https://samtools.github.io/hts-specs/VCFv4.4.pdf" rel="external">VCF 4.4 specification, section 5.4.9: Single breakends</a>)
</ul>
<li> If a specific breakend does not overlap any reported Ensembl feature (such as transcripts and regulatory regions), that breakend will <b>NOT</b> be presented in VEP output.
</ul>
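<p>To make the breakend enumeration above concrete, here is a minimal sketch of how a comma-separated <code>ALT</code> field could be split into the breakends listed in the example. It is not VEP's parser: the function name and regular expression are hypothetical, contig names containing colons or angle brackets are not handled, and the reference breakend is simply written as the reference allele followed by a dot.</p>

```python
import re

# The four mate-breakend forms from the VCF specification: t[p[  t]p]  ]p]t  [p[t
MATE = re.compile(
    r"^[A-Za-z.]*[\[\]](?P<chrom>[^:\[\]]+):(?P<pos>\d+)[\[\]][A-Za-z.]*$"
)


def split_breakends(chrom, pos, ref, alt):
    """Return (allele, chromosome, position) for each breakend of one VCF record."""
    breakends = []
    for allele in alt.split(","):
        match = MATE.match(allele)
        if match:
            # Alternative breakend: located at the mate position given in ALT.
            breakends.append((allele, match.group("chrom"), int(match.group("pos"))))
        else:
            # Single breakend such as "T." or ".G": no mate position is given.
            breakends.append((allele, None, None))
    # Reference breakend at the record's own position, written as a single breakend.
    breakends.append((ref + ".", chrom, pos))
    return breakends


for breakend in split_breakends("1", 7936271, "N", "N[12:58877476[,N[X:10932343["):
    print(breakend)
```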

<h4> Reported overlaps</h4>
<ul>
<li> VEP calculates the length and proportion of each genomic feature overlapped by a structural variant
<li> Use the <a href="vep_options.html#opt_overlaps">--overlaps</a> option to enable this when using VCF or tab format.
(This is reported by default in standard VEP and JSON format.)
<li> The keys bp_overlap and percentage_overlap are used in JSON format and OverlapBP and OverlapPC in other formats.
<li> The keys <code>bp_overlap</code> and <code>percentage_overlap</code> are used in JSON format and <code>OverlapBP</code> and <code>OverlapPC</code> in other formats.
</ul>
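<p>As a rough illustration of the two reported values, the sketch below computes the overlap in base pairs and as a percentage of the feature length. It assumes 1-based inclusive coordinates and rounds the percentage to two decimal places; both are assumptions rather than documented VEP behaviour, and the function name is hypothetical. The dictionary keys reuse the JSON key names mentioned above.</p>

```python
def feature_overlap(sv_start, sv_end, feature_start, feature_end):
    """Overlap between a structural variant and a feature (1-based, inclusive)."""
    start = max(sv_start, feature_start)
    end = min(sv_end, feature_end)
    bp_overlap = max(0, end - start + 1)
    feature_length = feature_end - feature_start + 1
    # Percentage of the feature covered by the variant (rounding is an assumption).
    percentage_overlap = round(100 * bp_overlap / feature_length, 2)
    return {"bp_overlap": bp_overlap, "percentage_overlap": percentage_overlap}


# A variant spanning 100-200 against a 100 bp feature spanning 150-249.
print(feature_overlap(100, 200, 150, 249))
```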

<h4> Plugin support</h4>

<ul>
<li> <a href="vep_plugins.html#CADD">CADD plugin</a>
<li> <a href="vep_plugins.html#Conservation">Conservation plugin</a>
<li> <a href="vep_plugins.html#NearestGene">NearestGene plugin</a>
<li> <a href="vep_plugins.html#Phenotypes">Phenotypes plugin</a>
<li> <a href="vep_plugins.html#StructuralVariantOverlap">StructuralVariantOverlap plugin</a>: please note that all features of this plugin have been ported to <a href="vep_custom.html">--custom annotation</a>, with additional improvements
<li> <a href="vep_plugins.html#TSSDistance">TSSDistance plugin</a>
</ul>

<h4> Changing memory requirements</h4>
<ul>
<li> By default, VEP does not annotate variants larger than 10M. If you are using the command
line tool, you can use the <a href="vep_options.html#opt_max_sv_size">--max_sv_size</a> option to modify this.
<ul>
<li> This limit is not associated with breakpoint variants: each breakend in a breakpoint variant is analysed by VEP as a single base (the alternative sequence is currently ignored).
</ul>
<li>By default, variants are analysed in batches of 5000. Using the <a href="vep_options.html#opt_buffer_size">--buffer_size</a>
option to reduce this can reduce memory requirements, especially if your data is sparse.
A smaller buffer size is essential when annotating structural variants with regulatory data.
146 changes: 90 additions & 56 deletions docs/htdocs/info/docs/tools/vep/vep_formats.html
@@ -268,63 +268,7 @@ <h3 id="complex_vcf"> Complex VCF entries </h3>
behaviour. It is recommended to use the
<a href="script/vep_options.html#opt_allele_number">--allele_number</a> flag to track
the correspondence between alleles as input and how they appear in the output.</p>


<br />
<hr />
<h3 id="sv"> Structural variant types</h3>

<p>VEP can also call consequences on structural variants using the
following input formats:</p>
<ul>
<li><a href="#default">Default VEP input</a></li>
<li><a href="#region">REST-style regions</a></li>
<li> <a href="#id">Variant identifiers</a></li>
<li> <a href="#vcf">VCF</a></li>
</ul>

<p> To recognise a variant as a structural variant, the allele string
(or <code>SVTYPE</code> in the INFO column of the VCF format) must be set
to one of the currently supported values: </p>

<ul>
<li><b>INS</b> - insertion</li>
<li><b>DEL</b> - deletion</li>
<li><b>DUP</b> - duplication</li>
<li><b>TDUP</b> - tandem duplication</li>
<li><b>INV</b> - inversion</li>
<li><b>CNV</b> - copy number variation</li>
<ul>
<li>
The copy number value can be specified,
such as <kbd>&ltCN0&gt</kbd> or <kbd>&ltCN=4&gt</kbd>
</li>
</ul>
<li><b>BND</b> - breakend</li>
<ul>
<li>
In VCF, breakend replacements are inserted into the <code>ALT</code>
column and need to meet the
<a rel="external" href="http://samtools.github.io/hts-specs/">HTS specifications</a>,
such as <kbd>A[22:22893780[,A[X:10932343[</kbd>.
</li>
</ul>
</ul>

<p> Examples of structural variants encoded in VCF format: </p>

<pre class="code sh_sh">
#CHROM POS ID REF ALT QUAL FILTER INFO
1 160283 dup . &lt;DUP&gt; . . SVTYPE=DUP;END=471362
1 1385015 del . &lt;DEL&gt; . . SVTYPE=DEL;END=1387562
1 7936271 bnd N N[12:58877476[ . . SVTYPE=BND</pre>

<p> See the <a
rel="external" href="http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/VCF%20%28Variant%20Call%20Format%29%20version%204.0/encoding-structural-variants">VCF
definition document</a> for more detail on how to describe
structural variants in VCF format. </p>


<br />
<hr />
<h2 id="hgvs"> HGVS identifiers</h2>
@@ -459,6 +403,96 @@ <h2 id="region"> REST-style regions</h2>
# structural variant: inversion
21:25587759-25587769/INV</pre>

<br />
<hr />
<h2 id="sv"> Structural variant types</h2>

<p>VEP can also call consequences on structural variants using the
following input formats:</p>
<ul>
<li><a href="#default">Default VEP input</a></li>
<li><a href="#region">REST-style regions</a></li>
<li> <a href="#id">Variant identifiers</a></li>
<li> <a href="#vcf">VCF</a></li>
</ul>

<p> To recognise a variant as a structural variant, the allele string
(or <code>SVTYPE</code> in the INFO column of the VCF format) must be set
to one of the currently supported values: </p>

<ul>
<li><b>INS</b> - insertion</li>
<ul>
<li><b>INS:ME</b> - insertion of mobile element</li>
<li><b>INS:ME:ALU</b> - insertion of ALU element</li>
<li><b>INS:ME:HERV</b> - insertion of HERV element</li>
<li><b>INS:ME:LINE1</b> - insertion of LINE1 element</li>
<li><b>INS:ME:SVA</b> - insertion of SVA element</li>
</ul>

<li><b>DEL</b> - deletion</li>
<ul>
<li><b>DEL:ME</b> - deletion of mobile element</li>
<li><b>DEL:ME:ALU</b> - deletion of ALU element</li>
<li><b>DEL:ME:HERV</b> - deletion of HERV element</li>
<li><b>DEL:ME:LINE1</b> - deletion of LINE1 element</li>
<li><b>DEL:ME:SVA</b> - deletion of SVA element</li>
</ul>

<li><b>DUP</b> - duplication</li>
<ul>
<li><b>DUP:TANDEM</b> - tandem duplication</li>
<li><b>TDUP</b> - tandem duplication</li>
</ul>
<li><b>INV</b> - inversion</li>

<li><b>CNV</b> - copy number variation</li>
<ul>
<li>
The copy number value can be specified like so:
<ul>
<li><kbd>CN0</kbd></li>
<li><kbd>CN=4</kbd></li>
<li><kbd>CN3,CN4,CN6</kbd></li>
<li><kbd>CN=0,CN=2,CN=4</kbd></li>
</ul>
</li>
<li><b>CNV:TR</b> - tandem repeats</li>
<ul>
<li> Requires <code>INFO</code> fields describing the tandem repeat, such as <code>RUS</code> and <code>RN</code> – check <a href="https://samtools.github.io/hts-specs/VCFv4.4.pdf" rel="external">VCF 4.4 specification, section 5.7</a>
<li> Currently, the <code>CIRUC</code> and <code>CIRB</code> <code>INFO</code> fields are ignored when calculating alternative alleles in tandem repeats
</ul>
</ul>

<li><b>BND</b> - chromosome breakpoints</li>
<ul>
<li> Breakpoint variants are composed of one or more breakends
<li>
In VCF, breakend replacements are inserted into the <code>ALT</code>
column and need to meet the
<a rel="external" href="http://samtools.github.io/hts-specs/">HTS specifications</a>,
such as <kbd>TG[12:58877476[</kbd>
</li>
<li>Single breakends can be specified in <code>ALT</code>, such as <kbd>T.</kbd> and <kbd>.G</kbd></li>
<li>Multiple, comma-separated alternative breakends can be specified in <code>ALT</code>, such as <kbd>A[22:22893780[,A[X:10932343[</kbd></li>
</ul>
</ul>

<p>More information on how VEP processes structural variants can be found <a href="script/vep_example.html#StructVar">here</a>.</p>

<h3> Examples of structural variants encoded in VCF format </h3>

<pre class="code sh_sh">
#CHROM POS ID REF ALT QUAL FILTER INFO
1 160283 dup . &lt;DUP&gt; . . SVTYPE=DUP;END=471362
1 1385015 del . &lt;DEL&gt; . . SVTYPE=DEL;END=1387562
1 7936271 bnd N N[12:58877476[ . . SVTYPE=BND</pre>

<p> See the <a
rel="external" href="http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/VCF%20%28Variant%20Call%20Format%29%20version%204.0/encoding-structural-variants">VCF
definition document</a> for more detail on how to describe
structural variants in VCF format. </p>

<!-- Output formats -->
<br />
<hr style="margin-bottom:0"/>
