VEP: Improve documentation on SV support #799

Merged: 4 commits, Jul 2, 2024
9 changes: 8 additions & 1 deletion docs/htdocs/info/docs/tools/vep/script/index.html
@@ -94,8 +94,10 @@ <h1 id="contents"><img src="/i/16/documentation.png"/> Documentation contents</h
<li><a href="vep_download.html#download">Download</a></li>
<li><a href="vep_download.html#new">What's new in release [[SPECIESDEFS::ENSEMBL_VERSION]]</a></li>
<li><a href="vep_download.html#installer">Installation</a></li>
<li><a href="vep_download.html#macos">Using VEP in macOS</a></li>
<li><a href="vep_download.html#windows">Using VEP in Windows</a></li>
<li><a href="vep_download.html#docker">Docker</a></li>
<li><a href="vep_download.html#singularity">Singularity</a></li>
</ul>

<br />
@@ -148,7 +150,10 @@ <h1 id="contents"><img src="/i/16/documentation.png"/> Documentation contents</h
<img src="/i/16/user.png"/> <a href="vep_example.html" class="notext" style="font-size:16px;font-weight:bold">Examples & use cases</a>
<ul>
<li><a href="vep_example.html#examples">Example commands</a></li>
<li><a href="vep_example.html#gnomade">gnomAD exomes and genomes</a></li>
<li><a href="vep_example.html#gnomad">gnomAD</a></li>
<li><a href="vep_example.html#gerp">Conservation scores</a></li>
<li><a href="vep_example.html#dbNSFP">dbNSFP</a></li>
<li><a href="vep_example.html#StructVar">Structural variants</a></li>
<li><a href="vep_example.html#citations">Citations and VEP users</a></li>
</ul>
</div>
@@ -162,6 +167,8 @@ <h1 id="contents"><img src="/i/16/documentation.png"/> Documentation contents</h
<li><a href="vep_other.html#pick">Summarising annotation</a></li>
<li><a href="vep_other.html#hgvs">HGVS notations</a></li>
<li><a href="vep_other.html#refseq">RefSeq transcripts</a></li>
<li><a href="vep_other.html#colocated">Colocated variants</a></li>
<li><a href="vep_other.html#shifting">Normalising consequences</a></li>
</ul>

<br />
84 changes: 78 additions & 6 deletions docs/htdocs/info/docs/tools/vep/script/vep_example.html
@@ -272,25 +272,97 @@ <h2 id="StructVar">Structural Variants</h2>

<h4>Prediction process</h4>
<ul>
<li> If the INFO keys 'END' or 'SVLEN' are present, the proportion of any overlapping feature covered by the variant is calculated
<li> If the SVTYPE or ALT is 'DEL', the variant is tested for feature ablation/truncation
<li> If the SVTYPE or ALT is 'DUP', the variant is tested for feature amplification
<li> If the SVTYPE or ALT is 'INS' or 'DUP', the variant is tested for feature elongation
<li> SVTYPE is used in preference to ALT to derive the variant type of an SV with 'CN*' alleles
<li> If the INFO keys <code>END</code> or <code>SVLEN</code> are present, the proportion of any overlapping feature covered by the variant is calculated</li>
<li> The alternative allele (or <code>SVTYPE</code> in older VCF files) defines the type of structural variant; some types of structural variants are tested for specific consequences:</li>

<table class="ss">
<tr>
<th>Structural variant type</th>
<th>Abbreviation</th>
<th>Specific consequences</th>
</tr>
<tr class="bg1">
<td>Insertion</td>
<td>INS</td>
<td>Feature elongation</td>
</tr>
<tr class="bg2">
<td>Deletion</td>
<td>DEL</td>
<td>Feature truncation</td>
</tr>
<tr class="bg1">
<td>Duplication</td>
<td>DUP</td>
<td>Feature amplification/elongation</td>
</tr>
<tr class="bg2">
<td>Inversion</td>
<td>INV</td>
<td><i>Not tested for any specific consequence</i></td>
</tr>
<tr class="bg1">
<td>Copy number variation</td>
<td>CNV</td>
<td>Feature amplification/elongation (if copy number is 2) or truncation (if copy number is 0)</td>
</tr>
<tr class="bg2">
<td>Breakpoint variant</td>
<td>BND</td>
<td>Feature truncation</td>
</tr>
</table>
</ul>
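<p>The table above can be read as a simple lookup. The following is a minimal Python sketch, not VEP's implementation: the function name <code>specific_consequences</code>, the handling of subtypes by their leading token (for example <code>INS:ME:ALU</code> as <code>INS</code>) and the treatment of <code>TDUP</code> as <code>DUP</code> are assumptions made for illustration.</p>

```python
# Illustrative lookup of the table above; NOT VEP's actual code.
# Consequence labels are copied from the table.
SPECIFIC_CONSEQUENCES = {
    "INS": ["Feature elongation"],
    "DEL": ["Feature truncation"],
    "DUP": ["Feature amplification", "Feature elongation"],
    "INV": [],  # not tested for any specific consequence
    "BND": ["Feature truncation"],
}


def specific_consequences(sv_type, copy_number=None):
    """Return the specific consequences tested for a structural variant type."""
    # Assumption: subtypes such as INS:ME:ALU behave like their base type.
    base = sv_type.split(":")[0]
    # Assumption: TDUP is handled like DUP.
    if base == "TDUP":
        base = "DUP"
    if base == "CNV":
        # Copy number rules exactly as stated in the table.
        if copy_number == 2:
            return ["Feature amplification", "Feature elongation"]
        if copy_number == 0:
            return ["Feature truncation"]
        return []
    return list(SPECIFIC_CONSEQUENCES.get(base, []))
```

<p>For example, <code>specific_consequences("INS:ME:ALU")</code> gives the same result as a plain insertion, in line with mobile element variants being treated as any insertion/deletion.</p>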

<h5> Insertions and deletions</h5>

<ul>
<li> Supports <a href="/info/docs/tools/vep/vep_formats.html#sv">mobile element insertions/deletions</a>, including ALU, HERV, LINE1 and SVA elements
<ul>
<li> Currently, mobile element variants are treated like any other insertion/deletion
</ul>
</ul>

<h5> Breakpoint variants</h5>

<ul>
<li> Supports chromosome synonyms in breakends (such as <code>chr4</code> and <code>NC_000004.12</code>)
<li> Processes <a href="/info/docs/tools/vep/vep_formats.html#sv">single breakends and multiple, comma-separated alternative breakends</a>
<li> Consequences are reported for each breakend; for instance, for a VCF input like <code>1 7936271 . N N[12:58877476[,N[X:10932343[</code>, VEP reports the consequences for each of the 3 breakends:
<ul>
<li> <code>N[12:58877476[</code>: consequences for the first alternative breakend near chr12:58877476
<li> <code>N[X:10932343[</code>: consequences for the second alternative breakend near chrX:10932343
<li> <code>N.</code>: consequences for the reference breakend near chr1:7936271 (represented as detailed in the <a href="https://samtools.github.io/hts-specs/VCFv4.4.pdf" rel="external">VCF 4.4 specification, section 5.4.9: Single breakends</a>)
</ul>
<li> If a breakend does not overlap any reported Ensembl feature (such as transcripts and regulatory regions), that breakend will <b>NOT</b> be presented in VEP output.
</ul>
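<p>To illustrate how one VCF record expands into several breakends, here is a small Python sketch. It is not VEP's parser: <code>split_breakends</code> and the regular expression are hypothetical, and the sketch assumes chromosome names without colons or square brackets.</p>

```python
import re

# Mate locus inside a breakend ALT, e.g. N[12:58877476[ or ]X:10932343]N
MATE = re.compile(r"[\[\]]([^\[\]:]+):(\d+)[\[\]]")


def split_breakends(chrom, pos, ref, alt):
    """Return (allele, chrom, pos) for each alternative breakend,
    followed by the reference breakend written as REF plus '.'."""
    breakends = []
    for allele in alt.split(","):
        match = MATE.search(allele)
        if match:
            breakends.append((allele, match.group(1), int(match.group(2))))
        # Single breakends such as 'T.' name no mate locus and are skipped here.
    breakends.append((ref + ".", chrom, pos))
    return breakends


for breakend in split_breakends("1", 7936271, "N", "N[12:58877476[,N[X:10932343["):
    print(breakend)
```

<p>For the example record this yields the two alternative breakends on chromosomes 12 and X plus the reference breakend <code>N.</code> on chromosome 1.</p>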

<h4> Reported overlaps</h4>
<ul>
<li> VEP calculates the length and proportion of each genomic feature overlapped by a structural variant
<li> Use the <a href="vep_options.html#opt_overlaps">--overlaps</a> option to enable this when using VCF or tab format.
(This is reported by default in standard VEP and JSON format.)
<li> The keys bp_overlap and percentage_overlap are used in JSON format and OverlapBP and OverlapPC in other formats.
<li> The keys <code>bp_overlap</code> and <code>percentage_overlap</code> are used in JSON format and <code>OverlapBP</code> and <code>OverlapPC</code> in other formats.
</ul>
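<p>As a rough illustration of what the overlap length and proportion mean, the Python sketch below computes both for 1-based, inclusive coordinates. The function name <code>overlap</code> and the rounding to two decimals are assumptions; VEP's own calculation may differ in detail.</p>

```python
def overlap(sv_start, sv_end, feat_start, feat_end):
    """Return (bp_overlap, percentage_overlap) of a feature covered by an SV.

    Coordinates are 1-based and inclusive. The percentage is relative to
    the feature length, not to the variant length.
    """
    bp = min(sv_end, feat_end) - max(sv_start, feat_start) + 1
    if bp <= 0:
        return 0, 0.0  # no overlap
    feat_len = feat_end - feat_start + 1
    return bp, round(100 * bp / feat_len, 2)


# A deletion spanning 100-199 against a feature spanning 150-249
print(overlap(100, 199, 150, 249))
```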

<h4> Plugin support</h4>

<ul>
<li> <a href="vep_plugins.html#CADD">CADD plugin</a>
<li> <a href="vep_plugins.html#Conservation">Conservation plugin</a>
<li> <a href="vep_plugins.html#NearestGene">NearestGene plugin</a>
<li> <a href="vep_plugins.html#Phenotypes">Phenotypes plugin</a>
<li> <a href="vep_plugins.html#StructuralVariantOverlap">StructuralVariantOverlap plugin</a>: please note that all features of this plugin have been ported to <a href="vep_custom.html">--custom annotation</a>, with additional improvements
<li> <a href="vep_plugins.html#TSSDistance">TSSDistance plugin</a>
</ul>

<h4> Changing memory requirements</h4>
<ul>
<li> By default, VEP does not annotate variants larger than 10M. If you are using the command
line tool, you can use the <a href="vep_options.html#opt_max_sv_size">--max_sv_size</a> option to modify this.
<ul>
<li> This limit does not apply to breakpoint variants: each breakend in a breakpoint variant is analysed by VEP as a single base (the alternative sequence is currently ignored).
</ul>
<li>By default, variants are analysed in batches of 5000. Lowering this with the <a href="vep_options.html#opt_buffer_size">--buffer_size</a>
option can reduce memory requirements, especially if your data is sparse.
A smaller buffer size is essential when annotating structural variants with regulatory data.
146 changes: 90 additions & 56 deletions docs/htdocs/info/docs/tools/vep/vep_formats.html
@@ -268,63 +268,7 @@ <h3 id="complex_vcf"> Complex VCF entries </h3>
behaviour. It is recommended to use the
<a href="script/vep_options.html#opt_allele_number">--allele_number</a> flag to track
the correspondence between alleles as input and how they appear in the output.</p>


<br />
<hr />
<h3 id="sv"> Structural variant types</h3>

<p>VEP can also call consequences on structural variants using the
following input formats:</p>
<ul>
<li><a href="#default">Default VEP input</a></li>
<li><a href="#region">REST-style regions</a></li>
<li> <a href="#id">Variant identifiers</a></li>
<li> <a href="#vcf">VCF</a></li>
</ul>

<p> To recognise a variant as a structural variant, the allele string
(or <code>SVTYPE</code> in the INFO column of the VCF format) must be set
to one of the currently supported values: </p>

<ul>
<li><b>INS</b> - insertion</li>
<li><b>DEL</b> - deletion</li>
<li><b>DUP</b> - duplication</li>
<li><b>TDUP</b> - tandem duplication</li>
<li><b>INV</b> - inversion</li>
<li><b>CNV</b> - copy number variation</li>
<ul>
<li>
The copy number value can be specified,
such as <kbd>&ltCN0&gt</kbd> or <kbd>&ltCN=4&gt</kbd>
</li>
</ul>
<li><b>BND</b> - breakend</li>
<ul>
<li>
In VCF, breakend replacements are inserted into the <code>ALT</code>
column and need to meet the
<a rel="external" href="http://samtools.github.io/hts-specs/">HTS specifications</a>,
such as <kbd>A[22:22893780[,A[X:10932343[</kbd>.
</li>
</ul>
</ul>

<p> Examples of structural variants encoded in VCF format: </p>

<pre class="code sh_sh">
#CHROM POS ID REF ALT QUAL FILTER INFO
1 160283 dup . &lt;DUP&gt; . . SVTYPE=DUP;END=471362
1 1385015 del . &lt;DEL&gt; . . SVTYPE=DEL;END=1387562
1 7936271 bnd N N[12:58877476[ . . SVTYPE=BND</pre>

<p> See the <a
rel="external" href="http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/VCF%20%28Variant%20Call%20Format%29%20version%204.0/encoding-structural-variants">VCF
definition document</a> for more detail on how to describe
structural variants in VCF format. </p>


<br />
<hr />
<h2 id="hgvs"> HGVS identifiers</h2>
@@ -459,6 +403,96 @@ <h2 id="region"> REST-style regions</h2>
# structural variant: inversion
21:25587759-25587769/INV</pre>

<br />
<hr />
<h2 id="sv"> Structural variant types</h2>

<p>VEP can also call consequences on structural variants using the
following input formats:</p>
<ul>
<li><a href="#default">Default VEP input</a></li>
<li><a href="#region">REST-style regions</a></li>
<li> <a href="#id">Variant identifiers</a></li>
<li> <a href="#vcf">VCF</a></li>
</ul>

<p> To recognise a variant as a structural variant, the allele string
(or <code>SVTYPE</code> in the INFO column of the VCF format) must be set
to one of the currently supported values: </p>

<ul>
<li><b>INS</b> - insertion</li>
<ul>
<li><b>INS:ME</b> - insertion of mobile element</li>
<li><b>INS:ME:ALU</b> - insertion of ALU element</li>
<li><b>INS:ME:HERV</b> - insertion of HERV element</li>
<li><b>INS:ME:LINE1</b> - insertion of LINE1 element</li>
<li><b>INS:ME:SVA</b> - insertion of SVA element</li>
</ul>

<li><b>DEL</b> - deletion</li>
<ul>
<li><b>DEL:ME</b> - deletion of mobile element</li>
<li><b>DEL:ME:ALU</b> - deletion of ALU element</li>
<li><b>DEL:ME:HERV</b> - deletion of HERV element</li>
<li><b>DEL:ME:LINE1</b> - deletion of LINE1 element</li>
<li><b>DEL:ME:SVA</b> - deletion of SVA element</li>
</ul>

<li><b>DUP</b> - duplication</li>
<ul>
<li><b>DUP:TANDEM</b> - tandem duplication</li>
<li><b>TDUP</b> - tandem duplication</li>
</ul>
<li><b>INV</b> - inversion</li>

<li><b>CNV</b> - copy number variation</li>
<ul>
<li>
The copy number value can be specified like so:
<ul>
<li><kbd>CN0</kbd></li>
<li><kbd>CN=4</kbd></li>
<li><kbd>CN3,CN4,CN6</kbd></li>
<li><kbd>CN=0,CN=2,CN=4</kbd></li>
</ul>
</li>
<li><b>CNV:TR</b> - tandem repeats</li>
<ul>
<li> Requires <code>INFO</code> fields describing the tandem repeat, such as <code>RUS</code> and <code>RN</code> – check <a href="https://samtools.github.io/hts-specs/VCFv4.4.pdf" rel="external">VCF 4.4 specification, section 5.7</a>
<li> Currently, the <code>CIRUC</code> and <code>CIRB</code> <code>INFO</code> fields are ignored when calculating alternative alleles in tandem repeats
</ul>
</ul>

<li><b>BND</b> - chromosome breakpoints</li>
<ul>
<li> Breakpoint variants are composed of one or more breakends
<li>
In VCF, breakend replacements are inserted into the <code>ALT</code>
column and need to meet the
<a rel="external" href="http://samtools.github.io/hts-specs/">HTS specifications</a>,
such as <kbd>TG[12:58877476[</kbd>
</li>
<li>Single breakends can be specified in <code>ALT</code>, such as <kbd>T.</kbd> and <kbd>.G</kbd></li>
<li>Multiple, comma-separated alternative breakends can be specified in <code>ALT</code>, such as <kbd>A[22:22893780[,A[X:10932343[</kbd></li>
</ul>
</ul>
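<p>The copy number notations listed above can be read with a short pattern. This Python sketch is illustrative only: <code>copy_numbers</code> is a hypothetical helper, and accepting optional angle brackets around each allele is an assumption.</p>

```python
import re

# Accepts CN0, CN=4 and, as an assumption, the bracketed forms <CN0> and <CN=4>
CN_ALLELE = re.compile(r"^<?CN=?(\d+)>?$")


def copy_numbers(allele_string):
    """Return the copy number of each comma-separated CN allele."""
    values = []
    for allele in allele_string.split(","):
        match = CN_ALLELE.match(allele.strip())
        if match is None:
            raise ValueError(f"not a copy number allele: {allele!r}")
        values.append(int(match.group(1)))
    return values


for example in ("CN0", "CN=4", "CN3,CN4,CN6", "CN=0,CN=2,CN=4"):
    print(example, copy_numbers(example))
```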

<p>More information on how VEP processes structural variants can be found <a href="script/vep_example.html#StructVar">here</a>.</p>

<h3> Examples of structural variants encoded in VCF format </h3>

<pre class="code sh_sh">
#CHROM POS ID REF ALT QUAL FILTER INFO
1 160283 dup . &lt;DUP&gt; . . SVTYPE=DUP;END=471362
1 1385015 del . &lt;DEL&gt; . . SVTYPE=DEL;END=1387562
1 7936271 bnd N N[12:58877476[ . . SVTYPE=BND</pre>

<p> See the <a
rel="external" href="http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/VCF%20%28Variant%20Call%20Format%29%20version%204.0/encoding-structural-variants">VCF
definition document</a> for more detail on how to describe
structural variants in VCF format. </p>
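<p>As a sketch of how such records might be read outside VEP, the Python snippet below extracts the variant type and coordinates from the example lines. The helper <code>parse_sv_record</code> is hypothetical; it takes the type from a symbolic <code>ALT</code> allele when present and otherwise falls back to <code>SVTYPE</code>, and it assumes whitespace-separated fields (real VCF files are tab-delimited).</p>

```python
def parse_sv_record(line):
    """Parse one VCF data line into a small dict describing the SV."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.split()[:8]
    info_fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info_fields[key] = value
        else:
            info_fields[item] = True  # flag without a value
    # Prefer the symbolic ALT allele (e.g. <DUP>); otherwise use SVTYPE.
    if alt.startswith("<") and alt.endswith(">"):
        sv_type = alt[1:-1]
    else:
        sv_type = info_fields.get("SVTYPE")
    start = int(pos)
    # Records without END (such as breakends) are treated as single positions.
    end = int(info_fields["END"]) if "END" in info_fields else start
    return {"chrom": chrom, "start": start, "end": end, "type": sv_type, "id": vid}


records = [
    "1 160283 dup . <DUP> . . SVTYPE=DUP;END=471362",
    "1 1385015 del . <DEL> . . SVTYPE=DEL;END=1387562",
    "1 7936271 bnd N N[12:58877476[ . . SVTYPE=BND",
]
for record in records:
    print(parse_sv_record(record))
```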

<!-- Output formats -->
<br />
<hr style="margin-bottom:0"/>