Rare variants

Discovery and interpretation

Peter Humburg

18th November 2015

Introduction

Interpreting patient genomes

  • Sequencing of patient genomes increasingly common
  • Can identify relevant variants
  • … amongst a large number of unrelated variants
  • … can be difficult to interpret
  • Computational strategies critical to obtaining good set of candidates

Identifying Novel Breast Cancer Risk Variants

Motivation

  • Several DNA repair genes implicated in breast and ovarian cancer susceptibility.
  • Strong evidence that rare loss-of-function variants confer increased risk.
  • Sequencing large number of patients not carrying known risk variants should lead to discovery of new ones.

Study design

  • Exons of 507 DNA repair genes in 1,150 unrelated patients.
  • Pools of 24 individuals.
  • Included 79 individuals with known mutations in breast cancer predisposition genes as positive controls.
  • No controls.
  • No barcoding.
  • Expect to do lots of Sanger sequencing in follow-up.

Analysis strategy

  • Sequence pools on Illumina GAIIx / HiSeq2000.
  • Call variants in pools with Syzygy.
  • Annotate variants to identify loss of function.
  • Validate variants of interest.
  • Sequence relevant genes in control panel.

Achieved coverage

\(> 480\times\) coverage in 90% of target region
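
With 24 diploid individuals per pool, a variant carried by a single heterozygous individual sits on only 1 in 48 chromosomes. The sketch below works out what that implies at 480× pooled depth. It assumes that every individual contributes the same amount of DNA and that reads are sampled binomially; neither assumption comes from the slides, and the threshold of 5 reads is purely illustrative.

```python
from math import comb

POOL_SIZE = 24   # individuals per pool (from the study design)
DEPTH = 480      # pooled depth reached in 90% of the target region


def singleton_fraction(pool_size):
    """Expected read fraction for one heterozygous carrier in a pool."""
    return 1 / (2 * pool_size)


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


frac = singleton_fraction(POOL_SIZE)
expected_reads = DEPTH * frac

print(f"expected variant read fraction: {frac:.4f}")
print(f"expected variant reads at {DEPTH}x: {expected_reads:.1f}")
# Illustrative threshold (assumption): at least 5 supporting reads.
print(f"P(>= 5 variant reads): {prob_at_least(5, DEPTH, frac):.3f}")
```

Under these assumptions a singleton is supported by about ten reads, which is why deep coverage matters so much for pooled designs.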

Variant calling

  • Syzygy called 34,564 variants in target region.
  • Performance for known variants:
    • 439/439 common SNPs
    • 24/26 rare SNPs
    • 51/54 rare (short) indels
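
The recovery rates for known variants can be expressed as sensitivities. This short sketch only restates the counts above as proportions; the dictionary keys are labels chosen here for illustration.

```python
# Known variants recovered by the caller: (called, total), from the slide.
known = {
    "common SNPs": (439, 439),
    "rare SNPs": (24, 26),
    "rare short indels": (51, 54),
}


def sensitivity(called, total):
    """Fraction of known variants that were called."""
    return called / total


for label, (called, total) in known.items():
    print(f"{label}: {called}/{total} = {sensitivity(called, total):.1%}")

all_called = sum(called for called, _ in known.values())
all_total = sum(total for _, total in known.values())
print(f"overall: {all_called}/{all_total} = {sensitivity(all_called, all_total):.1%}")
```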

Aside: Annotating variants

A simple plan

  • Use Ensembl annotations (via Perl API)
  • Identify protein truncating variants (PTVs)
  • Group variants by gene to identify candidates for follow-up
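
The plan above can be sketched in a few lines. This is a Python illustration, not the Perl pipeline used in the study. The set of consequence terms treated as truncating is an assumption (the slides do not list them), although the terms themselves are standard Sequence Ontology names as reported by Ensembl. The variant records and gene labels are hypothetical.

```python
from collections import Counter

# Assumption: which annotated consequences count as protein truncating.
TRUNCATING = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def rank_genes_by_ptv(variants):
    """Count truncating variants per gene, most affected gene first."""
    counts = Counter(
        v["gene"] for v in variants if v["consequence"] in TRUNCATING
    )
    return counts.most_common()


# Hypothetical annotated variants, one record per variant.
variants = [
    {"gene": "GENE_A", "consequence": "stop_gained"},
    {"gene": "GENE_A", "consequence": "frameshift_variant"},
    {"gene": "GENE_B", "consequence": "missense_variant"},
    {"gene": "GENE_B", "consequence": "stop_gained"},
    {"gene": "GENE_C", "consequence": "synonymous_variant"},
]

for gene, n_ptv in rank_genes_by_ptv(variants):
    print(gene, n_ptv)
```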

But it isn’t that easy…

Beware of transcript annotations

Beware of edge effects

Beware of misaligned indels
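
One way indels end up misaligned is that the same insertion or deletion inside a repeat can be written at several equivalent positions, so identical events look different to an annotation tool. A common remedy is to left-align and trim every indel before annotating it. The function below is a minimal sketch of that normalisation; `left_align` and the toy reference sequence are hypothetical, and it is not necessarily what was done in this study.

```python
def left_align(ref_seq, pos, ref, alt):
    """Left-align and trim a variant.

    pos is the 0-based index in ref_seq of the first base of the ref allele.
    Returns the normalised (pos, ref, alt).
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if (not ref or not alt) and pos > 0:
            pos -= 1
            ref, alt = ref_seq[pos] + ref, ref_seq[pos] + alt
            changed = True
    # Trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with a CA repeat; the same CA deletion written two ways.
reference = "GGGCACACACAGGG"
print(left_align(reference, 8, "ACA", "A"))
print(left_align(reference, 4, "ACA", "A"))
```

Both calls describe the same deletion and normalise to the same representation, so downstream grouping by position and allele treats them as one variant.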

Back to the Breast Cancer Study

Selecting candidate genes

  • Identified 1,044 PTVs
  • Ranked genes by number of truncating mutations observed.
  • Identify candidate genes
  • Top ranking genes were BRCA2, CHEK1, ATM, BRCA1, …
  • Partially driven by positive controls.
  • First interesting gene on list was PPM1D with 5 PTVs.
  • None of these PTVs present in 1000 Genomes.

Investigating PPM1D

  • PPM1D is a phosphatase
  • Phosphatase domain encoded by first 5 exons

Investigating PPM1D

All identified truncating mutations validated with Sanger sequencing.

Phase 2: Case-control study

  • Sequenced PPM1D in an additional 2,456 cases and 1,347 controls.
  • Identified 10 additional PTVs (none in controls)

Phase 2: Case-control study

  • Sequenced final exon only in 5,325 cases and 4,514 controls.
  • Identified 15 additional PTVs in cases (1 in controls)

Case-control summary

                 Breast cancer   Ovarian cancer   Controls
Sequenced        6,912           1,121            5,861
With PTV         18              12               1
Relative risk    2.7             11.5
95% CI           1.3 - 5.3       4.3 - 30.4
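
As a rough check on the carrier counts in this summary, the sketch below computes carrier frequencies and a one-sided Fisher's exact test for each cancer type against controls. It is an illustration only: the slides do not say which test was used, and the relative risks quoted in the summary are not simple ratios of these frequencies, so this code does not attempt to reproduce them.

```python
from math import comb


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """P(at least case_carriers of all carriers fall among the cases)
    under the hypergeometric null of no association."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    tail = sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / comb(total, n_cases)


# (carriers, sequenced) taken from the case-control summary.
groups = {"breast": (18, 6912), "ovarian": (12, 1121)}
control_carriers, n_controls = 1, 5861

print(f"controls: carrier frequency {control_carriers / n_controls:.4%}")
for name, (carriers, n) in groups.items():
    p = fisher_one_sided(carriers, n, control_carriers, n_controls)
    print(f"{name}: carrier frequency {carriers / n:.4%}, one-sided p = {p:.2e}")
```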

How does it work?

Cells expressing truncated versions of PPM1D show reduced activation of p53 in response to ionizing radiation.

The Plot Thickens

A complication

  • Read counts for variant alleles appear low.
  • Difficult to assess in pools but also visible in trace data.
  • Consistently low frequency of PTVs.

Somatic variation?

  • Could indicate that these are somatic mutations
  • If these are germ line variants we should see them in children of carriers.
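
For a single sample, a heterozygous germline variant should appear in roughly half of the reads, so a markedly lower fraction points towards mosaicism. A minimal sketch of that check follows. The read counts are hypothetical, and the exact binomial model with an expected fraction of 0.5 is a simplification, since real data show reference bias and extra variability.

```python
from math import comb


def binom_lower_tail(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical counts for one carrier: variant-supporting reads and depth.
alt_reads, depth = 12, 100

p_value = binom_lower_tail(alt_reads, depth)
print(f"variant allele fraction: {alt_reads / depth:.2f}")
print(f"P(<= {alt_reads} variant reads | heterozygous germline): {p_value:.2e}")
```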

Further complications

  • PPM1D truncating mutations appear to be mosaic in lymphocytes.
  • What does PPM1D look like in the tumour tissue?
    • Deep sequencing of DNA from tumour, stromal tissue and blood in four cases.
    • Found expected mutations in blood but not in tumour or stroma

Discussion

Possible interpretations

  • Were these mutations present in the cancer's cell of origin but lost later?
  • Is oncogenesis driven by lymphocytes?
  • Are the PPM1D mutations only symptoms of an underlying problem that leads to cancer development in other tissues?
  • Are PPM1D mutations and cancer unrelated?

Loss during cancer development

  • Evidence for loss of heterozygosity at PPM1D locus.
  • The lost haplotype is the one carrying the PTV in lymphocytes.
  • Unclear whether the mutation was present prior to LOH event.
  • Loss of heterozygosity in this region is common in breast and ovarian cancers.

Oncogenesis driven by lymphocytes

  • Only real evidence is absence of mutation in tumour.
  • Unclear what the mechanism would be.

Symptom of a bigger problem

  • Could be a sign of general genome instability.
  • This might lead to clonal expansion of cells with PPM1D PTVs as well as cancers.
  • Unclear what the driver of this would be.

Simply unrelated

Lessons Learned

Finding rare variants

  • Strategy to sequence as many cases as possible paid off.
  • Would not have found PPM1D PTVs if we had split initial sequencing between cases and controls.
  • Extensive but very focused follow-up required.
  • Focus on candidate gene panel paid off for similar reasons
  • … but means we have no easy way to check for other shared genomic variation amongst PPM1D PTV carriers.

Somatic variation

  • Were lucky that study design was suited to discovery of somatic variation.
  • Can find somatic variants through deep sequencing
  • … but proving that a variant is somatic can be difficult in the absence of control tissue.
  • Variant frequency in gDNA and RNA may differ markedly.

Variant annotation

  • Be careful with automated annotations.
  • Have improved a lot over the last few years
  • … but can still be misleading or incomplete.
  • Consider PPM1D PTVs
    • Truncation of final exon.
    • (Correctly) predicted to escape nonsense mediated decay.
    • So not loss of function.
    • Doesn’t mean we should ignore it!
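
The prediction of escape from nonsense mediated decay usually rests on a positional heuristic: a premature stop codon in the final exon, or within roughly 50 nucleotides upstream of the last exon-exon junction, is not expected to trigger decay. The sketch below encodes that rule. The function name, the 50 nt window and the exon lengths are illustrative assumptions and do not describe the real PPM1D transcript.

```python
def escapes_nmd(stop_pos, exon_lengths, window=50):
    """Apply the positional NMD heuristic to a premature stop codon.

    stop_pos: 1-based transcript position of the first base of the stop codon.
    exon_lengths: exon lengths in transcript order.
    """
    if len(exon_lengths) < 2:
        return True  # no exon-exon junction, so no junction-dependent decay
    # Transcript position of the last base of the penultimate exon.
    last_junction = sum(exon_lengths[:-1])
    return stop_pos > last_junction - window


# Hypothetical three-exon transcript; the last junction follows base 350.
exons = [200, 150, 300]
for pos in (100, 320, 400):
    print(pos, escapes_nmd(pos, exons))
```

A variant flagged this way yields a shortened protein rather than no protein, which is why it should not be discarded just because it fails a loss-of-function filter.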

Acknowledgements

WTCHG

Peter Donnelly

Manuel Rivas

Andrew Rimmer

Davis McCarthy

ICR

Nazneen Rahman

Elise Ruark

Katie Snape