User Guide

This page walks through example workflows that illustrate how to use the Seeq API. For a detailed overview of how the API works, see the data model documentation (core entities and knowledge graph) and the API Reference.

Finding a Gene

Seeq API includes more than 60,000 human genes and recognizes all commonly used gene identifiers: HGNC, Entrez, Ensembl, canonical symbols and synonyms. Internally, Seeq uses Entrez IDs as the canonical identification system for genes.

Let’s say we want to find the full annotation for the HER2 gene.

First, we convert our gene symbol to its canonical representation in Seeq:

$ curl 'https://api.seeq.bio/genes/find?symbol=HER2'

{
  "entrez_id": "2064",
  "name": "erb-b2 receptor tyrosine kinase 2",
  "canonical_symbol": "ERBB2",
  "hgnc_id": "HGNC:3430",
  "ensembl_id": "ENSG00000141736"
}

Now, using the Entrez ID returned above (2064), we can obtain the full gene annotation:

$ curl https://api.seeq.bio/genes/2064/info

{
  "gene": {
    "entrez_id": "2064",
    "name": "erb-b2 receptor tyrosine kinase 2",
    "canonical_symbol": "ERBB2",
    "hgnc_id": "HGNC:3430",
    "ensembl_id": "ENSG00000141736",
    "biotype": "protein_coding"
  },
  "genome_region": {
    "chromosome": "17",
    "start": 39687914,
    "end": 39730426,
    "assembly": "hg38"
  },
  "summary": {
    "source": "CIViC",
    "source_version": "CIViCGeneTable::01-Jul-2021::029095ea",
    "text": "ERBB2, commonly referred to as HER2, is amplified and/or overexpressed in 20-30% of invasive breast carcinomas ..."
  },
  "synonyms": [
    "HER-2/neu",
    "HER2",
    "VSCN2",
    "HER-2",
    "TKR1",
    "MLN 19",
    "CD340",
    "NGL",
    "NEU"
  ]
}
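
The two-step lookup above can also be scripted. The following is a minimal Python sketch using only the standard library. It assumes the endpoints behave exactly as in the curl examples. The helper names (find_gene_url, gene_info_url, fetch_json, gene_annotation) are hypothetical and not part of Seeq API, and authentication, retries, and error handling are left out.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.seeq.bio"  # host used by the curl examples in this guide


def find_gene_url(symbol: str) -> str:
    """Build the /genes/find URL for a symbol or synonym such as 'HER2'."""
    query = urllib.parse.urlencode({"symbol": symbol})
    return f"{BASE_URL}/genes/find?{query}"


def gene_info_url(entrez_id: str) -> str:
    """Build the /genes/<entrez_id>/info URL."""
    return f"{BASE_URL}/genes/{urllib.parse.quote(str(entrez_id), safe='')}/info"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode its JSON body (hypothetical helper, no auth or retries)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def gene_annotation(symbol: str, fetch=fetch_json) -> dict:
    """Resolve a symbol to its Entrez ID, then fetch the full annotation."""
    gene = fetch(find_gene_url(symbol))
    return fetch(gene_info_url(gene["entrez_id"]))
```

Calling gene_annotation("HER2") would issue the same two requests as the curl commands above. The fetch parameter exists so the network call can be replaced with a stub when testing.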

Variant Annotation

Seeq can annotate and classify arbitrary variants, including variants that have not been observed in existing databases such as dbSNP or ClinVar.

Seeq identifies variants via the tuple (chromosome, position, ref, alt), internally referred to as CPRA. This is represented in URL parameters as a string like g.2:208248389:G:A.
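
As an illustration of the CPRA string form, here is a small Python sketch that builds and parses identifiers of the shape shown above. The names CPRA, format_cpra, and parse_cpra are hypothetical helpers, not part of Seeq API. The sketch covers only the single form this guide shows; how Seeq encodes indels or other allele types is not described here and is not assumed.

```python
from typing import NamedTuple


class CPRA(NamedTuple):
    """Hypothetical container for the (chromosome, position, ref, alt) tuple."""
    chromosome: str
    position: int
    ref: str
    alt: str


def format_cpra(chromosome: str, position: int, ref: str, alt: str) -> str:
    """Render the tuple in the 'g.<chrom>:<pos>:<ref>:<alt>' form used in URLs."""
    return f"g.{chromosome}:{position}:{ref}:{alt}"


def parse_cpra(text: str) -> CPRA:
    """Split a 'g.<chrom>:<pos>:<ref>:<alt>' string back into its four fields."""
    if not text.startswith("g."):
        raise ValueError(f"expected a 'g.' prefix: {text!r}")
    parts = text[2:].split(":")
    if len(parts) != 4:
        raise ValueError(f"expected four ':'-separated fields: {text!r}")
    chromosome, position, ref, alt = parts
    return CPRA(chromosome, int(position), ref, alt)
```

For example, format_cpra("2", 208248389, "G", "A") yields the string g.2:208248389:G:A used in this section.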

A variant annotation includes everything you need to know about the variant: affected gene and transcripts, molecular consequence, cDNA/CDS coordinates, frequency in population genetic databases, conservation scores, in-silico impact predictions, and cross-references to external registries of human variants.

$ curl https://api.seeq.bio/variants/g.2:208248389:G:A/info

{
  "variant": {
    "cpra": "g.2:208248389:G:A",
    "entrez_id": "3417",
    "gene_symbol": "IDH1",
    "gvariant": {
      "chrom": "2",
      "chrom_ref": "G",
      "chrom_alt": "A",
      "chrom_pos": [ 208248389, 208248389 ]
    },
    "nt_change": "c.394C>T",
    "aa_change": "R132C",
    "eal_cds_pos": [ 394, 394 ],
    "aa_pos": [ 132, 132 ],
    "mol_conseqs": [ "coding_sequence_variant", "missense_variant" ]
  },
  "summary": {
    "source": "CIViC",
    "source_version": "CIViCTable::01-Jul-2021::86a828ef",
    "text": "IDH1 R132 mutations have been observed in a number of cancer types ..."
  },
  "gvariants_info": [
    {
      "variant": "...",
      "xrefs": {
        "clinvar": "375891",
        "dbSNP": "rs121913499"
      },
      "mafs": [],
      "conservation_scores": [
        {
          "value": 0.706548,
          "value_display": "0.707",
          "source": "Human cell line (integrated fitCons)"
        },
        "..."
      ],
      "in_silico_predictions": [
        {
          "value": "D",
          "value_display": "Damaging",
          "source": "FATHMM",
          "significance": "pathogenic"
        },
        "..."
      ]
    }
  ]
}
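
Once the annotation is parsed, the fields of interest can be pulled out with a few lines of code. The Python sketch below assumes the response has the shape shown above; describe_variant and pathogenic_predictions are hypothetical helper names. Only the significance value "pathogenic" appears in this guide, so other possible values are not assumed.

```python
def describe_variant(annotation: dict) -> str:
    """Summarize gene, protein change, cDNA change, and molecular consequences."""
    variant = annotation["variant"]
    consequences = ", ".join(variant["mol_conseqs"])
    return (
        f'{variant["gene_symbol"]} {variant["aa_change"]} '
        f'({variant["nt_change"]}): {consequences}'
    )


def pathogenic_predictions(annotation: dict) -> list:
    """Return the sources of in-silico predictions flagged as pathogenic."""
    sources = []
    for info in annotation.get("gvariants_info", []):
        for prediction in info.get("in_silico_predictions", []):
            # The guide elides some entries as "..."; skip anything that is not an object.
            if isinstance(prediction, dict) and prediction.get("significance") == "pathogenic":
                sources.append(prediction["source"])
    return sources
```

Applied to the response above, describe_variant combines the gene_symbol, aa_change, nt_change, and mol_conseqs fields into one line, and pathogenic_predictions collects the source of each prediction whose significance is "pathogenic".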

Gene Panel for a Disease

A common task is identifying the genes of interest for a particular analysis. The Seeq API knowledge graph allows you to identify genes that are known to interact with a disease or drug of interest.

In this example, we will build a gene panel for Leukemia. This gene panel will include all genes that are implicated in Leukemia or any of its subtypes, based on the latest body of clinical evidence available to Seeq.

First, let’s find the canonical identifier of our disease of interest in Seeq:

$ curl 'https://api.seeq.bio/diseases/search?query=leukemia'

[
  {
    "disease": {
      "cui": "C0023418",
      "name": "Leukemia"
    },
    "evidence_count": 2867
  },
  "..."
]
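
A search can return several candidate diseases, so a script needs a rule for choosing one. The Python sketch below shows one possible rule, assuming the result shape shown above: prefer an exact, case-insensitive name match and otherwise fall back to the first result. The helper name pick_disease_cui is hypothetical, and the ordering of search results is not documented in this guide, so the fallback is an assumption.

```python
def pick_disease_cui(results: list, query: str):
    """Return the CUI whose disease name equals the query, else the first result's CUI."""
    # The guide elides some entries as "..."; keep only real result objects.
    entries = [entry for entry in results if isinstance(entry, dict)]
    for entry in entries:
        if entry["disease"]["name"].casefold() == query.casefold():
            return entry["disease"]["cui"]
    # Assumption: with no exact match, the first result is the best candidate.
    return entries[0]["disease"]["cui"] if entries else None
```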

Now that we have the disease ID C0023418 (the cui field of the search result), we can ask Seeq for its associated genes:

$ curl https://api.seeq.bio/diseases/C0023418/genes | jq -r '.[] .hit.entrez_id'

2322
25
4869
5371
673
3815
3417
...

Here, the jq filter extracts only the Entrez ID of each gene from the results. You could achieve the same result by parsing the JSON output of the Seeq API in your programming language of choice.
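
For instance, a rough Python equivalent of the jq filter might look like the sketch below. It assumes each element of the response array carries the gene under a hit key, as the jq expression implies. The function name entrez_ids is hypothetical, and whether IDs arrive as strings (as in the gene examples in this guide) or as numbers is an assumption.

```python
import json


def entrez_ids(response_text: str) -> list:
    """Mirror the jq filter '.[] .hit.entrez_id' on a raw JSON response body."""
    return [item["hit"]["entrez_id"] for item in json.loads(response_text)]
```

The function takes the raw response body as text, so it can be fed from any HTTP client or from a file saved with curl.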

Drugs That Target a Variant

Let’s say your pipeline has identified a variant in a sequenced sample. A typical clinical report would include information about the existence and effectiveness of targeted treatments for that variant.

You can get this information from Seeq by specifying your variant of interest in CPRA notation:

$ curl https://api.seeq.bio/variants/g.12:25245351:C:A/drugs

[
  {
    "hit": {
      "chembl_id": "CHEMBL1614701",
      "name": "Selumetinib",
      "synonyms": [ "Selumetinib sulfate" ],
      "fda_approved": true,
      "development_phase": 4
    },
    "treatability": {
      "entrez_id": "3845",
      "variant_name": "G12C",
      "drug": "...",
      "summary": {
        "max_clinsig": "positive",
        "evidence_count": 4,
        "text": "4 evidence records for Selumetinib targeting <b> KRAS G12C </b>, some show positive response."
      }
    }
  },
  "..."
]
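
For a clinical report you may want only the approved drugs with supporting evidence. The Python sketch below filters a parsed response of the shape shown above. The helper name approved_drugs_with_positive_evidence is hypothetical. Only the max_clinsig value "positive" appears in this guide, so the sketch matches on that value alone and makes no assumption about other values.

```python
def approved_drugs_with_positive_evidence(hits: list) -> list:
    """Return (drug name, evidence count) pairs for FDA-approved drugs with positive evidence."""
    selected = []
    for entry in hits:
        # The guide elides some entries as "..."; skip anything that is not an object.
        if not isinstance(entry, dict):
            continue
        drug = entry["hit"]
        summary = entry["treatability"]["summary"]
        if drug["fda_approved"] and summary["max_clinsig"] == "positive":
            selected.append((drug["name"], summary["evidence_count"]))
    return selected
```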

FDA-Approved Products That Contain a Drug

An important step in clinical genomic analysis pipelines is verifying the approval status of targeted treatments. Let’s say we want to find out whether any FDA-approved products contain the chemical compound Sotorasib and, if so, which ones.

First, let’s find the canonical identifier of the molecule of interest:

$ curl 'https://api.seeq.bio/drugs/search?query=sotorasib'

[
  {
    "drug": {
      "chembl_id": "CHEMBL4535757",
      "name": "Sotorasib",
      "synonyms": [],
      "fda_approved": true
    },
    "evidence_count": 8,
    "matching_drug": "Sotorasib"
  }
]

Now we can use the identifier CHEMBL4535757 to find FDA-approved products containing this molecule:

$ curl https://api.seeq.bio/drugs/CHEMBL4535757/info

{
  "drug": {
    "chembl_id": "CHEMBL4535757",
    "name": "Sotorasib",
    "synonyms": [],
    "fda_approved": true
  },
  "xrefs": {
    "drugbank": [],
    "fda": [
      {
        "appl_no": "214665",
        "product_no": "001",
        "trade_name": "Lumakras",
        "applicant": "Amgen Inc",
        "dosage_form": "tablet",
        "strength": "120MG",
        "route": "oral",
        "type": "orange",
        "approval_date": "May 28, 2021"
      }
    ]
  }
}
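
The FDA cross-references can then be turned into a short product list. The Python sketch below assumes the xrefs.fda records have the fields shown above and that every approval_date follows the "May 28, 2021" format. Both points are assumptions drawn from this single example, and the helper name fda_products is hypothetical.

```python
from datetime import datetime


def fda_products(drug_info: dict) -> list:
    """List the FDA products in a /drugs/<chembl_id>/info response."""
    products = []
    for record in drug_info.get("xrefs", {}).get("fda", []):
        # Assumption: dates are formatted like "May 28, 2021" (English month names).
        approved_on = datetime.strptime(record["approval_date"], "%B %d, %Y").date()
        products.append(
            {
                "trade_name": record["trade_name"],
                "applicant": record["applicant"],
                "strength": record["strength"],
                "approved_on": approved_on,
            }
        )
    return products
```

An empty list means the response contained no FDA cross-references for the molecule.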