# User Guide

This page includes example workflows that illustrate the usage of Seeq API. For
a detailed overview of how Seeq API works, see the data model documentation
([core entities](entities) and [knowledge graph](knowledge-graph)) and the
<a href="/docs">API Reference</a>.

## Finding a Gene

Seeq API includes more than 60,000 human genes and recognizes all commonly used
gene identifiers: HGNC, Entrez, Ensembl, canonical symbols and synonyms.
Internally, Seeq uses Entrez IDs as the canonical identification system for
genes.

Let's say we want to find the full annotation for the [HER2] gene.

[HER2]: https://en.wikipedia.org/wiki/HER2/neu

First, we convert our gene symbol to its canonical representation in Seeq:

```sh
$ curl "https://api.seeq.bio/genes/find?symbol=HER2"
```

```json
{
  "entrez_id": "2064",
  "name": "erb-b2 receptor tyrosine kinase 2",
  "canonical_symbol": "ERBB2",
  "hgnc_id": "HGNC:3430",
  "ensembl_id": "ENSG00000141736"
}
```
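
The same lookup can be scripted. The following is a minimal Python sketch using only the standard library. The helper names `find_gene_url` and `entrez_id_from` are hypothetical, and the response shape is assumed to match the example above. The sample response is embedded so the snippet runs without a network call.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.seeq.bio"  # host used by the curl examples in this guide


def find_gene_url(symbol: str) -> str:
    """Build the /genes/find URL, percent-encoding the symbol (hypothetical helper)."""
    query = urlencode({"symbol": symbol})
    return f"{BASE_URL}/genes/find?{query}"


def entrez_id_from(response: dict) -> str:
    """Return the Entrez ID from a /genes/find response shaped like the example."""
    return response["entrez_id"]


# Response body copied from the example above, so no network call is needed here.
sample = {
    "entrez_id": "2064",
    "name": "erb-b2 receptor tyrosine kinase 2",
    "canonical_symbol": "ERBB2",
    "hgnc_id": "HGNC:3430",
    "ensembl_id": "ENSG00000141736",
}

print(find_gene_url("HER2"))
print(entrez_id_from(sample))
```

Building the query string with `urlencode` keeps symbols that contain reserved characters, such as the `/` in `HER-2/neu`, from corrupting the URL.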

Now, using the Entrez ID `2064`, we can obtain the full [gene annotation]:

[gene annotation]: /docs#operation/get_gene_info_genes__id__info_get

```sh
$ curl https://api.seeq.bio/genes/2064/info
```

```json
{
  "gene": {
    "entrez_id": "2064",
    "name": "erb-b2 receptor tyrosine kinase 2",
    "canonical_symbol": "ERBB2",
    "hgnc_id": "HGNC:3430",
    "ensembl_id": "ENSG00000141736",
    "biotype": "protein_coding"
  },
  "genome_region": {
    "chromosome": "17",
    "start": 39687914,
    "end": 39730426,
    "assembly": "hg38"
  },
  "summary": {
    "source": "CIViC",
    "source_version": "CIViCGeneTable::01-Jul-2021::029095ea",
    "text": "ERBB2, commonly referred to as HER2, is amplified and/or overexpressed in 20-30% of invasive breast carcinomas ..."
  },
  "synonyms": [
    "HER-2/neu",
    "HER2",
    "VSCN2",
    "HER-2",
    "TKR1",
    "MLN 19",
    "CD340",
    "NGL",
    "NEU"
  ]
}
```

-----

## Variant Annotation

Seeq can annotate and classify arbitrary variants, including variants that have
not been observed in an existing database such as dbSNP or ClinVar.

Seeq identifies variants via the tuple `(chromosome, position, ref, alt)`,
internally referred to as CPRA. In URLs, this is represented as a string of the
form `g.<chromosome>:<position>:<ref>:<alt>`, for example `g.2:208248389:G:A`.
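
If your pipeline stores variants as separate fields, a small helper can convert between the tuple and the string form. This is a sketch only: the `CPRA` type and both function names are hypothetical, and it covers just the simple colon-separated form shown above. How Seeq encodes other variant types is not described here.

```python
from typing import NamedTuple


class CPRA(NamedTuple):
    """Hypothetical container for (chromosome, position, ref, alt)."""
    chromosome: str
    position: int
    ref: str
    alt: str


def format_cpra(variant: CPRA) -> str:
    """Render a CPRA tuple as a string such as g.2:208248389:G:A."""
    return f"g.{variant.chromosome}:{variant.position}:{variant.ref}:{variant.alt}"


def parse_cpra(text: str) -> CPRA:
    """Parse a string such as g.2:208248389:G:A back into a CPRA tuple."""
    if not text.startswith("g."):
        raise ValueError(f"expected a 'g.' prefix: {text!r}")
    parts = text[2:].split(":")
    if len(parts) != 4:
        raise ValueError(f"expected four ':'-separated fields: {text!r}")
    chromosome, position, ref, alt = parts
    return CPRA(chromosome, int(position), ref, alt)


print(format_cpra(CPRA("2", 208248389, "G", "A")))
print(parse_cpra("g.12:25245351:C:A"))
```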

A [variant annotation] includes everything you need to know about the variant:
affected gene and transcripts, molecular consequence, cDNA/CDS coordinates,
frequency in population genetic databases, conservation scores, in-silico impact
predictions, and cross-references to external registries of human variants.

[variant annotation]: /docs#operation/get_variant_info_variants__cpra__info_get

```sh
$ curl https://api.seeq.bio/variants/g.2:208248389:G:A/info
```

```json
{
  "variant": {
    "cpra": "g.2:208248389:G:A",
    "entrez_id": "3417",
    "gene_symbol": "IDH1",
    "gvariant": {
      "chrom": "2",
      "chrom_ref": "G",
      "chrom_alt": "A",
      "chrom_pos": [ 208248389, 208248389 ]
    },
    "nt_change": "c.394C>T",
    "aa_change": "R132C",
    "eal_cds_pos": [ 394, 394 ],
    "aa_pos": [ 132, 132 ],
    "mol_conseqs": [ "coding_sequence_variant", "missense_variant" ]
  },
  "summary": {
    "source": "CIViC",
    "source_version": "CIViCTable::01-Jul-2021::86a828ef",
    "text": "IDH1 R132 mutations have been observed in a number of cancer types ..."
  },
  "gvariants_info": [
    {
      "variant": "...",
      "xrefs": {
        "clinvar": "375891",
        "dbSNP": "rs121913499"
      },
      "mafs": [],
      "conservation_scores": [
        {
          "value": 0.706548,
          "value_display": "0.707",
          "source": "Human cell line (integrated fitCons)"
        },
        "..."
      ],
      "in_silico_predictions": [
        {
          "value": "D",
          "value_display": "Damaging",
          "source": "FATHMM",
          "significance": "pathogenic"
        },
        "..."
      ]
    }
  ]
}
```
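
In practice you will usually read only a few fields from this response. The sketch below lists the in-silico predictors that flag the variant as pathogenic. It assumes the field names shown in the example above and that `"pathogenic"` is the value of interest in `significance`. The function name is hypothetical, and the sample is a trimmed copy of that response.

```python
def pathogenic_sources(info: dict) -> list[str]:
    """Names of in-silico predictors whose significance is 'pathogenic'."""
    sources = []
    for gvariant in info.get("gvariants_info", []):
        for prediction in gvariant.get("in_silico_predictions", []):
            if prediction.get("significance") == "pathogenic":
                sources.append(prediction["source"])
    return sources


# Trimmed copy of the response above; elided entries are left out.
sample = {
    "variant": {"cpra": "g.2:208248389:G:A", "gene_symbol": "IDH1", "aa_change": "R132C"},
    "gvariants_info": [
        {
            "xrefs": {"clinvar": "375891", "dbSNP": "rs121913499"},
            "in_silico_predictions": [
                {
                    "value": "D",
                    "value_display": "Damaging",
                    "source": "FATHMM",
                    "significance": "pathogenic",
                },
            ],
        }
    ],
}

variant = sample["variant"]
print(variant["gene_symbol"], variant["aa_change"], pathogenic_sources(sample))
```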

## Gene Panel for a Disease

A common task is identifying the genes of interest for a particular analysis.
The Seeq API knowledge graph allows you to identify genes that are known to
interact with a disease or drug of interest.

In this example, we will build a gene panel for [Leukemia]. This gene panel will
include all genes that are implicated in Leukemia and any of its
subtypes, based on the latest body of clinical evidence available to Seeq.

[Leukemia]: https://seeq.bio/app/condition/C0023418/

First, let's find the canonical identifier of our disease of interest in Seeq:

```sh
$ curl "https://api.seeq.bio/diseases/search?query=leukemia"
```

```json
[
  {
    "disease": {
      "cui": "C0023418",
      "name": "Leukemia"
    },
    "evidence_count": 2867
  },
  "..."
]
```

Now that we have the disease ID `C0023418`, we can ask Seeq about its [associated genes].

[associated genes]: /docs#operation/search_disease2gene_diseases__id__genes_get

```sh
$ curl https://api.seeq.bio/diseases/C0023418/genes | jq -r '.[] .hit.entrez_id'

2322
25
4869
5371
673
3815
3417
...
```

Here, the `jq` filter extracts only the Entrez IDs of the associated genes from
the search results. You could achieve the same result by parsing the JSON
output of Seeq API in your programming language of choice.
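
For instance, a Python equivalent of the `jq` filter might look like the sketch below. The function name `gene_panel` is hypothetical. The sample payload is a trimmed stand-in that contains only the `hit.entrez_id` field the filter reads, because the full response shape of this endpoint is not shown here.

```python
import json


def gene_panel(raw_json: str) -> list[str]:
    """Mirror the jq filter '.[] .hit.entrez_id' on a JSON array of search results."""
    results = json.loads(raw_json)
    return [result["hit"]["entrez_id"] for result in results]


# Stand-in payload: only the field used by the filter, with two IDs from the output above.
raw = '[{"hit": {"entrez_id": "2322"}}, {"hit": {"entrez_id": "25"}}]'

for entrez_id in gene_panel(raw):
    print(entrez_id)
```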

## Drugs That Target a Variant

Let's say your pipeline has identified a variant in a sequenced sample. A
typical clinical report would include information about the existence and
effectiveness of targeted treatments for that variant.

You can get this information from Seeq by specifying your variant of interest in
its [CPRA coordinates](#variant-annotation).

```sh
$ curl https://api.seeq.bio/variants/g.12:25245351:C:A/drugs
```

```json
[
  {
    "hit": {
      "chembl_id": "CHEMBL1614701",
      "name": "Selumetinib",
      "synonyms": [ "Selumetinib sulfate" ],
      "fda_approved": true,
      "development_phase": 4
    },
    "treatability": {
      "entrez_id": "3845",
      "variant_name": "G12C",
      "drug": "...",
      "summary": {
        "max_clinsig": "positive",
        "evidence_count": 4,
        "text": "4 evidence records for Selumetinib targeting <b> KRAS G12C </b>, some show positive response."
      }
    }
  },
  "..."
]
```
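
A report generator would typically narrow this list further. The sketch below keeps only drugs that are FDA approved and whose evidence summary reports a positive response. It assumes the field names from the example above and that `"positive"` is the `max_clinsig` value of interest. The function name is hypothetical, and the sample is a trimmed copy of the first result.

```python
def approved_drugs_with_positive_evidence(hits: list) -> list:
    """Names of FDA-approved drugs whose summary has max_clinsig == 'positive'."""
    names = []
    for entry in hits:
        drug = entry["hit"]
        summary = entry["treatability"]["summary"]
        if drug.get("fda_approved") and summary.get("max_clinsig") == "positive":
            names.append(drug["name"])
    return names


# Trimmed copy of the first result in the response above.
sample = [
    {
        "hit": {
            "chembl_id": "CHEMBL1614701",
            "name": "Selumetinib",
            "fda_approved": True,
            "development_phase": 4,
        },
        "treatability": {
            "entrez_id": "3845",
            "variant_name": "G12C",
            "summary": {"max_clinsig": "positive", "evidence_count": 4},
        },
    }
]

print(approved_drugs_with_positive_evidence(sample))
```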

## FDA-Approved Products That Contain a Drug

An important step in clinical genomic analysis pipelines is verification of drug
approval status for targeted treatments. Let's say we want to find out whether
any FDA-approved products contain the chemical compound
[Sotorasib](https://en.wikipedia.org/wiki/Sotorasib), and if so, which ones.

First, let's find the canonical identifier of the molecule of interest:

```sh
$ curl "https://api.seeq.bio/drugs/search?query=sotorasib"
```

```json
[
  {
    "drug": {
      "chembl_id": "CHEMBL4535757",
      "name": "Sotorasib",
      "synonyms": [],
      "fda_approved": true
    },
    "evidence_count": 8,
    "matching_drug": "Sotorasib"
  }
]
```

Now we can use the identifier `CHEMBL4535757` to find FDA-approved products
containing this molecule:

```sh
$ curl https://api.seeq.bio/drugs/CHEMBL4535757/info
```

```json
{
  "drug": {
    "chembl_id": "CHEMBL4535757",
    "name": "Sotorasib",
    "synonyms": [],
    "fda_approved": true
  },
  "xrefs": {
    "drugbank": [],
    "fda": [
      {
        "appl_no": "214665",
        "product_no": "001",
        "trade_name": "Lumakras",
        "applicant": "Amgen Inc",
        "dosage_form": "tablet",
        "strength": "120MG",
        "route": "oral",
        "type": "orange",
        "approval_date": "May 28, 2021"
      }
    ]
  }
}
```
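
To turn this response into a short list of products, you could read the `xrefs.fda` array. The sketch below returns each trade name with its approval date. It assumes the field names from the example above and that `approval_date` always follows the `May 28, 2021` pattern, which is inferred from this single example. The function name is hypothetical, and the date parsing relies on English month names (the default C locale).

```python
from datetime import date, datetime


def fda_products(info: dict) -> list:
    """Return (trade_name, approval_date) pairs from the xrefs.fda array."""
    products = []
    for product in info.get("xrefs", {}).get("fda", []):
        # Date format assumed from the single example in this guide.
        approved = datetime.strptime(product["approval_date"], "%B %d, %Y").date()
        products.append((product["trade_name"], approved))
    return products


# Trimmed copy of the response above.
sample = {
    "drug": {"chembl_id": "CHEMBL4535757", "name": "Sotorasib", "fda_approved": True},
    "xrefs": {
        "drugbank": [],
        "fda": [
            {
                "appl_no": "214665",
                "product_no": "001",
                "trade_name": "Lumakras",
                "approval_date": "May 28, 2021",
            }
        ],
    },
}

for trade_name, approved in fda_products(sample):
    print(trade_name, approved.isoformat())
```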