# Seeq Knowledge Graph

The real power of the Seeq API is in its knowledge graph, built from thousands of publications and public datasets. The knowledge graph consists of multiple distinct graphs, each representing a specific semantic relationship between two or more [core entities](entities).

The most important feature of the knowledge graph is its ability to slice and minify itself for a particular analysis. This allows us to ship a small version of the knowledge graph to the machine where the analysis happens. This means you can use the Seeq API to perform your analyses _where your data resides_, without having to send your sensitive data to our servers.

## Semantic atoms are n-ary relationships

In the Seeq API, n-ary relationships are first-class citizens. They are the foundation of how the data is modeled and how the search API works. An n-ary relationship, as opposed to a binary relationship, is one that involves multiple entities of different types.

![n-ary relationships](_static/img/n-ary.png)

For example, one n-ary relationship could be concerned with evidence of pathogenicity of a certain variant in a certain disease. An instance of this relationship would correspond to a statement like:

> Publication P supports pathogenicity of variant V of gene G in disease D.

## Pathogenicity Graph

The pathogenicity graph is concerned with evidence of pathogenicity connecting variants and diseases, backed by a number of publications. It also serves as an example of how Seeq models n-ary relationships: the diagram below depicts the data model for the pathogenicity relationship and how its instances are involved in various search scenarios.

![Seeq Pathogenicity Graph](_static/img/seeq-pathogenicity-data-model.png)

## Treatability Graph

The treatability graph is itself composed of multiple distinct graphs:

1. Targeted treatments for a variant in a certain disease: the entities involved in this graph are genes, variants, diseases, drugs, and publications.
2. Gene-drug interactions: the entities involved in this graph are genes, drugs, and publications. Assertions in this graph do not have a specific disease context, nor do they involve a specific variant.
3. Drug-disease indications: the entities involved in this graph are drugs, diseases, and publications. Assertions in this graph do not have a specific gene or variant context.

## Graph Search

Seeq's graph search API involves two steps: search for connected entities, then extract the supporting evidence for each connection.

![graph search](_static/img/graph-search.png)

### Connected Entities

The first set of search endpoints addresses the following problem. Given a specific query entity Q (say some gene), find all entities of type T (say diseases) that are connected to Q through any of the n-ary relationships.

For example, let's say we want to find all diseases that are connected to the gene IDH1, ranked by the number and strength of the available evidence:

```sh
$ curl https://api.seeq.bio/genes/3417/diseases
```

```{note}
This endpoint corresponds to the search results you get on [this page of seeq.bio](https://seeq.bio/app/gene/3417/conditions).
```

The first hit is Acute myeloid leukemia (AML):

```json
[
  {
    "hit": {
      "cui": "C0023467",
      "name": "Acute myeloid leukemia"
    },
    "treatability": {
      "disease": "...",
      "summary": {
        "max_clinsig": "positive",
        "evidence_count": 4,
        "text": "4 evidence records on targeted treatments in Acute myeloid leukemia with IDH1 mutations, some show positive response."
      }
    },
    "pathogenicity": {
      "disease": "...",
      "summary": {
        "max_clinsig": "pathogenic",
        "evidence_count": 6,
        "text": "6 evidence records for IDH1 variants in Acute myeloid leukemia, some are pathogenic."
      }
    }
  },
  "..."
]
```

Similar endpoints are available for all pairs of entities; see the API reference for more details.

### Supporting Evidence

The second set of endpoints addresses the following problem.
Given a specific query entity Q and a specific search hit entity H from the previous step, find all instances of all n-ary relationships that support that connection.

In the example above, let's say we want to find the supporting evidence for the connection between the gene IDH1 and the disease AML. To do this, we need to query each of the implicated subgraphs separately.

The pathogenicity evidence for the connection is backed by the [pathogenicity graph](#pathogenicity-graph):

```sh
$ curl https://api.seeq.bio/genes/3417/diseases/C0023467/pathogenicity
```

Each entry in the response is a single instance of the n-ary pathogenicity relationship and counts as a unit of evidence for the connection between the gene and disease of interest:

```json
[
  {
    "gene": {
      "entrez_id": "3417",
      "canonical_symbol": "IDH1",
      "..."
    },
    "disease": {
      "cui": "C0023467",
      "name": "Acute myeloid leukemia"
    },
    "variant": "R132L",
    "clinsig": "pathogenic",
    "citation": {
      "source": "ClinVar",
      "variation_id": "375889",
      "pmids": [
        22160010,
        22397365,
        22417203,
        22898539,
        26619011
      ],
      "publications": [
        {
          "pmid": 26619011,
          "title": "Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity.",
          "journal": "Nat Biotechnol",
          "year": 2016,
          "first_author": "Chang"
        },
        "..."
      ]
    }
  },
  "..."
]
```

The other set of evidence is backed by the [treatability graph](#treatability-graph):

```sh
$ curl https://api.seeq.bio/genes/3417/diseases/C0023467/treatability
```

Each entry in the response is now a single instance of the n-ary treatability relationship:

```json
[
  {
    "gene": {
      "entrez_id": "3417",
      "canonical_symbol": "IDH1",
      "..."
    },
    "disease": {
      "cui": "C0023467",
      "name": "Acute myeloid leukemia"
    },
    "variant": null,
    "drug": {
      "chembl_id": "CHEMBL3989958",
      "name": "Ivosidenib",
      "synonyms": [],
      "fda_approved": true,
      "development_phase": 4
    },
    "clinsig": "positive",
    "source": "CIViC",
    "citation": {
      "citation_type": "PubMed",
      "citation_ids": [
        "29860938"
      ],
      "text": "I phase 1 trial for patients with IDH1-mutated AML, patients who received ivosidenib (AG-120). The rate of complete remission was 21.6%, the over all response rate was 30.4%.",
      "publications": [
        {
          "pmid": 29860938,
          "title": "Durable Remissions with Ivosidenib in IDH1-Mutated Relapsed or Refractory AML.",
          "journal": "N Engl J Med",
          "year": 2018,
          "first_author": "DiNardo"
        }
      ]
    }
  },
  "..."
]
```

## MicroSeeq

Seeq's knowledge graph is designed so that it can be sliced and minified, on demand, for each particular analysis context. This minified knowledge graph is called MicroSeeq.

MicroSeeq enables the Seeq client-side SDK to build complex variant interpretation pipelines like [Seeq VCF][seeq-vcf] that run entirely on the client side (e.g. your browser or your virtual machine) without your raw variant PHI leaving your device.

To obtain a MicroSeeq, all you need to do is specify your genes of interest:

```sh
$ curl "https://api.seeq.bio/micro-seeq?ids=3417"  # genes of interest: IDH1
```

```json
{
  "skeletons": [
    {
      "gene": {
        "entrez_id": "3417",
        "canonical_symbol": "IDH1",
        "..."
      },
      "exons": [
        {
          "ense": "ENSE00001801507",
          "enst": "ENST00000415913",
          "rank": 1,
          "chrom_start": 208254031,
          "chrom_end": 208254322,
          "fn_cdna_pos": 1,
          "ln_cdna_pos": 292,
          "..."
        },
        "..."
      ],
      "transcript": {
        "ensg": "ENSG00000138413",
        "enst": "ENST00000415913",
        "ensp": "ENSP00000390265",
        "nm": null,
        "chrom": "2",
        "n_exons": 10,
        "fcn_cdna_pos": 383,
        "lcn_cdna_pos": 1627,
        "cds_length": 1245,
        "...",
        "cdna": "AGGGGAG...",
        "peptide": "MSKKISA..."
      },
      "domains": [
        {
          "pfam_id": "PF00180",
          "pfam_name": "Iso_dh",
          "name": "Isocitrate/isopropylmalate dehydrogenase",
          "summary": null,
          "aa_start": 11,
          "aa_end": 399
        }
      ]
    }
  ],
  "evidence": [
    {
      "gene": {
        "entrez_id": "3417",
        "canonical_symbol": "IDH1",
        "..."
      },
      "pathogenicity": [
        {
          "eal_cds_pos": 394,
          "max_clinsig_rank": 10
        },
        {
          "eal_cds_pos": 395,
          "max_clinsig_rank": 10
        }
      ],
      "treatability": [
        {
          "eal_cds_pos": 394,
          "max_clinsig_rank": 80
        },
        {
          "eal_cds_pos": 395,
          "max_clinsig_rank": 50
        }
      ],
      "gene_treatability": 80
    }
  ],
  "..."
}
```

[seeq-vcf]: https://seeq.bio/app/seeq-vcf/about
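
As a rough illustration of how a client could consume a MicroSeeq locally, the Python sketch below indexes the `evidence` part of the response shown above so that it can be looked up per gene and position without further network calls. It is not part of the Seeq SDK: the helper name `index_evidence` is hypothetical, the embedded payload is a trimmed copy of the example response, and treating `eal_cds_pos` as a coding-sequence position is an assumption based on the field name.

```python
import json

# Trimmed copy of the "evidence" part of the example MicroSeeq response.
# The elided "..." members of the original example are left out here.
MICRO_SEEQ = json.loads("""
{
  "evidence": [
    {
      "gene": {"entrez_id": "3417", "canonical_symbol": "IDH1"},
      "pathogenicity": [
        {"eal_cds_pos": 394, "max_clinsig_rank": 10},
        {"eal_cds_pos": 395, "max_clinsig_rank": 10}
      ],
      "treatability": [
        {"eal_cds_pos": 394, "max_clinsig_rank": 80},
        {"eal_cds_pos": 395, "max_clinsig_rank": 50}
      ],
      "gene_treatability": 80
    }
  ]
}
""")


def index_evidence(micro_seeq):
    """Map (gene symbol, eal_cds_pos) -> {graph name: max_clinsig_rank}.

    Hypothetical helper: it only regroups the fields present in the payload
    and makes no assumption about how the rank values are ordered.
    """
    index = {}
    for record in micro_seeq.get("evidence", []):
        symbol = record["gene"]["canonical_symbol"]
        for graph in ("pathogenicity", "treatability"):
            for entry in record.get(graph, []):
                key = (symbol, entry["eal_cds_pos"])
                index.setdefault(key, {})[graph] = entry["max_clinsig_rank"]
    return index


EVIDENCE_INDEX = index_evidence(MICRO_SEEQ)
print(EVIDENCE_INDEX[("IDH1", 394)])
# prints {'pathogenicity': 10, 'treatability': 80}
```

A lookup table of this shape is one way a client-side pipeline could annotate variants against the minified graph while keeping the variant data on the device.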