class: center, middle, inverse, title-slide # EBI visit (1-12 July) ## VSM Dictionaries and REST APIs ### John Zobolas
PhD Student ### Department of Biology, NTNU ### August 20th, 2019 --- layout: true <div class="my-footer"> <span> <img src="img/ntnu/ntnu-logo-neg.png" style="width:160px;height:30px;"> </span> </div> --- # Visit Goal ## Build .purple[vsm-dictionaries] - Uniprot (proteins) - Ensembl (genes) + EnsemblGenomes (for non-vertebrate species) - RNAcentral (non-coding RNA sequences) - Complex Portal (stable macromolecular complexes) -- </br> Connection with [BioPortal](https://github.com/vsmjs/vsm-dictionary-bioportal/) already implemented -- </br> ...for the [causal-builder](https://vtoure.github.io/causalBuilder/) (Vasundra, Steven) --- # Uniprot's REST API - REST API: `https://www.uniprot.org/uniprot/?query=id:P12345 &columns={column names comma separated}&format=tab` -- - Sadly no JSON! (**~2 YEARS**, work in progress) -- - [Proteins API](https://www.ebi.ac.uk/proteins/api/doc/) offers JSON but no filtering and no searching! -- - When programmers nowadays hear the word .large[`JSON`]: .center[<img src="img/gifs/dog_face.gif" style="width:300px;height:200px;">] --- # RNAcentral, Ensembl, ComplexPortal? Non-complete APIs (at least for what we wanted for our VSM-dictionaries) -- .larger[[EBI Search](https://www.ebi.ac.uk/ebisearch/apidoc.ebi) for the rescue!] -- - Offers *uniform access* to the biological data resources hosted at EBI - Each data source is a diffferent **domain** (e.g. uniprot, ensembl_gene) - **EBI Search as a Service** (Ensembl Genomes, RNAcentral) - Metadata table for each domain of interest ([RNAcentral](https://www.ebi.ac.uk/ebisearch/metadata.ebi?db=rnacentral)) - .green[Pagination] supported (`size`: 0 - 100, `start`: 0 - 1000000) -- .larger.purple[Examples] - [Get multiple IDs example](https://www.ebi.ac.uk/ebisearch/ws/rest/rnacentral/entry/URS000075D314_9606,URS0000000001_77133?fields=id%2Cname%2Cdescription%2Cgene%2Cgene_synonym%2Cactive%2Cexpert_db%2Crna_type%2Cspecies&format=json) - [Get search results for `tp53`](https://www.ebi.ac.uk/ebisearch/ws/rest/rnacentral?query=tp53*&fields=id%2Cname%2Cdescription%2Cgene%2Cgene_synonym%2Cactive%2Cexpert_db%2Crna_type%2Cspecies&format=json) -- More on their (last) [paper](https://academic.oup.com/nar/article/47/W1/W636/5446251): "The EMBL-EBI search and sequence analysis tools APIs in 2019" --- # Miscellaneous - .blue[RNAcentral consortium meeting] on November 15th (Hinxston) -- - Lead developer from Ensembl mentioned - a UK-based project to sequence thousands of microorganisms - an .blue[ongoing developing project] to completely change ensemble.org site, model format, API, add pagination, etc. (~ 2022) -- .pull-left[ - .purple[RRI perspective] - How to talk to people when you want something from their API that **is not there**! - And they tend to answer **their own** questions! ] -- .pull-right[ <img src="img/gifs/gilfoyle_waits.gif" style="width:320px;height:220px;"> ] -- - How one person can make a difference in technology (Reactome + GraphDB) --- class: center, middle <img src="img/ack/questions.jpeg"> # Thanks! Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan). The R code for [these slides](https://github.com/bblodfon/r-pres/blob/master/ebi_stay_july_2019.Rmd).