class: center, middle, inverse, title-slide # PhD Thesis Presentation ## Software implementations allowing new approaches toward data analysis, modeling and curation of biological knowledge for Systems Medicine ### John Zobolas
PhD Candidate ### Department of Biology, NTNU ### August 10th, 2021 --- layout: true <div class="my-footer"> <span> <img src="img/ntnu/ntnu-logo-neg.png" style="width:160px;height:30px;"> </span> </div> --- class: middle # Outline # 1) .large.blue[Introduction] ---- # 2) .large.blue[Thesis Papers] ---- # 3) .large.blue[Take-home messages] --- class: center, middle # .larger.blue[Introduction] ---- # What is the problem? --- # What do people die from? .center[![:scale 80%](img/thesis/annual-number-of-deaths-by-cause.svg)] .purple.small[Ritchie & Roser (2018). Causes of Death. https://ourworldindata.org/causes-of-death] ??? - 56 million people died in 2017 - In Norway, **cancer is first** (2021)! - Every sixth death was due to **cancer** (globally). - The **Global Burden of Disease** is a major global study on the causes of death and disease published in the medical journal *The Lancet* --- # .little[Combating disease: from genes to networks] .pull-left-73[.center[![](img/thesis/history.svg)]] .pull-right-27[ <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> .small[ .purple[Dawkins (1976). The Selfish Gene] .purple[Kitano (2002). Computational systems biology] .purple[Noble (2008). The music of life: biology beyond genes] ] ] ??? - 1950's: Birth of Molecular Biology (Discovery of DNA, Central Dogma: DNA makes RNA, and RNA makes protein) - 1990 - 2003: Human Genome Project - International scientific collaboration - **Aim**: Sequence the human genome - **Promise**: revolutionize and transformation therapeutic medicine - *Reductionism*/*Gene Determinism*: breaking down parts, analyzing these parts - Didn't help to understand disease - except from very specific-gene-associated diseases) - *Systems Perspective*: **Integration rather than breaking down parts** - Metaphors for Reductionism vs Systems perspective: - *A book isn't just the letters in the book, but you reading, thinking about it, at a specific time/place, with your experience and previous collective readings, etc.* - Genes are like word in a language: they mean nothing outside the context they are placed in. - Genes are like Legos! - New experimental and computational approaches - From genomics to multi-omics (**BIG DATA**): genes, protein, complexes, transcriptomics, methylation, phopshorylation, processes, interactions - Mathematical approaches to model complex biological systems - **Computational sciences have became more important** - **To answer the BIG questions** (why disease happen?) you need a **systems view/perspective** --- # Cancer is a complex signaling disease .pull-left[![](img/thesis/hallmarks.jpeg) <br><br> .purple.small[Hanahan & Weinberg (2011). Hallmarks of cancer: the next generation] ] .pull-right[![](img/thesis/cell_circuit.jpg) <br> <br> <br> .small["*Living cells are a symphony of complex processes*"<br> **Jonathan Ronen** <br>*Computational genomics with R*] ] ??? - **Hallmarks of cancer**: a number of characteristics that normal cells acquire towards the path of becoming cancer cells --- # Towards Personalized Systems Medicine .center[![:scale 85%](img/thesis/personalized.svg)] .bold.blue.larger[Goals] 1. Find optimal **drug-patient matches** 2. Exploit **Big Data** and **Mathematics/Modeling** 3. Understand the **mechanisms** that drive disease .purple.small[Apweiler (2018). Whither systems medicine?] ??? - **Systems Medicine**: integrate, analyze and interpret data from different resources and produce new knowledge, based on which, personalized therapies can be developed. - *All for the patient's benefit* - **Polypharmacology** is the design or use of pharmaceutical agents that act on multiple targets or disease pathways - **one-size-fits-all** approaches have gradually been replaced by **targeted therapies** with predictive biomarkers of response - Mention that due to **patient population variation**, there is a large number of non-responders to drugs, that brings up the question: "How many people should be tested to see a particular response in the clinic?" --- # General Thesis Motivation .center[![:scale 80%](img/thesis/curation.svg)] ??? Logical Model of gastric cancer cells - **Biocurators**: make knowledge *computable* and easily reusable, extract the Causal statements - Molecular Interaction standards - Curation tools - Need for uniform access to **Controlled Vocabularies (CVs)** and **ontologies** - Data formats - .green[Causal interactions are the] **building blocks** for the Prior Knowledge Networks that are used to construct logical models, which we are going to discuss later! --- class: center, middle # .larger.blue[Paper 1] ---- # UniBioDicts: Unified access to Biological Dictionaries .small[Zobolas, J., Touré, V., Kuiper, M., & Vercruysse, S. (2020). Bioinformatics, 37(1), 143–144. https://doi.org/10.1093/bioinformatics/btaa1065] ---- --- # MI2CAST: .green.small[The Minimum Information about a Molecular Interaction Causal Statement] .center[ ![](img/thesis/causalStatement.svg) ![:scale 80%](img/thesis/identifiers.svg) ] .littler[**Guidelines**: https://github.com/MI2CAST/MI2CAST/blob/master/docs/MI2CAST_guideline.md] .small.purple[Touré (2020). The Minimum Information about a Molecular Interaction Causal Statement (MI2CAST)] ??? - 3 elements of a causal statement: the source entity, the causal interaction and the target entity --- # **V**isual **S**yntax **M**ethod: .green.small[General-purpose curation tool] ![](img/vsm/vsm-box-example.gif) ![:scale 40%](img/vsm/sro.png) - Dictionaries make VSM-terms accessible - More info: https://vsm.github.io/vsm-pages/intro .small.purple[Vercruysse and Kuiper (2020). Intuitive representation of computable knowledge] ??? - Think all **terms as nouns** or *things* - Connectors enrich meaning by adding **context** to terms - Meaning is applied by **trident connectors**: (Subject, Relation, Object) - SRO! - Easier annotation using an autocomplete function --- # causalBuilder: .green.small[Web-based curation interface for the annotation of molecular causal interactions] ![](img/thesis/causalBuilder_screenshot.png) - **CausalTAB** export .small.purple[ Touré (2021). CausalBuilder: bringing the MI2CAST causal interaction annotation standard to the curator Perfetto (2019). CausalTAB: the PSI-MITAB 2.8 updated format for signalling data representation and dissemination ] ??? - **CausalBuilder**: an intuitive web-based curation interface for the annotation of molecular causal interactions that comply with the *MI2CAST* standard - MITAB 2.8 == CausalTAB, user friendly, Excel-compatible - UBDs search query functionality is supported by an **autocomplete** function that allows users to **choose from a list of candidate terms with descriptive metadata**. --- # ![:scale 20%](img/thesis/UniBioDicts.png): .small.green[Unify Life Science terminologies] > A single programmatic interface to search *terms*, *identifiers* and some *metadata* from several resources .pull-left[ ## Implemented UBDs ![](img/thesis/ubds_image.png) ] .pull-right[ ## Potential Users 1. **Research software engineers** (as a meta-API) 2. **Curation tool developers** (e.g. causalBuilder) 3. **Biocurators** (as tool users) ---- GitHub Organization: https://github.com/UniBioDicts/
] ??? - The recommended ontologies from MI2CAST were the **motive** to started playing with APIs that gave access to various Life Science terminologies. - A collection of software packages/dictionaries that connect to each separate API - A combiner-dictionary package that **combines the results from multiple UBDs**, reporting them back through a single channel --- class: center, middle # .larger.blue[Paper 2] ---- # Linking PubDictionaries with UniBioDicts to support Community Curation .small[Zobolas, J., Kim, J.-D., Kuiper, M., & Vercruysse, S. (2020). BioHackrXiv. https://doi.org/10.37044/osf.io/gzfa8] ---- --- # Motivation .big1[ ## .red[Curation Problems] 1. Constant **refactoring** of ontologies 2. Not-yet-covered domains or novel fields: - **No authoritative** database or ontology - **No community consensus** on terminology 3. Need for **project-specific, ad-hoc** CVs 4. **Openness** and **sharing** - Facilitate review and management ] --- # Pubdictionaries and new UBD Client .pull-left-67[![](img/thesis/pub-dictionaries.svg)] .pull-right-33[ - ELIXIR BioHackathon 2020 - Update PubDictionaries API - Implement new UDB client - [Example Demo with vsm-box](https://github.com/UniBioDicts/vsm-pubdictionaries/blob/master/test/test_vsm_box_pubdictionaries.html) ] <!-- SOS: CREATE SPACE SO THAT THE PARAGRAPH GET BELOW THE TWO COLUMN LAYOUT--> <br/> <br/> <br/> <br/> .purple.small[Kim J.-D. (2019). Open Agile text mining for bioinformatics: the PubAnnotation ecosystem] --- class: center, middle # .larger.blue[Paper 3] ---- # Fine tuning a logical model of cancer cells to predict drug synergies: combining manual curation and automated parameterization .small[Flobak, Å., Zobolas, J., Vazquez, M., Steigedal, T. S., Thommesen, L., Grislingås, A., Niederdorfer B., Folkesson E. & Kuiper, M. (2021). https://doi.org/10.1101/2021.06.28.450165] ---- --- # Motivation .big1[ 2. Find **useful drug combinations** without resorting to blind-mechanism trials 3. **Automatize** logical modeling framework from .purple[Flobak (2015)] 4. Benchmark prediction performance 5. Assess importance of **training data and topology** for prediction performance 6. Mechanistic insights ] --- # Logical model of gastric cancer cells (AGS) .center[![:scale 90%](img/thesis/cascade_1.0.svg)] - CASCADE 1.0 topology + **manually optimized rules** - **Experiments**: 4 synergies out of 21 - **Model**: 5 synergies, including novel `Pi3k` and `Tak1` joint inhibition .small.purple[Flobak (2015). Discovery of drug synergies in gastric cancer cells predicted by logical modeling] ??? - 5 synergies predicted by the logical model, 4 TP + 1 FP --- # Automation of model parameterization ![:scale 90%](img/thesis/genetic_algo.png) .large[ - From generic to **cell-specific models** - Model **mutations** in logical rules (AND-NOT <=> OR-NOT) - Parallel evolutionary simulations: **ensemble of models** ] Genetic algorithm for creation of logical model ensembles: https://github.com/druglogics/gitsbe
??? - From a naive model (agnostic of cell activity states) to a **cell-specific model** - Explain training data --- ## Synergy prediction performance .smaller[https://github.com/druglogics/drabme
] .pull-left[ ![](img/thesis/pr_cascade1_2.png) .little[
Automated pipeline **performs on par** with curated model from .purple[Flobak (2015)] - **Top 6 predictions** include the 4 validated synergies ]] .pull-right[ ![](img/thesis/pr_cascade2_2.png) .little[
**Reduce drug screen** to .bold.green[24%] (36 out of 153 combos, .red[miss 2 observed synergies])] .smaller.purple[Flobak (2019). A high-throughput drug combination screen of targeted small molecule inhibitors in cancer cell lines] ] ??? - CASCADE 1.0 and .purple[Flobak (2015)] dataset: **top 6 predictions** include the 4 validated synergies - CASCADE 2.0: larger topology (144 nodes, 367 edges) - Right figure tested on a larger screen (153 drug combinations) from .purple[Flobak (2019)] - 6 **experimentally confirmed** synergies for AGS cells - Imbalanced dataset: so PR more proper .purple[Saito (2015)] - **Explain differences** between ROC (validation) vs PR (prevalence/balance of data) - Mention importance of **normalization to topology predictions** (nah...) - **Random => random yet proliferative** training data profiles --- ## Impact of data quality on model performance ![:scale 75%](img/thesis/data_quality.png) .large[ - .green[**Performance correlates with data quality**] - **Topology** carries information for synergy prediction ] ??? - Each model ensemble is trained to **partially incorrect data** - Partially-fitted models predict **better than random** (PR AUC = 0.04) - **Topology** carries information for synergy prediction --- ## .little[Impact of incorrect prior knowledge on model performance] .pull-left[![](img/thesis/topo_quality_1.png)] .pull-right[![](img/thesis/topo_quality_2.png)] <br/> <br/><br/> <br/><br/> <br/> <br/> <br/><br/> <br/><br/> <br/> .large[ - **Scrambling** CASCADE 2.0 topology - Swap source - Swap target - Sign inversion - All the above! - .green[**Small randomization decreases prediction performance**] ] ??? - Explain key analysis steps that resulted in these figures - 3 types of topology scrambling + all these applied together - **Even low levels of randomization in the curated knowledge significantly reduce the predictive power of the models** --- # Model ensemble heterogeneity .pull-left-67[![:scale 98%](img/thesis/lo_ss_heat.png)] .pull-right-33[ - Larger heterogeneity in parameterization space - Many logical rule configurations yield biologically compliant models - Calibrated models respect the training data - Violation of training data in: - **JNK** signaling - **ERK** signaling - **p38** signaling (MAPK14) ] --- # Mechanistic insights .pull-left[ #### ERK active in AGS cells - Contradictory evidence of ERK activity in literature ![](img/thesis/erk-pool-cpm-pr.png) ] .pull-right[ #### Active Oncogenes vs inactive Tumor Suppresor Genes ![](img/thesis/cosmic.png) .small.purple[Sondka (2018). The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers] ] ??? - **Biological explanations for some of the observations** - **Cancer Gene Census (CGC)** describes a curated catalogue of genes driving every form of human cancer - Proteins from genes annotated as oncogenes in COSMIC tend to be active, while proteins from genes annotated as tumor suppressor genes (TSG) tend to be inactive in overall steady states. This is biologically plausible and attests to the **capability of our mechanistic model to generate hypotheses about the underlying molecular biology**. - **Conflicting evidence of ERK activity in publications**! - ERK-active sub-ensemble is more predictive, suggesting that from a functional point of view ERK should be considered active in AGS cells. --- exclude:true # Mechanistic insights (2) .pull-left-58[![](img/thesis/edge_heat.png)] .pull-right-42[ <br> ![](img/thesis/lo_vs_topo.png)] <br/> <br/> - **Omitting prior knowledge** may improve model performance - **Cross-talk edges** are likely to be .bold.green[preserved] - **TGF-beta pathway edges** are likely to be .bold.red[removed] ??? - TGF-b edges removed are about **inhibitory effects of the protein SKI and other inhibitors of SMADs** - Preserve of cross-talk might be due to: - Sparse prior knowledge (biased pathway interactions) => Asmund has written that in the paper, but ~50% of the edges are crosstalk in CASCADE 2.0! - Importance of signaling that is not confined to what is more or less arbitrarily viewed as pathways --- exclude:true # Validation of PZ-PI synergy *in vivo* .pull-left[![](img/thesis/tumor_decrease.png)] .pull-right[![](img/thesis/tumor_progression.png)] ??? - Testing of drug synergy in a mouse xenograft model --- class: center, middle # .larger.blue[Paper 4] ---- # emba: R package for analysis and visualization of biomarkers in Boolean model ensembles .small[Zobolas, J., Kuiper, M., & Flobak, Å. (2020). Journal of Open Source Software, 5(53), 2583. https://doi.org/10.21105/joss.02583] ---- --- # Ensemble Boolean Model Analysis .green.smaller[(How it works!)] ![:scale 90%](img/druglogics/emba.svg) CRAN `R` package: https://github.com/bblodfon/emba
??? - Explain figure - Why is emba important for others? - Data analysis software for same kind of datasets - Find biomarkers - Comparing results with other methodologies - Refining model parameters (as stated in the figure) --- # Visualization of biomarker results ```r library(emba) *res = biomarker_synergy_analysis(model.predictions, models.stable.state, models.link.operator, observed.synergies, threshold = 0.7) plot_avg_state_diff_graph(net, diff = res$diff.state.synergies.mat["PI-D1",], layout = nice.layout, title = "A498 Cell Line: Prediction of PI-D1 synergy (Activity State Differences)") plot_avg_link_operator_diff_graph(net, diff = res$diff.link.synergies.mat["PI-D1",], layout = nice.layout, title = "A498 Cell Line: Prediction of PI-D1 synergy (Link Operator Differences)") ``` .pull-left[![](img/druglogics/PI-D1_A498_state.svg)] .pull-right[![](img/druglogics/PI-D1_A498_link.svg)] ??? - Library is very easy to use (once you have the data :) - Development effort/abstraction that led to an **one-liner**! - The visualization helps you pinpoint the biomarkers and see if they are close or not - (Depends on the size of the networks of course and the number of returned biomarkers) - Might be better than looking at the data results when few biomarkers - Comparing parameterization and stable state differences in a visual way - This sort of visualization and analysis led to the discovery of the `ERK_f` activation patterns in our boolean models and inspired the ensemble-based analysis for mechanistic insights in paper 3. --- class: center, middle # .larger.blue[Paper 5] ---- # Boolean function metrics can assist modelers to check and choose logical rules .small[Zobolas, J., Monteiro, P. T., Kuiper, M., & Flobak, Å. (2021). arXiv. http://arxiv.org/abs/2104.01279] ---- --- # Motivation .small.green[(with an example!)] .pull-left-67[ ![](img/thesis/interpretability.svg) .small.purple[Mendoza (2006). A method for the generation of standardized qualitative dynamical systems of regulatory networks] .small[(AND-NOT formula)] ] .pull-right-33[ - **Interpretability** issues (semantics) - **Consistency** with regulatory topology? - Which is the **right** equation? - **Expected** function output? - Match **observations**? ] ??? 1. Semantics: Which equations more **make sense**? - Derive the exact conditions that make target **active**? - Derive a meaningful characterization that allows the comparison with other BRFs? - More regulators, more complexity! - What happens when I use other link operators or other BRFs? - Check **regulatory consistency** from formula? 2. **No single right BRF** - Noisy and sparse data - Enormous function space: `\(2^{2^n}\)` (double exponential) 3. `\(f_{AND-NOT} \rightarrow 0\)`, `\(f_{OR-NOT} \rightarrow 1\)`, for large `\(n\)` - Proof? - Biased behavior? - How large `\(n\)` needs to be practically? - **Effect of number of activators `\(m\)` and inhibitors `\(k\)`?** 4. Metric that defines **expected function output** for every BRF? - Compare different link operator functions? - Use this knowledge to better **match observations**? --- exclude:true # Some notation... - Boolean regulatory functions `\(f_{BRF}:\{0,1\}^n \rightarrow \{0,1\}\)` - **Link operator functions** with a total of `\(n = m + k\)` regulators: - `\(m \ge 1\)` .green[activators] `\(x=\{x_i\}_{i=1}^{m}\)` - `\(k \ge 1\)` .red[inhibitors] `\(y=\{y_j\}_{j=1}^{k}\)` 1. `\(f_{AND-NOT}(x,y) = \left(\bigvee_{i=1}^{m} x_i\right) \land \lnot \left(\bigvee_{j=1}^{k} y_j\right)\)` 2. `\(f_{OR-NOT}(x,y) = \left(\bigvee_{i=1}^{m} x_i\right) \lor \lnot \left(\bigvee_{j=1}^{k} y_j\right)\)` 3. `\(f_{Pairs}(x,y) = \bigvee_{\forall (i,j)}^{m,k}(x_i\land \lnot y_j) = \left(\bigvee_{i=1}^{m} x_i\right) \land \left(\bigvee_{j=1}^{k} \lnot y_j\right)\)` - <p><b>Threshold functions</b>: <span class="math inline">\(\begin{align} f_{Act-win}(x,y) &= \begin{cases} 1, & \sum_{i=1}^{m} x_i \ge \sum_{j=1}^{k} y_j \\ 0, & \text{otherwise} \end{cases} \end{align}\)</span>, where activators "win" on equality, otherwise: <span class="math inline">\(f_{Inh-win(x,y)}\)</span></p> ??? - **Threshold functions**: the activities of the positive and negative regulators are combined in an **additive** manner (and not **combinatorial**), with their respective assigned weights set to ±1 and the threshold parameter to 0, formulating thus a **majority rule** which defines the value of the function. --- ## Disjunctive Normal Form and Consistency metric 1. DNF links **semantics** with **mathematics**! 2. Consistent Boolean regulatory functions - 3 principles .small.purple[Cury (2019). Partial Order on the set of Boolean Regulatory Functions] - **Unique Complete DNF** (CDNF) - <span style="color:green">Activators</span> are the *positive* literals `\((x)\)` - <span style="color:red">Inhibitors</span> are the *negative* literals `\((\lnot y)\)` ---- .blue[Example] with `\(2\)` .green[activators] `\((x_1,x_2)\)` and `\(2\)` .red[inhibitors] `\((y_1,y_2)\)`: .small[ |Link Operator BRF|CDNF|Biological Interpretation|Consistent| |:---:|:---:|:---:|:---:| |<span class="math inline">(<em>x</em><sub>1</sub> OR <em>x</em><sub>2</sub>)<strong> AND NOT </strong>(<em>y</em><sub>1</sub> OR <em>y</em><sub>2</sub>)</span>|<span class="math inline">(<em>x</em><sub>1</sub> AND NOT <em>y</em><sub>1</sub> AND NOT <em>y</em><sub>2</sub>)<strong> OR </span> <br/> <span class="math inline"></strong>(<em>x</em><sub>2</sub> AND NOT <em>y</em><sub>1</sub> AND NOT <em>y</em><sub>2</sub>)</span>|Presence of at least one activator **and** absence of all inhibitors|.bold.green[YES]| |<span class="math inline">(<em>x</em><sub>1</sub> OR <em>x</em><sub>2</sub>)<strong> NOR NOT </strong>(<em>y</em><sub>1</sub> OR <em>y</em><sub>2</sub>)</span>|<span class="math inline">(<span style="color:blue"><em>y</em><sub>1</sub></span> AND <span style="color:blue">NOT <em>x</em><sub>1</sub></span> AND <span style="color:blue">NOT <em>x</em><sub>2</sub></span>)<strong> OR</strong></span> <span class="math inline">(<em>y</em><sub>2</sub> AND NOT <em>x</em><sub>1</sub> AND NOT <em>x</em><sub>2</sub>)</span>|Presence of at least one inhibitor **and** absence of all activators|.bold.red[NO]| ] ??? - Consistency principles 1. Activators `\(\neq\)` Inhibitors (no dual regulations, *monotone* BRFs) 2. All regulators are *essential* 3. - **DNF**: biological characterization can be directly derived from the formula itself - **Results**: Only AND-NOT,OR-NOT,Pairs,Threshold function are consistent with the regulatory topology! --- ## Truth Density metric: expected function output .little[ |<span class="math inline"><em>a</em></span>|<span class="math inline"><em>b</em></span>|<span class="math inline"><em>c</em></span>|<span class="math inline"><em>f</em><sub>1</sub> = (<em>a</em> ∧ ¬ <em>c</em>) ∨ (<em>b</em> ∧ ¬ <em>c</em>)</span>|<span class="math inline"><em>f</em><sub>2</sub> = <em>a</em> ∨ (¬ <em>b</em> ∧ ¬ <em>c</em>)</span>|<span class="math inline"><em>f</em><sub>3</sub> = 1</span>| |:---:|:---:|:---:|:---:|:---:|:---:| |0|0|0|0|.bold.green[1]|.bold.green[1]| |0|0|1|0|0|.bold.green[1]| |0|1|0|.bold.green[1]|0|.bold.green[1]| |0|1|1|0|0|.bold.green[1]| |1|0|0|.bold.green[1]|.bold.green[1]|.bold.green[1]| |1|0|1|0|.bold.green[1]|.bold.green[1]| |1|1|0|.bold.green[1]|.bold.green[1]|.bold.green[1]| |1|1|1|0|.bold.green[1]|.bold.green[1]| ] - `\(TD = \{Prob: f = 1\} \in [0,1]\)` - `\(TD_{f_1}=3/8=0.375\)`, `\(TD_{f_2}=5/8=0.625\)` and `\(TD_{f_3}=8/8=1\)`
**Biased function**: `\(TD_f \rightarrow 1\)` or `\(TD_f \rightarrow 0\)`
**Balanced function**: `\(TD_f \rightarrow 1/2\)` ??? - Truth density is also known as: **bias** - **Balanced** terminology from Cryptography - Limits on definitions of biased and balanced functions are used because we usually refer to the **change of a variable, like number of regulators** --- exclude:true ## Truth density formulas and Asymptotic study .pull-left[ `\(TD_{AND-NOT}=\frac{2^m-1}{2^n} = \frac{1}{2^k}-\frac{1}{2^n}\)` `\(TD_{OR-NOT}=\frac{2^n-(2^k-1)}{2^n} = 1 - \frac{1}{2^m} + \frac{1}{2^n}\)` `\(TD_{Pairs}=\frac{(2^m-1)(2^k-1)}{2^n}\)` `\(TD_{thres} = \frac{\sum_{i=1}^m \left[ \binom{m}{i} \times \sum_{j=0}^{min(u,k)} \binom{k}{j} \right]}{2^n}\)` --- **(2)** For a .green[.bold[high] activator-to-inhibitor ratio] `\((m=n-1,k=1)\)`: `\(TD_{AND-NOT} = TD_{Pairs} \rightarrow 1/2\)` `\(TD_{OR-NOT} = TD_{thres} \rightarrow 1\)` ] .pull-right-42[ **(1)** .green[For large n]: `\(TD_{AND-NOT} \sim \frac{1}{2^k} \xrightarrow{k \rightarrow \infty} 0\)` `\(TD_{OR-NOT} \sim 1-\frac{1}{2^m} \xrightarrow{m \rightarrow \infty} 1\)` `\(TD_{Pairs}\)` and `\(TD_{thres}\)` depend on both `\(m\)` and `\(k\)` --- **(3)** For an .green[.bold[equal] activator-to-inhibitor ratio], `\(m \approx k \approx n/2\)`: `\(TD_{AND-NOT} \rightarrow 0\)`, `\(TD_{OR-NOT} = TD_{Pairs} \rightarrow 1\)` `\(TD_{thres} \rightarrow 1/2\)` ] ??? - `\(u=i\)` for `\(f_{Act-win}\)` and `\(u=i-1\)` for `\(f_{Inh-win}\)` --- ## Truth Density results and validation .pull-left-67[![](img/thesis/td-link-operator.svg)] .pull-right-33[ .large[ .bold[Notation] - `\(n\)` regulators - `\(m\)` .green[activators] - `\(k\)` .red[inhibitors]
.blue[Activator-to-inhibitor ratio] is important! ]] <br/> <br/> <br/> <br/> `\(TD_{AND-NOT} = \frac{1}{2^k}-\frac{1}{2^n} \sim \frac{1}{2^k} \xrightarrow{k \rightarrow \infty} 0\)`, depends on #.red[inhibitors] `\(k\)` `\(TD_{OR-NOT} = 1 - \frac{1}{2^m} + \frac{1}{2^n} \sim 1-\frac{1}{2^m} \xrightarrow{m \rightarrow \infty} 1\)`, depends on #.green[activators] `\(m\)` ??? - Explain how to read the figures! - For each specific number of regulators, every possible combination of at least one activator and one inhibitor that add up to that number, results in a different truth table output with its corresponding truth density value. - More **bias** with increasing numbers of regulators - **7-10** regulators: that's where biased behavior starts - Threshold function behavior: - More **balanced** behavior, following the activator-to-inhibitor ratio - A larger spectrum of possible truth density values - Connect with next slide: we wanted to test the bias of these functions --- # abmlog: .small.green[From Topology to AND/OR-NOT Boolean models] .center[![](img/thesis/abmlog.svg)] **Agreement between parameterization and stable state**: `\(10/12=0.83\)` **Per parameterization agreement**: `\(6/7\)` (AND-NOT) and `\(4/5\)` (OR-NOT) https://github.com/druglogics/abmlog
??? - `\(83\%\)` of the presented data, the link operator parameterization dictated function outcome --- ## Link operator function bias in biological networks .pull-left[ ### CASCADE 1.0 ![](img/thesis/ss-lo-agreement-per-node.svg) .large[ - Higher than random agreement - .little[AND-NOT => .red[0], OR-NOT => .green[1]] ]] .pull-right[ ### Scale-free networks ![](img/thesis/agree-stats-gamma2.svg) .large[ - Increasing agreement (.red[bias]) for #regulators >7-10 ]] ??? ### CASCADE 1.0 - Used `abmlog` on CASCADE 1.0 (77 nodes, 149 interactions, 23 link operator nodes) - Produced `\(2^23\)` boolean models - For the analysis I kept ~3 million models that had a single stable state => agreement easy to calculate - **One contingency table per node**! - **Higher than random probability** that the assignment of the "AND-NOT" (resp. "OR-NOT") link operator formula in the Boolean equations will result in the inhibition (resp. activation) of the target nodes. - **Variable results** => need for more regulators! ### Hub node bias in scale-free networks - Scale-free networks as a proxy to biological networks (with some supported evidence) - `\(n=50\)` nodes per scale-free network - `\(\gamma\)` is the power-law exponent: Comparing the scale-free networks generated with two gammas we observe that the `\(\gamma = 2\)` have **more link operator nodes** and **higher degree hubs** than networks with `\(\gamma = 2.5\)` - In-degrees are drawn from Riemann Zeta distribution based on the work from .purple[Aldana (2003). Boolean dynamics of networks with scale-free topology] --- # Bias guides model parameterization .center[![:scale 66%](img/thesis/ss-lo-agreement-per-node-split.svg)] - High correlation `\((\rho_{and-not} = -0.74\)`, `\(\rho_{or-not} = 0.85)\)` - *A priori* .green[choose parameterization to match observations] - `LRP_f` `\((m=4,k=1)\)`: `\(TD_{AND−NOT}=0.47\)`, `\(TD_{OR−NOT}=0.97\)` - `TSC_f` `\((m=1,k=4)\)`: `\(TD_{AND−NOT}=0.03\)`, `\(TD_{OR−NOT}=0.53\)` ??? - CASCADE models were split into two pools, representing the "AND-NOT" and "OR-NOT" node parameterizations, and calculated the proportion of models within each pool whose link operator matched the expected state outcome (percent agreement) - The link operator nodes are **sorted according to the average activity state** across the considered CASCADE models - Colored node labels indicate literature curated activity profiles from Flobak et al. [2015] - **Correlation here is between activity and parameterization** - A priori choosing parameterization - better than random implied! - `TSC_f`: no model had it as active no matter the parameterization, which shows that **complex dynamics also play a role**. Activity state results may contradict what is expected from the asymptotic truth density results - e.g. I would expect **at least some active `TSC_f`** when it's with `OR NOT`!!! Also: `\(TD_{AND−NOT}=0.03=1/32=1/(2^5)\)` --- class: center, middle # .larger.blue[Take-home messages] ---- .pull-left[ ### Curation of prior knowledge ### Topology drives predictions ### Mathematics as a valuable tool for modeling ] .pull-right[![](img/thesis/curation.svg)] --- # Acknowledgements .smaller[(.red[
<i class="fas fa-heart faa-bounce animated faa-fast "></i>
] our sponsors!)] .center.pull-left-67[ ![:scale 90%](img/druglogics/druglogics_org_2021.svg) <img src="https://media.giphy.com/media/IvTIFDvulINIA/giphy.gif" height="150px"></img> ] .pull-right-27[ ![](img/ack/eracosysmed.gif) ![](img/ack/cost.png) ![](img/ack/ntnu_helse.jpg) ![](img/ack/Samarbeidsorganet.gif) ---- .little[ - .bold[[Code for slides](https://github.com/bblodfon/r-pres/blob/master/phd_thesis.Rmd)]
- .bold[[PhD Thesis](https://bblodfon.github.io/my-phd-thesis/)]
]] ??? - Pedro T. Monteiro - Noemi Del Toro Ayllon, Henning Hermjacob (EBI Team) - Denis Thieffry - Jing-Dong Kim (PubDictionaries, BioHackathon)