Get the right rules
“That's all well and good in practice, but how does it work in theory?”
– Shmuel Weinberger
In the previous chapter we described in detail the requirements that a piece of software needs to satisfy in order to deliver on its promise of doing what it was made to do. In other words, it is the software developer’s responsibility to make sure that the underlying algorithms are programmed correctly, that the model equations and their solutions are correct and that, in general, the software works as expected. In the case of our modeling pipeline, this translates, for example, to a precise and error-free implementation of the genetic algorithm, as well as to the extra effort of testing that the Boolean models are assigned the desired parameterization and that their attractors are correctly calculated. This procedure is synonymous with verification: the software works in a manner that directly reflects the underlying theories and modeling assumptions. Verification is part of what makes simulation results trustworthy and actionable, in the sense that they can be used to provide solutions to real-world problems, making the respective models valuable for diverse applications in both industrial and clinical contexts.
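To make the verification step concrete, the following minimal sketch (a hypothetical example, not part of the DrugLogics code base) brute-forces the fixpoint attractors (stable states) of an invented two-node Boolean model under synchronous updating and asserts that they match the hand-derived expectation:

import java.util.ArrayList;
import java.util.List;

// Hypothetical verification check: brute-force the fixpoints (stable states)
// of a toy two-node Boolean model and compare them to the hand-derived answer.
// Toy rules (invented for illustration, not an actual CASCADE model):
//   x1' = x1 OR  x2
//   x2' = x1 AND x2
public class AttractorCheck {

    // Apply one synchronous update step to a state (x1, x2).
    static boolean[] step(boolean[] s) {
        return new boolean[] { s[0] || s[1], s[0] && s[1] };
    }

    public static void main(String[] args) {
        List<String> fixpoints = new ArrayList<>();
        // Enumerate all 2^2 states and keep those that map onto themselves.
        for (int i = 0; i < 4; i++) {
            boolean[] s = { (i & 1) != 0, (i & 2) != 0 };
            boolean[] t = step(s);
            if (s[0] == t[0] && s[1] == t[1]) {
                fixpoints.add((s[0] ? "1" : "0") + (s[1] ? "1" : "0"));
            }
        }
        // Hand-derived expectation for these toy rules: 00, 10 and 11 are stable.
        if (!fixpoints.equals(List.of("00", "10", "11"))) {
            throw new AssertionError("unexpected fixpoints: " + fixpoints);
        }
        System.out.println("fixpoints verified: " + fixpoints);
    }
}

Scaled up to real models and update schemes, tests of exactly this shape are what give confidence that the desired parameterization and attractor calculation are implemented correctly.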
It is a totally different matter whether the solutions that the software was made to produce (e.g. the simulation results in the case of modeling software) are of practical use. What good are models whose outputs do not agree with real observations, no matter how skillfully the developer translated the theoretical ideas into software code? All models are wrong, since they are approximate representations of reality, but they should at least be useful, and it is therefore important to be able to pinpoint exactly where this wrongness originates and what it pertains to. If a biological model, for example, cannot reproduce basic observations about the phenotype of the system it simulates (via its respective software), then that model is not useful and probably needs further refinement. This leads to the question of what makes a model better able to match experimental data and become a more faithful approximation of reality. In other words, which aspect of the model needs refinement? The first step towards answering this question is understanding that a model is more than just the code. A model, as explained in a previous chapter, is a set of mathematical rules applied to a list of biological entities (a choice also referred to as the model parameterization). Extra care should therefore be taken to make sure that we have the right equations in our models. This crucial next step is known as the validation of the modeling software, and its basic assumption is that more “right” equations result in a better fit to observations, which leads to better models (models whose behavior corresponds more accurately to reality) and subsequently to better simulation results (predictions about the studied system).
So for every modeling application and its accompanying software, not only do we need to have the rules right (verification), but it is of equal or perhaps even greater significance to have the right rules (validation) [127]. In that respect, the parameterization of the Boolean models in our software, i.e. the choice of logical equations that define the regulatory activity of every target in the underlying cancer signaling network, stands as one of the most important parts of the ensuing modeling process. Given this importance, how can we find the “right” rules or, equivalently, how can we distinguish the rules that are right from those that are not (or are less so), so that we can choose the former for our models? In practice, researchers in the logical modeling community have dealt with this problem by establishing standard logical equation forms to represent the regulatory interactions acting upon a signaling target [128]. With such a foundation for the initial construction of the logical rules, the next step is to tweak them appropriately, by changing logical operators and removing or adding variables (regulators), so that a better match with experimental observations can be achieved. This process of calibrating the rules to fit the respective data can be either the result of manual curation [90,129,130] or the outcome of an automated computational search for optimal logical equations [131,132] (for similar efforts in this thesis see Paper 3). In addition, several other methods convert various input sources to appropriate Boolean rules, e.g. by translating molecular interaction maps directly to Boolean models [133] or user-provided text to suitable logical equations via web-based tools [134].
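To fix notation, one widely used convention along these lines OR-joins the activators and the inhibitors of a target separately and lets the inhibitor group dominate; the exact form is specified in [128], so take the following as an illustrative rendering rather than a verbatim reproduction. For a target with activators a_1, ..., a_m and inhibitors r_1, ..., r_k:

\[
x_{\text{target}} \;=\; \left( a_1 \lor a_2 \lor \dots \lor a_m \right) \land \lnot \left( r_1 \lor r_2 \lor \dots \lor r_k \right)
\]

Under this form the target is active whenever at least one activator is present and no inhibitor is, and tweaking the rule then means changing the operators that join these groups or moving regulators in and out of them.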
Whether it is modelers or computers that refine and produce the final Boolean equations in the respective models, we reasoned that a proper framework to characterize the Boolean functions that constitute the rules of these models is currently missing. The main idea is that, since the mathematical rules are the heart of modeling and we need the rules to be as right as possible in order to validate and further refine our models, a proper toolkit is needed to differentiate and choose between the possible parameterization options. The plethora of potential equations that may be just “right” is a direct consequence of the fact that the number of possible parameterizations increases dramatically with the number of regulators [135]. In addition, a large number of Boolean equations may fit the experimental observations equally well, or the data may simply not be enough to uniquely define the model equations [136]. Either way, fitting the model to match the expected outputs is just one side of the validation process. We need to go deeper than that if we are to reach our goal of finding biologically reasonable and functionally useful rules for our models. To establish a practical framework assisting in the choice of a particular logical model parameterization, we need to search for ways to expand our knowledge of the rules and gain insights into them from different perspectives. Following this logic, there are function properties that Boolean mathematics research has thoroughly studied and which, when brought into the context of modeling, can benefit both modelers and the software applications that specify or calibrate the rules of a logical model [135,137,138]. For example, such properties could be used to investigate whether the equations contradict the structure of the underlying regulatory topology (PKN), or whether biologically important regulators manifest in the equations as proportionally influential on the respective Boolean output. Our efforts constitute a first attempt to compile such a list of Boolean function metrics in a unified framework that could be used to refine the search for the optimal rules to use (Paper 5). The analyses in that paper also show the differences between varying Boolean parameterizations in terms of expected output behavior, and how this information, when known a priori (based on precise mathematical formalization and subsequent calculation), can assist in the choice of better rules for the considered models.
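To put a number on the growth in parameterization space mentioned above: a target with n regulators has 2^n truth-table rows and therefore 2^(2^n) candidate Boolean functions, so already at n = 5 there are more than four billion options. A short illustrative snippet printing these counts:

import java.math.BigInteger;

// Illustrative count of candidate Boolean functions per number of regulators:
// n regulators give 2^n truth-table rows, hence 2^(2^n) possible functions.
public class FunctionCount {
    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            BigInteger rows = BigInteger.TWO.pow(n);           // 2^n rows
            BigInteger functions = BigInteger.TWO.pow(1 << n); // 2^(2^n) functions
            System.out.printf("n=%d regulators: %s rows, %s possible functions%n",
                    n, rows, functions);
        }
    }
}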
While establishing a framework to help identify the right rules for a Boolean model, we leveraged the benefits of variation in model parameterization (Figure 7). Exploring the effects of varying the values of model parameters enables the identification of the variables that have the largest influence on the behavior of the studied systems and can provide mechanistic insights to explain several phenomena [75]. For example, in Eduati et al. (2017) [86], the authors study how logical model parameters relate to cellular sensitivity to anti-cancer drugs. Simply put, by tweaking a biologically meaningful parameter in their model (the responsiveness of the GSK3 signaling node), they could explain why some cancer cell lines are sensitive to a drug combination involving a MEK and a GSK3 inhibitor while others are resistant. This computational finding was supported by experimental evidence, thereby providing a proof of concept of how the investigation of model parameterization can suggest new mechanisms for the manifestation of particular drug synergies. On a more theoretical front, in Abou-Jaoudé et al. (2019) [139], the authors formally describe the concept of logical bifurcation diagrams, a framework to assess how changes in the logical parameters alter the dynamics of logical models. This methodology was used to display the attractors of the simple p53-Mdm2 signaling network as a function of the degradation rate of the ubiquitin ligase Mdm2 in the nucleus, allowing for a more concise characterization of its main dynamical properties [140].
Such approaches inspired us to explore the space of Boolean model parameterization pertaining to the equations used to construct and mutate the models in the genetic algorithm of Gitsbe (Paper 3). We developed a Java software package that can produce either a random sample or all possible Boolean models, based on a standardized equation form [128] and its most basic variation (Figure 7). This package, abmlog, is now also part of the DrugLogics software suite and was used in the analyses of Paper 5 to show how the output behavior of the two alternative parameterization options of the Gitsbe models varies with the number of regulators in the respective equations and with the ratio of positive to negative regulators (as these were defined in the original PKN). Moreover, a large number of CASCADE-based Boolean models were produced using the abmlog package to explore parameterization variation.
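As a sketch of what exhaustive generation amounts to, assume (as in the illustrative form given earlier) that each regulated node's equation uses either the standard “(activators) and not (inhibitors)” link or the variation “(activators) or not (inhibitors)”; a model is then a choice of one link operator per regulated node, giving 2^m models for m such nodes. The following toy enumeration illustrates the idea and is not the actual abmlog implementation:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch in the spirit of abmlog (not its actual code): each
// regulated node uses either "(activators) and not (inhibitors)" or the
// variation "(activators) or not (inhibitors)", so m nodes yield 2^m models.
public class EnumerateModels {

    static String equation(String target, String activators, String inhibitors,
                           boolean andNot) {
        String link = andNot ? "and not" : "or not";
        return target + " = (" + activators + ") " + link + " (" + inhibitors + ")";
    }

    public static void main(String[] args) {
        // Toy network: node name, OR-ed activators, OR-ed inhibitors.
        String[][] nodes = {
            { "C", "A or B", "D" },
            { "D", "C",      "A" },
        };
        int m = nodes.length;
        // Each bit of the mask selects the link operator of one node.
        for (int mask = 0; mask < (1 << m); mask++) {
            List<String> model = new ArrayList<>();
            for (int i = 0; i < m; i++) {
                boolean andNot = ((mask >> i) & 1) == 0;
                model.add(equation(nodes[i][0], nodes[i][1], nodes[i][2], andNot));
            }
            System.out.println("model " + mask + ": " + model);
        }
    }
}

For a CASCADE-scale network with many regulated nodes this space quickly becomes too large to enumerate exhaustively, which is where the random sampling mode mentioned above becomes necessary.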
Exploiting the dimension reduction and visualization method UMAP [141], we constructed several Boolean model maps and were able to visualize model differences such as fitness to the training data and prediction performance. Further analysis identified important nodes that drive the change of dynamics (the number of attractors in the CASCADE Boolean models) and whose parameterization could be used to visually separate the UMAP-embedded models into distinct clusters.
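For intuition on how such model maps can be built (a hypothetical sketch, not the analysis code of Paper 5): each model's parameterization can be encoded as a binary vector with one coordinate per regulated node, e.g. 1 for the “and not” link operator and 0 for “or not”, and the resulting matrix of model vectors is the kind of input that a dimension-reduction method such as UMAP embeds into two dimensions:

import java.util.Arrays;

// Hypothetical encoding of model parameterizations as binary feature vectors:
// one row per model, one column per regulated node (1 = "and not", 0 = "or not").
// A matrix like this is what a method such as UMAP would embed into 2D.
public class ModelVectors {
    public static void main(String[] args) {
        int m = 3;            // regulated nodes per model (toy size)
        int models = 1 << m;  // all 2^m parameterizations
        int[][] features = new int[models][m];
        for (int mask = 0; mask < models; mask++) {
            for (int i = 0; i < m; i++) {
                features[mask][i] = (mask >> i) & 1; // link operator of node i
            }
            System.out.println(Arrays.toString(features[mask]));
        }
    }
}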