Use this function to perform a full biomarker analysis on an ensemble boolean model dataset, where the models are classified by their Matthews correlation coefficient (MCC) score. The analysis identifies performance biomarkers: nodes whose activity and/or boolean model parameterization (link operator) affects the prediction performance of the models, as measured by the MCC score.
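To make the MCC-based classification concrete, the following minimal sketch scores three toy models against a set of observed synergies. The helper `mcc_score`, the drug combination names, and the layout of the toy prediction matrix (models in rows, drug combinations in columns) are assumptions made for illustration and are not taken from the package. Returning `NA` when the MCC denominator is zero is also a choice of this sketch only.

```r
# Hypothetical helper (not part of the documented API): MCC of one model's
# binary predictions against the observed synergies.
mcc_score <- function(predicted, observed) {
  tp <- sum(predicted & observed)    # synergies predicted and observed
  tn <- sum(!predicted & !observed)  # non-synergies correctly not predicted
  fp <- sum(predicted & !observed)   # predicted but not observed
  fn <- sum(!predicted & observed)   # observed but not predicted
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (denom == 0) return(NA_real_)   # undefined case; convention assumed here
  (tp * tn - fp * fn) / denom
}

# Toy data (assumed layout: one row per model, one column per drug combination)
tested <- c("A-B", "A-C", "B-C", "B-D")
observed.synergies <- c("A-B", "B-D")
observed <- tested %in% observed.synergies

predictions <- rbind(
  model1 = c(1, 1, 0, 0),
  model2 = c(1, 0, 0, 1),
  model3 = c(0, 1, 1, 0)
)
colnames(predictions) <- tested

# One MCC score per model, each in the [-1,1] interval
models.mcc <- apply(predictions == 1, 1, mcc_score, observed = observed)
print(models.mcc)
```

In this toy case the second model predicts exactly the observed synergies and the third predicts exactly the opposite, so the scores span the whole [-1,1] range.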
biomarker_mcc_analysis(model.predictions, models.stable.state, models.link.operator = NULL, observed.synergies, threshold, num.of.mcc.classes = 5, penalty = 0.1)
model.predictions | a |
---|---|
models.stable.state | a |
models.link.operator | a |
observed.synergies | a character vector whose elements are the names of the drug combinations that were found to be synergistic. This should be a subset of the tested drug combinations, that is the column names of the |
threshold | numeric. A number in the [0,1] interval, above which (or below its negative value) a biomarker will be registered in the returned result. Values closer to 1 translate to a stricter threshold and thus fewer biomarkers are found. |
num.of.mcc.classes | numeric. A positive integer larger than 2 that signifies the number of MCC classes (groups) into which the models' MCC values are split. Default value: 5. |
penalty | value between 0 and 1 (inclusive). A value of 0 means no penalty and a value of 1 is the strictest possible penalty. Default value is 0.1. This penalty is used as part of a weighted term applied to the difference in a value of interest (e.g. activity or link operator difference) between two groups of models, to account for the difference in the number of models in each group. |
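The interplay between `penalty` and `threshold` can be sketched as follows. The exact weighting used by the function is not given here, so the form `(min(n1, n2) / max(n1, n2))^penalty` is an assumption chosen only because it matches the description (a penalty of 0 leaves the difference unchanged, larger penalties shrink it when the group sizes differ). The helper name `weighted_diff` and the toy numbers are hypothetical.

```r
# Hypothetical helper: difference in average node activity between a
# better-performing and a worse-performing group of models, down-weighted
# when the two groups have very different sizes.
# ASSUMPTION: the weight form below is illustrative, not the package's code.
weighted_diff <- function(avg.good, avg.bad, n.good, n.bad, penalty = 0.1) {
  stopifnot(penalty >= 0, penalty <= 1)
  weight <- (min(n.good, n.bad) / max(n.good, n.bad))^penalty
  (avg.good - avg.bad) * weight
}

# Toy average activities per node in each group
avg.bad  <- c(A = 0.10, B = 0.90, C = 0.50)
avg.good <- c(A = 0.95, B = 0.05, C = 0.55)

# 5 models in the better group vs 50 in the worse group
diff.no.penalty <- weighted_diff(avg.good, avg.bad, 5, 50, penalty = 0)
diff.penalized  <- weighted_diff(avg.good, avg.bad, 5, 50, penalty = 1)

# Register biomarkers above the threshold or below its negative value
threshold <- 0.7
active    <- names(diff.no.penalty)[diff.no.penalty > threshold]
inhibited <- names(diff.no.penalty)[diff.no.penalty < -threshold]

print(active)
print(inhibited)
print(diff.penalized)
```

With no penalty, node `A` passes the threshold as an active biomarker and node `B` as an inhibited one. With the strictest penalty the unbalanced group sizes shrink every difference below the threshold, so no biomarker would be registered.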
a list with various elements:
predicted.synergies
: a character vector of the synergies (drug
combination names) that were predicted by at least one of the models
in the dataset.
models.mcc
: a numeric vector of MCC scores, one for each model.
Values are in the [-1,1] interval.
diff.state.mcc.mat
: a matrix whose rows are vectors of
average node activity state differences between two groups of models where
the classification was based on the MCC score of each model and was
found using an optimal univariate k-means clustering method
(Ckmeans.1d.dp).
Rows represent the different classification group matchings, e.g. (1,2)
means the models that were classified into the first MCC class vs the models
that were classified in the 2nd class (higher is better). The columns
represent the network's node names. Values are in the [-1,1] interval.
biomarkers.mcc.active
: a character vector whose elements are
the names of the active state biomarkers. These nodes appear more
active in the better performance models.
biomarkers.mcc.inhibited
: a character vector whose elements are
the names of the inhibited state biomarkers. These nodes appear more
inhibited in the better performance models.
diff.link.mcc.mat
: a matrix whose rows are vectors of
average node link operator differences between two groups of models where
the classification was based on the MCC score of each model and was
found using an optimal univariate k-means clustering method
(Ckmeans.1d.dp).
Rows represent the different classification group matchings, e.g. (1,2)
means the models that were classified into the first MCC class vs the models
that were classified in the 2nd class (higher is better).
The columns represent the network's node names. Values are in the [-1,1] interval.
biomarkers.mcc.or
: a character vector whose elements are
the names of the OR link operator biomarkers. These nodes have
mostly the OR link operator in their respective boolean equations
in the better performance models.
biomarkers.mcc.and
: a character vector whose elements are
the names of the AND link operator biomarkers. These nodes have
mostly the AND link operator in their respective boolean equations
in the better performance models.
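The structure of the two difference matrices described above can be illustrated with a small base R sketch. It assumes the MCC class of each model is already known (the function itself derives the classes with Ckmeans.1d.dp, which is not reproduced here) and that each row is computed as the higher class average minus the lower class average. The toy stable states, class labels, and row naming are hypothetical.

```r
# Toy stable states: one row per model, one column per node (assumed layout)
states <- rbind(
  m1 = c(A = 0, B = 1),
  m2 = c(A = 0, B = 1),
  m3 = c(A = 1, B = 1),
  m4 = c(A = 1, B = 0),
  m5 = c(A = 1, B = 0)
)

# ASSUMPTION: class labels given directly; higher class = better MCC
mcc.class <- c(1, 1, 2, 3, 3)

# Average node activity per MCC class (rows are classes, columns are nodes)
class.means <- rowsum(states, mcc.class) / as.vector(table(mcc.class))

# All class matchings (1,2), (1,3), (2,3)
class.pairs <- combn(sort(unique(mcc.class)), 2)

# One row per matching: higher class average minus lower class average
diff.mat <- t(apply(class.pairs, 2, function(p) {
  class.means[as.character(p[2]), ] - class.means[as.character(p[1]), ]
}))
rownames(diff.mat) <- apply(class.pairs, 2, function(p) {
  paste0("(", p[1], ",", p[2], ")")
})

print(diff.mat)
```

Each row holds values in the [-1,1] interval, one per node. In this toy case node `A` is more active and node `B` more inhibited in the best class compared with the worst, which is the pattern the active and inhibited biomarker vectors are meant to capture.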
Other general analysis functions:
biomarker_synergy_analysis(), biomarker_tp_analysis()