R/single-sample-scoring-methods.R
eigenWeightedMean.Rd
Weights for the genes in x are calculated as the percentage each gene
contributes to the principal component indicated by eigengene.
eigenWeightedMean(
x,
eigengene = 1L,
center = TRUE,
scale = TRUE,
uncenter = center,
unscale = scale,
retx = FALSE,
weights = NULL,
normalize = FALSE,
all.x = NULL,
...,
.drop.sd = 1e-04
)
x: An expression matrix of genes x samples. When using this to score
geneset activity, you want to reduce the rows of x to be only the
genes from the given geneset.
eigengene: The principal component (PC) from which the gene weights are extracted.
center, scale: Center and/or scale the data before scoring?
uncenter, unscale: Uncenter and unscale the data on the way out?
Defaults to the respective values of center and scale.
retx: Works the same as retx from prcomp. If TRUE, will return a
ret$pca$x matrix that has the rotated variables.
weights: A user can pass in a prespecified set of weights using a named
numeric vector. The names must be a superset of rownames(x). If
this is NULL, we calculate the "eigenweights".
normalize: If TRUE, each score is normalized to a randomly
selected geneset score. The size of the randomly selected geneset is
the same as the corresponding geneset. This only works with the "ewm"
method when unscale and uncenter are TRUE. By default, this is
set to FALSE, and normalization does not happen. Instead of
passing in TRUE, the user can pass in a vector of gene names
(identifiers) to be considered for random geneset creation. If no
genes are provided, then all genes in all.x are fair game.
all.x: If the user is trying to normalize these scores, an expression
matrix that contains a superset of the control genes needs to be provided,
where the columns of all.x must correspond to those in x.
...: These are not used here.
.drop.sd: When zero-sd (non-varying) features are scaled, their values
are NaN. Features with rowSds below this threshold (default 1e-4) are
identified, and their scaled values are set to 0.
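The zero-sd handling described for .drop.sd can be illustrated with a minimal base R sketch. This is not the package's internal code; the toy matrix and the local name drop.sd are assumptions made for illustration only.

```r
# Toy genes x samples matrix: one varying gene, one constant gene
x <- rbind(varying = c(1, 2, 3, 4), constant = c(5, 5, 5, 5))
drop.sd <- 1e-4  # hypothetical stand-in for the .drop.sd threshold

# Per-gene standard deviations across samples
sds <- apply(x, 1, sd)

# Row-wise z-transform; scale() works on columns, hence the transposes
z <- t(scale(t(x)))
nan.before <- any(is.nan(z["constant", ]))  # 0 / 0 yields NaN for the constant gene

# Set scaled values of (near) zero-variance genes to 0
z[sds < drop.sd, ] <- 0
```

After the last step the constant gene contributes nothing to any weighted score instead of propagating NaN into it.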
A list of useful transformation information. The caller is likely
most interested in the $score vector, but other bits related to
the SVD/PCA decomposition are included as well.
You will generally want the rows of the gene x sample matrix x to be
z-transformed. If they are not already, ensure that center and scale
are set to TRUE.
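As a rough illustration of how z-transformed rows and PC-derived gene weights fit together, here is a minimal sketch in base R. It assumes that a gene's "percent contribution" to a principal component can be read off its squared loading; the package's own computation may differ in detail, and the toy data and variable names are hypothetical.

```r
set.seed(42)
# Toy genes x samples matrix (hypothetical data): 5 genes, 8 samples
x <- matrix(rnorm(40), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("s", 1:8)))
eigengene <- 1L  # which principal component to take the weights from

# prcomp() expects observations in rows, so pass samples x genes
pca <- prcomp(t(x), center = TRUE, scale. = TRUE)

# Columns of `rotation` have unit length, so squared loadings sum to 1
# and can be read as each gene's fractional contribution to the PC
loadings <- pca$rotation[, eigengene]
w <- loadings^2 / sum(loadings^2)

# z-transform each gene (row), then take the weighted mean per sample
z <- t(scale(t(x)))
score <- colSums(z * w)  # w recycles down the rows, one weight per gene
```

Because the weights sum to 1, colSums(z * w) is a weighted mean of the z-scored genes, giving one score per sample.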
When uncenter and/or unscale are FALSE, the scores are computed on
the centered and/or scaled values, respectively.
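The difference between scoring on transformed values and scoring on the original scale can be sketched in base R as follows. The weights w are hypothetical, and reversing the transform as z * sigma + mu is an assumption about what "uncenter" and "unscale" amount to, not a copy of the package's code.

```r
# Toy genes x samples matrix and hypothetical gene weights summing to 1
x <- rbind(g1 = c(2, 4, 6, 8), g2 = c(10, 10, 12, 16), g3 = c(1, 3, 2, 6))
w <- c(g1 = 0.5, g2 = 0.3, g3 = 0.2)

mu <- rowMeans(x)          # per-gene mean
sigma <- apply(x, 1, sd)   # per-gene standard deviation
z <- (x - mu) / sigma      # center and scale each gene (row)

# Scores on the centered and scaled values
# (analogous to uncenter = FALSE, unscale = FALSE)
score.z <- colSums(z * w)

# Undo the scaling, then the centering, before scoring
# (analogous to uncenter = TRUE, unscale = TRUE)
x.back <- z * sigma + mu
score.orig <- colSums(x.back * w)
```

Scores computed on z-scored values are centered around zero across samples, while scores computed after reversing the transform live on the scale of the original expression values.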
Scores can be normalized against a set of control genes. This results in negative and positive sample scores. Positive scores are ones where the specific geneset score is higher than the aggregate control-geneset score.
Genes used for the control set can either be randomly sampled from the
rows of the all.x expression matrix (when normalize = TRUE), or
explicitly specified by a character vector of row identifiers passed to
the normalize parameter. In both cases, the code prefers a random control
geneset of the same size as nrow(x). If that's not possible, it uses as
many genes as it can get.
Note that normalization requires an expression matrix to be passed into
the all.x parameter, whose columns match 1:1 to the columns in x.
Calling scoreSingleSamples() with method = "ewm", normalize = TRUE
handles this transparently.
The idea for this method of normalization was inspired by the ctrl.score
normalization found in Seurat's AddModuleScore() function.
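The control-geneset normalization described above can be sketched in base R. This is a simplified illustration under stated assumptions: plain column means stand in for the eigen-weighted scores, the control score is subtracted from the geneset score (in the spirit of a ctrl.score-style normalization), and the control pool is taken to be every gene outside the geneset. All names and data here are hypothetical.

```r
set.seed(1)
# Toy "all genes" matrix: 20 genes x 10 samples
all.x <- matrix(rnorm(200), nrow = 20,
                dimnames = list(paste0("g", 1:20), paste0("s", 1:10)))
geneset <- paste0("g", 1:5)
x <- all.x[geneset, ]  # columns of x match those of all.x 1:1

# Candidate control genes (assumption: everything outside the geneset)
pool <- setdiff(rownames(all.x), geneset)

# Prefer a control set as large as the geneset; otherwise take what is available
n.ctrl <- min(nrow(x), length(pool))
ctrl <- sample(pool, n.ctrl)

# Simple per-sample scores (stand-ins for the weighted scores)
gs.score <- colMeans(x)
ctrl.score <- colMeans(all.x[ctrl, , drop = FALSE])

# Positive values: geneset scores higher than the control set in that sample
norm.score <- gs.score - ctrl.score
```

Passing a character vector of gene identifiers instead of sampling from the full pool corresponds to replacing pool with that vector.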
scoreSingleSamples
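## Load the example expression data (voom-transformed) and match the
## gene set database to it
vm <- exampleExpressionSet(do.voom=TRUE)
gdb <- conform(exampleGeneSetDb(), vm)
## Row indices of vm that belong to one gene set
features <- featureIds(gdb, 'c2', 'BURTON_ADIPOGENESIS_PEAK_AT_2HR',
                       value='x.idx')
## Score that single gene set directly
scores <- eigenWeightedMean(vm[features,])$score
## Use scoreSingleSamples to facilitate scoring of all gene sets
scores.all <- scoreSingleSamples(gdb, vm, 'ewm')
s2 <- with(subset(scores.all, name == 'BURTON_ADIPOGENESIS_PEAK_AT_2HR'),
           setNames(score, sample_id))
## Both routes should give the same per-sample scores
all.equal(s2, scores)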
#> [1] TRUE