Source: R/single-sample-scoring-methods.R (documentation: gsdScore.Rd)
This method was developed by Jason Hackney and first introduced in doi:10.1038/ng.3520. It produces a single-sample gene set score whose values are in "expression space"; internally, it closely resembles an eigengene-based score.
To score a number of gene sets across an experiment, let the scoreSingleSamples() method drive this function by specifying "gsd" as one of its methods.
gsdScore(
x,
eigengene = 1L,
center = TRUE,
scale = TRUE,
uncenter = center,
unscale = scale,
retx = FALSE,
...,
.use_irlba = FALSE,
.drop.sd = 1e-04
)
x: An expression matrix of genes x samples. When using this to score gene set activity, subset the rows of x to only the genes in the given gene set.
eigengene: The "eigengene" (principal component) to compute the score for. Only a single value is accepted for now.
center, scale: Whether to center and/or scale the data before scoring.
uncenter, unscale: Whether to uncenter and/or unscale the data on the way out. Default to the values of center and scale, respectively.
retx: Works the same as retx from prcomp(). If TRUE, the result includes a ret$pca$x matrix containing the rotated variables.
...: Not used.
.use_irlba: When TRUE, uses irlba::svdr() instead of base::svd(). Default: FALSE.
.drop.sd: Scaling a zero-sd (non-varying) feature produces NaN values. Features whose rowSds fall below this threshold (default 1e-4) are identified, and their scaled values are set to 0.
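To make the .drop.sd behaviour concrete, here is a small base R sketch of row-wise scaling with a zero-variance guard. The helper name scale_rows_safely() is hypothetical and not part of the package; it only assumes that genes are rows and that the threshold is compared against each row's standard deviation, as described above.

```r
# Hypothetical helper (not a package function): z-scale each gene (row) and
# zero out rows whose standard deviation is below `drop.sd`.
scale_rows_safely <- function(x, center = TRUE, scale = TRUE, drop.sd = 1e-4) {
  # base::scale() works column-wise, so transpose to scale each row
  xs <- t(base::scale(t(x), center = center, scale = scale))
  if (scale) {
    sds <- apply(x, 1, sd)
    # Constant rows become 0/0 = NaN after scaling; replace them with 0
    xs[sds < drop.sd, ] <- 0
  }
  xs
}

x <- rbind(gene1 = c(1, 2, 3, 4),
           gene2 = c(5, 5, 5, 5),   # constant row: sd is 0
           gene3 = c(2, 4, 6, 8))
xs <- scale_rows_safely(x)
xs["gene2", ]   # all zeros instead of NaN
```

Without the guard, the gene2 row would be NaN and would propagate NaN into any downstream SVD.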
A list of useful transformation information. The caller is most likely interested in the $score vector, but other components related to the SVD/PCA decomposition are included along for the ride.
This method differs from the eigengene score as follows: the SVD is used to calculate the eigengene, and the resulting vector of eigengene values (one score per sample) is then multiplied through by the corresponding column of the SVD's left matrix. This produces a genes x samples matrix, whose colSums give a single-sample score for the gene set.
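The steps in the paragraph above can be sketched in base R as follows. This is an illustrative reconstruction from the description, not the package's implementation: gsd_score_sketch() is a hypothetical name, it assumes every gene has non-zero variance (no .drop.sd handling), always centers, scales, uncenters and unscales, and uses base::svd().

```r
# Hypothetical sketch of the scoring steps described above (not package code).
gsd_score_sketch <- function(x, eigengene = 1L) {
  mu  <- rowMeans(x)
  sds <- apply(x, 1, sd)
  # Center and scale each gene (row); assumes no zero-variance rows
  xs <- (x - mu) / sds
  s <- svd(xs)
  k <- eigengene
  # Eigengene: one value per sample (right singular vector times its d)
  eg <- s$d[k] * s$v[, k]
  # Multiply back through the left singular vector: rank-1 genes x samples
  recon <- s$u[, k] %o% eg
  # Return to expression space by undoing the scaling, then the centering
  recon <- recon * sds + mu
  score <- colSums(recon)
  names(score) <- colnames(x)
  list(score = score, eigengene = eg)
}

set.seed(1)
x <- matrix(rnorm(20 * 6, mean = 8), nrow = 20,
            dimnames = list(paste0("g", 1:20), paste0("s", 1:6)))
res <- gsd_score_sketch(x)
res$score
```

The reconstruction step is what distinguishes this from reporting the eigengene directly: each sample's score is built from gene-level values rather than from a coordinate along the component.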
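Why do all of that? The scores are back "in expression space", and the procedure sidesteps the sign ambiguity of the eigenvector: the rank-one product of the left singular vector, singular value, and right singular vector is unchanged when the signs of both singular vectors are flipped together, whereas the eigengene on its own is not. The resulting scores are very similar to the average z-score of the genes in each sample, where the average is weighted by how much each gene contributes to the principal component chosen by eigengene, as implemented in the eigenWeightedMean() function.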
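The sign argument can be checked numerically with base R alone. The sketch below uses random data (an assumption purely for illustration) and compares the rank-one reconstruction before and after flipping the signs of both singular vectors, which is the ambiguity an SVD routine is free to resolve either way.

```r
# Illustrative check on random data: the SVD's sign choice changes the
# eigengene but not the rank-one reconstruction built from it.
set.seed(2)
xs <- t(scale(t(matrix(rnorm(30), nrow = 5))))  # 5 genes x 6 samples, row-scaled
s <- svd(xs)

eg <- s$d[1] * s$v[, 1]                 # eigengene as returned
recon <- s$u[, 1] %o% eg                # genes x samples reconstruction

# Equally valid SVD: flip the signs of both u and v for component 1
eg_flipped <- s$d[1] * -s$v[, 1]
recon_flipped <- (-s$u[, 1]) %o% eg_flipped

identical(recon, recon_flipped)         # reconstruction is unchanged
isTRUE(all.equal(eg, eg_flipped))       # but the eigengene flips sign
```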
The core functionality provided here is taken from the soon-to-be-released GSDecon package by Jason Hackney.
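## Voom-transformed example data and a GeneSetDb conformed to it
vm <- exampleExpressionSet(do.voom=TRUE)
gdb <- conform(exampleGeneSetDb(), vm)
## Score a single gene set by restricting the rows of vm to its genes
features <- featureIds(gdb, "c2", "BURTON_ADIPOGENESIS_PEAK_AT_2HR")
scores <- gsdScore(vm[features,])$score
## Use scoreSingleSamples to facilitate scoring of all gene sets
scores.all <- scoreSingleSamples(gdb, vm, 'gsd')
## Extract the same gene set's scores as a vector named by sample
s2 <- with(subset(scores.all, name == 'BURTON_ADIPOGENESIS_PEAK_AT_2HR'),
           setNames(score, sample_id))
## Both routes give the same scores
all.equal(s2, scores)
#> [1] TRUE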