This method was developed by Jason Hackney and first introduced in doi:10.1038/ng.3520. It produces a single-sample gene set score whose values are in "expression space." Internally, it works much like an eigengene-based score.

To score a number of gene sets across an experiment with this method, have the scoreSingleSamples() function drive it by specifying "gsd" as one of the methods.

gsdScore(
  x,
  eigengene = 1L,
  center = TRUE,
  scale = TRUE,
  uncenter = center,
  unscale = scale,
  retx = FALSE,
  ...,
  .use_irlba = FALSE,
  .drop.sd = 1e-04
)

Arguments

x

An expression matrix of genes x samples. When using this to score geneset activity, you want to reduce the rows of x to be only the genes from the given gene set.

eigengene

The "eigengene" you want to get the score for. Only a single value is accepted for now.

center, scale

Should the data be centered and/or scaled before scoring?

uncenter, unscale

Should the data be uncentered and unscaled on the way out? Defaults to the respective values of center and scale.

retx

Works the same as retx from prcomp. If TRUE, will return a ret$pca$x matrix that has the rotated variables.

...

Not used by this function.

.use_irlba

When TRUE, uses irlba::svdr() instead of base::svd(). Default: FALSE.

.drop.sd

When zero-sd (non-varying) features are scaled, their values are NaN. Features with rowSds below this threshold (default 1e-4) are identified, and their scaled values are set to 0.
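The .drop.sd behavior described above can be illustrated with a small base R sketch. This is not the package's internal code: the toy matrix and the variable names (sdev, xs_raw, xs, low_var) are made up for illustration, and only the 1e-4 threshold comes from the documented default.

```r
# Toy genes x samples matrix (hypothetical data): one varying gene,
# one gene that is constant across samples
x <- rbind(varying  = c(1, 2, 3, 4),
           constant = c(5, 5, 5, 5))

# Row-wise standard deviations; the constant gene has sd == 0
sdev <- apply(x, 1, sd)

# Row-wise center and scale; the constant row becomes 0 / 0 = NaN
xs_raw <- (x - rowMeans(x)) / sdev

# Identify low-variance rows and set their scaled values to 0
low_var <- sdev < 1e-4
xs <- xs_raw
xs[low_var, ] <- 0
```

Without the final step, the NaN values in the constant row would propagate into any downstream decomposition of the scaled matrix.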

Value

A list of useful transformation information. The caller is likely most interested in the $score vector, but other bits related to the SVD/PCA decomposition come along for the ride.

Details

This method differs from the eigengene score in that the SVD is used to calculate the eigengene. The eigengene vector (one value per sample) is then multiplied through by the SVD's left matrix. This produces a matrix whose colSums give a single per-sample score for the gene set.
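The steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the implementation in gsdScore(): the toy data and all variable names (mu, sdev, xs, eg, recon, score) are hypothetical, the centering and scaling are done per gene (row), and the final summary uses colSums only because that is what the text describes.

```r
set.seed(1)

# Toy genes x samples matrix (hypothetical data: 5 genes, 6 samples)
x <- matrix(rnorm(30, mean = 8), nrow = 5,
            dimnames = list(paste0("g", 1:5), paste0("s", 1:6)))

# Center and scale each gene (row)
mu <- rowMeans(x)
sdev <- apply(x, 1, sd)
xs <- (x - mu) / sdev

# SVD of the scaled matrix: xs = U %*% diag(d) %*% t(V)
s <- svd(xs)
k <- 1L  # component to keep, analogous to the eigengene argument

# Eigengene for component k: one value per sample
eg <- s$d[k] * s$v[, k]

# Multiply through by the left singular vector: a rank-1 genes x samples matrix
recon <- outer(s$u[, k], eg)
dimnames(recon) <- dimnames(x)

# Undo the scaling and centering to get back to expression space
recon <- recon * sdev + mu

# Collapse over genes to get one score per sample
score <- colSums(recon)
```

Because the left and right singular vectors enter as a product, flipping the sign of both (the usual SVD ambiguity) leaves recon, and therefore score, unchanged.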

Why do all of that? The scores end up back "in expression space," and we also sidestep the problem of the sign of the eigenvector. The scores you get are very similar to the average z-scores of the genes per sample, where the average is weighted by the degree to which each gene contributes to the principal component chosen by eigengene, as implemented in the eigenWeightedMean() function.

The core functionality provided here is taken from the soon-to-be-released GSDecon package by Jason Hackney.

Examples

## Score a single gene set: subset the expression data to its genes first
vm <- exampleExpressionSet(do.voom=TRUE)
gdb <- conform(exampleGeneSetDb(), vm)
features <- featureIds(gdb, "c2", "BURTON_ADIPOGENESIS_PEAK_AT_2HR")
scores <- gsdScore(vm[features, ])$score

## Use scoreSingleSamples to facilitate scoring of all gene sets
scores.all <- scoreSingleSamples(gdb, vm, 'gsd')
s2 <- with(subset(scores.all, name == 'BURTON_ADIPOGENESIS_PEAK_AT_2HR'),
           setNames(score, sample_id))
all.equal(s2, scores)
#> [1] TRUE