Single sample gene set score by a weighted average of the genes in geneset

Weights for the genes in `x` are calculated as the percentage that each gene contributes to the principal component indicated by `eigengene`.
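A minimal sketch of how such "eigenweights" could be derived in base R follows. It assumes that a gene's "percent contribution" is its squared loading on the chosen PC divided by the sum of squared loadings. The helper `eigen_weights()` is hypothetical and is not the package's own implementation.

```r
# Hypothetical helper (not the package's code): eigenweights for a genes x samples matrix
eigen_weights <- function(x, eigengene = 1L, center = TRUE, scale = TRUE) {
  # prcomp() treats rows as observations, so transpose to samples x genes
  pca <- prcomp(t(x), center = center, scale. = scale)
  loadings <- pca$rotation[, eigengene]
  # Squaring removes the arbitrary sign of a PC; dividing by the total turns
  # the squared loadings into each gene's fractional contribution (sums to 1)
  loadings^2 / sum(loadings^2)
}

set.seed(1)
x <- matrix(rnorm(50), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("s", 1:10)))
w <- eigen_weights(x)
round(100 * w, 1)  # percent contribution of each gene to PC1
```

Squaring the loadings makes the weights non-negative and independent of the sign that `prcomp()` happens to pick for the component.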

eigenWeightedMean(
  x,
  eigengene = 1L,
  center = TRUE,
  scale = TRUE,
  uncenter = center,
  unscale = scale,
  retx = FALSE,
  weights = NULL,
  normalize = FALSE,
  all.x = NULL,
  ...,
  .drop.sd = 1e-04
)

Arguments

x	An expression matrix of genes x samples. When using this to score gene set activity, reduce the rows of `x` to only the genes in the given gene set.
eigengene	The principal component (PC) from which the gene weights are extracted.
center	center and/or scale data before scoring?
scale	center and/or scale data before scoring?
uncenter	Uncenter the data on the way out? Defaults to the value of `center`.
unscale	Unscale the data on the way out? Defaults to the value of `scale`.
retx	Works the same as `retx` from `prcomp`. If `TRUE`, the returned list includes a `ret$pca$x` matrix containing the rotated variables.
weights	A prespecified set of weights, passed as a named numeric vector. The names must be a superset of `rownames(x)`. If this is `NULL`, the "eigenweights" are calculated.
normalize	If `TRUE`, each score is normalized against the score of a randomly selected gene set of the same size as the corresponding gene set. This only works with the "ewm" method when `unscale` and `uncenter` are `TRUE`. By default this is `FALSE`, and no normalization is performed. Instead of `TRUE`, the user can pass a vector of gene names (identifiers) to be considered for random gene set creation. If no genes are provided, all genes in `all.x` are eligible.
all.x	When normalizing scores, an expression matrix containing a superset of the control genes must be provided here. The columns of `all.x` must correspond to those in `x`.
...	Not used.
.drop.sd	Scaling zero-sd (non-varying) features produces `NaN` values. Features with `rowSds` below this threshold (default 1e-4) are identified, and their scaled values are set to 0.
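The `.drop.sd` rule can be illustrated with a small base R sketch that z-transforms rows and zeroes out rows whose standard deviation falls below the threshold. The helper `scale_rows_safely()` is hypothetical and only mirrors the behavior described above.

```r
# Hypothetical helper illustrating the .drop.sd rule; not the package's code
scale_rows_safely <- function(x, drop.sd = 1e-4) {
  mu <- rowMeans(x)
  sds <- apply(x, 1, sd)
  # mu and sds have one entry per row, so R recycles them down each column
  xs <- (x - mu) / sds
  # constant rows give 0/0 = NaN; rows below the threshold are treated as
  # non-varying and their scaled values are set to 0
  xs[sds < drop.sd, ] <- 0
  xs
}

x <- rbind(varying = c(1, 2, 3, 4), constant = c(5, 5, 5, 5))
scale_rows_safely(x)
```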

Value

A list of transformation information. The caller is most likely interested in the `$score` vector; other elements related to the SVD/PCA decomposition are included as well.

Details

You will generally want the rows of the gene x sample matrix `x` to be z-transformed. If they are not already, ensure that `center` and `scale` are set to `TRUE`.

When `uncenter` and/or `unscale` are `FALSE`, the scores are computed on the centered and/or scaled values, respectively.
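For the case where `center` and `scale` are `TRUE` while `uncenter` and `unscale` are `FALSE`, the score can be sketched as a weighted mean of the z-transformed genes. This assumes the squared-loading definition of the eigenweights. The helper `ewm_score_sketch()` is hypothetical, and the step that maps scores back to the original scale is deliberately omitted.

```r
# Minimal sketch (hypothetical helper, not the package's code) of an
# eigenweighted score with center = scale = TRUE and uncenter = unscale = FALSE
ewm_score_sketch <- function(x, eigengene = 1L) {
  # z-transform each gene (row) across samples; scale() works on columns
  xs <- t(scale(t(x)))
  # eigenweights from the PCA of the already z-transformed data
  rot <- prcomp(t(xs), center = FALSE, scale. = FALSE)$rotation[, eigengene]
  w <- rot^2 / sum(rot^2)
  # weighted mean per sample: w recycles down each column, weights sum to 1
  colSums(xs * w)
}

set.seed(2)
x <- matrix(rnorm(48), nrow = 6,
            dimnames = list(paste0("gene", 1:6), paste0("s", 1:8)))
scores <- ewm_score_sketch(x)
```

Because every gene is centered, these scores sum to roughly zero across samples. Each value says whether a sample sits above or below the gene set's average.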

Normalization

Scores can be normalized against a set of control genes. This results in negative and positive sample scores. Positive scores are ones where the specific gene set score is higher than the aggregate control gene set score.

Genes used for the control set can either be randomly sampled from the rows of the `all.x` expression matrix (when `normalize = TRUE`), or explicitly specified by a character vector of row identifiers passed to the `normalize` parameter. In both cases, the code tries to select a random control gene set of the same size as `nrow(x)`. If that's not possible, as many genes as are available are used.
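The control-set selection and the resulting signed scores can be sketched as follows. Two assumptions apply here. First, the normalized score is taken to be the gene set score minus the control gene set score, in the spirit of Seurat's control-score subtraction. Second, `colMeans()` stands in for the real scoring function. `pick_control_genes()` and `normalize_scores()` are hypothetical helpers.

```r
# Hypothetical helpers sketching control-set normalization; the subtraction
# and the colMeans() stand-in scorer are assumptions, not the package's code
pick_control_genes <- function(n, candidates) {
  # prefer n genes, but fall back to every candidate when fewer exist
  sample(candidates, size = min(n, length(candidates)))
}

normalize_scores <- function(x, all.x, normalize = TRUE, score_fun = colMeans) {
  # columns of all.x must line up 1:1 with the samples in x
  stopifnot(ncol(all.x) == ncol(x))
  candidates <- if (isTRUE(normalize)) {
    rownames(all.x)                          # any gene in all.x is eligible
  } else {
    intersect(normalize, rownames(all.x))    # user-supplied identifiers
  }
  ctrl <- pick_control_genes(nrow(x), candidates)
  # positive values: gene set scores higher than the control set in that sample
  score_fun(x) - score_fun(all.x[ctrl, , drop = FALSE])
}

set.seed(3)
all.x <- matrix(rnorm(200), nrow = 20,
                dimnames = list(paste0("g", 1:20), paste0("s", 1:10)))
x <- all.x[1:5, ]
normalize_scores(x, all.x)                             # random controls from all.x
normalize_scores(x, all.x, normalize = c("g6", "g7"))  # only 2 controls available
```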

Note that normalization requires an expression matrix to be passed into the all.x parameter, whose columns match 1:1 to the columns in x. Calling scoreSingleSamples() with method = "ewm", normalize = TRUE handles this transparently.

This normalization method was inspired by the ctrl.score normalization in Seurat's Seurat::AddModuleScore() function.

Examples