Summarizes useful statistics per gene set from a SparrowResult

This function calculates the number of genes that move up/down for the given contrasts, as well as mean and trimmed mean of the logFC and t-statistics. Note that the statistics calculated and returned here are purely a function of the statistics generated at the gene-level stage of the analysis.

geneSetsStats(
  x,
  feature.min.logFC = 1,
  feature.max.padj = 0.1,
  trim = 0.1,
  reannotate.significance = FALSE,
  as.dt = FALSE
)

Arguments

x: A SparrowResult object
feature.min.logFC: Minimum abs(logFC) threshold. Used with feature.max.padj to identify the individual features that are considered differentially expressed.
feature.max.padj: Maximum adjusted p-value threshold. Used with feature.min.logFC to identify the individual features that are considered differentially expressed.
trim: The amount to trim when calculating the trimmed t and logFC statistics for each gene set.
reannotate.significance: This is used internally by the package and should be left as FALSE when called by the user.
as.dt: If FALSE (default), the result is returned as a data.frame. Set this to TRUE to return a data.table instead.
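
The effect of trim can be illustrated with base R alone. The sketch below assumes that trim behaves like the trim argument of base::mean(), that is, the given fraction of observations is dropped from each end of the sorted values before averaging; the vector x is made-up data, not output from this package.

```r
# Made-up logFC-like values with one extreme value at each end.
x <- c(-5, -1, 0, 1, 2, 3, 4, 5, 6, 40)

# Plain mean: pulled upward by the extreme value 40.
plain_mean <- mean(x)

# Trimmed mean with trim = 0.1: with 10 values, floor(10 * 0.1) = 1
# observation is dropped from each end (-5 and 40) before averaging.
trimmed_mean <- mean(x, trim = 0.1)

print(c(plain = plain_mean, trimmed = trimmed_mean))
```

With fewer than 10 features and trim = 0.1, no observation is dropped under this assumption, so the trimmed and untrimmed means coincide for small gene sets.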

Value

A data.frame (or a data.table when as.dt = TRUE) with statistics at the gene set level for the prescribed contrast run on x. These statistics are independent of any particular GSEA method; they summarize aggregate shifts of each gene set's individual features. The columns included in the output are summarized below:

n.sig: The number of individual features whose abs(logFC) and padj satisfy the thresholds set by the feature.min.logFC and feature.max.padj parameters of the original seas() call.
n.neutral: The number of individual features whose abs(logFC) and padj do not satisfy the feature.* thresholds named above.
n.up, n.down: The number of individual features with logFC > 0 or logFC < 0, respectively, irrespective of the feature.* thresholds referenced above.
n.sig.up, n.sig.down: The number of individual features that pass the feature.* thresholds and have logFC > 0 or logFC < 0, respectively.
mean.logFC, mean.logFC.trim: The mean (or trimmed mean) of the individual logFC estimates for the features in the gene set. The amount of trim is specified in the trim parameter of the seas() call.
mean.t, mean.t.trim: The mean (or trimmed mean) of the individual t-statistics for the features in the gene set. These are NA if the input expression object was a DGEList.
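
To make the column definitions concrete, here is a minimal base R sketch that derives the same columns for a single gene set from a table of gene-level statistics. It is not the package's implementation: gene_stats is made-up data, summarize_set() is a hypothetical helper, and the inclusive comparisons (>= and <=) against the thresholds are an assumption.

```r
# Made-up gene-level statistics for the features of one gene set.
gene_stats <- data.frame(
  feature_id = paste0("g", 1:6),
  logFC = c(1.5, -2.0, 0.3, -0.2, 1.2, -1.1),
  padj  = c(0.01, 0.04, 0.50, 0.80, 0.20, 0.05),
  t     = c(4.1, -3.5, 0.6, -0.4, 1.9, -2.8)
)

# Hypothetical helper (not part of the package): summarize one gene set.
summarize_set <- function(d, min.logFC = 1, max.padj = 0.1, trim = 0.1) {
  # Assumption: a feature is "significant" when it passes both thresholds,
  # with inclusive comparisons.
  sig <- abs(d$logFC) >= min.logFC & d$padj <= max.padj
  data.frame(
    n.sig           = sum(sig),
    n.neutral       = sum(!sig),
    n.up            = sum(d$logFC > 0),   # direction only, thresholds ignored
    n.down          = sum(d$logFC < 0),
    n.sig.up        = sum(sig & d$logFC > 0),
    n.sig.down      = sum(sig & d$logFC < 0),
    mean.logFC      = mean(d$logFC),
    mean.logFC.trim = mean(d$logFC, trim = trim),
    mean.t          = mean(d$t),
    mean.t.trim     = mean(d$t, trim = trim)
  )
}

set_stats <- summarize_set(gene_stats)
print(set_stats)
```

In this toy set, g5 has abs(logFC) above 1 but padj of 0.2, so it counts toward n.up but not toward n.sig or n.sig.up.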

Examples

vm <- exampleExpressionSet(do.voom=TRUE)
gdb <- exampleGeneSetDb()
mg <- seas(vm, gdb, design = vm$design, contrast = 'tumor')
head(geneSetsStats(mg))
#>   collection                                          name n.sig n.neutral n.up
#> 1         c2                        BIOCARTA_AGPCR_PATHWAY     1        10    3
#> 2         c2         BOYAULT_LIVER_CANCER_SUBCLASS_G123_DN     9        32   11
#> 3         c2               BURTON_ADIPOGENESIS_PEAK_AT_2HR     9        41   14
#> 4         c2        BYSTRYKH_HEMATOPOIESIS_STEM_CELL_IL3RA     0         6    1
#> 5         c2             CAIRO_PML_TARGETS_BOUND_BY_MYC_UP     2        21   14
#> 6         c2 CHARAFE_BREAST_CANCER_BASAL_VS_MESENCHYMAL_DN     7        38   13
#>   n.down n.sig.up n.sig.down mean.logFC mean.logFC.trim     mean.t mean.t.trim
#> 1      8        0          1 -0.4952773      -0.5047325 -1.3734521  -1.3771654
#> 2     30        0          9 -0.7681384      -0.6497956 -1.5011871  -1.5199373
#> 3     36        1          8 -0.7659036      -0.7384984 -1.0942669  -1.1551306
#> 4      5        0          0 -0.2321046      -0.2321046 -0.5770413  -0.5770413
#> 5      9        2          0  0.3430059       0.2155544  0.6148539   0.4356277
#> 6     32        0          7 -0.4558734      -0.4708266 -1.1351593  -1.0682285