Perform goseq Enrichment tests across a GeneSetDb.

Note that we do not import things from goseq directly, and only load it if this function is fired. I can't figure out a way to selectively import functions from the goseq package without it having to load its dependencies, which take a long time -- and I don't want loading multiGSEA to take a long time. So, the goseq package has moved to Suggests and then is loaded within this function when necessary.

goseq(
  gsd,
  selected,
  universe,
  feature.bias,
  method = c("Wallenius", "Sampling", "Hypergeometric"),
  repcnt = 2000,
  use_genes_without_cat = TRUE,
  plot.fit = FALSE,
  do.conform = TRUE,
  as.dt = FALSE,
  .pipelined = FALSE
)

Arguments

gsd	The `GeneSetDb` object to run tests against
selected	The ids of the selected features
universe	The ids of the universe
feature.bias	a named vector as long as `nrow(x)` that has the "bias" information for the features/genes tested (ie. vector of gene lengths). `names(feature.bias)` should equal `rownames(x)`. If this is not provided, all feature lengths are set to 1 (no bias). The goseq package provides a `getlength` function which facilitates getting default values for these if you do not have the correct values used in your analysis.
method	The method to use to calculate the unbiased category enrichment scores
repcnt	Number of random samples to be calculated when random sampling is used. Ignored unless `method="Sampling"`.
use_genes_without_cat	A boolean to indicate whether genes without a categorie should still be used. For example, a large number of gene may have no GO term annotated. If this option is set to FALSE, those genes will be ignored in the calculation of p-values (default behaviour). If this option is set to TRUE, then these genes will count towards the total number of genes outside the category being tested.
do.conform	By default `TRUE`: does some gymnastics to conform the `gsd` to the `universe` vector. This should neber be set to `FALSE`, but this parameter is here so that when this function is called from the `multiGSEA` codepath, we do not have to reconform the `GeneSetDb` object, because it has already been done.
.pipelined	If this is being external to a multiGSEA pipeline, then some additional cleanup of columns name output will be done. Otherwise the column renaming and post processing is left to the do.goseq caller (Default: `FALSE`).
active.only	If `TRUE`, only "active" genesets are used
value	The feature_id types to extract from `gsd`

Value

A data.table of results, similar to goseq output. The output from nullp is added to the outgoing data.table as an attribue named "pwf".

References

Young, M. D., Wakefield, M. J., Smyth, G. K., Oshlack, A. (2010). Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology 11, R14. http://genomebiology.com/2010/11/2/R14