Note that we do not import things from goseq directly, and only load it if this function is fired. I can't figure out a way to selectively import functions from the goseq package without it having to load its dependencies, which take a long time – and I don't want loading sparrow to take a long time. So, the goseq package has moved to Suggests and then is loaded within this function when necessary.
goseq(
gsd,
selected,
universe,
feature.bias,
method = c("Wallenius", "Sampling", "Hypergeometric"),
repcnt = 2000,
use_genes_without_cat = TRUE,
plot.fit = FALSE,
do.conform = TRUE,
as.dt = FALSE,
.pipelined = FALSE
)
The GeneSetDb
object to run tests against
The ids of the selected features
The ids of the universe
a named vector as long as nrow(x)
that has the
"bias" information for the features/genes tested (ie. vector of gene
lengths). names(feature.bias)
should equal rownames(x)
.
If this is not provided, all feature lengths are set to 1 (no bias).
The goseq package provides a getlength
function which
facilitates getting default values for these if you do not have the
correct values used in your analysis.
The method to use to calculate the unbiased category enrichment scores
Number of random samples to be calculated when random sampling
is used. Ignored unless method="Sampling"
.
A boolean to indicate whether genes without a categorie should still be used. For example, a large number of gene may have no GO term annotated. If this option is set to FALSE, those genes will be ignored in the calculation of p-values (default behaviour). If this option is set to TRUE, then these genes will count towards the total number of genes outside the category being tested.
parameter to pass to goseq::nullp()
.
By default TRUE
: does some gymnastics to conform
the gsd
to the universe
vector. This should neber be set
to FALSE
, but this parameter is here so that when this function
is called from the seas()
codepath, we do not have to
reconform the GeneSetDb
object, because it has already been done.
If FALSE
(default), the data.frame like thing that
this funciton returns will be set to a data.frame. Set this to TRUE
to keep this object as a data.table
If this is being called external to a seas pipeline, then
some additional cleanup of columns name output will be done when
FALSE
(default). Otherwise the column renaming and post processing is
left to the do.goseq caller.
A data.table
of results, similar to goseq output. The output
from nullp
is added to the outgoing data.table as
an attribue named "pwf"
.
Young, M. D., Wakefield, M. J., Smyth, G. K., Oshlack, A. (2010). Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology 11, R14. http://genomebiology.com/2010/11/2/R14
vm <- exampleExpressionSet()
gdb <- conform(exampleGeneSetDb(), vm)
# Identify DGE genes
mg <- seas(vm, gdb, design = vm$design)
lfc <- logFC(mg)
# wire up params
selected <- subset(lfc, significant)$feature_id
universe <- rownames(vm)
mylens <- setNames(vm$genes$size, rownames(vm))
degenes <- setNames(integer(length(universe)), universe)
degenes[selected] <- 1L
gostats <- sparrow::goseq(
gdb, selected, universe, mylens,
method = "Wallenius", use_genes_without_cat = TRUE)
#>
#> Warning: initial point very close to some inequality constraints
#> Using manually entered categories.
#> Calculating the p-values...