R/get-msigdb.R
getMSigCollection.Rd
This provides versioned gene sets from the collections defined in MSigDB. Collections can be retrieved by their collection name, e.g. c("H", "C2", "C7").
getMSigCollection(
  collection = NULL,
  species = "human",
  id.type = c("ensembl", "entrez", "symbol"),
  with.kegg = FALSE,
  promote.subcategory.to.collection = FALSE,
  prefix.collection = FALSE,
  ...
)

getMSigGeneSetDb(
  collection = NULL,
  species = "human",
  id.type = c("ensembl", "entrez", "symbol"),
  with.kegg = FALSE,
  promote.subcategory.to.collection = FALSE,
  prefix.collection = FALSE,
  ...
)
collection: A character vector specifying the collections you want (c1, c2, ..., c7, h). Setting this to NULL (the default in the usage above) loads all collections. Alternatively, you can include named subsets of collections, like "reactome". Refer to the Details section for more information.
species: "human" or "mouse". More generally, this can be any value available in the alias column of the sparrow:::species_info() table (except cyno).
id.type: The type of feature identifier used in the gene sets: "ensembl" (default), "entrez", or "symbol".
with.kegg: The Broad distributes the latest versions of the KEGG gene sets as part of the c2 collection. These gene sets come with a restricted license, so by default we do not return them as part of the GeneSetDb. To include the KEGG gene sets when asking for the c2 collection, set this flag to TRUE.
promote.subcategory.to.collection: A number of the collections in MSigDB draw their gene sets from different sources. These sources are recorded in the gs_subcat column of geneSets(this). When this is set to TRUE, the subcategory is appended to the collection column for the gene sets. So, instead of having one massive "C2" collection, you'll have a bunch of collections like "C2_CGP", "C2_CP:BIOCARTA", etc.
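The effect of promoting a subcategory can be sketched in base R. This is only an illustration on a toy table, not the package's implementation: the helper promote_subcategory() and the toy rows are hypothetical, and leaving collections without a subcategory unchanged is an assumption.

```r
# Toy gene set table. The rows and the 'name' column are illustrative
# assumptions; only 'gs_subcat' and the "C2_CGP" naming come from the text.
gs <- data.frame(
  collection = c("C2", "C2", "H"),
  gs_subcat  = c("CGP", "CP:BIOCARTA", ""),
  name       = c("SET_A", "SET_B", "SET_C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append the subcategory to the collection name,
# leaving collections without a subcategory untouched (an assumption).
promote_subcategory <- function(collection, subcat) {
  has.subcat <- !is.na(subcat) & nzchar(subcat)
  ifelse(has.subcat, paste(collection, subcat, sep = "_"), collection)
}

gs$collection <- promote_subcategory(gs$collection, gs$gs_subcat)
print(unique(gs$collection))
```

Splitting a large collection this way can make per-source summaries easier to read, at the cost of having more collection labels to track.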
prefix.collection: When TRUE (default: FALSE), the collection names "C1", "C2", etc. are prefixed, giving "MSigDB_*".
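As a rough illustration of the renaming, and assuming the prefix is the literal string "MSigDB_" joined directly to the collection name, the resulting labels would look like this. The variable names are hypothetical and the snippet does not call the package.

```r
# Assumption: the prefix is exactly "MSigDB_" with no extra separator.
collections <- c("C1", "C2")
prefixed <- paste0("MSigDB_", collections)
print(prefixed)
```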
...: Pass-through parameters.
A BiocSet of the MSigDB collections.
getMSigGeneSetDb(): retrieval method for a GeneSetDb container
This function uses the {msigdbr} and {babelgene} packages to retrieve gene set definitions for a variety of organisms and identifier types.
Due to the licensing restrictions on the KEGG collections, they are not returned from this function unless they are explicitly asked for. You can ask for them through this function by either (i) querying for the "c2" collection while setting with.kegg = TRUE; or (ii) explicitly calling with collection = "kegg".
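The two routes to the KEGG gene sets described above might look like the following. This is a sketch that assumes the sparrow package (and the data packages it relies on) is installed and that both functions are exported from it; the variable names are hypothetical, and the calls are skipped when sparrow is unavailable.

```r
# Guarded so the snippet is a no-op when sparrow is not installed.
if (requireNamespace("sparrow", quietly = TRUE)) {
  # (i) ask for c2 and opt in to the KEGG gene sets
  gdb.c2 <- sparrow::getMSigGeneSetDb(
    "c2", "human", "entrez", with.kegg = TRUE
  )

  # (ii) ask for the KEGG gene sets explicitly
  gdb.kegg <- sparrow::getMSigGeneSetDb("kegg", "human", "entrez")
}
```

Check that the KEGG license terms fit your intended use before enabling either route.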
To cite your use of the Molecular Signatures Database (MSigDB), please reference Subramanian, Tamayo, et al. (2005, PNAS 102, 15545-15550) and one or more of the following as appropriate:
Liberzon, et al. (2011, Bioinformatics);
Liberzon, et al. (2015, Cell Systems); and
The source for the gene set as listed on the gene set page.
# \donttest{
# these take a while to load initially, so they are wrapped in a donttest block.
# you should run these interactively to understand what they return
bcs <- getMSigCollection("h", "human", "entrez")
bcs.h.entrez <- getMSigCollection(c("h", "c2"), "human", "entrez")
bcs.h.ens <- getMSigCollection(c("h", "c2"), "human", "ensembl")
bcs.m.entrez <- getMSigCollection(c("h", "c2"), "mouse", "entrez")
gdb <- getMSigGeneSetDb("h", "human", "entrez")
# }