Provides versioned gene sets from the collections defined in MSigDB. Collections can be retrieved by name, e.g. c("H", "C2", "C7").

getMSigCollection(
  collection = NULL,
  species = "human",
  id.type = c("ensembl", "entrez", "symbol"),
  with.kegg = FALSE,
  promote.subcategory.to.collection = FALSE,
  prefix.collection = FALSE,
  ...
)

getMSigGeneSetDb(
  collection = NULL,
  species = "human",
  id.type = c("ensembl", "entrez", "symbol"),
  with.kegg = FALSE,
  promote.subcategory.to.collection = FALSE,
  prefix.collection = FALSE,
  ...
)

Arguments

collection

Character vector specifying the collections you want (c1, c2, ..., c7, h). The default, NULL, loads all collections. Alternatively, you can include named subsets of collections, like "reactome". Refer to the sections below for more information.

species

"human" or "mouse"? Really, this is anything available in the alias column of the sparrow:::species_info() table (except cyno).

id.type

The type of feature identifier used in the gene sets: "ensembl" (default), "entrez", or "symbol".

with.kegg

The Broad distributes the latest versions of the KEGG genesets as part of the c2 collection. These genesets come with a restricted license, so by default we do not return them as part of the GeneSetDb. To include the KEGG gene sets when asking for the c2 collection, set this flag to TRUE.

promote.subcategory.to.collection

There are different sources of gene sets for a number of the collections in MSigDB. These are included in the gs_subcat column of geneSets(this). When this is set to TRUE, the subcategory is appended to the collection column for each gene set. So, instead of having a massive "C2" collection, you'll have a bunch of collections like "C2_CGP", "C2_CP:BIOCARTA", etc.

prefix.collection

When TRUE (default: FALSE), the collection names ("C1", "C2", etc.) are prefixed so that they read "MSigDB_*".

...

pass through parameters
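
The effect of promote.subcategory.to.collection and prefix.collection on collection names can be illustrated with a small base R sketch. This is not sparrow's implementation: rename_collections() and the toy table are hypothetical, the "_" separator and "MSigDB_" prefix are taken from the descriptions above, and leaving gene sets without a subcategory unchanged is an assumption.

```r
# Toy stand-in for a gene set table; `gs_subcat` mirrors the column named above.
gs <- data.frame(
  collection = c("H", "C2", "C2"),
  gs_subcat  = c("", "CGP", "CP:BIOCARTA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of sparrow) that mimics the two naming flags.
rename_collections <- function(x, promote = FALSE, prefix = FALSE) {
  out <- x$collection
  if (promote) {
    # Assumption: gene sets with no subcategory keep their collection name.
    has_subcat <- !is.na(x$gs_subcat) & nzchar(x$gs_subcat)
    out[has_subcat] <- paste(out[has_subcat], x$gs_subcat[has_subcat], sep = "_")
  }
  if (prefix) {
    out <- paste0("MSigDB_", out)
  }
  out
}

promoted <- rename_collections(gs, promote = TRUE)
prefixed <- rename_collections(gs, promote = TRUE, prefix = TRUE)
print(promoted)
print(prefixed)
```

Printing both vectors side by side shows how a single large "C2" collection is split by subcategory before the optional prefix is applied.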

Value

a BiocSet of the MSigDB collections

Functions

  • getMSigGeneSetDb(): retrieval method for a GeneSetDb container

Species and Identifier types

This function uses the {msigdbr} and {babelgene} packages to retrieve gene set definitions for a variety of organisms and identifier types.

KEGG Gene Sets

Due to the licensing restrictions over the KEGG collections, they are not returned from this function unless they are explicitly asked for. You can ask for them through this function by either (i) querying for the "c2" collection while setting with.kegg = TRUE; or (ii) explicitly calling with collection = "kegg".
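
The two routes described above might be wrapped as follows. This is a sketch only: fetch_kegg() and its route labels are hypothetical and not part of sparrow, and the calls inside it use only the arguments documented on this page. It assumes sparrow is installed when the function is actually called.

```r
# Hypothetical convenience wrapper (not part of sparrow).
fetch_kegg <- function(route = c("via_c2", "direct"), ...) {
  route <- match.arg(route)
  if (route == "via_c2") {
    # (i) ask for the c2 collection and opt in to the KEGG gene sets
    sparrow::getMSigCollection("c2", with.kegg = TRUE, ...)
  } else {
    # (ii) ask for the KEGG gene sets by name
    sparrow::getMSigCollection("kegg", ...)
  }
}

# Interactive use; the first call can be slow while definitions load:
# kegg.in.c2 <- fetch_kegg("via_c2")
# kegg.named <- fetch_kegg("direct")
```

Keeping the opt-in explicit in the call makes it easier to see, when reading an analysis script, that gene sets under the restricted license were requested on purpose.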

Citing the Molecular Signatures Database

To cite your use of the Molecular Signatures Database (MSigDB), please reference Subramanian, Tamayo, et al. (2005, PNAS 102, 15545-15550) and one or more of the following as appropriate:

  • Liberzon, et al. (2011, Bioinformatics);

  • Liberzon, et al. (2015, Cell Systems); and

  • The source for the gene set as listed on the gene set page.

Examples

# \donttest{
  # these take a while to load initially, so they are wrapped in a donttest block.
  # you should run these interactively to understand what they return
  bcs <- getMSigCollection("h", "human", "entrez")
  bcs.h.entrez <- getMSigCollection(c("h", "c2"), "human", "entrez")
  bcs.h.ens <- getMSigCollection(c("h", "c2"), "human", "ensembl")
  bcs.m.entrez <- getMSigCollection(c("h", "c2"), "mouse", "entrez")

  gdb <- getMSigGeneSetDb("h", "human", "entrez")
# }