The various GeneSetDb data providers (MSigDb, KEGG, etc.) limit the identifier types that they return. Use this function to map the given identifiers to whichever type you like.

convertIdentifiers(
  x,
  from = NULL,
  to = NULL,
  id.type = c("ensembl", "entrez", "symbol"),
  xref = NULL,
  extra.cols = NULL,
  allow.cartesian = FALSE,
  min_support = 3,
  top = TRUE,
  ...
)

# S4 method for class 'BiocSet'
convertIdentifiers(
  x,
  from = NULL,
  to = NULL,
  id.type = c("ensembl", "entrez", "symbol"),
  xref = NULL,
  extra.cols = NULL,
  allow.cartesian = FALSE,
  min_support = 3,
  top = TRUE,
  ...
)

# S4 method for class 'GeneSetDb'
convertIdentifiers(
  x,
  from = NULL,
  to = NULL,
  id.type = c("ensembl", "entrez", "symbol"),
  xref = NULL,
  extra.cols = NULL,
  allow.cartesian = FALSE,
  min_support = 3,
  top = TRUE,
  ...
)

Arguments

x

The GeneSetDb (or BiocSet) with identifiers to convert.

from, to

If you are doing identifier and/or species conversion using babelgene, to is the species you want to convert to, and from is the species of x. If you are only converting identifier types within the same species, specify the current species in from. If you are providing a data.frame map of identifiers in xref, to is the name of the column that holds the new identifiers, and from is the name of the column that holds the current identifiers.

id.type

If you are using babelgene conversion, this specifies the type of identifier you want to convert to. It can be any of "ensembl", "entrez", or "symbol".

xref

a data.frame used to map current identifiers to target ones.

extra.cols

a character vector of columns from the xref mapping to add to the features of the new GeneSetDb. To keep the original identifiers of the remapped features, include "original_id" as one of the values.

allow.cartesian

a logical used to temporarily set the datatable.allow.cartesian global option. A 1:many mapping of your identifiers can make the underlying data.table join produce more rows than its inputs, which data.table reports as an error by default. Set allow.cartesian = TRUE to turn this check off for the duration of the call. The option is restored to its pre-call value on exit.
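The save-and-restore behaviour described above follows a standard base R pattern built on options() and on.exit(). A minimal sketch, assuming a hypothetical helper named with_allow_cartesian(); this is not the package's internal code:

```r
# Hypothetical helper: evaluate `expr` with datatable.allow.cartesian set
# to `allow.cartesian`, then restore whatever value was there before.
with_allow_cartesian <- function(expr, allow.cartesian = FALSE) {
  # options() returns the previous values of the options it sets
  old <- options(datatable.allow.cartesian = allow.cartesian)
  # Restore the previous value when this function exits, even on error
  on.exit(options(old), add = TRUE)
  # `expr` is a lazy promise, so it is evaluated here, with the option set
  force(expr)
}

options(datatable.allow.cartesian = FALSE)
inside <- with_allow_cartesian(getOption("datatable.allow.cartesian"),
                               allow.cartesian = TRUE)
inside                                  # value seen during the call
getOption("datatable.allow.cartesian")  # value after the call returns
```

Because on.exit() runs even when the body throws, the global option cannot leak out of a failed conversion.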

min_support, top

Parameters passed to the internal call to babelgene::orthologs().
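To see what min_support and top control, you can call babelgene::orthologs() yourself and turn its result into an xref table. This is a sketch under stated assumptions: babelgene_xref() is a hypothetical helper, not part of sparrow, and it assumes orthologs() returns the columns human_symbol, human_entrez, human_ensembl, symbol, entrez and ensembl, with human = TRUE meaning the input genes are human.

```r
# Hypothetical helper (not part of sparrow): build a two-column xref from
# babelgene::orthologs(). Assumes human input genes and the babelgene
# output columns human_<type> (input) and <type> (target species).
babelgene_xref <- function(genes, species,
                           from.type = c("symbol", "entrez", "ensembl"),
                           id.type = c("ensembl", "entrez", "symbol"),
                           min_support = 3, top = TRUE) {
  from.type <- match.arg(from.type)
  id.type <- match.arg(id.type)
  orth <- babelgene::orthologs(genes = genes, species = species, human = TRUE,
                               min_support = min_support, top = top)
  xref <- data.frame(
    current_id = as.character(orth[[paste0("human_", from.type)]]),
    new_id     = as.character(orth[[id.type]]),
    stringsAsFactors = FALSE)
  # Drop rows with no target identifier, then exact duplicate pairs
  xref <- xref[!is.na(xref$new_id) & nzchar(xref$new_id), , drop = FALSE]
  unique(xref)
}

if (requireNamespace("babelgene", quietly = TRUE)) {
  xref <- babelgene_xref(c("CD4", "CD8A"), species = "mouse",
                         from.type = "symbol", id.type = "symbol")
  print(xref)
}
```

The resulting table could then be supplied as xref with from = "current_id" and to = "new_id".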

...

pass-through arguments (not used)

Value

A new GeneSetDb object with converted identifiers. We try to retain any metadata in the original object, but this is not guaranteed. If an id_type was previously stored in the collectionMetadata, it will be dropped.

Details

For best results, provide your own identifier mapping table via xref. As a convenience, we also provide a wrapper around the babelgene::orthologs() function to convert between identifier types and species.

When a source ID maps to multiple target IDs, all of them are returned. When a source ID has no target ID, the source feature is dropped.
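These two rules behave like an inner join between the gene set membership table and the mapping. A base R sketch with made-up identifiers illustrating the semantics only; it is not the package's internal implementation, and the original_id column mirrors what extra.cols = "original_id" keeps:

```r
# Made-up membership table: one row per (gene set, feature) pair
members <- data.frame(
  name       = c("setA", "setA", "setB"),
  feature_id = c("1", "2", "3"),
  stringsAsFactors = FALSE)

# Made-up mapping: "1" has two targets, "3" has none
xref <- data.frame(
  current_id = c("1", "1", "2"),
  new_id     = c("a", "b", "c"),
  stringsAsFactors = FALSE)

# Inner join: every target of a source id is kept, unmapped ids are dropped
joined <- merge(members, xref, by.x = "feature_id", by.y = "current_id")
converted <- data.frame(
  name        = joined$name,
  feature_id  = joined$new_id,
  original_id = joined$feature_id,
  stringsAsFactors = FALSE)
converted
```

Here setB loses its only feature ("3" has no target), while "1" appears twice, once per target.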

Methods (by class)

  • convertIdentifiers(BiocSet): converts identifiers in a BiocSet

  • convertIdentifiers(GeneSetDb): converts identifiers in a GeneSetDb

Custom Mapping

You need to provide a data.frame via the xref parameter that has a column for the current identifiers and another column for the target identifiers. The columns are specified by the from and to parameters, respectively.
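A sketch of building such a table from a named lookup vector and checking that from and to name real columns before converting. The identifiers are placeholders, and check_xref() is a hypothetical helper, not a sparrow function:

```r
# Placeholder identifiers: names are current ids, values are target ids
lookup <- c(g1 = "SYMBOL_A", g2 = "SYMBOL_B", g3 = "SYMBOL_C")

# Two-column xref: one column of current ids, one of target ids
xref <- data.frame(
  current_id = names(lookup),
  new_id     = unname(lookup),
  stringsAsFactors = FALSE)

# Hypothetical sanity check: `from` and `to` must be columns of xref
check_xref <- function(xref, from, to) {
  stopifnot(is.data.frame(xref), is.character(from), is.character(to))
  missing_cols <- setdiff(c(from, to), colnames(xref))
  if (length(missing_cols) > 0L) {
    stop("xref is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

check_xref(xref, from = "current_id", to = "new_id")
# The same names are then passed on, e.g.:
# convertIdentifiers(gdb, from = "current_id", to = "new_id", xref = xref)
```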

Convenience identifier and species mapping

If you don't provide a data.frame, you can provide a species name. We will rely on the {babelgene} package for the conversion, so you will have to provide a species name that it recognizes.

Species and Identifier Conversion via babelgene

We plan to provide a quick wrapper around babelgene's ortholog mapping function to make identifier conversion a bit easier through this function. You can track this in sparrow issue #2.

Examples

# You can convert the identifiers within a GeneSetDb to some other type
# by providing a "translation" table. Check out the unit tests for more
# examples.
gdb <- exampleGeneSetDb() # this has no symbols in it

# Define a silly conversion table.
xref <- data.frame(
  current_id = featureIds(gdb),
  new_id = paste0(featureIds(gdb), "_symbol"))
gdb2 <- convertIdentifiers(gdb, from = "current_id", to = "new_id",
                           xref = xref, extra.cols = "original_id")
geneSet(gdb2, name = "BIOCARTA_AGPCR_PATHWAY")
#>    collection                   name active  N  n  feature_id original_id
#> 1          c2 BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA 2778_symbol        2778
#> 2          c2 BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA 2782_symbol        2782
#> 3          c2 BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA 2792_symbol        2792
#> 4          c2 BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA 2868_symbol        2868
#> 5          c2 BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA  408_symbol         408
#> 6          c2 BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA 5567_symbol        5567
#> 7          c2 BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA 5568_symbol        5568
#> 8          c2 BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA 5573_symbol        5573
#> 9          c2 BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA 5575_symbol        5575
#> 10         c2 BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA 5576_symbol        5576
#> 11         c2 BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA 5577_symbol        5577
#> 12         c2 BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA 5578_symbol        5578
#> 13         c2 BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA 5579_symbol        5579

# Convert entrez to ensembl id's using babelgene
if (FALSE) { # \dontrun{
# The conversion functionality via babelgene isn't yet implemented, but
# will look like this.

# 1. convert the human entrez identifiers to ensembl
gdb.ens <- convertIdentifiers(gdb, "human", id.type = "ensembl")

# 2. convert the human entrez to mouse entrez
gdb.entm <- convertIdentifiers(gdb, "human", "mouse", id.type = "entrez")

# 3. convert the human entrez to mouse ensembl
gdb.ensm <- convertIdentifiers(gdb, "human", "mouse", id.type = "ensembl")
} # }