Source: R/convertIdentifiers.R (convertIdentifiers.Rd)
The various GeneSetDb data providers (MSigDb, KEGG, etc.) limit the identifier types they return. Use this function to map the given identifiers to whichever type you like.
convertIdentifiers(
x,
from = NULL,
to = NULL,
id.type = c("ensembl", "entrez", "symbol"),
xref = NULL,
extra.cols = NULL,
allow.cartesian = FALSE,
min_support = 3,
top = TRUE,
...
)
# S4 method for class 'BiocSet'
convertIdentifiers(
x,
from = NULL,
to = NULL,
id.type = c("ensembl", "entrez", "symbol"),
xref = NULL,
extra.cols = NULL,
allow.cartesian = FALSE,
min_support = 3,
top = TRUE,
...
)
# S4 method for class 'GeneSetDb'
convertIdentifiers(
x,
from = NULL,
to = NULL,
id.type = c("ensembl", "entrez", "symbol"),
xref = NULL,
extra.cols = NULL,
allow.cartesian = FALSE,
min_support = 3,
top = TRUE,
...
)
x: The GeneSetDb (or BiocSet) with identifiers to convert.
from, to: If you are doing identifier and/or species conversion using babelgene, to is the species you want to convert to, and from is the species of x. If you are only doing ID type conversion within the same species, specify the current species in from.
If you are providing a data.frame map of identifiers in xref, to is the name of the column that holds the new identifiers, and from is the name of the column that holds the current identifiers.
id.type: If you are using babelgene conversion, this specifies the type of identifier you want to convert to. It can be any of "ensembl", "entrez", or "symbol".
xref: A data.frame used to map current identifiers to target ones.
extra.cols: A character vector of columns from the identifier map to add to the features of the new GeneSetDb. If you want to keep the original identifiers of the remapped features, include "original_id" as one of the values here.
allow.cartesian: A boolean used to temporarily set the datatable.allow.cartesian global option. If you are doing a 1:many map of your identifiers, you may trigger data.table's cartesian-join error. You can temporarily turn this check off by setting allow.cartesian = TRUE. The option is restored to its pre-call value on exit (via on.exit).
min_support, top: Parameters used in the internal call to babelgene::orthologs().
...: Pass-through arguments (not used).
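The allow.cartesian behaviour described above (set a global option for the duration of the call, then put the old value back on exit) follows a common base-R idiom. The sketch below illustrates that idiom only; with_allow_cartesian() is a hypothetical wrapper, not sparrow's code, and it assumes nothing beyond base R's options() and on.exit().

```r
# Hypothetical wrapper (not part of sparrow) showing the set-then-restore
# pattern described for allow.cartesian.
with_allow_cartesian <- function(expr, allow.cartesian = TRUE) {
  # options() returns the previous value(s); on.exit() puts them back,
  # even if evaluating expr throws an error.
  old <- options(datatable.allow.cartesian = allow.cartesian)
  on.exit(options(old), add = TRUE)
  force(expr)  # expr is lazily evaluated here, after the option is set
}

before <- getOption("datatable.allow.cartesian")
inside <- with_allow_cartesian(getOption("datatable.allow.cartesian"))
after <- getOption("datatable.allow.cartesian")
```

Because the restore is registered with on.exit(), the caller's setting survives even when the wrapped code fails partway through.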
A new GeneSetDb object with converted identifiers. We try to retain any metadata in the original object, but no guarantees are given. If id_type was stored previously in the collectionMetadata, it will be dropped.
For best results, provide your own identifier mapping reference, but we provide a convenience wrapper around the babelgene::orthologs() function to change between identifier types and species.
When there are multiple target IDs for a source ID, they are all returned. When there is no target ID for a source ID, the source feature is dropped.
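These mapping rules match the semantics of an inner join. The base-R sketch below uses toy identifiers (hypothetical, for illustration only) to show one source ID expanding to two targets and an unmapped ID disappearing; it is not sparrow's implementation.

```r
# Toy map: "A" has two targets, "B" has one, "C" has none.
xref <- data.frame(
  from_id = c("A", "A", "B"),
  to_id = c("A.1", "A.2", "B.1"),
  stringsAsFactors = FALSE)
features <- data.frame(feature_id = c("A", "B", "C"), stringsAsFactors = FALSE)

# An inner join returns every target for a source ID and drops IDs with no match.
mapped <- merge(features, xref, by.x = "feature_id", by.y = "from_id")
table(mapped$feature_id)
```

Feature "A" appears twice in mapped and "C" is absent, which is the behaviour described above.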
convertIdentifiers(BiocSet): converts identifiers in a BiocSet
convertIdentifiers(GeneSetDb): converts identifiers in a GeneSetDb
You need to provide a data.frame via the xref parameter that has a column for the current identifiers and another column for the target identifiers. The columns are specified by the from and to parameters, respectively.
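To make the roles of xref, from, to, and extra.cols concrete, here is a minimal base-R sketch. remap_features() is a hypothetical helper, and the features table (with collection, name, and feature_id columns) is an assumed stand-in for a GeneSetDb's feature table; this is not how sparrow implements the conversion internally.

```r
# Hypothetical helper (not part of sparrow): remap a feature table using an
# xref data.frame whose `from` column holds current IDs and whose `to`
# column holds the target IDs.
remap_features <- function(features, xref, from, to, extra.cols = NULL) {
  # Both named columns must exist in the identifier map.
  stopifnot(is.data.frame(xref), all(c(from, to) %in% colnames(xref)))
  # Remember the current identifier before it is overwritten.
  features$original_id <- features$feature_id
  # Inner join on the current identifier; unmatched features are dropped.
  out <- merge(features, xref, by.x = "feature_id", by.y = from)
  out$feature_id <- out[[to]]
  # Keep the core columns plus any requested extras that exist.
  keep <- c("collection", "name", "feature_id",
            intersect(extra.cols, colnames(out)))
  out[, keep, drop = FALSE]
}

features <- data.frame(
  collection = "c2", name = "SET_A",
  feature_id = c("2778", "408", "9999"),
  stringsAsFactors = FALSE)
xref <- data.frame(
  current_id = c("2778", "408"),
  new_id = c("2778_symbol", "408_symbol"),
  stringsAsFactors = FALSE)
res <- remap_features(features, xref, from = "current_id", to = "new_id",
                      extra.cols = "original_id")
res
```

Feature "9999" has no row in xref, so it is dropped, and requesting "original_id" in extra.cols keeps the pre-conversion identifiers alongside the new ones.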
If you don't provide a data.frame, you can provide a species name. We will rely on the {babelgene} package for the conversion, so you will have to provide a species name that it recognizes.
We plan to provide a quick wrapper around babelgene's ortholog mapping function to make identifier conversion easier through this function. You can track this in sparrow issue #2.
# You can convert the identifiers within a GeneSetDb to some other type
# by providing a "translation" table. Check out the unit tests for more
# examples.
gdb <- exampleGeneSetDb() # this has no symbols in it
# Define a silly conversion table.
xref <- data.frame(
current_id = featureIds(gdb),
new_id = paste0(featureIds(gdb), "_symbol"))
gdb2 <- convertIdentifiers(gdb, from = "current_id", to = "new_id",
xref = xref, extra.cols = "original_id")
geneSet(gdb2, name = "BIOCARTA_AGPCR_PATHWAY")
#> collection name active N n feature_id original_id
#> 1 c2 BIOCARTA_AGPCR_PATHWAY FALSE 13 NA 2778_symbol 2778
#> 2 c2 BIOCARTA_AGPCR_PATHWAY FALSE 13 NA 2782_symbol 2782
#> 3 c2 BIOCARTA_AGPCR_PATHWAY FALSE 13 NA 2792_symbol 2792
#> 4 c2 BIOCARTA_AGPCR_PATHWAY FALSE 13 NA 2868_symbol 2868
#> 5 c2 BIOCARTA_AGPCR_PATHWAY FALSE 13 NA 408_symbol 408
#> 6 c2 BIOCARTA_AGPCR_PATHWAY FALSE 13 NA 5567_symbol 5567
#> 7 c2 BIOCARTA_AGPCR_PATHWAY FALSE 13 NA 5568_symbol 5568
#> 8 c2 BIOCARTA_AGPCR_PATHWAY FALSE 13 NA 5573_symbol 5573
#> 9 c2 BIOCARTA_AGPCR_PATHWAY FALSE 13 NA 5575_symbol 5575
#> 10 c2 BIOCARTA_AGPCR_PATHWAY FALSE 13 NA 5576_symbol 5576
#> 11 c2 BIOCARTA_AGPCR_PATHWAY FALSE 13 NA 5577_symbol 5577
#> 12 c2 BIOCARTA_AGPCR_PATHWAY FALSE 13 NA 5578_symbol 5578
#> 13 c2 BIOCARTA_AGPCR_PATHWAY FALSE 13 NA 5579_symbol 5579
# Convert entrez to ensembl IDs using babelgene
if (FALSE) { # \dontrun{
# The conversion functionality via babelgene isn't yet implemented, but
# will look like this.
# 1. convert the human entrez identifiers to ensembl
gdb.ens <- convertIdentifiers(gdb, "human", id.type = "ensembl")
# 2. convert the human entrez to mouse entrez
gdb.entm <- convertIdentifiers(gdb, "human", "mouse", id.type = "entrez")
# 3. convert the human entrez to mouse ensembl
gdb.ensm <- convertIdentifiers(gdb, "human", "mouse", id.type = "ensembl")
} # }