select is the primary data retrieval method for the SomaScan.db database.
select retrieves a data frame of SomaScan annotations based on the parameters
provided by the keys, columns, and keytype arguments. The default keytype is
"PROBEID", i.e. the SomaLogic SeqId; this value is used to tie all annotations
back to a SomaScan-specific identifier.
Arguments
- x: the AnnotationDb object. In practice, this will be an object derived from an AnnotationDb object, such as an OrgDb or ChipDb object.
- keys: the keys to select records for from the database. All possible keys are returned by the keys method.
- columns: the columns, or kinds of things, that can be retrieved from the database. As with keys, all possible columns are returned by the columns method.
- keytype: the keytype that matches the keys used. For the select methods, this indicates the kind of ID supplied in the keys argument. For the keys method, it indicates which kind of keys should be returned.
- menu: a character string identifying a SomaScan menu version (optional). Possible options include "5k", "7k", or "11k", as well as the version numbers for those menus ("v4.0", "v4.1", or "v5.0", respectively). May only be used when keytype = "PROBEID". This argument filters the keys to the specified menu and returns only data associated with analytes present in that menu. By default, annotations for all analytes are available.
- match: a logical (TRUE/FALSE). May only be used with the "SYMBOL", "ALIAS", or "GENENAME" keytypes. If TRUE, the character string provided for keys is used as a search term, matching any entry that starts with that string (e.g. a key of "CASP1" returns annotations for CASP1 as well as the CASP10 and CASP14 genes).
- ...: arguments passed on to AnnotationDbi::select.
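The prefix rule described for match can be illustrated with a few lines of base R. This is a minimal sketch, not the package's implementation: the symbol vector below is hypothetical and hard-coded rather than retrieved from SomaScan.db, and base R's startsWith() stands in for whatever matching the package performs internally.

```r
# Hypothetical gene symbols (hard-coded for illustration, not read from SomaScan.db)
symbols <- c("CASP1", "CASP10", "CASP14", "CASP3", "CASP8", "NOTCH3")

# Exact lookup (match = FALSE): keep only symbols identical to the key
exact_hits <- symbols[symbols == "CASP1"]

# Prefix lookup (the match = TRUE rule): keep every symbol that starts with
# the key, so "CASP1" also picks up CASP10 and CASP14
prefix_hits <- symbols[startsWith(symbols, "CASP1")]

exact_hits   # "CASP1"
prefix_hits  # "CASP1", "CASP10", "CASP14"
```

Against the database itself, the same lookup is requested through the documented arguments, e.g. select(SomaScan.db, keys = "CASP1", keytype = "SYMBOL", columns = "PROBEID", match = TRUE); the rows returned depend on the installed annotation data.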
Details
Users should be aware that if they call select and request columns that
have multiple matches for the provided keys (e.g. GO terms), select will
return a data.frame with one row for each possible match. This can have a
multiplicative effect: when several such columns are requested together,
each key yields one row per combination of matching values, which can
result in a large number of returned rows. In general, if a user needs to
retrieve a column that has a one-to-many relationship to the original keys,
it is best to extract data from that column in its own query.
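The multiplicative effect can be sketched with base R's merge() on two small tables for a single key. This is an assumption-laden model, not a call into SomaScan.db: the key "KEY-1", the GO labels, and the UniProt labels are all hypothetical placeholders, and merge() is used only to mimic how combining two one-to-many columns produces every pairing of their values.

```r
# Hypothetical one-to-many annotations for a single key (placeholder values)
go_terms <- data.frame(
  PROBEID = "KEY-1",
  GO      = c("GO:A", "GO:B", "GO:C")   # 3 matches for the key
)
uniprot_ids <- data.frame(
  PROBEID = "KEY-1",
  UNIPROT = c("P1", "P2")               # 2 matches for the key
)

# Requesting both columns at once: every GO value is paired with every
# UniProt value for the same key, giving 3 x 2 = 6 rows
combined <- merge(go_terms, uniprot_ids, by = "PROBEID")
nrow(combined)

# Querying each column on its own keeps the output to 3 + 2 = 5 rows
nrow(go_terms) + nrow(uniprot_ids)
```

With more one-to-many columns the product grows further, which is why separate queries per column are recommended above.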
Examples
# Load the annotation package
library(SomaScan.db)
# Retrieve a set of example keys
example_keys <- head(keys(SomaScan.db))
example_keys
#> [1] "10000-28" "10001-7"  "10003-15" "10006-25" "10008-43" "10010-10"
# Look up the gene symbol and gene type for all example keys
select(SomaScan.db, keys = example_keys, columns = c("SYMBOL", "GENETYPE"))
#> 'select()' returned 1:1 mapping between keys and columns
#>    PROBEID SYMBOL       GENETYPE
#> 1 10000-28 CRYBB2 protein-coding
#> 2  10001-7   RAF1 protein-coding
#> 3 10003-15  ZNF41 protein-coding
#> 4 10006-25   ELK1 protein-coding
#> 5 10008-43 GUCA1A protein-coding
#> 6 10010-10  BECN1 protein-coding
# Look up SomaScan SeqIds & proteins associated with a gene of interest
select(SomaScan.db, keys = "NOTCH3", keytype = "SYMBOL",
columns = c("PROBEID", "UNIPROT"))
#> 'select()' returned 1:many mapping between keys and columns
#>   SYMBOL PROBEID UNIPROT
#> 1 NOTCH3 5108-72  Q9UEB3
#> 2 NOTCH3 5108-72  Q9UM47
#> 3 NOTCH3 5108-72  Q9UPL3
#> 4 NOTCH3 5108-72  Q9Y6L8