SomaDataIO 6.3.0
CRAN release: 2025-05-06
New Functions
- Added preProcessAdat() function
  - added new function preProcessAdat() to filter features, filter samples, generate data QC plots of normalization scale factors by covariates, and perform standard analyte RFU transformations including log10, centering, and scaling
- Added calcOutlierMap() function
  - added calcOutlierMap() and its print and plot S3 methods, along with getOutlierIds() for identifying sample-level outliers from an outlier map object
  - added ggplot2 as a package dependency
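calcOutlierMap() flags individual sample-by-analyte values, and getOutlierIds() rolls those flags up into sample-level outliers. The Python sketch below illustrates one common way to build such a map, a median/MAD rule combined with a fold-change rule. The thresholds (6 MADs, 5-fold), the 1.4826 MAD scaling, the 5% sample-level cutoff, and the helper names outlier_map() and outlier_ids() are all illustrative assumptions, not the package's documented defaults.

```python
from statistics import median


def outlier_map(rfu, mad_crit=6.0, fc_crit=5.0):
    """Flag outlying values per analyte.

    `rfu` maps analyte name -> list of strictly positive RFU values (one per
    sample). A value is flagged when it is more than `mad_crit` scaled MADs
    from the analyte median AND more than `fc_crit`-fold away from it.
    Thresholds are illustrative assumptions, not calcOutlierMap() defaults.
    """
    flags = {}
    for analyte, values in rfu.items():
        med = median(values)
        # 1.4826 scales the MAD to match the SD under normality
        mad = 1.4826 * median(abs(v - med) for v in values)
        flags[analyte] = [
            abs(v - med) > mad_crit * mad and max(v / med, med / v) > fc_crit
            for v in values
        ]
    return flags


def outlier_ids(flags, min_fraction=0.05):
    """Return indices of samples flagged in more than `min_fraction` of analytes."""
    n_analytes = len(flags)
    n_samples = len(next(iter(flags.values())))
    counts = [sum(col[i] for col in flags.values()) for i in range(n_samples)]
    return [i for i, c in enumerate(counts) if c / n_analytes > min_fraction]


rfu = {
    "seq.1": [100, 110, 95, 105, 5000],
    "seq.2": [200, 210, 190, 205, 180],
}
print(outlier_ids(outlier_map(rfu)))  # [4]
```

Requiring both criteria keeps a tight analyte from flagging small absolute deviations that are statistically large but biologically trivial.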
Function and Object Improvements
- Added ex_clin_data object
  - a tibble object with additional sample annotation fields smoking_status and alcohol_use to demonstrate merging into a soma_adat object
Documentation Updates
- Added pre-processing vignette article
  - includes guidance on pre-processing SomaScan data for a typical analysis
  - provides an example of the recommended workflow of filtering features, filtering samples, performing data QC checks, and transforming RFU features
  - introduces usage of the preProcessAdat() function
- Improved ADAT ingest documentation in README
  - added comments to clarify the file path input to the read_adat() example in README
- Updated stat workflow articles to begin with reading in an ADAT
  - updated data preparation chunks with comments about how to download and read in the example_data.adat file
  - data preparation chunks now use the preProcessAdat() function for pre-processing
- Added sample annotation merging guidance
  - updated README and the loading and wrangling vignette article with a section including code to join the ex_clin_data object to the example_data ADAT
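The log10, centering, and scaling steps that preProcessAdat() can apply are standard per-analyte transformations. The minimal Python sketch below shows them in that order, assuming each analyte is a list of positive RFU values and that centering and scaling use the per-analyte mean and sample standard deviation (an assumption; preProcessAdat() itself operates on a soma_adat in R). transform_rfu() is a hypothetical helper.

```python
import math
from statistics import mean, stdev


def transform_rfu(columns, log10=True, center=True, scale=True):
    """Log10-transform, center, and scale each analyte column.

    `columns` maps analyte name -> list of positive RFU values.
    Centering subtracts the column mean; scaling divides by the sample
    standard deviation (n - 1 denominator). Returns a new dict.
    """
    out = {}
    for analyte, values in columns.items():
        x = [math.log10(v) for v in values] if log10 else list(values)
        if center:
            mu = mean(x)
            x = [v - mu for v in x]
        if scale:
            sd = stdev(x)  # a constant column gives sd == 0 (ZeroDivisionError)
            x = [v / sd for v in x]
        out[analyte] = x
    return out


print(transform_rfu({"seq.1": [10, 100, 1000]}))
```

Taking log10 first matters: RFU values are strongly right-skewed, so centering and scaling on the raw scale would be dominated by a few very bright samples.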
Internal 🚧
- Added helper utility functions for snapshot plot unit tests
  - added helper utility functions figure(), close_figure(), save_png(), and expect_snapshot_plot() to testthat/helper.R for saving plot snapshot output
  - added snapshot unit tests for preProcessAdat() messaging, print, and QC plot output
SomaDataIO 6.2.0
CRAN release: 2025-02-06
New Functions
- Added calc_eLOD() function (#131)
  - calculates the estimated limit of detection (eLOD) for SeqId columns of an input soma_adat or data.frame
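An LOD estimate is typically derived from the spread of blank (buffer, no-sample) measurements. The Python sketch below shows the general median-plus-k-MADs idea for a single analyte. It is not calc_eLOD()'s implementation: the helper name estimate_lod(), the use of buffer wells as input, the multiplier k, and the MAD scaling constant are all assumptions to check against the calc_eLOD() help page.

```python
from statistics import median


def estimate_lod(buffer_rfu, k, mad_scale=1.4826):
    """Estimate a limit of detection for one analyte as median + k * MAD.

    `buffer_rfu` holds the analyte's RFU values from buffer (no-sample) wells.
    The multiplier `k` and the MAD scaling constant are assumptions here;
    check the calc_eLOD() documentation for the values it actually uses.
    """
    med = median(buffer_rfu)
    mad = mad_scale * median(abs(v - med) for v in buffer_rfu)
    return med + k * mad


# Hypothetical buffer RFUs for a single analyte
print(estimate_lod([10, 12, 11, 13, 9], k=3, mad_scale=1.0))  # 14.0
```

The median and MAD are used instead of the mean and SD so that a single contaminated buffer well does not inflate the estimate.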
Bug Fixes
- Fixed crayon bug and ui_bullet() issue (#129, #130)
  - removed crayon and usethis as dependencies in favor of cli
  - fixed bug in R version 4.4.1 with ui_bullet() internal calls within loadAdatsAsList() and write_adat()
- Fixed bug in Summary.soma_adat() operations (#121)
  - these operations, e.g. min(), max(), any(), and range(), could return incorrect values due to an as.matrix() conversion under the hood
  - now skips that conversion, trips a warning, and carries on
  - triggers an error if non-numerics are passed as part of the ‘…’ outside of a soma_adat, just like Summary.data.frame()
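The Summary.soma_adat() bug can be reproduced in spirit. In R, as.matrix() on a data frame that contains any character column returns a character matrix, so min() and max() compare strings rather than numbers. The short Python snippet below is an illustration rather than package code, and it shows the same lexicographic trap.

```python
# RFU-like numbers, and the same numbers after coercion to text (analogous
# to an as.matrix() call on a data frame with a character column).
rfu = [9.5, 10.2, 150.0]
as_text = [str(v) for v in rfu]

print(max(rfu))      # 150.0  (numeric comparison)
print(max(as_text))  # 9.5    (string comparison: "9" sorts after "1")
print(min(as_text))  # 10.2
```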
Function and Object Improvements
- collapseAdats() now maintains Cal.Set entries of Col.Meta (#113)
  - collapsing ADATs can be problematic for the attributes, especially for large numbers of ADATs
  - collapseAdats() now attempts to smartly merge the (potentially numerous) elements of the Col.Meta attribute in the final object, preserving the “Cal.Set” and “ColCheck” columns in particular
  - the resulting Col.Meta attribute is a combined product of the individual ADAT elements and the intersect of the analyte features (as is the case for the rbind() that is called)
- Updated checksums and versions for Annotations Excel files (#116)
  - updated the 7k and 11k file versions and md5sum checksums
  - now allows read_annotations() to load the individual Excel files
- Updated lift_master object to alpha-sort columns
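As noted above, the collapsed object keeps the intersect of analyte features, as an rbind() of the individual ADATs would require. The minimal Python sketch below illustrates that intersect-then-stack step, modeling each ADAT as a list of row dictionaries. This representation is an assumption, collapse_tables() is hypothetical, and the attribute merging of Col.Meta, Cal.Set, and ColCheck that collapseAdats() performs is not modeled.

```python
def collapse_tables(tables):
    """Row-bind tables, keeping only the columns present in every table.

    Each table is a non-empty list of row dicts. Column order follows the
    first table, mirroring an intersect-then-rbind strategy.
    """
    shared = set(tables[0][0])
    for table in tables[1:]:
        shared &= set(table[0])
    columns = [c for c in tables[0][0] if c in shared]
    return [{c: row[c] for c in columns} for table in tables for row in table]


a = [{"SampleId": "s1", "seq.1": 1.0, "seq.2": 2.0}]
b = [{"SampleId": "s2", "seq.2": 3.0, "seq.3": 4.0}]
print(collapse_tables([a, b]))
# [{'SampleId': 's1', 'seq.2': 2.0}, {'SampleId': 's2', 'seq.2': 3.0}]
```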
Documentation Updates
- Updated company name, license year, and maintainer (#137)
  - SomaLogic Operating Co., Inc is now Standard BioTools, Inc.
  - updated license and copyright year to 2025
  - updated package maintainer to Caleb Scheidel
- Updated article links in README and intro vignette (#123)
  - updated links to articles in README and the introduction vignette to point to URLs on the pkgdown website rather than vignette() code references
  - added clarification to the above documents that articles are available only on the website, rather than as traditional vignettes included with the package
- Updates to example documentation
  - read_annotations() example documentation now points to the most recent 11k Excel annotations file
  - parseHeader() example now prints list elements separately, rather than the full object, which slowed website rendering
Internal 🚧
- Updates to GitHub Action workflows
  - added rhub.yaml configuration file to comply with rhub v2
  - updated macOS version in pkgdown.yaml to macOS-14
  - added write permission to pkgdown.yaml file to enable deployment
  - changed GitHub Action R checks to macOS and Windows only
    - ubuntu machine was taking too long to build
- Increased package test coverage
  - added unit tests for getSomaScanLiftCCC(), parseCheck(), and release utilities, which were previously untested
  - increased test coverage for pivotExpressionSet()
- Added missing package anchors to .Rd files (#139)
  - fixed note from remote Windows check related to Rd targets missing package anchors
- Updated README badge (#109)
  - now shows downloads per month instead of total downloads
- Fixed link in DESCRIPTION; master -> main (#107)
SomaDataIO 6.1.0 🥳 🍾
CRAN release: 2024-03-26
Lifting Code 🚀
- Major restructure of lift_adat() functionality (@stufield, #81, #78)
  - lift_adat() now takes a bridge = argument, replacing the anno.tbl = argument. Lifting is now performed internally for a better (and safer) user experience, without the need for an external annotations (Excel) file.
  - the majority of this refactoring was internal and users should not experience a major disruption to the API.
  - much improved lifting/bridging documentation (#82)
- Added a new lifting and bridging vignette (@stufield, #77)
  - in addition to the improved lifting documentation, this new vignette provides additional context, explanation, clear examples, and lifting guidance.
New Functions ✨
- is_lifted() is new and returns a boolean according to whether the signal space (RFU) has been previously lifted
- Lifting accessor function for Lin’s CCC values (#88)
  - getSomaScanLiftCCC() accesses the lifting correlations between SomaScan versions for each analyte
  - returns a tibble split by sample matrix (serum or plasma)
- merge_clin() is newly exported (#80)
  - a thin wrapper that allows users to easily merge clinical variables into soma_adat objects
  - previously users had to either use the CLI merge tool or merge in clinical variables themselves with dplyr
- Newly exported ADAT “get**” helpers (#83)
  - functions to access properties of ADATs
- getAdatVersion() gets a new S3 method (#92)
  - this enables passing of different objects
    - namely soma_adat or list depending on the situation
- Newly exported functions that were previously internal only
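getSomaScanLiftCCC() reports Lin’s concordance correlation coefficient (CCC) for each analyte. Unlike Pearson’s r, CCC also penalizes systematic shifts in location and scale, so a value near 1 means the two SomaScan versions agree, not merely correlate. The minimal Python sketch of the textbook formula below is for intuition only. lins_ccc() is a hypothetical helper and does not reproduce how the package obtains its values.

```python
from statistics import fmean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired values.

    CCC = 2 * s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), using
    population (divide-by-n) variances and covariance.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired with at least 2 values")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


x = [1.0, 2.0, 3.0]
print(lins_ccc(x, x))                   # 1.0: perfect agreement
print(lins_ccc(x, [v + 1 for v in x]))  # ~0.571: perfectly correlated, but shifted
```

The second call has a Pearson correlation of exactly 1, yet its CCC is well below 1, which is why CCC is the more informative summary for cross-version agreement.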
New Vignettes 🤓
- The package README is now simplified (#35)
  - example analysis workflows are now split out into their own vignettes/articles and cross-linked in the README
- Reorganization and expansion of statistical vignettes (#35, #47)
  - moved 3 existing statistical examples from README into their own vignettes
  - resulting in four new “Statistical Workflow” vignettes/articles:
    - Binary classification via logistic regression
    - Linear regression for continuous variables
    - Two-group comparison via t-test
    - Three-group analysis via ANOVA
- Added new general analysis workflow vignettes
  - articles for the pkgdown website have been built out
  - new articles on:
    - safely mapping values among variables
    - safely renaming a data frame
    - loading and wrangling
    - typical train and test data splits
    - beginning the FAQs and/or Coming Soon pages
- Added a new vignette describing how to use the command-line interface merge tool (#45)
  - the new CLI merge tool is used to add new clinical data to an existing ADAT file
Updates and Improvements 🔨
- collapseAdats() better combines HEADER information (#86)
  - certain information, e.g. PlateScale and Cal*, is better maintained in the final collapsed ADAT
  - other entries are combined by pasting into a single string
  - should result in less duplication of superfluous entries and retention of more “useful” HEADER information in the resulting (collapsed) soma_adat
- Updated read_annotations() with 11k content (#85)
- Updated transform() and scaleAnalytes()
  - scaleAnalytes() (internal) now skips missing references and is much more like a “step” in the recipes package
  - transform() gets edge-case protection with drop = FALSE in case a single-analyte soma_adat is scaled
- New row.names() S3 method support for the soma_adat class
  - dispatched on calls to rownames()
  - rather than calling NextMethod(), which would normally invoke the data.frame method, we now force the data.frame method in case there are tbl_df or grouped_df classes present that would otherwise be dispatched. Those are bypassed in favor of the data.frame method because tbl_df 1) can nuke the attributes, and 2) triggers a warning about adding rownames to a tibble.
- New grouped_df S3 print support for the grouped soma_adat
  - now displays grouping information from a call to the S3 print method for the soma_adat class
- New grouped_df S3 method support for the soma_adat class (#66)
  - grouped_df data objects were previously unsupported and were interfering with downstream S3 methods for dplyr verbs once NextMethod() was called
  - this support now ensures that the group methods are maintained, as well as the soma_adat class itself (and, most importantly, with its attributes intact)
- tidyr::separate.soma_adat() S3 method was simplified (#72)
  - now uses %||% helper internally
  - expanded error messages inside stopifnot() to be more informative
- is_intact_attr() is now much quieter, signaling only when called directly (#71)
  - new conditional logic silences signaling messages when it is called from within another function (indirectly)
  - these previously led to confusing messages when they appeared in wrappers, where is_intact_attr() can be, sometimes deeply, nested in the call stack
- Development and improvements to the pkgdown website
  - added new links and improved clarity in YAML
  - added new logo at footer
  - restyled side bar for easier hyperlinking and getting help
  - clicking on the SomaLogic logo in the GitHub README now links to the pkgdown website
  - new “Coming Soon” drop-down section in the website header to let users know about active progress (but not yet ready for external publication)
- SomaDataIO no longer depends on the desc package
  - desc was previously used to generate the README.md
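Both transform() and the internal scaleAnalytes() perform linear, per-analyte scaling: each analyte’s RFU values are multiplied by a reference scalar. The minimal Python sketch below illustrates that idea, including the skip-missing-references behavior described above. scale_analytes() is a hypothetical stand-in, and representing the data as a dict of columns is an assumption (in R the object is a soma_adat).

```python
def scale_analytes(data, scalars):
    """Multiply each analyte column by its reference scalar.

    `data` maps analyte -> list of RFU values; `scalars` maps analyte -> float.
    Analytes with no reference scalar are passed through unchanged, and
    reference entries with no matching analyte are ignored.
    """
    return {
        analyte: (
            [v * scalars[analyte] for v in values]
            if analyte in scalars
            else list(values)
        )
        for analyte, values in data.items()
    }


data = {"seq.1": [10.0, 20.0], "seq.2": [5.0, 6.0]}
scalars = {"seq.1": 0.5, "seq.9": 2.0}  # seq.2 has no scalar; seq.9 has no column
print(scale_analytes(data, scalars))
# {'seq.1': [5.0, 10.0], 'seq.2': [5.0, 6.0]}
```

Because each analyte is stored as its own list here, a single-analyte input cannot collapse to a different shape. In R, the drop = FALSE guard in transform() serves the analogous purpose when a single column is subset from a data frame.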
Internal 🚧
- Internal rowname helpers were upgraded
  - they now use internal cross-functions as originally intended, to avoid redundancy, improve efficiency, and simplify debugging
- sysdata.rda no longer contains non-exported functions (#59)
  - new internal helper functions:
    - convertColMeta()
    - genRowNames()
    - parseCheck()
    - syncColMeta()
    - scaleAnalytes()
- Bug-fix for corner case when writing a single-analyte ADAT (#51)
  - RFU values are rounded to 1 decimal place when written by write_adat(); this was previously done via a call to apply(), which expects a 2-dimensional object when replacing those values
  - write_adat() no longer uses apply() and instead converts the entire RFU data frame to a matrix (maintaining the original dimensions) and uses vectorized format conversion via sprintf()
  - in theory this should be faster because sprintf() is only called once on a long vector, rather than 1000s of times on shorter vectors (inside apply())
- Fixed missing closing parenthesis in SomaScanObjects.R (thanks @Hijinx725!, #40)
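The write_adat() fix replaces a per-row apply() with one formatting pass over the whole RFU matrix, which also keeps a single-analyte matrix two-dimensional. The Python sketch below mirrors that flatten-format-reshape pattern. format_rfu() is hypothetical, and Python f-strings stand in for R’s sprintf().

```python
def format_rfu(matrix, digits=1):
    """Format a 2-D RFU matrix (a list of rows) to fixed decimals.

    The values are flattened, formatted in a single pass, and reshaped,
    so the row/column shape is preserved even with a single column.
    """
    n_cols = len(matrix[0])
    flat = [v for row in matrix for v in row]
    text = [f"{v:.{digits}f}" for v in flat]
    return [text[i:i + n_cols] for i in range(0, len(text), n_cols)]


single_analyte = [[1.26], [2.0], [3.14159]]
print(format_rfu(single_analyte))  # [['1.3'], ['2.0'], ['3.1']]
```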
SomaDataIO 6.0.0 🎉
CRAN release: 2023-03-15
- We are now on CRAN! 🥳
New changes
- New clinical data merge CLI tool (@stufield, #25)
  - Rscript --vanilla merge_clin.R for merging clinical variables into existing *.adat SomaScan data files
  - added 2 new example files, meta.csv and meta2.csv, to run examples with random data but with valid index keys
  - see dir(system.file("cli", "merge", package = "SomaDataIO"))
- Package data objects (@stufield, #32)
  - example_data.adat was reduced in size to n = 10 samples (from 192) to conform to CRAN size requirements (< 5MB)
  - the current file was renamed example_data10.adat to reflect this change
  - this likely has far-reaching consequences for users who access this flat file via system.file()
  - the example_data object itself, however, remains true to its original file (https://github.com/SomaLogic/SomaLogic-Data/blob/master/example_data.adat)
  - the directory location inst/example/ was renamed inst/extdata/ to conform to CRAN package standard naming conventions
  - the file single_sample.adat was removed from package data as it is now redundant (however, it is still used in unit testing)
  - SomaDataObjects was renamed and is now SomaScanObjects
- Gradual deprecation (@stufield)
  - read.adat() is now soft-deprecated; please use read_adat() instead
  - lifecycle for soft-deprecated functions: warn() -> stop() for functions that have been soft-deprecated since v5.0.0
- New S3 print method default (@stufield)
  - tibble has a new max_extra_cols = argument, which is set to 6 for the print.soma_adat method
- New S3 merge method (@stufield, #31)
  - calling base::merge() on a soma_adat is strongly discouraged
  - we now redirect users to the dplyr::*_join() alternatives, which are designed to preserve soma_adat attributes
- Code hardening for prepHeaderMeta() (@stufield)
  - some ADATs do not have CreatedDate and CreatedBy in the HEADER entry, which broke the writer
  - simplified to be more robust, and refactored to be more convenient (for abnormal ADATs not generated by standard SomaScan processing)
  - CreatedDateHistory was removed as an entry from written ADATs
  - CreatedByHistory was combined and dated for written ADATs
  - NULL behavior remains if keys are missing
  - CreatedBy and CreatedDate will be generated either as new entries or over-written as appropriate
- Numerous non-user-facing changes (i.e., not affecting the API) for internal package maintenance, efficiency, and structural upgrades were included
SomaDataIO 5.3.1
- Bug-fix release related to write_adat():
  - fixed bug in write_adat() that resulted from adding/removing clinical (non-SomaScan) variables to an ADAT; export via write_adat() resulted in a broken ADAT file (@stufield, #18)
  - write_adat() now has much higher fidelity to the original text file (*.adat) in full-cycle read-write-read operations, particularly in the presence of bangs (!) in the Header section and floating point decimals in the ?Col.Meta section
  - write_adat() no longer converts commas (,) to semi-colons (;) in the ?Col.Meta block (originally introduced to avoid cell alignment issues in *.csv formats)
  - write_adat() no longer concatenates written ADATs when writing to the same file; data are over-written to file to avoid mangled ADATs resulting from re-writing to the same connection, and to match the default behavior of write.table(), write.csv(), etc.
- read_adat() now has a more consistent character type for the Barcode2 variable in standard ADATs: it now forces character class and does not allow R’s read.delim() to “guess” the type
- Decreased dependency on magrittr pipes (%>%) in favor of the native R pipe (|>). As a result, the package now depends on R >= 4.1.0
  - SomaDataIO will continue to re-export magrittr pipes for backward compatibility, but this should not be considered permanent. Please code accordingly
- Migration of the default branch in GitHub from master -> main (@stufield, #19)
- Numerous non-user-facing changes (i.e., not affecting the API) for internal package maintenance, efficiency, and structural upgrades were included
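Forcing Barcode2 to character guards against a classic type-guessing pitfall: identifiers that look numeric lose their leading zeros when parsed as numbers. The small Python illustration below shows that failure mode using made-up barcode values. Real Barcode2 values may not have leading zeros; the example only shows why letting a reader guess the type is risky.

```python
import csv
import io

# Tab-delimited rows with made-up barcodes that happen to look numeric
raw = "SampleId\tBarcode2\nS1\t0012345\nS2\t0067890\n"

# csv.DictReader keeps every field as a string, so leading zeros survive
rows = list(csv.DictReader(io.StringIO(raw), delimiter="\t"))
barcodes = [r["Barcode2"] for r in rows]

# A reader that "guesses" numeric types would silently strip them
guessed = [int(b) for b in barcodes]

print(barcodes)  # ['0012345', '0067890']
print(guessed)   # [12345, 67890]
```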
SomaDataIO 5.3.0
- Upgrades primarily from improvements to the SomaLogic internal code base, including (@stufield):
  - general reduction in external package dependencies to improve code stability
  - internal usage of base R alternatives to the readr package for parsing and importing ADATs (e.g. read.delim() over readr::read_delim()). This is mostly for code simplification, but can often result in marked speed improvements. As the SomaScan plex size increases, this speed improvement will become more important.
  - parseHeader() was dramatically simplified, now reading in lines 20L at a time until the RFU block is reached. In addition, once the block is reached, all header lines are read in at once and indexed (as opposed to line-by-line).
  - read_adat() now specifies column types via colClasses =, which for the majority of the ADAT is type double for the RFU columns. This should dramatically improve the speed of ingest.
  - write_adat() was simplified internally, with fewer nested apply() calls and for-loops.
  - encoding for all input/output (I/O) is assumed to be UTF-8.
- New getAnalytes() S3 method for class recipe from the recipes package.
- New loadAdatsAsList() to load multiple ADAT files in a single call and optionally collapse them into a single data frame (@stufield, #8).
- New getTargetNames() function to map ADAT seq.XXXX.XX names to corresponding protein targets from the annotations table
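The chunked header scan described for parseHeader() can be sketched in a few lines of Python: read a fixed number of lines at a time and stop at the line that marks the start of the data table. This is only an illustration of the idea, not the package’s parser. read_header() is hypothetical, and treating a line that begins with ^TABLE_BEGIN as the boundary is an assumption about the ADAT layout.

```python
import io
from itertools import islice


def read_header(handle, marker="^TABLE_BEGIN", chunk=20):
    """Read lines `chunk` at a time until one starts with `marker`.

    Returns the lines before the marker. The marker string is an
    assumption about the ADAT layout, used here for illustration.
    """
    header = []
    while True:
        block = list(islice(handle, chunk))  # next `chunk` lines (fewer at EOF)
        if not block:                        # EOF reached without finding the marker
            return header
        for i, line in enumerate(block):
            if line.startswith(marker):
                return header + block[:i]
        header.extend(block)


sample = io.StringIO("!Header1\n!Header2\n^TABLE_BEGIN\n1\t2\n")
print(read_header(sample))  # ['!Header1\n', '!Header2\n']
```

Reading in fixed-size chunks keeps the number of I/O calls small without loading the (potentially large) RFU table just to find where the header ends.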
SomaDataIO 5.2.0
- SomaLogic Inc. is now SomaLogic Operating Co. Inc.
- Added new documentation regarding Col.Meta (@stufield, #12).
  - documentation around column meta data, row meta data, where they are found in an ADAT, and how to access them.
- Research Use Only (“RUO”) language was added to the README (@stufield, #10).
- Numerous internal code improvements from the SomaLogic code-base (@stufield)
  - these consisted of reducing usage of external dependencies, e.g. using stop() over ui_stop() and warning() over ui_warn(), and using usethis, cli, and crayon shim aliases.
  - package uses purrr very selectively and no longer uses stringr.
  - using base R alternatives in favor of increased stability for underlying, non-user-facing code.
- New lift_adat() was added to provide ‘lifting’ functionality (@stufield, #11)
  - provides a mechanism to convert RFU space between SomaScan versions (e.g. v4.1 -> v4.0).
  - added new S3 transform.soma_adat() method, which simplifies linear scaling of soma_adat columns (analytes).
  - uses an “Annotations file” (Excel) as the source of scalars for transformation.
- Minor improvements and updates to the README.Rmd (@stufield, #7)
  - fixed a broken adat2eSet() link in README (#5).
  - clearer text in the README regarding Biobase installation.
  - added new links to the external Bioconductor website in the installation section of README.
  - new pkgdown and links to Issues (#4).
  - SomaLogic logo was added to README.
  - a lifecycle (“maturing”) badge was added.
- Startup message was improved with dynamic width (@stufield).
- New locateSeqId() function to pull out SeqId regex (@stufield).
- New read_annotations() function (@stufield, #2)
  - new function to parse/import SomaLogic annotations files (*.xlsx).
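locateSeqId() (and, later, getTargetNames()) depends on recognizing SeqIds inside ADAT column names of the seq.XXXX.XX form. The rough Python sketch below shows that kind of regex. The pattern and the seq_id() helper are assumptions for illustration, and the package’s own regex may be stricter or broader.

```python
import re

# Approximate SeqId pattern: 4-5 digits, a "-" or "." separator, 1-3 digits.
# This is an assumption, not the exact regex used by locateSeqId().
SEQ_ID = re.compile(r"(\d{4,5})[-.](\d{1,3})")


def seq_id(name):
    """Return the SeqId embedded in a column name (e.g. '2182-54'), or None."""
    m = SEQ_ID.search(name)
    return f"{m.group(1)}-{m.group(2)}" if m else None


for col in ["seq.2182.54", "seq.10000.28", "SampleId"]:
    print(col, "->", seq_id(col))
```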
SomaDataIO 5.1.0
- New set_rn() drop-in replacement for magrittr::set_rownames()
- getFeatures() was renamed to be less ambiguous and to better align with internal SomaLogic code usage. Now use getAnalytes() (@stufield)
- getFeatureData() was also renamed to getAnalyteInfo() (@stufield)
- various upgrades as required by code changes in external package dependencies, e.g. tidyverse.
- new alias for read_adat(), read.adat(), for backward compatibility with previous versions of SomaDataIO (@stufield)
