Using `sleepIPD` to Reprocess Data

Introduction

The purpose of this package is to simplify the reprocessing of accelerometer data for project The relationship between sleep and physical activity across the lifespan. This project wraps functions from GGIR, along with functions to make collating the results easier.

Getting Started

This page will guide you through the process. How you reprocess the data will depend on a number of factors.

If you are familiar with R and intend to reprocess data on a single machine, follow the Quick Start guide.
If you have a large number of files and will be reprocessing on a cluster, read the Key GGIR Settings
If you are new to R (e.g., you would normally process data with ActiLife), this chapter is a good introduction. The key steps are to install R, install an IDE (I strongly recommend RStudio), and get familiar with how to run scripts. You can then move to the Quick Start guide.

Getting Help

The project is led by Dr Timothy Hartwig, and this package is developed by Dr Taren Sanders. Please reach out if you need any support.

Quick Start

One of the goals for this package was to make reprocessing data as easy as possible. If your study meets all of the criteria below, you should be able to reprocess your data with just a single call to reprocess().

Criteria

You intend to run the analysis on a single computer (i.e., you do not intend to use a cluster).
Your data are stored in a format that GGIR understands (Geneactiv .bin or .csv, Actigraph raw .csv, or Axivity .cwa).
Your folder structure contains only one of the above formats (i.e., your data are not duplicated in multiple formats).

Running with `reprocess()`

Installation

First, install the package. Since the package is not on CRAN, you will need devtools::install_github() or remotes::install_github(). You will need to install one of these packages if you do not already have one. I recommend devtools.

# install.packages("devtools") # uncomment if needed
devtools::install_github("Motivation-and-Behaviour/sleepIPD")

Reprocessing the Data

Next, provide sleepIPD with the information about how your data are stored, and information about the sample. The essential information is:

Parameter	Description
datadir	Where are the data stored?
outputdir	Where can we safely store the outputs? Note - do not use the same location and data previously processed in GGIR. You risk overwriting your existing outputs. A new output folder just for this study would be sensible.
studyname	What should we refer to this sample as? You can always use your last name.
ages	How old is the sample? Can be either `"children"` (for those 18 or less), `"adults"` (for >18) or both (i.e., `c("children", "adults")`). If you do not specify, both are used.
device	What device did you use? Can be either `"geneactiv"`, `"actigraph"` or (in the unlikely event two devices were used) both (i.e., `c("geneactiv", "actigraph")`). If you do not specify, both are used. Note that you can use `"geneactiv"` if your device was Axivity.
wear_location	How did the sample wear the device? Can be either `"wrist"` (the default) or `"hip"`. In the unlikely event multiple locations were used within the same study, you need to run the function twice, once with each location. Note that you would also need to ensure that they have different study names or output locations so they don’t collide.

An example call might look like:

library(sleepIPD)

datadir <- "C:\\Documents\\My Study\\Data"
outputdir <- "C:\\Documents\\My Study\\Output"
studyname <- "Hartwig 2021"
ages <- "children"
device <- "geneactiv"
wear_location <- "wrist"

reprocess(datadir = datadir,
          outputdir = outputdir,
          studyname = studyname,
          ages = ages,
          device = device,
          wear_location = wear_location)

Note that if you discover you have made a mistake and need to run this again, you’ll need to additionally set overwrite = TRUE.

What Does `reprocess()` Do?

reprocess() performs three steps, and moves the data between each step.

Step One: Reprocess Files

This is the largest step. Here, sleepIPD calls process_files(), which does the following:

Checks that you have the correct version of GGIR installed. sleepIPD can help you install the right version if it cannot find the correct version.
Determines the correct cut-points to apply to the data, based on information about your sample.
Calls GGIR::g.shell.GGIR() to reprocess the data. This is where >99% of the time will be spent.

Step Two: Collate the Outputs

GGIR can produce a lot of output files, but we only really need the summaries. Even so, depending on the configuration, there could be more than 1000 summary files. In this second step, sleepIPD calls collate_outputs(), which finds all of the output files and collapses them into just three files. These three files are stored in a new folder called sleepIPD_output, which will be located under the folder you specify in outputdir.

For example, if outputdir is "C:\\Documents\\My Study\\Output", the collated files will be stored in "C:\\Documents\\My Study\\Output\\sleepIPD_output".

Step Three: Create the Metadata Template

To perform the integrated analysis, we will need additional information about the sample such as demographics. To try to make this simpler, sleepIPD calls build_meta() which will create the metadata template file and pre-fill it with the names of the files, and the wear location. You can then add the remaining information in, either by matching it to an existing spreadsheet or filling it manually.

The metadata template will be located in the folder you specify as the outputdir, and will be named <studyname>-metadata.xlsx. The Metadata sheet contains the information to fill out, and the other sheets explain the expected format.

Key GGIR Settings

Processing

sleepIPD wraps GGIR::g.shell.GGIR() for the processing. If you are manually making calls to GGIR (i.e., not using this package), the settings are provide below. You must use GGIR Version 2.5-1 for this project. There is some evidence that different versions of GGIR can produce slightly different results. Please check your version before you begin (utils::packageVersion("GGIR")).

The full GGIR::g.shell.GGIR() call used in sleepIPD is:

GGIR::g.shell.GGIR(
  mode = c(1, 2, 3, 4, 5),
  datadir = datadir,
  outputdir = outputdir,
  do.report = c(2, 4, 5),
  overwrite = overwrite,
  do.enmo = TRUE,
  do.hfen = TRUE,
  do.enmoa = TRUE,
  studyname = studyname,
  # =====================
  # Part 2
  # =====================
  do.imp = TRUE,
  strategy = 2,
  maxdur = 0,
  includedaycrit = 16,
  qwindow = c(0, 6, 12, 18, 24),
  mvpathreshold = threshold_mod,
  bout.metric = 4,
  excludefirstlast = FALSE,
  includenightcrit = 16,
  MX.ig.min.dur = 10,
  iglevels = 50,
  ilevels = seq(0, 4000, 50),
  epochvalues2csv = FALSE,
  winhr = c(16,         # Most active 16 hours
            c(60 / 60), # Most active hour
            c(30 / 60), # Most active half hour
            c(15 / 60), # Most active 15 min
            c(10 / 60), # Most active 10 min
            c(5 / 60)   # Most active 5 min)
  ),
  qlevels = c(
    960 / 1440, # 2/3rds of the day
    1320 / 1440, # Top 120min
    1380 / 1440, # Top 60min
    1410 / 1440, # Top 30min
    1425 / 1440, # Top 15min
    1435 / 1440 # Top 5min
  ),
  # =====================
  # Part 3 + 4
  # =====================
  def.noc.sleep = 1,
  outliers.only = TRUE,
  criterror = 4,
  do.visual = TRUE,
  # =====================
  # Part 5
  # =====================
  threshold.lig = threshold_lig,
  threshold.mod = threshold_mod,
  threshold.vig = threshold_vig,
  boutcriter = 0.8, boutcriter.in = 0.9, boutcriter.lig = 0.8,
  boutcriter.mvpa = 0.8, boutdur.in = c(1, 10, 30), boutdur.lig = c(1, 10),
  boutdur.mvpa = c(1),
  includedaycrit.part5 = 2 / 3,
  # =====================
  # Visual report
  # =====================
  timewindow = c("WW", "MM"),
  visualreport = FALSE,
  ...
)

Note that datadir, outputdir, overwrite, and studyname are set elsewhere in sleepIPD. The thresholds (i.e., threshold_lig, threshold_mod, and threshold_vig) will depend on the device you are using, the age of the sample, and where the accelerometer was worn. See details below.

Thresholds

Study Design			Thresholds			Sources
Age¹	Device²	Worn on	Light	Moderate	Vigorous	Sources
children	geneactiv	wrist	51.6	191.6	695.8	Light: Hurter et al., 2018 Mod: Hildebrand et al., 2014 Vig: Hildebrand et al., 2014
children	actigraph	wrist	48.1	201.4	707	Light: Hurter et al., 2018 Mod: Hildebrand et al., 2014 Vig: Hildebrand et al., 2014
children	geneactiv	hip	64.1	152.8	514.3	Light: Hildebrand et al., 2016 Mod: Hildebrand et al., 2014 Vig: Hildebrand et al., 2014
children	actigraph	hip	32.6	142.6	464.6	Light: Hurter et al., 2018 Mod: Hildebrand et al., 2014 Vig: Hildebrand et al., 2014
adults	geneactiv	wrist	45.8	93.2	418.3	Light: Hildebrand et al., 2016 Mod: Hildebrand et al., 2014 Vig: Hildebrand et al., 2014
adults	actigraph	wrist	44.8	100.6	428.8	Light: Hildebrand et al., 2016 Mod: Hildebrand et al., 2014 Vig: Hildebrand et al., 2014
adults	geneactiv	hip	46.9	68.7	266.8	Light: Hildebrand et al., 2016 Mod: Hildebrand et al., 2014 Vig: Hildebrand et al., 2014
adults	actigraph	hip	47.4	61.1	258.7	Light: Hildebrand et al., 2016 Mod: Hildebrand et al., 2014 Vig: Hildebrand et al., 2014
¹ Children: ≤18 years; Adults: > 18 years
² Geneactiv thresholds can also be applied to Axivity devices.

If you have data that would fall into more than one row (e.g., your sample includes both children and adults), you can add multiple cut-points and we will extract the right values later based on the metadata.

As an example, if your study included children wearing geneactiv devices on their wrists, you would include the following lines in your GGIR call:

...
  threshold.lig = c(51.6),
  threshold.mod = c(191.6),
  threshold.vig = c(695.8),
...

If your study included both children and adults wearing geneactiv devices on their wrists, your call would include:

...
  threshold.lig = c(45.8, 51.6),
  threshold.mod = c(93.2, 191.6),
  threshold.vig = c(418.3, 695.8),
...

Using `sleepIPD` Helper Functions

Even if you do not use sleepIPD for the reprocessing, you may still find the helper functions useful. The two main helper functions are collate_outputs() and build_meta().

`collate_outputs()`

collate_outputs() will find all of the Part 2, 4, and 5 files, collapse them so that there is a one file for each part, and move that file to the top level of the output directory. This reduces the number of files you need to submit for the analysis. You can use collate_outputs() like so:

collate_outputs(
  outputdir = "C:/MyFiles", # The location you used in GGIR as outputdir
  studyname = "Hartwig 2016" # A study name. Author surname and year is fine
                )

`build_meta()`

build_meta() will create a metadata template for you. Additionally, it will search the output of collate_outputs() for the names of files, and pre-fill the metadata template with this information. The goal is to reduce the rate of unmatched data, which was a problem in previous pooled analyses. You must use collate_outputs() before using build_meta().

build_meta(
  outputdir = "C:/MyFiles", # Must be the same as that used in collate_outputs()
  studyname = "Hartwig 2016" # Must be the same as that used in collate_outputs()
)

If you have only a single wear location, you can (optionally) provide this (e.g., wear_location="wrist") in order to pre-fill it.

If you have previously used build_meta and need to run it again, you need to specify overwrite = TRUE. Be warned that this will replace the existing metadata template.

Using sleepIPD to Reprocess Data