## Sunday, 28 August 2011

### ACE model specification in OpenMx

Now we are ready to specify the ACE model to test which sources of variance significantly contribute to the phenotype and estimate their best value. The structure of this script mimics that of the saturated model. The main difference is that we no longer estimate the variance-covariance matrix directly, but express it as a function of the three sources of variance, A, C and E. As the same sources are used for the MZ and the DZ group, the matrices which will represent them are part of the ‘super’ model. As these sources are variances, which need to be positive, we use a Cholesky decomposition of the standard deviations (and effectively estimate a rather then a^2; see later for more in-depth coverage). Thus, we specify three separate matrices for the three sources of variance using the mxMatrix command and calculate the variance components with the mxAlgebra command. Note that there are a variety of ways to specify this model; I have picked one that corresponds well to previous Mx code, and has some intuitive appeal.

#---------------------------------------------------------------------------------------------------------
#                                                  ACE Model
#   DVM Bishop 13th March 2010; based on p 18, OpenMx manual
#---------------------------------------------------------------------------------------------------------
#read in previously saved data created with Twin Simulate script
myDataMZ=mytwindata[,1:2]  #columns 1-2 are MZ twin1 and twin2
myDataDZ=mytwindata[,3:4]  #columns 3-4 are DZ twin 1 and twin 2
colnames(myDataMZ)=c("twin1","twin2")
mylabels=colnames(myDataMZ)

myACEModel <- mxModel("twinACE",
# Matrix expMean for expected mean vector for MZ and DZ twins
mxMatrix(type="Full",
nrow= 1,
ncol=2,
free=TRUE,
values=0,
labels= "mean",
name="expMean"),
# Matrices X, Y, and Z to store the a, c, and e path coefficients
mxMatrix(type="Full",
nrow=1,  #just one row and one column to estimate path a
ncol=1,
free=TRUE,
values=.6, #starting values
label="a",
name="X"),
mxMatrix(type="Full",
nrow=1,
ncol=1,
free=TRUE,
values=.6,
label="c",
name="Y"),
mxMatrix(type="Full",
nrow=1,
ncol=1,
free=TRUE,
values=.6,
label="e",
name="Z"),
# Matrixes A, C, and E to compute A, C, and E variance components
mxAlgebra(X * t(X), name="A"),
mxAlgebra(Y * t(Y), name="C"),
mxAlgebra(Z * t(Z), name="E"),

# Matrix expCOVMZ for expected covariance matrix for MZ twins
mxAlgebra(rbind(cbind(A+C+E, A+C), cbind(A+C, A+C+E)), name="expCovMZ"),
mxModel("MZ",
mxData(myDataMZ, type="raw"),
mxFIMLObjective("twinACE.expCovMZ", "twinACE.expMean",dimnames=colnames(myDataMZ))),
# Matrix expCOVDZ for expected covariance matrix for DZ twins
mxAlgebra(rbind(cbind(A+C+E, .5%x%A+C), cbind(.5%x%A+C , A+C+E)), name="expCovDZ"),  #note use of Kroneker product here!
mxModel("DZ",

# Algebra to combine objective function of MZ and DZ groups
mxAlgebra(MZ.objective + DZ.objective, name="alltwin"),
mxAlgebraObjective("alltwin"))

mytwinACEfit <- mxRun(myACEModel)
#-------------------------------------------------------------------------------------------------------------------

Relevant output can be generated with print or summary statements, or specific output can be requested using the mxEval command. Typically we would compare this model back to the saturated model to interpret its goodness-of-fit. Parameter estimates are obtained and can easily be standardized. ACE_Model_Output is a script I wrote when learning about OpenMx, and it may be useful to look at to clarify how to access and format different outputs. Note, though that since this was written, better options for summarising model output have been developed, and you should check the current version of OpenMx for these.

#---------------------------------------------------------------------------------------------------------
#                                 ACE_Model_Output
#  DVM Bishop, 13th March 2010, based on OpenMxUserGuide, p. 18
#---------------------------------------------------------------------------------------------------------
# NB assumes you have run Saturated twin model, and ACE model,
# and have likelihood and DF from those models in memory

LL_ACE <- mxEval(objective, mytwinACEfit) # -2LL for ACE model from previous script

#compute DF: NB only works if no missing data!
msize=nrow(myDataMZ)*ncol(myDataMZ)
myDF_ACE=msize+dsize-nrow(mytwinACEfit@output\$standardErrors)

mychi_ACE= LL_ACE - LL_Sat  #subtract LL for Saturated model from LL for ACE
mychi_DF=myDF_ACE-myDF_Sat  #subtract DF for Saturated model from DF for ACE
mychi_p=1-pchisq(mychi_ACE,mychi_DF) # compute chi square probability

# Retrieve vectors of expected means and expected covariance matrices
myMZc <- mxEval(expCovMZ, mytwinACEfit)
myDZc <- mxEval(expCovDZ, mytwinACEfit)
myM <- mxEval(expMean, mytwinACEfit)

# Retrieve the unstandardized A, C, and E variance components
A <- mxEval(A, mytwinACEfit)
C <- mxEval(C, mytwinACEfit)
E <- mxEval(E, mytwinACEfit)

# Calculate standardized variance components
V <- (A+C+E)   # total variance
a2 <- A/V      # genetic term as proportion of total variance, i.e. standardized
c2 <- C/V      # shared environment term as proportion of total variance
e2 <- E/V      # nonshared environment term as proportion of total variance

# Build and print reporting table with row and column names

myoutput <- rbind(cbind(round(A,3),round(C,3),round(E,3)),cbind(round(a2,3),round(c2,3),round(e2,3)),cbind("chisq","DF","p"),cbind(round(mychi_ACE,3),mychi_DF,round(mychi_p,3)))
# Round is used here simply to keep output to 3 decimal places

myoutput <- data.frame(myoutput, row.names=c("Unstandarized Var Comp","Standardized Var Comp","","Model fit"))
# Writes the output into a data frame which allows row and col labels
names(myoutput)<-c("A", "C", "E")
myoutput
# Print the table on screen
#---------------------------------------------------------------------------------------------------------------------------

The likelihood of the ACE model can readily be compared to the likelihood of nested submodels, i.e., we can try dropping the A or C terms to see if there is significant deterioration of fit. If terms can be dropped without significantly altering model fit (assessed by comparing chi square of models with the term included or excluded), then we prefer the more parsimonious model with fewer estimated paramters.

We shall illustrate this by modifying the ACE model to drop the free parameter for c. We do this by making the start value zero and fixing it by setting 'free = FALSE'.

Save the ACE script and then make a modified copy by saving it as AE script. It is a good idea to do this before you make any changes!  It is all too easy to modify a working script and find you have lost the original version. To run the AE model, you need to just make the following changes to the section where matrix 'Y' is defined.

mxMatrix(type="Full",

nrow=1,

ncol=1,

free=FALSE, #fix value of this parameter

values=0,    # set start value to zero

label="c",

name="Y"),

In addition, you should use search/replace (from the Edit option on menu bar)  to alter instances of ACE to AE in the script. N.B. it is not advisable to do use "Replace all", which does an automatic replacement, because the search term may occur in contexts that you don't want to change – e.g you could have the word 'trace' in your script, and you would not want it changed to 'trae'.

You should be able to run the AE model and test its significance, either by scrutinizing the summary output, and/or by explicitly computing chi square relative to the saturated model. You can do this by modifying the ACE model output script. Furthermore, when you have nested models, as in this case, you can directly compare one model with another; thus you can subtract -2LL values and their DFs for ACE and AE models, to obtain a chi square value that tells you whether there is significant deterioration of fit when the C path is dropped. In general, the outcome of this comparison will reflect the size of the c path and its confidence interval; if the confidence interval for the path does not span zero, then it won't be possible to drop the path without significant loss of fit.

You should now be able to make sense of some of the demonstration scripts that are provided with OpenMx. You can find these in the 'demo' folder. Try the script called UnivariateTwinAnalysis, which puts together all the pieces that you have covered in this section: computing likelihood for a saturated model, comparing it with the ACE model, and then comparing with the AE model.

Note that in this script, it is not necessary to copy the whole ACE model and then fix the 'c' path. Instead, one can add a simple command to modify the model as follows:

twinAEModel <- twinACEModel
twinAEModel\$twinACE\$Y <- mxMatrix("Full", 1, 1, F, 0, "c", name="Y")  # drop c

The first command creates twinAEModel which is identical to twinACEModel. The second command respecifies the defintion of the Y matrix, to make 'free = false' and to set the start value to zero.