Products & Services Solutions Academia Support User Community Company

Learn more about Statistics Toolbox   

Nonlinear Regression

Nonlinear Regression Models

The models described in Linear Regression Models are often called empirical models, because they are based solely on observed data. Model parameters typically have no relationship to any mechanism producing the data. To increase the accuracy of a linear model within the range of observations, the number of terms is simply increased.

Nonlinear models, on the other hand, typically involve parameters with specific physical interpretations. While they require a priori assumptions about the data-producing process, they are often more parsimonious than linear models, and more accurate outside the range of observed data.

Parametric nonlinear models represent the relationship between a continuous response variable and one or more predictor variables (either continuous or categorical) in the form y = f(X, β) + ε, where

Nonparametric models do not attempt to characterize the relationship between predictors and response with model parameters. Descriptions are often graphical, as in the case of Regression Trees.

Parametric Models

A Parametric Nonlinear Model

The Hougen-Watson model (Bates and Watts, [2], pp. 271–272) for reaction kinetics is an example of a parametric nonlinear model. The form of the model is

where rate is the reaction rate, x1, x2, and x3 are concentrations of hydrogen, n-pentane, and isopentane, respectively, and β1, β2, ... , β5 are the unknown parameters.

The file reaction.mat contains simulated reaction data:

load reaction         

The variables are:

The function for the model is hougen, which looks like this:

type hougen

function yhat = hougen(beta,x)
%HOUGEN Hougen-Watson model for reaction kinetics.
%   YHAT = HOUGEN(BETA,X) gives the predicted values of the
%   reaction rate, YHAT, as a function of the vector of 
%   parameters, BETA, and the matrix of data, X.
%   BETA must have five elements and X must have three
%   columns.
%
%   The model form is:
%   y = (b1*x2 - x3/b5)./(1+b2*x1+b3*x2+b4*x3)

b1 = beta(1);
b2 = beta(2);
b3 = beta(3);
b4 = beta(4);
b5 = beta(5);

x1 = x(:,1);
x2 = x(:,2);
x3 = x(:,3);

yhat = (b1*x2 - x3/b5)./(1+b2*x1+b3*x2+b4*x3);

The function nlinfit is used to find least-squares parameter estimates for nonlinear models. It uses the Gauss-Newton algorithm with Levenberg-Marquardt modifications for global convergence.

nlinfit requires the predictor data, the responses, and an initial guess of the unknown parameters. It also requires a function handle to a function that takes the predictor data and parameter estimates and returns the responses predicted by the model.

To fit the reaction data, call nlinfit using the following syntax:

load reaction
betahat = nlinfit(reactants,rate,@hougen,beta)
betahat =
    1.2526
    0.0628
    0.0400
    0.1124
    1.1914

The output vector betahat contains the parameter estimates.

The function nlinfit has robust options, similar to those for robustfit, for fitting nonlinear models to data with outliers.

Confidence Intervals for Parameter Estimates

To compute confidence intervals for the parameter estimates, use the function nlparci, together with additional outputs from nlinfit:

[betahat,resid,J] = nlinfit(reactants,rate,@hougen,beta);
betaci = nlparci(betahat,resid,J)
betaci =
   -0.7467    3.2519
   -0.0377    0.1632
   -0.0312    0.1113
   -0.0609    0.2857
   -0.7381    3.1208

The columns of the output betaci contain the lower and upper bounds, respectively, of the (default) 95% confidence intervals for each parameter.

Confidence Intervals for Predicted Responses

The function nlpredci is used to compute confidence intervals for predicted responses:

[yhat,delta] = nlpredci(@hougen,reactants,betahat,resid,J);
opd = [rate yhat delta]
opd =
    8.5500    8.4179    0.2805
    3.7900    3.9542    0.2474
    4.8200    4.9109    0.1766
    0.0200   -0.0110    0.1875
    2.7500    2.6358    0.1578
   14.3900   14.3402    0.4236
    2.5400    2.5662    0.2425
    4.3500    4.0385    0.1638
   13.0000   13.0292    0.3426
    8.5000    8.3904    0.3281
    0.0500   -0.0216    0.3699
   11.3200   11.4701    0.3237
    3.1300    3.4326    0.1749

The output opd contains the observed rates in the first column and the predicted rates in the second column. The (default) 95% simultaneous confidence intervals on the predictions are the values in the second column ± the values in the third column. These are not intervals for new observations at the predictors, even though most of the confidence intervals do contain the original observations.

Interactive Nonlinear Parametric Regression

Calling nlintool opens a graphical user interface (GUI) for interactive exploration of multidimensional nonlinear functions, and for fitting parametric nonlinear models. The GUI calls nlinfit, and requires the same inputs. The interface is analogous to polytool and rstool for polynomial models.

Open nlintool with the reaction data and the hougen model by typing

load reaction
nlintool(reactants,rate,@hougen,beta,0.01,xn,yn)

You see three plots. The response variable for all plots is the reaction rate, plotted in green. The red lines show confidence intervals on predicted responses. The first plot shows hydrogen as the predictor, the second shows n-pentane, and the third shows isopentane.

Each plot displays the fitted relationship of the reaction rate to one predictor at a fixed value of the other two predictors. The fixed values are in the text boxes below each predictor axis. Change the fixed values by typing in a new value or by dragging the vertical lines in the plots to new positions. When you change the value of a predictor, all plots update to display the model at the new point in predictor space.

While this example uses only three predictors, nlintool can accommodate any number of predictors.

Mixed-Effects Models

Introduction

In statistics, an effect is anything that influences the value of a response variable at a particular setting of the predictor variables. Effects are translated into model parameters. In linear models, effects become coefficients, representing the proportional contributions of model terms. In nonlinear models, effects often have specific physical interpretations, and appear in more general nonlinear combinations.

Fixed effects represent population parameters, assumed to be the same each time data is collected. Estimating fixed effects is the traditional domain of regression modeling. Random effects, by comparison, are sample-dependent random variables. In modeling, random effects act like additional error terms, and their distributions and covariances must be specified.

For example, consider a model of the elimination of a drug from the bloodstream. The model uses time t as a predictor and the concentration of the drug C as the response. The nonlinear model term C0ert combines parameters C0 and r, representing, respectively, an initial concentration and an elimination rate. If data is collected across multiple individuals, it is reasonable to assume that the elimination rate is a random variable ri depending on individual i, varying around a population mean . The term C0ert becomes

where β = is a fixed effect and bi = is a random effect.

Random effects are useful when data falls into natural groups. In the drug elimination model, the groups are simply the individuals under study. More sophisticated models might group data by an individual's age, weight, diet, etc. Although the groups are not the focus of the study, adding random effects to a model extends the reliability of inferences beyond the specific sample of individuals.

Mixed-effects models account for both fixed and random effects. As with all regression models, their purpose is to describe a response variable as a function of the predictor variables. Mixed-effects models, however, recognize correlations within sample subgroups. In this way, they provide a compromise between ignoring data groups entirely and fitting each group with a separate model.

Mixed-Effects Model Hierarchy

Suppose data for a nonlinear regression model falls into one of m distinct groups i = 1, ..., m. To account for the groups in a model, write response j in group i as:

yij is the response, xij is a vector of predictors, φ is a vector of model parameters, and εij is the measurement or process error. The index j ranges from 1 to ni, where ni is the number of observations in group i. The function f specifies the form of the model. Often, xij is simply an observation time tij. The errors are usually assumed to be independent and identically, normally distributed, with constant variance.

Estimates of the parameters in φ describe the population, assuming those estimates are the same for all groups. If, however, the estimates vary by group, the model becomes

In a mixed-effects model, φi may be a combination of a fixed and a random effect:

The random effects bi are usually described as multivariate normally distributed, with mean zero and covariance Ψ. Estimating the fixed effects β and the covariance of the random effects Ψ provides a description of the population that does not assume the parameters φi are the same across groups. Estimating the random effects bi also gives a description of specific groups within the data.

Model parameters do not have to be identified with individual effects. In general, design matrices A and B are used to identify parameters with linear combinations of fixed and random effects:

If the design matrices differ among groups, the model becomes

If the design matrices also differ among observations, the model becomes

Some of the group-specific predictors in xij may not change with observation j. Calling those vi, the model becomes

Specifying Mixed-Effects Models

Suppose data for a nonlinear regression model falls into one of m distinct groups i = 1, ..., m. (Specifically, suppose that the groups are not nested.) To specify a general nonlinear mixed-effects model for this data:

  1. Define group-specific model parameters φi as linear combinations of fixed effects β and random effects bi.

  2. Define response values yi as a nonlinear function f of the parameters and group-specific predictor variables Xi.

The model is:

This formulation of the nonlinear mixed-effects model uses the following notation:

φiA vector of group-specific model parameters
βA vector of fixed effects, modeling population parameters
biA vector of multivariate normally distributed group-specific random effects
AiA group-specific design matrix for combining fixed effects
BiA group-specific design matrix for combining random effects
XiA data matrix of group-specific predictor values
yiA data vector of group-specific response values
fA general, real-valued function of φi and Xi
εiA vector of group-specific errors, assumed to be independent, identically, normally distributed, and independent of bi
ΨA covariance matrix for the random effects
σ2The error variance, assumed to be constant across observations

For example, consider a model of the elimination of a drug from the bloodstream. The model incorporates two overlapping phases:

For data on multiple individuals i, the model is

where yij is the observed concentration in individual i at time tij. The model allows for different sampling times and different numbers of observations for different individuals.

The elimination rates rpi and rqi must be positive to be physically meaningful. Enforce this by introducing the log rates Rpi = log(rpi) and Rqi = log(rqi) and reparametrizing the model:

Choosing which parameters to model with random effects is an important consideration when building a mixed-effects model. One technique is to add random effects to all parameters, and use estimates of their variances to determine their significance in the model. An alternative is to fit the model separately to each group, without random effects, and look at the variation of the parameter estimates. If an estimate varies widely across groups, or if confidence intervals for each group have minimal overlap, the parameter is a good candidate for a random effect.

To introduce fixed effects β and random effects bi for all model parameters, reexpress the model as follows:

In the notation of the general model:

where ni is the number of observations of individual i. In this case, the design matrices Ai and Bi are, at least initially, 4-by-4 identity matrices. Design matrices may be altered, as necessary, to introduce weighting of individual effects, or time dependency.

Fitting the model and estimating the covariance matrix Ψ often leads to further refinements. A relatively small estimate for the variance of a random effect suggests that it can be removed from the model. Likewise, relatively small estimates for covariances among certain random effects suggests that a full covariance matrix is unnecessary. Since random effects are unobserved, Ψ must be estimated indirectly. Specifying a diagonal or block-diagonal covariance pattern for Ψ can improve convergence and efficiency of the fitting algorithm.

Statistics Toolbox functions nlmefit and nlmefitsa fit the general nonlinear mixed-effects model to data, estimating the fixed and random effects. The functions also estimate the covariance matrix Ψ for the random effects. Additional diagnostic outputs allow you to assess tradeoffs between the number of model parameters and the goodness of fit.

Specifying Covariate Models

If the model in Specifying Mixed-Effects Models assumes a group-dependent covariate such as weight (w) the model becomes:

Thus, the parameter φi for any individual in the ith group is:

To specify a covariate model, use the 'FEGroupDesign' option.

'FEGroupDesign' is a p-by-q-by-m array specifying a different p-by-q fixed-effects design matrix for each of the m groups. Using the previous example, the array resembles the following:

  1. Create the array.

    % Number of parameters in the model (Phi)
    num_params = 3;
    % Number of covariates
    num_cov = 1;
    % Assuming number of groups in the data set is 7
    num_groups = 7;
    % Array of covariate values
    covariates = [75; 52; 66; 55; 70; 58; 62 ];
    A = repmat(eye(num_params, num_params+num_cov),...
    [1,1,num_groups]);
    A(1,num_params+1,1:num_groups) = covariates(:,1)
  2. Create a struct with the specified design matrix.

    options.FEGroupDesign = A; 
    
  3. Specify the arguments for nlmefit (or nlmefitsa) as shown in Example: Mixed-Effects Models Using nlmefit and nlmefitsa.

Choosing nlmefit or nlmefitsa

Statistics Toolbox provides two functions, nlmefit and nlmefitsa for fitting non-linear mixed-effects models. Each function provides different capabilities, which may help you decide which to use.

Approximation Methods.  nlmefit provides the following four approximation methods for fitting non-linear mixed-effects models:

nlmefitsa provides an additional approximation method, Stochastic Approximation Expectation-Maximization (SAEM) [24] with three steps :

  1. Simulation: Generate simulated values of the random effects b from the posterior density p(b|Σ) given the current parameter estimates.

  2. Stochastic approximation: Update the expected value of the log likelihood function by taking its value from the previous step, and moving part way toward the average value of the log likelihood calculated from the simulated random effects.

  3. Maximization step: Choose new parameter estimates to maximize the log likelihood function given the simulated values of the random effects.

Both nlmefit and nlmefitsa attempt to find parameter estimates to maximize a likelihood function, which is difficult to compute. nlmefit deals with the problem by approximating the likelihood function in various ways, and maximizing the approximate function. It uses traditional optimization techniques that depend on things like convergence criteria and iteration limits.

nlmefitsa, on the other hand, simulates random values of the parameters in such a way that in the long run they converge to the values that maximize the exact likelihood function. The results are random, and traditional convergence tests don't apply. Therefore nlmefitsa provides options to plot the results as the simulation progresses, and to re-start the simulation multiple times. You can use these features to judge whether the results have converged to the accuracy you desire.

Parameters Specific to nlmefitsa.  The following parameters are specific to nlmefitsa. Most control the stochastic algorithm.

Model and Data Requirements.  There are some differences in the capabilities of nlmefit and nlmefitsa. Therefore some data and models are usable with either function, but some may require you to choose just one of them.

In practice, nlmefitsa tends to be more robust, and less likely to fail on difficult problems. However, nlmefit may converge faster on problems where it converges at all. Some problems may benefit from a combined strategy, for example by running nlmefitsa for a while to get reasonable parameter estimates, and using those as a starting point for additional iterations using nlmefit.

Using Output Functions with Mixed-Effects Models

The Outputfcn field of the options structure specifies one or more functions that the solver calls after each iteration. Typically, you might use an output function to plot points at each iteration or to display optimization quantities from the algorithm. To set up an output function:

  1. Write the output function as a MATLAB file function or subfunction.

  2. Use statset to set the value of Outputfcn to be a function handle, that is, the name of the function preceded by the @ sign. For example, if the output function is outfun.m, the command

     options = statset('OutputFcn', @outfun);

    specifies OutputFcn to be the handle to outfun. To specify multiple output functions, use the syntax:

     options = statset('OutputFcn',{@outfun, @outfun2});
  3. Call the optimization function with options as an input argument.

For an example of an output function, see Sample Output Function.

Structure of the Output Function.  The function definition line of the output function has the following form:

stop = outfun(beta,status,state)

where

The solver passes the values of the input arguments to outfun at each iteration.

Fields in status.  The following table lists the fields of the status structure:

FieldDescription
procedure
  • 'ALT' — alternating algorithm for the optimization of the linear mixed effects or restricted linear mixed effects approximations

  • 'LAP' — optimization of the Laplacian approximation for first order or first order conditional estimation

iterationAn integer starting from 0.
innerA structure describing the status of the inner iterations within the ALT and LAP procedures, with the fields:
  • procedure — When procedure is 'ALT':

    • 'PNLS' (penalized non-linear least squares)

    • 'LME' (linear mixed-effects estimation)

    • 'none'

    When procedure is 'LAP',

    • 'PNLS' (penalized non-linear least squares)

    • 'PLM' (profiled likelihood maximization)

    • 'none'

  • state — one of the following:

    • 'init'

    • 'iter'

    • 'done'

    • 'none'

  • iteration — an integer starting from 0, or NaN. For nlmefitsa with burn-in iterations, the output function is called after each of those iterations with a negative value for STATUS.iteration.

fvalThe current log likelihood
PsiThe current random-effects covariance matrix
thetaThe current parameterization of Psi
mseThe current error variance

States of the Algorithm.  The following table lists the possible values for state:

stateDescription

'init'

The algorithm is in the initial state before the first iteration.

'iter'

The algorithm is at the end of an iteration.

'done'

The algorithm is in the final state after the last iteration.

The following code illustrates how the output function might use the value of state to decide which tasks to perform at the current iteration:

switch state
    case 'iter'
          % Make updates to plot or guis as needed
    case 'init'
          % Setup for plots or guis
    case 'done'
          % Cleanup of plots, guis, or final plot
otherwise
end

Stop Flag.  The output argument stop is a flag that is true or false. The flag tells the solver whether it should quit or continue. The following examples show typical ways to use the stop flag.

Stopping an Optimization Based on Intermediate Results.  

The output function can stop the estimation at any iteration based on the values of arguments passed into it. For example, the following code sets stop to true based on the value of the log likelihood stored in the 'fval'field of the status structure:

stop = outfun(beta,status,state)
stop = false;
% Check if loglikelihood is more than 132.
if status.fval > -132
    stop = true;
end
Stopping an Iteration Based on GUI Input.  

If you design a GUI to perform nlmefit iterations, you can make the output function stop when a user clicks a Stop button on the GUI. For example, the following code implements a dialog to cancel calculations:

function retval = stop_outfcn(beta,str,status)
persistent h stop;
if isequal(str.inner.state,'none')
    switch(status)
        case 'init'
            % Initialize dialog
            stop = false;
            h = msgbox('Press STOP to cancel calculations.',...
                'NLMEFIT: Iteration 0 ');
            button = findobj(h,'type','uicontrol');
            set(button,'String','STOP','Callback',@stopper)
            pos = get(h,'Position');
            pos(3) = 1.1 * pos(3);
            set(h,'Position',pos)
            drawnow
        case 'iter'
            % Display iteration number in the dialog title
            set(h,'Name',sprintf('NLMEFIT: Iteration %d',...
                str.iteration))
            drawnow;
        case 'done'
            % Delete dialog
            delete(h);
    end
end
if stop
    % Stop if the dialog button has been pressed
    delete(h)
end
retval = stop;
 
    function stopper(varargin)
        % Set flag to stop when button is pressed
        stop = true;
        disp('Calculation stopped.')
    end
end

Sample Output Function.  nmlefitoutputfcn is the sample Statistics Toolbox output function for nlmefit and nlmefitsa. It initializes or updates a plot with the fixed-effects (BETA) and variance of the random effects (diag(STATUS.Psi)). For nlmefit, the plot also includes the log-likelihood (STATUS.fval).

nlmefitoutputfcn is the default output function for nlmefitsa. To use it with nlmefit, specify a function handle for it in the options structure:

opt = statset('OutputFcn', @nlmefitoutputfcn, …)
beta = nlmefit(…, 'Options', opt, …)

To prevent nlmefitsa from using of this function, specify an empty value for the output function:

opt = statset('OutputFcn', [], …)
beta = nlmefitsa(…, 'Options', opt, …)

nlmefitoutputfcn stops nlmefit or nlmefitsa if you close the figure that it produces.

Example: Mixed-Effects Models Using nlmefit and nlmefitsa

The following example also works with nlmefitsa in place of nlmefit.

The data in indomethacin.mat records concentrations of the drug indomethacin in the bloodstream of six subjects over eight hours:

load indomethacin

gscatter(time,concentration,subject)
xlabel('Time (hours)')
ylabel('Concentration (mcg/ml)')
title('{\bf Indomethacin Elimination}')
hold on

Specifying Mixed-Effects Models discusses a useful model for this type of data. Construct the model via an anonymous function as follows:

model = @(phi,t)(phi(1)*exp(-exp(phi(2))*t) + ...
                 phi(3)*exp(-exp(phi(4))*t));

Use the nlinfit function to fit the model to all of the data, ignoring subject-specific effects:

phi0 = [1 1 1 1];
[phi,res] = nlinfit(time,concentration,model,phi0);

numObs = length(time);
numParams = 4;
df = numObs-numParams;
mse = (res'*res)/df
mse =
    0.0304

tplot = 0:0.01:8;
plot(tplot,model(phi,tplot),'k','LineWidth',2)
hold off

A boxplot of residuals by subject shows that the boxes are mostly above or below zero, indicating that the model has failed to account for subject-specific effects:

colors = 'rygcbm';
h = boxplot(res,subject,'colors',colors,'symbol','o');
set(h(~isnan(h)),'LineWidth',2)
hold on
boxplot(res,subject,'colors','k','symbol','ko')
grid on
xlabel('Subject')
ylabel('Residual')
hold off

To account for subject-specific effects, fit the model separately to the data for each subject:

phi0 = [1 1 1 1];
PHI = zeros(4,6);
RES = zeros(11,6);
for I = 1:6
    tI = time(subject == I);
    cI = concentration(subject == I);
    [PHI(:,I),RES(:,I)] = nlinfit(tI,cI,model,phi0);
end

PHI
PHI =
    0.1915    0.4989    1.6757    0.2545    3.5661    0.9685
   -1.7878   -1.6354   -0.4122   -1.6026    1.0408   -0.8731
    2.0293    2.8277    5.4683    2.1981    0.2915    3.0023
    0.5794    0.8013    1.7498    0.2423   -1.5068    1.0882

numParams = 24;
df = numObs-numParams;
mse = (RES(:)'*RES(:))/df
mse =
    0.0057

gscatter(time,concentration,subject)
xlabel('Time (hours)')
ylabel('Concentration (mcg/ml)')
title('{\bf Indomethacin Elimination}')
hold on
for I = 1:6
    plot(tplot,model(PHI(:,I),tplot),'Color',colors(I))
end
axis([0 8 0 3.5])
hold off

PHI gives estimates of the four model parameters for each of the six subjects. The estimates vary considerably, but taken as a 24-parameter model of the data, the mean-squared error of 0.0057 is a significant reduction from 0.0304 in the original four-parameter model.

A boxplot of residuals by subject shows that the larger model accounts for most of the subject-specific effects:

h = boxplot(RES,'colors',colors,'symbol','o');
set(h(~isnan(h)),'LineWidth',2)
hold on
boxplot(RES,'colors','k','symbol','ko')
grid on
xlabel('Subject')
ylabel('Residual')
hold off

The spread of the residuals (the vertical scale of the boxplot) is much smaller than in the previous boxplot, and the boxes are now mostly centered on zero.

While the 24-parameter model successfully accounts for variations due to the specific subjects in the study, it does not consider the subjects as representatives of a larger population. The sampling distribution from which the subjects are drawn is likely more interesting than the sample itself. The purpose of mixed-effects models is to account for subject-specific variations more broadly, as random effects varying around population means.

Use the nlmefit function to fit a mixed-effects model to the data.

The following anonymous function, nlme_model, adapts the four-parameter model used by nlinfit to the calling syntax of nlmefit by allowing separate parameters for each individual. By default, nlmefit assigns random effects to all the model parameters. Also by default, nlmefit assumes a diagonal covariance matrix (no covariance among the random effects) to avoid overparametrization and related convergence issues.

nlme_model = @(PHI,t)(PHI(:,1).*exp(-exp(PHI(:,2)).*t) + ...
                      PHI(:,3).*exp(-exp(PHI(:,4)).*t));

phi0 = [1 1 1 1];
[phi,PSI,stats] = nlmefit(time,concentration,subject, ...
                          [],nlme_model,phi0)
phi =
    0.4606
   -1.3459
    2.8277
    0.7729
PSI =
    0.0124         0         0         0
         0    0.0000         0         0
         0         0    0.3264         0
         0         0         0    0.0250
stats = 
      logl: 54.5884
       mse: 0.0066
       aic: -91.1767
       bic: -71.4698
    sebeta: NaN
       dfe: 57

The mean-squared error of 0.0066 is comparable to the 0.0057 of the 24-parameter model without random effects, and significantly better than the 0.0304 of the four-parameter model without random effects.

The estimated covariance matrix PSI shows that the variance of the second random effect is essentially zero, suggesting that you can remove it to simplify the model. To do this, use the REParamsSelect parameter to specify the indices of the parameters to be modeled with random effects in nlmefit:

[phi,PSI,stats] = nlmefit(time,concentration,subject, ...
                          [],nlme_model,phi0, ...
                          'REParamsSelect',[1 3 4])
phi =
    0.4606
   -1.3460
    2.8277
    0.7729
PSI =
    0.0124         0         0
         0    0.3270         0
         0         0    0.0250
stats = 
      logl: 54.5876
       mse: 0.0066
       aic: -93.1752
       bic: -75.6580
    sebeta: NaN
       dfe: 58

The log-likelihood logl is almost identical to what it was with random effects for all of the parameters, the Akaike information criterion aic is reduced from -91.1767 to -93.1752, and the Bayesian information criterion bic is reduced from -71.4698 to -75.6580. These measures support the decision to drop the second random effect.

Refitting the simplified model with a full covariance matrix allows for identification of correlations among the random effects. To do this, use the CovPattern parameter to specify the pattern of nonzero elements in the covariance matrix:

[phi,PSI,stats] = nlmefit(time,concentration,subject, ...
                          [],nlme_model,phi0, ...
                          'REParamsSelect',[1 3 4], ...
                          'CovPattern',ones(3))
phi =
    0.5613
   -1.1407
    2.8148
    0.8293
PSI =
    0.0236    0.0500    0.0032
    0.0500    0.4768    0.1152
    0.0032    0.1152    0.0321
stats = 
      logl: 58.4731
       mse: 0.0061
       aic: -94.9462
       bic: -70.8600
    sebeta: NaN
       dfe: 55

The estimated covariance matrix PSI shows that the random effects on the last two parameters have a relatively strong correlation, and both have a relatively weak correlation with the first random effect. This structure in the covariance matrix is more apparent if you convert PSI to a correlation matrix using corrcov:

RHO = corrcov(PSI)
RHO =
    1.0000    0.4707    0.1179
    0.4707    1.0000    0.9316
    0.1179    0.9316    1.0000

clf; imagesc(RHO)
set(gca,'XTick',[1 2 3],'YTick',[1 2 3])
title('{\bf Random Effect Correlation}')
h = colorbar;
set(get(h,'YLabel'),'String','Correlation');

Incorporate this structure into the model by changing the specification of the covariance pattern to block-diagonal:

P = [1 0 0;0 1 1;0 1 1] % Covariance pattern
P =
     1     0     0
     0     1     1
     0     1     1

[phi,PSI,stats,b] = nlmefit(time,concentration,subject, ...
                            [],nlme_model,phi0, ...
                            'REParamsSelect',[1 3 4], ...
                            'CovPattern',P)
phi =
    0.5850
   -1.1087
    2.8056
    0.8476
PSI =
    0.0331         0         0
         0    0.4793    0.1069
         0    0.1069    0.0294
stats = 
      logl: 57.4996
       mse: 0.0061
       aic: -96.9992
       bic: -77.2923
    sebeta: NaN
       dfe: 57
b =
   -0.2438    0.0723    0.2014    0.0592   -0.2181    0.1289
   -0.8500   -0.1237    0.9538   -0.7267    0.5895    0.1571
   -0.1591    0.0033    0.1568   -0.2144    0.1834    0.0300

The block-diagonal covariance structure reduces aic from -94.9462 to -96.9992 and bic from -70.8600 to -77.2923 without significantly affecting the log-likelihood. These measures support the covariance structure used in the final model.

The output b gives predictions of the three random effects for each of the six subjects. These are combined with the estimates of the fixed effects in phi to produce the mixed-effects model.

Use the following commands to plot the mixed-effects model for each of the six subjects. For comparison, the model without random effects is also shown.

PHI = repmat(phi,1,6) + ...                 % Fixed effects
      [b(1,:);zeros(1,6);b(2,:);b(3,:)];    % Random effects
  
RES = zeros(11,6); % Residuals
colors = 'rygcbm';
for I = 1:6
    fitted_model = @(t)(PHI(1,I)*exp(-exp(PHI(2,I))*t) + ...
                        PHI(3,I)*exp(-exp(PHI(4,I))*t));
    tI = time(subject == I);
    cI = concentration(subject == I);
    RES(:,I) = cI - fitted_model(tI);
    
    subplot(2,3,I)
    scatter(tI,cI,20,colors(I),'filled')
    hold on
    plot(tplot,fitted_model(tplot),'Color',colors(I))
    plot(tplot,model(phi,tplot),'k')
    axis([0 8 0 3.5])
    xlabel('Time (hours)')
    ylabel('Concentration (mcg/ml)')
    legend(num2str(I),'Subject','Fixed')
end

If obvious outliers in the data (visible in previous box plots) are ignored, a normal probability plot of the residuals shows reasonable agreement with model assumptions on the errors:

clf; normplot(RES(:))

Regression Trees

Introduction

Parametric models specify the form of the relationship between predictors and a response, as in the Hougen-Watson model described in Parametric Models. In many cases, the form of the relationship is unknown, and a parametric model requires assumptions and simplifications. Regression trees offer a nonparametric alternative. When response data are categorical, classification trees are a natural modification.

Algorithm Reference.  The algorithms used by Statistics Toolbox classification and regression tree functions are based on those in Breiman, L., et al., Classification and Regression Trees, Chapman & Hall, Boca Raton, 1993.

Example: Regression Trees

This example uses the data on cars in carsmall.mat to create a regression tree for predicting mileage using measurements of weight and the number of cylinders as predictors. Note that, in this case, one predictor (weight) is continuous and the other (cylinders) is categorical. The response (mileage) is continuous.

Load the data and use the classregtree constructor of the classregtree class to create the regression tree:

load carsmall

t = classregtree([Weight, Cylinders],MPG,...
                 'cat',2,'splitmin',20,...
                 'names',{'W','C'})

t = 

Decision tree for regression
 1  if W<3085.5 then node 2 elseif W>=3085.5 then node 3 else 23.7181
 2  if W<2371 then node 4 elseif W>=2371 then node 5 else 28.7931
 3  if C=8 then node 6 elseif C in {4 6} then node 7 else 15.5417
 4  if W<2162 then node 8 elseif W>=2162 then node 9 else 32.0741
 5  if C=6 then node 10 elseif C=4 then node 11 else 25.9355
 6  if W<4381 then node 12 elseif W>=4381 then node 13 else 14.2963
 7  fit = 19.2778
 8  fit = 33.3056
 9  fit = 29.6111
10  fit = 23.25
11  if W<2827.5 then node 14 elseif W>=2827.5 then node 15 else 27.2143
12  if W<3533.5 then node 16 elseif W>=3533.5 then node 17 else 14.8696
13  fit = 11
14  fit = 27.6389
15  fit = 24.6667
16  fit = 16.6
17  fit = 14.3889

t is a classregtree object and can be operated on with any of the methods of the class.

Use the type method of the classregtree class to show the type of the tree:

treetype = type(t)
treetype =
regression

classregtree creates a regression tree because MPG is a numerical vector, and the response is assumed to be continuous.

To view the tree, use the view method of the classregtree class:

view(t)

The tree predicts the response values at the circular leaf nodes based on a series of questions about the car at the triangular branching nodes. A true answer to any question follows the branch to the left; a false follows the branch to the right.

Use the tree to predict the mileage for a 2000-pound car with either 4, 6, or 8 cylinders:

mileage2K = t([2000 4; 2000 6; 2000 8])
mileage2K =
   33.3056
   33.3056
   33.3056

Note that the object allows for functional evaluation, of the form t(X). This is a shorthand way of calling the eval method of the classregtree class.

The predicted responses computed above are all the same. This is because they follow a series of splits in the tree that depend only on weight, terminating at the left-most leaf node in the view above. A 4000-pound car, following the right branch from the top of the tree, leads to different predicted responses:

mileage4K = t([4000 4; 4000 6; 4000 8])
mileage4K =
   19.2778
   19.2778
   14.3889

You can use a variety of other methods of the classregtree class, such as cutvar, cuttype, and cutcategories, to get more information about the split at node 3 that distinguishes the 8-cylinder car:

var3 = cutvar(t,3) % What variable determines the split?
var3 = 
    'C'

type3 = cuttype(t,3) % What type of split is it?
type3 = 
    'categorical'

c = cutcategories(t,3) % Which classes are sent to the left 
                       % child node, and which to the right?
c = 
    [8]    [1x2 double]
c{1}
ans =
     8
c{2}
ans =
     4     6

Regression trees fit the original (training) data well, but may do a poor job of predicting new values. Lower branches, especially, may be strongly affected by outliers. A simpler tree often avoids over-fitting. To find the best regression tree, employing the techniques of resubstitution and cross-validation, use the test method of the classregtree class.

  


 © 1984-2010- The MathWorks, Inc.    -   Site Help   -   Patents   -   Trademarks   -   Privacy Policy   -   Preventing Piracy   -   RSS