Interpretation of lung function test parameters is usually based on comparing measured data with reference (predicted) values. Predicted values are obtained from studies of “normal” or “healthy” subjects with similar anthropometric and ethnic characteristics. Regression models are generally used to obtain the reference values from measurements observed in a representative sample of healthy subjects.

The study aims to carry out a statistical evaluation of the Indian prediction models of lung function parameters and critically evaluate the reference values for the same in an Indian context.

The screening and inclusion of the articles for the study were done using the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines. The prediction models were evaluated with respect to modeling approach, regression diagnostics and methodology protocol. The suitability of the models was also evaluated using a checklist comprising 8 criteria developed from the American Thoracic Society (ATS) guidelines.

Using the PRISMA guidelines, 32 articles with a total sample size of 25,289 subjects were included in the final synthesis. Multiple linear regression models were used in 27 articles, with one additionally using the weighted least squares technique and 4 using the step-wise regression method. Regression diagnostics as per the ATS guidelines were performed and reported by 22 articles. The prediction models were traditionally developed using the ordinary least squares (OLS) method without examining the homoskedasticity of residuals. The quality assessment using the checklist revealed that only 5 articles satisfied at least 7 of the 8 criteria, and 8 articles satisfied 3 or fewer criteria of suitability of prediction models.

Indian prediction models for lung function parameters are traditionally based on linear regression models; however, with advances in the computational power available for sophisticated statistical techniques, more robust prediction models are required in the Indian context.

Lung function tests play a vital role in diagnosing respiratory diseases such as asthma or chronic obstructive pulmonary disease (COPD), assessing disease severity and monitoring treatment responses [

Lung function capacity increases with age in childhood (due to growth and maturation) and declines with age in adulthood (due to loss of elastic recoil). It also depends on height as a proxy for chest size [

Lung function parameters are generally found to be close to Gaussian distribution in the middle-age groups, but not for extreme values. The distributions of flow measurements and ratio measurements in lung function values are generally not symmetric [

Prediction models using regression equations provide an efficient and economical method for describing the expected values of pulmonary parameters as a function of sex, height and age [

How well a regression model fits the data is evaluated using regression diagnostic techniques. Predictive model fitting is considered incomplete without running the regression diagnostics [

The relationship between body size and lung function is complex, especially during periods of rapid growth of the human body [

The present study aims to carry out a statistical evaluation of the Indian prediction models of lung function parameters and to critically evaluate the reference values for the same in the Indian context. The study evaluates the statistical approach of prediction modeling, the parameters of regression diagnostics reported and the measurement of lung function parameters. It contributes to identifying the limitations and gap areas of the present Indian prediction models and to identifying further avenues of research for improving the methodology and application of prediction models in the Indian context.

We searched the publications relating to Indian prediction equations for lung function parameters listed in the electronic database PubMed (source:

The included articles were screened and crosschecked independently by the authors for relevance and suitability. The percentage of agreement between the authors on the quality of the articles ranged from 90% to 100%. All disagreements were resolved by consensus among the authors. The references from the selected publications were also screened, and relevant articles were included in the analysis.

The search was limited only to articles in English. The search was limited to PubMed and Google Scholar due to the non-accessibility of Medline and Embase. The titles of the articles were first screened for possible relevance and exclusion. All the remaining articles were then considered as relevant for potential screening. In the case of articles where the full texts were not available, efforts were made to obtain the full texts by contacting the corresponding authors and journals. The articles received after that communication were subsequently screened for possible inclusion.

Identification, screening, eligibility, inclusion of articles and meta-analysis for the study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines [

The information abstracted from each of the selected articles included: author name, publication year, sample size, age group, gender, regression model, separate model for male and females, adjusted for smoking, reported regression diagnostic method, lung function parameters studied, instrument used and number of citations. The data for the citations of each article was also reviewed from Google Scholar (source:

The evaluation of the quality and suitability of the lung function prediction models was done using a checklist prepared using the recommendations of the American Thoracic Society [

Quality checklist for assessment of the suitability of prediction models as per the ATS guidelines.

S No | Assessment criteria |
---|---|
1 | Use of acceptable methods and equipment for measurement of lung function parameters |
2 | Adequately defined sample size for prediction models |
3 | Adequately described statistical methodology protocol for prediction equation generation |
4 | Reporting parameters of regression diagnostics |
5 | Validation of prediction models on independent study samples |
6 | Inclusion of age and height as independent predictor variables for lung function parameters |
7 | Separate prediction models for male and female subjects |
8 | Reporting lower limit of normal values or information regarding calculation of the same |
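The quality scores reported for the included articles are expressed as percentages, and most of them are multiples of 12.5, which is what one would get by dividing the number of satisfied criteria by 8. The sketch below illustrates that scoring under the assumption that every criterion carries equal weight; the function name and constant are illustrative and do not come from the article.

```python
# Hypothetical helper: turns a count of satisfied checklist criteria into a
# percentage score, assuming the 8 criteria are weighted equally.
N_CRITERIA = 8


def quality_score(criteria_met: int, n_criteria: int = N_CRITERIA) -> float:
    """Return the percentage of checklist criteria that an article satisfies."""
    if not 0 <= criteria_met <= n_criteria:
        raise ValueError("criteria_met must lie between 0 and n_criteria")
    return 100.0 * criteria_met / n_criteria


for met in (3, 5, 7, 8):
    print(f"{met} of {N_CRITERIA} criteria -> {quality_score(met)}%")
```

Under this assumption, an article meeting 3 of the 8 criteria scores 37.5 and one meeting all 8 scores 100.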

During the initial search using keywords, 1,068 titles related to keywords were retrieved. The abstracts of all the articles selected after initial search were evaluated for possible inclusion. Out of these, 348 articles were considered relevant, and the full texts of these were retrieved for detailed examination and scrutiny. Out of the 348 articles, 316 were subsequently excluded due to not being relevant to the analysis, and so 32 articles were included in the final study. The detailed procedure for the inclusion of articles is presented in Figure

Flowchart for inclusion of articles in the study.

The characteristics of the included articles (author name, publication year, sample size, age group, gender, regression model, separate model for male and females, regression diagnostic reported, pulmonary function parameters studied, instrument used and number of citations) in the final synthesis are shown in Table

Characteristics of the articles for Indian prediction models included in the analysis.

S No | Author | Sample size | Age group | Gender | Regression model | Separate model | Regression diagnostics | Pulmonary parameters | Instrument used | Detailed method | Quality Score | Citations |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Parmar et al. 1977 | 595 | 6–16 years | Boys and Girls | Linear regression model | Yes | No coefficient reported | PEFR | Wright Peak Flow meter | No | 37.5 | 37 |
2 | Singh & Peri et al. 1978 | 663 | 4–16 years | Boys and Girls | Linear regression model | Yes | R^{2}, SEE reported | PEFR | Wright Peak Flow meter | No | 37.5 | 17 |
3 | Singh et al. 1979 | 851 | 17–70 years | Males and Females | Linear regression model | Yes | R, SEE reported | PEFR | Peak Flow meter | No | 62.5 | 34 |
4 | Malik et al. 1981 | 605 | 5–16 years | Boys and Girls | Linear regression model | Yes | R^{2}, SEE reported | PEFR | Wright Peak Flow meter | No | 50 | 30 |
5 | Gupta et al. 1982 | 1427 | 3–22+ years | Males and Females | Linear, Interaction, General, Proportional, Multiplicative, Polynomial | Yes | RSD reported, homogeneity of variance tested | PEFR | Wright Peak Flow meter | No | 75 | 31 |
6 | Aundhakar et al. 1985 | 515 | 6–15 years | Boys and Girls | Linear regression model | Yes | R^{2} reported | PEFR | Wright Peak Flow meter | No | 37.5 | 9 |
7 | Udwadia et al. 1987 | 760 | <30 years and >30 years | Males and Females | Multiple linear regression model | Yes | R, R^{2} and SD around the regression line (S_{y.x}) reported | FEV_{1}, FVC, FEV_{1}/FVC, PEF, FEF25–75%, Vmax50%, Vmax75% | Computerized spirometer | No | 87.5 | 37 |
8 | Chatterjee et al. 1988 | 334 | 20–60 years | Males | Multiple linear regression model | No | R, R^{2} and SEE | FEV_{1}, FVC, FEV_{1}/FVC, PEF, FEF25–75%, Vmax50%, Vmax75% | Wright Peak Flow meter | Yes | 75 | 18 |
9 | Vijayan et al. 1990 | 247 | 15–40 years | Males and Females | Linear regression model | Yes | R^{2}, SEE reported | FVC, FEV, FRC, TLC, VA | Spirometer | Yes | 87.5 | 80 |
10 | Dikshit et al. 1991 | 127 | 55–85 years | Males | Linear regression model | No | R coefficient reported | PEFR | Wright Peak Flow meter | No | 25 | 16 |
11 | Mohan Rao et al. 1992 | 96 | 15–40 years | Males and Females | Multiple linear regression model | Yes | R, R^{2} and SEE | FVC, FEV_{1}, FEF25–75%, PEFR | Wright Peak Flow meter | No | 50 | 10 |
12 | Ray et al. 1993 | 2000 | 10–59 years | Males and Females | Multiple linear regression model | Yes | R^{2} reported | PEFR | Wright Peak Flow meter | Yes | 87.5 | 23 |
13 | Swaminathan and Venkatesh et al. 1993 | 345 | 4–15 years | Boys and Girls | Linear regression model | Yes | No coefficient reported | PEFR | Wright Peak Flow meter | No | 50 | 51 |
14 | Chowgule, Shetye & Parmar et al. 1995 | 632 | 6–15 years | Boys and Girls | Linear regression model | Yes | R^{2} reported | PEFR | Wright Peak Flow meter | No | 50 | 66 |
15 | Sharma et al. 1997 | 410 | 10–15 years | Boys and Girls | Linear regression model | Yes | R^{2} reported | PEFR | Wright Peak Flow meter | No | 62.5 | 22 |
16 | Pande et al. 1997 | 1257 | 6–17 years | Boys and Girls | Linear regression model | Yes | R^{2}, SEE reported | PEFR | Wright Peak Flow meter | No | 62.5 | 21 |
17 | Rajkapoor et al. 1997 | 186 | 6–13 years | Boys and Girls | Linear regression model | Yes | No coefficient reported | PEFR | Wright Peak Flow meter | No | 50 | 17 |
18 | Harikumaran et al. 1997 | 109 | 5–16 years | Boys | Multiple linear regression model | No | No coefficient reported | VC, IVC, FVC, FEV_{1}, PEF, FEF, PIF, FMFT, MVV | Computerized spirometer | No | 25 | 28 |
19 | Vijayan et al. 2000 | 469 | 7–19 years | Boys and Girls | Linear regression model | Yes | No coefficient reported | PEFR | Wright Peak Flow meter | No | 50 | 66 |
20 | Verma SS et al. 2000 | 173 | 8–13 years | Boys and Girls | Linear regression model | No | R^{2}, SEE reported | PEFR | Wright Peak Flow meter | No | 50 | 7 |
21 | Sitarama Raju et al. 2003 | 1555 | 5–15 years | Boys | Linear, quadratic, cubic or logarithmic | No | R, R^{2} and SE of estimate reported | FVC, FEV_{1}, PEFR, FEV_{1}/FVC | Wright Peak Flow meter | Yes | 50 | 52 |
22 | Sitaram Raju et al. 2004 | 1038 | 5–15 years | Girls | Step-wise linear regression model | No | R^{2}, SEE reported | FEV_{1}, FVC, PEFR, FEV_{1}/FVC | Wright Peak Flow meter | Yes | 37.5 | 1 |
23 | Dikshit et al. 2005 | Not mentioned | <20 years to >60 years | Males and Females | Linear regression model | Yes | R, SEE reported | PEFR | | No | 75 | 50 |
24 | Raju et al. 2005 | 2616 | 5–15 years | Boys and Girls | Step-wise linear regression model | Yes | R, F and SEE reported | FEV_{1}, FVC, PEFR | Wright Peak Flow meter and Spirometer | Yes | 75 | 39 |
25 | Prasad et al. 2005 | 897 | 10–60 years | Males and Females | Step-wise linear regression model | Yes | R, R^{2} and RSD reported | PEFR | Wright Peak Flow meter | Yes | 75 | 59 |
26 | Mathur et al. 2007 | 137 | 20–68 years | Males | Weighted least squares method | No | R^{2}, RSD reported | PEFR | Wright Peak Flow meter | Yes | 62.5 | 29 |
27 | Saleem et al. 2011 | 3080 | 18–65 years | Males and Females | Multiple linear regression model | Yes | R^{2}, SEE reported | FVC, FEV_{1}, PEFR, FEF25–75 | Spirometer | No | 75 | 2 |
28 | Jacob et al. 2013 | 1165 | 5–17 years | Boys and Girls | Linear regression model | Yes | R, R^{2} reported | PEFR | Not specified | No | 62.5 | 7 |
29 | Chhabra et al. 2014 | 685 | 18–60 years | Males and Females | Multiple linear regression procedure. Linear and non-linear models | Yes | R^{2}, Adjusted R^{2} and SEE reported | FVC, FEV_{1}, PEFR, FEF_{25–75}, FEF_{50}, FEF_{75}, FEV_{1}/FVC | Spirometer | Yes | 100 | 21 |
30 | Shivkumar et al. 2014 | 91 | 10–15 years | Boys | Step-wise linear regression model | No | Adjusted R^{2} reported | FVC, FEV_{1}, FEV_{1}/FVC, MEF25, MEF50, MEF75, MMEF, PEF | Spirometer | No | 14.28 | 0 |
31 | Dasgupta et al. 2015 | 706 | 15–69 years | Males and Females | Multiple linear regression model | Yes | R^{2}, SEE reported | FEV_{1}, FVC | Spirometer | Yes | 100 | 1 |
32 | Pramanik et al. 2015 | 1518 | 10–18 years | Boys | Linear regression model | No | No coefficient reported | FVC, FEV_{1}, PEFR, FEF25–75% | Spirometer | No | 37.5 | 2 |

Among the 32 articles analyzed, 7 articles included only male subjects, 1 article included only female subjects and the remaining 24 articles included both male and female subjects in the prediction model for lung function. Of the 32 articles, 23 reported separate prediction models for male and female subjects, whereas the remaining articles reported a single prediction model. In terms of the instrument used for measuring the lung function parameters, 22 articles reported using the Wright Peak Flow meter, 8 articles reported using a spirometer, and two articles gave no specific information regarding the instrument.

Linear regression models were used for prediction of pulmonary parameters in 27 articles. In addition to linear models, non-linear models developed by Chhabra et al. [

Out of the 32 articles included, 26 articles reported regression diagnostics, and the other 6 articles did not report any regression diagnostic coefficient. Prediction models in 22 articles provided correlation coefficients and coefficient of determination for the equations developed, and 18 articles additionally provided the standard error of the estimate (SEE).

Only 10 articles defined the detailed methodology protocol of the prediction equations, which in statistical terms provides a basis for better evaluation of the prediction model. The minimum sample size required for multivariate regression analysis of the lung function parameters is 150 for validation of prediction models [

In 29 articles, the regression coefficients for the predictor variables of lung function parameters were calculated using the ordinary least squares (OLS) approach without examining the homoskedasticity of residuals. The heteroskedasticity of residuals was examined by Prasad et al. [
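As an illustration of what examining the homoskedasticity of residuals can involve, the sketch below computes the studentized Breusch-Pagan statistic for a one-predictor OLS model: the squared residuals are regressed on the predictor, and n times the R^{2} of that auxiliary regression is compared with a chi-square critical value. This is one common check, not necessarily the one used in any of the reviewed articles, and the data are synthetic.

```python
# Synthetic illustration only: none of these numbers come from the reviewed studies.

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y):
    """Coefficient of determination of the OLS line of y on x."""
    a, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def breusch_pagan_lm(x, y):
    """Studentized Breusch-Pagan statistic: n * R^2 of squared residuals on x."""
    a, b = fit_line(x, y)
    squared_residuals = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, squared_residuals)


# 5% critical value of the chi-square distribution with 1 degree of freedom.
CHI2_CRITICAL_1DF = 3.841

x = list(range(1, 41))
signs = [1 if i % 2 == 0 else -1 for i in range(len(x))]
y_constant_spread = [1 + 2 * xi + s for xi, s in zip(x, signs)]
y_growing_spread = [1 + 2 * xi + 0.5 * xi * s for xi, s in zip(x, signs)]

for label, y in (("constant spread", y_constant_spread),
                 ("growing spread", y_growing_spread)):
    lm = breusch_pagan_lm(x, y)
    verdict = "heteroskedastic" if lm > CHI2_CRITICAL_1DF else "no evidence"
    print(f"{label}: LM = {lm:.2f} ({verdict})")
```

A statistic above the critical value suggests that the residual spread changes with the predictor, in which case plain OLS standard errors and derived limits become unreliable.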

In a regression model, each data point is assumed to provide equally precise information about the deterministic part of the total variation (i.e., the standard deviation of the error term must be constant over all values of the predictor variables). However, this assumption does not always hold for OLS models. In such circumstances, more precise estimates of the regression coefficients can be obtained using two approaches: transforming the data or using weights. The evaluation revealed that prediction models using transformed variables were presented in 6 articles and, additionally, Mathur et al. used the weighted least squares method [
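The weighting approach mentioned above can be sketched for a single predictor as follows. The example assumes, purely for illustration, that the residual standard deviation grows in proportion to the predictor, so each observation is weighted by 1/x^{2}; the data are synthetic and the function name is hypothetical.

```python
# Synthetic illustration only; the weights encode an assumed variance model.

def weighted_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


x = [float(v) for v in range(1, 41)]
signs = [1 if i % 2 == 0 else -1 for i in range(len(x))]
# True line is y = 1 + 2x; the noise amplitude grows with x.
y = [1 + 2 * xi + 0.5 * xi * s for xi, s in zip(x, signs)]

ols = weighted_fit(x, y, [1.0] * len(x))             # equal weights = OLS
wls = weighted_fit(x, y, [1.0 / xi ** 2 for xi in x])  # down-weight noisy points

print("OLS intercept, slope:", ols)
print("WLS intercept, slope:", wls)
```

With equal weights the function reduces to ordinary least squares, so the two calls differ only in how much influence the noisier observations are given.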

The total number of citations for the articles included in the analysis was 882. The number of citations ranged from no citations for Shivkumar et al. [

The quality assessment of the prediction models using the checklist based on the ATS guidelines revealed that only 2 articles satisfied all the criteria of suitability of prediction models and 3 articles satisfied seven out of the eight criteria. In total, 8 articles satisfied three or fewer criteria of suitability.

The quality assessment score for the prediction models was plotted as a percentage value against the number of citations of the respective article. Reference values obtained from these prediction models are used for the biological and clinical interpretation of lung function status. The reference values should ideally come from prediction models that satisfy all the criteria for suitability recommended by the ATS. However, many articles that scored high on the quality assessment had fewer citations than articles with lower scores. The correlation coefficient between the quality score and the number of citations was 0.22 (Figure

Scatter plot of the relationship between the quality assessment score and number of citations of the selected articles.
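The correlation between quality score and citations can be recomputed from the Quality Score and Citations columns of the table of included articles. The sketch below does this with a plain Pearson coefficient; it assumes the coefficient reported in the text is Pearson's r, and it reads the values for article 23 from a row whose instrument cell is empty in the source.

```python
import math

# Quality Score (%) and Citations for articles 1-32, in table order.
quality = [37.5, 37.5, 62.5, 50, 75, 37.5, 87.5, 75, 87.5, 25, 50, 87.5,
           50, 50, 62.5, 62.5, 50, 25, 50, 50, 50, 37.5, 75, 75, 75, 62.5,
           75, 62.5, 100, 14.28, 100, 37.5]
citations = [37, 17, 34, 30, 31, 9, 37, 18, 80, 16, 10, 23,
             51, 66, 22, 21, 17, 28, 66, 7, 52, 1, 50, 39, 59, 29,
             2, 7, 21, 0, 1, 2]


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(quality, citations)
print(f"Pearson r between quality score and citations: {r:.2f}")
```

The result can be compared directly with the coefficient of 0.22 reported in the text; a value this small indicates that higher-quality models are not cited markedly more often.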

The American Thoracic Society (ATS) has suggested various statistical considerations for prediction of lung parameters. These considerations include separate equations for male and female subjects, as well as separate equations based on ethnicity [

In the present study, statistical evaluation of the regression diagnostics reported for the prediction equations was undertaken on the basis of the coefficient of determination (R^{2}) and the reported SEE of the constants and regression coefficients. The goodness of fit of a regression model is generally reported using R^{2} and the standard error of the estimate (SEE). The R^{2} value gives the proportion of the variability in the observed data explained by the predictor variables, and SEE is the average SD of the data around the fitted regression line. As the differences between the predicted and observed values of lung function parameters in the reference population diminish, the SEE value decreases, and correspondingly R^{2} increases [^{2} and SEE values may be able to define the ability of the prediction model to describe the tails of the distributions or the lower limit of “normal” value, and hence they are insufficient criteria to choose the best prediction model to clinically evaluate a population [
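The two goodness-of-fit measures described above can be computed as in the following sketch for a one-predictor model. R^{2} is taken as 1 minus the ratio of residual to total sum of squares, and SEE as the square root of the residual sum of squares divided by the residual degrees of freedom (n minus the number of fitted parameters). The height and FVC values are hypothetical and serve only to make the example runnable.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def goodness_of_fit(x, y, n_params=2):
    """Return (R^2, SEE) for the OLS line of y on x."""
    a, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    see = math.sqrt(ss_res / (len(x) - n_params))
    return r2, see


# Hypothetical measurements, not taken from any reviewed study.
heights_cm = [150, 155, 160, 165, 170, 175, 180]
fvc_litres = [3.1, 3.4, 3.5, 3.9, 4.0, 4.4, 4.5]

r2, see = goodness_of_fit(heights_cm, fvc_litres)
print(f"R^2 = {r2:.3f}, SEE = {see:.3f} L")
```

Both numbers summarise the fit around the centre of the data, which is why they say little about how well a model describes the tails of the distribution.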

The statistical considerations for lung function prediction equations suggested by the ATS were also evaluated in the present study. The evaluation covered the reporting of the detailed statistical methodology of the regression models, the availability of lower limit of normal values or information for their calculation, and accompanying validation data on an independent data set for testing the validity of the prediction models.
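The text does not state how a lower limit of normal (LLN) is derived from a prediction equation. One widely used convention, valid only when residuals are approximately Gaussian with constant spread, takes the LLN as the 5th percentile of the reference distribution, that is, the predicted value minus 1.645 times the SEE. The sketch below illustrates that convention; the function names and the example numbers are hypothetical.

```python
# One-sided 5% point of the standard normal distribution.
Z_5TH_PERCENTILE = 1.645


def lower_limit_of_normal(predicted, see, z=Z_5TH_PERCENTILE):
    """LLN under the Gaussian, constant-variance assumption: predicted - z * SEE."""
    return predicted - z * see


def is_below_lln(observed, predicted, see):
    """True when an observed value falls below the lower limit of normal."""
    return observed < lower_limit_of_normal(predicted, see)


# Hypothetical subject: predicted FVC 4.0 L, model SEE 0.5 L.
print(lower_limit_of_normal(4.0, 0.5))
print(is_below_lln(3.0, 4.0, 0.5))
```

Because the rule leans on the SEE being the same across ages and heights, it inherits any heteroskedasticity or skewness problems in the underlying model.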

Conventionally, in the Indian context, lung function parameters have been modeled using traditional linear regression models, with the assumption of homoscedasticity and normality of residuals [

The step-wise regression model used in some articles provides more power and information than OLS procedure. It allows the handling of numerous predictor variables, fine-tuning the model for choosing the optimum predictor variables [^{2} and adjusted R^{2} values are too high. Additionally, the predicted values and confidence interval of the estimates are often too narrow [

Quantile regression, in comparison to linear regression, is a more adequate method to calculate reference ranges because it makes no distributional assumption and allows an independent estimation of conditional quantile functions resulting in reference limits, which are independent of global parameters like the standard deviation. Furthermore, the quantile regression shows a high robustness to outlier observations [
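To make the idea concrete, quantile regression estimates a conditional quantile by minimising an asymmetric absolute loss, often called the check or pinball loss, rather than the squared error. The sketch below shows only the simplest case, a model with no predictors, where the minimiser of that loss is the sample quantile itself; a full quantile regression would minimise the same loss over regression coefficients. The data are synthetic.

```python
def pinball_loss(q, ys, tau):
    """Total check loss of candidate quantile q for data ys at level tau."""
    return sum(tau * (y - q) if y >= q else (1.0 - tau) * (q - y) for y in ys)


# Synthetic data: the integers 1..99.
values = list(range(1, 100))
tau = 0.05  # the 5th percentile, the level typically used for a lower limit

# Search over the observed values for the one with the smallest loss.
best = min(values, key=lambda q: pinball_loss(q, values, tau))
print("Estimated 5th percentile:", best)
```

No variance or normality assumption enters the calculation, which is the property that makes the method attractive for reference limits.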

The more recent generalized additive modelling of location, scale and shape technique (GAMLSS) provides an extension to the LMS method [
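In the LMS method, the distribution of a measurement at given covariate values is summarised by three quantities: skewness expressed as a Box-Cox power (L), the median (M) and the coefficient of variation (S). A z-score is then obtained from an observed value with the standard LMS transformation. The sketch below shows that transformation only; the L, M and S values used are hypothetical and would in practice come from a fitted model.

```python
import math


def lms_z_score(y, L, M, S):
    """z-score of observation y given LMS parameters (Box-Cox power, median, CV)."""
    if y <= 0 or M <= 0 or S <= 0:
        raise ValueError("y, M and S must be positive")
    if abs(L) < 1e-12:
        # Limiting case of the transformation as L approaches zero.
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


# Hypothetical parameters for one age and height combination.
print(lms_z_score(3.3, 1.0, 3.0, 0.1))
print(lms_z_score(2.5, 0.5, 3.0, 0.12))
```

An observation equal to the median always maps to a z-score of zero, and with L equal to 1 the formula reduces to the familiar (y - M) / (M * S).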

In recent years, many advances have occurred in development of lung function prediction equations, such as development of standardized measurement protocols across all age groups, including those for preschool children [

India has the second largest population in the world, with an 18% share of the global population. However, India bears a disproportionately high percentage (32%) of the global DALYs from chronic respiratory diseases [

The authors have no competing interests to declare.