Prediction of PM10 Concentrations Using Logistic Regression Analysis: Case Study in Jerantut
Particulate matter (PM 10) can cause serious negative health effects in humans when it is present in the environment. It is therefore important to forecast its concentration so that the risk of exposure can be reduced. Secondary data on the concentrations of PM 10, sulphur dioxide (SO 2), nitrogen dioxide (NO 2), ground level ozone (O 3) and carbon monoxide (CO), along with temperature and relative humidity, at the Jerantut monitoring station between 2010 and 2012 were obtained from the Department of Environment. The main objective of this study is to describe the relationship between PM 10 and other gases and weather conditions by using correlation. It also aims to determine the best prediction categories. Furthermore, this research aims to find a model for predicting the concentration of PM 10 using logistic regression. PM 10 and O 3 at the Jerantut monitoring station were found to have a strong positive correlation. The best logistic regression model was obtained at Jerantut station in 2010 with an R 2 value of 0.565. The best prediction category for the Jerantut monitoring station was healthy, with more than 85% of cases correctly classified in both the overall and annual analyses between 2010 and 2012.
Keywords: Air pollution, particulate matter, prediction model
Air pollution refers to the contamination of the indoor or outdoor environment by any type of agent that modifies the natural characteristics of the atmosphere ( World Health Organization, 2014). Hanapi and Din ( 2012) describe air pollution as coming from many sources such as waste products, construction work, factory emissions and vehicles. Pollutants at ground level are caused by human activities and natural events. Acson International, in their Healthy Air Booklet, stated that the main sources of air pollution in Malaysia are industrial fuel burning, motor vehicles, domestic fuel burning, power stations and the burning of industrial and municipal waste ( Acson Malaysia Sales and Service Sdn Bhd, 2012). In Malaysia, the Air Pollution Index (API) is currently used as an indicator to measure air quality ( Hanapi & Din, 2012). According to the Department of Environment Malaysia (2013), the API is calculated based on five major types of air pollutants measured at air pollution monitoring stations belonging to the Department of Environment Malaysia. These are PM 10, sulphur dioxide, nitrogen dioxide, ground level ozone and carbon monoxide. An API value of 50 or below is classified as good, 51-100 as moderate, 101-200 as unhealthy, 201-300 as very unhealthy and more than 300 as hazardous, whereas an API value above 500 is classified as an emergency.
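The API bands can be expressed as a simple lookup. The following is a minimal sketch, not the Department of Environment's own procedure. It assumes the commonly published scale in which 101-200 is "unhealthy" and 201-300 is "very unhealthy"; the function name `api_category` is hypothetical.

```python
def api_category(api: float) -> str:
    """Map an API reading to a descriptive category.

    Assumed bands (illustrative): 0-50 good, 51-100 moderate,
    101-200 unhealthy, 201-300 very unhealthy, 301-500 hazardous,
    above 500 emergency.
    """
    if api < 0:
        raise ValueError("API cannot be negative")
    if api <= 50:
        return "good"
    if api <= 100:
        return "moderate"
    if api <= 200:
        return "unhealthy"
    if api <= 300:
        return "very unhealthy"
    if api <= 500:
        return "hazardous"
    return "emergency"
```

For example, `api_category(250)` falls in the "very unhealthy" band, which is the level at which school closures are triggered.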
Particulate matter (PM), also known as fine dust, is a complex mixture of liquid droplets and extremely small particles. It is made up of several components including organic chemicals, acids, dust particles, soil and metals. PM with an aerodynamic diameter of less than 10μ
The Department of Occupational Safety and Health Malaysia (2014) stated that PM 10 can negatively affect human health if the API value exceeds 100. Environment Statistics Time Series Malaysia (2013) summarised that unhealthy events caused by transboundary heavy particulate matter were recorded between 2002 and 2013, with a maximum of 3 days recorded in 2005. According to a report on heavy particulate matter pollution in the New Straits Times ( 2014), the standard operating procedure is for schools to close when the API exceeds 200, at which point the air quality is at a “very unhealthy” level. However, the number of days a school stays closed depends on the duration of the high particulate event. This causes uncertainty for the public because they cannot know how long the closure will last, which makes it difficult to plan or schedule outdoor social activities. To reduce these difficulties, an appropriate method of prediction is needed.
According to Pascal et al. ( 2014), exposure to PM 10 has been consistently associated with serious health outcomes, resulting in an increase in mortality and hospital admissions predominantly related to cardiovascular and respiratory disease. Many studies have linked PM 10 to a series of significant health problems, including aggravated asthma, an increase in respiratory symptoms such as coughing and difficulty breathing, chronic bronchitis, decreased lung function and premature death. One of the unhealthy events in Malaysia is the presence of heavy particulate matter caused by uncontrolled forest fires originating from the Indonesian province of Sumatra during the burning season ( Norela, Saidah, & Mahmud, 2013). Fire is normally used for land preparation and forest clearance by people involved in farming. Unfortunately, these fires can develop into uncontrollable wildfires. This situation usually happens between June and November, coinciding with drier weather conditions ( Salinas et al., 2013). Because of these issues, there is a need to provide an early warning to those who may be affected. Short-term prediction is a relevant way to provide information about PM 10 concentration.
Is logistic regression suitable for prediction of PM 10 concentration?
Purpose of the Study
The main objective of this study is to describe the relationship between PM 10 and other gases and weather conditions by using correlation. It also aims to determine the best prediction categories. Furthermore, this research aims to develop a model for predicting the concentration of PM 10 using logistic regression.
Logistic regression is a method for predicting a discrete outcome from a set of variables that can be continuous, discrete, dichotomous or a mixture of these. It can be used to answer the same questions as discriminant analysis. The difference is that logistic regression makes no assumption about the distribution of the independent variables. Applications of logistic regression include predicting the success or failure of a new product, determining which credit risk category a person will fall into and predicting whether a firm will be successful or otherwise.
In statistical analysis, the main purpose of logistic regression is to correctly predict the category of outcome for individual cases. To accomplish this, a model must be created that includes useful and relevant independent variables. In addition, logistic regression is used to measure the relationship between the categorical dependent variable and the independent variables.
Logistic regression does not require the assumption of normality. However, the sample size must be large enough: at least 100 observations and a ratio of 20 observations for each independent variable. For this distribution, a log transformation is needed to create a link with a normal regression equation. The log transformation, also known as the logit, is defined as the natural logarithm of the odds of the outcome.
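For reference, the standard binary form of the logit can be written as follows. This is the textbook expression, not necessarily the notation used in the original study. The symbols are chosen here for illustration: p is the probability of the outcome, x_1, ..., x_k are the independent variables and the β terms are the regression coefficients.

```latex
\[
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right)
  = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k ,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
\]
```

An outcome with more than two categories, such as healthy, moderate and unhealthy, would need a multinomial or ordinal extension of this form.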
60% of the data was used as training data to obtain the logistic regression model. The remaining 40% of the data was used for validation purposes. When the percentage of correct predictions for the training data is the same as or higher than that for the validation data, the model is considered good and suitable for use in prediction.
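The 60/40 split and the acceptance rule can be sketched as below. This is a minimal illustration under stated assumptions: the text does not say how the records were divided, so a seeded random shuffle is assumed, and the helper names `split_60_40`, `percent_correct` and `model_is_acceptable` are hypothetical.

```python
import random


def split_60_40(records, seed=0):
    """Return (training, validation) lists holding 60% and 40% of the records.

    Assumption: the split is random; the seed only makes it repeatable.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.6 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def percent_correct(actual, predicted):
    """Percentage of cases whose predicted category matches the actual one."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


def model_is_acceptable(train_pct, valid_pct):
    """Apply the rule in the text: training accuracy >= validation accuracy."""
    return train_pct >= valid_pct
```

A fitted model would be scored with `percent_correct` on each subset, and the two percentages passed to `model_is_acceptable`.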
Data and Area of Study
In this research, the secondary data used was recorded between 2010 and 2012. The data set consists of data on the air pollutants PM 10, CO, SO 2, NO 2 and O 3, together with meteorological data on temperature and relative humidity. The secondary data was obtained from the Air Quality Division of the Department of Environment Malaysia (DoE). The data was collected and monitored by Alam Sekitar Malaysia Sdn. Bhd. (ASMA), which is the authorized agency for the DoE ( Azid et al., 2014). The data was subjected to standard quality control processes and quality assurance procedures which followed the standards outlined by the United States Environmental Protection Agency (USEPA) ( Latif et al., 2014).
Based on the descriptive statistics provided in Table
Correlation between PM 10, other Gaseous and Meteorological Parameters
Pearson correlation analysis was used to study the correlation between the gaseous pollutants (SO 2, NO 2, O 3 and CO), PM 10 and the meteorological parameters. The correlation between the gaseous pollutants, PM 10 and meteorological parameters for the Jerantut monitoring station is shown in Table
From the Table
A significant positive correlation between PM 10 and temperature is expected, as higher temperature leads to greater evaporation and resuspension of particles in ambient air. The negative correlation between relative humidity and PM 10 was also expected, because humidity and rainfall reduce the amount of particulate matter in the air through the wash-out process ( Mahiyuddin et al., 2013). High temperature tends to cause lower humidity and hot weather, which in turn promotes local and regional biomass burning that subsequently increases the quantity of particles in the air ( Latif et al., 2014).
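The Pearson coefficient used in this section can be computed as in the following minimal sketch. The function name `pearson_r` is hypothetical and the code is not the software used in the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A value near +1 indicates that two series rise together, as expected for PM 10 and temperature. A negative value indicates that one falls as the other rises, as expected for PM 10 and relative humidity.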
Logistic Regression Analysis
The logistic regression analysis was conducted to determine the best fitting model describing the relationship between the dependent variable, whose categories are healthy, moderate or unhealthy, and a set of independent explanatory variables, which are temperature, relative humidity, SO 2, NO 2, O 3 and CO. The value of R 2 and the percentage of correct predictions of group classification were also calculated for 2010 to 2012 to find the best fit model.
The overall and yearly regression model and R 2 values between 2010 to 2012 at Jerantut Station are shown in Table
The secondary data obtained from the DoE was analysed using descriptive statistics and correlation. The results show that the maximum concentration of PM 10 at Jerantut station was below the limit set by the Malaysian Ambient Air Quality Guidelines (MAAQG) from 2010 to 2012. The correlation analysis between PM 10 and other gases and meteorological parameters at Jerantut station showed a strong correlation of 0.614 between PM 10 and O 3. The logistic regression analysis gave a classification percentage of more than 90% for the training and validation data in every year. Moreover, the best logistic regression model at Jerantut station was obtained in 2010, with an R 2 value of 0.565. The best prediction category was healthy, with more than 85% of cases correctly predicted in the overall and yearly analyses.
Special thanks to Universiti Sains Malaysia for the funding with a Short-term Grant (PJJAUH/6315089).
- Acson Malaysia Sales and Service Sdn Bhd (2012). Air Pollution and Its Sources. Healthy Air Booklet, Available from: http://www.acson.com.my
- Afroz, R., Hassan, M. N., & Ibrahim, N. A. (2003). Review of Air Pollution and Health Impacts in Malaysia. Environmental Research, 93(2), 71-77.
- Azid, A., Juahir, H., Toriman, M. E., Endut, A., Kamarudin, M. K. A., Rahman, M. N. A., & Yunus, K. (2014). Source Apportionment of Air Pollution: A Case Study in Malaysia. Jurnal Teknologi, 72(1), 83-88.
- Bycenkiene, S., Plauskaite, K., Dudoitis, V., & Ulevicius, V. (2014). Urban Background Levels of Particle Number Concentration and Sources in Vilnius, Lithuania. Atmospheric Research, 143, 279-292.
- Department of Environment Malaysia (2013). Ministry of Natural Resources and Environment, http://apims.doe.gov.my/apims/General%20Info%20of%20Air%20Pollutant%20Index.pdf
- Department of Occupational Safety and Health (2014). Guidelines for the Protection of Employees Against the Effects of Haze at Workplace, Available from: http://www.dosh.gov.my/index.php?option=com_content&view=article&id=856:guidelines for-the-protection-of-employees-against-the-effects-of-haze-at-workplaces&catid=491:guidelines&Itemid=1199&lang=en
- Environment Statistics Time Series Malaysia (2013). Available from: https://www.statistics.gov.my/dosm/uploads/files/3_Time%20Series/Malaysia%20Time%20Series%202013/19Alam_Sekitar.pdf
- Hanapi, N., & Din, S. A. M. (2012). A Study on the Airborne Particulates Matter in Selected Museums of Peninsular Malaysia. Procedia-Social and Behavioural Sciences, 50, 602-613.
- Latif, M. T., Dominick, D., Ahamad, F., Khan, M. F., Juneng, L., Hamzah, F. M., & Nadzir, M. S. M. (2014). Long Term Assessment of Air Quality from a Background Station on the Malaysian Peninsula. Science of the Total Environment, 482, 336-348.
- Mahiyuddin, W. R. W., Sahani, M., Aripin, R., Latif, M. T., Thach, T. Q., & Wong, C. M. (2013). Short-term Effects of Daily Air Pollution on Mortality. Atmospheric Environment, 65, 69-79.
- New Straits Times (2014). Haze: Schools in Kuala Langat District also ordered to close, 14 March. Available from: http://www2.nst.com.my/7-day-news/wednesday/haze-schools-in-kuala-langat-district-also-ordered-to-close1.512403
- Norela, S., Saidah, M. S., & Mahmud, M. (2013). Chemical Composition of the Haze in Malaysia 2005. Atmospheric Environment, 77, 1005-1010.
- Pascal, M., Falq, G., Wagner, V., Chatignoux, E., Corso, M., Blanchard, M., & Larrieu, S. (2014). Short-term Impacts of Particulate Matter (PM10, PM10-2.5, PM2.5) on Mortality in Nine French Cities. Atmospheric Environment, 95, 174-184.
- Salinas, S. V., Chew, B. N., Miettinen, J., Campbell, J. R., Welton, E. J., Reid, J. S., & Liew, S. C. (2013). Physical and Optical Characteristics of the October 2010 Haze Event over Singapore: A Photometric and Lidar Analysis. Atmospheric Research, 122, 555-570.
- Titos, G., Lyamani, H., Pandilfi, M., Alastuey, A., & Alados-Arboledas, L. (2014). Identification of Fine (PM1) and Coarse (PM10-1) Sources of Particulate Matter in an Urban Environment. Atmospheric Environment, 89, 593-602.
- Ul-Saufie, A. Z. (2012). PM10 Concentration Short Term Prediction Using Regression Artificial Neural Network and Hybrid Models (Doctoral Dissertation). Universiti Sains Malaysia.
- World Health Organization (2014). Available from: http://www.who.int/topics/air_pollution/en/