Hashemi H, Mahaki B, Farnoosh R. Comparing Spatial Ecological Regression Models in Breast Cancer Incidence in Iran. J Research Health 2024; 14(4):329-340

URL: http://jrh.gmu.ac.ir/article-1-2080-en.html

2- Department of Biostatistics, School of Health, Kermanshah University of Medical Sciences, Kermanshah, Iran.

3- School of Mathematics, Iran University of Science and Technology, Tehran, Iran.

Cancer is a disease caused by the abnormal growth of cells in the body. Abnormal growth of such cells eventually leads to the formation of large masses (tumors) [1]. Cancer is one of the leading causes of death worldwide. Despite many efforts to reduce cancer deaths in recent years, cancers are still the second most common cause of death after cardiovascular disease in developed countries and the third leading cause of death in developing countries [2]. According to the World Health Organization (WHO), cancer kills 7.6 million people worldwide each year (13% of all deaths). About 70% of all cancer deaths occur in low- and middle-income countries [3].

Estimates show that by 2030, about 21 million new cases of cancer will occur annually, of which 60% to 70% will be in developing countries. The global burden of cancer is increasing due to aging and population growth, as well as high-risk behaviors, especially smoking. In addition to the psychological and economic consequences, patients with cancer suffer from cancer pain, which adversely affects their quality of life [4].

Today, 45% to 50% of deaths in women aged 45 to 64 years, and 30% of deaths in men aged 45 to 64 years are associated with cancers. This high rate indicates a significant increase in cancer deaths compared to the last century. Air pollution, smoking, diet change, alcohol consumption, stress, and so on can be risk factors for cancer, which has grown significantly in the last century [5].

In Iran, cancer is the third leading cause of death. Every year, more than 30000 people in Iran die of cancer. It is estimated that more than 70000 new cancers occur annually in the country [6].

Cancer rates are expected to rise in the future due to increased life expectancy and modern lifestyle trends [7]. Accordingly, the priority of health policymakers should be to establish a national center for cancer control and prevention. Health organizations forecast that in 2020 the incidence of cancer in Iran would reach 85653 cases in the total population and that cancer deaths would reach 622897 cases [6].

Of all cancers, breast cancer is the most common and a major cause of cancer death in women in Western countries. The breast is a secretory organ made up of glandular tissue and ducts. The cause of this cancer is unknown; however, because it is rare in men, its etiology is thought to be related to female hormones [8]. These tumors grow slowly but reach an advanced stage shortly after onset. The variability in the distribution of this cancer across regions suggests that environmental factors play a role [9].

According to many studies, physical activity, overweight, and obesity are the most important measurable factors in the incidence of cancer, and many studies on breast cancer and its risk factors have been conducted [10-12]. Few studies, however, have been done at the ecological level, taking geographical distribution into account [13]. In this study, changes in breast cancer incidence were examined based on ecological information and mathematical models. Given the aging population, the resulting upward trend of cancer in Iran, the importance of cancer prevention, and the spread of cancer in different parts of the country, it is necessary to identify risk factors and the areas where they are concentrated [14]. In recent years, attention to disease mapping and disease risk has increased significantly because the geographical distribution of incidence, prevalence, and mortality rates plays an important role in identifying risk factors and causes of many diseases and should not be underestimated [15].

A study by Colonna et al. examined how to select and interpret a Bayesian spatial model and a Poisson regression model to explain the variability of cancer risk in small areas. In that study, Besag-York-Mollie (BYM) models were used to map diseases, spatial autocorrelation tests (Moran statistics) were applied to the standardized incidence ratios (SIRs), and the deviance information criterion (DIC) was used to compare different BYM models, along with a comparison of the empirical variances of the structured and unstructured heterogeneity components of the BYM model [16].

Renart et al. investigated a common error in the ecological regression of cancer incidence on the deprivation index. They presented two models: one in which the relative risk is estimated by the indirect method, with the standardized incidence ratio (SIR) as the response variable (model 1), and one in which the relative risk is estimated with age as an explanatory variable and the crude cancer rate as the response variable (model 2). They found that model 2 fits better, while model 1 leads to bias. Accordingly, when age is included as an explanatory variable and the crude rates are used as the response variable in ecological models that control for geographic variability, the estimated relative risk of cancer is less biased [17].

Hou et al. showed that a healthy eating pattern, which includes eating fruits and vegetables, can reduce the risk of breast cancer. In many studies, researchers analyze data that contains geographic information and provides information about a specific location and space. The data that has such a property is called spatial data [18].

With access to spatial or spatio-temporal information, statistical methods have been developed that use such data to obtain more accurate information. One of the most important of these methods is disease mapping, the study of spatial (or space-time) variation in disease rates. Disease mapping is one of the oldest and most important tools for generating hypotheses about the causes of diseases and identifying areas that need to be studied more closely.

Therefore, in the present study, the incidence of breast cancer in the provinces of Iran and the role of two risk factors, overweight or obesity and physical activity, were investigated using the BYM model, which accounts for spatial correlation between cancer incidence in the study areas. The models used in this study are the full BYM model and the empirical Bayesian (Gamma-Poisson and log-normal) models. Accordingly, the present study investigates the incidence of breast cancer in Iran at the province level and explores the effect of these risk factors as covariates on the temporal risk of cancer using the Gamma-Poisson, log-normal, and BYM models.

This study is an ecological analysis that examines the relationship between disease occurrence and risk factors at the group level. The analysis is based on ecological regression. Because regions (responses) in geographical studies are not independent, the spatial correlation structure of responses in neighboring regions should be taken into account to prevent bias in the estimated regression coefficients. This type of regression is called spatial ecological regression. When the model is fitted with a Bayesian approach, as it is here, it is called a Bayesian spatial ecological regression model. In this study, data from 30 provinces registered at the Cancer Registration Center were used.

Information on the incidence of breast cancer was extracted from the annual national reports of cases registered by the Center for Non-Communicable Diseases Management of the Ministry of Health and Medical Education. The data were collected by the Cancer Department from the cancer registration system. Cancer data were extracted from the Center for Cancer and Non-Communicable Diseases of the Ministry of Health. New cases with definite confirmation of cancer are registered in the provincial disease registration centers and are reported to the national center annually.

Disease mapping examines spatial (or space-time) changes in disease rates and shows the geographical distribution of a disease within a given population, which reveals the spatial pattern of specific diseases. One of the important goals in the analysis of spatial or spatio-temporal data is the use of statistical models to determine the effects of potential risk factors on the occurrence of the outcome of interest.

The simplest of these methods is mapping the raw disease rates, which is usually misleading because counts in individual areas are small and the spatial correlation between areas is ignored. Instead, a relative risk measure can be used for each area: the probability of a person in the area getting the disease divided by the corresponding probability in the reference population. This measure is called the standardized mortality ratio (SMR).

The Gamma-Poisson and log-normal (empirical Bayes) models and the BYM model are subsets of Bayesian models. In disease mapping, where data are sparse, Bayesian methods that combine the data with prior information are more appropriate than the SMR method. The following model was used to evaluate the expected incidence of disease in an area and to assess the relationship between incidence and risk factors [19].

In disease mapping, we assume that the study area is divided into n smaller areas (i = 1, 2, …, n), that O_i is the observed number of deaths due to the disease (or incident cases) in area i, and that E_i is the expected number of cases in that area. Assuming a target population, the SMR is defined as follows (Equation 1) [20]:
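In the notation commonly used for disease mapping, with $O_i$ the observed and $E_i$ the expected number of cases in area $i$ (symbols assumed here, since the authors' own notation is not shown), the SMR can be written as:

```latex
\mathrm{SMR}_i = \frac{O_i}{E_i}, \qquad i = 1, \dots, n
```

A value above 1 means that more cases were observed in area $i$ than expected, and a value below 1 means fewer.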

SMR values are an estimate of the relative risk of each area. To calculate these indices, the map is divided into n adjacent non-overlapping regions (i = 1, …, n). The numbers of observed and expected events in area i are denoted O_i and E_i, respectively. E_i is assumed to be fixed and known over the study period and is the product of the population of area i and the overall incidence rate. The relative risk of disease in the population of area i is denoted θ_i (Equation 2).
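As a rough illustration of the calculation described above, the following Python sketch computes expected counts as area population times the overall incidence rate and then the SMR of each area. The function name `smr_by_area`, the area labels, and all counts are hypothetical placeholders, not data or code from this study.

```python
def smr_by_area(observed, population):
    """Return (expected, smr) dictionaries keyed by area.

    observed:   {area: observed case count O_i}
    population: {area: population at risk n_i}
    Assumption: expected counts use the overall rate,
    E_i = n_i * sum(O) / sum(n), with no age standardization.
    """
    total_cases = sum(observed.values())
    total_pop = sum(population.values())
    overall_rate = total_cases / total_pop
    expected = {area: population[area] * overall_rate for area in observed}
    smr = {area: observed[area] / expected[area] for area in observed}
    return expected, smr


# Hypothetical toy data (not from the study).
observed = {"A": 30, "B": 10, "C": 20}
population = {"A": 100_000, "B": 100_000, "C": 100_000}
expected, smr = smr_by_area(observed, population)
for area in observed:
    print(area, round(expected[area], 2), round(smr[area], 2))
```

With equal populations, every area gets the same expected count, so the SMR simply rescales the observed counts. With very small expected counts the ratio becomes unstable, which is the motivation for the Bayesian smoothing models discussed in the text.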

In disease mapping, it is assumed that the numbers of events in different regions are independent of each other and that each follows a Poisson distribution.

If Y is defined as the number of occurrences of an event in a given spatial or temporal interval and λ is the average number of occurrences in that interval, then Y has a Poisson distribution with parameter λ (Equation 3) [21]:
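For reference, the standard probability mass function of a Poisson random variable $Y$ with parameter $\lambda$ is:

```latex
P(Y = y) = \frac{e^{-\lambda}\,\lambda^{y}}{y!}, \qquad y = 0, 1, 2, \dots
```

Both the mean and the variance of $Y$ equal $\lambda$.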

In Bayesian inference, parameters are treated as random variables, and the observed data are used to update the prior information. At the core of Bayesian analysis is the likelihood of the data. The likelihood is the joint distribution of the observed data given a parameter or parameter vector (θ). It can also be defined as a function of the parameters for the given sample values. All information in the data is expressed by the likelihood function. In addition, the likelihood principle implies that any event that did not occur does not affect the final inference because all inferences are based on the likelihood of the observed data [22].
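The updating of prior information by data can be illustrated with the Gamma-Poisson model, where the update has a closed form. The sketch below assumes an observed count O that follows a Poisson distribution with mean E·θ and a Gamma(shape a, rate b) prior on the relative risk θ; the posterior is then Gamma(a + O, b + E). The prior values and counts are hypothetical and are not the ones used in this study.

```python
def gamma_poisson_posterior(observed, expected, a, b):
    """Posterior Gamma(shape, rate) of the relative risk theta.

    Assumes O | theta ~ Poisson(E * theta) and theta ~ Gamma(shape=a, rate=b).
    By conjugacy the posterior is Gamma(a + O, b + E).
    """
    return a + observed, b + expected


def gamma_mean(shape, rate):
    """Mean of a Gamma distribution in the shape/rate parameterization."""
    return shape / rate


# Hypothetical area: 3 observed cases against 1.5 expected (raw SMR = 2.0).
# Hypothetical prior Gamma(2, 2), which has mean 1.
shape, rate = gamma_poisson_posterior(observed=3, expected=1.5, a=2.0, b=2.0)
smoothed = gamma_mean(shape, rate)
print(round(smoothed, 4))  # (2 + 3) / (2 + 1.5)
```

The posterior mean lies between the raw SMR and the prior mean, and it moves closer to the raw SMR as the expected count grows, which is how the model stabilizes estimates for sparsely populated areas.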

The Gamma-Poisson model has limitations because adjusting for covariates is difficult and spatial correlation between regional rates cannot be accommodated. The log-normal model is more flexible for modeling relative risk (Equation 4).
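A common way of writing the log-normal and BYM models in the disease-mapping literature is sketched below. This is an assumed standard formulation and may differ in detail from the authors' Equation 4. Here $v_i$ is the unstructured (non-spatial) random effect, $u_i$ is the spatially structured effect, $\partial_i$ is the set of neighbors of area $i$, $m_i$ is the number of neighbors, and $\mathbf{x}_i$ holds the area-level covariates.

```latex
\begin{align}
O_i \mid \theta_i &\sim \mathrm{Poisson}(E_i\,\theta_i) \\
\text{log-normal:}\quad \log\theta_i &= \alpha + \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i,
  \qquad v_i \sim \mathcal{N}(0, \sigma_v^{2}) \\
\text{BYM:}\quad \log\theta_i &= \alpha + \mathbf{x}_i^{\top}\boldsymbol{\beta} + u_i + v_i,
  \qquad u_i \mid u_{j \neq i} \sim \mathcal{N}\!\left(\frac{1}{m_i}\sum_{j \in \partial_i} u_j,\ \frac{\sigma_u^{2}}{m_i}\right)
\end{align}
```

Dropping the covariate term gives the versions of each model without risk factors.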

To examine the effect of the choice of prior distribution on relative risk estimation, it is important to carry out sensitivity analyses with different prior distributions. If the data set is large, the data dominate the prior distribution, so the choice of prior parameter values is less important. If the data set is small, choosing an appropriate combination of prior parameter values becomes important [23].

Data summarized by year and province from 1999 to 2010 were used for analysis. Several types of data were used to estimate all-cause mortality in Iran, including the vital registration (VR) system (32 data sources) and surveys. Census data were also extracted, from which mortality data (9 data sources) were summarized, in addition to summary birth history (SBH; 5 data sources) and complete birth history (CBH; 1 data source). The Ministry of Health and Medical Education of Iran is responsible for providing and managing cancer registration in Iran. Cancer registration information from 2000 to 2010 is available for the whole country. However, the cancer registration system has problems, such as missing data, incomplete registration, and duplicate records. Moreover, the data are available only to some researchers and for a limited number of years. Research on non-communicable diseases has been conducted at the University of Tehran.

The total number of registered breast cancers in Iran from 2005 to 2009 was 32 694 cases. Among the provinces, the most cases were observed in Isfahan Province with 2862 cases and Khorasan Razavi Province with 2646 cases. The fewest cases were observed in Kohgiluyeh and Boyer-Ahmad Province with 104 cases. Table 1 shows the total number of registered and expected breast cancers from 2005 to 2009.

According to the observed and expected values of breast cancer incidence in the provinces, at the end of 2009, Tehran Province (n=2088) and Isfahan Province (n=643) had the highest incidence, and South Khorasan Province and Kohgiluyeh and Boyer-Ahmad Province had the lowest.

In estimating the relative risk of breast cancer with and without risk factors according to the BYM model, the results showed that the highest incidence of breast cancer was in Tehran Province and the lowest was in Sistan and Baluchestan and Kohgiluyeh and Boyer-Ahmad provinces (Table 1).

In estimating the relative risk of breast cancer with and without risk factors according to the log-normal model, the results showed that the highest incidence of breast cancer was in Tehran Province and the lowest was in Sistan and Baluchestan and Kohgiluyeh and Boyer-Ahmad provinces. The estimated incidence in the provinces was lower in the column without risk factors (Table 2).

In estimating the relative risk of breast cancer with and without risk factors according to the Gamma-Poisson model, the results showed that the highest incidence of breast cancer was in Tehran and Isfahan provinces and the lowest was in Kohgiluyeh and Boyer-Ahmad Province, followed by Sistan and Baluchestan Province. The results of this model differed from those of the other models (Table 3).

According to Table 4, the risk factors are significant in the Gamma-Poisson model. However, this model does not consider the correlation between provinces, so relying on the Gamma-Poisson results can be misleading. In the log-normal model, which accounts for non-structural heterogeneity, the effects of the variables were adjusted and overweight and obesity became significant. The positive sign of this coefficient shows that an increase in overweight and obesity increases the incidence of cancer.

Figure 1A shows the relative risk of the provinces without adjusting for risk factors and without considering structural and non-structural heterogeneities. According to this map, the provinces of Yazd and Tehran have the highest risk, and Sistan and Baluchestan Province has the lowest risk of breast cancer. Meanwhile, the central provinces are more at risk.

Figure 1B shows the relative risk of the provinces considering non-structural heterogeneity, without adjusting for the effect of risk factors. According to this map, the northwestern and southeastern provinces have a lower risk of breast cancer, and the provinces of Sistan and Baluchestan and Kohgiluyeh and Boyer-Ahmad have the lowest risk of breast cancer.

Figure 1C shows the relative risk of the provinces considering non-structural heterogeneity and adjusting for the risk factors. According to this map, Khorasan Razavi and Hamedan provinces have the highest risk of breast cancer, and Kohgiluyeh and Boyer-Ahmad, Sistan and Baluchestan, and Ardabil provinces have the lowest risk of breast cancer.

Figure 1D shows the relative risk of the provinces without adjusting for the effect of risk factors and taking into account structural and non-structural heterogeneities. According to this map, Isfahan and Tehran provinces have the highest risk, and Sistan and Baluchestan and Kohgiluyeh and Boyer-Ahmad provinces have the lowest.

Figure 1E shows the relative risk of the provinces adjusting for the effect of risk factors and considering structural and non-structural heterogeneities. According to this map, the provinces of Yazd, Qazvin, Ardabil, and North Khorasan have the lowest risk of breast cancer, and the provinces of Khorasan Razavi, Lorestan, and Hamedan have the highest risk.

According to the BYM model, without adjusting for the effect of risk factors, the provinces of Isfahan, Yazd, and Tehran have the highest risk of breast cancer, followed by the provinces of North, Fars, Khuzestan, and North Khorasan, and by the northeastern and southwestern provinces. Among the provinces, Sistan and Baluchestan Province and Chaharmahal and Bakhtiari Province had the lowest risk. These results are consistent with the results of the study by Khoshkar et al [24]. In the present study, the time trend of cancer incidence was estimated to be rising in 27% of the regions. Importantly, a rising trend of risk was estimated for provinces with low cancer risk, which indicates changes in the pattern of cancer incidence in these provinces and the need for serious interventions. It is also notable that most of these areas are far from the center of the country and located at the borders, where economic conditions differ from those of other provinces. Many studies have found that distance and economic status affect the control and reduction of cancer incidence and complications.

In the log-normal model, which accounts for non-structural heterogeneity, the effects of the variables were adjusted and overweight and obesity became significant. The positive sign of this coefficient shows that an increase in overweight and obesity increases the incidence of cancer. The log-normal model does not consider the structural correlation between provinces. The most complete model is the BYM model, which considers both structural and non-structural correlations. When structural and non-structural heterogeneity are both considered, none of the risk factors is significant. Table 4 compares the goodness of fit of the Gamma-Poisson, log-normal, and BYM models with and without risk factors using the deviance information criterion (DIC). The BYM model without risk factors has the best fit because it includes structural and non-structural heterogeneities. In this model, the effect of risk factors is absorbed by the structural and non-structural heterogeneity terms, and there is no need to include risk factors. The Gamma-Poisson model without risk factors has the worst fit. With risk factors included, the BYM and log-normal models have almost the same fit. In the log-normal model, the inclusion of risk factors did not change the goodness of fit. The Gamma-Poisson model fits poorly because it does not consider spatial correlations between provinces. Therefore, among ecological regression models, the BYM model is preferable for ecological analysis.
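To make the DIC comparison concrete, the following Python sketch computes DIC from posterior draws of the area-level relative risks under a Poisson likelihood, using the usual definition DIC = mean deviance + pD, where pD is the mean deviance minus the deviance at the posterior mean. The function names, counts, and posterior draws are hypothetical, and evaluating the deviance at the posterior mean of θ is an assumption of this sketch; it is not the authors' code or output.

```python
import math


def poisson_deviance(observed, expected, theta):
    """-2 * Poisson log-likelihood with means mu_i = E_i * theta_i."""
    loglik = 0.0
    for o, e, t in zip(observed, expected, theta):
        mu = e * t
        loglik += o * math.log(mu) - mu - math.lgamma(o + 1)
    return -2.0 * loglik


def dic(observed, expected, theta_samples):
    """Return (DIC, pD) from posterior draws of the relative risks.

    theta_samples: list of draws, each a list with one theta per area.
    """
    deviances = [poisson_deviance(observed, expected, th) for th in theta_samples]
    mean_dev = sum(deviances) / len(deviances)
    n_areas = len(observed)
    # Posterior mean of theta for each area.
    theta_bar = [sum(th[i] for th in theta_samples) / len(theta_samples)
                 for i in range(n_areas)]
    p_d = mean_dev - poisson_deviance(observed, expected, theta_bar)
    return mean_dev + p_d, p_d


# Hypothetical counts and posterior draws for two areas.
observed = [12, 5]
expected = [10.0, 8.0]
theta_samples = [[1.1, 0.6], [1.3, 0.7], [1.2, 0.65]]
dic_value, p_d = dic(observed, expected, theta_samples)
print(f"DIC = {dic_value:.3f}, pD = {p_d:.3f}")
```

When candidate models are fitted to the same data, the one with the smaller DIC is preferred, and pD acts as a penalty for the effective number of parameters.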

The unadjusted BYM model had the best fit among the considered models. Without adjusting for the effect of risk factors, the provinces of Isfahan, Yazd, and Tehran have the highest incidence of breast cancer, and the provinces of Sistan and Baluchestan and Chaharmahal and Bakhtiari have the lowest incidence. After adjusting for the risk factors, Khorasan Razavi, Lorestan, and Hamedan provinces have the highest relative risk, and Ardabil and Kohgiluyeh and Boyer-Ahmad provinces have the lowest. For prostate cancer in the unadjusted model, Fars, Semnan, Isfahan, and Tehran provinces have the highest relative risk, and Sistan and Baluchestan Province has the lowest. After adjusting for the effect of risk factors, Fars and Zanjan provinces have the highest relative risk, and Kerman, North Khorasan, Kohgiluyeh and Boyer-Ahmad, Qazvin, and Kermanshah provinces have the lowest.

Air pollution, family history, neonate feeding situation, and other covariates were not available at the province level. We therefore suggest conducting further ecological research on these factors.

Compliance with ethical guidelines

In this study, the principle of confidentiality of information was observed, and no information beyond what the research required was made available to any organization.

This study was funded by Islamic Azad University, Science and Research Branch.

Study design: Hasti Hashemi; Methodology: Behzad Mahaki; Data collection: Hasti Hashemi and Rahman Farnoosh; Data analysis: Hasti Hashemi and Behzad Mahaki.

The authors declared no conflict of interest.

The authors are thankful to all researchers and breast cancer specialists who kindly participated in this study.

- Bovero A, Gottardo F, Botto R, Tosi C, Selvatico M, Torta R. Definition of a good death, attitudes toward death, and feelings of interconnectedness among people taking care of terminally ill patients with cancer: An exploratory study. The American Journal of Hospice & Palliative Care. 2020; 37(5):343-9. [DOI:10.1177/1049909119883835] [PMID]
- Yang B, Choi H, Lee SK, Chung SJ, Yeo Y, Shin YM, et al. Risk of coronavirus disease 2019 occurrence, severe presentation, and mortality in patients with lung cancer. Cancer Research and Treatment. 2021; 53(3):678-84. [DOI:10.4143/crt.2020.1242] [PMID]
- WHO. WHO report on cancer: Setting priorities, investing wisely and providing care for all. Geneva: 2020. [Link]
- Williams F, Zoellner N, Hovmand PS. Understanding Global Cancer Disparities: The role of social determinants from system dynamics perspective. Transdisciplinary Journal of Engineering & Science. 2016; 7:10.22545/2016/00072. [DOI:10.22545/2016/00072] [PMID]
- Curtin SC. Trends in cancer and heart disease death rates among adults aged 45-64: United States, 1999-2017. National Vital Statistics Reports. 2019; 68(5):1-9. [PMID]
- Farhood B, Geraily G, Alizadeh A. Incidence and mortality of various cancers in Iran and compare to other countries: A review article. Iranian Journal of Public Health. 2018; 47(3):309-16. [PMID]
- You W, Henneberg M. Cancer incidence increasing globally: The role of relaxed natural selection. Evolutionary Applications. 2018; 11(2):140-52. [DOI:10.1111/eva.12523] [PMID]
- Ghoncheh M, Pournamdar Z, Salehiniya H. Incidence and mortality and epidemiology of breast cancer in the world. Asian Pacific Journal of Cancer Prevention. 2016; 17(S3):43-6. [DOI:10.7314/APJCP.2016.17.S3.43] [PMID]
- Frugtniet B, Jiang WG, Martin TA. Role of the WASP and WAVE family proteins in breast cancer invasion and metastasis. Breast Cancer (Dove Med Press). 2015; 7:99-109. [DOI:10.2147/BCTT.S59006] [PMID]
- Ballard-Barbash R, Hunsberger S, Alciati MH, Blair SN, Goodwin PJ, McTiernan A, et al. Physical activity, weight control, and breast cancer risk and survival: Clinical trial rationale and design considerations. Journal of the National Cancer Institute. 2009; 101(9):630-43. [DOI:10.1093/jnci/djp068] [PMID]
- Leray H, Malloizel-Delaunay J, Lusque A, Chantalat E, Bouglon L, Chollet C, et al. Body Mass Index as a major risk factor for severe breast cancer-related lymphedema. Lymphatic Research and Biology. 2020; 18(6):510-6. [DOI:10.1089/lrb.2019.0009] [PMID]
- Rouanet P, Roger P, Rousseau E, Thibault S, Romieu G, Mathieu A, et al. HER2 overexpression a major risk factor for recurrence in pT1a-bN0M0 breast cancer: Results from a French regional cohort. Cancer Medicine. 2014; 3(1):134-42. [DOI:10.1002/cam4.167] [PMID]
- Kitaeva AB, Gorshkov AP, Kirichek EA, Kusakin PG, Tsyganova AV, Tsyganov VE. General patterns and species-specific differences in the organization of the tubulin cytoskeleton in indeterminate nodules of three legumes. Cells. 2021; 10(5):1012. [DOI:10.3390/cells10051012] [PMID]
- Sun YS, Zhao Z, Yang ZN, Xu F, Lu HJ, Zhu ZY, et al. Risk factors and preventions of breast cancer. International Journal of Biological Sciences. 2017; 13(11):1387-97. [DOI:10.7150/ijbs.21635] [PMID]
- Oertelt-Prigione S, Seeland U, Kendel F, Rücke M, Flöel A, Gaissmaier W, et al. Cardiovascular risk factor distribution and subjective risk estimation in urban women--the BEFRI study: A randomized cross-sectional study. BMC Medicine. 2015; 13:52. [DOI:10.1186/s12916-015-0304-9] [PMID]
- Colonna M, Sauleau EA. How to interpret and choose a Bayesian spatial model and a Poisson regression model in the context of describing small area cancer risks variations. Revue D'epidemiologie et de Sante Publique. 2013; 61(6):559-67. [DOI:10.1016/j.respe.2013.07.686] [PMID]
- Renart G, Saez M, Saurina C, Marcos-Gragera R, Ocaña-Riola R, Martos C, et al. A common error in the ecological regression of cancer incidence on the deprivation index. Revista Panamericana de Salud Publica=Pan American Journal of Public Health. 2013; 34(2):83-91. [PMID]
- Hou R, Wei J, Hu Y, Zhang X, Sun X, Chandrasekar EK, et al. Healthy dietary patterns and risk and survival of breast cancer: A meta-analysis of cohort studies. Cancer Causes & Control. 2019; 30(8):835-46. [DOI:10.1007/s10552-019-01193-z] [PMID]
- Thygesen HH, Zwinderman AH. Modeling Sage data with a truncated gamma-Poisson model. BMC Bioinformatics. 2006; 7:157. [DOI:10.1186/1471-2105-7-157] [PMID]
- Yanagimoto T, Kashiwagi N. Empirical Bayes methods for smoothing data and for simultaneous estimation of many parameters. Environmental Health Perspectives. 1990; 87:109-14. [DOI:10.1289/ehp.9087109] [PMID]
- Joe H, Zhu R. Generalized Poisson distribution: The property of mixture of Poisson and comparison with negative binomial distribution. Biometrical Journal. 2005; 47(2):219-29. [DOI:10.1002/bimj.200410102] [PMID]
- Wright DK, MacEachern S, Lee J. Analysis of feature intervisibility and cumulative visibility using GIS, Bayesian and spatial statistics: A study from the Mandara Mountains, northern Cameroon. PLoS One. 2014; 9(11):e112191. [DOI:10.1371/journal.pone.0112191] [PMID]
- Gajewski BJ, Sedwick JD, Antonelli PJ. A log-normal distribution model of the effect of bacteria and ear fenestration on hearing loss: A Bayesian approach. Statistics in Medicine. 2004; 23(3):493-508. [DOI:10.1002/sim.1606] [PMID]
- Khoshkar AH, Koshki TJ, Mahaki B. Comparison of Bayesian spatial ecological regression models for investigating the incidence of breast cancer in Iran, 2005-2008. Asian Pacific Journal of Cancer Prevention. 2015; 16(14):5669-73. [DOI:10.7314/apjcp.2015.16.14.5669] [PMID]

Type of Study: Original Article | Subject: Health Education

Received: 2022/03/28 | Accepted: 2024/02/10 | Published: 2024/07/1

Rights and permissions

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.