Molecular and Genetic Epidemiology of Cancer in Low- and Medium-Income Countries

Background: Genetic and molecular factors can play an important role in an individual’s cancer susceptibility and response to carcinogen exposure. Cancer susceptibility can be inherited either as high-penetrance but rare germline mutations that constitute heritable cancer syndromes, or as common genetic variations or polymorphisms that are associated with low to moderate risk for development of cancer. These polymorphisms can interact with environmental exposures and can influence an individual’s cancer risk through multiple pathways, including affecting the rate of metabolism of carcinogens or the immune response to these toxins. Thus, these genetic polymorphisms can account for some of the geographical differences seen in cancer prevalence between different populations.


INTRODUCTION
Genetic and inherited factors play a significant role in predicting an individual's susceptibility to cancer. 1 The risk for nasopharyngeal cancer is 4- to 10-fold higher among individuals with a first-degree relative with the cancer. 2 Similarly, first-degree relatives of breast cancer patients have a 2-fold higher risk for developing breast cancer, 3,4 and a family history of pancreatic cancer is associated with a 9-fold increased risk compared with the general population. 5 Carcinogenesis is considered to be a multistep, evolving process that involves interaction between genetic and environmental factors. 6,7 Molecular or genetic epidemiology is the discipline of epidemiological study that focuses on the contribution of genetic and environmental risk factors to cancer risk and the interplay between them at the cellular and molecular levels. 8,9 A major breakthrough in this field was the sequencing of the complete human genome by the Human Genome Project. 10 This was followed by the creation of genome-wide databases of common genetic variations in humans. This information, in addition to recent advances in genotyping technology, has provided the foundation to more comprehensively examine genetic variations in relation to cancer risk using genome-wide association studies (GWAS; Fig. 1). GWAS have identified loci for multiple new cancer-susceptibility genes. 11 They use the design of observational epidemiology studies (case-control and cohort studies) to determine the association of low-penetrance alleles with cancer risk by calculating an odds ratio or relative risk. The human genome comprises discrete regions of linkage disequilibrium (LD), where nearby single nucleotide polymorphisms (SNPs) show strong correlation with one another. 12 The International HapMap Project was initiated in 2002 to study LD patterns across the genome in multiple populations.
The HapMap Project genotyped more than 3 million SNPs in 269 samples from 4 populations (Northern and Western European ancestry, Chinese, Japanese, and Yorubans from Nigeria). 13 The completion of these 2 projects and the advances in genome-sequencing technologies, including next-generation sequencing and high-volume SNP genotyping platforms such as the Illumina BeadArray and the Affymetrix GeneChip array, have revolutionized the practice of molecular epidemiology. Common terms used in the field of molecular and genetic epidemiology and their definitions are listed in Table 1.
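As a minimal sketch of the odds-ratio calculation used in case-control association analyses, the Python snippet below computes an odds ratio and an approximate 95% confidence interval from a 2x2 table of allele carriers and noncarriers. The function name, the choice of the Woolf (log-scale) interval, and all counts are illustrative assumptions and are not taken from any study cited here.

```python
import math


def odds_ratio_with_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Return (odds ratio, lower, upper) for a 2x2 case-control table.

    The interval is the Woolf approximation on the log scale; z=1.96
    corresponds to an approximate 95% confidence interval.
    All four cell counts must be positive.
    """
    cells = (case_carriers, case_noncarriers,
             control_carriers, control_noncarriers)
    if any(count <= 0 for count in cells):
        raise ValueError("all cell counts must be positive")

    # Odds of carrying the allele among cases divided by odds among controls.
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers)

    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / count for count in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 120 of 500 cases and 90 of 500 controls carry the allele.
or_estimate, ci_lower, ci_upper = odds_ratio_with_ci(120, 380, 90, 410)
print(f"OR = {or_estimate:.2f} (95% CI {ci_lower:.2f}-{ci_upper:.2f})")
```

An interval that excludes 1 suggests an association at the conventional 5% level for a single test; genome-wide analyses require much stricter thresholds because many SNPs are tested at once.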

GENETIC AND MOLECULAR MECHANISMS OF CARCINOGENESIS
A human gene is composed of coding (exons) and noncoding (introns) regions, and regulatory DNA sequences. 14 A DNA mutation occurring in germline cells may be passed on to progeny and thereby lead to the less common heritable cancer syndromes (Table 2). The degree of cancer risk associated with inheriting a mutation depends on gene penetrance, which is the probability that an individual carrying a mutation will develop disease; penetrance may be modified by a combination of genetic and environmental factors. Gene mutations associated with inherited cancer syndromes usually involve oncogenes or tumor suppressor genes. Several hundred proto-oncogenes have been identified that have the potential to be converted into oncogenes and lead to cancer. They usually are involved in processes such as cell division, apoptosis, signal transduction, and cell differentiation.
Cancer susceptibility also can be inherited through more common nucleotide sequence variations. These variations include SNPs (1 base-pair substitutions), sequence insertions and deletions, and highly variable repeating nucleotide segments. SNPs are the most common form of variation and occur approximately every 300 bp on average across the genome. 13 An SNP is a DNA sequence variation occurring when a single nucleotide (A, T, C, or G) in the genome differs between members of a species. Between ethnically different populations, there are 7 million differences at the level of single base pairs occurring at a frequency of at least 5% of the population. 15 Most SNPs have no functional consequence, particularly if they occur in a noncoding sequence, but some have modest effects that may interact with environmental factors to increase cancer susceptibility.
Epigenetic mechanisms of carcinogenesis are increasingly being identified as key steps in pathways from exposure to cancer. These mechanisms do not depend on structural changes in DNA but on functional regulation such as DNA methylation and histone modification. These processes are important determinants of gene expression because they regulate the process by which the instructions in genes are converted to mRNA, which directs protein synthesis. 16 Genomic instability also plays an important role in carcinogenesis in both inherited and sporadic cancers and is broadly classified into microsatellite instability and chromosome instability. In inherited cancers, genomic instability is mostly caused by mutations in DNA-mismatch repair genes. 17

CANCER SUSCEPTIBILITY FROM HERITABLE CANCER SYNDROMES
Once a cancer is ascertained to have a genetic component through familial aggregation studies, the next step is to identify the pattern of Mendelian inheritance (autosomal dominant or recessive) using segregation analysis. Linkage analysis is then used to localize the chromosomal region containing the cancer-causing gene. This approach has been used to identify multiple inherited cancer syndromes. For instance, linkage analysis was conducted as a genome-wide search among 23 families with multiple individuals affected with early-onset breast cancer to identify associated genetic loci. 18 The investigators found linkage to a region on chromosome 17q21, and positional cloning was used to localize the BRCA1 gene in this region. 19 Two years later BRCA2, located on chromosome 13q12-13, was identified. 20 Both BRCA1 and BRCA2 are tumor suppressor genes. Mutations in BRCA1 and BRCA2 also are associated with other malignancies, such as ovarian, prostate, and pancreatic cancers; malignant melanoma; gallbladder and bile duct cancers; and stomach cancer. 21,22 In general, women with BRCA1 mutations have a lifetime risk for breast cancer of 60% to 85%, and the prevalence of BRCA1 or BRCA2 mutations in the general population ranges from 1 in 200 to 1 in 1000. 23 Factors that modify penetrance appear to be reproductive factors and exogenous hormones. 24 Cancer risk from BRCA1 and BRCA2 is reported to be highest in Ashkenazi Jews. 25 It is estimated that 5% to 10% of all breast cancers can be attributed to highly penetrant germline mutations, and mutations in BRCA1 and BRCA2 contribute to 90% of these. 26 Other heritable syndromes associated with breast cancer include Li-Fraumeni familial cancer syndrome (mutations in P53) 23 and Cowden's syndrome (mutation in PTEN). 26 Linkage analysis also has been used to identify a heritable cancer syndrome in melanoma. As 5% to 10% of melanomas occur among individuals with a strong family history of the disease, linkage analyses of melanoma kindreds were performed in the 1990s.
The analyses identified a familial melanoma susceptibility locus on the short arm of chromosome 9. 27,28 This led to the identification of the CDKN2A gene, which is the most common cause of inherited susceptibility to melanoma. CDKN2A mutation penetrance has been estimated to be 30% by age 50 years and 67% by age 80 and varies by geographical location. 29 The 2 heritable syndromes associated with increased risk for colorectal cancer are familial adenomatous polyposis (FAP) and hereditary nonpolyposis colorectal cancer (HNPCC or Lynch syndrome). FAP is caused by an inherited mutation in the APC gene, a tumor suppressor gene that regulates formation of a protein named β-catenin, which is involved in activation of transcription of various oncogenes. 30,31 FAP is inherited as an autosomal dominant disorder and is manifested as numerous adenomatous polyps in the epithelium of the large intestine, which almost always progress to cancer. HNPCC is also an autosomal dominant disorder that is associated with an increased risk for colorectal cancer and various other cancers, including endometrial and ovarian cancers. 32,33 HNPCC accounts for up to 3% of all colorectal carcinomas 34 and 5% to 10% of all ovarian cancers. 32 This syndrome is associated with high-penetrance germline mutations in DNA-mismatch repair genes (MSH2, MLH1, PMS1, and PMS2) 35,36 and leads to microsatellite instability. This defect in DNA repair allows spontaneous genetic mutations to accumulate in the colonic mucosa, which predisposes to dysplasia and eventually to the development of invasive cancers. 37 The degree of microsatellite instability can be measured by polymerase chain reaction and is used as a marker of DNA repair capacity. 38 Patients with heritable syndromes develop cancers at young ages. For instance, women with HNPCC tend to develop endometrial cancer 15 years earlier than the general population. 39 Multiple other inherited cancer syndromes have been identified.
Multiple endocrine neoplasia (MEN) types 2A and 2B account for approximately 20% of medullary thyroid cancers and are attributed to germline mutations in the RET proto-oncogene on chromosome 10. 40,41 Fanconi anemia, ataxia telangiectasia, and Bloom syndrome are autosomal recessive inherited disorders associated with inherent chromosome instability. They are associated with increased risk for many cancers, including skin cancer and acute leukemia. 42,43

CANCER SUSCEPTIBILITY FROM COMMON GENETIC POLYMORPHISMS OR SNPS
The polygenic model suggests that multiple genetic variations or polymorphisms may each confer a small amount of cancer risk individually, yet in combination may result in modest susceptibility to cancer. 44 GWAS have been used to identify multiple genetic polymorphisms and cancer susceptibility loci and are more powerful than linkage analysis in identifying genes with modest effects. 45 A genome-wide search of high-risk prostate cancer families in the United States and Sweden identified the first prostate cancer susceptibility locus in 1996 on the long arm of chromosome 1 (HPC1) at 1q24-25. 46 Another polymorphism that has been identified in prostate cancer lies at 8q24 and is associated with a population attributable risk of approximately 8% in whites and 16% in blacks. This may contribute to the higher incidence of prostate cancer observed in black men compared with men of European ancestry. 47 GWAS also have identified susceptibility loci for multiple other cancers, including lung cancer, with susceptibility loci identified at 15q25, 5p15, and 6p21. The locus at 15q25 corresponds to nicotinic acetylcholine receptor subunit genes and may contribute to lung carcinogenesis in response to smoking. 48,49 Polymorphisms also influence cancer risk by modifying the response to environmental factors such as environmental toxins, diet, alcohol, or tobacco (Fig. 2).
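To illustrate how a population attributable risk can differ between populations even when the per-carrier risk is the same, the sketch below applies Levin's formula, PAR = p(RR - 1) / (1 + p(RR - 1)), where p is the proportion of the population carrying the variant and RR is the relative risk in carriers. Using Levin's formula here is an assumption for illustration; the cited 8q24 study may have used a different estimator, and the prevalences and relative risk below are hypothetical.

```python
def levin_attributable_risk(carrier_prevalence, relative_risk):
    """Population attributable risk (fraction) by Levin's formula.

    carrier_prevalence: proportion of the population carrying the variant (0-1).
    relative_risk: risk in carriers divided by risk in noncarriers (> 0).
    """
    if not 0.0 <= carrier_prevalence <= 1.0:
        raise ValueError("carrier_prevalence must be between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = carrier_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: the same relative risk, two different carrier prevalences.
par_low_prevalence = levin_attributable_risk(0.10, 1.5)
par_high_prevalence = levin_attributable_risk(0.30, 1.5)
print(f"PAR at 10% prevalence: {par_low_prevalence:.1%}")
print(f"PAR at 30% prevalence: {par_high_prevalence:.1%}")
```

Under these assumptions, a variant with an identical relative risk accounts for a larger share of cases in the population where it is more common, which is one way allele frequency differences can translate into differences in disease burden.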

POLYMORPHISMS RELATED TO RESPONSE TO METABOLIZING CARCINOGENS
Many exogenous carcinogens require activation by metabolizing enzymes to their activated forms. On the other

CYP Enzymes
Polymorphisms in the CYP1A1 gene are associated with increased risk for lung cancer among smokers in Japanese as well as white populations. 52,53 CYP polymorphisms also have been shown to be associated with gastric cancer risk in high-risk individuals in China in a case-cohort analysis. Heterozygous or homozygous carriage of the variant CYP1A1*2A allele was associated with roughly half the risk for gastric cancer (adjusted relative risk [RR], 0.47). 54 In contrast, a Taiwanese study found an almost 3-fold increased risk for stomach cancer among carriers of the c2/c2 genotype for the CYP2E1 gene, an important activator of nitrosamines. 55

NAT2
Cooking of animal proteins at high temperature creates carcinogens such as heterocyclic amines and polycyclic aromatic hydrocarbons (PAHs). 56 Heterocyclic amines are metabolized by a number of enzymes including NAT2, and polymorphisms in the gene for this enzyme divide the population into slow acetylators (about 59% of the white population) and rapid acetylators. 57 NATs also play a role in detoxifying aromatic amines found in tobacco smoke. 58 In a large case-control study and meta-analysis in whites, NAT2 slow acetylators had a 40% increased risk for developing bladder cancer; this association was more pronounced in cigarette smokers. 59

GST
Chemical carcinogens such as intermediates of PAHs found in tobacco smoke are detoxified by enzymes including GST. Polymorphisms in the genes coding for these enzymes have been associated with a substantially greater risk for lung cancer from environmental tobacco smoke in nonsmokers, when compared with individuals who were heterozygous or homozygous carriers of the wild-type GSTM1 allele. 60 The M1 locus in the GST gene is entirely absent in approximately 50% of whites, and this deletion has been associated quite consistently with increased risk for lung cancer in this population and accounts for 17% of lung cancer cases. 61

POLYMORPHISMS RELATED TO RESPONSE TO DIET AND ALCOHOL
Studies have identified potential interactions between dietary intake and genetic variants, and some gene-diet associations have been established. For instance, in breast cancer pathogenesis, studies have demonstrated an interactive effect between folate intake and the 1-carbon metabolism-related genes. SNPs in the gene for the folate-metabolizing enzyme methylenetetrahydrofolate reductase (MTHFR) modify associations between folate intake and breast cancer. In a nested case-control study, a positive association between dietary folate intake and breast cancer was observed in participants with the MTHFR 677TT genotype. 62,63
Alcohol or ethanol is primarily oxidized to acetaldehyde (its carcinogenic form) by the enzyme alcohol dehydrogenase (ADH). 64 Although most ethanol metabolism occurs in the liver, ADH is also expressed in the oral cavity and upper aerodigestive tract. 65,66 Several genes including ADH1B and ADH1C are involved in alcohol metabolism, and polymorphisms at these gene loci can influence the risk for cancer. For instance, several studies have reported that the ADH1B*1 allele, which is the fast metabolizing allele found in the majority of Asians, is associated with an increased risk for head and neck cancer. 67

POLYMORPHISMS RELATED TO RESPONSE TO INFECTIONS
The human leukocyte antigen (HLA) genes encode proteins required for the presentation of foreign antigens, including viral peptides, to the immune system for targeted lysis. Because virtually all nasopharyngeal cancers (NPC) contain Epstein-Barr virus (EBV), individuals who inherit HLA alleles with a reduced ability to present EBV antigens may have an increased risk for developing NPC, whereas individuals with HLA alleles that present EBV efficiently may have a lower risk. 68 In southern Chinese and other Asian populations, HLA-A2-B46 and B17 are generally associated with a 2- to 3-fold increase in NPC risk. 69,70 In contrast, lower risk for NPC was found in association with HLA-A11. 70 In Thailand 71 and in China, 72 polymorphisms in the polymeric immunoglobulin receptor (PIGR), a cell surface receptor proposed to mediate EBV entry into the nasal epithelium, are associated with a 2- to 3-fold increased risk for NPC. These polymorphisms may partly account for the higher prevalence of NPC seen in Southeast Asia.
In Egypt and the Middle East, where Schistosoma haematobium is endemic, squamous cell carcinomas of the bladder predominate, in contrast to the transitional-cell carcinoma more commonly seen in America and Europe, and the two differ at the molecular level. 73,74 A study reported a high frequency of 9p loss of heterozygosity (LOH) at the CDKN2 gene in Schistosoma-associated bladder cancers, at 65% compared with 39% in non-Schistosoma-associated bladder cancers. This suggests that the CDKN2 gene on 9p may contribute to the development of the majority of schistosomiasis-associated bladder tumors. 74

Other Polymorphisms Related to Cancer Risk
Epidermal growth factor (EGF) and transforming growth factor (TGF)-β, through interaction with cell surface receptors, induce growth signals and are important for tumor growth and progression. A case-control study in Japan showed that individuals with the EGFR A/A or A/G genotype had a significantly lower risk for gastric cancer than those with the G/G genotype (adjusted odds ratio [OR], 0.56). This polymorphism is associated with reduced production of EGF. 75 Another study in China also showed that EGF promoter polymorphisms were associated with a significantly decreased risk for gastric cancer. 76 Similarly, in a case-control study of 675 gastric cancer cases and 704 healthy controls in a Chinese population, variant alleles of the promoter polymorphisms TGFB1 C-509T and TGFBR2 G-875A were associated with a significantly decreased risk for gastric cancer (OR, 0.65 for -509CT/TT and 0.67 for -875GA/AA). 77 Although many factors such as diet and Helicobacter pylori infection have been implicated in the increased prevalence of gastric cancer in China and Japan, these genetic variants also may play a role.
Polymorphisms in XRCC1, which is involved in DNA repair activity, are associated with 20% to 50% lower risk for NPC in Taiwan 78 and southern China. 79 In a large population-based case-control study of breast cancer, a positive association for XRCC1 codon 399 Arg/Gln or Gln/Gln genotypes compared with Arg/Arg was found among blacks (OR 1.7; 95% CI, 1.1-2.4) but not whites. 80

CLINICAL IMPLICATIONS
Knowledge of genetic and molecular factors in cancer susceptibility can be used clinically in cancer prevention strategies by identifying high-risk individuals in screening programs based on genetic susceptibility loci. However, this is an evolving area, as risk prediction models based on SNPs have shown only modest benefit. Adding 7 SNPs identified from GWAS to the original Gail model Breast Cancer Risk Assessment Tool yielded only a modest improvement in the area under the curve (AUC) statistic, from 0.607 to 0.632. By one estimate, at least 280 more SNPs are needed to improve the performance of the test. 81 Although the SNP-based risk score had only moderate discriminatory accuracy, it still improved the predictive ability of the prediction model. Future studies incorporating both genetic polymorphisms and established cancer risk factors in risk prediction models are needed. 82
Molecular and genetic epidemiology also have been applied through molecular biomarkers that measure exposure to a potential carcinogen, in order to identify at-risk populations or to measure the efficacy of cancer prevention or control programs. Tobacco smoke contains inhaled carcinogens including the PAHs and N-nitrosamines, which are metabolized by the P450 enzyme system and exert carcinogenic effects through the formation of DNA adducts. 83 Lung cancer patients have significantly higher levels of aromatic/PAH-DNA adducts, 84,85 even after adjustment for potential confounders. Such adducts can be used to identify individuals who would benefit most from early intervention. A study of 400 smokers showed how DNA adducts can be used to monitor the efficacy of a smoking-cessation program. Blood samples were drawn from the participants before they began the program and then at multiple time points after smoking cessation. Levels of PAH-DNA and other adducts reflected cessation; within 8 weeks of quitting, their concentrations were significantly reduced. 86
Other carcinogens are also known to form DNA adducts that can be used as biomarkers. Aflatoxin B1 (AFB1) is a fungal metabolite present in grains and cereals as a result of improper storage and induces DNA damage in humans. 87 A prospective study of 18,244 men in Shanghai (22 incident cases of liver cancer) used assays for aflatoxin exposure, including DNA adducts, to assess the relation between aflatoxin exposure and liver cancer. Individuals with liver cancer were more likely to have detectable concentrations of these aflatoxin-related DNA adducts (relative risk [RR], 2.4; 95% CI, 1.0-5.9). 88 A subsequent study in Taiwan showed similar results. 89 Such biomarkers can be a useful resource in planning liver cancer prevention and screening programs in high-prevalence areas such as Asia.
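The AUC statistic used earlier in this section to compare risk prediction models can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. The sketch below computes it directly from that definition by comparing every case-control pair. The function name and the risk scores are hypothetical and are not drawn from the Gail model or any cited study.

```python
def area_under_curve(case_scores, control_scores):
    """AUC as the fraction of case-control pairs ranked correctly.

    A pair counts 1 when the case scores higher than the control,
    0.5 when the scores tie, and 0 otherwise.
    """
    if not case_scores or not control_scores:
        raise ValueError("both score lists must be non-empty")
    concordant = 0.0
    for case_score in case_scores:
        for control_score in control_scores:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks for 4 cases and 4 controls.
hypothetical_cases = [0.90, 0.60, 0.55, 0.40]
hypothetical_controls = [0.70, 0.50, 0.45, 0.30]
auc_value = area_under_curve(hypothetical_cases, hypothetical_controls)
print(f"AUC = {auc_value:.4f}")
```

An AUC of 0.5 corresponds to a model that ranks cases and controls no better than chance, and 1.0 to perfect separation, which is why values near 0.6 are described as modest discrimination.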

Limitations
One major limitation in conducting GWAS is the need for a large sample size: because these studies evaluate a large number of SNPs, the many comparisons performed simultaneously increase the chance of type I error. For instance, analysis of 5356 invasive breast cancer cases and more than 7000 controls in the National Cancer Institute Breast and Prostate Cancer Cohort 90 found an association between genetic variations at the CYP19A1 locus and a 10% to 20% increase in endogenous estrogen levels, but not with breast cancer risk. This was thought to be due to the lack of statistical power. Relatively few main effects of low-penetrance alleles have been found to be reproducible, 91 and even fewer gene-environment interactions have been confirmed. One way to address this is to pool the results of several GWAS to attain statistical power, but this is often hindered by a lack of consistency in quantifying cancer risk. Another source of bias in GWAS is population stratification. This occurs when cases and controls are selected from populations with different ethnic backgrounds, leading to false-positive results. 92
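The multiple-comparisons problem described in this section can be made concrete with a short calculation. The sketch below shows how many false-positive SNPs would be expected at an uncorrected significance level if no SNP were truly associated, and the per-test threshold given by a Bonferroni correction. The Bonferroni method is one common choice assumed here for illustration, and the number of tests is hypothetical.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of false positives if every null hypothesis is true."""
    return alpha * n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate
    at or below alpha across n_tests comparisons."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical study testing one million SNPs at a nominal alpha of 0.05.
n_snps = 1_000_000
uncorrected_hits = expected_false_positives(0.05, n_snps)
per_snp_threshold = bonferroni_threshold(0.05, n_snps)
print(f"Expected false positives without correction: {uncorrected_hits:.0f}")
print(f"Bonferroni per-SNP threshold: {per_snp_threshold:.1e}")
```

Because the corrected threshold is so small, modest effects can only be detected with very large samples, which is the motivation for pooling studies.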

CONCLUSIONS
Genetic and molecular factors can play an important role in an individual's cancer susceptibility and response to carcinogen exposure. This field is therefore important for identifying opportunities for cancer prevention and screening strategies. More research is needed in this area, especially in investigating new biomarkers and measuring gene-environment interactions. Large collaborative studies and consortia are needed to coordinate research efforts and enhance statistical power by increasing sample size.