VIETNAM NATIONAL UNIVERSITY, HANOI
COLLEGE OF TECHNOLOGY
Nguyen Trung Thong
MEDSOFT, DECIPHERING PRINCIPLES OF
TRANSCRIPTION REGULATION IN
EUKARYOTIC GENOMES
MASTER THESIS
Hanoi - 2008
Major: Information Technology
Speciality: Computer science
Code: 1.01.10
Advisor: Assoc. Prof. Hoang Xuan Huan
Contents
Abstract
Declaration
Acknowledgment
List of Figures
Glossary and abbreviations
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis works and structure
Chapter 2 Transcription regulation in eukaryotic genomes
2.1 Introduction
2.1.1 Gene activation
2.1.2 Gene deactivation
2.2 Core promoter and basal transcription machinery
2.2.1 Structure of core promoter
2.2.2 Basal transcription machinery
2.3 Regulatory sequences
2.3.1 Enhancers and regulatory promoters
2.3.2 Activators
2.3.3 Repressors and corepressors
Chapter 3 Methods to derive principles of transcription regulation
3.1 Principles of transcription regulation
3.2 Typical methods to derive principles of transcription regulation
3.2.1 Bayesian network based method
3.2.2 Motif Expression Decomposition method
3.2.3 A comparison between two methods
Chapter 4 An application of MED method
4.1 MEDSoft workflow
4.2 Properties of MEDSoft
4.3 Experimental results
Chapter 5 Conclusions and Future work
Bibliography
Appendix
List of Figures
Figure 1.1 Central dogma
Figure 2.1 Gene activation model
Figure 2.2 Sequence elements of core promoter
Figure 3.1 Gene regulatory network
Figure 3.2 Sequence elements that determine the regulation of a set of genes involved in transcription
Figure 3.3 The Motif-Expression Decomposition Formalism (MED)
Figure 3.4 An illustration of the concept of the gene ensemble
Figure 3.5 Verification model of regulatory principles
Figure 3.6 The distribution of correlation coefficients
Figure 3.7 RRPE and PAC relationship case study
Figure 4.1 MEDSoft layout
Figure 4.2 MEDSoft workflow
Figure 4.3 Genes and motifs query
Figure 4.4 Single motif analyzing
Figure 4.5 Pair of motifs analyzing
Figure 4.1 Transcriptional regulatory principle of RPN4 motif (short-range)
Figure 4.2 Transcriptional regulatory principle of MERE4 motif (short-range)
Figure 4.3 Transcriptional regulatory principle of GCR1 motif (middle-range)
Figure 4.4 Transcriptional regulatory principle of HAP234 motif (middle-range)
Figure 4.5 Transcriptional regulatory principle of BAS1 motif (long-range)
Figure 4.6 Transcriptional regulatory principle of GAL motif (long-range)
Figure 4.7 Transcriptional regulatory principle of PROTEOL1 motif (orientation-dependence)
Figure 4.8 Transcriptional regulatory principle of STRE’ motif (orientation-dependence)
Figure 4.9 Transcriptional regulatory principle of MIG1 motif (super-long-range)
Figure 4.10 Transcriptional regulatory principle of MERE17 motif (super-long-range)
Figure 4.11 Transcriptional regulatory principle of RPE11 motif (spread-out)
Glossary and abbreviations
Activator: protein product of a regulatory gene that induces expression of
a target gene(s) usually by binding to the activation sequence of that gene
or by interaction with transcription factors.
Basal transcription: transcription in in vitro systems consisting of RNA
polymerase, the basal transcription factors and naked DNA template; also
used to describe in vivo transcription observed in the absence of known
activators.
Chromatin: the packaged eukaryotic chromosome in which the DNA is
highly organized into chromatosomes; see higher-order structure.
Footprinting: technique to identify position of DNA sequences bound by
particular proteins.
Higher-order structure: nucleosomal organization of the chromatin;
DNA, organized into nucleosomes joined by linker DNA and associated
histone H1, i.e., chromatosome, is further condensed into a fibre of
diameter 30 nm, which itself is folded in some manner.
Histones: small group of highly conserved basic proteins that bind DNA
and form core of a nucleosome.
K-mer: denotes K units in a molecule; e.g., 12-mer oligonucleotide
indicates a molecule with 12 nucleotides.
MED: Motif Expression Decomposition
Pre-initiation complex: complex of general transcription factors, i.e.,
TFIID, TFIIB, TFIIF and TFIIE, and RNA polymerase II assembled at the
promoter sufficient for basal transcription; the complex can support a low
level of transcription without activators.
TBP: TATA-box binding protein.
TF: Transcription Factor, a protein that participates in gene transcription,
often by binding to a specific DNA sequence, e.g., TFIID.
TFII: transcription factor for Pol II.
Chapter 1 Introduction
1.1 Motivation
Transcription, the step from DNA to RNA, is the first step in the universal
flow of biological information from genome to proteome (Figure 1.1).
Transcriptional regulation therefore plays a vital role in the complexity,
variety, and development of all organisms [7, 30]. Transcription can be
regulated at various levels, but one level, identified by Jacob and Monod [22],
has attracted particular attention. At this level, the transcriptional output of
a given gene is controlled by the set of motifs (also known as binding sites)
present in the gene's promoter region, together with the associated
transcription factors (TFs) present in the cell. TFs are proteins that bind to
specific parts of DNA, and proteins are themselves the products of genes.
Transcription of a gene is therefore basically regulated by the set of motifs
that belong to the promoter of the gene.
Figure 1.1 Central dogma
Recently, various methods for finding motifs and TFs have used the yeast
Saccharomyces cerevisiae as the model organism, owing to the availability of
multiple yeast genomes and high-quality mRNA data [see 4, 5, 44, 53].
Nevertheless, the effect of motifs on gene expression as a function of promoter
context remains poorly investigated. The methods of Pilpel et al. [42] and
Sudarsanam et al. [52] studied the impact of motif co-occurrence within the set
of genes holding the motif combinations of interest. Although these studies
captured the combinatorial impact of motif-motif interactions on gene
expression, they did not address how such impact is governed by other factors,
such as the geometric features of the promoter context. A more recent method by
Beer and Tavazoie [1] did take geometric features into consideration, using a
Bayesian network of yeast expression profiles to discover the effect of motif
position and orientation on gene expression. More specifically, their method
uses a probabilistic model: after finding sets of co-expressed genes, it
identifies the DNA sequence (motif) features responsible for regulation. They
applied a clustering method to a set of microarray expression data to find sets
of genes that are co-expressed across a set of conditions. Each of these gene
sets describes an expression pattern across experimental conditions. They then
identified a large set of putative motifs that are overrepresented in each
expression pattern [28]. A Bayesian network is then used to derive the mapping
between these motifs and the expression patterns. The network takes each motif
and its related features, such as position and orientation, as input variables
and estimates the probability of a particular expression pattern. A drawback of
this method is that it does not consider the individual expression pattern of
each gene but analyzes the expression profiles of gene clusters, a process that
might cause loss of information. Moreover, although the metrics used in these
works to measure the degree of gene expression, namely expression coherence
[42, 52] and average pairwise correlation [1], can reveal the impact of motifs
on gene expression quite well, they cannot provide a quantitative measure of
motif influence on gene expression.
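To make the second of these metrics concrete, the following is a minimal sketch
of an average pairwise correlation. It assumes the metric is the mean Pearson
correlation over all distinct pairs of gene expression profiles; the exact
definition used in [1] may differ. The function names and the toy profiles are
hypothetical and serve only as an illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_pairwise_correlation(profiles):
    """Mean Pearson correlation over all distinct pairs of gene profiles.

    profiles: list of lists, one list of expression values per gene,
    all measured over the same conditions.
    """
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("at least two gene profiles are required")
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Hypothetical toy data: three genes measured under four conditions.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with the first gene
    [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with the first gene
]
pac = average_pairwise_correlation(profiles)
print(f"average pairwise correlation: {pac:.3f}")
```

A set of tightly co-expressed genes gives a value close to 1, whereas the toy
set above mixes correlated and anti-correlated profiles and gives a negative
value.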
Compared with the previous methods, the Motif Expression Decomposition (MED)
method of Nguyen and D’haeseleer has more advanced features: (a) it operates on
all genes, at the single-gene level; (b) it makes no assumptions about gene
cluster/module membership and requires no manual tuning of parameters; (c) it is
based on a deterministic mathematical strategy that is simple and biologically
intuitive.
1.2 Thesis works and structure
In this thesis, we study the MED method in order to develop an efficient,
flexible, reliable and user-friendly software tool, MEDSoft, that gives the
worldwide biology community a way of analyzing and studying motif data.
MEDSoft is a web application based on Microsoft ASP.NET technology. MEDSoft
therefore not only inherits the advanced properties of the MED method but is
also made more powerful by its new implementation.
Apart from the introduction and conclusion, the thesis is organized into three
chapters. The second chapter sketches the basic concepts of transcriptional
regulation in eukaryotes. The third chapter presents typical methods for
deriving principles of transcriptional regulation. The fourth chapter describes
the main outcome of the thesis, MEDSoft: it shows the workflow and features of
MEDSoft and discusses some interesting results derived with it.
Chapter 2 Transcription regulation in eukaryotic genomes
2.1 Introduction
Transcription regulation is an extremely complicated problem in molecular
biology. It has been investigated by a great many researchers worldwide, yet
much about it remains a mystery.
One of the main aims in studying gene expression is to understand how a living
organism regulates the transcription of thousands of genes in the proper
spatial and temporal patterns. Knowledge of how transcription factors function
in gene expression can be applied to fundamental issues in biology and
medicine. To decipher these mechanisms, we need to understand the large number
of processes that influence transcription and to develop technical and
strategic approaches for tackling them. This chapter gives an introduction to
the basic aspects of transcriptional regulation.
In eukaryotic genomes, DNA is assembled into chromatin, which keeps genes in an
inactive state by restricting access of RNA polymerase and its accessory
factors. Chromatin is composed of DNA wrapped around histones, forming a
structure called a nucleosome. Nucleosomes are themselves assembled into
higher-order structures with different properties depending on the regulatory
context. Throughout development, genes are turned on and off in a
pre-programmed manner controlled by TFs, which bind to specific DNA sites near
the genes they control. However, there is no dedicated TF for each regulatory
event. Instead, a mechanism called combinatorial control is used, in which
different combinations of regulatory proteins turn genes on (activate) and off
(deactivate) in different regulatory contexts [3].
2.1.1 Gene activation
In a typical gene, the core promoter is a DNA sequence located immediately
upstream of the gene. The core promoter binds RNA polymerase II (Pol II) and
its accessory factors (the basal transcription machinery) and directs Pol II to
begin transcribing at the proper start site. In vivo, in the absence of
regulatory proteins, the core promoter is normally inactive and fails to
interact with the basal machinery. Immediately upstream of the core promoter is
a regulatory promoter, and further away, either upstream or downstream, are
enhancer sequences (Figure 2.1 A). Regulatory promoters and enhancers bind
proteins termed activators, which are responsible for activating transcription
of the gene. Gene activation commonly occurs through interactions between the
activators and the basal machinery. Some activators are ubiquitously expressed,
whilst others are restricted to certain cell types, where they regulate genes
necessary for a particular cell function.
Figure 2.1 Gene activation model
(A) Model of typical gene and components involved in gene activation and
inactivation. (B) Activation of a gene and assembly of the Pol II pre-initiation
complex. ([copyright Cell Press])
When a gene is activated, the chromatin enclosing that gene and its control
regions must be remodeled to allow transcription. Higher-order chromatin
structures comprising networks of attached nucleosomes must be decondensed,
nucleosomes over gene-specific enhancers and promoters must be made accessible
to cell-specific activators, and, eventually, nucleosomes within the gene
itself must be remodeled to allow passage of the transcribing RNA polymerase
(see Figure 2.1 B). Various types of enzymes are involved in chromatin
remodeling, and these are guided by a set of activators. The enzymes fall into
two broad categories: ATP-dependent remodeling enzymes and histone acetylases.
Generally speaking, when these enzymes bind to a gene, they remodel the
chromatin so that activators and the basal machinery can bind. The mechanisms
of remodeling include changes in the structure of chromatin and modifications
of histones that increase accessibility to TFs. Transcription of a gene can be
stimulated once its enhancers are accessible. However, because enhancers can
activate transcription even when located far from a gene, they could
inadvertently activate neighbouring genes in the absence of suitable
regulation.
Once the enhancer and promoter are accessible, they bind combinations of
activators. Binding of activators is commonly cooperative: a single protein
binds only weakly, but multiple activators engage in protein–protein
interactions that enhance each of their affinities for the regulatory region.
The nucleoprotein structures formed by these combinatorial arrays of activators
are termed enhanceosomes (Figure 2.1 B). The enhanceosome interacts with the
basal transcription machinery and recruits it to the core promoter to form the
"pre-initiation complex". The trio of enhanceosome, basal machinery, and core
promoter forms a network of protein–protein and protein–DNA interactions that
controls the rate of transcription initiation. The interactions between the
enhanceosome and components of the basal machinery are rarely direct; they are
mediated by proteins called coactivators.
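The effect of cooperative binding described above can be illustrated
numerically. The sketch below assumes a textbook two-site equilibrium binding
model, which is not taken from this thesis or from the cited works: two
activators A and B bind neighbouring sites with association constants k_a and
k_b, and a cooperativity factor omega (omega = 1 means independent binding)
scales the statistical weight of the doubly bound state. All names and numbers
are hypothetical.

```python
def prob_both_bound(conc_a, conc_b, k_a, k_b, omega=1.0):
    """Equilibrium probability that both sites are occupied.

    conc_a, conc_b: free activator concentrations (arbitrary units)
    k_a, k_b:       association constants (inverse concentration units)
    omega:          cooperativity factor; 1.0 means independent binding
    """
    weight_a = k_a * conc_a                   # state: only A bound
    weight_b = k_b * conc_b                   # state: only B bound
    weight_ab = omega * weight_a * weight_b   # state: both bound
    partition = 1.0 + weight_a + weight_b + weight_ab  # includes empty state
    return weight_ab / partition


# Weak individual binding: k * concentration = 0.1 for each activator.
independent = prob_both_bound(1.0, 1.0, 0.1, 0.1, omega=1.0)
cooperative = prob_both_bound(1.0, 1.0, 0.1, 0.1, omega=100.0)
print(f"independent: {independent:.3f}, cooperative: {cooperative:.3f}")
```

Under these assumed numbers, two weakly binding activators rarely occupy the
regulatory region together when they bind independently, but occupy it far more
often once a protein–protein interaction between them is included.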
2.1.2 Gene deactivation
It is interesting to note that in many situations genes are activated only
transiently and later turned off. In these cases, the sequence of events
comprises inactivation of the pre-initiation complex and construction of a
repressive chromatin environment over the gene and its regulatory regions. This
construction involves two classes of enzymes: ATP-dependent remodeling enzymes
and histone deacetylases. The ways of inactivating a gene vary, but most
involve the binding of sequence-specific repressors to silencer elements. Genes
are often methylated to maintain the inactive state. Methylation also leads to
recruitment of histone deacetylases.
The remainder of this chapter is organized as follows. Section 2.2 summarizes
the fundamental mechanics of transcription, including an overview of core
promoter structure and the composition of the basal machinery. The basal
machinery consists of TFs and Pol II, which are vital for the catalytic process
of transcription. The machinery also contains coactivators and corepressors,
which permit activators and repressors to communicate with the TFs and
chromatin. Section 2.3 discusses regulatory DNA sequences, including enhancers
and silencers, and regulatory proteins, including activators and repressors.
2.2 Core promoter and basal transcription machinery
The core promoter is known as the "heart" of transcription regulation and
generally includes DNA sequence elements that can extend approximately 35 bp
upstream and/or downstream of the transcription start site. Most core promoter
elements interact directly with components of the basal transcription
machinery. The basal machinery is composed of the factors, including Pol II
itself, that are vital for transcription in vitro from an isolated core
promoter. Many studies of the basal machinery have been performed with
promoters containing a TATA box as a crucial core element. A pre-initiation
complex can form in vitro on TATA-dependent core promoters by association of
the basal factors in the following order: TFIID/TFIIA, TFIIB, Pol II/TFIIF,
TFIIE, and then TFIIH. The properties of the basal factors and the mechanisms
by which they stimulate transcription initiation from TATA-dependent promoters
have been the topic of recent works [see 8, 15, 26, 29, 36, 39, 43, 57]. The
mechanisms by which sequence-specific transcription factors and coregulators
influence the frequency of transcription initiation have also been discussed
[6, 37, 43].
2.2.1 Structure of core promoter
Figure 2.2 sketches some of the sequence elements that can contribute to basal
transcription from a typical core promoter. Each of these sequence motifs is
found in only a subset of core promoters. The TATA motif can function in the
absence of the TFIIB recognition element (BRE), initiator element (Inr), and
downstream core promoter element (DPE) motifs. By contrast, the DPE motif
requires the presence of an Inr. The BRE is located immediately upstream of a
subset of TATA box motifs. The DPE consensus was determined with Drosophila
core promoters. The Inr consensus is shown for both mammals and Drosophila.
Figure 2.2 Sequence elements of core promoter
TATA motif
This element, with the consensus TATAAA, was discovered by David Hogness and
used to be called the Hogness box. It is positioned 25–30 bp (base pairs)
upstream of the transcription start site. The TATA box is able to direct basal
transcription by Pol II independently on naked DNA templates in vitro. It is
also sufficient for directing activated transcription when an activator protein
binds to a nearby regulatory element. In Saccharomyces cerevisiae, TATA boxes
were also found to be essential for transcription initiation, but in this
organism the element is located 40–120 bp from the start site [51].
Initiator element (Inr)
The initiator element (Inr) is a discrete core promoter element that is
functionally similar to the TATA box and can function independently of a TATA
box, as shown in an analysis of the lymphocyte-specific terminal transferase
(TdT) promoter [46, 47]. Transcription from this promoter begins at a single
start site, yet the region between –25 and –30 is G/C-rich and is unimportant
for promoter activity. An extensive mutant analysis showed that the sequence
between –3 and +5 is necessary and sufficient for accurate transcription in
vitro and in vivo [23, 46]. By itself, the TdT Inr supports a very low level of
specific initiation by Pol II. In nuclear extracts, its activity is comparable
to that of an isolated TATA box without an Inr at the start site [46, 48].
Interestingly, when an Inr is inserted into a synthetic promoter downstream of
six binding sites for transcription factor Sp1 (without a TATA box), the Inr
supports high levels of transcription that begin at a specific start site
within the Inr. When the Inr is inserted at a different location relative to
the Sp1 sites, RNA synthesis consistently begins at the nucleotide dictated by
the Inr. In the absence of the Inr, transcription begins at heterogeneous start
sites and at much lower frequencies. The activity of the Inr relies on a loose
consensus of approximately PyPyA(+1)N(T/A)PyPy, where Py denotes a pyrimidine,
N any nucleotide, and A(+1) the transcription start site.
Downstream core promoter element (DPE)
The DPE is a seven-nucleotide sequence originally discovered in Drosophila,
where it has been studied in the greatest detail. It bears the consensus
sequence RG(A/T)CGTG and is centered approximately 30 bp downstream of the Inr
site. The DPE is found in TATA-less promoters and acts in conjunction with the
Inr element to direct specific initiation of transcription.
TFIIB recognition element (BRE)
The BRE was discovered by Ebright et al. [27], who identified the potential for
DNA binding by TFIIB based on the position of TFIIB relative to the major
groove in the crystal structure of the TBP–TFIIB–TATA ternary complex (TBP
stands for TATA-box binding protein). Subsequent binding-site-selection
experiments revealed that TFIIB binds specifically to a sequence with the
consensus (G/C)(G/C)(G/A)CGCC located from –32 to –38, just upstream of the
TATA box. The BRE is found in the majority of eukaryotic promoters.
Interestingly, however, it is missing in yeast and plants, which suggests that
the BRE may not contribute to gene regulation in these organisms.
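As an illustration of how such consensus sequences can be used in practice, the
sketch below scans a DNA sequence for the TATA, Inr and DPE consensus sequences
quoted in this section. The translation into standard IUPAC nucleotide codes
(Py as Y, R as purine, T/A as W, N as any base) is our own, and the function
name and the test sequence are hypothetical. Exact consensus matching is a
deliberate simplification: real promoter elements are degenerate and are
usually scored with weight matrices rather than matched exactly.

```python
import re

# Standard IUPAC nucleotide codes used by the consensus strings below.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "W": "[AT]",    # A or T
    "N": "[ACGT]",  # any nucleotide
}

# Consensus sequences from the text, rewritten in IUPAC notation.
CONSENSUS = {
    "TATA": "TATAAA",
    "Inr": "YYANWYY",   # PyPyA(+1)N(T/A)PyPy
    "DPE": "RGWCGTG",   # RG(A/T)CGTG
}


def find_element(sequence, consensus):
    """Return 0-based start positions of all matches, overlapping included."""
    body = "".join(IUPAC[base] for base in consensus)
    # A zero-width lookahead lets the scan report overlapping occurrences.
    pattern = re.compile("(?=" + body + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Hypothetical sequence containing one copy of each element.
sequence = "GGC" + "TATAAA" + "GGGG" + "TCAGTCT" + "GGGG" + "AGACGTG" + "CC"
for name, consensus in CONSENSUS.items():
    print(name, find_element(sequence, consensus))
```

Combining the reported positions with a known transcription start site would
then allow a check of the spacing constraints mentioned above, for example a
TATA box 25–30 bp upstream of the start site.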
2.2.2 Basal transcription machinery
Eukaryotic gene regulation involves a complicated interplay among activators,
repressors, the basal transcription machinery, and chromatin. The basal
transcription machinery consists of Pol II, the TFs TFIIA, TFIIB, TFIID, TFIIE,
TFIIF, and TFIIH [39], and a complex of coactivators termed the mediator. Pol II
is a large enzyme. An interesting feature of Pol II is the heptapeptide
(seven-residue) repeat constituting the carboxyl (-COOH) terminus of the largest
subunit. This carboxyl-terminal domain is involved in transcription regulation.
Recent biochemical studies indicate that the TFs support basal transcription
and perform many of the catalytic functions required for initiation.
Coactivators, together with the mediator, are thought to link the activators
and TFs. Analysis of genome-wide transcription using DNA microarray technology
implies that coactivators are required for transcription from only subsets of
genes [20]. Coactivators and TFs are both part of a complex called the
holoenzyme. We consider coactivators that reside in the holoenzyme to be
components of the basal machinery.
The subunits of all the TFs have been cloned, and a basic understanding of
their function and mechanism has emerged from studies in many eukaryotic
organisms. Research in yeast has provided insightful data on basal factor
mechanism and has been essential for evaluating the validity and ramifications
of biochemical and functional studies performed in mammalian systems [15]. Here
we discuss two issues: (i) how the basal factors themselves assemble into
transcription complexes; (ii) their association with coactivators and mediators
in the form of the holoenzyme.
2.2.2.1 Basal transcription complex assembly
Purified TFs and Pol II mediate basal transcription on the core promoter in
vitro but are unable to support activated transcription without coactivators.
In the first studies of transcription complex assembly, TBP was used instead of
TFIID because TBP is small and is able to sustain basal transcription in the
presence of the other TFs. Moreover, TFIID was not sufficiently purified at
that time to analyze basal complexes containing it. Earlier studies indicated
that purified TFs and TBP assemble into a transcription pre-initiation complex
on the DNA in a stepwise manner. The complex is nucleated by the binding of TBP
to the TATA box, assisted by TFIIA or TFIIB, which can bind in any order [39].
The crystal structures of TBP and of the TBP–TFIIB and TBP–TFIIA complexes with
DNA have been solved, providing insights into the process of promoter
recognition. Both TFIIA and TFIIB contact DNA and TBP and can increase the
stability of TBP binding. Once TFIIB has bound to TBP, a complex of TFIIF in
association with Pol II is recruited, followed by sequential binding of TFIIE
and TFIIH.
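The stepwise pathway described above can be read as a set of ordering
constraints between binding events. The sketch below encodes one such reading
as a small prerequisite table and checks whether a proposed binding order
respects it. The table is our simplification of the text (in particular, it
places no constraint on TFIIA other than binding after TBP), and the names
PREREQUISITES and is_valid_assembly are hypothetical.

```python
# Each factor maps to the set of factors that must already be bound.
# Assumption: TFIIA only needs TBP, so it may bind before or after TFIIB.
PREREQUISITES = {
    "TBP": set(),
    "TFIIA": {"TBP"},
    "TFIIB": {"TBP"},
    "Pol II/TFIIF": {"TFIIB"},
    "TFIIE": {"Pol II/TFIIF"},
    "TFIIH": {"TFIIE"},
}


def is_valid_assembly(order):
    """Return True if every factor binds after all of its prerequisites."""
    bound = set()
    for factor in order:
        if factor not in PREREQUISITES:
            raise ValueError(f"unknown factor: {factor}")
        # Reject repeated binding and binding before a prerequisite.
        if factor in bound or not PREREQUISITES[factor] <= bound:
            return False
        bound.add(factor)
    return True


print(is_valid_assembly(
    ["TBP", "TFIIA", "TFIIB", "Pol II/TFIIF", "TFIIE", "TFIIH"]))
print(is_valid_assembly(
    ["TBP", "TFIIB", "TFIIA", "Pol II/TFIIF", "TFIIE", "TFIIH"]))
print(is_valid_assembly(["TBP", "Pol II/TFIIF"]))
```

The first two orders differ only in whether TFIIA or TFIIB binds first and are
both accepted; the third is rejected because Pol II/TFIIF is recruited before
TFIIB has bound.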
2.2.2.2 Holoenzyme and mediators
Conceptually, the idea that basal factors assemble into transcription complexes
in a stepwise fashion was attractive because different steps could, in
principle, be regulated by activators and repressors. Such a mechanism would
help to explain the diversity in gene expression patterns. Nevertheless, the
significance of this idea depends on TFs being differentially limiting at
promoters. Differential binding of TFs has not yet emerged as a major
regulatory theme, even though there are cases where TFs have different
affinities for core promoters (e.g., TBP binding to consensus and nonconsensus
TATAs; TFIIB binding to a consensus vs. a degenerate BRE; TFIID binding to an
Inr-containing vs. Inr-less promoter). Instead, studies have concentrated on
recruitment of a single large TF-containing complex termed the holoenzyme. In
contrast to the complexity of the stepwise pathway, the holoenzyme provides a
single target through which activators bound to an enhancer or promoter can
recruit the basal machinery in a concerted fashion [35].
2.3 Regulatory sequences
Transcription regulation is governed by the binding of sequence-specific
DNA-binding proteins to regulatory promoters and enhancers. In this section, we
describe the features of activators/repressors and enhancers/silencers.
2.3.1 Enhancers and regulatory promoters
Conceptually, the regulatory promoter is the region near the core promoter,
within a few hundred base pairs of the transcription start site, whereas an
enhancer is a control region found at a greater distance from the transcription
start site, either upstream or downstream of the gene or within an intron.
Because regulatory elements in an enhancer can also function in the context of
a promoter, the distinction between promoters and enhancers has become blurred.
Conversely, promoter elements can confer enhancer activity if multimers of the
element are inserted at a distant position. A current compilation of promoters
for which the transcription start sites have been mapped is accessible in the
Eukaryotic Promoter Database (http://www.epd.isb-sib.ch/).
Enhancers are thought to bind activators and other sequence-specific proteins
involved in chromatin remodeling. Once bound, these activators loop out the
intervening DNA to interact with proteins bound to the regulatory and core
promoters (i.e., other activators and the basal machinery). These interactions
are thought to stabilize transcription complex assembly. The looping model is
considered plausible for the following two reasons. First, the energetics of
DNA looping have been studied widely in model systems, by ligation of large DNA
molecules and by cooperative binding of proteins at distant sites [56]. Second,
in a looping model, chromatin could play a positive architectural role by
condensing the DNA between enhancer and promoter, aiding long-range
interactions.
2.3.2 Activators
Activators are modular proteins with distinct domains for DNA binding and for
activating transcription [25, 55]. The DNA-binding domain targets the activator
to a specific site and may be connected to cooperativity domains that allow
combinatorial interactions with other activators. The activation domain, on the
other hand, interacts with the basal machinery to recruit it to the promoter.
In some cases, these domains are part of the same polypeptide (e.g., the yeast
GAL4 and GCN4 proteins), whilst in others the domains are located on separate
subunits of a multiprotein complex. This multisubunit organization offers more
opportunities for combinatorial control and regulatory diversity. Here we
discuss these two important domains in more detail.
DNA-binding Domains
Regulatory proteins are often grouped into classes according to the sequence
and structure of their DNA-binding domains. The targeting function of the
DNA-binding domain determines the site of activator action and hence the
contribution of an activator to gene regulation. As a result, how activators
bind specific sites and distinguish between related sites has become a key
focus of interest in the study of gene expression. Many classes of DNA-binding
domains have been described in eukaryotes [41]. Some DNA-binding proteins do
not fit into any of the defined classes, whilst some classes have been further
subdivided. Members of some protein classes bind to similar DNA sequences. In
other classes, however, there is little similarity between the recognition
sites of different class members, because the key recognition amino acids vary
widely among class members.
Activation Domains
The phrase "activation domain" refers loosely to a broad variety of protein
domains that interact either with components of the basal transcription
machinery or with coactivators. An activation domain is commonly defined as a
region of a protein that stimulates transcription when attached to a
heterologous DNA-binding domain. However, although the majority of activators
are modular in structure, in some cases the residues necessary for activation
are intertwined with the DNA-binding domain.
In addition, most activators contain multiple activation domains. For instance,
GAL4 contains one domain at the amino terminus, near the DNA-binding domain,
and another at the carboxyl terminus [12]. The organization within a domain is
quite flexible. For example, deletion analysis of GCN4 reveals functional
redundancy: removing one segment or the other has a negligible effect on
activation, and the entire domain must be removed to abolish activity [21].
Other studies [9, 19] indicate that the activation domains within a regulatory
protein contribute additively or synergistically to activation potential.
2.3.3 Repressors and corepressors
Repressors and corepressors play a key role in regulating gene expression. To
date, however, repression mechanisms remain less well understood than
activation mechanisms. Generally, transcriptional repression can be divided
into three broad groups. In the first group, repression occurs through
inactivation of an activator by one of several distinct mechanisms: (i)
posttranslational modification of the activator [34], (ii) dimerization of the
activator with a nonfunctional partner [2], and (iii) competition for the
activator’s binding site, or an interaction between repressor and activator
that masks the activator’s function [33]. In the second group, repression is
mediated by proteins that associate tightly with TFs and thereby inhibit
formation of a pre-initiation complex. In the final group, repression is
mediated by a specific DNA element and DNA-binding protein, which function
dominantly to repress both activated and basal transcription of a gene. Some
studies indicate that in these cases interactions with the basal machinery [17]
or chromatin can cause gene inactivation.
Chapter 3 Methods to derive principles of transcription regulation
3.1 Principles of transcription regulation
As discussed in the previous chapter, transcription regulation in eukaryotes is
intricate. In general, the regulatory system is composed of two kinds of
components: activators/repressors and enhancers/silencers. The former are
proteins that bind to DNA sequences; the latter are DNA sequences themselves.
Transcription regulation is therefore controlled by two associated components:
DNA sequences and the proteins that bind to them. Moreover, as illustrated in
Figure 1.1, because proteins are the products of genes (DNA sequences) through
the two processes of the central dogma (transcription and translation),
transcription is ultimately controlled by DNA sequences, the so-called motifs.
Within the scope of this chapter, we consider only a simplified model of
transcription regulation (see Figure 3.1). As Figure 3.1 shows, transcription
is activated in the presence of input signals (e.g., signal A and signal B),
which are received by receptor proteins. Through a series of complex
processes, the transcription factors (TFs) that bind to cis-regulatory DNA
sequence elements (CREs) become active and stimulate transcription. In this
model, the transcription output (e_gc) of a given gene is governed by two
components: the CREs (motifs, e.g., M_gi, M_gj) occurring in the promoter
region of that gene, and the transcription factors (TFs, e.g., A_ic, A_jc)
present in the cellular environment. Because TFs are gene products, their
production is in principle also controlled by motifs. Accordingly,
transcription of a given gene is primarily regulated by the motifs present in
that gene’s promoter, which operate as the gene’s condition-independent signal
receivers. The set of functions describing the dependency of motif binding
strength (the quantitative level of a motif’s influence on gene expression) on
promoter context forms the set of principles of transcription regulation.
Principles of transcriptional regulation can therefore be defined as a set of
condition-independent rules that cis-regulatory elements, or motifs, obey in
order to regulate expression of the genes they control. Such rules can be a
function of promoter context such as the