Professional Documents
Culture Documents
GTAAGW
the observed patterns, only the genomic sequences of dened
WWW; 3 YAG
(
E
)
-
-
B
i
s
a
b
o
l
e
n
e
A
.
g
r
a
n
d
i
s
a
g
1
A
g
f
E
B
I
S
A
g
f
E
b
i
s
A
F
0
0
6
1
9
5
A
F
3
2
6
5
1
5
B
o
h
l
m
a
n
n
e
t
a
l
.
(
1
9
9
8
a
)
T
r
a
p
p
a
n
d
C
r
o
t
e
a
u
t
c
)
-
C
a
m
p
h
e
n
e
A
.
g
r
a
n
d
i
s
a
g
6
A
g
g
-
C
A
M
A
g
g
-
c
a
m
U
8
7
9
1
0
B
o
h
l
m
a
n
n
e
t
a
l
.
(
1
9
9
9
)
-
H
u
m
u
l
e
n
e
A
.
g
r
a
n
d
i
s
a
g
5
A
g
f
H
U
M
A
g
f
h
u
m
U
9
2
2
6
7
S
t
e
e
l
e
e
t
a
l
.
(
1
9
9
8
)
)
-
L
i
m
o
n
e
n
e
A
.
g
r
a
n
d
i
s
a
g
1
0
A
g
g
-
L
I
M
1
A
g
g
-
l
i
m
A
F
0
0
6
1
9
3
A
F
3
2
6
5
1
8
B
o
h
l
m
a
n
n
e
t
a
l
.
(
1
9
9
7
)
T
r
a
p
p
a
n
d
C
r
o
t
e
a
u
t
c
M
y
r
c
e
n
e
A
.
g
r
a
n
d
i
s
a
g
2
A
g
g
M
Y
R
A
g
g
m
y
r
U
8
7
9
0
8
B
o
h
l
m
a
n
n
e
t
a
l
.
(
1
9
9
7
)
)
-
-
P
i
n
e
n
e
A
.
g
r
a
n
d
i
s
a
g
3
A
g
g
-
P
I
N
1
A
g
g
-
p
i
n
1
U
8
7
9
0
9
A
F
3
2
6
5
1
7
B
o
h
l
m
a
n
n
e
t
a
l
.
(
1
9
9
7
)
T
r
a
p
p
a
n
d
C
r
o
t
e
a
u
t
c
)
-
-
P
i
n
e
n
e
/
(
)
-
l
i
m
o
n
e
n
e
A
.
g
r
a
n
d
i
s
a
g
1
1
A
g
g
-
P
I
N
2
A
g
g
-
p
i
n
2
A
F
1
3
9
2
0
7
B
o
h
l
m
a
n
n
e
t
a
l
.
(
1
9
9
9
)
)
-
-
P
h
e
l
l
a
n
d
r
e
n
e
A
.
g
r
a
n
d
i
s
a
g
8
A
g
g
-
P
H
E
A
g
g
-
p
h
e
A
F
1
3
9
2
0
5
B
o
h
l
m
a
n
n
e
t
a
l
.
(
1
9
9
9
)
-
S
e
l
i
n
e
n
e
A
.
g
r
a
n
d
i
s
a
g
4
A
g
f
S
E
L
1
A
g
f
s
e
l
1
U
9
2
2
6
6
A
F
3
2
6
5
1
3
S
t
e
e
l
e
e
t
a
l
.
(
1
9
9
8
)
T
r
a
p
p
a
n
d
C
r
o
t
e
a
u
t
c
A
g
f
S
E
L
2
A
g
f
s
e
l
2
A
F
3
2
6
5
1
4
T
a
x
a
d
i
e
n
e
T
.
b
r
e
v
i
f
o
l
i
a
T
b
1
T
b
g
g
T
A
X
T
b
g
g
t
a
x
U
4
8
7
9
6
A
F
3
2
6
5
1
9
W
i
l
d
u
n
g
a
n
d
C
r
o
t
e
a
u
T
r
a
p
p
a
n
d
C
r
o
t
e
a
u
t
c
(
1
9
9
6
)
T
e
r
p
i
n
o
l
e
n
e
A
.
g
r
a
n
d
i
s
a
g
9
A
g
g
T
E
O
A
g
g
t
e
o
A
F
1
3
9
2
0
6
B
o
h
l
m
a
n
n
e
t
a
l
.
(
1
9
9
9
)
5
-
e
p
i
-
A
r
i
s
t
o
l
o
c
h
e
n
e
N
i
c
o
t
i
a
n
a
t
a
b
a
c
u
m
T
E
A
S
3
N
t
f
e
A
R
I
3
N
t
f
e
a
r
i
3
L
0
4
6
8
0
L
0
4
6
8
0
F
a
c
c
h
i
n
i
a
n
d
C
h
a
p
p
e
l
l
F
a
c
c
h
i
n
i
a
n
d
(
1
9
9
2
)
C
h
a
p
p
e
l
l
(
1
9
9
2
)
T
E
A
S
4
N
t
f
e
A
R
I
4
N
t
f
e
a
r
i
4
L
0
4
6
8
0
L
0
4
6
8
0
5
-
e
p
i
-
A
r
i
s
t
o
l
o
c
h
e
n
e
p
A
.
t
h
a
l
i
a
n
a
A
t
e
A
R
I
A
t
e
a
r
i
A
L
0
2
2
2
2
4
B
e
v
a
n
e
t
a
l
.
d
s
C
h
r
o
m
o
s
o
m
e
4
B
A
C
F
1
C
1
2
(
E
S
S
A
)
n
t
4
4
0
5
4
3
8
8
2
0
-
C
a
d
i
n
e
n
e
G
.
a
r
b
o
r
e
u
m
C
A
D
1
-
A
G
a
f
C
A
D
1
A
G
a
f
c
a
d
1
a
X
9
6
4
2
9
Y
1
8
4
8
4
C
h
e
n
e
t
a
l
.
(
1
9
9
6
)
L
i
a
n
g
e
t
a
l
.
d
s
-
C
a
d
i
n
e
n
e
G
.
h
i
r
s
u
t
u
m
C
A
D
1
-
A
G
h
f
C
A
D
1
G
h
f
c
a
d
1
U
8
8
3
1
8
D
a
v
i
s
e
t
a
l
.
(
1
9
9
8
)
-
C
a
d
i
n
e
n
e
G
.
a
r
b
o
r
e
u
m
g
C
A
D
1
-
B
G
a
f
C
A
D
1
B
G
a
f
c
a
d
1
b
X
9
5
3
2
3
C
h
e
n
e
t
a
l
.
d
s
C
a
d
i
n
e
n
e
p
A
.
t
h
a
l
i
a
n
a
A
t
C
A
D
A
t
c
a
d
A
L
0
2
2
2
2
4
B
e
v
a
n
e
t
a
l
.
d
s
C
h
r
o
m
o
s
o
m
e
4
B
A
C
F
1
C
1
2
(
E
S
S
A
)
n
t
4
4
0
5
4
3
8
8
2
0
C
a
s
b
e
n
e
R
i
c
i
n
u
s
c
o
m
m
u
n
i
s
c
a
s
R
c
g
g
C
A
S
R
c
g
g
c
a
s
L
3
2
1
3
4
N
A
M
a
u
a
n
d
W
e
s
t
(
1
9
9
4
)
W
e
s
t
p
c
)
-
C
o
p
a
l
y
l
A
.
t
h
a
l
i
a
n
a
G
A
1
A
t
g
g
-
C
O
P
P
1
A
t
g
g
-
U
1
1
0
3
4
N
A
S
u
n
a
n
d
K
a
m
i
y
a
(
1
9
9
4
)
S
u
n
e
t
a
l
.
(
1
9
9
2
)
C
h
r
o
m
o
s
o
m
e
4
(
T
o
p
)
B
A
C
d
i
p
h
o
s
p
h
a
t
e
a
c
o
p
p
1
A
C
0
0
4
0
4
4
p
B
a
s
t
i
d
e
e
t
a
l
.
d
s
,
c
T
5
J
8
n
t
3
4
9
7
1
4
1
8
5
6
e
n
t
-
K
a
u
r
e
n
e
a
A
.
t
h
a
l
i
a
n
a
G
A
2
A
t
g
g
-
K
A
U
A
t
g
g
-
k
a
u
A
F
0
3
4
7
7
4
A
C
0
0
7
2
0
2
Y
a
m
a
g
u
c
h
i
e
t
a
l
.
(
1
9
9
8
)
V
y
s
o
t
s
k
a
i
a
e
t
a
l
.
d
s
,
c
C
h
r
o
m
o
s
o
m
e
1
B
A
C
T
8
K
1
4
n
t
4
3
5
5
2
4
7
4
2
0
(
)
-
L
i
m
o
n
e
n
e
P
e
r
i
l
l
a
f
r
u
t
e
s
c
e
n
s
P
F
L
C
1
P
f
g
-
L
I
M
1
P
f
g
-
l
i
m
1
D
4
9
3
6
8
A
B
0
0
5
7
4
4
Y
u
b
a
e
t
a
l
.
(
1
9
9
6
)
T
s
u
b
o
u
c
h
i
d
s
)
-
L
i
m
o
n
e
n
e
M
e
n
t
h
a
s
p
i
c
a
t
a
L
M
S
M
s
g
-
L
I
M
M
s
g
-
l
i
m
L
1
3
4
5
9
C
o
l
b
y
e
t
a
l
.
(
1
9
9
3
)
)
-
L
i
m
o
n
e
n
e
M
.
l
o
n
g
i
f
o
l
i
a
L
M
S
M
l
g
-
L
I
M
M
l
g
-
l
i
m
A
F
1
7
5
3
2
3
C
r
o
c
k
a
n
d
C
r
o
t
e
a
u
d
s
,
c
J
o
n
e
s
a
n
d
D
a
v
i
s
p
s
L
i
m
o
n
e
n
e
p
,
i
A
.
t
h
a
l
i
a
n
a
A
t
L
I
M
A
1
A
t
l
i
m
a
1
Z
9
7
3
4
1
B
e
v
a
n
e
t
a
l
.
p
s
C
h
r
o
m
o
s
o
m
e
4
C
F
6
(
E
S
S
A
I
)
A
t
L
I
M
A
2
A
t
l
i
m
a
2
n
t
1
6
4
9
8
3
1
7
0
5
0
5
(
c
o
n
t
i
n
u
e
d
)
817 Evolution of Plant Terpene Synthases
T
A
B
L
E
1
(
C
o
n
t
i
n
u
e
d
)
G
e
n
B
a
n
k
T
e
r
p
e
n
e
s
y
n
t
h
a
s
e
n
a
m
e
a
c
c
e
s
s
i
o
n
n
o
.
R
e
f
e
r
e
n
c
e
P
r
o
d
u
c
t
s
S
p
e
c
i
e
s
F
o
r
m
e
r
g
e
n
e
E
n
z
y
m
e
b
c
D
N
A
/
g
e
n
o
m
i
c
b
c
D
N
A
g
D
N
A
c
D
N
A
g
D
N
A
R
e
g
i
o
n
o
n
c
h
r
o
m
o
s
o
m
e
L
i
m
o
n
e
n
e
p
A
.
t
h
a
l
i
a
n
a
A
t
L
I
M
B
A
t
l
i
m
b
Z
9
7
3
4
1
B
e
v
a
n
e
t
a
l
.
p
s
C
h
r
o
m
o
s
o
m
e
4
C
F
6
(
E
S
S
A
I
)
n
t
1
7
2
5
9
8
1
7
5
3
4
4
(
S
)
-
L
i
n
a
l
o
o
l
C
l
a
r
k
i
a
c
o
n
c
i
n
n
a
L
I
S
C
c
g
L
I
N
O
H
C
c
g
l
i
n
o
h
A
F
0
6
7
6
0
2
C
s
e
k
e
e
t
a
l
.
(
1
9
9
8
)
C
s
e
k
e
e
t
a
l
.
(
1
9
9
8
)
L
i
n
a
l
o
o
l
p
A
.
t
h
a
l
i
a
n
a
A
t
g
L
I
N
O
H
A
t
g
l
i
n
o
h
A
C
0
2
2
9
4
F
e
d
e
r
s
p
i
e
l
d
s
C
h
r
o
m
o
s
o
m
e
1
B
A
C
F
I
I
P
1
7
n
t
7
3
9
9
6
7
8
9
0
5
V
e
t
i
s
p
i
r
a
d
i
e
n
e
H
y
o
s
c
y
a
m
u
s
m
u
t
i
c
u
s
C
h
i
m
e
r
a
H
m
f
V
E
T
H
m
f
v
e
t
U
2
0
1
8
7
N
A
B
a
c
k
a
n
d
C
h
a
p
p
e
l
l
C
h
a
p
p
e
l
l
p
c
(
1
9
9
5
)
V
e
t
i
s
p
i
r
a
d
i
e
n
e
p
A
.
t
h
a
l
i
a
n
a
A
t
V
E
T
A
t
v
e
t
A
L
0
2
2
2
2
4
B
e
v
a
n
e
t
a
l
.
d
s
C
h
r
o
m
o
s
o
m
e
4
B
A
C
F
1
2
C
1
2
(
E
S
S
A
)
n
t
5
4
6
9
2
5
6
8
9
3
t
c
,
g
e
n
o
m
i
c
s
e
q
u
e
n
c
e
s
b
y
T
r
a
p
p
a
n
d
C
r
o
t
e
a
u
(
a
c
c
e
s
s
i
o
n
n
o
s
.
p
e
n
d
i
n
g
)
;
N
A
,
s
e
q
u
e
n
c
e
s
u
n
a
v
a
i
l
a
b
l
e
i
n
t
h
e
p
u
b
l
i
c
d
a
t
a
b
a
s
e
s
b
u
t
d
i
s
c
l
o
s
e
d
i
n
j
o
u
r
n
a
l
r
e
f
e
r
e
n
c
e
;
p
c
,
s
e
q
u
e
n
c
e
s
o
b
t
a
i
n
e
d
b
y
p
e
r
s
o
n
a
l
c
o
m
m
u
n
i
c
a
t
i
o
n
s
;
d
s
,
s
e
q
u
e
n
c
e
s
i
n
p
u
b
l
i
c
d
a
t
a
b
a
s
e
b
y
d
i
r
e
c
t
s
u
b
m
i
s
s
i
o
n
b
u
t
n
o
t
p
u
b
l
i
s
h
e
d
;
p
,
s
e
q
u
e
n
c
e
s
i
n
d
a
t
a
b
a
s
e
w
i
t
h
p
u
t
a
t
i
v
e
f
u
n
c
t
i
o
n
;
c
,
c
o
n
r
m
e
d
g
e
n
e
b
y
e
x
p
e
r
i
m
e
n
t
a
l
d
e
t
e
r
m
i
n
a
t
i
o
n
s
t
a
t
e
d
i
n
d
a
t
a
b
a
s
e
;
i
,
t
w
o
p
o
s
s
i
b
l
e
i
s
o
z
y
m
e
s
r
e
p
o
r
t
e
d
f
o
r
t
h
e
s
a
m
e
r
e
g
i
o
n
r
e
f
e
r
r
e
d
t
o
a
s
A
1
a
n
d
A
2
;
,
n
o
f
o
r
m
e
r
g
e
n
e
n
a
m
e
o
r
a
c
c
e
s
s
i
o
n
n
u
m
b
e
r
.
S
p
e
c
i
e
s
n
a
m
e
s
a
r
e
:
A
b
i
e
s
g
r
a
n
d
i
s
,
A
r
a
b
i
d
o
p
s
i
s
t
h
a
l
i
a
n
a
,
C
l
a
r
k
i
a
c
o
n
c
i
n
n
a
,
G
o
s
s
y
p
i
u
m
a
r
b
o
r
e
u
m
,
H
y
o
s
c
y
a
m
u
s
m
u
t
i
c
u
s
,
M
e
n
t
h
a
l
o
n
g
i
f
o
l
i
a
,
M
e
n
t
h
a
s
p
i
c
a
t
a
,
N
i
c
o
t
i
a
n
a
t
a
b
a
c
u
m
,
R
i
c
i
n
u
s
c
o
m
m
u
n
i
s
,
P
e
r
i
l
l
a
f
r
u
t
e
s
c
e
n
s
,
T
a
x
u
s
b
r
e
v
i
f
o
l
i
a
,
Z
e
a
m
a
y
s
.
a
F
o
r
m
e
r
n
a
m
e
s
,
r
e
s
p
e
c
t
i
v
e
l
y
,
f
o
r
(
)
-
c
o
p
a
l
y
l
d
i
p
h
o
s
p
h
a
t
e
s
y
n
t
h
a
s
e
a
n
d
e
n
t
-
k
a
u
r
e
n
e
s
y
n
t
h
a
s
e
w
e
r
e
e
n
t
-
k
a
u
r
e
n
e
s
y
n
t
h
a
s
e
A
(
K
S
A
)
a
n
d
e
n
t
-
k
a
u
r
e
n
e
s
y
n
t
h
a
s
e
B
(
K
S
B
)
,
a
n
d
m
u
t
a
n
t
p
h
e
n
o
t
y
p
e
s
w
e
r
e
g
a
1
a
n
d
g
a
2
;
t
h
e
s
e
d
e
s
i
g
n
a
t
i
o
n
s
h
a
v
e
b
e
e
n
u
s
e
d
l
o
o
s
e
l
y
.
b
N
o
m
e
n
c
l
a
t
u
r
e
a
r
c
h
i
t
e
c
t
u
r
e
i
s
s
p
e
c
i
e
d
a
s
f
o
l
l
o
w
s
.
T
h
e
L
a
t
i
n
b
i
n
o
m
i
a
l
t
w
o
-
l
e
t
t
e
r
a
b
b
r
e
v
i
a
t
i
o
n
s
a
r
e
i
n
s
p
a
c
e
s
1
a
n
d
2
.
T
h
e
s
u
b
s
t
r
a
t
e
s
(
1
-
t
o
4
-
l
e
t
t
e
r
a
b
b
r
e
v
.
)
a
r
e
i
n
s
p
a
c
e
s
3
6
,
c
o
n
s
i
s
t
i
n
g
o
f
1
-
o
r
2
-
l
e
t
t
e
r
a
b
b
r
e
v
.
f
o
r
s
u
b
s
t
r
a
t
e
u
t
i
l
i
z
e
d
i
n
b
o
l
d
f
a
c
e
(
e
.
g
.
,
g
,
g
e
r
a
n
y
l
d
i
p
h
o
s
p
h
a
t
e
;
f
,
f
a
r
n
e
s
y
l
d
i
p
h
o
s
p
h
a
t
e
;
g
g
,
g
e
r
a
n
y
l
g
e
r
a
n
y
l
d
i
p
h
o
s
p
h
a
t
e
;
c
,
c
o
p
a
l
y
l
d
i
p
h
o
s
p
h
a
t
e
;
c
h
,
c
h
r
y
s
a
n
t
h
e
m
y
l
d
i
p
h
o
s
p
h
a
t
e
;
i
n
l
o
w
e
r
c
a
s
e
)
f
o
l
l
o
w
e
d
b
y
s
t
e
r
e
o
c
h
e
m
i
s
t
r
y
a
n
d
/
o
r
i
s
o
m
e
r
d
e
n
i
t
i
o
n
(
e
.
g
.
,
,
e
t
c
.
f
o
l
l
o
w
e
d
b
y
e
p
i
(
e
)
,
E
,
Z
,
-
,
,
e
t
c
.
)
.
T
h
e
3
-
l
e
t
t
e
r
p
r
o
d
u
c
t
a
b
b
r
e
v
.
i
n
d
i
c
a
t
e
s
t
h
e
m
a
j
o
r
p
r
o
d
u
c
t
i
s
a
n
o
l
e
n
;
o
t
h
e
r
w
i
s
e
t
h
e
q
u
e
n
c
h
i
n
g
n
u
c
l
e
o
p
h
i
l
e
i
s
i
n
d
i
c
a
t
e
d
,
(
e
.
g
.
,
A
B
I
,
a
b
i
e
t
a
d
i
e
n
e
s
y
n
t
h
a
s
e
;
B
O
R
P
P
,
b
o
r
n
y
l
d
i
p
h
o
s
p
h
a
t
e
s
y
n
t
h
a
s
e
;
C
E
D
O
H
,
c
e
d
r
o
l
s
y
n
t
h
a
s
e
)
;
u
p
p
e
r
c
a
s
e
s
p
e
c
i
e
s
p
r
o
t
e
i
n
a
n
d
l
o
w
e
r
c
a
s
e
s
p
e
c
i
e
s
c
D
N
A
o
r
g
D
N
A
.
A
l
l
l
e
t
t
e
r
s
e
x
c
e
p
t
s
p
e
c
i
e
s
n
a
m
e
s
a
r
e
i
n
i
t
a
l
i
c
s
f
o
r
c
D
N
A
a
n
d
g
e
n
e
.
D
i
s
t
i
n
c
t
i
o
n
b
e
t
w
e
e
n
c
D
N
A
a
n
d
g
D
N
A
m
u
s
t
b
e
s
t
a
t
e
d
o
r
a
g
i
s
a
d
d
e
d
b
e
f
o
r
e
t
h
e
a
b
b
r
e
v
i
a
t
i
o
n
,
e
.
g
.
,
T
b
g
g
t
a
x
c
D
N
A
a
n
d
g
T
b
g
g
t
a
x
,
o
r
T
b
g
g
t
a
x
g
e
n
e
(
n
o
m
e
n
c
l
a
t
u
r
e
s
y
s
t
e
m
d
e
v
i
s
e
d
b
y
S
.
T
r
a
p
p
,
E
.
D
a
v
i
s
,
J
.
C
r
o
c
k
,
a
n
d
R
.
C
r
o
t
e
a
u
)
.
818 S. C. Trapp and R. B. Croteau
TABLE 2
Primers used to isolate conifer genomic sequences from corresponding cDNAs
Gene name Primer name Primer sequence Position
a
AgfEbis
b
agc1.7F 5 ACTTCAAAGATGCCAATGGG 3 nt 11341114
AG1/06R 5 TGATTACAGTGGCAGCGGTTC 3 nt 24392429
AG1/05F 5 GCTGGCGTTTCTGCTGTATC 3 nt 321
AG1/14R 5 GCCAAGAAGTCTTCAGCGCG 3 nt 13611341
AG1/16R 5 TGTTGTTCCAGTCACTGG 3 nt 13141294
Agg-pin AG3/03F 5 TTCTACCGCACCGTTGGC 3 nt 1229
AG3/04R 5 AACCCGACATAGCATAGG 3 nt 19241907
Agfsel AG4/01F 5 ATGGCTGAGATTTCTGAATC 3 nt 120
AG4/02R 5 GACCATCACTATTCCTCC 3 nt 17531737
Agg-lim AG10/01F 5 GGCAGGAATCCATGGCTCTCCTT 3 nt 1 to 24
AG10/02R 5 GAATAGTCTAGATTATAGACTTCCCAC 3 nt 19851960
Agggabi AG22/03F 5 TGCTCATCATCTAACTGC 3 nt 4562
AG22/04R 5 ACACAATACCATGAGGGC 3 nt 26452628
Tbggtax tax1/01F 5 GAATTCCTTCCCCTGCCTCTCTGG 3
c
nt 21 to 5
tax1/02R 5 GCTCTAGAGCGCCAATACAATAATAAGTC 3
c
nt 26422624
a
Number one and positive numbers represent ATG start site and downstream nucleotides (nt), respectively,
designed from cDNA; minus numbers are upstream of the ATG start site.
b
Sequencing the bisabolene synthase genomic sequence was accomplished by sequencing two overlapping
fragments designated pAGg1-1 (1.7F and 06R) and pAGg1-11A (05F and 14R, 16R).
c
Restriction enzyme (EcoRI or XbaI) sites were incorporated at the termini of these primers for cloning
purposes.
appropriate genomic DNA as template. For AgfEbis, tity to the corresponding cDNA. For the exceptions, two
different products were observed in each case, corre- Touchdown PCR amplication (Don et al. 1991) of the
5 portion of two overlapping DNA templates was re- sponding to sizes of 3.3 and 2.8 kb for Agf sel, and
3.2 and 2.8 kb for Agg-pin. The Agf sel1 and Agf sel2 quired (Table 2). Once the correspondence of cDNA to
gDNA was conrmed, the latter was entirely sequenced. products were sequenced, and the deduced amino acid
sequences were 92 and 87% identical to that of the With the exception of Agg-pin1 and Agf sel, all genomic
sequences (after intron deletion) exhibited 98%iden- published Agf sel cDNA (formerly ag4; Steele et al.
TABLE 3
Terpene synthases sequenced in this study
Terpene synthase cDNA Genomic
Product Class Former name
a
Size
b
aa
b
Name Size
b
aa
b
Clone name
c
Introns
d
Bisabolene sesqui
in
ag1 2.541 817 AgfEbis 4.647 817 pAGg1-1/1-11A
e
11
()-Pinene mono
in
ag3 1.884 627 Agg-pin1 3.280 628 pAGg3-37 9
-Selinene sesqui
co
ag4 1.743 581 Agf sel1 3.346 579 pAGg4-31 9
Agf sel2 2.789 525 pAGg4-34 8
()-Limonene mono
in, co
ag10 1.913 637 Agg-lim1 1.913 637 pAGg10-1 9
Abietadiene di
in
ag22 2.604 868 Agggabi 4.664 868 pAGg22-3/22-4
f
14
c
Taxadiene di
un
tb1 2.586 862 Tbggtax 3.999 862 pTBgtax1-1 12
The superscript notations in, co, and un refer to inducible, constitutive, or unknown-type enzyme expression, respectively.
a
Former names for cDNAs are from Bohlmann et al. 1998b.
b
Size of cDNA and genomic sequences are in kilobases; deduced amino acid (aa) sequences are translated from the cDNA,
or from the genomic sequence without introns spliced.
c
Plasmid clones were used for full-length genomic sequencing, with some exceptions (see Table 2).
d
All introns are conserved in the same positions with reference to Agggabi, although the specic gene may not contain all 14
introns. Six introns are positionally conserved in all cases (Chappell 1995a).
e
Bisabolene synthase sequence determined with the overlapping plasmid clones pAgc1-1 (3-end) and pAgc1-11A (5-end)
containing inserts of 1.907 and 2.750 kb, respectively (see text).
f
Both plasmid clones pAgc22-4 and pAgc22-3 were used to determine the full genomic sequence.
819 Evolution of Plant Terpene Synthases
1998), indicating the presence of allelic variants, pseu- ing from698 to 2850 nt in length (Table 4). In addition,
the introns of the gymnosperm synthases are AT rich, dogenes or, possibly, distinct but related genes. Only
the 3.2-kb gDNA fragment of Agg-pin1 was completely with repetitive sequences rich in T (310 mers). Splice
sites of all of the terpene synthases in this study have sequenced and this version showed 99% identity to the
published Agg-pin1 cDNA (formerly ag3; Bohlmann et been compiled (see WebTables 1 and 2 at http://ibc.
wsu.edu/faculty/croteau.html). The 5-splice site con- al. 1997) at the deduced amino acid level.
Intron/exon structure of terpene synthase genes: In sensus sequence for the conifer terpene synthases is
NNNG
GTNNNN; however, there is a clear preference addition to the new genomic sequences of the conifer
terpene synthases (Table 3), genomic sequences were for G
).
A chart of amino acid sequence pair distances and align- (Atgg-copp1, Ccglinoh, Gaf cad, Hmfvet1, Ntfeari4, Rcggcas,
Pfg-lim1) and one is a chimera constructed from the ments (showing intron splice sites) for all of the terpene
synthases in this study is also available (see WebFigures published Msg-lim cDNA sequence (Colby et al. 1993)
and an unpublished Mlg-lim sequence. To complete the 1 and 2 at http://ibc.wsu.edu/faculty/croteau.html).
Classication of terpene synthase genes: Comparison analysis, six putative terpene synthase sequences from
Arabidopsis (pAteari4, pAtcad, pAtlim1a, pAtlim1b, pAtli- of genomic structures (Figures 3 and 5) indicates that
the plant terpene synthase genes consist of three classes noh, pAtvet) were acquired by database searching (Table
1). It is of note that the public genomic databases poorly based on intron/exon pattern; 1214 introns (class I),
9 introns (class II), or 6 introns (class III; Figure 3). identify putative terpene synthase genes and are not
very accurate in the prediction of intron splice sites Using this classication, based upon distinctive exon/
intron patterns, the seven conifer genes are assigned to and, thus, proper intron phase and exon denition; the
algorithms could easily be improved. It is also worth class I or class II (Figure 3C). Class I comprises conifer
diterpene synthase genes Agggabi and Tbggtax and ses- noting that the putative terpene synthase genes of Arabi-
dopsis cluster on a small portion of chromosome 4 (Ta- quiterpene synthase Agfbis and angiosperm synthase
genes specically involved in primary metabolism (Atgg- ble 1). Clustering of related pathways genes in plant
genomes has not received much attention, and the ter- copp1 and Ccglinoh). Terpene synthase class I genes con-
tain 1114 introns and 1215 of exons of characteristic penoid synthases and associated metabolic enzymes may
be representative of this phenomenon. size (Figure 3C), including the CDIS domain compris-
ing exons 4, 5, and 6, and the rst 20 amino acids of Each of the 21 terpene synthase genomic sequences
was analyzed for the number and size of exons and exon 7, and introns 4, 5, and 6 (this unusual sequence
element corresponds to a 215-amino-acid region [Pro
137
- introns, as well as intron placement and position
(phase). A distinct pattern of exon sizes emerged, and Leu
351
] of the Agggabi sequence). Class II Tps genes
comprise only conifer monoterpene and sesquiterpene introns were observed at a total of 14 positions (Figure
3; Table 4). Introns were numbered according to place- synthases, and these contain 9 introns and 10 exons;
introns 1 and 2 and the entire CDIS element have been ment starting with intron 1 closest to the 5 terminus,
and, in all of the observed introns, placement and phase lost, including introns 4, 5, and 6. Class III Tps genes
comprise only angiosperm monoterpene, sesquiter- are conserved (Figure 4). Intron phase is dened as the
placement of the intron before the rst, second, or pene, and diterpene synthases involved in secondary
metabolism, and they contain 6 introns and 7 exons. third nucleotide position of the proximate codon and
is referred to as phase 0, 1, or 2, respectively (Li 1997; Introns 1, 2, 7, 9, and 10 and CDIS domain have been
lost in the class III type. The introns of class III Tps e.g., all Tps genes that contain intron 3 have a phase of
0). The one exception to intron phase conservation is genes (introns 3, 8, 1114) are conserved among all
plant terpene synthase genes and were described as intron 11 of the Hmfvet gene, which has a phase of 0
instead of 2, as do all other Tps genes from gymno- introns 16, respectively, in previous analyses (Mau and
West 1994; Back and Chappell 1995; Chappell sperms and angiosperms (Table 4). This discrepancy
may represent an error in sequencing across the splice 1995b).
Of the class I Tps genes, introns 1 and 2 are observed site or an example of intron sliding (Mathews and
Trotman 1998). only in Agggabi and Atgg-copp1 (Figure 3C), in which
phase is also conserved. However, the placement of in- An obvious pattern of intron size or sequence of the
14 introns was not detected, although a more rigorous tron 1 is not conserved between the two genes; the slight
discrepancy in placement of intron 1 may reect poor comparison is required to determine this for certain.
The introns of the conifer genes are relatively small alignment in this portion of the terpene synthase pre-
proteins that denes the plastidial targeting sequence (150 nt on average) compared to those of the angio-
sperm genes (196 nt), especially those of Arabidopsis (Figures 3C and 4; Bohlmann et al. 1998b). (For more
detail, see amino acid alignments at http://ibc.wsu.edu/ (266 nt). Moreover, Pfg-lim1 and the putative pAtlimA1
gene contain several exceptionally large introns, rang- faculty/croteau.html.) It is also notable that Agfbis (a
820 S. C. Trapp and R. B. Croteau
821 Evolution of Plant Terpene Synthases
Figure 3.Genomic organization of plant terpenoid synthase genes. Black vertical bars represent introns 114 (Roman
numerals in gure) and are separated by colored blocks with specied lengths representing exons 115. The terpenoid synthase
genes are divided into three classes (class I, class II, and class III), which appear to have evolved sequentially from class I to class
III by intron loss and loss of the conifer diterpene internal sequence domain (CDIS; see Figure 4). (C) Class I Tps genes comprise
1214 introns and 1315 exons and consist primarily of diterpene synthases found in gymnosperms (secondary metabolism) and
angiosperms (primary metabolism). (B) Class II Tps genes comprise 9 introns and 10 exons and consist of only gymnosperm
monoterpene and sesquiterpene synthases involved in secondary metabolism. (A) Class III Tps genes comprise 6 introns and 7
exons and consist of angiosperm monoterpene, sesquiterpene, and diterpene synthases involved in secondary metabolism. Exons
that are identically colored illustrate sequential loss of introns and the CDIS domain, over evolutionary time, from class I through
class III. The methionine at the translational start site of the coding region (and alternatives), highly conserved histidines, and
single or double arginines indicating the minimum mature protein (Williams et al. 1998) are represented by M, H, RR, or RX
(X representing other amino acids that are sometimes substituted), respectively. The enzymatic classication as a monoterpene,
sesquiterpene, or diterpene synthase is represented by C
10
, C
15
, C
20
, respectively. Conifer terpene synthases were isolated and
sequenced to determine genomic structure; all other terpene synthase sequences were obtained from public databases or by
personal communication (see Table 1). Putative terpene synthases are referred to as putative proteins and are illustrated based
upon predicted homology. Two different predictions of the same putative protein (accession no. Z97341) are shown as limonene
synthase A1 and A2; if A1 is correct, the genomic pattern suggests that Atlim (accession no. Z97341) is a sesquiterpene synthase;
if A2 is correct, then Atlim (accession no. Z97341) is a monoterpene synthase. In the analysis of intron borders of the Msg-lim/
Mlg-lim chimera and Hmfvet1 genes (see Table 1), only a single intron border (5 or 3) was sequenced to determine intron
placement; size was not determined. The intron/exon borders predicted for a number of terpene synthases identied in the
Arabidopsis database were determined to be incorrect; these data were reanalyzed and new predictions used. The number in
parentheses represents the deduced size (in amino acid residues) of the corresponding protein or preprotein, as appropriate.
sesquiterpene synthase) and Ccglinoh (a monoterpene involved in primary metabolism, with the exception of
Ccglinoh. synthase) are class I genes, although all other class I
Tps genes are diterpene synthases. Furthermore, Ccgli- Evolutionary history of the Tps gene family by gene
architectural comparison: A schematic ow chart for noh and Agfbis are the only monoterpene or sesquiter-
pene synthase genes that contain the CDIS domain. the evolution of terpene synthase genes (Figure 5) was
proposed on the basis of the data presented in Figure Moreover, Agfbis and Ccglinoh genes are exceptions,
even within the class I group, in that they both are 3 and Table 4. Figure 5 provides the simplest account
of ancestry derived by charting the physical patterns of devoid of introns at the extremes of the coding region;
Agfbis lacks intron 14, and Ccglinoh lacks intron 3 (as proposed intron and domain loss and the consideration
of additional conserved patterns of gene architecture well as 1 and 2). Finally, the angiosperm terpene syn-
thase genes that fall within class I all encode enzymes that are not explicitly shown. Other possible mecha-
822 S. C. Trapp and R. B. Croteau
T
A
B
L
E
4
C
o
m
p
a
r
i
s
o
n
o
f
t
e
r
p
e
n
e
s
y
n
t
h
a
s
e
i
n
t
r
o
n
s
:
n
u
m
b
e
r
,
s
i
z
e
,
p
l
a
c
e
m
e
n
t
,
a
n
d
p
h
a
s
e
R
c
g
g
c
a
s
H
m
f
v
e
t
1
b
,
c
N
t
f
e
a
r
i
4
G
a
f
c
a
d
M
l
g
-
l
i
m
c
,
d
P
f
g
-
l
i
m
A
g
g
-
l
i
m
A
g
g
-
p
i
n
1
A
g
f
s
e
l
2
A
g
f
s
e
l
1
A
g
g
g
a
b
i
A
g
f
E
b
i
s
T
b
g
g
t
a
x
A
t
g
l
i
n
o
h
a
C
b
g
l
i
n
o
h
A
t
g
g
-
c
o
p
p
1
I
n
t
r
o
n
C
2
0
A
t
f
v
e
t
1
a
C
1
5
A
t
f
e
a
r
i
1
a
C
1
5
C
1
5
A
t
c
a
d
1
a
C
1
0
C
1
0
A
t
l
i
m
b
a
A
t
l
i
m
a
1
a
,
e
A
t
l
i
m
a
2
a
,
e
C
1
0
C
1
0
C
1
5
C
1
5
C
2
0
C
1
5
C
2
0
C
1
0
C
1
0
C
2
0
I
K
5
1
*
S
3
2
*
(
1
5
1
)
(
8
2
2
)
I
I
N
7
6
*
Q
5
6
*
(
9
6
)
(
1
1
5
)
I
I
I
S
8
6
S
8
4
Q
3
9
S
8
4
Q
3
7
K
4
3
S
8
0
9
1
U
K
8
4
V
4
9
L
3
9
F
7
8
G
9
3
E
8
7
G
3
4
G
3
4
K
1
0
7
I
4
8
E
1
0
8
Q
8
4
(
1
1
1
)
(
7
1
)
(
N
D
)
(
9
1
)
(
1
2
7
)
(
1
0
0
)
(
1
1
6
)
(
8
7
)
(
6
9
8
)
(
1
0
0
)
(
2
.
8
5
0
)
(
2
.
7
3
4
)
(
8
3
)
(
1
0
7
)
(
1
2
2
)
(
1
2
0
)
(
9
6
)
(
2
0
5
)
(
8
8
)
(
2
8
3
)
I
V
K
2
1
0
*
Q
1
5
4
*
Q
2
1
2
*
K
1
2
8
*
K
1
2
3
U
K
1
8
6
*
(
1
8
8
)
(
1
6
4
)
(
1
9
7
)
(
9
1
2
)
(
4
4
0
)
(
2
5
3
)
V
K
2
7
1
*
*
A
2
1
6
*
*
T
2
7
1
*
*
Q
1
9
5
*
*
R
1
8
9
U
T
2
4
7
*
*
(
1
0
6
)
(
4
2
8
)
(
8
8
)
(
9
2
)
(
1
8
1
)
(
1
9
8
)
V
I
H
3
4
4
*
F
2
8
9
*
C
3
4
3
*
G
2
7
2
*
G
2
6
6
U
G
3
1
9
*
(
1
2
1
)
(
5
4
6
)
(
9
8
)
(
1
0
8
)
(
2
4
6
)
(
1
2
1
)
V
I
I
Y
1
5
9
*
*
Y
1
5
4
*
*
Y
9
6
*
*
Y
9
8
*
*
Y
3
8
4
*
*
Y
3
2
9
*
*
Y
3
8
3
*
*
Y
3
1
3
*
*
Q
3
0
7
U
H
3
5
9
*
*
(
2
7
8
)
(
8
5
)
(
8
4
)
(
1
0
2
)
(
8
7
)
(
1
5
6
)
(
8
3
)
(
1
0
2
)
(
1
8
1
)
(
7
9
)
V
I
I
I
S
1
7
7
*
S
1
7
6
*
P
1
3
1
*
Y
1
7
6
*
P
1
2
5
*
C
1
3
0
*
S
1
7
2
*
1
8
3
U
Q
1
7
1
*
Q
1
4
3
*
Q
1
2
9
*
Q
1
6
7
*
S
1
9
8
*
S
1
9
2
*
S
1
3
5
*
S
1
3
7
*
S
4
2
2
*
P
3
6
7
*
S
4
2
1
*
P
3
5
2
*
P
3
4
6
U
A
3
9
7
*
(
1
3
3
)
(
9
1
)
(
N
D
)
(
1
1
5
)
(
8
7
)
(
1
0
1
)
(
9
9
)
(
N
D
)
(
8
2
6
)
(
2
1
2
)
(
2
3
1
)
(
2
3
1
)
(
8
3
)
(
1
4
6
)
(
1
4
7
)
(
1
4
7
)
(
8
8
)
(
1
3
4
)
(
1
9
7
)
(
2
6
2
)
(
1
2
9
)
(
8
2
)
I
X
E
2
6
9
E
2
6
3
E
2
1
4
E
2
1
6
E
4
9
9
E
4
4
4
E
4
9
3
K
4
2
4
M
4
2
4
U
E
4
7
3
(
8
4
)
(
9
2
)
(
1
0
1
)
(
9
8
)
(
1
5
6
)
(
1
0
8
)
(
1
3
0
)
(
5
5
)
(
8
4
)
(
2
1
2
)
X
N
2
9
4
*
*
L
2
4
8
*
*
Y
5
3
6
*
*
Y
4
8
1
*
*
Y
5
3
0
*
*
L
4
6
2
*
*
L
4
6
2
U
Y
5
1
1
*
*
(
5
3
7
)
(
8
4
)
(
5
1
0
)
(
9
6
)
(
1
1
0
)
(
1
0
0
)
(
9
6
)
(
3
5
7
)
X
I
F
3
0
2
*
*
T
3
0
2
*
*
S
2
5
5
T
3
0
2
*
*
S
2
5
1
*
*
F
2
5
6
*
*
T
2
9
8
*
*
3
0
9
U
L
3
0
3
*
*
S
2
6
7
*
*
S
2
6
0
*
*
S
2
9
9
*
*
S
3
3
6
*
*
Y
3
2
7
*
*
R
2
2
4
*
*
T
2
7
9
*
*
R
5
6
9
*
*
T
5
1
4
*
*
T
5
6
3
*
*
T
4
9
5
*
*
E
4
9
5
U
Q
5
4
4
*
*
(
4
2
8
)
(
9
8
)
(
N
D
)
(
9
0
)
(
7
6
)
(
3
2
2
)
(
1
0
3
)
(
N
D
)
(
1
2
5
)
(
4
3
7
)
(
9
5
)
(
9
5
)
(
1
0
1
)
(
9
9
)
(
5
5
)
(
4
7
2
)
(
9
0
)
(
1
1
6
)
(
1
0
6
)
(
1
4
0
)
(
7
3
)
(
2
6
3
)
X
I
I
A
3
7
5
*
*
K
3
7
6
*
*
Q
3
2
8
*
*
E
3
7
6
*
*
Q
3
2
4
*
*
E
3
2
9
*
*
E
3
7
1
*
*
3
8
3
U
R
3
7
6
*
*
E
3
4
0
*
*
K
3
3
4
*
*
K
3
7
2
*
*
K
4
0
8
*
*
K
3
9
9
*
*
R
2
9
5
*
*
R
3
5
1
*
*
K
6
4
1
*
*
R
5
8
6
*
*
K
6
3
3
*
*
L
5
6
7
*
*
R
5
6
7
U
D
6
2
9
*
*
(
9
3
)
(
8
6
)
(
9
3
)
(
9
2
)
(
1
3
1
)
(
1
0
3
)
(
1
2
2
)
(
N
D
)
(
3
2
6
)
(
8
9
)
(
1
7
4
)
(
1
7
4
)
(
1
0
8
)
(
9
4
)
(
1
4
2
)
(
1
4
6
)
(
1
9
0
)
(
1
4
1
)
(
9
9
)
(
2
2
2
)
(
1
1
9
)
(
9
2
7
)
X
I
I
I
A
4
2
2
E
4
2
4
R
3
7
5
E
4
2
3
R
3
7
0
A
3
7
6
E
4
1
9
4
2
9
U
S
4
2
3
S
3
8
7
V
3
8
0
V
4
1
9
A
4
5
5
A
4
4
6
C
3
4
2
C
3
9
8
V
6
8
8
G
6
3
3
P
6
8
0
I
6
1
1
L
6
1
1
U
S
6
7
5
(
7
9
)
(
2
4
8
)
(
N
D
)
(
1
9
3
)
(
1
5
5
)
(
1
6
5
)
(
1
8
4
4
)
(
N
D
)
(
9
1
)
(
3
0
8
)
(
7
3
)
(
7
3
)
(
1
0
2
)
(
1
0
6
)
(
3
6
1
)
(
3
6
6
)
(
8
2
)
(
1
0
3
)
(
1
1
8
)
(
1
3
9
)
(
8
2
)
(
4
9
4
)
X
I
V
V
5
0
5
E
5
0
7
E
4
5
7
E
5
0
7
E
4
5
2
K
4
5
9
E
5
0
1
5
1
2
U
L
5
0
6
S
4
6
7
T
4
6
3
T
5
0
2
K
5
3
8
K
5
2
9
E
4
2
5
E
4
8
1
Q
7
7
1
Q
7
6
3
Q
6
9
3
L
6
9
5
U
K
7
3
4
(
8
1
)
(
8
7
)
(
N
D
)
(
8
4
)
(
1
1
3
)
(
2
8
6
)
(
5
8
3
)
(
1
0
1
)
(
1
1
9
)
(
1
7
8
)
(
4
0
2
)
(
4
0
2
)
(
1
2
7
)
(
1
2
0
)
(
7
6
)
(
7
8
)
(
9
8
)
(
9
9
)
(
1
2
4
)
(
8
4
)
(
2
7
0
)
N
o
.
o
f
I
n
t
r
o
n
s
6
6
6
6
6
6
6
6
6
6
6
6
9
9
8
9
1
4
1
1
1
2
1
1
1
1
1
4
a
a
6
0
1
6
0
7
5
5
5
6
0
4
5
5
0
5
5
6
6
0
0
5
9
9
6
0
3
5
1
6
5
6
3
6
0
2
6
3
7
6
2
8
5
2
5
5
8
2
8
6
8
8
1
4
8
6
2
8
8
4
8
7
0
8
0
2
N
o
a
s
t
e
r
i
s
k
,
*
,
*
*
,
i
n
d
i
c
a
t
e
s
i
n
t
r
o
n
i
n
s
e
r
t
e
d
,
r
e
s
p
e
c
t
i
v
e
l
y
,
b
e
t
w
e
e
n
c
o
d
o
n
s
(
d
o
e
s
n
o
t
i
n
t
e
r
r
u
p
t
c
o
d
o
n
)
;
b
e
t
w
e
e
n
t
h
e
r
s
t
a
n
d
s
e
c
o
n
d
n
u
c
l
e
o
t
i
d
e
;
b
e
t
w
e
e
n
s
e
c
o
n
d
a
n
d
t
h
i
r
d
n
u
c
l
e
o
t
i
d
e
.
,
i
n
t
r
o
n
n
o
t
p
r
e
s
e
n
t
;
N
D
,
i
n
t
r
o
n
n
o
t
d
e
t
e
r
m
i
n
e
d
;
U
,
i
n
t
r
o
n
p
h
a
s
e
n
o
t
d
e
t
e
r
m
i
n
e
d
;
i
n
t
r
o
n
n
u
m
b
e
r
s
i
n
b
o
l
d
f
a
c
e
r
e
p
r
e
s
e
n
t
h
i
g
h
l
y
c
o
n
s
e
r
v
e
d
i
n
t
r
o
n
s
t
h
r
o
u
g
h
o
u
t
.
a
A
l
l
A
.
t
h
a
l
i
a
n
a
g
e
n
o
m
i
c
s
e
q
u
e
n
c
e
s
r
e
p
o
r
t
e
d
h
e
r
e
a
r
e
p
u
t
a
t
i
v
e
w
i
t
h
t
h
e
e
x
c
e
p
t
i
o
n
o
f
c
o
p
a
l
y
l
d
i
p
h
o
s
p
h
a
t
e
s
y
n
t
h
a
s
e
f
o
u
n
d
i
n
t
h
e
d
a
t
a
b
a
s
e
c
o
r
r
e
s
p
o
n
d
i
n
g
t
o
a
p
u
b
l
i
s
h
e
d
c
D
N
A
f
o
r
m
e
r
l
y
c
a
l
l
e
d
G
A
1
(
s
e
e
T
a
b
l
e
1
)
.
b
I
n
f
o
r
m
a
t
i
o
n
o
b
t
a
i
n
e
d
f
r
o
m
J
.
C
h
a
p
p
e
l
l
(
p
e
r
s
o
n
a
l
c
o
m
m
u
n
i
c
a
t
i
o
n
)
.
c
S
e
q
u
e
n
c
e
s
a
r
e
n
o
t
c
o
m
p
l
e
t
e
;
t
h
e
a
u
t
h
o
r
s
s
e
q
u
e
n
c
e
d
t
h
e
r
e
g
i
o
n
s
a
c
r
o
s
s
t
h
e
5
-
a
n
d
3
-
s
p
l
i
c
e
s
i
t
e
s
(
e
i
t
h
e
r
o
r
b
o
t
h
)
t
o
v
e
r
i
f
y
t
h
e
i
n
t
r
o
n
p
l
a
c
e
m
e
n
t
.
d
M
l
g
-
l
i
m
w
a
s
a
n
a
l
y
z
e
d
b
y
p
r
e
p
a
r
i
n
g
a
c
h
i
m
e
r
i
c
g
e
n
o
m
e
s
e
q
u
e
n
c
e
u
t
i
l
i
z
i
n
g
c
D
N
A
f
r
o
m
M
.
s
p
i
c
a
t
a
a
n
d
i
n
c
o
m
p
l
e
t
e
g
D
N
A
s
e
q
u
e
n
c
e
f
r
o
m
M
.
l
o
n
g
i
f
o
l
i
a
(
T
.
D
a
v
i
s
,
u
n
p
u
b
-
l
i
s
h
e
d
d
a
t
a
)
.
e
A
l
l
t
h
r
e
e
A
t
l
i
m
A
1
,
A
2
,
a
n
d
B
s
e
q
u
e
n
c
e
s
a
r
e
p
u
t
a
t
i
v
e
l
i
m
o
n
e
n
e
s
y
n
t
h
a
s
e
h
o
m
o
l
o
g
s
f
r
o
m
t
h
e
A
.
t
h
a
l
i
a
n
a
d
a
t
a
b
a
s
e
t
h
a
t
a
r
e
a
d
j
a
c
e
n
t
t
o
e
a
c
h
o
t
h
e
r
o
n
c
h
r
o
m
o
s
o
m
e
4
(
s
e
e
T
a
b
l
e
1
)
.
A
t
l
i
m
a
1
a
n
d
A
t
l
i
m
a
2
a
r
e
p
r
e
d
i
c
t
i
o
n
s
o
f
t
w
o
p
o
s
s
i
b
l
e
i
s
o
z
y
m
e
s
.
I
n
a
r
e
c
e
n
t
r
e
p
o
r
t
A
t
l
i
m
a
a
n
d
A
t
l
i
m
b
a
r
e
r
e
f
e
r
r
e
d
t
o
a
s
A
t
T
P
S
0
2
a
n
d
A
t
T
P
S
0
3
w
i
t
h
5
2
a
n
d
5
6
%
i
d
e
n
t
i
t
y
,
r
e
s
p
e
c
t
i
v
e
l
y
,
t
o
a
n
e
w
l
y
d
e
n
e
d
A
r
a
b
i
d
o
p
s
i
s
m
o
n
o
t
e
r
p
e
n
e
s
y
n
t
h
a
s
e
M
y
r
c
e
n
e
/
(
E
)
-
-
O
c
i
m
e
n
e
S
y
n
t
h
a
s
e
,
A
t
T
P
S
1
0
c
D
N
A
(
B
o
h
l
m
a
n
n
e
t
a
l
.
2
0
0
0
)
.
823 Evolution of Plant Terpene Synthases
Figure 4.General struc-
tural features of plant terpene
synthase genes. Ageneric par-
ent terpene synthase gene
(class I type) illustrates the
pattern of intron placement
and phase conservation
among all genes analyzed. In-
trons 114 (Roman numerals
in gure) are represented by
black vertical bars, and exons
1 through 15 are depicted by color boxes (with color coding as in Figure 3). Introns 1 and 2 are found in only two class I terpene
synthase genes (Agggabi and gAtggcopp1) and are boxed in peach to indicate that their presence is not a class I Tps gene
requirement. Introns 3, 8, and 1114 are conserved in all plant terpene synthases, with the exception of AgfEbis, which has lost
intron 14. The number above the intron Roman numeral represents the intron phase number and demonstrates conservation
throughout this gene family. Introns are classied into three phase types according to Li (1997). General structural domains
are labeled. The RX (representative of RR; see Figure 3) and DDXXD motifs are shown in boldface. The consecutive gray, pink,
and light green boxes, representing exons 13 in Agggabi and Atggcopp1, comprise a single exon of variable size in all other
terpene synthases; monoterpene and diterpene synthases comprise an exon size of 80107 aa, whereas sesquiterpene synthases
comprise an exon size of 3050 aa due to the absence of a plastidial targeting sequence (depicted by the green bar). The conifer
diterpene internal sequence domain (CDIS, red bar), identied by Bohlmann et al. (1998a,b), is present only in class I type
terpene synthase genes. The glycosyl hydrolase-like domain (yellow bar), the active site domain (orange bar), and the intradomain
region (white space) are predicted (Bohlmann et al. 1998b) on the basis of the crystal structure of NtfeARI4 (Starks et al. 1997).
nisms of derivation are less suitable on the basis of the between the gene architecture-based and protein ho-
mology-based models, in terms of placement of the class assumption that the gymnosperms predate angio-
sperms, that intron gain with conservation in placement III Tps types, will likely be resolved with the acquisition
of a larger, more diverse data set, including better out- and phase in both classes is less likely, and that terpene
synthases of secondary metabolism almost certainly de- groups to aid in inferring branching order. Although
the protein-based phylogeny cannot be ignored, the rived from those of primary metabolism. The schematic
model (Figure 5) thus represents the most parsimoni- gene architectural data would appear to be the more
compelling at present. ous, and biochemically consistent, explanation for Tps
evolution when intron and CDIS alterations are taken
into account.
DISCUSSION
This model of evolution was tested by computer-simu-
lated analysis utilizing a simple algorithm, with which a Historically, the evolution and complexity of natural
products has puzzled scientists across disciplines, from satisfactory phylogentic tree was obtained (Figure 6A).
By this analysis, class III type Tps genes are placed in a ecology to chemistry, who have been fascinated by the
question as to why these chemicals, which are not essen- clade branching directly from class II types. However,
when the model was tested more rigorously utilizing tial for viability, are biosynthesized (Mapleston et al.
1992; Christophersen 1996; Jarvis and Miller 1996; standard tree-searching computer-generated methods,
we obtained a tree that is very similar to the previously Jarvis 2000). It seems most reasonable to assume that
the precursors and pathways for the generation of natu- published tree based on protein phylogeny (Bohlmann
et al. 1998b; Figure 6B). These protein sequence-based ral products arose from mutations of enzymes involved
in the synthesis of primary metabolites. Under the pres- trees place the class III Tps genes as branching sepa-
rately from both gymnosperm class I and class II Tps sure of natural selection, natural products evolved as
messages, broadly dened (Jarvis and Miller 1996; genes involved in secondary metabolism (Figures 5 and
6B; the group of class III Tps genes are highlighted for Jarvis 2000), which increased the survival tness of the
producing organism (Williams et al. 1989; Mapleston clarity in Figure 6, A and B). The latter derivation is
not the most parsimonious when gene architecture is et al. 1992; Jarvis and Miller 1996). Addressing the
evolutionary origins of natural products and their com- considered. It is possible, nevertheless, that the nucleo-
tide sequence comparison of terpene synthases could plex biosynthetic pathways is an arduous undertaking,
because the genetic and functional bases for such a generate a phylogenetic tree that differs from the pro-
tein sequence-based reconstruction. Preliminary analy- diverse repertoire of compounds are difcult to analyze,
as many of the relevant biosynthetic enzymes are un- sis of the coding sequences of the dened terpene syn-
thases to investigate this possibility was inconclusive and known; the task is often further complicated by the
inability to distinguish between compounds involved in will require further investigation. Furthermore, the
number of introns and variability in sequence made primary and secondary metabolism based on structure
alone (Jarvis and Miller 1996; Jarvis 2000). Few stud- the alignment and phylogenetic analysis of genomic
nucleotide sequences difcult. The lack of consonance ies have focused on the evolution of natural products,
824 S. C. Trapp and R. B. Croteau
825 Evolution of Plant Terpene Synthases
although a number of papers have discussed observa- serve a multitude of functions in their internal environ-
ment (primary metabolism) and external environment tions within specic classes of natural products from a
variety of kingdoms (e.g., fungi, plants, bacteria; (ecological interactions). The biosynthetic requirements
for terpene production are the same for all organisms (a Mapleston et al. 1992; Jarvis 2000). A notable example
is the superfamily of chalcone synthase-related enzymes, source of isopentenyl diphosphate, isopentyl diphosphate
isomerase or other source of dimethylallyl diphosphate, which catalyze the rst committed step in the biosynthe-
sis of a number of biologically important plant products, prenyltransferases, and terpene synthases). The terpene
synthases (regardless of phylogenetic origin) provide a e.g., avonoids (ower color) and phytoalexins (antimi-
crobials; Schrsder 1999), but which appear to be in- unique focus since they are mechanistically very closely
related yet are capable of producing a diverse array of volved in the biosynthesis of a much larger range of
substances than was previously realized (Eckermann et structural types and derivatives. The conservation of geno-
mic organization throughout the large multigene super- al. 1998). Recent analysis has revealed related sequences
in bacteria, suggesting that this protein family is much family encoding plant monoterpene, sesquiterpene, and
diterpene synthases, especially intron architecture, pro- older than was previously assumed (Schrsder 1997).
The majority of plant molecular evolutionary studies vides a compelling argument (Figures 3 and 4; Table 4)
for reconstructing the evolutionary history of the terpene have focused on the chloroplast genome, and few have
examined the molecular evolution of plant nuclear genes synthases from primary to secondary metabolism on the
basis of the proposed pattern of gene sequence loss (in- (Clegg et al. 1997). Most often, the gene families se-
lected for such studies encode enzymes that play a well- trons and CDIS domain; Figure 5).
Current model for the evolution of terpene synthases: dened role in a limited set of primary metabolic path-
ways, such as the small subunit of ribulose-bisphosphate The three classes of terpene synthase genes exhibit clear
intron phase conservation coupled to a distinct, and carboxylase/oxygenase and alcohol dehydrogenase
(Clegg et al. 1997). Very few studies have examined the seemingly sequential, pattern of intron (and CDIS) loss,
thereby suggesting the derivation of this gene family origins of nuclear multigene families, e.g., phytochrome,
heat shock, knox, and catalase genes (Mathews et al. from an ancestral class I type terpene synthase of pri-
mary metabolism common to both gymnosperms and 1995; Frugoli et al. 1998; Bharathan et al. 1999;
Waters and Vierling 1999). Even fewer studies have angiosperms. This proposed ancestral terpene synthase
contained 1214 introns, 1315 exons, and the CDIS analyzed gene superfamilies such as that encoding chal-
cone synthase, which participates in a range of function- domain, as do all class I type Tps genes. The most obvi-
ous modern candidate that resembles this ancestral ally diverse pathways. The terpene synthase genes, which
encode multiple classes of mechanistically related en- gene is postulated to be a contemporary Tps gene that
contains the largest number of introns. This candidate zymes, are arguably among the most functionally diverse
group of genes and furthermore, as a group, are in- comprises a genomic architecture most similar to either
the conifer Agggabi gene or the angiosperm Atgg-copp1 volved in both primary and secondary metabolism in
many phyla. gene, both of which contain 14 introns and 15 exons
as illustrated at the A1 and A2 branchpoints of Figure The evolution of the terpene synthase gene superfam-
ily is an instructive model to address the complex ques- 7, with Atgg-copp1 most likely because it is involved in
primary metabolism. Branchpoint B(Figure 7) indicates tions surrounding the origins of natural products. Ter-
penoids are the largest class of natural products, they the loss of introns 1 and 2 within the class I type Tps
genes to yield those containing 12 introns, 13 exons, are present and often abundant in all phyla, and they
Figure 5.Schematic proposal for evolution of the terpene synthase gene family in plants. This theory is based upon the
most parsimonious account for the loss of introns and the conifer diterpene internal sequence domain (CDIS) over evolutionary
time, sequentially from gymnosperms to angiosperms, and it takes into account the conserved pattern of exon domain size and
intron phase data (not shown in schematic). Step A represents the predicted terpene synthase gene architecture for the ancestral
gene of terpene synthases of both primary and secondary metabolism. Throughout, each colored block represents an exon
(numbered 115) that is charted for dened terpene synthase genes. The downward facing open arrowhead (V) between each
exon represents the positional placement of each intron for introns 114 (Roman numerals in gure). Steps BH are hypothetical
branchpoints in the proposed evolution of the Tps family, in which duplication of the depicted gene occurred. In steps CH,
the duplicated gene is not shown; a side arrow is used to represent the difference between the two duplicated genes illustrating
how the second gene diverged. In steps BF, divergence (illustrated by loss of introns or CDIS) of duplicated gene leads to novel
terpene synthases. Step G is hypothetical, illustrating that the ancestral gene remained conserved in gene architecture within
primary metabolism of gymnosperms, plausibly in gibberellin biosynthesis, and has remained conserved in gene architecture
and function in angiosperms. The genomic sequence of gymnosperm gibberellin biosynthetic pathway genes is unknown and
is depicted by the question marks. Steps E and H indicate exceptions to the otherwise conserved architecture and asterisks have
been placed after the corresponding gene name. Bisabolene synthase has independently lost intron 14 (E), and a simple
explanation for gene architecture and the origin of linalool synthase (Ccglinoh) is presented (H). Each class of Tps gene is
identied as type I, II, or III.
826 S. C. Trapp and R. B. Croteau
Figure 6.Comparison of computer-generated phylogenetic trees for plant terpene synthases. Qualitative characters (intron
and CDIS domain loss) are physically mapped on the trees. Amino acid sequences of dened terpene synthases were used to
generate both trees by using a distance substitution method (A) or Dayhoff method previously published (Bohlmann et al.
1998b) (B). Class III terpene synthases, consisting of all angiosperm terpene synthases of secondary metabolism, are highlighted
in A and B by shaded ovals. The number scale on the bottom of the tree (A) represents amino acid substitution events.
and the CDIS domain (e.g., Tbggtax gene). It is also synthase genes. The latter interpretation, however,
seems less likely because Atgg-copp1 encodes an enzyme plausible that the ancestral candidate gene resembles
Tbggtax, and that introns 1 and 2 (as found in Agggabi that is essential for plant hormone production, which
can be assumed to predate genes encoding enzymes of and Atgg-copp1) resulted from recent intron acquisition.
This rationale could explain why only Agggabi and Atgg- secondary metabolism (i.e., Agggabi, Tbggtax, and Ag-
fEbis). copp1 contain intron 2 and the positionally noncon-
served intron 1 compared to all other class I terpene In our model for the molecular evolution of plant
827 Evolution of Plant Terpene Synthases
Figure 7.A t heoreti cal
model for the evolution of
plant terpene synthase genes
with observed changes in gene
architecture noted (from data
summarized in Figure 5). Let-
ters AGof the dendrogramin-
dicate inferred branchpoints,
with A1 and A2 representing
the progeny of the duplicated
ancestral gene predicted to be
similar to present-day gymno
s perm di t erpene synt has e
genes with 14 introns and 15
exons (class I Tps gene type).
The duplication events to cre-
ate present-day gymnosperm
diterpene synthases (e.g., Atgg-
copp1 and Agggabi each con-
taining 14 introns) occurred
prior to the separation of gym-
nosperms and angiosperms
and preceded the specializa-
tion of terpenoid synthases in-
volved in primary or secondary
metabolism. Subsequently, A1
progeny are predicted to have
remained conserved in geno-
mic structure and function,
whereas A2 progeny have con-
tinuously specialized by gene
duplication and divergence
(involving sequential loss of in-
trons and CDIS domain) to
produce a superfamily of ter-
pene synthases involved in sec-
ondary metabolic processes.
The mechanism of concerted
intron loss can account for the
rst (B) and third (F) loss
events. The second loss event
(D), involving an entire region
spanning exons 4, 5, and 6, and introns 46, gave rise to the class II terpene synthase types and could have occurred by a number
of recombination mechanisms. The third loss event gave rise to class III terpene synthase genes. The loss of single intron 14 in
Agfbis (C) and introns 13 in Ccglinoh (G) appears to have occurred independently of the global evolutionary pattern by
concerted loss or other mechanism and may account for the unusual classication of these genes. The question mark (?) indicates
the uncertain placement of the linalool synthase gene within this scheme (G). The dotted line indicates the predicted placement
of an angiosperm or gymnosperm kaurene synthase for which a genomic structure has not yet been described. The symbols C
10
,
C
15
, and C
20
represent the enzyme class corresponding to monoterpene synthases, sesquiterpene synthases, and diterpene synthases,
respectively.
terpene synthase genes, based on both theoretical (Fig- depicted at branchpoint F (Figure 7). All class III type
Tps genes contain the 6 conserved introns that are ure 7) and computer-generated phylogenetic trees (Fig-
ure 6A), class I type terpene synthase genes gave rise at found in all terpene synthase genes (3, 8, 1114), [these
were previously described as introns 16 (Mau and branchpoint D to class II type terpene synthase genes
(Agg-lim, Agg-pin1, and Agf sel) comprising 9 introns West 1994; Back and Chappell 1995; Chappell
1995a)], and class III types comprise all angiosperm and 10 exons. Characteristically, class II type terpene
synthase genes encode conifer monoterpene andsesqui- monoterpene, sesquiterpene, and diterpene synthases
involved in natural products biosynthesis. terpene synthases involved in secondary metabolism,
and they have lost the entire CDIS domain spanning The current evolutionary hypothesis and the previous
phylogenetic tree (Bohlmann et al. 1998b) both afrm exon regions 4, 5, 6, and a small portion of exon 7
that includes introns 4, 5, and 6. Class III type terpene that plant terpene synthases involved in secondary me-
tabolismhave a commonancestral originfroma terpene synthase genes (Gaf cad, Hmfvet1, Ntfeari4, Rcggcas, and
Pfg-lim) derive from class II types by a further loss of synthase involved in primary metabolism. However,
these models differ in the placement of the class III intron 7 and sequential loss of introns 9 and 10, as
828 S. C. Trapp and R. B. Croteau
terpene synthases (Figures 6, A and B, and 7). The chronology of loss of introns 7, 9, and 10 or the interde-
pendence of these loss events. Nevertheless, the prece- inferred phylogenetic tree in Figure 6A suggests that
the class III types (angiosperm terpene synthases) in- dented mechanisms of concerted intron loss provide a
most reasonable explanation for the observed differ- volved in secondary metabolism are descendants (evolv-
ing by divergent evolution) of the class II types (gymno- ences in terpenoid synthase genes and the derivation
of terpenoid synthases of natural products biosynthesis sperm monoterpene and/or sesquiterpene synthases)
involved in secondary metabolism, whereas Figure 6B from those of primary metabolism.
Mechanism of evolution of terpene synthases genes: indicates that class III types are not direct descendants
of class II types, yet share a common ancestor. The latter The derivation of large gene families presumably in-
volved repeated gene duplication of the ancestral gene (including the gene architecture information as valid)
infers that gymnosperm and angiosperm terpene syn- and divergence by functional and structural specializa-
tion, an evolutionary process now viewed as quite com- thases involved in secondary metabolism diverged prior
to the three loss events illustrated in Figure 6A and that mon (Fryxell 1996; Clegg et al. 1997). Tandem dupli-
cation occurs spontaneously in bacteria at a frequency these subsequent loss events occurred in parallel after
the split between the two groups. This further implies of 10
3
10
5
per locus per generation and is docu-
mented to occur with similar frequency in insect and that terpene synthases underwent gene duplication and
divergence several times to create enzymes of both pri- mammaliangenomes. If spontaneous mutationof dupli-
cated genes occurs frequently, the rate-limiting step in mary and secondary metabolism (divergent evolution),
which subsequently evolved by convergent (or con- retention occurs at the level of natural selection (or
genetic drift) since the altered gene will be lost rapidly certed) evolution. Thus, if the latter holds true (as in
6B), the same set of introns (7, 9, 10), noting that intron unless it acquires a new (functional mutation) and use-
ful function (divergence; Fryxell 1996). These consid- 7 is nonadjacent to 9 and 10, and a complete domain
(including 3 exons and 3 introns) have been lost at least erations suggest that family trees of functionally related
genes coevolved because functionally complementary twice. This possibility cannot be eliminated; yet it seems
unlikely that at least three separate events of sequence gene duplications and divergence events tended to be
retained by natural selection (Fryxell 1996). This hy- loss (event 1: loss of introns 1 and 2; event 2: loss of
introns 9, 10, and 7; event 3: domain loss consisting of pothesis provides a plausible mechanism for the origin
and evolutionary relationships of plant terpene synthase 3 exons and 3 introns) occurred at least twice (if not
more) in the two groups. Thus, Figure 6Aappears closer genes and their encoded enzymes.
On the basis of limited data, several groups have sug- to the true tree based upon maximumparsimony for the
patterns observed in gene architecture of the terpene gested that all plant terpene synthases share a common
evolutionary origin (Colby et al. 1993; Mau and West synthases. Nevertheless, to resolve the conict in topol-
ogy will require improvements in the data set, including 1994; Back and Chappell 1995), with the protein-based
phylogenetic analysis of 33 plant terpene synthases pro- the inclusion of better outgroups (such as ferns, mosses,
and cycads) and additional phyla (other gymnosperms viding the most compelling evidence thus far (Bohl-
mannet al. 1998b). The conservationof genomic organi- and monocots) to improve the inferred phylogenetic
tree for rooting as well as biases. Sequences for gymno- zation that underlies the current evolutionary model
presented here provides substantial further evidence sperm terpene synthases involved in primary metabo-
lism, such as kaurene synthase and copalyl diphosphate that the terpene synthases from gymnosperms and an-
giosperms constitute a superfamily of genes derived synthase, could greatly aid in rooting the tree.
Mechanism of intron loss: The patterns of structural from a shared ancestor. Prior to the divergence of gym-
nosperms and angiosperms, during the carboniferous change observed in the terpene synthase genes, as the
basis of the maximum parsimony model, is concordant period 300 mya (Doyle 1998), the duplication of an
ancestral terpene synthase gene, resembling most with an experimentally demonstrated and presumably
common (Frugoli et al. 1998) process of concerted closely a contemporary diterpene synthase of primary
metabolism, occurred. One copy of the duplicated an- intron loss (Derr et al. 1991). Plausible mechanisms for
concerted intron loss and individual intron loss have cestral gene remained highly conserved in structure and
function, and this gene may have contemporary descen- been proposed (Baltimore 1985; Fink 1987) and have
been demonstrated for other plant gene families (Huang dants in the terpene synthases involved in gibberellin
biosynthesis. The second ancestral gene copy diverged et al. 1990; Kumar and Trick 1993; Hager et al. 1996;
Frugoli et al. 1998). For the evolution of terpene syn- in structure and function, by adaptive evolutionary pro-
cesses, to yield the large superfamily of terpene syn- thase genes, such mechanisms provide an established
rationale by which contiguous blocks of introns can be thases involved in secondary metabolic pathways. Al-
though speculative, it is plausible that the early terpene lost in a single event, such as the concurrent loss of
introns 9 and10, as well as the loss of nonadjacent intron synthase ancestors were functionally less specialized
than modern forms and perhaps able to utilize several 7 in class III Tps types (Figures 5 and 6). Presently,
the data are insufcient to determine unequivocally the prenyl diphosphate substrates for the production of
829 Evolution of Plant Terpene Synthases
multiple terpene types, the specialization into different amino acids (aa), respectively. These additions to exon
15 (an increase from an average size of 100 aa) might synthase classes (for monoterpenes, sesquiterpenes, and
diterpenes) having evolved much later. be explained by a number of mechanisms, including
internal duplications, a mutational change that converts The genetic changes (including distinct patterns of
conservation in exon size, intron placement and intron a stop codon into a sense codon, insertion of a DNA
segment into the exon, or a mutation obliterating a phase, and the apparent sequential loss of introns and
the CDIS element) observed among the plant terpene splice site [also a plausible explanation (Li 1997) for
intron 14 loss in AgfEbis]. These mechanisms, and synthases suggest that gene organization may have been
a greater driving force in the evolution of these enzymes other discrete recombination events such as gene con-
version, unequal crossing, and mutations leading to loss than was previously thought. Gene organization may
have played an important role in diversifying terpene of amino acids within an exon without functional loss
(Li 1997; Cseke et al. 1998), could account for the exon structures and the ecological interactions that they me-
diate. Although the evolutionary connections are un- variation observed in Atgg-copp1 (exons 12, 13, and 15;
see Figure 3) and for the partial N-terminal domain loss clear, the absence of the CDIS domain in the angio-
sperm and gymnosperm monoterpene and sesquiterpene observed in Cbglinoh (including consecutive introns 1,
2, and 3, assuming the ancestral gene was similar in synthase could be signicant for terpene structural di-
versication, and this apparent loss conrms the previ- architecture to Atgg-copp1 or Agggabi). Some uncer-
tainty exists in the evolutionary placement of Ccglinoh ous view(Bohlmann et al. 1998b) that the CDIS domain
is not required for protein function. Furthermore, there (branchpoint G; Figure 7) because it is the only mono-
terpene synthase of angiosperm origin that is included is an obvious correlation between the genetic changes
and an important structural aspect of the terpene syn- in class I (which contains diterpene synthases involved
in gibberellin biosynthesis). This unusual placement thases. Thus, considerable genetic variation occurs in
the 5 portion (N-terminal region of uncertain func- suggests that Ccglinoh has remained more conserved
in structure than any other angiosperm monoterpene tion) of the terpene synthase genes (mutations in the
form of intron and CDIS loss), whereas the 3 portion synthase, or, as proposed by Cseke et al. (1998), that
the gene contains a fragment of a copalyl diphosphate (which encodes the C-terminal active site including ex-
ons 1215; see Figure 4) remains highly conserved in synthase (class I) that resembles the ancestral type.
The monoterpene synthase Ccglinoh and the sesqui- organization and catalytic function (no intron losses or
change in exon size over evolutionary time). terpene synthase AgfEbis most closely resemble conifer
diterpene synthase genes, all of which contain the CDIS Variations and exceptions: It was recently suggested
that linalool synthase from several Clarkia species is a element. Intriguingly, both genes have also lost terminal
introns 3 and 14 that are conserved among all other composite gene resulting froma discrete recombination
event (e.g., domain swapping) between the 5 half of a plant terpene synthases. Most likely, both of these genes
are defunct diterpene synthases that have retained suf- copalyl diphosphate synthase type gene (class I type)
and the 3 half of a limonene synthase type gene (class cient 3 sequence to encode a functional carboxy-ter-
minal active site domain (Starks et al. 1997) and have III type; Cseke et al. 1998). On the basis of this apparent
chimeric structure, Cseke et al. (1998) postulated that undergone nondeleterious mutations that have altered
substrate preference. other terpene synthases could have arisen by similar
recombination between extant terpene synthase types. Predictions, prospect, and signicance: Given the
substantial primary sequence differences between gene Although this mechanism provides a credible explana-
tion for the derivation of linalool synthase, it does not types, the evolutionary relationship of plant terpene
synthase genes to microbial terpene synthase genes and account for the global evolutionary history of the terpe-
noid synthases, as does the present model, whichimplies to the mechanistically related prenyl transferases is un-
clear, although common ancestry has been suggested that all Tps genes share a common ancestor and are
sequentially derived from class I to class III, regardless (Lesburg et al. 1997; Wendt et al. 1997; Bohlmann et
al. 1998b; Cane 1999a; Hohn 1999). Predictions based of precise enzyme mechanism or the presence or place-
ment of specic primary structural elements. upon genomic structural organization (data not shown)
suggest that prenyl transferases and microbial terpene There are several terpene synthase genes of the class
I type that vary in structure from the general classica- synthases do not share a recognizable, common ances-
tor with plant terpene synthases. Thus, two farnesyl di- tion. Ccglinoh and Atgg-copp1 differ in exon size at the
3 termini; Ccglinoh has lost a signicant portion of the phosphate synthase genes from the Arabidopsis data-
base (bothcontain10 introns) were compared to several distal 5 region (exons 13 including the conserved
intron 3), and AgfEbis apparently has independently terpene synthases from the present study, and no sig-
nicant alignment, or conservation of intron placement lost conserved intron 14 (Figure 3). In linalool synthase,
exons 414 have similar exon sizes as do all other Tps or phase, was noted. Aside from the conservation of the
mechanistically relevant DDXXD sequence motif, prior class I types (Figure 3); however, exon 15 of Ccglinoh
and putative Atglinoh contain an additional 78 and 91 comparisons of plant terpene synthase genes to micro-
830 S. C. Trapp and R. B. Croteau
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang
bial terpene synthase genes have not demonstrated sig-
et al., 1997 Gapped BLAST and PSI-BLAST: a new generation
nicant conservation in deduced amino acid sequence
of proteindatabase searchprograms. Nucleic Acids Res. 25: 3389
3402. (Bohlmann et al. 1997). The genomic organization of
Asakawa, Y., 1995 Chemical Constituents of the Bryophytes (Progress in
microbial terpene synthase genes (Hohn and Bere-
the Chemistry of Organic Natural Products, Vol. 65, edited by
mand 1989; Trapp et al. 1998; Hohn 1999) also differs
W. Hertz, G. W. Kirby, R. E. Moore, W. Steglich and C.
Tamm). Springer, Vienna. signicantly from that of plants, in that the former are
Back, K., and J. Chappell, 1995 Cloning and bacterial expression of
only half the length of the latter and contain only 12
a sesquiterpene cyclase fromHyoscyamus muticus and its molecular
introns. The clear implication from these preliminary
comparison to related terpene cyclases. J. Biol. Chem. 270: 7375
7381. comparisons is that the mechanistically related terpene
Baltimore, D., 1985 Retroviruses and retrotransposons: the role of
synthases from these very distant phyla arose by conver-
reverse transcriptase in shaping the eukaryotic genome. Cell 40:
gent evolution.
481482.
Bharathan, G., B.-J. Janssen, E. A. Kellogg and N. Sinha, 1999 This study represents the rst attempt to trace the
Phylogenetic relationships and evolution of the KNOTTED class
molecular evolutionary history of the large multigene
of plant homeodomain proteins. Mol. Biol. Evol. 16: 553563.
family of plant terpene synthases by comparisonof geno-
Bohlmann, J., C. L. Steele and R. Croteau, 1997 Monoterpene
synthases from grand r (Abies grandis): cDNA isolation, charac- mic architecture and has predicted the appearance of
terization and functional expression of myrcene synthase, ()-
a plant terpene synthase ancestor that existed prior to
4S-limonene synthase and ()-(1S,5S)-pinene synthase. J. Biol.
the division of angiosperms and gymnosperms and to
Chem. 272: 2178421792.
Bohlmann, J., J. Crock, R. Jetter and R. Croteau, 1998a Terpen- the separation between primary and secondary terpe-
oid-based defenses in conifers: cDNA cloning, characterization
noid metabolism. Most likely, this ancestral terpene syn-
and functional expression of wound-inducible (E)--bisabolene
thase gene resembled an extant relative of a conifer
synthase from grand r (Abies grandis). Proc. Natl. Acad. Sci. USA
95: 67566761. diterpene synthase of primary metabolism, prior to du-
Bohlmann, J., G. Meyer-Gauen and R. Croteau, 1998b Plant ter-
plication and differentiation in which conservation of
penoid synthases: molecular biology and phylogenetic analysis.
gene organization was maintained for most descendant
Proc. Natl. Acad. Sci. USA 95: 41264133.
Bohlmann, J., M. Phillips, V. Ramachandiran, S. Katoh and R. monoterpene, sesquiterpene, and diterpene synthases.
Croteau, 1999 cDNA cloning, characterization, and functional
To rene the ancestry and mechanism of evolutionary
expression of four new monoterpene synthase members of the
descent proposed here and to verify the generality of
Tpsd gene family from grand r (Abies grandis). Arch. Biochem.
Biophys. 368: 232243. the predictions made will require evaluation of a larger
Bohlmann, J., D. Martin, N. J. Oldham and J. Gershenzon, 2000
sample size of terpene synthases from throughout the
Terpenoid secondary metabolism in Arabidopsis thaliana: cDNA
plant kingdom, including triterpene and tetraterpene
cloning, characterization, and functional expressoin of a myr-
cene/E--ocimene synthase. Arch. Biochem. Biophys. 375: 261 synthases. The nonvascular plants are of particular inter-
269.
est in this regard, especially the liverworts as an ancient
Buckingham, J., 1998 Dictionary of Natural Products on CD-ROM, Ver-
group of land plants that are a rich source of terpenoid
sion 6.1. Chapman & Hall, London.
Cane, D. E., 1990 Enzymatic formation of sesquiterpenes. Chem. natural products (Asakawa 1995) and for which, consis-
Rev. 90: 10891103.
tent withthe intron-early hypothesis (Li 1997; Mathews
Cane, D. E., 1999a Isoprenoid biosynthesis: overview, pp. 113 in
and Trotman 1998), a Tps class I gene architecture
Comprehensive Natural Products Chemistry: Isoprenoids Including Ste-
would be predicted. Data from a wider range of plant roids and Carotenoids, Vol. 2, edited by D. E. Cane. Pergamon,
Oxford.
species will be important also for more accurately de-
Cane, D. E., 1999b Sesquiterpene biosynthesis: cyclization mecha-
signing domain swapping and site-directed mutagenesis
nisms, pp. 155200 in Comprehensive Natural Products Chemistry
experiments that could permit more detailed under- Isoprenoids: Including Steroids and Carotenoids, Vol. 2, edited by D. E.
Cane. Pergamon, Oxford.
standing of structure-function relationships in this en-
Chappell, J., 1995a The biochemistry and molecular biology of
zyme class. Finally, this new approach to understanding
isoprenoid metabolism. Plant Physiol. 107: 16.
the origins of natural product diversity may have broad Chappell, J., 1995b Biochemistry and molecular biology of the iso-
prenoid biosynthetic pathway in plants. Annu. Rev. Plant Physiol.
implications for plant molecular phylogenetics in gen-
Plant Mol. Biol. 46: 521547.
eral, particularly in the provision of novel molecular
Chen, X. Y., M. Wang, Y. Chen, V. J. Davisson and P. Heinstein,
markers and criteria for assessing relatedness.
1996 Cloning and heterologous expression of a second ()-
delta-cadinene synthase from Gossypium arboreum. J. Nat. Prod.
We thank P. Soltis, J. Bohlmann, G. Turner, E. Davis, R. Peters,
59: 944951.
and M. Wise for helpful technical and editorial suggestions; T. Davis,
Christophersen, C., 1996 Theory of the origin, function, and evo-
J. Chappell, and C. A. West for providing access to unpublished se-
lution of secondary metabolites, pp. 677733 in Studies in Natural
quence information; J. Crock, E. Stauber, J. Davis, and D. Pouchnik
Products, Vol. 18, edited by Atta-ur-Rahman. Elsevier, Am-
for technical assistance; and Joyce Tamura for assistance inmanuscript sterdam.
Clegg, M. T., M. P. Cummings and M. L. Durbin, 1997 The evolu- preparation. This work was supported in part by a U.S. Department
tion of plant nuclear genes. Proc. Natl. Acad. Sci. USA 94: 7791 of Agriculture National Research Initiative grant and by grants from
7798.
the National Institutes of Health and the U.S. Department of Energy.
Colby, S. M., W. R. Alonso, E. J. Katahira, D. J. McGarvey and
R. Croteau, 1993 4S-Limonene synthase from the oil glands
of spearmint (Mentha spicata): cDNA isolation, characterization
and bacterial expression of the catalytically active monoterpene
LITERATURE CITED
cyclase. J. Biol. Chem. 268: 2301623024.
Croteau, R., 1987 Biosynthesis and catabolism of monoterpenoids. Altschul, S. F., W. Gish, W. Miller, E. W. Myers and D. J. Lipman,
Chem. Rev. 87: 929954. 1990 Basic local alignment search tool. J. Mol. Biol. 215: 403
410. Crowell, P. L., and M. N. Gould, 1994 Chemoprevention and
831 Evolution of Plant Terpene Synthases
therapy of cancer by d-limonene. CRC Crit. Rev. Oncogenesis 5: Products Chemistry: Isoprenoids Including Steroids and Cartenoids, Vol.
2, edited by D. E. Cane. Pergamon, Oxford. 122.
Cseke, L., N. Dudareva and E. Pichersky, 1998 Structure and Kumar, V., and M. Trick, 1993 Sequence of the S receptor kinase
gene family in Brassica. Mol. Gen. Genet. 241: 440446. evolution of linalool synthase. Mol. Biol. Evol. 15: 14911498.
Davis, E. M., Y.-S. Chen, M. Essenberg and M. L. Pierce, 1998 Lesburg, C. A., G. Zhai, D. E. Cane and D. W. Christianson, 1997
Crystal structure of pentalenene synthase: mechanistic insights cDNA sequence of a ()-delta-cadinene synthase gene induced
in Gossypium hirsutum L. by bacterial infection. Plant Physiol. 116: on terpenoid cyclization reactions in biology. Science 277: 1820
1824. 1192.
Dawson, F. A., 1994 The Amazing Terpenes. Naval Stores Rev. Lewinsohn, E., T. Savage, M. Gijzen and R. Croteau, 1993 Simul-
taneous analysis of monoterpenes and diterpenoids of conifer March/April: 612.
Derr, L. K., J. N. Strathern and D. J. Garnkel, 1991 RNA- oleoresin. Phytochem. Anal. 4: 220225.
Li, W., 1997 Molecular Evolution. Sinauer Associates, Sunderland, mediated recombination in S. cerevisiae. Cell 67: 355364.
Don, R. H., P. T. Cox, B. J. Wainwright, K. Baker and J. S. Mattick, MA.
Lichtenthaler, H. K., 1999 The 1-deoxy-D-xylulose-5-phosphate 1991 Touchdown PCR to circumvent spurious priming during
gene amplication. Nucleic Acids Res. 19: 4008. pathway of isoprenoid biosynthesis in plants. Annu. Rev. Plant
Physiol. Plant Mol. Biol. 50: 4766. Doyle, J. A., 1998 Phylogeny of vascular plants. Annu. Rev. Ecol.
Syst. 29: 567599. MacMillan, J., and M. Beale, 1999 Diterpene biosynthesis, pp.
217243 in Comprehensive Natural Products Chemistry: Isoprenoids Eckermann, S., G. Schrsder, J. Schmidt, D. Strack, R. A. Edrada
et al., 1998 New pathway to polyketides in plants. Nature 396: Including Steroids and Carotenoids, Vol. 2, edited by D. E. Cane.
Pergamon, Oxford. 387390.
Eisenreich, W., M. Schwarz, A. Cartayrade, D. Arigoni, M. H. Mapleston, R. A., M. J. Stone and D. H. Williams, 1992 The
evolutionary role of secondary metabolitesa review. Gene 115: Zenk et al., 1998 The deoxyxylulose phosphate pathway of terpe-
noid biosynthesis in plants and microorganisms. Chem. Biol. 5: 151157.
Mathews, C. M., and C. A. N. Trotman, 1998 Ancient and recent R221R233.
Facchini, P. J., and J. Chappell, 1992 Gene family for an elicitor- intron stability in Artemia hemoglobin gene. J. Mol. Evol. 47:
763771. induced sesquiterpene cyclase in tobacco. Proc. Natl. Acad. Sci.
USA 89: 1108811092. Mathews, S., M. Lavin and R. A. Sharrock, 1995 Evolution of the
phytochrome gene family and its utility for phylogenetic analysis Fink, G. R., 1987 Pseudogenes in yeast? Cell 49: 56.
Frugoli, J. A., M. A. McPeek, T. L. Thomas and C. R. McClung, of angiosperms. Ann. Mo. Bot. Gard. 82: 296321.
Mau, C. J. D., and C. A. West, 1994 Cloning of casbene synthase 1998 Intron loss and gain during evolution of the catalase gene
family in angiosperms. Genetics 149: 355365. cDNA: evidence for conserved structural features among terpe-
noid cyclases in plants. Proc. Natl. Acad. Sci. USA 91: 84798501. Fryxell, K. J., 1996 The coevolution of gene family trees. Trends
Genet. 12: 356369. Newman, J. D., and J. Chappell, 1999 Isoprenoid biosynthesis in
plants: carbon partitioning within the cytoplasmic pathway. Crit. Hager, K.-P., B. Mu ller, C. Wind, S. Erbach and H. Fischer, 1996
Evolution of legumin genes: loss of an ancestral intron at the Rev. Biochem. Mol. Biol. 34: 95106.
Ogura, K., and T. Koyama, 1998 Enzymatic aspects of isoprenoid beginning of angiosperm diversication. FEBS Lett. 387: 9498.
Hanley, B. A., and M. A. Schuler, 1988 Plant intron sequences: chain elongation. Chem. Rev. 98: 12631276.
Patterson, A. H., C. L. Brubaker and J. F. Wendel, 1993 A rapid evidence for distinct groups of introns. Nucleic Acids Res. 16:
71597176. method for extraction of cotton (Gossypium spp.) genomic DNA
suitable for RFLP or PCR analysis. Plant Mol. Biol. Rep. 11: Harborne, J. B., 1991 Recent advances in the ecological chemistry
of plant terpenoids, pp. 396426 in Ecological Chemistry and Bio- 122125.
Qureshi, N., and J. W. Porter, 1981 Conversionof acetyl-Coenzyme chemistry of Plant Terpenoids, edited by J. B. Harborne and F. A.
Tomas-Barberan. Clarendon Press, Oxford. A to isopentenyl pyrophosphate, pp. 4794 in Biosynthesis of Iso-
prenoid Compounds, Vol. 1, edited by J. W. Porter and S. L. Hohn, T. M., 1999 Cloning and expression of terpene synthase
genes, pp. 201215 in Comprehensive Natural Products Chemistry: Spurgeon. John Wiley & Sons, New York.
Ramos-Valdivia, A. C., R. van der Heiden and R. Verpoorte, 1997 Isoprenoids Including Steroids and Carotenoids, Vol. 2, edited by D. E.
Cane. Pergamon, Oxford. Isopentenyl diphosphate isomerase: a core enzyme in isoprenoid
biosynthesis. A review of its biochemistry and function. Nat. Prod. Hohn, T. M., and P. D. Beremand, 1989 Isolation and nucleotide
sequence of a sesquiterpene cyclase gene fromthe trichothecene- Rep. 14: 591603.
Sambrook, J., E. F. Fritsch and T. Maniatis, 1989 Molecular Clon- producing organism Fusarium sporotrichioides. Gene 79: 131138.
Holmes, F. A., A. P. Kudelka, J. J. Kavanagh, M. H. Huber, J. A. ing: A Laboratory Manual, Ed. 2. Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, NY. Ajani et al., 1995 Current status of clinical trials with paclitaxel
and docetaxel, pp. 3157 in Taxane Anticancer Agents: Basic Science Schrsder, J., 1997 A family of plant-specic polyketide synthases:
facts and predictions. Trends Plant Sci. 2: 373378. and Current Status, edited by G. I. Georg, T. T. Chen, I. Ojima
and D. M. Vyas. American Chemical Society Symposium Series Schrsder, J., 1999 The chalcone/stilbene synthase-type family of
condensing enzymes, pp. 749771 in Comprehensive Natural Prod- 583, Washington, DC.
Huang, N., T. D. Sutliff, J. C. Litts and R. L. Rodriguez, 1990 Clas- ucts Chemistry: Polyketides and Other Secondary Metabolites Including
Fatty Acids and Their Derivatives, Vol. 1, edited by U. Sankawa. sication and characterization of the rice -amylase multigene
family. Plant Mol. Biol. 14: 655668. Elsevier Science, Oxford.
Starks, C. M., K. Back, J. Chappell and J. P. Noel, 1997 Structural Jarvis, B. B., 2000 The role of natural products in evolution, pp.
124 in Recent Advances in Phytochemistry-Evolution of Metabolic Path- basis for cyclic terpene biosynthesis by tobacco 5-epi-aristolo-
chene synthase. Science 277: 18151820. ways, Vol. 34, edited by J. T. Romeo, R. K. Ibrahim, L. Varin
and V. DeLuca. Plenum Press, NY. Steele, C. L., J. Crock, J. Bohlmann and R. Croteau, 1998 Sesqui-
terpene synthases from grand r (Abies grandis). Comparison of Jarvis, B. B., and J. M. Miller, 1996 Natural products, complexity,
and evolution, pp. 265293 in Phytochemical Diversity and Redun- constitutive and wound-inducible activities, and cDNA isolation,
characterization, and bacterial expression of -selinene synthase dancy in Ecological Interactions, edited by J. T. Romeo. Plenum
Press, NY. and -humulene synthase. J. Biol. Chem. 273: 20782089.
Stofer Vogel, B., M. Wildung, G. Vogel and R. Croteau, 1996 Katoh, S., and R. Croteau, 1998 Individual variation in constitutive
and induced monoterpene biosynthesis in grand r (Abies Abietadiene synthase from grand r (Abies grandis): cDNA isola-
tion, characterization and bacterial expression of a bifunctional grandis). Phytochemistry 47: 577582.
Koepp, A. E., M. Hezari, J. Zajicek, B. Stofer Vogel, R. E. LaFever diterpene cyclase involved in resin acid biosynthesis. J. Biol.
Chem. 271: 2326223268. et al., 1995 Cyclization of geranylgeranyl diphosphate to taxa-
4(5),11(12)-diene is the committed step of Taxol biosynthesis in Sun, T., and Y. Kamiya, 1994 The Arabidopsis GA1 locus encodes
the cyclase ent-kaurene synthetase A of gibberellin biosynthesis. Pacic yew. J. Biol. Chem. 270: 86868690.
Koyama, T., and K. Ogura, 1999 Isopentenyl diphosphate iso- Plant Cell 6: 15091518.
Sun, T. P., H. M. Goodman and F. M. Ausubel, 1992 Cloning the merase and prenyltransferases, pp. 6996 in Comprehensive Natural
832 S. C. Trapp and R. B. Croteau
Arabidopsis GA1 locus by genomic subtraction. Plant Cell 4: Williams, D. C., D. J. McGarvey, E. J. Katahira and R. Croteau,
119128. 1998 Truncation of limonene synthase preprotein provides a
Trapp, S., T. M. Hohn, S. McCormick and B. B. Jarvis, 1998 Char- fully active pseudomature form of this monoterpene cyclase
acterization of the gene cluster for biosynthesis of macrocyclic and reveals the function of the amino-terminal arginine pair.
trichothecenes in Myrothecium roridum. Mol. Gen. Genet. 257:
Biochemistry 37: 1221312220.
421432.
Williams, D. H., M. J. Stone, P. R. Hauck and S. K. Rahman, 1989
Turner, G., 1993 Gene organization in lamentous fungi, pp. 107
Why are secondary metabolites (natural products) synthesized?
125 in The Euykaryotic Genome: Organization and Regulation, edited
J. Nat. Prod. 52: 11891208.
by P. M. A. Borda, S. Oliver and P. F. G. Sims. Cambridge
Wise, M. L., and R. Croteau, 1999 Monoterpene biosynthesis, pp.
University Press, New York.
97153 in Comprehensive Natural Products Chemistry: Isoprenoids In-
Van Geldre, E., A. Vergauwe and E. Van den Eeckhout, 1997
cluding Steroids and Carotenoids, Vol. 2, edited by D. E. Cane.
State of the art of the production of the antimalarial compound
Pergamon, Oxford.
artemisinin in plants. Plant Mol. Biol. 33: 199209.
Yamaguchi, S., T. P. Sun, H. Kawaide and Y. Kamiya, 1998 The
Waters, E. R., and E. Vierling, 1999 The diversication of plant
GA2 locus of Arabidopsis thaliana encodes ent-kaurene synthase
cytosolic small heat shock proteins preceded the divergence of
of gibberellin biosynthesis. Plant Physiol. 116: 12711278.
mosses. Mol. Biol. Evol. 16: 127139.
Yuba, A., K. Yazaki, M. Tabata, G. Honda and R. Croteau, 1996
Wendt, K. U., K. Poralla and G. E. Szhulz, 1997 Structure and
cDNA cloning, characterization, and functional expression of 4S-
function of a squalene cyclase. Science 277: 18111815.
()-limonene synthase from Perilla frutescens. Arch. Biochem.
West, C. A., 1981 Biosynthesis of diterpenes, pp. 375411 in Biosyn-
Biophys. 332: 280287.
thesis of Isoprenoid Compounds, Vol. 1, edited by J. W. Porter and
Zinkel, D. F., and J. Russell, 1989 Naval Stores: Production, Chemistry,
S. L. Spurgeon. John Wiley & Sons, New York.
Utilization. Pulp Chemicals Association, New York, 1060 pp.
Wildung, M. R., and R. Croteau, 1996 A cDNA clone for taxadiene
synthase, the diterpene cyclase that catalyzes the committed step
Communicating editor: V. L. Chandler
of Taxol biosynthesis. J. Biol. Chem. 271: 92019204.