You are on page 1of 37

J OP: A J ava Optimized Processor

for Embedded Real-Time Systems


Martin Schberl
University of Technology Vienna,
Austria
J OP Overview 2
J OP Research Targets
J ava processor
Time-predictable architecture
Small design
Working solution (FPGA)
J OP Overview 3
Overview
Motivation
Research objectives
J ava and the J VM
Related work
J OP architecture
Results
Conclusions, future work
J OP Overview 4
Current Praxis
C and assembler
Embedded systems are RT systems
Different RTOS
J IT is not possible
J VM interpreter are slow
=> J ava processor
J OP Overview 5
Why J ava?
Safe OO language
No pointers
Type-safety
Garbage collection
Built in model for concurrency
Platform independent
Very rich standard library
J OP Overview 6
Research Objectives
Primary objectives:
Time-predictable J ava platform
Small design
A working processor
Secondary objectives:
Acceptable performance
A flexible architecture
Real-time profile for J ava
J OP Overview 7
J ava and the J VM
J ava language definition
Class library
The J ava virtual machine (J VM)
An instruction set the bytecodes
A binary format the class file
An algorithm to verify the class file
J OP Overview 8
The J VM instruction set
32 (64) bit stack machine
Variable length instruction set
Simple to very complex instructions
Symbolic references
Only relative branches
J OP Overview 9
Memory Areas for the J VM
Stack
Most often accessed
On-chip memory as cache
Code
Novel instruction cache
Class description and constant pool
Heap
J OP Overview 10
Implementations of the J VM
Interpreter
J ust-in-time compilation
Batch compilation
Hardware implementation
J OP Overview 11
Related Work
picoJ ava
SUN, never released
aJ ile J EMCore
Available, RTSJ , two versions
Komodo
Multithreaded J ava processor
FemtoJ ava
Application specific processor
J OP Overview 12
Research Objectives
picoJ ava aJ ile Komodo FemtoJ ava J OP
Predictability - - . - . + +
Size - - - + - + +
Performance + + + - - - +
J VM conf. + + + - - - .
Flexibility - - - - + + + + +
J OP Overview 13
J OP Architecture
Overview
Microcode
Processor pipeline
An efficient stack machine
Instruction cache
J OP Overview 14
J OP Block Diagram
J OP Overview 15
J VM Bytecode Issue
Simple and complex instruction mix
No bytecodes for native functions
Common solution (e.g. in picoJ ava):
Implement a subset of the bytecodes
SW trap on complex instructions
Overhead for the trap 16 to 926 cycles
Additional instructions (115!)
J OP Overview 16
J OP Solution
Translation to microcode in hardware
Additional pipeline stage
No overhead for complex bytecodes
1 to 1 mapping results in single cycle
execution
Microcode sequence for more complex
bytecodes
Bytecodes can be implemented in J ava
J OP Overview 17
Microcode
Stack-oriented
Compact
Constant length
Single cycle
Low-level HW
access
An example
dup: dup nxt // 1 to 1 mapping
// a and b are scratch variables
// for the JVM code.
dup_x1: stm a // save TOS
stm b // and TOS1
ldm a // duplicate TOS
ldm b // restore TOS1
ldm a nxt // restore TOS
// and fetch next bytecode
J OP Overview 18
Processor Pipeline
J OP Overview 19
Interrupts
Interrupt logic at bytecode translation
Special bytecode
Transparent to the core pipeline
Interrupts under scheduler control
Priority for device drivers
No additional blocking time
Integration in schedulability analysis
J itter free timer events
Bound to a thread
J OP Overview 20
An Efficient Stack Machine
J VM stack is a logical stack
Frame for return information
Local variable area
Operand stack
Argument-passing regulates the layout
Operand stack and local variables need
caching
J OP Overview 21
Stack access
Stack operation
Read TOS and TOS-1
Execute
Write back TOS
Variable load
Read from deeper stack location
Write into TOS
Variable store
Read TOS
Write into deeper stack location
J OP Overview 22
Two-Level Stack Cache
Dual read only from TOS and
TOS-1
Two register (A/B)
Dual-port memory
Simpler Pipeline
No forwarding logic
Instruction fetch
Instruction decode
Execute, load or store
J OP Overview 23
J VM Properties
Short methods
Maximum method size is restricted
No branches out of or into a method
Only relative branches
J OP Overview 24
Proposed Cache Solution
Full method cached
Cache fill on call and return
Cache misses only at these bytecodes
Relative addressing
No address translation necessary
No fast tag memory
Simpler WCET analysis
J OP Overview 25
Architecture Summary
Microcode
1+3 stage pipeline
Two-level stack cache
Method cache
The JVM is a CISC stack architecture,
whereas JOP is a RISC stack architecture.
J OP Overview 26
Results
Size
Compared to soft-core processors
General performance
Application benchmark (KFL & UDP/IP)
Various J ava systems
Real-time performance
100MHz J OP 266MHz Pentium MMX
Simple RT profile RTSJ /RT-Linux
J OP Overview 27
Size of FPGA processors
Processor Resources Memory f
max
[LC] [KB] [MHz]
J OP min. 1077 3.25 98
J OP typ. 1831 3.25 101
Lightfoot 3400 1 40
Komodo 2600 ? 33/4
FemtoJ ava 2000 ? 4
NIOS 2923 5.5 119
SPEAR 1700 8 80
J OP Overview 28
Application Benchmark
1
10
100
1000
10000
100000
l
e
J
O
S
K
V
M
T
I
N
I
C
j
i
p
K
o
m
o
d
o
a
J
8
0
E
J
C
a
J
1
0
0
J
O
P
g
c
j
P
e
r
f
o
r
m
a
n
c
e

(
i
t
e
r
a
t
i
o
n
s
/
s
)
J OP Overview 29
Benchmark Scaled
0.1
1.0
10.0
100.0
1000.0
l
e
J
O
S
K
V
M
T
I
N
I
C
j
i
p
K
o
m
o
d
o
a
J
8
0
E
J
C
a
J
1
0
0
J
O
P
g
c
j
R
e
l
a
t
i
v
e

p
e
r
f
o
r
m
a
n
c
e

(
i
t
e
r
a
t
i
o
n
/
s
)
J OP Overview 30
Periodic Thread J itter
Period J OP RTSJ /Linux
Min. Max. Min. Max.
50 us 35 us 63 us - -
70 us 70 us 70 us - -
100 us 100 us 100 us - -
5 ms 5 ms 5 ms 0.017 ms 19.9 ms
10 ms 10 ms 10 ms 0.019 ms 19.9 ms
30 ms 30 ms 30 ms 29.7 ms 30.3 ms
35 ms 35 ms 35 ms 29.8 ms 40.3 ms
J OP Overview 31
Context Switch
J OP RTSJ /Linux
Min. Max. Min. Max.
Thread 2676 2709 11529 21090
SW Event 2773 2935 63060 101292
Low priority thread records current time
High priority periodic/event thread measures
elapsed time after unblocking
Time in cycles
J OP Overview 32
Applications
Kippfahrleitung
Distributed motor control
BB
Vereinfachtes Zugleitsystem
GPS, GPRS, supervision
TeleAlarm
Remote tele-control
Data logging
Automation
J OP Overview 33
J OP in Research
University of Lund, SE
Application specific hardware (J ava->VHDL)
Hardware garbage collector
Technical University Graz, AT
HW accelerator for encryption
University of York, GB
J avamen HW for real-time systems
Institute of Informatics at CBS, DK
RT GC, Embedded RT Machine Learning
University of California, Irvine, USA
WCET Analysis
J OP Overview 34
J OP for Teaching
Easy access open-source
Computer architecture
Embedded systems
UT Vienna
J VM in hardware course
Digital signal processing lab
CBS
Distributed data mining (WS 2005)
Very small information systems (SS 2006)
Wikiversity
J OP Overview 35
Contributions
Real-time J ava processor
Exactly known execution time of the BCs
No mutual dependency between BCs
Time-predictable method cache
Resource-constrained processor
RISC stack architecture
Efficient stack cache
Flexible architecture
J OP Overview 36
Future Work
HW for Real-time garbage collector
Instruction cache WC analysis
Multiprocessor J VM
J ava computer
J OP Overview 37
More Information
J OP Thesis and source
http://www.jopdesign.com/thesis/index.jsp
http://www.jopdesign.com/download.jsp
Various papers
http://www.jopdesign.com/docu.jsp