Martin Schöberl
University of Technology Vienna, Austria

JOP Overview


JOP Research Targets
- Java processor
- Time-predictable architecture
- Small design
- Working solution (FPGA)


Overview
- Motivation
- Research objectives
- Java and the JVM
- Related work
- JOP architecture
- Results
- Conclusions, future work


Current Practice
- C and assembler
- Embedded systems are RT systems
- Different RTOS
- JIT is not possible
- JVM interpreters are slow
- => Java processor


Why Java?
- Safe OO language
  - No pointers
  - Type-safety
  - Garbage collection
- Built-in model for concurrency
- Platform independent
- Very rich standard library


Research Objectives
- Primary objectives:
  - Time-predictable Java platform
  - Small design
  - A working processor
- Secondary objectives:
  - Acceptable performance
  - A flexible architecture
  - Real-time profile for Java


Java and the JVM
- Java language definition
- Class library
- The Java virtual machine (JVM)
  - An instruction set: the bytecodes
  - A binary format: the class file
  - An algorithm to verify the class file


The JVM instruction set
- 32 (64) bit stack machine
- Variable length instruction set
- Simple to very complex instructions
- Symbolic references
- Only relative branches


Memory Areas for the JVM
- Stack
  - Most often accessed
  - On-chip memory as cache
- Code
  - Novel instruction cache
- Class description and constant pool
- Heap


Implementations of the JVM
- Interpreter
- Just-in-time compilation
- Batch compilation
- Hardware implementation


Related Work
- picoJava
  - SUN, never released
- aJile JEMCore
  - Available, RTSJ, two versions
- Komodo
  - Multithreaded Java processor
- FemtoJava
  - Application specific processor


Research Objectives

                  picoJava  aJile  Komodo  FemtoJava  JOP
  Predictability     --       .      -         .      ++
  Size               --       -      +         -      ++
  Performance        ++       +      -         --     +
  JVM conf.          ++       +      -         --     .
  Flexibility        --       --     +         ++     ++


JOP Architecture
- Overview
- Microcode
- Processor pipeline
- An efficient stack machine
- Instruction cache


JOP Block Diagram
[Figure: JOP block diagram]


JVM Bytecode Issue
- Simple and complex instruction mix
- No bytecodes for native functions
- Common solution (e.g. in picoJava):
  - Implement a subset of the bytecodes
  - SW trap on complex instructions
    - Overhead for the trap: 16 to 926 cycles
  - Additional instructions (115!)


JOP Solution
- Translation to microcode in hardware
- Additional pipeline stage
- No overhead for complex bytecodes
  - 1 to 1 mapping results in single cycle execution
  - Microcode sequence for more complex bytecodes
- Bytecodes can be implemented in Java


Microcode
- Stack-oriented
- Compact
- Constant length
- Single cycle
- Low-level HW access

An example:

    dup:      dup nxt      // 1 to 1 mapping

    // a and b are scratch variables
    // for the JVM code.
    dup_x1:   stm a        // save TOS
              stm b        // and TOS-1
              ldm a        // duplicate TOS
              ldm b        // restore TOS-1
              ldm a nxt    // restore TOS
                           // and fetch next bytecode


Processor Pipeline
[Figure: processor pipeline]


Interrupts
- Interrupt logic at bytecode translation
  - Special bytecode
  - Transparent to the core pipeline
- Interrupts under scheduler control
  - Priority for device drivers
  - No additional blocking time
  - Integration in schedulability analysis
- Jitter-free timer events
  - Bound to a thread


An Efficient Stack Machine
- JVM stack is a logical stack
  - Frame for return information
  - Local variable area
  - Operand stack
- Argument-passing regulates the layout
- Operand stack and local variables need caching


Stack access
- Stack operation
  - Read TOS and TOS-1
  - Execute
  - Write back TOS
- Variable load
  - Read from deeper stack location
  - Write into TOS
- Variable store
  - Read TOS
  - Write into deeper stack location


Two-Level Stack Cache
- Dual read only from TOS and TOS-1
  - Two registers (A/B)
  - Dual-port memory
- Simpler pipeline
  - No forwarding logic
- Instruction fetch
- Instruction decode
- Execute, load or store


JVM Properties
- Short methods
- Maximum method size is restricted
- No branches out of or into a method
- Only relative branches


Proposed Cache Solution
- Full method cached
- Cache fill on call and return
  - Cache misses only at these bytecodes
- Relative addressing
  - No address translation necessary
- No fast tag memory
- Simpler WCET analysis


Architecture Summary
- Microcode
- 1+3 stage pipeline
- Two-level stack cache
- Method cache

The JVM is a CISC stack architecture, whereas JOP is a RISC stack architecture.


Results
- Size
  - Compared to soft-core processors
- General performance
  - Application benchmark (KFL & UDP/IP)
  - Various Java systems
- Real-time performance
  - 100 MHz JOP vs. 266 MHz Pentium MMX
  - Simple RT profile vs. RTSJ/RT-Linux


Size of FPGA processors

  Processor    Resources [LC]   Memory [KB]   f max [MHz]
  JOP min.          1077           3.25            98
  JOP typ.          1831           3.25           101
  Lightfoot         3400           1               40
  Komodo            2600           ?               33/4
  FemtoJava         2000           ?                4
  NIOS              2923           5.5            119
  SPEAR             1700           8               80


Application Benchmark
[Figure: Performance (iterations/s), logarithmic axis from 1 to 100000, for leJOS, KVM, TINI, Cjip, Komodo, aJ80, EJC, aJ100, JOP, gcj]


Benchmark Scaled
[Figure: Relative performance (iteration/s), logarithmic axis from 0.1 to 1000.0, for leJOS, KVM, TINI, Cjip, Komodo, aJ80, EJC, aJ100, JOP, gcj]


Periodic Thread Jitter

  Period     JOP                   RTSJ/Linux
             Min.      Max.        Min.        Max.
  50 us      35 us     63 us       -           -
  70 us      70 us     70 us       -           -
  100 us     100 us    100 us      -           -
  5 ms       5 ms      5 ms        0.017 ms    19.9 ms
  10 ms      10 ms     10 ms       0.019 ms    19.9 ms
  30 ms      30 ms     30 ms       29.7 ms     30.3 ms
  35 ms      35 ms     35 ms       29.8 ms     40.3 ms


Context Switch

             JOP                   RTSJ/Linux
             Min.      Max.        Min.        Max.
  Thread     2676      2709        11529       21090
  SW Event   2773      2935        63060       101292

- Low priority thread records current time
- High priority periodic/event thread measures elapsed time after unblocking
- Time in cycles


Applications
- Kippfahrleitung
  - Distributed motor control
- ÖBB
  - Vereinfachtes Zugleitsystem
  - GPS, GPRS, supervision
- TeleAlarm
  - Remote tele-control
  - Data logging
  - Automation


JOP in Research
- University of Lund, SE
  - Application specific hardware (Java->VHDL)
  - Hardware garbage collector
- Technical University Graz, AT
  - HW accelerator for encryption
- University of York, GB
  - Javamen: HW for real-time systems
- Institute of Informatics at CBS, DK
  - RT GC, Embedded RT Machine Learning
- University of California, Irvine, USA
  - WCET Analysis


JOP for Teaching
- Easy access: open-source
- Computer architecture
- Embedded systems
- UT Vienna
  - JVM in hardware course
  - Digital signal processing lab
- CBS
  - Distributed data mining (WS 2005)
  - Very small information systems (SS 2006)
- Wikiversity


Contributions
- Real-time Java processor
  - Exactly known execution time of the BCs
  - No mutual dependency between BCs
  - Time-predictable method cache
- Resource-constrained processor
  - RISC stack architecture
  - Efficient stack cache
  - Flexible architecture


Future Work
- HW for real-time garbage collector
- Instruction cache WC analysis
- Multiprocessor JVM
- Java computer


More Information
- JOP thesis and source
  - http://www.jopdesign.com/thesis/index.jsp
  - http://www.jopdesign.com/download.jsp
- Various papers
  - http://www.jopdesign.com/docu.jsp
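

Appendix sketch: the dup_x1 microcode sequence

The Microcode slide gives dup as a 1 to 1 mapping and dup_x1 as a five-step sequence using the scratch variables a and b. The following Java model is only an illustrative sketch of that sequence, not JOP's implementation. The class name MicrocodeSketch and its methods are hypothetical, and it assumes that stm pops the top of stack into a scratch variable and ldm pushes a scratch variable back onto the stack.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Toy model of the dup and dup_x1 microcode sequences (hypothetical names, illustration only). */
final class MicrocodeSketch {
    // Operand stack; the head of the deque is TOS.
    private final Deque<Integer> stack = new ArrayDeque<>();
    // Scratch variables a and b, as named in the microcode comments.
    private int a;
    private int b;

    void push(int value) {
        stack.push(value);
    }

    /** dup: one microinstruction, i.e. a 1 to 1 mapping. */
    void dup() {
        stack.push(stack.peek());
    }

    /** dup_x1: ..., v2, v1 becomes ..., v1, v2, v1. */
    void dupX1() {
        a = stack.pop();   // stm a      save TOS
        b = stack.pop();   // stm b      and TOS-1
        stack.push(a);     // ldm a      duplicate TOS
        stack.push(b);     // ldm b      restore TOS-1
        stack.push(a);     // ldm a nxt  restore TOS (nxt would fetch the next bytecode)
    }

    /** Returns the stack contents ordered from bottom to top. */
    int[] contents() {
        int[] out = new int[stack.size()];
        int i = out.length;
        for (int v : stack) {   // iteration starts at TOS
            out[--i] = v;
        }
        return out;
    }

    public static void main(String[] args) {
        MicrocodeSketch m = new MicrocodeSketch();
        m.push(1);
        m.push(2);
        m.dupX1();
        System.out.println(Arrays.toString(m.contents())); // prints [2, 1, 2]
    }
}
```

In this model dup_x1 costs five steps against one for dup, which matches the slide's distinction between single-cycle 1 to 1 mappings and microcode sequences for more complex bytecodes.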