The State of LuaJIT for ARM64 - August/September 2016

This page is outdated. Please see LuaJIT for ARM64 for the status of LuaJIT JIT support on ARM64.

This is a detailed plan for enabling ARM64 JIT support in LuaJIT.  This will be updated as development progresses.

LuaJIT Technical Investigation

DONE - See the following links:

ARM64 LuaJIT Interpreter Bug

DONE - This has been fixed in the upstream git repository branch 'v2.1' by the previous maintainer:

commit 0c6fdc1039a3a4450d366fba7af4b29de73f0dc6
Author: Mike Pall <mike>
Date:   Mon Apr 18 10:57:49 2016 +0200

Rewrite memory block allocator.

Use a mix of linear probing and pseudo-random probing.
Workaround for 1GB MAP_32BIT limit on Linux/x64. Now 2GB with !LJ_GC64.
Enforce 128TB LJ_GC64 limit for > 47 bit memory layouts (ARM64).
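
The 128TB figure in the commit message corresponds to 2^47 bytes. As a rough illustration of what enforcing such a limit involves, the sketch below checks whether a memory block lies entirely below a 47-bit boundary. The names GC64_ADDR_BITS and block_fits_gc64 are hypothetical and are not taken from the LuaJIT sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the commit message: usable addresses are below 2^47 (128TB). */
#define GC64_ADDR_BITS 47

/* Returns true if the block [addr, addr + size) ends at or below the limit. */
static bool block_fits_gc64(uint64_t addr, uint64_t size)
{
  uint64_t limit = (uint64_t)1 << GC64_ADDR_BITS;
  /* Written this way so that addr + size cannot overflow. */
  return size <= limit && addr <= limit - size;
}
```

An allocator following this idea would discard any block the kernel returns for which the check fails and then probe a different address.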

Update LuaJIT 2.1 Beta Memory Block Allocator Patch to Fix ARM64 Interpreter Bug

UNKNOWN - The previous maintainer rewrote the memory block allocator but has not yet posted an updated LuaJIT 2.1 Beta to reflect the change.

 

Official 2.1 release with LuaJIT ARM64 Interpreter Support

IN PROGRESS - Presumably this is the purview of the maintainer. The previous maintainer has expressed his intention to do this, but has not given an updated timetable.

Identify new LuaJIT Maintainer

UNKNOWN - This is a community effort and no one has yet stepped forward.

Create Linaro LuaJIT git Repository Working Branch For ARM64 Port

DONE - http://git.linaro.org/toolchain/luajit-aarch64.git

ARM64 LuaJIT JIT port

DONE - LuaJIT runs code in the interpreter until it detects a hot spot, then generates SSA IR (intermediate representation) and attempts to compile it with the JIT. The general strategy for developing the AArch64 JIT port is to:

    1. Create the AArch64 framework based on the ARM32 port, with all of the architecture-specific functions stubbed out and marked as unimplemented ("lua_unimpl").
    2. Create a series of simple programs which, when executed with the JIT enabled, expose the stub functions as unimplemented.
    3. Sequentially implement the stub functions for ARM64.
    4. Increase the complexity of the test programs to expose more stubs.
    5. Enable a program that forces garbage collection.
    6. Work on missing LJ_GC64 support as exposed by the garbage collector.

      Functional completeness has been announced by the project team: 
      https://lists.linaro.org/pipermail/luajit/2016-September/000081.html 
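
The stub-driven workflow in steps 1 to 4 can be sketched roughly as follows. This is only an illustration: the macro UNIMPL_STUB, the reporting helper, and the two backend function names are hypothetical stand-ins for the "lua_unimpl" marker, and the sketch records hits instead of aborting so that a test run can list every stub it reached.

```c
#include <stdio.h>

/* Hypothetical bookkeeping: how many stubs were hit, and which one last. */
static int unimpl_hits = 0;
static const char *last_unimpl = NULL;

static void report_unimpl(const char *func, const char *file, int line)
{
  unimpl_hits++;
  last_unimpl = func;
  fprintf(stderr, "unimplemented: %s (%s:%d)\n", func, file, line);
}

/* Marks a backend function as not yet ported. */
#define UNIMPL_STUB() report_unimpl(__func__, __FILE__, __LINE__)

/* Hypothetical backend hooks, stubbed out as in step 1. */
static void asm_add_stub(void) { UNIMPL_STUB(); }
static void asm_sub_stub(void) { UNIMPL_STUB(); }
```

Running a test program against such stubs shows which backend functions it needs, which gives the order in which to implement them.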

Enable the JIT on ARM64

DONE - By default the interpreter is enabled and the JIT is disabled.

Create ARM64 Stubs Informed by the ARM Port and Mark Them as Unimplemented

DONE - In total there are 96 unimplemented stubs to work from:

src/lj_asm_arm64.h:69
src/lj_emit_arm64.h:25
src/lj_target_arm64.h:1
src/luaconf.h:1
Total: 96

Note: The complexity of each individual stub function varies, so progress cannot be accurately evaluated from the number of stubs completed.

Create Test Programs To Expose Unimplemented Stubs

DONE - A series of increasingly complex test applications will be written to expose the unimplemented stubs.

Create Simple Test Program To Expose Initial Unimplemented Stubs

DONE - The first simple test program will be a summation of the numbers 1 to 100 in a for loop. 
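
The actual test program would be written in Lua and run under LuaJIT. The C sketch below only mirrors the loop so that the expected result is explicit; the function name is hypothetical.

```c
/* Sum the integers 1..n with a plain counting loop. */
static long sum_1_to(int n)
{
  long sum = 0;
  for (int i = 1; i <= n; i++)
    sum += i;
  return sum;
}
```

For n = 100 the closed form n(n+1)/2 gives 5050, which is the value a correctly compiled trace has to produce as well.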

Enhance Complexity Of Test Program When Simple Test Stops Hitting Stub Functions

DONE

Execute The Test-Suite When Test Programs No Longer Expose Unimplemented Stubs

DONE

 

Port Stub Functions to ARM64 when running with the JIT enabled against test programs

DONE - This is the bulk of the porting effort. This includes both lj_emit and lj_asm for ARM64 as well as LuaJIT IR functions.

Get simple test-program working with the JIT.

DONE - This is of unknown complexity. We're hoping that this will be completed by Wednesday, May 18.

May 20, 2016 - The simple test program now completes traced execution after a bug in the instruction encoding was fixed. Development has moved on to implementing the trace exit handler.

Open tree for collaboration and divide remaining stubs between participants

DONE - Once the simple test program is executing with the JIT we will divide the remaining stubs between participants who will execute test-programs of varying complexity to expose the remaining stubs.


LJ_GC64 Support for ARM64

DONE - This is the ARM64 implementation of the following LuaJIT issue: https://github.com/LuaJIT/LuaJIT/issues/25. It was originally hypothesized that the issue would not be hit until the garbage collector runs. Note: RT-RK has raised the concern that the LJ_GC64 issue will be hit much sooner than that, for instance when allocating strings or using global variables. The sub-tasks have been updated to reflect this.
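
One reason LJ_GC64 touches so much code is that, with 64-bit GC references, a tagged value has to carry both a type tag and a pointer in a single 64-bit word. The sketch below assumes a 47-bit pointer field with the tag stored in the bits above it, in line with the 47-bit limit mentioned in the allocator commit. The names and the exact layout are illustrative assumptions, not LuaJIT's definitions.

```c
#include <stdint.h>

#define REF_PTR_BITS 47
#define REF_PTR_MASK (((uint64_t)1 << REF_PTR_BITS) - 1)

/* Pack a tag and an address into one 64-bit word (tag above the pointer). */
static uint64_t ref_pack(uint64_t addr, uint32_t tag)
{
  return ((uint64_t)tag << REF_PTR_BITS) | (addr & REF_PTR_MASK);
}

/* Recover the address field; any bits above bit 46 were lost when packing. */
static uint64_t ref_addr(uint64_t v)
{
  return v & REF_PTR_MASK;
}

/* Recover the tag field. */
static uint32_t ref_tag(uint64_t v)
{
  return (uint32_t)(v >> REF_PTR_BITS);
}
```

In this layout an address with bit 47 or higher set does not survive the round trip, which is why any object that can be referenced this way (including strings and globals) has to be allocated below the limit.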

Port work-in-progress branch for x86_64 LJ_GC64 to ARM64 work-in-progress tree

DONE - Peter Cawley's work-in-progress branch should be ported to the ARM64 branch as there are common code changes that will make the ARM64 implementation of LJ_GC64 easier: https://github.com/corsix/LuaJIT/tree/x64

The ARM64 work-in-progress branch has integrated the x86_64 LJ_GC64 patch, as it fixes some random segmentation faults.

Create test-program to allocate strings in order to expose LJ_GC64 implementation issues

 OPEN

Create test-program to use global variables in order to expose LJ_GC64 implementation issues

 OPEN

Create Test Program To Expose The Garbage Collector

OPEN - This program will specifically allocate and then throw away memory to invoke the garbage collector. The garbage collector itself should already be working because it runs with the interpreter. The interaction between the JIT and the structures used by the garbage collector is expected to intersect with the LJ_GC64 issue.

ARM64 Disassembler

DONE - This is functionally separate from JIT enablement but will likely be required to commit a fully functional port.
May 20, 2016 - RT-RK has committed the first parts of the ARM64 disassembler. It is already able to disassemble several instructions (such as ADD, SUB, AND, ORR). The work will continue.
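
To give a sense of what disassembling these four instructions involves, here is a minimal sketch that decodes the A64 shifted-register forms of ADD, SUB, AND and ORR. It is not the RT-RK code: the function names are hypothetical, only the plain case (no shift, no flag setting) is handled, and aliases such as MOV are not recognized.

```c
#include <stdint.h>
#include <stdio.h>

/* Format register r as x0..x30/xzr (64-bit) or w0..w30/wzr (32-bit). */
static void fmt_reg(char *buf, size_t n, unsigned sf, unsigned r)
{
  char bank = sf ? 'x' : 'w';
  if (r == 31)
    snprintf(buf, n, "%czr", bank);
  else
    snprintf(buf, n, "%c%u", bank, r);
}

/* Returns 1 and writes the text to out on success, 0 if not recognized. */
static int disasm_a64(uint32_t ins, char *out, size_t outsz)
{
  unsigned sf    = ins >> 31;          /* 1 = 64-bit operands */
  unsigned opc   = (ins >> 29) & 3;    /* bits 30:29 */
  unsigned shift = (ins >> 22) & 3;    /* shift type */
  unsigned rm    = (ins >> 16) & 31;
  unsigned imm6  = (ins >> 10) & 63;   /* shift amount */
  unsigned rn    = (ins >> 5) & 31;
  unsigned rd    = ins & 31;
  uint32_t cls   = ins & 0x1f200000u;  /* bits 28:24 plus bit 21 */
  const char *name = NULL;
  char ds[8], ns[8], ms[8];

  if (shift != 0 || imm6 != 0)
    return 0;                          /* shifted operands not handled */
  if (cls == 0x0b000000u) {            /* add/sub (shifted register) */
    if (opc == 0) name = "add";
    else if (opc == 2) name = "sub";
  } else if (cls == 0x0a000000u) {     /* logical (shifted register), N=0 */
    if (opc == 0) name = "and";
    else if (opc == 1) name = "orr";
  }
  if (name == NULL)
    return 0;
  fmt_reg(ds, sizeof ds, sf, rd);
  fmt_reg(ns, sizeof ns, sf, rn);
  fmt_reg(ms, sizeof ms, sf, rm);
  snprintf(out, outsz, "%s %s, %s, %s", name, ds, ns, ms);
  return 1;
}
```

For example, the word 0x8b020020 decodes to "add x0, x1, x2", while an unrelated instruction such as NOP (0xd503201f) is rejected.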

LJ_GC64 Support for x86_64

DONE - It is possible to use the x86_64 enablement to inform the AArch64 enablement for LJ_GC64. This is tracked in the following issue: https://github.com/LuaJIT/LuaJIT/issues/25. There is a work-in-progress branch by Peter Cawley at https://github.com/corsix/LuaJIT/tree/x64
  • Peter Cawley has completed the initial support for LJ_GC64 and it has been checked in upstream:

    commit 2868715d80b6ac497a7f08393ec325b60d71df8d
    Author: Mike Pall <mike>
    Date: Mon May 23 06:01:54 2016 +0200

    x64/LJ_GC64: Add missing backend support and enable JIT compilation.

    Contributed by Peter Cawley.


ARM64 LuaJIT JIT Optimizations

OPEN - This will be updated with a list of optimization opportunities as we identify them.

52-bit (and greater) Virtual-Addressability Support In LuaJIT

IN PROGRESS - This will be updated with a list of optimization opportunities as we identify them.

Investigate The Problem

DONE - Todo for Ryan Arnold - Write a synopsis of the current discussion.

Discuss Solutions With Linux Kernel community

IN PROGRESS - Ideas are to restrict the address space of allocated memory based on cgroups (or other methods).

Kernel Implementation

OPEN - Ideas are to restrict the address space of allocated memory based on cgroups (or other methods).

Discussion With Other JITs To Discover Similar Problems And Solutions

IN PROGRESS - Todo for Ryan Arnold - Write a synopsis of ongoing discussion

User-Space Implementation

OPEN (OPTIONAL) - Possible modifications to mmap, system calls, the cgroups interface, etc.
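
For context, this is roughly what user space can already do today without any of the modifications under discussion: pass mmap an address hint and reject the mapping if it lands above the limit. The sketch assumes a 64-bit Linux-like system and a 47-bit limit, and the helper name is hypothetical. The kernel is free to ignore the hint, which is the weakness the proposals above aim to address.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define ADDR_BITS 47
#define ADDR_LIMIT ((uintptr_t)1 << ADDR_BITS)  /* assumes 64-bit uintptr_t */

/*
 * Try to map size bytes near hint. Returns the mapping if it lies
 * entirely below ADDR_LIMIT, otherwise unmaps it and returns NULL.
 */
static void *map_below_limit(size_t size, uintptr_t hint)
{
  void *p;
  if (size == 0 || size > ADDR_LIMIT)
    return NULL;
  p = mmap((void *)hint, size, PROT_READ | PROT_WRITE,
           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if ((uintptr_t)p > ADDR_LIMIT - size) {
    munmap(p, size);  /* hint was not honored; caller may retry elsewhere */
    return NULL;
  }
  return p;
}
```

A caller would typically retry with a different hint when NULL is returned, which is the kind of probing the rewritten memory block allocator performs.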

LuaJIT Implementation

OPEN - Depends on the solution decided upon.