arm64 alignment issue #167

Closed
ehudmarvell opened this issue Dec 23, 2019 · 21 comments

Comments

@ehudmarvell

ehudmarvell commented Dec 23, 2019

Hi,

I am using version 0.11.0-alpha-7 (not the latest one, because of the FP enablement),
compiling for an ARM64 Cortex-A55 core.
I am hitting an alignment issue:

> 00000000000033a8 <_vfiprintf_r>:
>     33a8:	a9b27bfd 	stp	x29, x30, [sp, #-224]!
>     33ac:	910003fd 	mov	x29, sp
>     33b0:	a90153f3 	stp	x19, x20, [sp, #16]
>     33b4:	aa0103f4 	mov	x20, x1
>     33b8:	aa0203f3 	mov	x19, x2
>     33bc:	a9025bf5 	stp	x21, x22, [sp, #32]
>     33c0:	aa0003f5 	mov	x21, x0
>     33c4:	a90363f7 	stp	x23, x24, [sp, #48]
>     33c8:	aa0303f7 	mov	x23, x3
>     33cc:	a9046bf9 	stp	x25, x26, [sp, #64]
>     33d0:	f9002bfb 	str	x27, [sp, #80]
>     33d4:	b4000080 	cbz	x0, 33e4 <_vfiprintf_r+0x3c>
>     33d8:	b9405001 	ldr	w1, [x0, #80]
>     33dc:	35000041 	cbnz	w1, 33e4 <_vfiprintf_r+0x3c>
>     33e0:	94000a56 	bl	5d38 <__sinit>
>     33e4:	79402280 	ldrh	w0, [x20, #16]
>     33e8:	36180880 	tbz	w0, #3, 34f8 <_vfiprintf_r+0x150>
>     33ec:	f9400e80 	ldr	x0, [x20, #24]
>     33f0:	b4000840 	cbz	x0, 34f8 <_vfiprintf_r+0x150>
>     33f4:	52860400 	mov	w0, #0x3020                	// #12320
>     33f8:	780993e0 	sturh	w0, [sp, #153]  <<< alignment exception here

On the last line, sturh w0, [sp, #153], I get an alignment exception because of the odd offset 153.

Can you suggest why?

Thanks,
Ehud
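
For readers following along, here is a minimal sketch of why that store is unaligned. It assumes the stack pointer is 16-byte aligned, as the AArch64 procedure call standard requires at function boundaries; the helper names is_aligned and sp_relative are made up for this illustration and are not part of newlib or the SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of size; size must be a power of two. */
static bool is_aligned(uint64_t addr, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* Effective address of an "[sp, #offset]" access (hypothetical helper). */
static uint64_t sp_relative(uint64_t sp, int64_t offset)
{
    return sp + (uint64_t)offset;
}
```

With a hypothetical stack pointer of 0x80000, is_aligned(sp_relative(0x80000, 153), 2) is false: a 16-byte-aligned base plus an odd offset is always an odd address, so the halfword store can never be naturally aligned. Whether that actually faults depends on the alignment-check settings and the memory type.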

@stephanosio
Member

cc @carlocaione

@carlocaione
Contributor

@ehudmarvell Which branch are you using? How are you testing this? What are you compiling, and how? We need some more information about this.

@ehudmarvell
Author

ehudmarvell commented Dec 23, 2019

Hi,
@carlocaione

I downloaded the SDK installer zephyr-toolchain-arm64-0.11.0-alpha-7-setup.run from "https://github.com/zephyrproject-rtos/sdk-ng/releases/tag/v0.11.0-alpha-7".

On the Zephyr side, I have rebased onto this commit (plus our ARM64 support changes, which are not published yet):
6933248e0cb4f7af31e2bab5b39c594806ab53ac - Jukka Rissanen, 3 months ago : net: shell: ping: Figure out the output network interface

I am testing this on our Cortex-A55 chip and compiling with:
set(ARCH_FOR_cortex-a55 armv8.2-a+nofp )
set(CROSS_COMPILE_TARGET_arm aarch64-zephyr-elf)
export ZEPHYR_TOOLCHAIN_VARIANT=zephyr
export ZEPHYR_SDK_INSTALL_DIR=/opt/zephyr-sdk-arm64

It runs successfully without newlib, but with newlib I have this problem.

Thanks,
Ehud

@carlocaione
Contributor

@ehudmarvell are you aware that there is an ongoing effort to upstream ARM64 support at zephyrproject-rtos/zephyr#20263?

Which code/test are you compiling? Just to have a way to reproduce your issue.

Is this reproducible when rebasing on the current master?

@ehudmarvell
Author

@carlocaione I am familiar with zephyrproject-rtos/zephyr#20263.

"Is this reproducible when rebasing on the current master?" We didn't try; it would currently take us a lot of effort, so I am asking in case you have an idea.
I don't think my own code matters, because the unaligned access is in libc code that comes from the SDK (_vfiprintf_r). Correct me if I am wrong. What do you think?

@stephanosio
Member

@ehudmarvell For now, ensure that SCTLR_ELn.A is not set in your arch implementation. If set, try setting it to 0 and see if the alignment exception goes away.

As for triage, I will investigate what other releases are doing tomorrow and make changes if necessary.

@carlocaione
Contributor

On top of what @stephanosio suggested, also try setting SCTLR_ELn.SA to 0.
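
The two suggestions above amount to clearing two bits of the system control register. The sketch below only shows the bit manipulation on a plain integer; the bit positions are the architectural ones for SCTLR_ELx, while the function names are hypothetical and not taken from any Zephyr port.

```c
#include <assert.h>
#include <stdint.h>

/* SCTLR_ELx bit positions as defined by the ARMv8-A architecture. */
#define SCTLR_M_BIT   (UINT64_C(1) << 0)  /* MMU enable */
#define SCTLR_A_BIT   (UINT64_C(1) << 1)  /* alignment check enable */
#define SCTLR_C_BIT   (UINT64_C(1) << 2)  /* data cache enable */
#define SCTLR_SA_BIT  (UINT64_C(1) << 3)  /* SP alignment check enable */

/* Return the register value with A and SA cleared, all other bits kept. */
static uint64_t sctlr_disable_alignment_checks(uint64_t sctlr)
{
    return sctlr & ~(SCTLR_A_BIT | SCTLR_SA_BIT);
}

/* Example: a value with M, A, C and SA set loses only A and SA. */
void sctlr_example(void)
{
    uint64_t before = SCTLR_M_BIT | SCTLR_A_BIT | SCTLR_C_BIT | SCTLR_SA_BIT;
    assert(sctlr_disable_alignment_checks(before) == (SCTLR_M_BIT | SCTLR_C_BIT));
}
```

On the target, the value would be read with mrs from the SCTLR of the current exception level (for example sctlr_el1), written back with msr, and followed by an isb. That part is left out here because it only runs in a privileged context.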

@stephanosio
Member

stephanosio commented Dec 24, 2019

zephyr-sdk-0.11.0-alpha-7

/opt/sdk/zephyr-sdk-0.11.0-alpha-8/aarch64-zephyr-elf/bin/aarch64-zephyr-elf-objdump -d /opt/sdk/zephyr-sdk-0.11.0-alpha-7/aarch64-zephyr-elf/aarch64-zephyr-elf/lib/libc.a | grep "<_vfiprintf_r>:" -A 30
00000000000000d4 <_vfiprintf_r>:
  d4:   a9b27bfd        stp     x29, x30, [sp, #-224]!
  d8:   910003fd        mov     x29, sp
  dc:   a90153f3        stp     x19, x20, [sp, #16]
  e0:   aa0103f4        mov     x20, x1
  e4:   aa0203f3        mov     x19, x2
  e8:   a9025bf5        stp     x21, x22, [sp, #32]
  ec:   aa0003f5        mov     x21, x0
  f0:   a90363f7        stp     x23, x24, [sp, #48]
  f4:   aa0303f7        mov     x23, x3
  f8:   a9046bf9        stp     x25, x26, [sp, #64]
  fc:   f9002bfb        str     x27, [sp, #80]
 100:   b4000080        cbz     x0, 110 <_vfiprintf_r+0x3c>
 104:   b9405001        ldr     w1, [x0, #80]
 108:   35000041        cbnz    w1, 110 <_vfiprintf_r+0x3c>
 10c:   94000000        bl      0 <__sinit>
 110:   79402280        ldrh    w0, [x20, #16]
 114:   36180880        tbz     w0, #3, 224 <_vfiprintf_r+0x150>
 118:   f9400e80        ldr     x0, [x20, #24]
 11c:   b4000840        cbz     x0, 224 <_vfiprintf_r+0x150>
 120:   52860400        mov     w0, #0x3020                     // #12320
>124:   780993e0        sturh   w0, [sp, #153]<
 128:   a94006e0        ldp     x0, x1, [x23]
 12c:   a90607e0        stp     x0, x1, [sp, #96]
 130:   52800038        mov     w24, #0x1                       // #1
 134:   a94106e0        ldp     x0, x1, [x23, #16]
 138:   90000017        adrp    x23, 0 <__sfputc_r>
 13c:   910002f7        add     x23, x23, #0x0
 140:   a90707e0        stp     x0, x1, [sp, #112]
 144:   b90097ff        str     wzr, [sp, #148]
 148:   aa1303f9        mov     x25, x19
...

zephyr-sdk-0.11.0-alpha-8

/opt/sdk/zephyr-sdk-0.11.0-alpha-8/aarch64-zephyr-elf/bin/aarch64-zephyr-elf-objdump -d /opt/sdk/zephyr-sdk-0.11.0-alpha-8/aarch64-zephyr-elf/aarch64-zephyr-elf/lib/libc.a | grep "<_vfiprintf_r>:" -A 30
0000000000000000 <_vfiprintf_r>:
       0:       a9a57bfd        stp     x29, x30, [sp, #-432]!
       4:       910003fd        mov     x29, sp
       8:       a90153f3        stp     x19, x20, [sp, #16]
       c:       a9025bf5        stp     x21, x22, [sp, #32]
      10:       a90363f7        stp     x23, x24, [sp, #48]
      14:       f90023f9        str     x25, [sp, #64]
      18:       f90047e0        str     x0, [sp, #136]
      1c:       f90043e1        str     x1, [sp, #128]
      20:       f9003fe2        str     x2, [sp, #120]
      24:       aa0303f3        mov     x19, x3
      28:       f900c3ff        str     xzr, [sp, #384]
      2c:       f900bfff        str     xzr, [sp, #376]
      30:       f94047e0        ldr     x0, [sp, #136]
      34:       f900bbe0        str     x0, [sp, #368]
      38:       f940bbe0        ldr     x0, [sp, #368]
      3c:       f100001f        cmp     x0, #0x0
      40:       540000e0        b.eq    5c <_vfiprintf_r+0x5c>  // b.none
      44:       f940bbe0        ldr     x0, [sp, #368]
      48:       b9405000        ldr     w0, [x0, #80]
      4c:       7100001f        cmp     w0, #0x0
      50:       54000061        b.ne    5c <_vfiprintf_r+0x5c>  // b.any
      54:       f940bbe0        ldr     x0, [sp, #368]
      58:       94000000        bl      0 <__sinit>
      5c:       f94043e0        ldr     x0, [sp, #128]
      60:       79c02000        ldrsh   w0, [x0, #16]
      64:       12003c00        and     w0, w0, #0xffff
      68:       121d0000        and     w0, w0, #0x8
      6c:       7100001f        cmp     w0, #0x0
      70:       540000a0        b.eq    84 <_vfiprintf_r+0x84>  // b.none
      74:       f94043e0        ldr     x0, [sp, #128]
...

@stephanosio
Member

> I am using version 0.11.0-alpha-7 (not the latest one, because of the FP enablement)

I noticed that aarch64-zephyr-elf is not being built with multilib; this will need to be addressed separately.

@stephanosio
Member

stephanosio commented Dec 24, 2019

@ehudmarvell Are you able to confirm if this issue can be fixed by setting SCTLR_ELn.A = 0?

The only condition under which STURH generates an alignment fault is SCTLR.A = 1 (refer to pages 7338 and 7342 of the ARMv8-A Architecture Reference Manual), assuming your stack pointer is sane.

@ehudmarvell
Author

Hi,
Building the SDK manually gives me different results. I will investigate and post an update.

@ehudmarvell
Author

ehudmarvell commented Dec 24, 2019

Hi @stephanosio @carlocaione

After taking the latest SDK and compiling it for ARM64, I now also hit an unaligned access fault in memset.

The line:

>   26cc:	3d800000 	str	q0, [x0]

X0 = 0x33E358, which should be 16-byte aligned for this store but is only 8-byte aligned.

> 0000000000002640 <memset>:
>     2640:	4e010c20 	dup	v0.16b, w1
>     2644:	8b020004 	add	x4, x0, x2
>     2648:	f101805f 	cmp	x2, #0x60
>     264c:	540003c8 	b.hi	26c4 <memset+0x84>  // b.pmore
>     2650:	f100405f 	cmp	x2, #0x10
>     2654:	54000202 	b.cs	2694 <memset+0x54>  // b.hs, b.nlast
>     2658:	4e083c01 	mov	x1, v0.d[0]
>     265c:	361800a2 	tbz	w2, #3, 2670 <memset+0x30>
>     2660:	f9000001 	str	x1, [x0]
>     2664:	f81f8081 	stur	x1, [x4, #-8]
>     2668:	d65f03c0 	ret
>     266c:	d503201f 	nop
>     2670:	36100082 	tbz	w2, #2, 2680 <memset+0x40>
>     2674:	b9000001 	str	w1, [x0]
>     2678:	b81fc081 	stur	w1, [x4, #-4]
>     267c:	d65f03c0 	ret
>     2680:	b4000082 	cbz	x2, 2690 <memset+0x50>
>     2684:	39000001 	strb	w1, [x0]
>     2688:	36080042 	tbz	w2, #1, 2690 <memset+0x50>
>     268c:	781fe081 	sturh	w1, [x4, #-2]
>     2690:	d65f03c0 	ret
>     2694:	3d800000 	str	q0, [x0]
>     2698:	373000c2 	tbnz	w2, #6, 26b0 <memset+0x70>
>     269c:	3c9f0080 	stur	q0, [x4, #-16]
>     26a0:	36280062 	tbz	w2, #5, 26ac <memset+0x6c>
>     26a4:	3d800400 	str	q0, [x0, #16]
>     26a8:	3c9e0080 	stur	q0, [x4, #-32]
>     26ac:	d65f03c0 	ret
>     26b0:	3d800400 	str	q0, [x0, #16]
>     26b4:	ad010000 	stp	q0, q0, [x0, #32]
>     26b8:	ad3f0080 	stp	q0, q0, [x4, #-32]
>     26bc:	d65f03c0 	ret
>     26c0:	d503201f 	nop
>     26c4:	12001c21 	and	w1, w1, #0xff
>     26c8:	927cec03 	and	x3, x0, #0xfffffffffffffff0
>     26cc:	3d800000 	str	q0, [x0]  >>>>>>>>>>>>>>>>>>>>>

I tried setting both SCTLR_ELn.SA and SCTLR_ELn.A to 0, but I still get an alignment exception:
(image attachment)

Thanks

@stephanosio
Member

@ehudmarvell AFAICT, based on the pseudocode, you should not be getting alignment faults with SCTLR.A and SCTLR.SA set to 0, unless the memory region you are trying to access is defined as something other than "normal."

From ARM docs:

> Except for exclusive and ordered accesses, all loads and stores support the use of unaligned addresses when accessing normal memory. This simplifies porting code to A64.

Maybe the MMU is enabled in your arch port and the page tables are configured incorrectly (e.g. the 'strongly ordered' attribute is set).

@ehudmarvell
Author

@stephanosio The MMU is disabled. I will investigate why we are getting this exception.

@ehudmarvell
Author

@stephanosio Currently it seems that I have an architectural limitation on alignment.
But let's assume I am using the MMU. Should newlib + ARM64 work fine then? If yes, I shouldn't have this problem in that case either; if no, I would be glad if you could explain why.

Thank you very much

@stephanosio
Member

Apparently, with the MMU disabled, all data accesses are treated as if they were to a Device-nGnRnE memory type (strongly ordered-equivalent), meaning an unaligned access will result in an alignment fault[1].

> Data access: The stage 1 translation assigns the Device-nGnRnE memory type.
>
> Alignment checking is performed, and therefore Alignment faults can occur.
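
As a rough way to summarize the rules discussed in this thread, the following sketch models when a plain load or store would take an alignment fault. It is a deliberate simplification: it ignores exclusive and ordered accesses (which always require alignment) and every other corner case in the architecture manual, and the type and function names are invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum mem_type { MEM_NORMAL, MEM_DEVICE };

struct access {
    uint64_t addr;
    unsigned size;         /* access size in bytes, power of two */
    bool mmu_enabled;      /* SCTLR_ELx.M */
    bool sctlr_a;          /* SCTLR_ELx.A */
    enum mem_type mapped;  /* type from the page tables; used only if the MMU is on */
};

/* Simplified model: does this plain load/store take an alignment fault? */
static bool alignment_fault(const struct access *a)
{
    assert(a->size != 0 && (a->size & (a->size - 1)) == 0);

    bool unaligned = (a->addr & (a->size - 1)) != 0;

    /* With the MMU off, data accesses are treated as Device-nGnRnE. */
    enum mem_type effective = a->mmu_enabled ? a->mapped : MEM_DEVICE;

    return unaligned && (a->sctlr_a || effective == MEM_DEVICE);
}
```

Under this model, an unaligned halfword store with the MMU disabled faults even when SCTLR.A is 0, which matches what was reported above. The same store to Normal memory with the MMU enabled and SCTLR.A clear does not fault.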

It seems we did not catch this in zephyrproject-rtos/zephyr#20263 because QEMU does not emulate the alignment fault[2]:

> QEMU does not currently emulate unaligned access traps for ARM guest code. This is a reflection of the fact that its traditional primary purpose is "run correct guest code as quickly as possible"

This means that we need to do either of the following:

  1. Enable MMU stage 1 address translation with flat memory mapping in the arch port OR
  2. Compile all code targeting ARMv7-A and ARMv8-A to never use unaligned access (i.e. specify -mno-unaligned-access; for AArch64 GCC the corresponding option is -mstrict-align)

The second approach may not be feasible and/or desirable for the following reasons:

  • There are many architectural limitations regarding the Device memory type (note that the memory type is "Device"-nGnRnE when MMU is disabled)
  • There may exist some code that requires unaligned access support
  • For GCC, -mno-unaligned-access is known to increase code size (though whether this really matters for Cortex-A is arguable)

[1] https://armv8-ref.codingbelief.com/en/chapter_d4/d42_8_the_effects_of_disabling_a_stage_of_address_translation.html#all-other-accesses
[2] https://stackoverflow.com/questions/51520635/how-to-emulate-arm-unaligned-memory-access-exceptions
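
To make option 1 more concrete, here is a minimal sketch of what a flat (identity) stage 1 mapping could look like with a 4 KB granule and 1 GB level 1 blocks. The descriptor bit positions and MAIR encodings are architectural; the attribute index assignment, the memory layout, and all function names are assumptions made for this example. This is not the Broadcom or Zephyr implementation, and it leaves out programming TTBR0_EL1, TCR_EL1 and MAIR_EL1 and setting SCTLR.M.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MAIR_EL1 attribute encodings (architectural values). */
#define MAIR_ATTR_DEVICE_nGnRnE  UINT64_C(0x00)
#define MAIR_ATTR_NORMAL_WB      UINT64_C(0xFF)  /* Normal, write-back cacheable */

/* Which MAIR slot holds which attribute is a choice made by this sketch. */
#define ATTR_IDX_NORMAL  0u
#define ATTR_IDX_DEVICE  1u

/* Stage 1 block descriptor fields (4 KB granule, level 1 = 1 GB blocks). */
#define DESC_BLOCK        UINT64_C(0x1)          /* bits[1:0] = 0b01 */
#define DESC_ATTR_IDX(i)  ((uint64_t)(i) << 2)   /* AttrIndx, bits[4:2] */
#define DESC_SH_INNER     (UINT64_C(3) << 8)     /* inner shareable */
#define DESC_AF           (UINT64_C(1) << 10)    /* access flag */
#define BLOCK_SIZE_1G     (UINT64_C(1) << 30)

/* MAIR_EL1 value matching the attribute indices above. */
static uint64_t mair_value(void)
{
    return (MAIR_ATTR_NORMAL_WB << (8 * ATTR_IDX_NORMAL)) |
           (MAIR_ATTR_DEVICE_nGnRnE << (8 * ATTR_IDX_DEVICE));
}

/*
 * Fill a level 1 table so that virtual address == physical address.
 * Blocks [ram_first, ram_first + ram_count) are mapped as Normal memory,
 * everything else as Device-nGnRnE.
 */
static void build_flat_map(uint64_t *l1, unsigned nblocks,
                           unsigned ram_first, unsigned ram_count)
{
    assert(l1 != NULL && ram_first + ram_count <= nblocks);

    for (unsigned i = 0; i < nblocks; i++) {
        int is_ram = (i >= ram_first) && (i < ram_first + ram_count);
        uint64_t pa = (uint64_t)i * BLOCK_SIZE_1G;  /* identity: VA == PA */

        l1[i] = pa | DESC_BLOCK | DESC_AF | DESC_SH_INNER |
                DESC_ATTR_IDX(is_ram ? ATTR_IDX_NORMAL : ATTR_IDX_DEVICE);
    }
}
```

The point of the exercise is the Normal attribute on the RAM blocks: once translation is enabled, accesses to those blocks are no longer treated as Device-nGnRnE, so unaligned loads and stores from newlib stop faulting as long as SCTLR.A is clear. A real port would also mark device regions execute-never and use finer-grained mappings where needed.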

@ehudmarvell
Author

@stephanosio Thank you for your answer.

Compiling with -mno-unaligned-access won't solve the newlib problem, because that code is already compiled (I am already using -mno-unaligned-access).

@stephanosio
Member

stephanosio commented Dec 26, 2019

> @stephanosio Thank you for your answer.
>
> Compiling with -mno-unaligned-access won't solve the newlib problem, because that code is already compiled (I am already using -mno-unaligned-access).

@ehudmarvell The -mno-unaligned-access approach requires everything (including newlib, libstdc++, ...) to be compiled with that option, which is one of the reasons why I mentioned adding MMU support would be better.

Have a look at the following; Broadcom has already implemented MMU support on top of @carlocaione's AArch64 port:
zephyrproject-rtos/zephyr#20263 (comment)

@ehudmarvell
Author

ehudmarvell commented Dec 26, 2019

@stephanosio,
Can you explain how "MMU with flat memory" will solve the unaligned exception?
Thank you

@stephanosio
Member

stephanosio commented Dec 26, 2019

> Can you explain how "MMU with flat memory" will solve the unaligned exception?

#167 (comment)

That is, of course, unless your entire RAM is "strongly ordered," but I see that to be very unlikely.

@ehudmarvell
Author

@stephanosio An update: the problem was solved by using the Broadcom MMU code.
Thank you for your help.
