Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update instruction table to capture the correct state of EFlags #53806

Merged
5 commits merged into from
Jun 8, 2021

Conversation

kunalspathak
Copy link
Member

In #53214, I included the information of which EFlags gets updated per instruction. But for few instructions like imul and div, the flags were wrong. Fix those entries. I also went through each instruction in Intel's manual "Appendix A" "EFlags Cross-Reference" . I also made sure that I capture all possible state of EFlags per instruction e.g. undefined or restore so in future we can mitigate such issue.

A request to reviewer to please cross check from the Intel manual to make sure if I didn't miss anything.

No asmdiffs in libraries/benchmarks. Only asmdiff in coreclr_tests.pmi is the test case that was failing.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 25772
Total bytes of diff: 25777
Total bytes of delta: 5 (0.02% of base)
Total bytes of relative delta: 0.016923668075580132
    diff is a regression.
    relative diff is a regression.


Top file regressions (bytes):
           3 : 228538.dasm (0.01% of base)
           2 : 86265.dasm (1.68% of base)

2 total files with Code Size differences (0 improved, 2 regressed), 0 unchanged.

Top method regressions (bytes):
           3 ( 0.01% of base) : 228538.dasm - ILGEN_0x372a9ae6:Method_0xdc6ff1a4(byte,byte,int,long,ushort,double,long,long):int
           2 ( 1.68% of base) : 86265.dasm - Test:A(int):int

Top method regressions (percentages):
           2 ( 1.68% of base) : 86265.dasm - Test:A(int):int
           3 ( 0.01% of base) : 228538.dasm - ILGEN_0x372a9ae6:Method_0xdc6ff1a4(byte,byte,int,long,ushort,double,long,long):int

2 total methods with Code Size differences (0 improved, 2 regressed), 0 unchanged.

Fixes: #53781

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 7, 2021
@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib, @tannergooding


// id nm um mr mi rm flags

// Note that emitter has only partial support for BT. It can only emit the reg,reg form
// and the registers need to be reversed to get the correct encoding.
INST3(bt, "bt", IUM_RD, 0x0F00A3, BAD_CODE, 0x0F00A3, INS_FLAGS_WritesAllFlags) // PF = ?, AF = ?, ZF = ?, SF = ?, OF = ?
INST3(bt, "bt", IUM_RD, 0x0F00A3, BAD_CODE, 0x0F00A3, Undefined_OF | Undefined_SF | Undefined_AF | Undefined_PF | Writes_CF )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is interesting and looks to possibly be an AMD vs Intel difference.

Intel specifies ZF is unaffected while AMD specifies it is undefined

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, AMD had a different implementation of this instruction (and all the other BTx instruction) decades ago prior to/around the Athlon64. Since then the documentation declares these flags as being Undefined. All recent CPUs behave like Intel's. There were some issues with the gcc optimizer that assumed these flags didn't change, causing branching errors. That's why I remember.

On a side note, the BTx instructions have a nasty behavior when applied on memory and the index-register (or immediate value) is larger than the size of the destination. Then the destination address is taken as an array, and you can get out of bounds unaware. bts qword [rax], 0xC0 behaves like bts qword [rax+24], 0 -- I just mention it for safety reasons... I'm sure the JIT team is aware of it. Or as the code comment suggests, isn't using it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to keep it as Undefined_ZF in that case, to be conservative. No optimization will take benefit from flags that are marked as Undefined for a given instruction.

@kunalspathak
Copy link
Member Author

@BruceForstall or @AndyAyersMS - Can you take a look. This fixes a blocking outerloop issue.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question

The table looks fine, but I didn't review it thoroughly

src/coreclr/jit/emitxarch.cpp Outdated Show resolved Hide resolved
@ghost
Copy link

ghost commented Jun 8, 2021

Hello @kunalspathak!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failure in JIT/Regression/JitBlue/GitHub_13822
4 participants