Linux instability #182

Open
clbr opened this issue Dec 25, 2020 · 10 comments

@clbr
Contributor

clbr commented Dec 25, 2020

My port of Linux is unstable on (patched) cen64 in ways that it is not on real hardware. The random hangs don't happen on hardware and are very hard to track down. There's a small chance my patches are at fault, but now that the port is ready, others can try it too.

I have a suspicion the TLB logic contains some more bugs, given how many I already found there, but there could be others too.

https://github.com/clbr/n64bootloader/releases

I have the following patches applied to cen64 currently. I'll be submitting PRs as the old ones get reviewed.

  • Teach the profiler about L1D misses
  • Implement ll/lld/sc/scd
  • Implement trap instructions (by James Lambert)
  • Implement Reserved Instruction exception
  • Implement fpu prid
  • commented out the TLB valid check lines
  • changed cen64_one_hot_lut[] TLB fetch to __builtin_ffs
  • corrected TLB mod exception behavior
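
For readers unfamiliar with the `__builtin_ffs` item in the list above, here is a minimal sketch of the idea, assuming the lookup table mapped a one-hot TLB match mask to an entry index. The helper name `one_hot_to_index` is hypothetical and is not taken from cen64 or from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not cen64's actual code.
 * Takes a match mask with one bit set per matching entry and returns the
 * index of the lowest set bit. __builtin_ffs is a GCC/Clang builtin that
 * returns 1 + the index of the least significant set bit, or 0 if no bit
 * is set, so we subtract 1 to get a zero-based index. */
static int one_hot_to_index(uint32_t match_mask) {
  assert(match_mask != 0); /* caller must have at least one match */
  return __builtin_ffs((int)match_mask) - 1;
}
```

Unlike a table lookup, this still yields a defined index (the lowest match) if more than one bit happens to be set.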
@tj90241
Collaborator

tj90241 commented Dec 26, 2020

Yeah, finding the last hidden issues is quite the endeavor.

Sometime in 2015-2016, when I was actively working on this code, I had the VR4300 component isolated and booting a Linux kernel all the way to initrd loading, and that succeeded in fuzzing out some CP0 issues. I broke the TLB valid check (the issue you found) after that particular fuzzing endeavor. I'm surprised the TLB mod exception issue never turned up, though; that's a new one.

Because the code models the pipeline and cache to a point where you can almost write synthesizable logic around it, the bug could also be an instruction not getting squashed correctly, or something similar. This particular case works, I think, but here is an example of how this gets tricky:

lw $at, 0($s0)  # assume this raises an exception
tlbwi  # this instruction must be squashed in the pipeline

... but TLBWI writes to CP0 while it's in the EX stage:
https://github.com/n64dev/cen64/blob/master/vr4300/cp0.c#L267

so we inject a fault when the lw exception is raised in the DC stage and propagate it along:
https://github.com/n64dev/cen64/blob/master/vr4300/fault.c#L50

which is used on the next cycle to prevent the EX stage from executing:
https://github.com/n64dev/cen64/blob/master/vr4300/pipeline.c#L437
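
The squash described above can be pictured with a toy model. This is only a sketch under simplifying assumptions: the struct, field, and function names are hypothetical and do not mirror cen64's actual data structures, and the real pipeline tracks far more state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; names are hypothetical, not cen64's. */
struct toy_pipeline {
  bool ex_killed;      /* set once an older instruction faults in DC */
  uint32_t cp0_writes; /* counts side effects committed by the EX stage */
};

/* DC stage: the older instruction (e.g. the lw) raises an exception,
 * so younger instructions behind it must not take effect. */
static void toy_dc_raise_fault(struct toy_pipeline *p) {
  p->ex_killed = true;
}

/* EX stage: the younger instruction (e.g. tlbwi) commits its CP0 side
 * effect only if it has not been squashed. Returns true if it executed. */
static bool toy_ex_stage(struct toy_pipeline *p) {
  if (p->ex_killed)
    return false; /* squashed: no architectural side effect */
  p->cp0_writes++;
  return true;
}
```

The point of the model is that an instruction with side effects in EX has to consult the fault state before committing anything, since the faulting instruction is one stage ahead of it.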

@tj90241
Collaborator

tj90241 commented Dec 26, 2020

Some gotchas with sign extension too, here's another one I found during my initial Linux fuzzing:
9d9655c
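
As a generic illustration of the kind of sign-extension gotcha meant here (this is not a description of what the commit above changes): on a 64-bit MIPS CPU, 32-bit arithmetic such as ADDU writes its result sign-extended to 64 bits. The function name `toy_addu` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch of ADDU semantics with 64-bit registers:
 * add the low 32 bits, wrap on overflow, then sign-extend the 32-bit
 * result to 64 bits. The uint32_t -> int32_t conversion is
 * implementation-defined in C for values above INT32_MAX, but GCC and
 * Clang define it as two's-complement wraparound. */
static uint64_t toy_addu(uint64_t rs, uint64_t rt) {
  uint32_t sum = (uint32_t)rs + (uint32_t)rt;
  return (uint64_t)(int64_t)(int32_t)sum;
}
```

Forgetting the sign extension (zero-extending instead) leaves the upper 32 bits clear, which only shows up when bit 31 of the result is set.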

@bryanperris

bryanperris commented Jan 10, 2021

I am wondering why cen64 always shifts the virtual address by 13, when the MIPS docs say to shift the offset off based upon the pagemask register. When I do the math, shifting by 13 seems to give me the correct VPN2 value to do the search on, while shifting by the page size (16 bits) gives me a value that is way too small. I know shifting by 13 bits works for EntryHi.

@tj90241
Copy link
Collaborator

tj90241 commented Jan 10, 2021

@bryanperris That's just an optimization-related thing. The x86 SSE shift instructions cannot shift each lane by a different amount (the count is either a constant coded into the instruction word or a single value shared by all lanes).
So, instead, we say "let's just shift off what we know will be an offset into the page (4k pages, 2 pages per TLB entry = 13 bits)" and then AND off the rest dynamically to work around the fact that we cannot shift dynamically. This is what check_l = _mm_and_si128(vpn, page_mask_l); is accomplishing. So, ultimately, the comparison (check_l = _mm_cmpeq_epi32(check_l, vpn_l);) is still done with regard to pagemask.
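
A scalar sketch of the shift-then-mask comparison may make this easier to follow. It assumes 32-bit addressing, ignores ASID and the global bit, and uses hypothetical names; `mask_field` stands for the PageMask register value shifted right by 13 (0 for 4K pages, 0x3 for 16K, 0xF for 64K). It is not cen64's code, which does this for several entries at once with SSE.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_VPN2_SHIFT 13u /* 12 offset bits + 1 even/odd page bit */

/* Hypothetical scalar equivalent of one lane of the SSE compare.
 * The fixed shift removes the bits that are always offset bits; the AND
 * then clears the extra low VPN2 bits that are offset bits for this
 * entry's (larger) page size, so no variable shift is needed. */
static bool toy_tlb_entry_matches(uint32_t vaddr, uint32_t entry_vpn2,
                                  uint32_t mask_field) {
  uint32_t keep = ~mask_field;
  uint32_t vpn = vaddr >> TOY_VPN2_SHIFT;
  return (vpn & keep) == (entry_vpn2 & keep);
}
```

With 64K pages (`mask_field` of 0xF), addresses 0x00400000 through 0x0041ffff all match an entry whose VPN2 is 0x200, while 0x00420000 does not.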

@bryanperris

@tj90241 Thanks, that makes sense now. In the case of 4K pages, why shift off the 13th bit when the mask for offset is 0xFFF? Is that to apply the divide by 2 for the VPN?

@tj90241
Collaborator

tj90241 commented Jan 10, 2021

Correct, it's because in MIPS the smallest page size is 4k (12 bits), and each physical TLB entry provides a mapping for 2 pages ("VPN2"), which is where the 13th bit comes in. The SSE lookup is just trying to find the "VPN2" entry in hardware -- once the DC stage has a hit, it will use the full address (again) to determine if EntryLo/EntryHi matches, etc.

@bryanperris

Looking at your pipeline code, it calls the tlb_probe function to find the index of the matching entry. Does cen64 only handle 4K pages?

@tj90241
Collaborator

tj90241 commented Jan 10, 2021

Right - tlb_probe is only responsible for finding the hardware entry. Then the pipeline uses the attributes of that entry to select the right page/etc.:

      tlb_miss = tlb_probe(&vr4300->cp0.tlb, vaddr, asid, &index);
      page_mask = vr4300->cp0.page_mask[index];
      select = ((page_mask + 1) & vaddr) != 0;
...
      cached = ((vr4300->cp0.state[index][select] & 0x38) != 0x10);
      paddr = (vr4300->cp0.pfn[index][select]) | (vaddr & page_mask);
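
To make the even/odd page selection concrete, here is a standalone sketch of the same two steps. It assumes `page_mask` is the within-page offset mask (0xFFF for 4K pages) and that the stored PFN values are already physical base addresses, which is what the OR in the snippet above suggests; the function name and the example values are hypothetical.

```c
#include <stdint.h>

/* Hypothetical standalone version of the select/paddr steps.
 * page_mask + 1 is the single bit just above the page offset; that bit
 * of the virtual address picks the even (0) or odd (1) page of the pair
 * mapped by one TLB entry. */
static uint32_t toy_translate(uint32_t vaddr, uint32_t page_mask,
                              const uint32_t pfn[2]) {
  unsigned select = ((page_mask + 1) & vaddr) != 0;
  return pfn[select] | (vaddr & page_mask);
}
```

For example, with 4K pages and a pair mapped to 0x00100000 (even) and 0x00200000 (odd), virtual address 0x00402abc has bit 12 clear and translates to 0x00100abc, while 0x00403abc has bit 12 set and translates to 0x00200abc.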

@awsms

awsms commented Jul 30, 2024

Has anyone been able to compile it on Linux? Even the Debian build task fails on GitHub.

@clbr
Contributor Author

clbr commented Jul 31, 2024

awsms, this report is about running Linux on cen64. If you're trying to compile cen64 on Linux, please open a new one.
