Loading corefreqk module utterly breaks my system (Kernel block layer?) #418

Closed
WildPenquin opened this issue Feb 22, 2023 · 8 comments

@WildPenquin

WildPenquin commented Feb 22, 2023

For some reason, loading the corefreqk module utterly breaks my system after running with the module for a few hours. The system is stable (with uptimes of up to several days) when the corefreqk module is not loaded. I'm running Arch Linux on the -zen kernel branch, and I installed corefreq from AUR packages. My hardware is a B550 chipset board (MSI Tomahawk) with a 5950X.

The symptoms could all be explained by the kernel's block layer breaking down, such that the kernel can no longer read from or write to any block device, including the root filesystem. Logs are no longer written, and journalctl cannot be run to check the current logs. Running simple commands such as cat (from /bin/cat) results in I/O errors or segmentation faults.

Yesterday, I loaded corefreqk at 17:16:23 and the system broke down at 23:15:00, as at that time the journal ceased to be written to. The system was in an almost unusable state as I've described in the previous paragraph.

I'm a bit reluctant to reproduce the bug, since it takes 30 to 300 minutes to trigger and, as it involves the block layer, I fear it might result in data loss (though so far I haven't had any data loss because of this bug). But it is indeed the corefreqk module that makes the system unstable, and I can reproduce the bug (eventually, just not at will).

Any ideas on how to monitor the system or get useful debug logs before reproducing (if needed)? Would SSHing into the computer before triggering the bug, running journalctl -f (or similar), and saving the output on the other computer be a good idea?
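
A minimal sketch of that SSH idea, meant to be run on the second computer. The function name, the host name and the log file name are placeholders, and it assumes the remote user is allowed to read the kernel journal:

```shell
# Run on the second computer, not on the machine under test.
# capture_kernel_log TARGET LOGFILE
#   TARGET  - SSH destination of the affected machine (placeholder)
#   LOGFILE - local file that receives a copy of the log
capture_kernel_log() {
    target=$1
    logfile=$2
    # -k: kernel messages only, -f: keep following new entries,
    # -o short-precise: timestamps with microsecond precision
    ssh "$target" journalctl -k -f -o short-precise | tee -a "$logfile"
}

# Example call (commented out so that sourcing this file starts nothing):
# capture_kernel_log user@affected-host corefreqk-crash.log
```

Starting the session before loading corefreqk means the journalctl process is already running when the disks stop responding, and every line it has printed so far is stored on the other machine.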

@cyring
Owner

cyring commented Feb 23, 2023

> corefreqk module. I'm running on arch linux, on the -zen kernel branch

I will stop on this first. I have not tested this branch.
If its patches bring shared components into conflict, then lockups can appear.
EDIT: Where are those Zen patches?

The most fragile component is the SMU, from which the sensors are read periodically.
If the commands it receives are interleaved between two sources (corefreqk.ko and the Zen patches), then the outcome is unknown.

CSR registers, which are accessed through an index register and a data register, cannot be interleaved either. See CSR Register in amd_reg.h

CoreFreq is pretty exclusive in the way it works. The best prerequisites are the mainstream kernel with most SMU, CSR, MSR, P-State and C-States drivers not loaded.
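
One way to check that prerequisite is to look for known sensor drivers in /proc/modules before loading corefreqk. This is only a sketch: the function name is made up for illustration, and k10temp is used as the example because it is the driver discussed in this thread; other module names have to be supplied by the user.

```shell
# check_conflicting_modules MODULES_FILE NAME...
# Prints "loaded: NAME" for every NAME found in the first column of
# MODULES_FILE (normally /proc/modules) and returns 1 if any was found.
check_conflicting_modules() {
    modules_file=$1
    shift
    status=0
    for name in "$@"; do
        # /proc/modules lists one module per line, name in column 1
        if awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "loaded: $name"
            status=1
        fi
    done
    return "$status"
}

# Example call (commented out so that sourcing this file runs nothing):
# check_conflicting_modules /proc/modules k10temp
```

A non-zero return value only says that a listed driver is present. Whether it actually conflicts with corefreqk is a separate question.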

To track your issue, I would suggest three approaches:

  1. Dump the kernel log to an external destination using crontab.
  2. Run in virtualization. This is less useful because the faulty registers may not be involved in a VM.
  3. Boot the CoreFreq ISO and let it run for the time required. This will confirm or eliminate the patched kernel assumption.
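
The first approach could look like the sketch below. The function name, script path, destination directory and five-minute schedule are all assumptions chosen for illustration, and the destination is assumed to live outside the local disks (for example a network mount).

```shell
# dump_kernel_log DEST_DIR
# Writes the kernel messages of the current boot to a timestamped
# file in DEST_DIR. DEST_DIR should be an external destination.
dump_kernel_log() {
    dest_dir=$1
    out="$dest_dir/kernel-$(date +%Y%m%d-%H%M%S).log"
    # -k: kernel messages only (current boot), --no-pager: plain output
    journalctl -k --no-pager > "$out"
}

# Hypothetical crontab entry, assuming the function above is wrapped in
# a script at /usr/local/bin/dump-kernel-log.sh that calls it with "$1":
# */5 * * * * /usr/local/bin/dump-kernel-log.sh /mnt/external-logs
```

Once the root filesystem becomes unreadable, cron may no longer be able to start new jobs, so the last dump taken before the failure is the one to inspect.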

@WildPenquin
Author

WildPenquin commented Feb 23, 2023

Hi cyring, thanks for your reply!

One thing before I proceed: browsing previous bug reports I noticed there is the section Software incompatibilities and workarounds. I had k10temp loaded at the same time. Do you think this could be the cause? I am/was using the sensors constantly, as I need that to adjust my cooling system.

But I also found an old bug report where it was claimed k10temp can now be used alongside corefreqk, but it's not in the wiki. Which is the current state of things?

EDIT: As for the Zen patches, I'm not sure there is an easy way to get a list of all the code changes. I'm a bit in over my head when it comes to knowing which changes could cause problems. But the upstream code repository is here: https://github.com/zen-kernel/zen-kernel - and there's some information in their Wiki and FAQ (both on GitHub) about their branches. Also: https://liquorix.net/#features (the Liquorix kernel is essentially the same AFAICT?)

@cyring
Owner

cyring commented Feb 23, 2023

I spend hours programming with CoreFreq running in parallel without a crash.
But mine is a Matisse 3950X with the plain Arch Linux kernel. No k10temp or other sensor drivers are in use.

Thus to be sure of the environment, I would say the best way is to boot my ISO image and let it run for a minimum of 30 minutes.

Please let me know if you can proceed with this.

@cyring
Owner

cyring commented Mar 1, 2023

@WildPenquin Hello,

Because your kernel flavor is not supported, did you have a chance to run the ISO master live image with your processor?

Get ISO at www.cyring.fr

cyring added the invalid label Mar 2, 2023
@cyring
Owner

cyring commented Mar 2, 2023

I'm at about 4 hours with no crash.

2023-03-02-123224_642x410_scrot
2023-03-02-123219_644x1012_scrot

Feel free to provide answers.

cyring closed this as completed Mar 2, 2023
cyring mentioned this issue Mar 3, 2023
@cyring
Owner

cyring commented Mar 9, 2023

  • PC with 3950X has booted the 6.2.2-zen1-1-zen kernel. All modules loaded.
  • CoreFreq is running in parallel with a sensors loop that queries the temperature from k10temp:
    while /bin/true ; do clear; sensors; done
  • CoreFreq interval has been reduced to 150 ms
    2023-03-09-193634_268x387_scrot
  • After 8 hours, including a short Suspend To RAM period, the issue has not been reproduced
    2023-03-09-193846_644x143_scrot

Thus I suspect a correlation with your kernel environment.
That's why I'm asking you to boot and test with my ISO.

@cyring
Owner

cyring commented Apr 1, 2024

Hi,
Are you still facing a crash using the latest master branch?
Can you show screenshots, if that is OK?

@WildPenquin
Author

Hi,

Given all the hassle, I've given up on trying to use corefreqk. The last time I tried, yes, I did get the issue on all Arch kernel flavors I tried. As I'm a bit time-constrained, I don't have the time to debug this in the near future.
