Very high cpu usage with >1 instance running #1022

Closed
mlauss2 opened this issue Mar 25, 2024 · 25 comments
Labels
support Support request

Comments

@mlauss2

mlauss2 commented Mar 25, 2024

Describe the problem

I have 3 instances of jkbms-ble connected; CPU usage on the Cerbo GX is basically 100%, the GUI is really slow, and the battery states in the device list lag about 2 minutes behind the Lynx Shunt and the PV inverter display (both of which still update in real time). For instance, when PV power drops below the baseline, the Lynx Shunt current becomes negative, while the 3 batteries still show positive values for about 2 minutes and then drop rapidly, as if the displayed values are simply 2 minutes behind.
Also, once or twice a day the Multis complain about "BMS: connection lost", and the EM24 RS485 connection also dies for a few minutes.

"ps" on the cerbo shows each dbus-serialbattery process consuming about 10% cpu.

I've now disabled all but one battery. Since then the CPU usage of the whole system has dropped considerably (to 55% idle), the GUI updates much more smoothly, and the state and current flow readings of the one battery are in line with the readings of the Lynx Shunt. The "BMS: connection lost" messages are gone and the EM24 reading is also more stable.

Any ideas on what to do here?

Thanks!
Manuel

Driver version

v1.1.20240121

Venus OS device type

Cerbo GX

Venus OS version

3.30

BMS type

JKBMS / Heltec BMS

Cell count

12

Battery count

3

Connection type

Bluetooth

Config file

[DEFAULT]

EXCLUDED_DEVICES = /dev/ttyUSB0, /dev/ttyUSB1, /dev/ttyUSB2, /dev/ttyUSB3, /dev/ttyUSB4, /dev/ttyUSB5
BMS_TYPE = Jkbms

; BAT1, BAT2, BAT3
;BLUETOOTH_BMS = Jkbms_Ble C8:47:8C:EE:C3:BD, Jkbms_Ble C8:47:80:05:82:78, Jkbms_Ble C8:47:80:05:85:16
; BAT1
BLUETOOTH_BMS = Jkbms_Ble C8:47:8C:EE:C3:BD

MAX_BATTERY_CHARGE_CURRENT = 150
MAX_BATTERY_DISCHARGE_CURRENT = 150

MIN_CELL_VOLTAGE   = 3.250
MAX_CELL_VOLTAGE   = 4.150
FLOAT_CELL_VOLTAGE = 4.140
SOC_RESET_VOLTAGE =  4.150

CVCM_ENABLE = True
CVL_ICONTROLLER_MODE = True

CCCM_SOC_ENABLE = False
DCCM_SOC_ENABLE = False

DCCM_CV_ENABLE = True
CELL_VOLTAGES_WHILE_DISCHARGING   = 3.250, 3.3, 3.35, 4.20
MAX_DISCHARGE_CURRENT_CV_FRACTION =     0,  0.5,    1,    1

CCCM_CV_ENABLE = True
CELL_VOLTAGES_WHILE_CHARGING   = 4.15, 4.12, 4.11, 3.125
MAX_CHARGE_CURRENT_CV_FRACTION =    0,  0.5,     1,    1

CCCM_T_ENABLE = True
DCCM_T_ENABLE = True
TEMPERATURE_LIMITS_WHILE_CHARGING = 0,   2,   5,  10,  15, 20, 35,  40, 55
MAX_CHARGE_CURRENT_T_FRACTION     = 0, 0.1, 0.5,   1 ,  1,  1,  1, 0.4,  0
TEMPERATURE_LIMITS_WHILE_DISCHARGING = -20,   0,   5,   10, 15, 45, 55
MAX_DISCHARGE_CURRENT_T_FRACTION     =   0, 0.2, 0.4, 0.75,  1,  1,  0

Relevant log output

2024-03-25 08:19:51.151131500 
2024-03-25 08:19:51.151138500 INFO:Bluetooth details
2024-03-25 08:19:51.228159500 Attempting to disconnect from C8:47:8C:EE:C3:BD
2024-03-25 08:19:51.228168500 Successful disconnected
2024-03-25 08:19:56.370366500 Device C8:47:8C:EE:C3:BD (public)
2024-03-25 08:19:56.370374500   Alias: 1_B2A24S15P
2024-03-25 08:19:56.370375500   Paired: no
2024-03-25 08:19:56.370377500   Trusted: no
2024-03-25 08:19:56.370378500   Blocked: no
2024-03-25 08:19:56.370379500   Connected: no
2024-03-25 08:19:56.370381500   LegacyPairing: no
2024-03-25 08:19:56.370382500   UUID: Device Information        (0000180a-0000-1000-8000-00805f9b34fb)
2024-03-25 08:19:56.370385500   RSSI: -50
2024-03-25 08:19:56.372566500 
2024-03-25 08:19:59.992337500 INFO:SerialBattery:
2024-03-25 08:20:00.009594500 INFO:SerialBattery:Starting dbus-serialbattery
2024-03-25 08:20:00.011701500 INFO:SerialBattery:dbus-serialbattery v1.1.20240121
2024-03-25 08:20:01.397528500 INFO:SerialBattery:Init of Jkbms_Ble at C8:47:8C:EE:C3:BD
2024-03-25 08:20:01.411740500 INFO:SerialBattery:Test of Jkbms_Ble at C8:47:8C:EE:C3:BD
2024-03-25 08:20:13.898129500 INFO:SerialBattery:--> asy_connect_and_scrape(): Exit
2024-03-25 08:20:13.965853500 ERROR:SerialBattery:No BMS found at C8:47:8C:EE:C3:BD
2024-03-25 08:20:13.965864500 ERROR:SerialBattery:ERROR >>> No battery connection at Jkbms_Ble
2024-03-25 08:20:14.327874500 
2024-03-25 08:20:14.328342500 INFO:Bluetooth details
2024-03-25 08:20:14.382321500 [CHG] Device C8:47:8C:EE:C3:BD RSSI: -50
2024-03-25 08:20:14.382329500 [CHG] Device C8:47:8C:EE:C3:BD RSSI: -50
2024-03-25 08:20:14.546599500 Attempting to disconnect from C8:47:8C:EE:C3:BD
2024-03-25 08:20:14.546608500 [CHG] Device D4:9D:C0:8B:99:07 RSSI is nil
2024-03-25 08:20:14.546611500 [DEL] Device D4:9D:C0:8B:99:07 D4-9D-C0-8B-99-07
2024-03-25 08:20:14.546614500 [CHG] Device C8:47:80:05:85:16 RSSI is nil
2024-03-25 08:20:14.546617500 [CHG] Device C8:47:8C:EE:C3:BD RSSI is nil
2024-03-25 08:20:14.546620500 [CHG] Device C8:47:80:05:82:78 RSSI is nil
2024-03-25 08:20:14.546622500 [CHG] Controller 68:4E:05:C3:BF:32 Discovering: no
2024-03-25 08:20:14.546747500 Successful disconnected
2024-03-25 08:20:19.707416500 Device C8:47:8C:EE:C3:BD (public)
2024-03-25 08:20:19.707425500   Alias: 1_B2A24S15P
2024-03-25 08:20:19.707427500   Paired: no
2024-03-25 08:20:19.707428500   Trusted: no
2024-03-25 08:20:19.707430500   Blocked: no
2024-03-25 08:20:19.707431500   Connected: no
2024-03-25 08:20:19.707432500   LegacyPairing: no
2024-03-25 08:20:19.707434500   UUID: Device Information        (0000180a-0000-1000-8000-00805f9b34fb)
2024-03-25 08:20:19.707437500   RSSI: -50
2024-03-25 08:20:19.708200500 
2024-03-25 08:20:21.288299500 INFO:SerialBattery:
2024-03-25 08:20:21.291650500 INFO:SerialBattery:Starting dbus-serialbattery
2024-03-25 08:20:21.293404500 INFO:SerialBattery:dbus-serialbattery v1.1.20240121
2024-03-25 08:20:21.790137500 INFO:SerialBattery:Init of Jkbms_Ble at C8:47:8C:EE:C3:BD
2024-03-25 08:20:21.790949500 INFO:SerialBattery:Test of Jkbms_Ble at C8:47:8C:EE:C3:BD
2024-03-25 08:20:26.301233500 INFO:SerialBattery:BAT: JKBMS 11.XW 12 cells (20230923)
2024-03-25 08:20:26.302581500 INFO:SerialBattery:Connection established to Jkbms_Ble
2024-03-25 08:20:26.304674500 INFO:SerialBattery:Battery Jkbms_Ble connected to dbus from c8478ceec3bd
2024-03-25 08:20:26.305303500 INFO:SerialBattery:========== Settings ==========
2024-03-25 08:20:26.306035500 INFO:SerialBattery:> Connection voltage: 41.28V | Current: 2.1A | SoC: None%
2024-03-25 08:20:26.306623500 INFO:SerialBattery:> Cell count: 12 | Cells populated: 12
2024-03-25 08:20:26.307230500 INFO:SerialBattery:> LINEAR LIMITATION ENABLE: True
2024-03-25 08:20:26.307850500 INFO:SerialBattery:> MAX BATTERY CHARGE CURRENT: 150.0A | MAX BATTERY DISCHARGE CURRENT: 150.0A
2024-03-25 08:20:26.308439500 INFO:SerialBattery:> CVCM:     True
2024-03-25 08:20:26.309087500 INFO:SerialBattery:> MIN CELL VOLTAGE: 3.25V | MAX CELL VOLTAGE: 4.15V
2024-03-25 08:20:26.309887500 INFO:SerialBattery:> CCCM CV:  True  | DCCM CV:  True
2024-03-25 08:20:26.310686500 INFO:SerialBattery:> CCCM T:   True  | DCCM T:   True
2024-03-25 08:20:26.311579500 INFO:SerialBattery:> CCCM SOC: False | DCCM SOC: False
2024-03-25 08:20:26.312390500 INFO:SerialBattery:Serial Number/Unique Identifier: 3022846822
2024-03-25 08:20:30.097473500 INFO:SerialBattery:Found existing battery with DeviceInstance = 2
2024-03-25 08:20:30.452253500 INFO:SerialBattery:DeviceInstance = 2
2024-03-25 08:20:30.453079500 INFO:SerialBattery:Used device instances: ['2', '3', '1']
2024-03-25 08:20:30.453929500 INFO:SerialBattery:com.victronenergy.battery.c8478ceec3bd
2024-03-25 08:20:30.454834500 INFO:SerialBattery:BAT: JKBMS 11.XW 12 cells (20230923)
2024-03-25 08:20:30.516947500 INFO:SerialBattery:publish config values = True

Any other information that may be helpful

No response

@mlauss2 mlauss2 added the support Support request label Mar 25, 2024
@mlauss2
Author

mlauss2 commented Mar 25, 2024

For example. The first 3 rows are the batteries, then PV Power, EM24 reading and Lynx Shunt at the bottom.
The batteries still show energy being taken out, while the Lynx Shunt is (correctly) showing energy flowing into them.
dsb_lag_1

dsb_lag_2

Usually the Lynx Shunt reports ~0.2V less than the BMSs.

@DirkReudenbach

Which values or current did your JKBMS show at the same time?

@mlauss2
Author

mlauss2 commented Mar 25, 2024

Can't tell since the app can't connect as long as the cerbo has them grabbed.
With only the one battery connected, the other two report more or less the same current, which adds up to what the lynx shunt shows.

@Honusnap

One instance already takes about 17% CPU, which is IMMENSE! We already reported this, but those issues have been closed.

@mr-manuel
Collaborator

We optimized it as best we could. Compared with an EM*, the driver receives and processes much more data. Feel free to open a PR to increase performance.
Currently the only option is to use a Raspberry Pi to connect all BMS and then transfer the aggregated data to the Cerbo.

@ramack
Contributor

ramack commented Mar 26, 2024

Maybe reducing the polling frequency could improve the situation? 1 s is great to have but not always needed.

@mlauss2
Author

mlauss2 commented Mar 26, 2024

Maybe reducing the polling frequency could improve the situation? 1 s is great to have but not always needed.

Or poll one battery every second, in a round-robin fashion. For 3+ batteries maybe give the one that is set as the controlling BMS some preference.

@mr-manuel
Collaborator

That will not work. You need to poll all batteries at the same moment; otherwise you get snapshots of different moments. This will result in wrong data and wrong system behaviour.

@ramack
Contributor

ramack commented Mar 26, 2024

Consistency in time is good, but not having an overloaded CPU is more important. And with 5 batteries, a consistent set of data from 5 s ago doesn't seem better than having the 'average' state from 2.5 s ago.

Do you have an example scenario that would cause problems in the round robin reading scheme but would work with low frequency all at the same time scenario?
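The staleness argument above can be put in numbers. The following is only a sketch under stated assumptions: a budget of one BMS read per second, steady state, and hypothetical helper names (`mean_age_round_robin`, `worst_age_batch`) that are not part of dbus-serialbattery.

```python
def mean_age_round_robin(n_batteries, slot_s=1.0, samples_per_slot=100):
    """Time-averaged age of the data when one battery is read per slot.

    Assumption: battery i is read at i*slot_s, then again every full cycle.
    """
    period = n_batteries * slot_s
    dt = slot_s / samples_per_slot
    total = 0.0
    count = 0
    for j in range(n_batteries * samples_per_slot):
        t = (j + 0.5) * dt  # sample mid-interval to avoid boundary effects
        for i in range(n_batteries):
            # Age of battery i's last reading at time t, in seconds.
            total += (t - i * slot_s) % period
            count += 1
    return total / count


def worst_age_batch(n_batteries, slot_s=1.0):
    """Worst-case age when all batteries are read together once per cycle."""
    return n_batteries * slot_s


print(round(mean_age_round_robin(5), 3))  # about 2.5 s on average
print(worst_age_batch(5))                 # up to 5 s just before the next batch
```

This only compares data age. It says nothing about whether downstream consumers tolerate values taken at different moments, which is the objection raised earlier in the thread.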

@mlauss2
Author

mlauss2 commented Mar 26, 2024

for my 3 batteries, I'd use a poll scheme like
1->2->1->3->1->2->1->3-> ...
where 1 is my controlling BMS. Yes, the state is now 2 and 4 seconds old instead of the (theoretical) 1, but I think this is miles better than having the state of 1 minute ago or worse for all batteries.
I of course don't know how battery-aggregator et al would be affected by this.
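The 1->2->1->3 pattern can be sketched as a small generator. This is an illustration only; `priority_round_robin` and `refresh_period` are hypothetical names, and nothing here reflects how the driver actually schedules reads.

```python
from itertools import cycle, islice


def priority_round_robin(primary, others):
    """Yield battery ids so that `primary` is polled in every other slot."""
    for other in cycle(others):
        yield primary
        yield other


def refresh_period(cycle_slots, battery, slot_s=1.0):
    """Average seconds between two reads of `battery` within one cycle."""
    hits = cycle_slots.count(battery)
    if hits == 0:
        raise ValueError(f"battery {battery} is never polled")
    return len(cycle_slots) * slot_s / hits


schedule = list(islice(priority_round_robin(1, [2, 3]), 8))
print(schedule)  # [1, 2, 1, 3, 1, 2, 1, 3]

one_cycle = schedule[:4]
print(refresh_period(one_cycle, 1))  # 2.0 s for the controlling BMS
print(refresh_period(one_cycle, 2))  # 4.0 s for each of the others
```

With one read per second this reproduces the 2 s and 4 s figures mentioned above.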

@Honusnap

We optimized it as best we could. Compared with an EM*, the driver receives and processes much more data. Feel free to open a PR to increase performance. Currently the only option is to use a Raspberry Pi to connect all BMS and then transfer the aggregated data to the Cerbo.

Reduced polling, as you said, will crash the drivers once a day (if I remember correctly). Maybe this crash is specific to Jkbms.

@mlauss2
Author

mlauss2 commented Mar 27, 2024

it will crash the drivers once a day (if I remember correctly)

Try disabling "VRM 2-way-communication" on the "VRM online portal" settings page. I had that crash/reboot once or twice a day too; it went away when I disabled this setting.

@mr-manuel
Collaborator

Some months ago we switched the Bluetooth connections from polling to callback, since the driver crashed after some hours without reason.

@mr-manuel mr-manuel added the help wanted Extra attention is needed label Mar 31, 2024
@Baxter117

Baxter117 commented Apr 2, 2024

We optimized it as best we could. Compared with an EM*, the driver receives and processes much more data. Feel free to open a PR to increase performance. Currently the only option is to use a Raspberry Pi to connect all BMS and then transfer the aggregated data to the Cerbo.

Hello @mr-manuel ,
I am experiencing the same problem.
100% CPU usage with 3 JK-BMS over Bluetooth (no other instance, no aggregation, no Node-RED, no GUI).

Can you please explain your proposed solution ?
How can I collect the 3 bms on a raspberry and transfer the "aggregated BMS" to my cerbo ?

installing Venus OS on a raspberry, configuring the 3 bms and use an aggregation of course is no problem.
But how do I get the aggregated BMS to the cerbo ?

Thank you a lot for your help :)

@Baxter117

Baxter117 commented Apr 2, 2024

Screenshot 2024-04-02 at 23 37 19

same Issue with 3 connected JK bms over bluetooth.
I only uploaded the picture in case it might help someone.

I also wondered why I see 2 processes of each Jkbms_Ble. In sum 6

@mr-manuel
Collaborator

But how do I get the aggregated BMS to the cerbo ?

  1. Install Venus OS large on the Raspberry Pi
  2. Use Node-RED to combine the data from the battery aggregator you want to transfer to the Cerbo GX
  3. Install dbus-mqtt-battery on the Cerbo to process the data

@mr-manuel
Collaborator

I also wondered why I see 2 processes of each Jkbms_Ble. In sum 6

There are different threads that act as watchdog and restart the driver in case of failure.

@Honusnap

Honusnap commented Apr 21, 2024

I also wondered why I see 2 processes of each Jkbms_Ble. In sum 6

There are different threads that act as watchdog and restart the driver in case of failure.

I just saw htop screenshots from users on RS485: it's something like 3% per BMS (I asked which one he is using), while it's 14% with Bluetooth on JKBMS.
Why would Bluetooth take that much CPU?

@mr-manuel
Collaborator

The Bluetooth driver doesn't poll the data; it processes everything the BMS sends over.

I'm modifying the driver right now to let the user choose whether it should be used in the normal way or as passthrough only. In passthrough mode the data is only fetched and no calculations are made. In this case the calculations for CVL, CCL and DCL are no longer done, and it is not recommended to select the battery as battery monitor. You will need a battery aggregator or something else that makes these calculations and then select it as battery monitor. This way the data is calculated only once and not per battery.

@mlauss2
Author

mlauss2 commented May 10, 2024

I'm modifying the driver right now to let the user choose whether it should be used in the normal way or as passthrough only. In passthrough mode the data is only fetched and no calculations are made. In this case the calculations for CVL, CCL and DCL are no longer done, and it is not recommended to select the battery as battery monitor. You will need a battery aggregator or something else that makes these calculations and then select it as battery monitor. This way the data is calculated only once and not per battery.

That sounds great!

I toyed with the idea of introducing a "battery count" parameter: i.e. calculate the charge/discharge current limit for one battery and multiply it with the "battery count" parameter. This would work great for my 3 identical parallel units (i.e. right now charge limit is 150A, while the 3 units could easily sustain the >300A of the MP2s, with dbus-serialbattery still connected to just one bms).
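A minimal sketch of the "battery count" idea, assuming identical parallel batteries that share current evenly. The function name `scaled_current_limit` and the optional `system_cap_a` parameter are hypothetical and do not exist in dbus-serialbattery.

```python
def scaled_current_limit(per_battery_limit_a, battery_count, system_cap_a=None):
    """Scale a single battery's current limit to a bank of identical units.

    Assumption: every unit sees the same cell voltages and temperatures as
    the one BMS that is actually connected, so one limit fits all of them.
    """
    if battery_count < 1:
        raise ValueError("battery_count must be at least 1")
    total = per_battery_limit_a * battery_count
    # Optionally clamp to what the rest of the system can carry.
    return total if system_cap_a is None else min(total, system_cap_a)


print(scaled_current_limit(150, 3))       # 450
print(scaled_current_limit(150, 3, 300))  # 300
```

The weak point is the even-sharing assumption: a unit that drifts from the monitored one would not have its own limits enforced.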

Thanks!

@mr-manuel
Collaborator

Could you all please install v1.3.20240510passthrough, change DRIVER_MODE = 0 to DRIVER_MODE = 1 in the config.default.ini and compare the CPU usage between the setting 0 and 1 for about 30 minutes?
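For a comparison like this, one way to collect numbers is to sample the 1-minute load average at a fixed interval and summarize it per setting. This is a sketch, not part of the driver: `sample_load` and `summarize` are hypothetical helpers, and it assumes Python 3 on a Unix-like system where `os.getloadavg()` is available.

```python
import os
import time


def sample_load(duration_s, interval_s=10.0, read=os.getloadavg, sleep=time.sleep):
    """Collect 1-minute load averages for roughly `duration_s` seconds."""
    samples = []
    elapsed = 0.0
    while elapsed < duration_s:
        samples.append(read()[0])  # first element is the 1-minute average
        sleep(interval_s)
        elapsed += interval_s
    return samples


def summarize(samples):
    """Return min, mean and max of the collected samples."""
    return {
        "min": min(samples),
        "mean": sum(samples) / len(samples),
        "max": max(samples),
    }


# Example for a 30 minute run per DRIVER_MODE setting:
# print(summarize(sample_load(30 * 60)))
```

Load average also counts processes waiting on I/O, so it is a coarse proxy for CPU usage; comparing the per-process CPU column in top or htop alongside it gives a fuller picture.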

@ramack
Contributor

ramack commented May 10, 2024

in config.ini:

DRIVER_MODE = 0:

load average: 3.40, 3.33, 3.50

DRIVER_MODE = 1:

load average: 3.52, 3.82, 2.77
Also I get "Error: unsupported operand type(s) for +=: 'int' and 'NoneType'. Read trial nr. 3"

and I changed the poll interval to 5000 in the heltec BMS driver, to avoid watchdog resets:

class HeltecModbus(Battery):
    def __init__(self, port, baud, address):
[...]
        self.poll_interval = 5000

I have the impression that a configuration option for the poll_interval (or even a dynamic reduction on overload) would be more effective than tweaking the calculations.
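The "dynamic reduction on overload" idea could look roughly like the sketch below. All names and thresholds are assumptions for illustration (`next_poll_interval`, the 1000 to 5000 ms range, the load thresholds); the driver does not expose such a function.

```python
def next_poll_interval(current_ms, load_1min, cpu_count,
                       min_ms=1000, max_ms=5000, high=1.0, low=0.7):
    """Back off when the system is overloaded, speed up again when it is not.

    The load is normalized per core; between `low` and `high` the interval
    is left unchanged so that it does not oscillate.
    """
    load_per_core = load_1min / cpu_count
    if load_per_core > high:
        return min(max_ms, current_ms * 2)
    if load_per_core < low:
        return max(min_ms, current_ms // 2)
    return current_ms


# With the load averages reported above and a hypothetical 2-core system:
print(next_poll_interval(1000, 3.40, 2))  # 2000
print(next_poll_interval(4000, 3.40, 2))  # 5000
print(next_poll_interval(2000, 0.50, 2))  # 1000
```

This only applies to drivers that poll on an interval, such as the Heltec Modbus driver in the snippet above; the Bluetooth path is callback-driven according to the maintainer.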

@mr-manuel
Collaborator

I had the same impression that the load did not change, but it would be good to have feedback from several people.

@mr-manuel mr-manuel removed the help wanted Extra attention is needed label May 20, 2024
@mr-manuel
Collaborator

Since not many people are willing to test, only to raise concerns, I will close this.

Meanwhile I added an option to change the polling interval.

https://github.com/mr-manuel/venus-os_dbus-serialbattery/blob/d946e3d1d8bf755872bd03f9080f94641ebba4a6/etc/dbus-serialbattery/config.default.ini#L349-L354

@ramack
Contributor

ramack commented May 21, 2024

This is also a great feature, thanks a lot!

6 participants