Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RHEL 8.3 - Codenomicon testing causes Kernel crash! #12

Closed
mikeyjoon opened this issue Dec 23, 2021 · 4 comments · Fixed by #62
Closed

RHEL 8.3 - Codenomicon testing causes Kernel crash! #12

mikeyjoon opened this issue Dec 23, 2021 · 4 comments · Fixed by #62
Labels
bug Something isn't working crash The bug that caused linux kernel crash iscsi-scst SCST iSCSI subsystem

Comments

@mikeyjoon
Copy link

Details of this crash can be found after the VM is rebooted. Here is the kernel BUG (nthred.c) that is hit:

NOTE: I used Codenomicon ISCSI testing to hit this issue!

[75999.447928] [33903]: iscsi-scst: Creating connection 00000000b60b426d for sid 0x3f20000000000, cid 0 (initiator iqn.1994-09.org.freebsd:iscsi)
[75999.471707] [33887]: iscsi-scst: Logout received from initiator iqn.1994-09.org.freebsd:iscsi
[75999.471718] [33889]: iscsi-scst: Closing connection at initiator's iqn.1994-09.org.freebsd:iscsi request
[75999.471815] [34498]: scst: TM fn NEXUS_LOSS_SESS/6 (mcmd 0000000087b23588, initiator iqn.1994-09.org.freebsd:iscsi, target iqn.2012-05.local:mailbox.target.ryan00001)
[75999.471831] [33884]: scst: TM fn 6 (mcmd 0000000087b23588) finished, status 0
[75999.471838] [34498]: iscsi-scst: Freeing conn 00000000b60b426d (sess=00000000f4d68531, 0x3f20000000000 0, initiator iqn.1994-09.org.freebsd:iscsi)
[75999.471848] [34498]: iscsi-scst: Freeing session 00000000f4d68531 (SID 3f20000000000)
[76035.304588] [33903]: scst: Using security group "access_group" for initiator "iqn.1994-09.org.freebsd:iscsi" (target iqn.2012-05.local:mailbox.target.ryan00001)
[76035.305532] [33903]: iscsi-scst: Session 00000000f4d68531 created: target 0000000000f93e99, tid 1, sid 0x4f30000000000, initiator iqn.1994-09.org.freebsd:iscsi
[76035.305536] [33903]: iscsi-scst: Negotiated parameters: InitialR2T Yes, ImmediateData Yes, MaxConnections 1, MaxRecvDataSegmentLength 1048576, MaxXmitDataSegmentLength 8192,
[76035.305538] [33903]: iscsi-scst: MaxBurstLength 262144, FirstBurstLength 65536, DefaultTime2Wait 0, DefaultTime2Retain 0,
[76035.305539] [33903]: iscsi-scst: MaxOutstandingR2T 1, DataPDUInOrder Yes, DataSequenceInOrder Yes, ErrorRecoveryLevel 0,
[76035.305540] [33903]: iscsi-scst: HeaderDigest None, DataDigest None, OFMarker No, IFMarker No, OFMarkInt 2048, IFMarkInt 2048, RDMAExtensions No
[76035.305542] [33903]: iscsi-scst: Target parameters set for session 4f30000000000: QueuedCommands 32, Response timeout 90, Nop-In interval 30, Nop-In timeout 30
[76035.305582] [33903]: iscsi-scst: Creating connection 00000000b60b426d for sid 0x4f30000000000, cid 0 (initiator iqn.1994-09.org.freebsd:iscsi)
[76035.419640] [34506]: iscsi-scst: ERROR: error 0 at sid:cid 0x4f30000000000:0, cmnd 0000000031af3b7e
[76035.419646] [34506]: iscsi-scst: ERROR: Sending data failed: initiator iqn.1994-09.org.freebsd:iscsi (conn 00000000b60b426d), write_size 8240, write_state 1, res 0
[76035.419696] [33889]: iscsi-scst: CRITICAL ERROR: 0 6 25
[76035.419717] ------------[ cut here ]------------
[76035.419718] kernel BUG at /root/ontap-mediator/dist/scst/scst-3.5.0/iscsi-scst/kernel/nthread.c:1506!
[76035.419754] invalid opcode: 0000 [#1] SMP PTI
[76035.419757] CPU: 0 PID: 33889 Comm: iscsiwr0_1 Kdump: loaded Tainted: G OE --------- - - 4.18.0-240.el8.x86_64 #1
[76035.419758] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/03/2018
[76035.419765] RIP: 0010:iscsi_send+0x820/0x850 [iscsi_scst]
[76035.419767] Code: 02 6e c0 41 b8 e1 05 00 00 48 c7 c1 c8 e4 6d c0 48 c7 c6 e8 14 6e c0 83 e2 3f 52 48 c7 c2 c4 14 6e c0 50 6a 00 e8 00 21 17 00 <0f> 0b e8 e9 f1 bd d3 8b 85 08 01 00 00 bb f5 ff ff ff 89 44 24 14
[76035.419768] RSP: 0018:ffffb87081823de8 EFLAGS: 00010292
[76035.419770] RAX: 0000000000000036 RBX: ffff8ee397ff3440 RCX: 0000000000000000
[76035.419771] RDX: 0000000000000007 RSI: 0000000000000246 RDI: 0000000000000246
[76035.419772] RBP: ffff8ee3edec1400 R08: 0000000000000585 R09: 0000000000000040
[76035.419773] R10: 2a2a2a524f525245 R11: 353220362030203a R12: ffff8ee397ff3488
[76035.419774] R13: ffff8ee39536e180 R14: ffff8ee3edec14b8 R15: ffff8ee397ff34a0
[76035.419775] FS: 0000000000000000(0000) GS:ffff8ee3ffc00000(0000) knlGS:0000000000000000
[76035.419776] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76035.419777] CR2: 00007fbaba5a3000 CR3: 0000000027e0a001 CR4: 00000000003606f0
[76035.419813] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[76035.419814] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[76035.419815] Call Trace:
[76035.419829] ? __schedule+0x2ae/0x700
[76035.419833] istwr+0x120/0x3b0 [iscsi_scst]
[76035.419840] ? finish_wait+0x80/0x80
[76035.419844] ? iscsi_send+0x850/0x850 [iscsi_scst]
[76035.419846] kthread+0x112/0x130
[76035.419848] ? kthread_flush_work_fn+0x10/0x10
[76035.419850] ret_from_fork+0x35/0x40
[76035.419852] Modules linked in: scst_vdisk(OE) isert_scst(OE) iscsi_scst(OE) scst(OE) dlm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter tun bridge stp llc rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi target_core_mod ib_srp scsi_transport_srp ib_ipoib nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set rdma_ucm ib_uverbs ib_umad nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rdma_cm iw_cm ib_cm ib_core ip6_tables nft_compat ip_set nf_tables nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock sunrpc intel_rapl_msr intel_rapl_common sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel vmw_balloon intel_rapl_perf joydev pcspkr vmw_vmci i2c_piix4 ip_tables xfs libcrc32c vmwgfx drm_kms_helper sr_mod cdrom ata_generic syscopyarea sysfillrect sysimgblt fb_sys_fops ttm sd_mod sg drm
[76035.419968] ata_piix crc32c_intel libata serio_raw vmxnet3 vmw_pvscsi dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: dlm]

Any clues/hints what might be going on here?

@lnocturno
Copy link
Contributor

Could you please retest the case on the master branch (SCST 3.6)?

Just to be sure that the bug is still reproducible on upstream.

Could you please also provide a link to the Codenomicon tool and tell how you reproduce the problem with this?

@mikeyjoon
Copy link
Author

I will try this by first week of Jan (latest) and update with results.
Happy New Year!

lnocturno added a commit that referenced this issue Jun 28, 2022
This patch should fix the following bug:

iscsi-scst: ***ERROR***: Sending data failed: initiator ..., write_size 0, write_state 1, res 0
iscsi-scst: ***CRITICAL ERROR***: 0 6 31
 ------------[ cut here ]------------
kernel BUG at /usr/src/packages/BUILD/scst-3.7.0.8695/iscsi-scst/kernel/nthread.c:1517!
invalid opcode: 0000 [#1] SMP NOPTI
CPU: 12 PID: 997595 Comm: iscsiwr0_14 ...
...
RIP: 0010:iscsi_send+0x877/0x8b0 [iscsi_scst]
Call Trace:
 istwr+0x123/0x3b0 [iscsi_scst]
 kthread+0x120/0x136
 ret_from_fork+0x24/0x36
 -------------------------------------

What happens:

 - istwr() calls scst_do_job_wr().

 - scst_do_job_wr() calls iscsi_send().

 - iscsi_send() sets the 'res' variable to 0 during error
   in one of three possible places:
   iscsi_do_send(), tx_padding(), tx_ddigest().

 - All of these functions call exit_tx() which sets conn->write_state to TX_END.

 - After iscsi_send() has completed for the current iteration, the next time
   it processes iscsi_conn with conn->write_state == TX_END,
   which will call BUG() in the switch default case.

Therefore, remove the res == 0 check in iscsi_send() to handle TX_END state.

Fixes: #12
@lnocturno lnocturno linked a pull request Jun 28, 2022 that will close this issue
lnocturno added a commit that referenced this issue Jun 28, 2022
This patch should fix the following bug:

iscsi-scst: ***ERROR***: Sending data failed: initiator ..., write_size 0, write_state 1, res 0
iscsi-scst: ***CRITICAL ERROR***: 0 6 31
 ------------[ cut here ]------------
kernel BUG at /usr/src/packages/BUILD/scst-3.7.0.8695/iscsi-scst/kernel/nthread.c:1517!
invalid opcode: 0000 [#1] SMP NOPTI
CPU: 12 PID: 997595 Comm: iscsiwr0_14 ...
...
RIP: 0010:iscsi_send+0x877/0x8b0 [iscsi_scst]
Call Trace:
 istwr+0x123/0x3b0 [iscsi_scst]
 kthread+0x120/0x136
 ret_from_fork+0x24/0x36
 -------------------------------------

What happens:

 - istwr() calls scst_do_job_wr().

 - scst_do_job_wr() calls iscsi_send().

 - iscsi_send() sets the 'res' variable to 0 during error
   in one of three possible places:
   iscsi_do_send(), tx_padding(), tx_ddigest().

 - All of these functions call exit_tx() which sets conn->write_state to TX_END.

 - After iscsi_send() has completed for the current iteration, the next time
   it processes iscsi_conn with conn->write_state == TX_END,
   which will call BUG() in the switch default case.

Therefore, remove the res == 0 check in iscsi_send() to handle TX_END state.

Fixes: #12
@lnocturno lnocturno added bug Something isn't working crash The bug that caused linux kernel crash labels Jun 28, 2022
@lnocturno
Copy link
Contributor

HI,

It seems I have fixed this problem. If you reproduce it again, feel free to reopen the issue.

Thanks,
Gleb.

@lnocturno lnocturno added the iscsi-scst SCST iSCSI subsystem label Jun 28, 2022
@mikeyjoon
Copy link
Author

Hi Gleb,

Thanks so much for this report. We will pickup your change with SCST 3.7 release. At that time, I can re-test and re-open if need be.

Thanks,
Mike

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working crash The bug that caused linux kernel crash iscsi-scst SCST iSCSI subsystem
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants