[YSQL] Cloud - postgres crash during fork() #13091

Closed
vmallepalli opened this issue Jun 28, 2022 · 3 comments
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue

vmallepalli commented Jun 28, 2022

Jira Link: DB-2802

Description

We saw two core files generated at around the same time in a cluster created with Yugabyte Managed. Please take a look (TAL), in case it helps prevent future crashes.

Cluster-id: e0dd2601-6008-4674-90b7-792e0ab09f08
Db-version: 2.12.5.1-b2
PagerDuty generated alert - https://yugabyte.pagerduty.com/incidents/Q2XLOVQ1E5BT77

First Core File Backtrace

Core was generated by `/home/yugabyte/yb-software/yugabyte-2.12.5.1-b2-centos-x86_64/postgres/bin/post'.
Program terminated with signal 11, Segmentation fault.
#0 __GI_abort () at abort.c:125

Thread 1 (LWP 5006):
#0 __GI_abort () at abort.c:125
#1 0x00007fa4a1e681f6 in __assert_fail_base (fmt=0x7fa4a1fa1198 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7fa4a1fa2f40 "THREAD_GETMEM (self, tid) != ppid", file=file@entry=0x7fa4a1f9f00f "../sysdeps/nptl/fork.c", line=line@entry=136, function=function@entry=0x7fa4a1f9f026 <PRETTY_FUNCTION.12124> "__libc_fork") at assert.c:92
#2 0x00007fa4a1e682a2 in __GI___assert_fail (assertion=0x7fa4a1fa2f40 "THREAD_GETMEM (self, tid) != ppid", file=0x7fa4a1f9f00f "../sysdeps/nptl/fork.c", line=136, function=0x7fa4a1f9f026 <PRETTY_FUNCTION.12124> "__libc_fork") at assert.c:101
#3 0x00007fa4a1ef2335 in __libc_fork () at ../sysdeps/nptl/fork.c:136
#4 0x00000000007dbe70 in fork_process () at ../../../../../../src/postgres/src/backend/postmaster/fork_process.c:61
#5 0x000000000049df59 in BackendStartup (port=0x1fc0960) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4121
#6 ServerLoop () at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1754
#7 0x00000000007ebb8f in PostmasterMain (argc=argc@entry=23, argv=argv@entry=0x1e9a300) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1417
#8 0x00000000007372aa in PostgresServerProcessMain (argc=23, argv=0x1e9a300) at ../../../../../../src/postgres/src/backend/main/main.c:234
#9 0x00000000007374a9 in main ()

Second Core File Backtrace

Core was generated by `/home/yugabyte/yb-software/yugabyte-2.12.5.1-b2-centos-x86_64/postgres/bin/post'.
Program terminated with signal 6, Aborted.
#0 0x00007fa4a1f1b253 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84

Thread 1 (LWP 18730):
#0 0x00007fa4a1f1b253 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1 0x000000000049d83a in ServerLoop () at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1711
#2 0x00000000007ebb8f in PostmasterMain (argc=argc@entry=23, argv=argv@entry=0x1e9a300) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1417
#3 0x00000000007372aa in PostgresServerProcessMain (argc=23, argv=0x1e9a300) at ../../../../../../src/postgres/src/backend/main/main.c:234
#4 0x00000000007374a9 in main ()

@vmallepalli vmallepalli added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jun 28, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 28, 2022
@vmallepalli
Contributor Author

postgres_logs.gz

Here are the postgres logs covering the time when the core files were created.

tedyu commented Jun 28, 2022

2022-06-28 20:28:54.524 UTC [4824] STATEMENT:  INSERT INTO "addresses" ("decompiled","fetched_coin_balance_block_number","hash","inserted_at","updated_at","verified") VALUES ($1,$2,$3,$4,$5,$6),($7,$8,$9,$10,$11,$12),($13,$14,$15,$16,$17,$18),($19,$20,$21,$22,$23,$24),($25,$26,$27,$28,$29,$30),($31,$32,$33,$34,$35,$36),($37,$38,$39,$40,$41,$42),($43,$44,$45,$46,$47,$48),($49,$50,$51,$52,$53,$54),($55,$56,$57,$58,$59,$60),($61,$62,$63,$64,$65,$66),($67,$68,$69,$70,$71,$72),($73,$74,$75,$76,$77,$78),($79,$80,$81,$82,$83,$84),($85,$86,$87,$88,$89,$90),($91,$92,$93,$94,$95,$96),($97,$98,$99,$100,$101,$102),($103,$104,$105,$106,$107,$108),($109,$110,$111,$112,$113,$114),($115,$116,$117,$118,$119,$120),($121,$122,$123,$124,$125,$126),($127,$128,$129,$130,$131,$132),($133,$134,$135,$136,$137,$138),($139,$140,$141,$142,$143,$144),($145,$146,$147,$148,$149,$150),($151,$152,$153,$154,$155,$156),($157,$158,$159,$160,$161,$162),($163,$164,$165,$166,$167,$168),($169,$170,$171,$172,$173,$174),($175,$176,$177,$178,$179,$180),($181,$182,$183,$184,$185,$186),($187,$188,$189,$190,$191,$192),($193,$194,$195,$196,$197,$198),($199,$200,$201,$202,$203,$204),($205,$206,$207,$208,$209,$210),($211,$212,$213,$214,$215,$216),($217,$218,$219,$220,$221,$222),($223,$224,$225,$226,$227,$228),($229,$230,$231,$232,$233,$234),($235,$236,$237,$238,$239,$240),($241,$242,$243,$244,$245,$246),($247,$248,$249,$250,$251,$252),($253,$254,$255,$256,$257,$258),($259,$260,$261,$262,$263,$264),($265,$266,$267,$268,$269,$270),($271,$272,$273,$274,$275,$276),($277,$278,$279,$280,$281,$282),($283,$284,$285,$286,$287,$288),($289,$290,$291,$292,$293,$294),($295,$296,$297,$298,$299,$300) ON CONFLICT ("hash") DO NOTHING RETURNING "updated_at","inserted_at","gas_used","token_transfers_count","transactions_count","verified","decompiled","nonce","contract_code","fetched_coin_balance_block_number","fetched_coin_balance","hash"
I0628 20:28:54.525954  4844 poller.cc:66] Poll stopped: Service unavailable (yb/rpc/scheduler.cc:80): Scheduler is shutting down (system error 108)
2022-06-28 20:28:55.164 UTC [5002] LOG:  PID 6251 in cancel request did not match any process
2022-06-28 20:28:55.193 UTC [4923] LOG:  unexpected EOF on client connection with an open transaction
I0628 20:28:55.193727  4925 poller.cc:66] Poll stopped: Service unavailable (yb/rpc/scheduler.cc:80): Scheduler is shutting down (system error 108)
W0628 20:28:55.605315  4957 tablet_rpc.cc:475] Timed out (yb/rpc/outbound_call.cc:492): Failed UpdateTransaction: tablet_id: "5b5ea425ec0e43f183f0b1b024f1cd37" state { transaction_id: "\036\277 |\033\013F\033\257\013\206\2069C\237\334" status: PENDING } propagated_hybrid_time: 6784811558694264832, retrier: { task_id: -1 state: kIdle deadline: 515028.896s } to tablet 5b5ea425ec0e43f183f0b1b024f1cd37 on tablet server { uuid: 26d8d7ef867d413293ec6f98dab5b3cc private: [host: "10.8.24.246" port: 9100] cloud_info: placement_cloud: "aws" placement_region: "us-east-1" placement_zone: "us-east-1a" after 1 attempt(s): UpdateTransaction RPC (request call id 126) to 10.8.24.246:9100 timed out after 0.500s
W0628 20:28:55.605368  4957 transaction.cc:1136] 1ebf207c-1b0b-461b-af0b-868639439fdc: Send heartbeat failed: Timed out (yb/rpc/outbound_call.cc:492): UpdateTransaction RPC (request call id 126) to 10.8.24.246:9100 timed out after 0.500s, state: kRunning
W0628 20:28:55.605827  2134 tablet_rpc.cc:475] Timed out (yb/rpc/outbound_call.cc:548): Failed UpdateTransaction: tablet_id: "5b5ea425ec0e43f183f0b1b024f1cd37" state { transaction_id: "\033\030\023b\026\202O\352\243\257\361\320\t\017\240\243" status: PENDING } propagated_hybrid_time: 6784811559095996416, retrier: { task_id: -1 state: kIdle deadline: 515028.994s } to tablet 5b5ea425ec0e43f183f0b1b024f1cd37 on tablet server { uuid: 26d8d7ef867d413293ec6f98dab5b3cc private: [host: "10.8.24.246" port: 9100] cloud_info: placement_cloud: "aws" placement_region: "us-east-1" placement_zone: "us-east-1a" after 1 attempt(s): Call timed out before sending
W0628 20:28:55.605875  2134 transaction.cc:1136] 1b181362-1682-4fea-a3af-f1d0090fa0a3: Send heartbeat failed: Timed out (yb/rpc/outbound_call.cc:548): Call timed out before sending, state: kRunning
I0628 20:28:55.644915  1922 tablet_rpc.cc:157] Unable to pick leader for 5b5ea425ec0e43f183f0b1b024f1cd37, replicas: [0x0000000002651ef0 -> { uuid: 52ca9df1501b408094549d92e6b1a393 private: [host: "10.8.24.206" port: 9100] cloud_info: placement_cloud: "aws" placement_region: "us-east-1" placement_zone: "us-east-1a", 0x00000000025a7ef0 -> { uuid: 97acc384761a4b35848f7822d2892578 private: [host: "10.8.25.151" port: 9100] cloud_info: placement_cloud: "aws" placement_region: "us-east-1" placement_zone: "us-east-1a"], followers: [{0x00000000025a7ef0 -> { uuid: 97acc384761a4b35848f7822d2892578 private: [host: "10.8.25.151" port: 9100] cloud_info: placement_cloud: "aws" placement_region: "us-east-1" placement_zone: "us-east-1a", { status: Illegal state (yb/consensus/consensus.cc:162): Not the leader (tablet server error 15) time: 0.003s }}, {0x0000000002651ef0 -> { uuid: 52ca9df1501b408094549d92e6b1a393 private: [host: "10.8.24.206" port: 9100] cloud_info: placement_cloud: "aws" placement_region: "us-east-1" placement_zone: "us-east-1a", { status: Illegal state (yb/consensus/consensus.cc:162): Not the leader (tablet server error 15) time: 0.013s }}]
postgres: ../sysdeps/nptl/fork.c:136: __libc_fork: Assertion `THREAD_GETMEM (self, tid) != ppid' failed.

I searched for the last line above (the glibc assertion failure). The only results were from 2017, and I don't think they are relevant.
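For context on the failing check: in glibc's NPTL `__libc_fork`, the child asserts that the thread ID cached in its thread descriptor no longer equals the parent's (`THREAD_GETMEM (self, tid) != ppid`), i.e. that the kernel really assigned the child a new ID during fork. The sketch below is only a user-space analogue of that invariant using standard POSIX calls, assuming a Linux host; it does not reproduce the crash, and `fork_pid_invariant_holds` is a hypothetical helper name, not something from glibc or YugabyteDB.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): forks once and reports whether
 * the child saw a pid different from the parent's, which is the user-space
 * counterpart of glibc's `THREAD_GETMEM (self, tid) != ppid` assertion.
 * Returns 1 if the invariant held, 0 if it did not, -1 on error.
 */
int fork_pid_invariant_holds(void)
{
    pid_t parent = getpid();   /* captured before fork(), like ppid in fork.c */
    pid_t child = fork();

    if (child < 0)
        return -1;             /* fork() itself failed */

    if (child == 0) {
        /* Child: its own pid must differ from the parent's, and
         * getppid() must report the (still waiting) parent. */
        int ok = (getpid() != parent) && (getppid() == parent);
        /* _exit() avoids running atexit handlers or flushing stdio
         * buffers that were duplicated from the parent. */
        _exit(ok ? 0 : 1);
    }

    /* Parent: collect the child's verdict from its exit status. */
    int status = 0;
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

On a healthy system this returns 1; the crash in this issue corresponds to the glibc-internal version of the same check failing inside the postmaster's `fork_process()` call.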

tedyu commented Jun 29, 2022

2022-06-28 19:47:07.410 UTC [19233] LOG:  PID 11938 in cancel request did not match any process
grep 'in cancel request did not ma' postgres_logs | wc
    1401   21015  133763

That message appears on 1401 lines of postgres_logs (wc prints lines, words, bytes).

@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Sep 18, 2022
@sushantrmishra sushantrmishra changed the title [YSQL] Cloud - postgres - two core files generated - pls rename this issue based on backtrace content [YSQL] Cloud - postgres crash during fork() Oct 26, 2022