Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

codis迁移时 客户端读写问题 #297

Closed
cyflhn opened this issue Jul 3, 2015 · 21 comments
Closed

codis迁移时 客户端读写问题 #297

cyflhn opened this issue Jul 3, 2015 · 21 comments

Comments

@cyflhn
Copy link

cyflhn commented Jul 3, 2015

codis做slot迁移时,客户端通过proxy进行数据读写,客户端运行一会之后无法写数据,proxy报错退出,slot迁移同样报错退出,请问是怎么回事。下面给出部分日志。
proxy日志:
2015/07/03 13:33:12 router.go:71: [warning] taskrunner count 2
2015/07/03 13:33:12 taskrunner.go:174: [warning] exit taskrunner 192.168.50.72:6379
2015/07/03 13:33:12 taskrunner.go:174: [warning] exit taskrunner 192.168.50.72:6381
2015/07/03 13:33:12 router.go:400: [info] &{Type:slot_changed Desc: Target:{
"product_name": "focus",
"id": 627,
"group_id": 2,
"state": {
"status": "pre_migrate",
"migrate_status": {
"from": -1,
"to": -1
},
"last_op_ts": "0"
}
} Ts:1435901592 Receivers:[{"id":"proxy_1","addr":"localhost.localdomain:19000","last_event":"","last_event_ts":0,"state":"online","description":"","debug_var_addr":"localhost.localdomain:11000","pid":27093,"start_at":"2015-07-03 13:29:25.028999096 +0800 CST"}]}
2015/07/03 13:33:12 topology.go:50: [error] slot not online, {
"product_name": "focus",
"id": 627,
"group_id": 2,
"state": {
"status": "pre_migrate",
"migrate_status": {
"from": -1,
"to": -1
},
"last_op_ts": "0"
}
}
2015/07/03 13:33:12 router.go:107: [info] fill slot 627, force true, &{master:192.168.50.72:6381 redisServers:map[192.168.50.72:6381:0xc20a6332f0]}
2015/07/03 13:33:12 router.go:380: [info] send response 7076
2015/07/03 13:33:12 router.go:279: [info] new connection 192.168.40.163:56484
2015/07/03 13:33:13 session.go:67: [fatal] should never happend

dashboard日志:

2015/07/03 13:33:16 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:17 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:18 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:19 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:20 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:21 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:22 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:23 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:24 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:25 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:26 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:27 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:28 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:29 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:30 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:31 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:32 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:33 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:34 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:35 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:36 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:37 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:38 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:39 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:40 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:41 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:42 action.go:99: [warning] abnormal waiting time for receivers /zk/codis/db_focus/ActionResponse/0000007078
2015/07/03 13:33:43 migrate_task.go:100: [error] receiver timeout
2015/07/03 13:33:43 migrate_task.go:148: [error] receiver timeout
2015/07/03 13:33:43 migrate_manager.go:119: [error] receiver timeout
2015/07/03 13:36:11 dashboard_apis.go:286: [warning] group -1 not found

@yangzhe1991
Copy link
Member

反复测试会稳定复现吗?

@cyflhn
Copy link
Author

cyflhn commented Jul 3, 2015

我测试了两次都会出现,而且现在出现了奇怪的情况,slot再也无法迁移,总是显示migratefrom:-1 migrateto: -1。

Slot Id Current Owner Group Migrate From Migrate To Remain Keys
slot_628 group_2 -1 -1

@cyflhn
Copy link
Author

cyflhn commented Jul 3, 2015

刚才出现“slot再也无法迁移,总是显示migratefrom:-1 migrateto: -1”是因为proxy没有启动,启动proxy之后迁移可以正常进行。

@yangzhe1991
Copy link
Member

也就是proxy的 [fatal] should never happend 这个错误不是每次迁移都会出?概率大吗?还是只出现过一次

@cyflhn
Copy link
Author

cyflhn commented Jul 3, 2015

我刚才又测试了两次,可以说每次必现。我客户端用的jedis。只要在slot迁移,客户端读写数据,proxy就会推出,迁移也无法进行下去。

@yangzhe1991
Copy link
Member

客户端读写的key是固定的还是随机的?slot迁移的话,是刚开始迁移proxy就一定会挂吗

@cyflhn
Copy link
Author

cyflhn commented Jul 3, 2015

我把客户端代码贴出来吧:
conf.setHost("192.168.50.72");
conf.setPort(19000);
CacheClient cli = CacheClientFactory.getCacheClient(conf);
// for (int i = 0; i < 1000; i++) {
// cli.set("k_" + i, "adasda" + i);
// // String val = cli.get("k_1");
// System.out.println(i);
// }

    for (int i = 0; i < 1000; i++) {
        String val = cli.get("k_" + i);
        System.out.println(val);
    }

slot迁移,只要客户端启动,过了一会,proxy就挂,slot也停止。

@yangzhe1991
Copy link
Member

ClientFactory 是自行封装的?

@cyflhn
Copy link
Author

cyflhn commented Jul 3, 2015

是封装的里面就是jedis,cli.get就是调用jedis,get
public String get(String key) {
Jedis jedis = null;
try {
jedis = jedispool.getResource();
return jedis.get(key);
} finally {
if (jedis != null) {
jedispool.returnResourceObject(jedis);
}
}
}

@yangzhe1991
Copy link
Member

每次proxy挂都是session.go:67: [fatal] should never happend 这个错误是吧?
另外,codis的版本是release1.9的最新版?

@cyflhn
Copy link
Author

cyflhn commented Jul 3, 2015

是的 codis版本1.9.6

@cyflhn
Copy link
Author

cyflhn commented Jul 3, 2015

另外出现fatal日志的时候,能否将日志打的更加详细一点,现在这种情况根本无法知道是哪里的问题,dashboad的日志也看不出来

@yangzhe1991
Copy link
Member

因为这个fatal本来就不是预期的情况,按道理不应该出现……所以也没法打详细的信息……

我这边暂时无法复现你遇到的问题,我再看看

@cyflhn
Copy link
Author

cyflhn commented Jul 3, 2015

这个fatal出现的堆栈信息啥的不能打印吗? 要复现这个问题,还需要我提供什么样的信息?

@yangzhe1991
Copy link
Member

堆栈信息没啥用,因为这个行号能看到是哪里的问题。
已经能复现了,我看看是哪里的问题

@cyflhn
Copy link
Author

cyflhn commented Jul 3, 2015

好的 辛苦了 很期待解决

@yangzhe1991
Copy link
Member

发现bug了,我在release1.9上push了,暂时没打tag,你用这个分支的最新代码试试?

@cyflhn
Copy link
Author

cyflhn commented Jul 3, 2015

好的 辛苦了

@spinlock
Copy link
Member

spinlock commented Jul 3, 2015

顺便试试2.0.0 pre release ? 😄

@spinlock spinlock closed this as completed Jul 3, 2015
@cyflhn
Copy link
Author

cyflhn commented Jul 6, 2015

我用release1.9编译通过后,启动dashboard报错,报错信息如下:
[root@localhost codis19]# ./bin/codis-config -c ./config.ini dashboard
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x28 pc=0x401ef2]

goroutine 16 [running]:
runtime.panic(0xa34960, 0xe6a013)
/opt/soft/go/src/pkg/runtime/panic.c:279 +0xf5
main.CreateZkConn(0x0, 0x0)
/opt/codis19/cmd/cconfig/dashboard.go:70 +0x42
main.releaseDashboardNode()
/opt/codis19/cmd/cconfig/dashboard.go:186 +0x26
main.Fatal(0x9b0ac0, 0xc2080a9830)
/opt/codis19/cmd/cconfig/main.go:56 +0x26
main.main()
/opt/codis19/cmd/cconfig/main.go:111 +0x45a

goroutine 19 [finalizer wait]:
runtime.park(0x42cf50, 0xe716d8, 0xe6f109)
/opt/soft/go/src/pkg/runtime/proc.c:1369 +0x89
runtime.parkunlock(0xe716d8, 0xe6f109)
/opt/soft/go/src/pkg/runtime/proc.c:1385 +0x3b
runfinq()
/opt/soft/go/src/pkg/runtime/mgc0.c:2644 +0xcf
runtime.goexit()
/opt/soft/go/src/pkg/runtime/proc.c:1445

goroutine 23 [runnable]:
main.func路012()
/opt/codis19/cmd/cconfig/main.go:94
created by main.main
/opt/codis19/cmd/cconfig/main.go:97 +0x1d0

goroutine 22 [runnable]:
os/signal.loop()
/opt/soft/go/src/pkg/os/signal/signal_unix.go:19
created by os/signal.init路1
/opt/soft/go/src/pkg/os/signal/signal_unix.go:27 +0x32

请问是怎么回事

@cyflhn
Copy link
Author

cyflhn commented Jul 6, 2015

上面的问题已经找到,ini配置文件指定错了

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants