`
carlosfu
  • 浏览: 570824 次
  • 性别: Icon_minigender_1
  • 来自: 北京
博客专栏
Ba8b5055-9c58-3ab0-8a1c-e710f0495d2c
BigMemory实战与理...
浏览量:29914
53b2087e-c637-34d2-b61d-257846f73ade
RedisCluster开...
浏览量:148929
C9f66038-7478-3388-8086-d20c1f535495
缓存的使用与设计
浏览量:122524
社区版块
存档分类
最新评论

Redis-Cluster实战--13.redis Cluster故障转移(failover)

阅读更多

转载请注明出处哈:http://carlosfu.iteye.com/blog/2240426


 

一、测试环境

1. Redis版本:

    由于我们较早的使用了Redis-Cluster版本,所以此测试使用的是Redis 3.0.0 RC1 (version 2.9.101)
   后来有开发者提出,如果是大集群的话,会造成判定失败过慢,造成failover失败,所以作者在Redis 3.0.0 RC3做了修正,当然现在的release版已经不存在这个问题了。

   具体可以参考:

   (1). Redis Cluster的FailOver失败案例分析  : 这篇文章分析了大集群下RedisCluster-RC1会出现由于gossip不完善造成failover失败的案例。

   (2). RedisCluster-RC3优化加快了gossip的传播和失败判定。(redis release notes

 

2. 机器:(5台虚机)

    机器配置:CPU: 2*E5-2650,内存:24G,硬盘:120G

  • 10.10.52.126
  • 10.10.34.55
  • 10.10.52.133
  • 10.10.52.157
  • 10.10.52.151

3. 每个分片128M(5台机器交叉互为主从), A1、B1、C1、D1、E1是5个Master, D2,E2,A2,B2,C2分别是5个对应的Slave。

  • A1(10.10.52.126)----D2(10.10.52.157)
  • B1(10.10.34.55)---E2(10.10.52.151)
  • C1(10.10.52.133)---A2(10.10.52.126)
  • D1(10.10.52.157)---B2(10.10.34.55)
  • E1(10.10.52.151)--C2(10.10.52.133)

      

   

 

 

二、数据5个节点存储100w key-value数据

        key:value为1:1 2:2..........1000000:1000000

 

三、客户端毫秒级间隔发送读请求

       每次随机从(1-1000000)取1个key,进行get操作,记录相关日志。

package com.sohu.cc.redis.fail.test;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
/**
 * 测试redis-cluster的failover
 * 
 * @Date 2014年12月5日
 * @Time 上午11:16:50
 */
public class RedisClusterFailRead extends RedisClusterFailOverBaseTest {
    private static final int THREAD_SIZE = 1;
    private static ExecutorService threadPool = Executors.newFixedThreadPool(THREAD_SIZE);
    public static void main(String[] args) {
        for (int i = 0; i < THREAD_SIZE; i++) {
            threadPool.execute(new RedisClusterReadThread());
        }
    }
}

   

package com.sohu.cc.redis.fail.test;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
 * redis get线程
 * @author leifu
 * @Date 2014年12月5日
 * @Time 下午2:52:58
 */
public class RedisClusterReadThread extends RedisClusterFailOverBaseTest
		implements Runnable {
	private final Logger logger = LoggerFactory.getLogger(RedisClusterReadThread.class);
	private int count = 0;
	
	public void run() {
		while (true) {
			try {
				int randomKey = new Random().nextInt(MAX_KEY);
				String result = pipelineCluster.get(String.valueOf(randomKey));
				if (result == null || "".equals(result)) {
					logger.error("result is empty, randomKey is {}", randomKey);
				}else{
					if(count % 100 == 0){
					    logger.info("success data time={}", new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date()));
					}
				}
				count++;
				TimeUnit.MILLISECONDS.sleep(2);
			} catch (Exception e) {
                                logger.info("time={}", new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date()));
				logger.error(e.getMessage(), e);
			}
		}
	}
}

 

  

四、故障模拟

1.   kill 掉1台机器下集群所有主从点,收集redis实例反馈日志和客户端故障&恢复时间

      (1)分析:当前拓扑情况下,无论停掉哪台机器,集群都是完整的(有故障转移),不会丢失数据。

         

 

     (2)以kill A节点为例子(10.10.52.126), 观察A1(10.10.52.126)-D2(10.10.52.157),C1(10.10.52.133)-A2(10.10.52.126), 

        涉及机器和节点如下

    • A1(10.10.52.126)----D2(10.10.52.157)
    • B1(10.10.34.55)---E2(10.10.52.151)
    • C1(10.10.52.133)---A2(10.10.52.126)
    • D1(10.10.52.157)---B2(10.10.34.55)
    • E1(10.10.52.151)--C2(10.10.52.133)

 1)redis实例日志:

 

D2(10.10.52.157)日志:
23706:S 17 Dec 18:51:43.380 # Connection with master lost.
23706:S 17 Dec 18:51:43.380 * Caching the disconnected master state.
23706:S 17 Dec 18:51:44.154 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:44.154 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:44.155 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:45.161 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:45.162 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:45.162 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:46.170 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:46.170 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:46.171 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:47.177 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:47.178 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:47.178 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:48.183 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:48.185 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:48.186 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:49.194 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:49.194 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:49.195 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:50.203 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:50.204 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:50.204 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:51.208 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:51.209 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:51.209 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:52.215 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:52.215 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:52.216 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:53.220 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:53.220 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:53.221 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:54.228 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:54.228 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:54.229 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:55.231 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:55.231 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:55.231 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:56.237 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:56.237 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:56.238 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:57.245 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:57.245 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:57.245 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:58.253 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:58.253 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:58.255 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:51:59.259 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:51:59.259 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:51:59.260 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:52:00.263 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:52:00.263 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:52:00.263 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:52:01.078 * FAIL message received from e6f56def9a30ee2e5e57639b76314d6970d98260 about a6e0ce48cc9a13b57a801cf7b695a4b0dad222b8
23706:S 17 Dec 18:52:01.170 # Start of election delayed for 633 milliseconds (rank #0, offset 4817).
23706:S 17 Dec 18:52:01.272 * Connecting to MASTER 10.10.52.126:6381
23706:S 17 Dec 18:52:01.272 * MASTER <-> SLAVE sync started
23706:S 17 Dec 18:52:01.272 # Error condition on socket for SYNC: Connection refused
23706:S 17 Dec 18:52:01.876 # Starting a failover election for epoch 17.
23706:S 17 Dec 18:52:01.881 * FAIL message received from 8d5358167c0539daf74481d42c9e991822a0522f about cd0837a7c88c849eac575af33458e277e65a2014
23706:S 17 Dec 18:52:01.882 # Failover election won: I'm the new master.
23706:M 17 Dec 18:52:01.882 * Discarding previously cached master state.
23706:M 17 Dec 18:52:01.883 # configEpoch set to 17 after successful failover
 
C1(10.10.52.133)日志
Connection with slave 10.10.52.126:6382 lost.
24901:M 17 Dec 18:52:01.093 * FAIL message received from e6f56def9a30ee2e5e57639b76314d6970d98260 about a6e0ce48cc9a13b57a801cf7b695a4b0dad222b8
24901:M 17 Dec 18:52:01.892 * FAIL message received from 8d5358167c0539daf74481d42c9e991822a0522f about cd0837a7c88c849eac575af33458e277e65a2014
24901:M 17 Dec 18:52:01.896 # Failover auth granted to 43090dceee24fd448be5688f04d200da0e8ab850 for epoch 17
 
 
其他机器:
2713:M 17 Dec 18:52:01.071 * FAIL message received from e6f56def9a30ee2e5e57639b76314d6970d98260 about a6e0ce48cc9a13b57a801cf7b695a4b0dad222b8
2713:M 17 Dec 18:52:01.869 * Marking node cd0837a7c88c849eac575af33458e277e65a2014 as failing (quorum reached).
2713:M 17 Dec 18:52:01.874 # Failover auth granted to 43090dceee24fd448be5688f04d200da0e8ab850 for epoch 17

 

 

2)客户端异常信息:

18:51:37.535 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:51:37
18:51:38.246 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:51:38
18:51:38.967 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:51:38
18:51:39.670 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:51:39
18:51:40.374 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:51:40
18:51:41.077 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:51:41
18:51:41.779 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:51:41
18:51:42.481 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:51:42
18:51:43.189 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:51:43
---------------------------------------------------发生异常----------------------------------------------------
18:51:44.560 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-17 18:51:44
18:51:44.568 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - Too many Cluster redirections? key=386948
redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections? key=386948
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:39) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:92) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
18:51:46.680 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-17 18:51:46
18:51:46.680 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - Too many Cluster redirections? key=869071
redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections? key=869071
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:39) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:92) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
18:51:49.871 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-17 18:51:49
18:51:49.871 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - Too many Cluster redirections? key=900131
redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections? key=900131
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:39) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:92) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
18:51:51.979 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-17 18:51:51
---------------------------------------------------(重复的异常太长了不打印了)----------------------------------------------------
---------------------------------------------------故障恢复----------------------------------------------------
18:52:02.620 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:52:02
18:52:03.331 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:52:03
18:52:04.047 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:52:04
18:52:04.802 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:52:04
18:52:05.548 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:52:05
18:52:06.261 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:52:06
18:52:06.985 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:52:06
18:52:07.708 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 18:52:07

  

3)客户端故障时间&恢复时间:

服务器停机实际时间: 2014-12-17 18:51:42
客户端故障时间: 2014-12-17 18:51:43
恢复时间: 2014-12-17 18:52:02, 一共耗时19秒

          

 

 

2. kill 掉2台机器下集群所有主从点,收集redis实例反馈日志和客户端故障&恢复时间

(1)分析:如果kil掉两台机器,可能出现两种情况

       (a) AB AE BC CD DE 集群依然完整,故障自动迁移

       (b) AC AD BD BE CE 由于是涉及主从同时杀掉,会出现集群不完整情况。

     下面以(a)为例子,kill掉A(10.10.52.126)和B(10.10.34.55)     ((b)涉及到了集群不完整的情况(从下面3再进行分析))。

     

 

 

 

(2)客户端日志和redis服务器日志:

涉及机器和节点如下

    • A1(10.10.52.126)----D2(10.10.52.157)
    • B1(10.10.34.55)---E2(10.10.52.151)
    • C1(10.10.52.133)---A2(10.10.52.126)
    • D1(10.10.52.157)---B2(10.10.34.55)
    • E1(10.10.52.151)--C2(10.10.52.133)

1)redis实例日志

10.10.52.157 (D2)
23756:M 17 Dec 19:07:01.198 # Connection with slave 10.10.34.55:7629 lost.
23756:M 17 Dec 19:07:14.685 * FAIL message received from 8d5358167c0539daf74481d42c9e991822a0522f about b0b4a7c5ced736127e70c9e96c91aa6b2bd632ac
23756:M 17 Dec 19:07:15.738 # Failover auth granted to 5d9125e11d1be2d389fdd9623ddeff1906204acc for epoch 18
23756:M 17 Dec 19:07:19.218 * FAIL message received from 5d9125e11d1be2d389fdd9623ddeff1906204acc about 93b5a7aab976a86ca82e2adb4d2a521c192dc56b
 
10.10.52.151
4529:S 17 Dec 19:06:58.086 # Connection with master lost.
4529:S 17 Dec 19:06:58.086 * Caching the disconnected master state.
4529:S 17 Dec 19:06:58.495 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:06:58.496 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:06:58.496 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:06:59.502 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:06:59.503 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:06:59.503 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:00.507 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:00.507 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:00.508 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:01.515 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:01.515 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:01.516 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:02.523 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:02.523 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:02.524 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:03.531 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:03.531 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:03.531 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:04.538 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:04.538 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:04.539 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:05.544 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:05.544 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:05.545 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:06.551 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:06.551 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:06.551 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:07.557 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:07.558 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:07.558 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:08.567 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:08.567 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:08.567 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:09.575 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:09.575 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:09.575 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:10.582 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:10.582 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:10.582 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:11.590 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:11.590 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:11.590 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:12.596 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:12.597 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:12.597 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:13.606 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:13.606 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:13.607 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:14.614 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:14.615 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:14.616 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:14.677 * FAIL message received from 8d5358167c0539daf74481d42c9e991822a0522f about b0b4a7c5ced736127e70c9e96c91aa6b2bd632ac
4529:S 17 Dec 19:07:14.716 # Start of election delayed for 947 milliseconds (rank #0, offset 2185).
4529:S 17 Dec 19:07:15.624 * Connecting to MASTER 10.10.34.55:7628
4529:S 17 Dec 19:07:15.625 * MASTER <-> SLAVE sync started
4529:S 17 Dec 19:07:15.625 # Error condition on socket for SYNC: Connection refused
4529:S 17 Dec 19:07:15.726 # Starting a failover election for epoch 18.
4529:S 17 Dec 19:07:15.730 # Failover election won: I'm the new master.
4529:M 17 Dec 19:07:15.730 * Discarding previously cached master state.
4529:M 17 Dec 19:07:15.730 # configEpoch set to 18 after successful failover
4529:M 17 Dec 19:07:19.208 * Marking node 93b5a7aab976a86ca82e2adb4d2a521c192dc56b as failing (quorum reached).
 
其他存活的机器:
2713:M 17 Dec 19:07:14.676 * Marking node b0b4a7c5ced736127e70c9e96c91aa6b2bd632ac as failing (quorum reached).
2713:M 17 Dec 19:07:15.729 # Failover auth granted to 5d9125e11d1be2d389fdd9623ddeff1906204acc for epoch 18
2713:M 17 Dec 19:07:19.209 * FAIL message received from 5d9125e11d1be2d389fdd9623ddeff1906204acc about 93b5a7aab976a86ca82e2adb4d2a521c192dc56b
 

  

2)客户端异常信息

19:06:48.773 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:06:48
19:06:49.493 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:06:49
19:06:50.506 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:06:50
19:06:51.235 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:06:51
19:06:51.969 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:06:51
19:06:52.708 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:06:52
19:06:53.419 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:06:53
19:06:54.179 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:06:54
19:06:54.893 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:06:54
19:06:55.624 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:06:55
19:06:56.328 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:06:56
19:06:57.037 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:06:57
19:06:57.798 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:06:57
---------------------------------------------------发生异常----------------------------------------------------
19:07:00.113 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-17 19:07:00
19:07:00.121 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - Too many Cluster redirections? key=598260
redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections? key=598260
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:39) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:92) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
19:07:02.223 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-17 19:07:02
19:07:02.224 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - Too many Cluster redirections? key=711411
redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections? key=711411
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:39) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:92) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
19:07:05.406 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-17 19:07:05
19:07:05.407 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - Too many Cluster redirections? key=115983
redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections? key=115983
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:39) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:92) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
---------------------------------------------------(重复的异常太长了不打印了)----------------------------------
---------------------------------------------------故障恢复----------------------------------------------------
19:07:20.757 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:07:20
19:07:21.490 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:07:21
19:07:22.215 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:07:22
19:07:22.917 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:07:22

 

 

3)客户端故障时间&恢复时间:

服务器停机实际时间: 2014-12-17 19:07:00
客户端故障时间: 2014-12-17 19:07:00
恢复时间: 2014-12-17 19:07:20, 一共耗时20秒

          

3. kill 掉3台机器下集群所有主从点,收集redis实例反馈日志和客户端故障&恢复时间

   (1)分析:如果kil掉两台机器,可能出现两种情况

 

       (a) AB AE BC CD DE 集群依然完整,故障自动迁移

 

       (b) AC AD BD BE CE 由于是涉及主从同时杀掉,会出现集群不完整情况。

 


 

 

    以kill掉ABC为例子:

    涉及到的机器

  • A1(10.10.52.126)----D2(10.10.52.157)
  • B1(10.10.34.55)---E2(10.10.52.151)
  • C1(10.10.52.133)---A2(10.10.52.126)
  • D1(10.10.52.157)---B2(10.10.34.55)
  • E1(10.10.52.151)--C2(10.10.52.133)

(2)客户端日志和redis服务器日志:

1)redis实例日志

E1(10.10.52.151):
4529:M 17 Dec 19:28:41.802 * FAIL message received from e6f56def9a30ee2e5e57639b76314d6970d98260 about 6ddd2fc00bc218ac6917431daff810b9d334cd87
4529:M 17 Dec 19:28:50.535 * Marking node 6a4008414c773c5a3b66f56f7027da882f6cfee7 as failing (quorum reached).
 
E2(10.10.52.151):
2713:M 17 Dec 19:28:41.801 * FAIL message received from e6f56def9a30ee2e5e57639b76314d6970d98260 about 6ddd2fc00bc218ac6917431daff810b9d334cd87
2713:M 17 Dec 19:28:50.536 * FAIL message received from 5d9125e11d1be2d389fdd9623ddeff1906204acc about 6a4008414c773c5a3b66f56f7027da882f6cfee7
 
D1(10.10.52.157)日志:
23756:M 17 Dec 19:28:41.841 * Marking node 6ddd2fc00bc218ac6917431daff810b9d334cd87 as failing (quorum reached).
23756:M 17 Dec 19:28:50.579 * FAIL message received from 5d9125e11d1be2d389fdd9623ddeff1906204acc about 6a4008414c773c5a3b66f56f7027da882f6cfee7

D2(10.10.52.157)日志:
23706:M 17 Dec 19:28:41.842 * FAIL message received from e6f56def9a30ee2e5e57639b76314d6970d98260 about 6ddd2fc00bc218ac6917431daff810b9d334cd87
23706:M 17 Dec 19:28:50.579 * FAIL message received from 5d9125e11d1be2d389fdd9623ddeff1906204acc about 6a4008414c773c5a3b66f56f7027da882f6cfee7
  

  

2)客户端异常信息:

19:28:36.116 [main] INFO  c.s.tv.builder.RedisClusterBuilder - client is up to date, Cheers!
19:28:36.233 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:36
19:28:36.265 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:36
19:28:36.290 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:36
19:28:36.316 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:36
19:28:36.348 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:36
19:28:36.373 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:36
19:28:36.405 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:36
19:28:36.432 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:36
19:28:36.457 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:36
19:28:36.482 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:36
---------------------------------------------------发生异常----------------------------------------------------
19:28:41.619 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-17 19:28:41
19:28:41.627 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - Too many Cluster redirections? key=749119
redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections? key=749119
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:39) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:92) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
---------------------------------------------------正常返回----------------------------------------------------
19:28:41.633 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:41
19:28:41.658 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:41
---------------------------------------------------发生异常----------------------------------------------------
19:28:43.745 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-17 19:28:43
19:28:43.746 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - Too many Cluster redirections? key=195338
redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections? key=195338
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:39) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:92) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
---------------------------------------------------正常返回----------------------------------------------------
19:28:43.752 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:43
19:28:43.777 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:43
19:28:43.803 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:43
19:28:43.829 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:43
19:28:43.855 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:43
19:28:43.880 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:43
19:28:43.906 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:43
19:28:43.932 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:43
19:28:43.958 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:43
19:28:43.985 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:43
19:28:44.010 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:44
19:28:44.035 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:44
19:28:44.061 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:44
19:28:44.087 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:44
19:28:44.113 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:44
19:28:44.140 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:44
19:28:44.165 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:44
19:28:44.191 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:44
19:28:51.382 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-17 19:28:51
19:28:51.382 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - Too many Cluster redirections? key=537888
redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections? key=537888
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:39) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:92) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
19:28:51.389 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:51
19:28:51.415 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-17 19:28:51
---------------------------------------------------一直存在部分失败----------------------------------------------------

  

3)客户端故障时间&恢复时间:

服务器停机实际时间: 2014-12-17 19:28:41
客户端故障时间: 2014-12-17 19:28:41
恢复时间: 由于集群不完整,一直存在部分失败(cluster-require-full-coverage为no允许部分失败)

          

4. 测试cluster-require-full-coverage参数的影响

(1) 恢复上面的集群(启动所有被关闭的实例)

(2) 修改所有实例的配置cluster-require-full-coverage为yes(通过CacheCloud的实例页面进行修改配置申请)

(3) kill掉ABC机器上面的实例。

1)客户端异常信息:

16:12:19.727 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:19
16:12:19.832 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:19
16:12:19.937 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:19
16:12:20.042 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:20
16:12:20.148 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:20
16:12:20.255 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:20
16:12:20.360 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:20
16:12:20.465 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:20
16:12:20.570 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:20
16:12:20.675 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:20
16:12:20.780 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:20
16:12:20.887 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:20
16:12:20.994 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:20
16:12:26.265 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-18 16:12:26
16:12:26.298 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - Too many Cluster redirections? key=724908
redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections? key=724908
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:39) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:92) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
16:12:26.312 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:26
16:12:30.698 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-18 16:12:30
16:12:30.698 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - Too many Cluster redirections? key=677711
redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections? key=677711
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:39) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:92) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
16:12:30.705 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:30
16:12:30.810 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:30
16:12:30.915 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:30
16:12:31.021 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:31
16:12:31.127 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:31
16:12:31.234 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:31
16:12:31.339 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:31
16:12:31.444 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:31
16:12:31.550 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:31
16:12:31.656 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - success data time=2014-12-18 16:12:31
16:12:39.933 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-18 16:12:39
16:12:39.934 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - CLUSTERDOWN The cluster is down. Use CLUSTER INFO for more information
redis.clients.jedis.exceptions.JedisClusterException: CLUSTERDOWN The cluster is down. Use CLUSTER INFO for more information
	at redis.clients.jedis.Protocol.processError(Protocol.java:114) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Protocol.process(Protocol.java:142) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Protocol.read(Protocol.java:197) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:248) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Connection.getBinaryBulkReply(Connection.java:194) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Connection.getBulkReply(Connection.java:184) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Jedis.get(Jedis.java:105) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster$3.execute(JedisCluster.java:101) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster$3.execute(JedisCluster.java:98) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:64) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
16:12:45.124 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-18 16:12:45
16:12:45.125 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - CLUSTERDOWN The cluster is down. Use CLUSTER INFO for more information
redis.clients.jedis.exceptions.JedisClusterException: CLUSTERDOWN The cluster is down. Use CLUSTER INFO for more information
	at redis.clients.jedis.Protocol.processError(Protocol.java:114) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Protocol.process(Protocol.java:142) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Protocol.read(Protocol.java:197) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:248) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Connection.getBinaryBulkReply(Connection.java:194) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Connection.getBulkReply(Connection.java:184) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Jedis.get(Jedis.java:105) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster$3.execute(JedisCluster.java:101) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster$3.execute(JedisCluster.java:98) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:64) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
16:12:53.348 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-18 16:12:53
16:12:53.348 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - CLUSTERDOWN The cluster is down. Use CLUSTER INFO for more information
redis.clients.jedis.exceptions.JedisClusterException: CLUSTERDOWN The cluster is down. Use CLUSTER INFO for more information
	at redis.clients.jedis.Protocol.processError(Protocol.java:114) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Protocol.process(Protocol.java:142) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Protocol.read(Protocol.java:197) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:248) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Connection.getBinaryBulkReply(Connection.java:194) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Connection.getBulkReply(Connection.java:184) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Jedis.get(Jedis.java:105) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster$3.execute(JedisCluster.java:101) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster$3.execute(JedisCluster.java:98) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:64) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:75) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
16:12:53.354 [pool-1-thread-1] INFO  c.s.c.r.f.t.RedisClusterReadThread - time=2014-12-18 16:12:53
16:12:53.355 [pool-1-thread-1] ERROR c.s.c.r.f.t.RedisClusterReadThread - CLUSTERDOWN The cluster is down. Use CLUSTER INFO for more information
redis.clients.jedis.exceptions.JedisClusterException: CLUSTERDOWN The cluster is down. Use CLUSTER INFO for more information
	at redis.clients.jedis.Protocol.processError(Protocol.java:114) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Protocol.process(Protocol.java:142) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Protocol.read(Protocol.java:197) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:248) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Connection.getBinaryBulkReply(Connection.java:194) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Connection.getBulkReply(Connection.java:184) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.Jedis.get(Jedis.java:105) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster$3.execute(JedisCluster.java:101) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster$3.execute(JedisCluster.java:98) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:64) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:33) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at redis.clients.jedis.JedisCluster.get(JedisCluster.java:97) ~[jedis-2.6.2-sohutv-SNAPSHOT.jar:na]
	at com.sohu.cc.redis.fail.test.RedisClusterReadThread.run(RedisClusterReadThread.java:28) ~[test-classes/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
------------------------------------------------错误一直持续下去-----------------------------------------------------------

 

 

3)客户端故障时间&恢复时间:

服务器停机实际时间: 2014-12-18 16:12:26
客户端故障时间: 故障不能恢复,设置了cluster-require-full-coverage=yes后不允许集群部分失败。 

         

五、结论

  1. 部分停机能实现故障自动转移,需要时间20秒以内。(测试环境中):在3.0release中这个速度提高了一倍。
  2. 过多机器停机或者实例挂掉可能会造成集群不完整。
  3. cluster-require-full-coverage=yes不允许集群部分失败, cluster-require-full-coverage=no允许集群部分失败。

 

 

 

 

 

 

  • 大小: 106.7 KB
  • 大小: 233.5 KB
  • 大小: 75.4 KB
  • 大小: 69.9 KB
  • 大小: 70.2 KB
  • 大小: 71.8 KB
  • 大小: 73.4 KB
分享到:
评论

相关推荐

    Reading-and-comprehense-redis-cluster_y123456yz.tar.gz

    分布式NOSQL redis源码阅读中文分析注释,带详尽注释以及相关流程调用注释,提出改造点,redis cluster集群功能、节点扩容、槽位迁移、failover故障切换、一致性选举完整分析,对理解redis源码很有帮助,解决了...

    Redis-7.0.13-Windows-x64-with-Service

    Upgrade urgency SECURITY: See security fixes below. ...subsequent failover or node joining (redis/redis#12344) Ensure that the function load timeout is disabled during loading from RDB/AOF and on repli

    docker-rediscluster:用于设置一次性 redis 集群以测试哨兵故障转移的 Docker 脚本

    docker-rediscluster 设置一次性 redis 集群以测试哨兵故障转移。 拓扑 redis_0 - 主 redis_1 - redis_0 的从属 sentinel_0 sentinel_1 sentinel_2 哨兵配置了具有以下属性的“测试”实例 - down-after-...

    Redis 3.0 中文版 - v1.1.pdf

    第 1 章Redis 介绍. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 第 2 章数据类型初探. . . . . . . . . . . . . . . . . . . . . . . . . . . . . ...

    redis-manager.zip

    Redis Manager 是 Redis 一站式管理平台,支持集群(cluster、master-replica、sentinel)的监控、安装(除sentinel)、管理、告警以及基本的数据操作功能 集群监控:支持监控 Memory、Clients 等 Redis 重要指标;...

    redis-manager-master.zip

    Redis Manager 是 Redis 一站式管理平台,支持集群(cluster、master-replica、sentinel)的监控、安装(除sentinel)、管理、告警以及基本的数据操作功能,支持监控 Memory、Clients 等 Redis 重要指标;...

    Redis V3.0 中文文档

    第 1 章Redis 介绍. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 第 2 章数据类型初探. . . . . . . . . . . . . . . . . . . . . . . . . . . . . ...

    Redis云管理平台CacheCloud.zip

    一、CacheCloud是做什么的CacheCloud提供一个Redis云管理平台:实现多种类型(Redis Standalone、Redis Sentinel、Redis Cluster)自动部署、解决Redis实例碎片化现象、提供完善统计、监控、运维功能、减少开发人员的...

    redis-manager:Redis 一站式管理平台,支持集群的监控、安装、管理、告警以及基本的数据操作

    Redis ManagerRedis Manager 是 Redis 一站式管理平台,支持集群(cluster、master-replica、sentinel)的监控、安装(除sentinel)、管理、告警以及基本的数据操作功能集群监控:支持监控 Memory、Clients 等 Redis...

    Undermoon:Mordern Redis Cluster解决方案,易于操作

    月下 Undermoon是基于Redis群集协议的自我管理Redis群集系统,支持: 横向可扩展性和高可用性通过HTTP API进行集群管理对主服务器和副本服务器进行自动故障转移快速缩放群集模式客户端和非群集模式客户端。...

    Redis源码解析:集群手动故障转移、从节点迁移详解

    也就是向从节点发送”CLUSTER FAILOVER”命令,使其在主节点未下线的情况下,发起故障转移流程,升级为新的主节点,而原来的主节点降级为从节点。  为了不丢失数据,向从节点发送”CLUSTER FAILOVER”命令后,...

    redis源码阅读中文分析注释

    redis源码阅读中文分析注释,带详尽注释以及相关流程调用注释,提出改造点,redis cluster集群功能、节点扩容、槽位迁移、failover故障切换、一致性选举完整分析

    kvrocks:Kvrocks 是一个基于 RocksDB 的分布式键值 NoSQL 数据库,兼容 Redis 协议

    Redis协议,用户可以使用redis客户端访问kvrocks 命名空间,类似于 redis db,但每个命名空间使用令牌复制,异步复制使用像 MySQL 这样的 binlog 高可用,支持redis sentinel在master或slave故障时进行failover ...

    CacheCloud云管理平台-其他

    CacheCloud提供一个Redis云管理平台:实现多种类型(Redis Standalone、Redis Sentinel、Redis Cluster)自动部署、解决Redis实例碎片化现象、提供完善统计、监控、运维功能、减少开发人员的运维成本和误操作,提高...

    Java面试 Java分布式知识点 Java微服务知识点 面试知识点

    Dubbo 3 Dubbo是什么? 3 为什么要用Dubbo? 3 Dubbo架构 4 节点角色说明 4 调用关系说明 4 连通性 4 健状性 5 伸缩性 5 升级性 5 ...redis:// 7 ...Failover Cluster 12 Failfast Cluster 13 等分布式微服

Global site tag (gtag.js) - Google Analytics