Welcome to MrRobot5's Blog!

Notes on reading source code and on problems encountered at work

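Case Study: A Production Outage Caused by a Slow SQL Statement

2021-10-19

Hands-on Troubleshooting

MySQL
Problem Scenario

This post records how we tracked down a production incident whose root cause turned out to be a slow SQL statement. Neither the symptoms nor the monitoring metrics pointed directly at slow SQL, so the investigation itself is worth recording and reflecting on.

The Incident

Users reported that the system was sluggish or unusable: data list queries misbehaved and backend responses were extremely slow.

Initial Investigation

The first instinct is to suspect the web server or the MySQL server, for example CPU or memory being exhausted and taking the service down.

However, the monitoring tools showed that the application server metrics, the JVM metrics, and the database server's CPU and disk metrics were all within normal ranges.

Next we suspected the application code or a specific feature. Checking MySQL's status showed no slow SELECT queries, and the JVM thread count had not spiked.

So where was the problem?

Investigation, Round Two

With the basic metrics ruled out, we looked for lock contention, transaction deadlocks, or MySQL deadlocks.

Thread Dump

A JVM thread dump showed many waiting threads: request threads were blocked inside getConnection().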
```
java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0a9d2f48> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at com.alibaba.druid.pool.DruidDataSource.pollLast(DruidDataSource.java:1487)
        at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1086)
        at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:953)
        at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:931)
```
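Inside getConnection()

DruidDataSource implements its connection pool with the classic producer-consumer pattern, managing connections through a mechanism similar to a BlockingQueue.

The analysis showed that the threads holding database connections never released them, so every other request thread that needed the database had to wait. That is why the JVM's CPU and heap usage looked normal while the service was effectively unavailable.

At the same time, the application is capped by the pool's maxActive setting and cannot open an unlimited number of MySQL connections, so the database's I/O and connection count also stayed within normal ranges.

So which requests were hogging the limited connections?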
```
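private DruidConnectionHolder pollLast(long nanos) throws InterruptedException {
    // Abridged excerpt: statistics and some signalling details are omitted
    long estimate = nanos;
    for (;;) {
        if (poolingCount == 0) {
            notEmptyWaitThreadCount++;
            try {
                long startEstimate = estimate;
                // Every pooled connection is in use, so the caller has to wait
                estimate = notEmpty.awaitNanos(estimate); // signal by recycle or creator
                notEmptyWaitCount++;
                notEmptyWaitNanos += (startEstimate - estimate);
            } catch (InterruptedException ie) {
                notEmpty.signal(); // propagate to non-interrupted thread
                throw ie;
            } finally {
                notEmptyWaitThreadCount--;
            }

            if (poolingCount == 0) {
                if (estimate > 0) {
                    continue; // woken up but still nothing idle: keep waiting
                }
                return null; // timed out without obtaining a connection
            }
        }

        // Take an idle connection from the pool and clear its slot
        decrementPoolingCount();
        DruidConnectionHolder last = connections[poolingCount];
        connections[poolingCount] = null;

        return last;
    }
}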
```
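
To make the starvation concrete, here is a minimal sketch of a bounded pool built on a plain `ArrayBlockingQueue`. It is a toy stand-in, not Druid's implementation; `ToyPool`, its methods, and the `conn-N` strings are hypothetical names used only for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy stand-in for a bounded connection pool (hypothetical, NOT Druid's code).
class ToyPool {
    private final BlockingQueue<String> idle;

    ToyPool(int maxActive) {
        idle = new ArrayBlockingQueue<>(maxActive);
        for (int i = 0; i < maxActive; i++) {
            idle.add("conn-" + i);
        }
    }

    // Returns null when no connection becomes idle within maxWaitMillis.
    String getConnection(long maxWaitMillis) {
        try {
            return idle.poll(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Hands a connection back so that a waiting caller can proceed.
    void recycle(String conn) {
        idle.add(conn);
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool(1);
        String held = pool.getConnection(100);    // the slow statement takes the only connection
        String starved = pool.getConnection(100); // a web request waits, then gives up
        System.out.println("held=" + held + ", starved=" + starved); // held=conn-0, starved=null
        pool.recycle(held);                       // once the slow statement finishes
        System.out.println("after recycle=" + pool.getConnection(100)); // after recycle=conn-0
    }
}
```

The waiting caller burns no CPU while it is parked, which matches what the monitoring showed: healthy-looking machines and requests that never complete.
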
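The Culprit: an UPDATE

Exporting the MySQL slow query log revealed an UPDATE statement with an extremely long execution time.

Early in the investigation we focused only on the visible symptom, checked for slow SELECT statements, found none, and so took a detour.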
```
UPDATE `orders` SET `order_carrier_tel`='xxx-xxx-5501' WHERE `order_carrier_name`= 'xxx';
```
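The edit feature released the day before issued an UPDATE whose WHERE condition had no index, which forced a full table scan and produced the slow SQL.

Summary
1. Treat every release with respect. Code review is indispensable for critical systems; a single oversight, or a bug in a rarely used feature, can cause a production incident.
2. As a rule of thumb, about 90% of problems arise where the application talks to the database. Write code and SQL with care: production traffic and production data volumes are the real test.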
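
A Tour of the Redis SORT/GET Combination

2021-07-30

Study Notes

Redis
Motivation

While reading the design document for Retwis-J (a Twitter clone built on Redis), I became interested in how it implements a join on top of Redis, so I am recording the handy SORT/GET usage here for future reference.

Retwis-J Design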

A common problem with any store is dealing efficiently with normalized data.

A simple approach would be simply iterate through the list and load each post one by one but clearly this is not efficient as it means a lot of (slow) IO activity between the application and the database.

The best solution in such cases it to use the SORT/GET combination which allows data to be loaded based on its key - more information here. SORT/GET can be seen as the equivalent of RDBMS join.

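Using the SORT/GET Combination

In some cases it is more useful to get the actual objects rather than their IDs (object_1, object_2 and object_3).

The following command fetches external data based on the elements of a list: SORT mylist BY weight_* GET object_*

To fetch a field of an external hash: SORT mylist BY weight_*->fieldname GET object_*->fieldname. The -> separates the hash key from the hash field.

Command Examples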
```
redis> lpush timeline 1 2 3
redis> lrange timeline 0 -1
1) "3"
2) "2"
3) "1"

redis> set post_1 10
redis> set post_2 20
redis> set post_3 30
redis> sort timeline get post_*
1) "10"
2) "20"
3) "30"

redis> set foo_1 100
redis> set foo_2 200
redis> set foo_3 300
redis> sort timeline get post_* get foo_*
1) "10"
2) "100"
3) "20"
4) "200"
5) "30"
6) "300"
```
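
The substitution rule behind these results can be mimicked in memory. The sketch below is a simplified model, not Redis code: it assumes numeric list elements and plain string keys, and it ignores BY, `#` and `->`. `SortGetSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of `SORT <list> GET p1 GET p2 ...` (illustrative only).
class SortGetSketch {
    // Replaces the first `*` in the pattern with the list element.
    static String substitute(String pattern, String element) {
        int star = pattern.indexOf('*');
        if (star < 0) {
            throw new IllegalArgumentException("pattern must contain '*': " + pattern);
        }
        return pattern.substring(0, star) + element + pattern.substring(star + 1);
    }

    static List<String> sortGet(List<String> elements, Map<String, String> store, String... patterns) {
        List<String> sorted = new ArrayList<>(elements);
        // Without BY, elements are ordered by their numeric value, ascending
        sorted.sort((a, b) -> Double.compare(Double.parseDouble(a), Double.parseDouble(b)));
        List<String> out = new ArrayList<>();
        for (String element : sorted) {
            for (String pattern : patterns) {
                out.add(store.get(substitute(pattern, element))); // null models a missing key
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        for (int i = 1; i <= 3; i++) {
            store.put("post_" + i, String.valueOf(i * 10));
            store.put("foo_" + i, String.valueOf(i * 100));
        }
        List<String> timeline = Arrays.asList("3", "2", "1");
        System.out.println(sortGet(timeline, store, "post_*", "foo_*")); // [10, 100, 20, 200, 30, 300]
    }
}
```

With two GET patterns, the reply interleaves two values per element, which is why it comes back as one flat list.
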
Retwis-J Join Example

Method convertPidsToPosts shows how these classes can be used load the posts by executing a join over a hash.

Spring Data provides support for the SORT/GET pattern through its sort method and the SortQuery and BulkMapper interface for querying and mapping the bulk result back to an object.
```
// StringRedisTemplate from spring-data-redis supports the SORT/GET combination shown above
// String pid = "pid:*->";
SortQuery<String> query = SortQueryBuilder.sort(key).noSort()
        .get(pidKey)
        .get(pid + uid)
        .get(pid + content)
        .get(pid + replyPid)
        .get(pid + replyUid)
        .get(pid + time)
        .limit(range.begin, range.end)
        .build();
```
```
// Map the query result back to objects
BulkMapper<WebPost, String> hm = new BulkMapper<WebPost, String>() {
    @Override
    public WebPost mapBulk(List<String> bulk) {
        Map<String, String> map = new LinkedHashMap<String, String>();
        Iterator<String> iterator = bulk.iterator();
        // The bulk holds one value per GET pattern, in the order of the query above; read them in that order
        String pid = iterator.next();
        map.put(uid, iterator.next());
        map.put(content, iterator.next());
        map.put(replyPid, iterator.next());
        map.put(replyUid, iterator.next());
        map.put(time, iterator.next());

        return convertPost(pid, map);
    }
};
List<WebPost> sort = template.sort(query, hm);
```
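
The mapper above receives one group of values per list element. My understanding, which is worth verifying against the spring-data-redis source, is that the flat SORT reply is cut into groups whose size equals the number of GET patterns. A minimal sketch of that grouping step, with `BulkChunkSketch` and `chunk` as hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Splits a flat reply into fixed-size rows (illustrative, not library code).
class BulkChunkSketch {
    static <T> List<List<T>> chunk(List<T> flat, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> rows = new ArrayList<>();
        for (int i = 0; i < flat.size(); i += size) {
            // Copy the slice so each row is independent of the source list
            rows.add(new ArrayList<>(flat.subList(i, Math.min(i + size, flat.size()))));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> reply = Arrays.asList("10", "100", "20", "200", "30", "300");
        System.out.println(chunk(reply, 2)); // [[10, 100], [20, 200], [30, 300]]
    }
}
```

Each row then plays the role of the `bulk` argument, so the mapper can read its values positionally.
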
References

Spring Data Redis - Retwis-J

SORT key

Redis SORT command explained (Redis sort命令详解)

Tutorial: Design and implementation of a simple Twitter clone using PHP and the Redis key-value store

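okhttp Read timed out: Clearing Up the Retry Question

2021-06-11

Study Notes

okhttp, Design Patterns

The Question
Analysis of RetryAndFollowUpInterceptor
- What It Can Do
- What It Cannot Do
Handling SocketTimeoutException
Summary
Miscellaneous
- How okhttp3.Interceptor Instances Are Registered
References

The Question

When scraping data with okhttp, we occasionally hit a Read timed out exception. The usual remedy is to add a retry mechanism.

While reading the documentation, I noticed that okhttp registers RetryAndFollowUpInterceptor by default, which, judging by its name, supports retries.

So why is the timed out exception not retried, and what does RetryAndFollowUpInterceptor actually do?

Analysis of RetryAndFollowUpInterceptor

What It Can Do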

This interceptor recovers from failures and follows redirects as necessary.

add authentication headers

// On an unauthorized response (HTTP Status-Code 401: Unauthorized), call authenticate() and continue with the request it returns
client.authenticator().authenticate(route, userResponse);

follow redirects

/**
 * For redirect responses:
 * HTTP Status-Code 301: Moved Permanently.
 * HTTP Status-Code 302: Temporary Redirect.
 * HTTP Status-Code 303: See Other.
 * a new request is built from the Location header so that the redirect is followed automatically
 */
String location = userResponse.header("Location");
HttpUrl url = userResponse.request().url().resolve(location);

handle a client request timeout (rare)

case HTTP_CLIENT_TIMEOUT:
// 408's are rare in practice, but some servers like HAProxy use this response code. The
// spec says that we may repeat the request without modifications. Modern browsers also
// repeat the request (even non-idempotent ones.)
// Note: this timeout is the HTTP 408 status, not the SocketTimeoutException discussed above.

See: okhttp3.internal.http.RetryAndFollowUpInterceptor#followUpRequest

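What It Cannot Do

It gives up when it encounters the following exceptions: ProtocolException, InterruptedIOException, SSLHandshakeException, CertificateException. The one exception, visible in the excerpt below, is a SocketTimeoutException raised before the request was sent.

The source describes these cases as "the failure is permanent".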

if (e instanceof InterruptedIOException) {
    return e instanceof SocketTimeoutException && !requestSendStarted;
}

See: okhttp3.internal.http.RetryAndFollowUpInterceptor#recover
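
The decision can be restated as a small standalone function. This is a sketch of only the cases listed in this section, written against JDK exception types. It is not a copy of okhttp's `recover`, which also checks further conditions; `RecoverableSketch` and `isRecoverable` are hypothetical names.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.ProtocolException;
import java.net.SocketTimeoutException;
import java.security.cert.CertificateException;
import javax.net.ssl.SSLHandshakeException;

// Restates the "permanent failure" rules described above (illustrative only).
class RecoverableSketch {
    static boolean isRecoverable(IOException e, boolean requestSendStarted) {
        if (e instanceof ProtocolException) {
            return false; // a protocol problem will not fix itself
        }
        if (e instanceof InterruptedIOException) {
            // Only a timeout that happened before the request was sent may be retried
            return e instanceof SocketTimeoutException && !requestSendStarted;
        }
        if (e instanceof SSLHandshakeException && e.getCause() instanceof CertificateException) {
            return false; // a bad certificate is not transient
        }
        return true;
    }

    public static void main(String[] args) {
        // A read timeout happens after the request has been sent
        System.out.println(isRecoverable(new SocketTimeoutException("Read timed out"), true));     // false
        // A timeout while connecting happens before anything was sent
        System.out.println(isRecoverable(new SocketTimeoutException("connect timed out"), false)); // true
    }
}
```

A read timeout always arrives with the request already sent, so this rule never retries it, which answers the question that opened this post.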

Handling SocketTimeoutException

Adjust the timeout settings to suit the situation:

new OkHttpClient.Builder()         
    .connectTimeout(10, TimeUnit.SECONDS)
    .writeTimeout(5, TimeUnit.SECONDS)
    .readTimeout(10, TimeUnit.SECONDS)
    .build();

Add a retry mechanism to tolerate network jitter:

Implement the Interceptor interface, catch SocketTimeoutException, and retry.
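
The core of such an interceptor is a bounded retry loop. The sketch below keeps to the JDK so it runs on its own; `TimeoutRetrySketch`, `IoCall` and `callWithRetry` are hypothetical names. Inside a real interceptor the loop would wrap `chain.proceed(chain.request())` in `intercept(Chain chain)`.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Bounded retry on SocketTimeoutException (illustrative helper, not okhttp code).
class TimeoutRetrySketch {
    interface IoCall<T> {
        T execute() throws IOException;
    }

    static <T> T callWithRetry(IoCall<T> call, int maxRetries) throws IOException {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must not be negative");
        }
        SocketTimeoutException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.execute();
            } catch (SocketTimeoutException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt timed out
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        String body = callWithRetry(() -> {
            if (calls[0]++ < 2) {
                throw new SocketTimeoutException("Read timed out");
            }
            return "ok";
        }, 3);
        System.out.println(body + " after " + calls[0] + " calls"); // ok after 3 calls
    }
}
```

Retrying blindly is only safe for idempotent requests such as the GETs used for scraping; a back-off delay between attempts is a sensible addition that this sketch leaves out.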

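Summary

As the analysis shows, RetryAndFollowUpInterceptor handles retries at the HTTP application-protocol level, whereas Read timed out is a transport-level problem. Recovering from such a timeout is simply outside RetryAndFollowUpInterceptor's responsibilities.

Miscellaneous

How okhttp3.Interceptor Instances Are Registered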

Response getResponseWithInterceptorChain() throws IOException {
    // Build a full stack of interceptors.
    List<Interceptor> interceptors = new ArrayList<>();
    // Application interceptors added by the caller run first
    interceptors.addAll(client.interceptors());
    interceptors.add(retryAndFollowUpInterceptor);
    interceptors.add(new BridgeInterceptor(client.cookieJar()));
    interceptors.add(new CacheInterceptor(client.internalCache()));
    interceptors.add(new ConnectInterceptor(client));
    if (!forWebSocket) {
      interceptors.addAll(client.networkInterceptors());
    }
    // This interceptor performs the actual network call
    interceptors.add(new CallServerInterceptor(forWebSocket));

    Interceptor.Chain chain = new RealInterceptorChain(interceptors, null, null, null, 0,
        originalRequest, this, eventListener, client.connectTimeoutMillis(),
        client.readTimeoutMillis(), client.writeTimeoutMillis());
    // Start the call by proceeding down the chain
    return chain.proceed(originalRequest);
}
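
The ordering matters because each interceptor wraps everything registered after it. Here is a minimal chain-of-responsibility sketch that shows the invocation order. It uses strings in place of requests and responses; `ChainSketch` and its members are hypothetical and far simpler than okhttp's RealInterceptorChain.

```java
import java.util.ArrayList;
import java.util.List;

// Toy interceptor chain: every interceptor receives a chain positioned at the next one.
class ChainSketch {
    interface Interceptor {
        String intercept(Chain chain);
    }

    static final class Chain {
        private final List<Interceptor> interceptors;
        private final int index;
        final String request;

        Chain(List<Interceptor> interceptors, int index, String request) {
            this.interceptors = interceptors;
            this.index = index;
            this.request = request;
        }

        String proceed(String request) {
            if (index >= interceptors.size()) {
                throw new IllegalStateException("no interceptor left to produce a response");
            }
            Chain next = new Chain(interceptors, index + 1, request);
            return interceptors.get(index).intercept(next);
        }
    }

    // Pass-through interceptor that records when it is invoked.
    static Interceptor named(String name, List<String> log) {
        return chain -> {
            log.add(name);
            return chain.proceed(chain.request);
        };
    }

    static String run(List<String> log) {
        List<Interceptor> list = new ArrayList<>();
        list.add(named("application", log));
        list.add(named("retryAndFollowUp", log));
        list.add(chain -> {
            log.add("callServer"); // the terminal interceptor answers instead of proceeding
            return "response for " + chain.request;
        });
        return new Chain(list, 0, "GET /").proceed("GET /");
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(run(log) + " via " + log);
        // response for GET / via [application, retryAndFollowUp, callServer]
    }
}
```

Because application interceptors sit outermost, a custom retry interceptor sees the SocketTimeoutException only after RetryAndFollowUpInterceptor has declined to recover from it.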

References

A brief analysis of the OkHttp interceptor RetryAndFollowUpInterceptor (浅析 OkHttp 拦截器之 RetryAndFollowUpInterceptor)


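Inside Spring: How Duplicate Bean Registration Works

2021-04-23

Source Code Reading

Spring

Established Facts
Registration of Beans Defined with XML bean Tags
Registration of Component-Scanned Beans
Validation When Registering a Bean with the Container
Worked Cases
Conclusion

The question: while combing through a business project, I found the same beanName configured both in XML and through annotations, but with different classes, and yet the application started and deployed normally. In theory a beanName is unique, so what is going on?

Spring version examined: 3.2.0.RELEASE

Established Facts

- A Spring bean's unique identifier in the container is its beanName. In an XML bean tag this is the id; in an annotation it is the default attribute, value.
- Within a single XML file, multiple beans with the same id are not allowed. The IDE flags it, and startup fails with SAXParseException: There are multiple occurrences of ID value 'xxx'.
- Among annotation-defined beans, multiple beans with the same value are not allowed. Startup fails during component scanning with ConflictingBeanDefinitionException: Annotation-specified bean name 'xxx' for bean class [com.Foo] conflicts with existing, non-compatible bean definition of same name and class [com.Too]

Bean registration is an operation at the BeanFactory level. Put simply, the definitions are stored in a Map.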

/** Map of bean definition objects, keyed by bean name */
private final Map<String, BeanDefinition> beanDefinitionMap = new ConcurrentHashMap<String, BeanDefinition>(64);

Registration of Beans Defined with XML bean Tags

/**
 * Parses an XML bean tag, builds a BeanDefinition, and registers it with the BeanFactory.
 * As the source shows, nothing between parsing and registration checks that the beanName is unique; whether registration succeeds depends entirely on the registry.
 *
 * Source: DefaultBeanDefinitionDocumentReader#processBeanDefinition
 */
protected void processBeanDefinition(Element ele, BeanDefinitionParserDelegate delegate) {
    BeanDefinitionHolder bdHolder = delegate.parseBeanDefinitionElement(ele);
    if (bdHolder != null) {
        bdHolder = delegate.decorateBeanDefinitionIfRequired(ele, bdHolder);
        try {
            // Register the given bean definition with the given bean factory. Called directly, with no uniqueness check.
            BeanDefinitionReaderUtils.registerBeanDefinition(bdHolder, getReaderContext().getRegistry());
        }
        catch (BeanDefinitionStoreException ex) {
            // ...
        }
    }
}

Registration of Component-Scanned Beans

/**
 * Scans the given base packages, builds bean definitions, and registers them with the BeanFactory.
 * Note: checkCandidate checks beanName uniqueness and bean compatibility. If a compatible BeanDefinition already exists, the candidate is not registered.
 *
 * @see ClassPathScanningCandidateComponentProvider#findCandidateComponents
 * Source: org.springframework.context.annotation.ClassPathBeanDefinitionScanner#doScan
 */
protected Set<BeanDefinitionHolder> doScan(String... basePackages) {
    Set<BeanDefinitionHolder> beanDefinitions = new LinkedHashSet<BeanDefinitionHolder>();
    for (String basePackage : basePackages) {
        Set<BeanDefinition> candidates = findCandidateComponents(basePackage);
        for (BeanDefinition candidate : candidates) {
            String beanName = this.beanNameGenerator.generateBeanName(candidate, this.registry);
            // ......
            // checkCandidate does two things: it enforces beanName uniqueness (the ConflictingBeanDefinitionException above is thrown here) and checks compatibility (an existing definition that did not come from scanning is treated as compatible by default!).
            if (checkCandidate(beanName, candidate)) {
                BeanDefinitionHolder definitionHolder = new BeanDefinitionHolder(candidate, beanName);
                beanDefinitions.add(definitionHolder);
                registerBeanDefinition(definitionHolder, this.registry);
            }
        }
    }
    return beanDefinitions;
}

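Validation When Registering a Bean with the Container

BeanFactory has a setting, allowBeanDefinitionOverriding, which defaults to true, so registering the same name again is allowed.

/**
 * Register a new bean definition with this registry.
 * @throws BeanDefinitionStoreException if beanDefinition.validate() fails, or if a duplicate beanName is registered while overriding is disabled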
 * 
 * @see RootBeanDefinition
 * @see ChildBeanDefinition
 */
public void registerBeanDefinition(String beanName, BeanDefinition beanDefinition) throws BeanDefinitionStoreException {
    // ......
    synchronized (this.beanDefinitionMap) {
        Object oldBeanDefinition = this.beanDefinitionMap.get(beanName);
        // Uniqueness check: when allowBeanDefinitionOverriding is true (the default), the new definition replaces the existing one.
        if (oldBeanDefinition != null) {
            if (!this.allowBeanDefinitionOverriding) {
                throw new BeanDefinitionStoreException(beanDefinition.getResourceDescription(), beanName,
                        "Cannot register bean definition [" + beanDefinition + "] for bean '" + beanName +
                        "': There is already [" + oldBeanDefinition + "] bound.");
            }
        }
        this.beanDefinitionMap.put(beanName, beanDefinition);
    }

    resetBeanDefinition(beanName);
}

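Worked Cases

Using the two registration paths above, here is how registration plays out for concrete configurations.

<!-- case 1: component scanning is configured first. Foo is registered first, then Woo replaces it, so the bean finally exposed is Woo -->
<!-- the package contains a class (com.service.Foo) with beanName="foo" -->
<context:component-scan base-package="com.service.*" />
<!-- bean defined directly in XML, beanName="foo" -->
<bean id="foo" class="com.service.Woo"></bean>

<!-- case 2: the XML bean is configured first. Woo is registered first; scanning then finds a compatible bean of the same name and skips Foo, so the bean finally exposed is Woo -->
<!-- bean defined directly in XML, beanName="foo" -->
<bean id="foo" class="com.service.Woo"></bean>
<!-- the package contains a class (com.service.Foo) with beanName="foo" -->
<context:component-scan base-package="com.service.*" />

Either way, a bean configured in XML takes precedence over a component-scanned bean.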
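
Both cases can be replayed with a toy registry. The sketch below models only the two rules discussed above: overriding on direct registration, and the compatibility check on the scanning path. It is a simplification, not Spring code; `BeanOverrideSketch`, `Def` and `Registry` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of bean definition registration (illustrative only).
class BeanOverrideSketch {
    // Minimal bean definition: the class name and whether it came from scanning
    static final class Def {
        final String className;
        final boolean scanned;

        Def(String className, boolean scanned) {
            this.className = className;
            this.scanned = scanned;
        }
    }

    static final class Registry {
        final Map<String, Def> defs = new LinkedHashMap<>();
        boolean allowOverriding = true;

        // Direct registration: a duplicate name replaces the old definition unless overriding is off
        void register(String name, Def def) {
            if (defs.containsKey(name) && !allowOverriding) {
                throw new IllegalStateException("already bound: " + name);
            }
            defs.put(name, def);
        }

        // XML path: registers without any check of its own
        void registerXml(String name, String className) {
            register(name, new Def(className, false));
        }

        // Scanning path: mimics the checkCandidate rule before registering
        void registerScanned(String name, String className) {
            Def existing = defs.get(name);
            if (existing == null) {
                register(name, new Def(className, true));
            } else if (existing.scanned && !existing.className.equals(className)) {
                throw new IllegalStateException("conflicting scanned bean: " + name);
            }
            // otherwise the existing definition is compatible and the candidate is skipped
        }
    }

    static String scanThenXml() {
        Registry registry = new Registry();
        registry.registerScanned("foo", "com.service.Foo");
        registry.registerXml("foo", "com.service.Woo");
        return registry.defs.get("foo").className;
    }

    static String xmlThenScan() {
        Registry registry = new Registry();
        registry.registerXml("foo", "com.service.Woo");
        registry.registerScanned("foo", "com.service.Foo");
        return registry.defs.get("foo").className;
    }

    public static void main(String[] args) {
        System.out.println(scanThenXml()); // com.service.Woo
        System.out.println(xmlThenScan()); // com.service.Woo
    }
}
```

Setting `allowOverriding` to false makes case 1 fail at registration time, which is one way to surface such duplicates early.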

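Conclusion

Putting it together: when the same bean is configured in several XML files, or when component scanning and XML are mixed, Spring by default allows the same beanName to appear more than once. Among XML definitions, the BeanDefinition parsed last replaces every earlier one with the same beanName; a scanned bean, however, never replaces an explicitly defined bean of the same name.

This analysis shows how much care a mature framework puts into configuration details. Its design for compatibility (multiple ways to register beans, tolerance of duplicate configuration), extensibility (support for overriding), and consistency (in the mixed XML and scanning case, the result does not depend on configuration order) is worth borrowing in day-to-day development.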


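How Spring Boot Boots Spring (EnableAutoConfiguration)

2021-04-23

Study Notes

SpringBoot

How springboot auto-configuration works
insight springboot @EnableAutoConfiguration
insight spring @Conditional
insight spring Configuration annotations
- Registering ConfigurationClassPostProcessor
- ConfigurationClassPostProcessor processing logic

One way Spring Boot boots Spring, covered in the previous post, is SpringApplicationRunListener, which exposes the stages of Spring's startup and gives extensions a hook into the container's initialization events.

The other is auto-configuration: each starter contributes configuration classes registered under the EnableAutoConfiguration key, and these register the corresponding features with the Spring container.

How springboot auto-configuration works

First, the Spring framework supports configuring beans with @Configuration and importing and registering beans with @Import. Spring Boot supplies its own ImportSelector, which brings the @Configuration classes provided by the starters into the Spring container.

Second, building on Spring's Condition mechanism, Spring Boot extends @Conditional with richer, more specific checks that decide whether to register a bean based on the current classpath or the state of the Spring container. In the end, only valid, applicable beans are registered.

The rest of this post walks through that process, starting from Spring Boot and examining each key annotation in turn.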

insight springboot @EnableAutoConfiguration

// EnableAutoConfiguration imports a custom ImportSelector, which discovers and imports configuration classes
@Import(EnableAutoConfigurationImportSelector.class)
public @interface EnableAutoConfiguration {
}

With the general-purpose factory-loading mechanism provided by SpringFactoriesLoader, Spring Boot can load the configuration classes found on the classpath.

// @see AutoConfigurationImportSelector#selectImports
public String[] selectImports(AnnotationMetadata annotationMetadata) {
    // Abridged: attributes and autoConfigurationMetadata are obtained earlier in the real method
    // find auto configuration classes in META-INF/spring.factories
    List<String> configurations = getCandidateConfigurations(annotationMetadata, attributes);
    configurations = removeDuplicates(configurations);
    configurations = sort(configurations, autoConfigurationMetadata);
    Set<String> exclusions = getExclusions(annotationMetadata, attributes);
    configurations.removeAll(exclusions);
    configurations = filter(configurations, autoConfigurationMetadata);
    fireAutoConfigurationImportEvents(configurations, exclusions);
    return configurations.toArray(new String[configurations.size()]);
}
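
The first steps of that pipeline (load candidates, remove duplicates, drop exclusions) can be sketched with the JDK alone. The sketch parses spring.factories-style content with `java.util.Properties`; the `com.example.*` class names are made up, and `AutoConfigSelectSketch` is a hypothetical helper, not Spring Boot code. Sorting and condition-based filtering are left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Mimics candidate loading, de-duplication and exclusion (illustrative only).
class AutoConfigSelectSketch {
    static final String KEY = "org.springframework.boot.autoconfigure.EnableAutoConfiguration";

    static List<String> select(String factoriesContent, Set<String> exclusions) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(factoriesContent));
        } catch (IOException e) {
            throw new IllegalStateException("cannot parse factories content", e);
        }
        // LinkedHashSet removes duplicates while keeping declaration order
        Set<String> unique = new LinkedHashSet<>();
        for (String name : props.getProperty(KEY, "").split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        unique.removeAll(exclusions);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        String content = KEY + "=com.example.AConfig,\\\n  com.example.BConfig,com.example.AConfig\n";
        System.out.println(select(content, Collections.singleton("com.example.BConfig")));
        // [com.example.AConfig]
    }
}
```

Because the candidates are plain class names at this stage, nothing is loaded or instantiated until the conditions have been evaluated.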

insight spring @Conditional

Think of it as an enhanced TypeFilter: bean definitions are evaluated and filtered programmatically.

Annotations added by Spring Boot: @ConditionalOnBean, @ConditionalOnClass, @ConditionalOnProperty, and others.

/**
 * Determine if an item should be skipped based on @Conditional annotations.
 * 
 * @see ConditionEvaluator#shouldSkip
 */
public boolean shouldSkip(AnnotatedTypeMetadata metadata, ConfigurationPhase phase) {
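    // Rule: no @Conditional means the item matches by default
    if (metadata == null || !metadata.isAnnotated(Conditional.class.getName())) {
        return false;
    }
    // Parse @Conditional and instantiate the conditions
    List<Condition> conditions = new ArrayList<Condition>();
    for (String[] conditionClasses : getConditionClasses(metadata)) {
        for (String conditionClass : conditionClasses) {
            Condition condition = getCondition(conditionClass);
            conditions.add(condition);
        }
    }
    // Skip the item as soon as one applicable condition does not match
    for (Condition condition : conditions) {
        // Only a ConfigurationCondition is tied to a specific phase
        ConfigurationPhase requiredPhase = null;
        if (condition instanceof ConfigurationCondition) {
            requiredPhase = ((ConfigurationCondition) condition).getConfigurationPhase();
        }
        if (requiredPhase == null || requiredPhase == phase) {
            if (!condition.matches(this.context, metadata)) {
                return true;
            }
        }
    }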

    return false;
}
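
The same skip-on-first-mismatch rule, together with a class-presence check in the spirit of @ConditionalOnClass, fits in a few lines. This is a sketch under simplifying assumptions (no metadata, no phases); `ConditionSketch`, its `Condition` interface and `onClass` are hypothetical, and `com.example.DoesNotExist` is a made-up class name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy condition evaluation (illustrative, not Spring's ConditionEvaluator).
class ConditionSketch {
    interface Condition {
        boolean matches();
    }

    // Matches when the named class can be found on the classpath
    static Condition onClass(String className) {
        return () -> {
            try {
                Class.forName(className, false, ConditionSketch.class.getClassLoader());
                return true;
            } catch (ClassNotFoundException e) {
                return false;
            }
        };
    }

    // An item is skipped as soon as one condition does not match
    static boolean shouldSkip(List<Condition> conditions) {
        for (Condition condition : conditions) {
            if (!condition.matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldSkip(Collections.<Condition>emptyList()));             // false
        System.out.println(shouldSkip(Arrays.asList(onClass("java.util.List"))));       // false
        System.out.println(shouldSkip(Arrays.asList(onClass("java.util.List"),
                onClass("com.example.DoesNotExist"))));                                 // true
    }
}
```

Passing `false` as the second argument of `Class.forName` avoids running static initializers just to test for presence.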

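insight spring Configuration annotations

Since Spring 3.x, the container can be configured in Java code.

Key annotations: @Configuration @Bean @Import @ComponentScans @PropertySources

@Configuration plays a role similar to Spring XML configuration: it loads configuration files, BeanDefinitions, and so on, so it has to be processed early in container initialization. The class that handles it is ConfigurationClassPostProcessor, which is a BeanFactoryPostProcessor.

Note: this class implements BeanDefinitionRegistryPostProcessor, which is invoked earlier than a standard BeanFactoryPostProcessor.

Registering ConfigurationClassPostProcessor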

public AnnotatedBeanDefinitionReader(BeanDefinitionRegistry registry, Environment environment) {
    this.registry = registry;
    this.conditionEvaluator = new ConditionEvaluator(registry, environment, null);
    // Register the configuration post-processors, the core support for Spring's annotation features
    AnnotationConfigUtils.registerAnnotationConfigProcessors(this.registry);
}

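ConfigurationClassPostProcessor processing logic

By analogy with XML configuration, a @Configuration class corresponds to a single XML file.

Processing has two parts:

- Parse the annotations inside the class (analogous to the tags in an XML file)
- Parse the classes it imports (analogous to other XML files imported by an XML file)

Two core classes do the work:

- ConfigurationClassParser parses the configuration classes.
- ConfigurationClassBeanDefinitionReader evaluates and registers the BeanDefinitions.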

// @see ConfigurationClassPostProcessor#processConfigBeanDefinitions
public void processConfigBeanDefinitions(BeanDefinitionRegistry registry) {
    // Abridged: configCandidates (the @Configuration bean definitions found in the registry) is collected earlier in the real method

    // Parse each @Configuration class
    ConfigurationClassParser parser = new ConfigurationClassParser(
            this.metadataReaderFactory, this.problemReporter, this.environment,
            this.resourceLoader, this.componentScanBeanNameGenerator, registry);

    Set<BeanDefinitionHolder> candidates = new LinkedHashSet<BeanDefinitionHolder>(configCandidates);
    Set<ConfigurationClass> alreadyParsed = new HashSet<ConfigurationClass>(configCandidates.size());
    do {
        // Parse the @Configuration classes, processing the @PropertySources, @ComponentScans, @ImportResource, @Bean and @Import they declare.
        parser.parse(candidates);
        parser.validate();

        Set<ConfigurationClass> configClasses = new LinkedHashSet<ConfigurationClass>(parser.getConfigurationClasses());
        configClasses.removeAll(alreadyParsed);

        // Read the model and create bean definitions based on its content
        if (this.reader == null) {
            this.reader = new ConfigurationClassBeanDefinitionReader(
                    registry, this.sourceExtractor, this.resourceLoader, this.environment,
                    this.importBeanNameGenerator, parser.getImportRegistry());
        }
        // Decide via @Conditional whether each definition is registered with the container
        this.reader.loadBeanDefinitions(configClasses);
        alreadyParsed.addAll(configClasses);
        // Check whether new @Configuration classes were introduced; if so, candidates is refilled and the loop loads, parses and processes them (elided)
    }
    while (!candidates.isEmpty());
}
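
The do-while shape is a worklist: keep parsing until a round discovers no new configuration classes. The sketch below shows that idea in isolation, using a map from class name to imported class names. It is a simplified model with made-up names (`ConfigParseLoopSketch`, `AppConfig`, and so on) and does not reproduce how Spring resolves imports internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Worklist loop: process candidates until no new ones are discovered (illustrative only).
class ConfigParseLoopSketch {
    static List<String> processAll(List<String> initial, Map<String, List<String>> imports) {
        Set<String> alreadyParsed = new LinkedHashSet<>();
        List<String> candidates = new ArrayList<>(initial);
        do {
            List<String> discovered = new ArrayList<>();
            for (String candidate : candidates) {
                // add() returns false for classes handled in an earlier round
                if (alreadyParsed.add(candidate)) {
                    discovered.addAll(imports.getOrDefault(candidate, Collections.<String>emptyList()));
                }
            }
            discovered.removeAll(alreadyParsed); // guards against import cycles
            candidates = discovered;
        } while (!candidates.isEmpty());
        return new ArrayList<>(alreadyParsed);
    }

    public static void main(String[] args) {
        Map<String, List<String>> imports = new HashMap<>();
        imports.put("AppConfig", Arrays.asList("WebConfig", "DataConfig"));
        imports.put("WebConfig", Arrays.asList("DataConfig", "JsonConfig"));
        imports.put("DataConfig", Arrays.asList("AppConfig")); // a cycle back to the start
        System.out.println(processAll(Collections.singletonList("AppConfig"), imports));
        // [AppConfig, WebConfig, DataConfig, JsonConfig]
    }
}
```

Tracking `alreadyParsed` is what lets the loop terminate even when configuration classes import each other.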
