Posts
Debugging Session
Today we ran into an intriguing production resource leak that deserves to be shared.
The Observation: On our production server, CPU and memory usage kept increasing as new requests came in. On the development server, where no requests were being served, resource usage stayed stable. The server code was written in Go.
Two suspicions arose from this observation right away.
The problematic code resides in the hot path, i.e.
Metaverse
The word Metaverse still bugs me. I have learned that the word metadata means the data of data, and the word metaprogramming means the programming of programming (macros, annotations, code generation, etc.). In reactive programming I have also seen the word metastream, meaning a stream of streams. Given the word Metaverse, shouldn't it just be the (uni)verse of (uni)verses? In my understanding, a metaverse, should it exist, could only be found in a higher dimension than the universe we know about.
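Finding Out Why a Pod Restarted
In Kubernetes, Pods sometimes restart unexpectedly. By the time we notice, the error that caused the restart is often no longer visible in the dashboard.
Kubernetes actually provides a tool for this. When an unrecoverable error occurs, we can intercept it and write it to /dev/termination-log.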
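```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

func main() {
	// This only catches panics raised on the main goroutine.
	defer func() {
		if r := recover(); r != nil {
			// Record the panic value and stack trace where Kubernetes reads the
			// termination message, then re-panic so the container still exits.
			_ = os.WriteFile(
				"/dev/termination-log",
				[]byte(fmt.Sprintf("panic: %v\n%s", r, debug.Stack())),
				0o644,
			)
			panic(r)
		}
	}()

	// ...
}
```

To inspect it, we can find the error message under lastState when editing the Pod (kubectl edit pod):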
```yaml
apiVersion: v1
kind: Pod
...
    lastState:
      terminated:
        containerID: ...
        exitCode: 0
        finishedAt: ...
        message: |
          panic: goroutine 1 [running]:
          main.main.func1()
              /Users/donew/src/kitty/main.go:18 +0x131
          panic(0x5192e20, 0x5640e10)
              /usr/local/Cellar/go/1.16/libexec/src/runtime/panic.go:965 +0x1b9
          main.main()
              /Users/donew/src/kitty/main.go:21 +0x5b
          ...
```

You can also query it directly with kubectl:
```shell
kubectl get pod termination-demo -o go-template="{{range .status.containerStatuses}}{{.lastState.terminated.message}}{{end}}"
```

The path that gets written to can be changed with terminationMessagePath.
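In addition, if intercepting errors is inconvenient for your project, you can set terminationMessagePolicy to FallbackToLogsOnError. Kubernetes then uses the last chunk of the container's log output as the termination message, limited to 2048 bytes or 80 lines, whichever is smaller.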
In practice, the length limit of FallbackToLogsOnError sometimes captures too little and drops important information.
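Here is a simple best practice: write normal logs to stdout and fatal errors to stderr (in Go, panics already go to stderr by default, so there is no need to intercept them), then set terminationMessagePath to /dev/stderr. This captures exactly why the Pod restarted.
Further reading: https://kubernetes.io/zh/docs/tasks/debug-application-cluster/determine-reason-pod-failure/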
An Elegant Way to Bootstrap Go App
The twelve-factor methodology has proven its worth over the years. Since its introduction, many fields of technology have changed, and many of the newcomers are shining and exciting. In the age of Kubernetes, service meshes, and serverless architectures, the twelve-factor methodology has not faded away; rather, it turns out to be a good fit for nearly all of those powerful platforms.
Scaffolding a twelve-factor Go app may not be a difficult task for experienced engineers, but it certainly presents some challenges for junior engineers.
Keys to Fast Development
I have been on both sides. Sometimes people are amazed at how quickly our project got off the ground from scratch; other times people question why our progress is so slow.
It is often said that compiled, strongly typed languages are faster at runtime but slower to write. While that is true to a certain degree, it rarely matters in commercial applications.
There is also a common belief that enforcing code quality through TDD, code review, or testing in general increases the time to launch.
Socket.io Server For Hyperf
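Some users have complained that the WebSocket server feels too primitive and lacks a "framework feel". We hope that support for the Socket.io protocol will make WebSocket easier to use and clear up the confusion mentioned at the beginning of this article.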
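Can Hyperf Insert One Million Rows into MongoDB?
Recently I built a sidecar for Swoole in Go. It really is a sidecar: it is launched as a Swoole process and attached to the Swoole server. It starts when Swoole starts and stops when Swoole stops, and if the Go process crashes in between, Swoole brings it back up. Message delivery also mirrors Swoole tasks and goes over IPC: a web worker posts a message directly, and once the result is ready it is returned to that web worker.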
Timeout vs Deadline
One of the first things that struck me as strange about gRPC is that it terminates unfinished requests using a deadline mechanism instead of the more common timeout mechanism.
In Go:

```go
timeout := 5 * time.Second                   // a relative duration
deadline := time.Now().Add(5 * time.Second) // an absolute point in time
```

As you can see, the deadline mechanism is less straightforward at first glance. So why bother?
I thought this might be another "Google" thing, so I didn't give it much thought until recently.
Why You Should Avoid Using Request-scoped Injection in NestJS
NestJS 6 comes with all-new injection scope support. Instead of forcing every object managed by the DI container to be a singleton, you can now freely assign one of three injection scopes to each: singleton, request, or transient.
We adopted request-scoped injection in our project to implement distributed tracing as well as context-aware logging. The results turned out to be not so great.
For starters, request-scoped injection is far too expensive.
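A toy model makes the cost visible. The NestJS documentation notes that scope bubbles up the injection chain: a provider that depends on a request-scoped provider becomes request-scoped itself. The Go sketch below is not NestJS code; it counts constructor calls for a hypothetical provider graph, where singletons are built once and request-scoped providers once per request.

```go
package main

import "fmt"

// Provider is a toy stand-in for an injectable class and its dependencies.
type Provider struct {
	Deps         []string
	RequestScope bool // declared request-scoped
}

// effectivelyRequestScoped applies the "scope bubbles up" rule: a provider is
// request-scoped if it is declared so or depends, directly or transitively,
// on one that is. The graph is assumed to be acyclic.
func effectivelyRequestScoped(all map[string]Provider, name string) bool {
	p := all[name]
	if p.RequestScope {
		return true
	}
	for _, d := range p.Deps {
		if effectivelyRequestScoped(all, d) {
			return true
		}
	}
	return false
}

// constructions counts constructor calls needed to serve n requests.
func constructions(all map[string]Provider, n int) int {
	total := 0
	for name := range all {
		if effectivelyRequestScoped(all, name) {
			total += n // rebuilt for every request
		} else {
			total++ // built once
		}
	}
	return total
}

func main() {
	all := map[string]Provider{
		"UserController": {Deps: []string{"UserService"}},
		"UserService":    {Deps: []string{"UserRepository", "Logger"}},
		"UserRepository": {},
		"Logger":         {RequestScope: true}, // e.g. a context-aware logger
	}
	fmt.Println(constructions(all, 1000)) // 3001
}
```

Making one hypothetical logger request-scoped turns three of the four providers into per-request objects, so the cost grows with traffic rather than staying constant.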
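A Cloud-Native Hyperf Skeleton
Update 2020-01-22: A Hyperf Helm chart is now available. See the repo for details.
Hyperf provides official container images and very open configuration options, so deploying Hyperf to the cloud is not complicated in itself. Below, using Kubernetes as an example, we make a few modifications to Hyperf's default skeleton so that it runs gracefully on Kubernetes. This article is not an introduction to Kubernetes; readers are expected to be somewhat familiar with it already.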
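Lifecycle
After a container starts on Kubernetes, Kubernetes runs two kinds of checks against it: the Liveness Probe and the Readiness Probe. If the Liveness Probe fails, the container is restarted; if the Readiness Probe fails, the service is temporarily removed from service discovery. When Hyperf runs as an HTTP web server, we only need to add two routes.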
```php
<?php

namespace App\Controller;

class HealthCheckController extends AbstractController
{
    public function liveness()
    {
        return 'ok';
    }

    public function readiness()
    {
        return 'ok';
    }
}
```

```php
<?php
// in config/routes.php
use Hyperf\HttpServer\Router\Router;

Router::addRoute(['GET', 'HEAD'], '/liveness', 'App\Controller\HealthCheckController@liveness');
Router::addRoute(['GET', 'HEAD'], '/readiness', 'App\Controller\HealthCheckController@readiness');
```

Then configure the probes in the Kubernetes Deployment:
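```yaml
livenessProbe:
  httpGet:
    path: /liveness
    port: 9501
  failureThreshold: 1
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readiness
    port: 9501
  failureThreshold: 1
  periodSeconds: 10
```

Of course, here we simply return 'ok', which obviously cannot detect the real health of the service. Real checks should take into account the specific business scenario and the resources the service depends on. For example, for a database-heavy service we can check the database connection pool, and if the pool is exhausted, temporarily return status code 503 from the Readiness Probe.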
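When a service is being destroyed on Kubernetes, Kubernetes first sends it a SIGTERM signal. The process then has terminationGracePeriodSeconds (30 seconds by default) to shut itself down. If it is still running when that time is up, Kubernetes sends SIGKILL to force-kill the process. Swoole itself responds correctly to SIGTERM by shutting the server down, and under normal circumstances no in-flight connections are lost. In production, if Swoole does not exit on SIGTERM, the most likely cause is that timers registered on the server have not been cleared. We can clear them in onWorkerExit to make sure the process exits cleanly.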
```php
<?php
// config/autoload/server.php
// ...
'callbacks' => [
    SwooleEvent::ON_BEFORE_START => [Hyperf\Framework\Bootstrap\ServerStartCallback::class, 'beforeStart'],
    SwooleEvent::ON_WORKER_START => [Hyperf\Framework\Bootstrap\WorkerStartCallback::class, 'onWorkerStart'],
    SwooleEvent::ON_PIPE_MESSAGE => [Hyperf\Framework\Bootstrap\PipeMessageCallback::class, 'onPipeMessage'],
    SwooleEvent::ON_WORKER_EXIT => function () {
        // Clear every timer registered in this worker so it can exit on SIGTERM.
        Swoole\Timer::clearAll();
    },
],
// ...
```
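Clean Annotations in Hyperf
Annotations are a form of metaprogramming. Literally speaking, metaprogramming means writing programs that write programs. As with ordinary programming, annotations bring convenience, but when misused they can also hurt readability and maintainability.
Some annotations can have many configuration options, for example: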
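```
// This is not even a particularly extreme example
@CircuitBreaker(timeout=0.05, failCounter=1, successCounter=1, fallback="App\Service\UserService::searchFallback")
```

If our code uses many complex annotations like this, it leads to the following problems: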
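- The data types available inside annotations have limited expressiveness; for example, a method has to be referenced by its fully qualified name as a string, which is error-prone.
- Without help from an IDE, long annotations become hard to read (on GitHub, for example).
- When an annotation with the same configuration is used in several places, changing it means editing all of them.

My recommendation is to configure Hyperf annotations through inheritance.
Below is an example that extends the CircuitBreaker annotation.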
<?php
...
/**
 * @Annotation
 * @Target({"METHOD"})
 *
 * Shorthand for CircuitBreaker(timeout=0.05, failCounter=1, successCounter=1, fallback="App\Service\UserService::searchFallback")
 */
class FooCircuitBreakerAnnotation extends CircuitBreakerAnnotation
{
    /**
     * @var float
     */
    public $timeout = 0.05;

    /**
     * @var string
     */
    public $fallback = UserService::class.'::searchFallback';

    /**
     * The counter required to reset to a closed state.
     * @var int
     */
    public $successCounter = 1;

    /**
     * The counter required to reset to an open state.