After six years of iteration and refinement, Tencent Server Web (hereinafter TSW), a company-level operations component at Tencent, was officially open-sourced today. TSW is Node.js infrastructure for web front-end developers, built to make locating problems more efficient; it provides cloud packet capture, holographic logging, and anomaly detection. TSW reliably serves billions of requests every day and is widely used in more than 30 major services, including Qzone, Weishi, Weiyun, QQ Music, WeSing, and Tencent Cloud.
GitHub repository: https://github.com/Tencent/TSW
TSW supports packet capture per user:
Targets dyed (tagged) users
Captures the packets within each request's life cycle
Provides capture, viewing, and download
Exports captures in Fiddler, Charles, and HAR formats
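The capture features above can be illustrated with a minimal sketch. It is not TSW's code: the `CapturedExchange` record, the `buildHar` and `parseQuery` helpers, and the set of dyed user IDs are hypothetical. It keeps only the exchanges of dyed users and wraps them in an HTTP Archive (HAR 1.2) log, a JSON format that debugging proxies such as Charles and Fiddler can import.

```typescript
// Hypothetical record of one request/response pair observed on the server.
interface CapturedExchange {
  userId: string;
  startedAt: Date;
  durationMs: number;
  method: string;
  url: string;
  requestHeaders: Record<string, string>;
  status: number;
  statusText: string;
  responseHeaders: Record<string, string>;
  responseBody: string;
  mimeType: string;
}

interface NameValue { name: string; value: string; }

function toNameValues(h: Record<string, string>): NameValue[] {
  return Object.keys(h).map((name) => ({ name, value: h[name] }));
}

// Parse "?a=1&b=2" into HAR queryString entries (fragment dropped, no "+" decoding).
function parseQuery(url: string): NameValue[] {
  const query = (url.split("?")[1] ?? "").split("#")[0];
  return query.split("&").filter((p) => p.length > 0).map((pair) => {
    const i = pair.indexOf("=");
    const name = i === -1 ? pair : pair.slice(0, i);
    const value = i === -1 ? "" : pair.slice(i + 1);
    return { name: decodeURIComponent(name), value: decodeURIComponent(value) };
  });
}

// Keep only the exchanges of dyed (tagged) users and wrap them in a HAR 1.2 log.
function buildHar(exchanges: CapturedExchange[], dyedUsers: Set<string>) {
  const entries = exchanges
    .filter((e) => dyedUsers.has(e.userId))
    .map((e) => ({
      startedDateTime: e.startedAt.toISOString(),
      time: e.durationMs,
      request: {
        method: e.method,
        url: e.url,
        httpVersion: "HTTP/1.1",
        cookies: [],
        headers: toNameValues(e.requestHeaders),
        queryString: parseQuery(e.url),
        headersSize: -1, // -1 means "unknown" in HAR
        bodySize: -1,
      },
      response: {
        status: e.status,
        statusText: e.statusText,
        httpVersion: "HTTP/1.1",
        cookies: [],
        headers: toNameValues(e.responseHeaders),
        // size counts characters here; a real capture would count bytes.
        content: { size: e.responseBody.length, mimeType: e.mimeType, text: e.responseBody },
        redirectURL: "",
        headersSize: -1,
        bodySize: -1,
      },
      cache: {},
      timings: { send: 0, wait: e.durationMs, receive: 0 },
    }));
  return { log: { version: "1.2", creator: { name: "capture-sketch", version: "0.1" }, entries } };
}
```

HAR marks unknown sizes with -1, which is why the sketch uses it; cookies and detailed timings are left empty for brevity.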
TSW provides per-user holographic logs to help developers locate problems quickly:
Records every log line within a request's life cycle, forming a complete log stream
Aggregates log streams by user
Provides a viewer for locating problems quickly
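As a rough illustration of the log stream and per-user aggregation described above, the in-memory sketch below groups log lines by request (one stream per request life cycle) and then by user. `LogLine` and `HolographicLog` are hypothetical names, not TSW's API; a production system would persist the streams rather than keep them in memory.

```typescript
// One log line, tagged with the request and user it belongs to (hypothetical shape).
interface LogLine {
  requestId: string;
  userId: string;
  time: number; // e.g. milliseconds since the epoch
  message: string;
}

// In-memory sketch: user -> request -> log lines of that request's life cycle.
class HolographicLog {
  private byUser = new Map<string, Map<string, LogLine[]>>();

  append(line: LogLine): void {
    let streams = this.byUser.get(line.userId);
    if (streams === undefined) {
      streams = new Map<string, LogLine[]>();
      this.byUser.set(line.userId, streams);
    }
    const stream = streams.get(line.requestId) ?? [];
    stream.push(line);
    streams.set(line.requestId, stream);
  }

  // All request streams of one user, each sorted by time.
  streamsFor(userId: string): LogLine[][] {
    const streams = this.byUser.get(userId);
    if (streams === undefined) return [];
    return Array.from(streams.values()).map((s) => [...s].sort((a, b) => a.time - b.time));
  }
}
```

Keying by user first matches the way the text describes lookups: a developer starts from the affected user and then opens that user's request streams.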
Real-time monitoring of built-in metrics
Push alerts for code exceptions
Who is using it?
Starting with server-side rendering
In October 2012, the first version of TSW went live on Zhiyun (an integrated automated operations platform), running on Node.js 0.6.20. Its functionality was very simple: it only rendered pages with server-side JS. It had practically no operations support; its only value was a faster first-screen experience. To strengthen that one value, it implemented gzip plus chunked transfer encoding, compressing while sending, so that page content could be returned in stages with precise control over when each packet is flushed, something that is easy to do in Node.js.
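The gzip plus chunked streaming described above can be sketched with Node's built-in `http` and `zlib` modules. This uses today's Node.js APIs rather than the 0.6-era code, and `createStreamingServer` is a hypothetical name, not part of TSW. Because no `Content-Length` is set, Node.js answers an HTTP/1.1 request with `Transfer-Encoding: chunked`; calling `flush()` on the gzip stream pushes the compressed first screen out before the rest of the page is ready.

```typescript
import * as http from "node:http";
import * as zlib from "node:zlib";

// Sketch: send the first screen early, then the rest, through one gzip stream.
function createStreamingServer(): http.Server {
  return http.createServer((_req, res) => {
    // No Content-Length: for HTTP/1.1 clients Node.js falls back to
    // Transfer-Encoding: chunked, so the body can be sent piece by piece.
    res.writeHead(200, {
      "Content-Type": "text/html; charset=utf-8",
      "Content-Encoding": "gzip",
    });
    const gzip = zlib.createGzip();
    gzip.pipe(res);

    // First piece: above-the-fold HTML. flush() makes zlib emit the
    // compressed bytes now instead of buffering until more input arrives.
    gzip.write("<html><body><h1>first screen</h1>");
    gzip.flush(() => {
      // Later piece: content that depends on slower work, simulated by a timer.
      setTimeout(() => gzip.end("<p>rest of the page</p></body></html>"), 50);
    });
  });
}
```

Without the `flush()` call, zlib may hold the first piece in its internal buffer, and the browser would see nothing until the whole page had been compressed.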
In the browser, getting a cookie is as simple as reading it from the global window context, so nobody thinks of it as laborious. Node.js is different: even for something as simple as reading a cookie, you have to keep track of where the request object is. The request object is a local variable and cannot be reached globally, and that is the root of the problem.
It was not until 2014 that we found a way to implement a request context, and the window object was born. Because process.domain always points to the domain object of the currently executing wrapper, the context switches automatically. In essence, window is a global variable bound to the lifetime of the request object: at any moment you can reach the request object through window, and from it get cookies and other information.
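TSW's window object was built on `process.domain`. The `domain` module is now deprecated in Node.js, so the sketch below swaps in `AsyncLocalStorage` from `node:async_hooks`, the built-in way to keep a per-request context across callbacks and promises in current Node.js. It illustrates the same idea, a context bound to one request's lifetime, and is not TSW's implementation; `handleRequest`, `getCookie`, and the `RequestContext` shape are hypothetical.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical per-request context, playing the role of TSW's window.
interface RequestContext {
  headers: Record<string, string | undefined>;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// A deep helper with no request parameter: it finds the current request
// through the async context instead of having it passed down.
function getCookie(name: string): string | undefined {
  const raw = requestContext.getStore()?.headers["cookie"];
  if (!raw) return undefined; // no request in scope, or no Cookie header
  for (const part of raw.split(";")) {
    const i = part.indexOf("=");
    if (i !== -1 && part.slice(0, i).trim() === name) {
      return decodeURIComponent(part.slice(i + 1).trim());
    }
  }
  return undefined;
}

// Run one request's work inside its own context; callbacks, timers, and
// awaited promises started from `work` all see the same store.
function handleRequest<T>(headers: RequestContext["headers"], work: () => T): T {
  return requestContext.run({ headers }, work);
}
```

`getCookie` takes no request argument; like code running under TSW's window, it reaches the current request from wherever it is called.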
The window object connected different dimensions, and putting those connections into practice produced new value. For example, a DNS resolution API has no concept of a user: the user exists in a higher dimension, which is invisible to lower dimensions. Through window, the lower dimension can know which user a resolution is being performed for. When a resolution result is wrong, we know which users are affected, and a user's request can also be traced back through the resolution process. The dimensions are not merely dependent on one another but tightly interlinked, so TSW's code has to be highly cohesive and cannot be split into independent modules. Collecting these associations also requires a matching storage and viewing system; these systems eventually evolved into the TSW open platform (tswjs.org).
One problem had long plagued us: fixing an issue took an hour of packet capture and a minute of coding, so capture efficiency severely limited how quickly problems could be located. Against this background, TSW proposed server-side cloud packet capture. Unlike client-side capture, it does not require users to care about the network environment, their location, or the access-layer protocol.
Capturing only the request and response packets is not enough, so the context object is used to strengthen the capture. While a request is processed, the context links derived requests to the capture, forming a holographic capture, and links derived logs, forming a holographic log. For any request, the logs and captures from every dimension are then naturally available together. Finally, the logs and captures are presented for viewing, so for a request that does not behave as expected, the exact cause can be identified. As a result, the team's overall R&D efficiency improved dramatically.
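The same context mechanism lets low-level calls be attributed to a user, as in the DNS example, and lets derived requests and logs join the request's holographic log. The sketch below again uses `AsyncLocalStorage` rather than TSW's domain-based window; `traced`, the `Ctx` shape, and the fake resolver in the usage are hypothetical. It wraps any promise-returning function so that its arguments and its result or error are appended to the current request's entries.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical request context: who the request is for, plus its log entries.
interface Ctx {
  userId: string;
  requestId: string;
  entries: string[];
}

const als = new AsyncLocalStorage<Ctx>();

// Wrap a low-level async call (DNS lookup, outbound HTTP, ...). The call
// itself knows nothing about users; the wrapper records what it did into
// whichever request context is active when it runs.
function traced<A extends unknown[], R>(label: string, fn: (...args: A) => Promise<R>) {
  return async (...args: A): Promise<R> => {
    const ctx = als.getStore();
    ctx?.entries.push(`${label} start ${JSON.stringify(args)}`);
    try {
      const result = await fn(...args);
      ctx?.entries.push(`${label} ok ${JSON.stringify(result)}`);
      return result;
    } catch (err) {
      ctx?.entries.push(`${label} failed ${String(err)}`);
      throw err; // the caller still sees the original error
    }
  };
}
```

In a real service the wrapped function could be `dns.promises.lookup` or an outbound HTTP client; a failed resolution then shows up inside the affected user's request rather than in an anonymous error log.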
Strong in operations
The company's top front-end developers are spread across all kinds of products, and to turn them into users we first had to break through departmental walls. Operations was the dimension that offered a way through: services A and B may well be operated by the same person. Like a gravitational wave crossing spatial dimensions, a proven path can be copied directly instead of detouring through space. That is why TSW chose to concentrate on operations and left how to write code to open-source libraries.
Installing Node.js on one machine is a skill; installing it on 1,000 machines is operations. Installation and upgrades should therefore be treated as operations problems. Node.js is well known for iterating quickly, and if some business is running on every version, there are many different things to operate. Offering several versions for businesses to choose from looks democratic, but good operations means reducing the number of things to operate.
Could we upgrade everyone together and maintain just one version? Unified upgrades bring a new challenge: C++ extensions are tightly coupled to the Node.js version, so upgrading Node.js alone is not enough; the extensions must be maintained as well. To find every C++ extension, TSW uses monitoring to send alert emails about privately maintained business extensions, which are then brought under TSW's unified maintenance. Unified maintenance has sped up TSW's iteration, and strong consistency keeps the code from decaying as versions change.
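The coupling between C++ extensions and the Node.js version comes from the native ABI: a classic addon is compiled against a specific `NODE_MODULE_VERSION` (reported at runtime as `process.versions.modules`) and refuses to load on a runtime with a different value, while addons written against Node-API (N-API) are exempt. The pure function below is a hypothetical sketch of an inventory check that could precede a unified upgrade, grouping the addons that need rebuilding by owner so each owner can be emailed. The `AddonBuild` inventory shape is an assumption, and this is not TSW's mechanism, which the text describes only as monitoring plus email alerts.

```typescript
// Hypothetical inventory entry: a classic (non-Node-API) C++ addon, its owning
// team, and the NODE_MODULE_VERSION it was compiled against.
interface AddonBuild {
  name: string;
  owner: string;
  moduleVersion: string;
}

// Returns, per owner, the addons whose ABI does not match the target runtime
// and which must therefore be rebuilt before everyone moves to that version.
function addonsToRebuild(addons: AddonBuild[], targetModuleVersion: string): Map<string, string[]> {
  const byOwner = new Map<string, string[]>();
  for (const addon of addons) {
    if (addon.moduleVersion === targetModuleVersion) continue; // already compatible
    const list = byOwner.get(addon.owner) ?? [];
    list.push(addon.name);
    byOwner.set(addon.owner, list);
  }
  return byOwner;
}
```

On the target runtime, the value to compare against would be `process.versions.modules`; each key of the returned map is one owner to notify.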
Front-end developers communicate with one another easily, but to operations staff their vocabulary is a dialect, so a common language is needed. Take startup logic: a Node.js program is started by a daemon, and when it crashes only the developers know how to bring it back up, which is awkward. Or take scaling out and in: after operations adds capacity, someone still has to deploy code to bring instances online or take them offline, which is a problem. After many years of development, Zhiyun's operations automation has reached the self-service level, so to standardize TSW we chose to build it on Zhiyun. Installing, restarting, or upgrading TSW is a single click in Zhiyun.
Flourishing through open source
TSW is highly stable and applies broadly across businesses. Node.js is widely popular in the front-end community, and as Node.js infrastructure TSW has earned the recognition of development and operations teams across the company. Open-sourcing TSW will extend its technical influence in the industry and also help us keep improving and optimizing it.