Before Linux implemented the epoll event-driven mechanism, concurrent server programs were generally built on I/O multiplexing with select or poll. In the era of big data, high concurrency, and clusters, the room left for select and poll keeps shrinking, and epoll has taken over the limelight.
This article introduces how epoll is implemented and, along the way, explains select and poll. Comparing their different implementation mechanisms shows why epoll can sustain high concurrency.
The select() and poll() I/O multiplexing models
Disadvantages of select:
- The number of file descriptors a single process can monitor has an upper limit, usually 1024. The limit can be changed, but because select scans the file descriptors by polling, performance gets worse as their number grows. (The Linux kernel headers contain the definition: #define __FD_SETSIZE 1024)
- Kernel/user-space memory copying: select has to copy a large number of handle data structures between user space and the kernel on every call, which is very expensive.
- select returns the entire set of handles, and the application must traverse all of it to find out which handles have events.
- select is level-triggered: if the application does not finish the I/O on a file descriptor that is ready, every subsequent select call notifies the process about that descriptor again.
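The points above can be seen in a minimal sketch of a select-based wait. The helper name wait_readable_select and the demo function are hypothetical, and two pipes stand in for client sockets so the sketch needs no network; it is an illustration under those assumptions, not a server implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: waits up to timeout_ms for any of fds[0..nfds-1] to
 * become readable, writes the readable ones to ready[], and returns how many
 * were written (or -1 on error). */
int wait_readable_select(const int *fds, int nfds, int *ready, int timeout_ms)
{
    fd_set readset;
    struct timeval tv;
    int maxfd = -1;
    int count = 0;

    /* select() overwrites the set, so it is rebuilt and copied into the
     * kernel again on every call. */
    FD_ZERO(&readset);
    for (int i = 0; i < nfds; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE); /* the fixed-size limit */
        FD_SET(fds[i], &readset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (select(maxfd + 1, &readset, NULL, NULL, &tv) < 0)
        return -1;

    /* select() only reports how many are ready; finding them is a full scan. */
    for (int i = 0; i < nfds; i++) {
        if (FD_ISSET(fds[i], &readset))
            ready[count++] = fds[i];
    }
    return count;
}

/* Usage sketch: only the second pipe has data, and it is never read.
 * Returns the ready count of the second call, or -1 on failure. */
int demo_select(void)
{
    int a[2], b[2], ready[2];

    if (pipe(a) < 0)
        return -1;
    if (pipe(b) < 0) {
        close(a[0]); close(a[1]);
        return -1;
    }
    int fds[2] = { a[0], b[0] };
    int ok = (write(b[1], "x", 1) == 1);

    /* Level-triggered: the unread byte is reported by both calls. */
    int first = wait_readable_select(fds, 2, ready, 100);
    int second = wait_readable_select(fds, 2, ready, 100);

    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return (ok && first == 1 && second == 1 && ready[0] == fds[1]) ? second : -1;
}
```

Every call pays for rebuilding the set, copying it, and scanning all nfds entries, even when only one descriptor is ready.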
Compared with select, poll describes the monitored descriptors in a caller-sized array of struct pollfd rather than a fixed-size bitmap, so there is no built-in limit on the number of descriptors it can monitor, but the other three shortcomings remain.
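A rough sketch of a poll-based wait makes the comparison concrete. The names wait_readable_poll and demo_poll are hypothetical, and pipes again stand in for sockets.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: waits up to timeout_ms for any of fds[0..nfds-1] to
 * become readable and writes the readable ones to ready[]. Returns the
 * number written, or -1 on error. */
int wait_readable_poll(const int *fds, int nfds, int *ready, int timeout_ms)
{
    assert(nfds > 0);

    /* The array is sized by the caller, so there is no FD_SETSIZE ceiling. */
    struct pollfd *pfds = calloc((size_t)nfds, sizeof *pfds);
    int count = 0;

    if (pfds == NULL)
        return -1;
    for (int i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
    }

    /* The whole array is still handed to the kernel on every call... */
    if (poll(pfds, (nfds_t)nfds, timeout_ms) < 0) {
        free(pfds);
        return -1;
    }

    /* ...and still scanned in full to find the ready entries. */
    for (int i = 0; i < nfds; i++) {
        if (pfds[i].revents & POLLIN)
            ready[count++] = pfds[i].fd;
    }
    free(pfds);
    return count;
}

/* Usage sketch: only the second pipe has data.
 * Returns the ready count, or -1 on failure. */
int demo_poll(void)
{
    int a[2], b[2], ready[2];

    if (pipe(a) < 0)
        return -1;
    if (pipe(b) < 0) {
        close(a[0]); close(a[1]);
        return -1;
    }
    int fds[2] = { a[0], b[0] };
    int ok = (write(b[1], "x", 1) == 1);
    int n = wait_readable_poll(fds, 2, ready, 100);

    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return (ok && n == 1 && ready[0] == fds[1]) ? n : -1;
}
```

The descriptor limit is gone, but the per-call copy and the full scan are unchanged.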
Take the select model as an example. Suppose our server needs to support 1 million concurrent connections. With __FD_SETSIZE at 1024, we would need about 1K processes at minimum to reach 1 million concurrent connections. On top of the cost of context switches between processes, the sheer volume of indiscriminate kernel/user-space memory copying and array scanning is more than the system can bear. A server built on the select model therefore struggles to reach even 100,000 concurrent connections.
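The process count is a simple ceiling division. The sketch below, with the hypothetical helper processes_needed, assumes every process can devote all 1024 descriptor slots to client connections, which is optimistic.

```c
#include <assert.h>

/* Ceiling division: how many processes are needed if each one can watch at
 * most per_process descriptors. */
long processes_needed(long connections, long per_process)
{
    assert(connections >= 0 && per_process > 0);
    return (connections + per_process - 1) / per_process;
}
```

For 1,000,000 connections and 1024 descriptors per process this gives 977, which the text rounds to 1K.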
This is where epoll takes the stage.
How the epoll I/O multiplexing model is implemented
Because epoll is implemented completely differently from select/poll, the disadvantages of select listed above no longer apply to epoll.
Imagine the following scenario: 1 million clients hold TCP connections to a single server process at the same time, and at any given moment only a few hundred to a few thousand of those connections are active (which is in fact the case in most scenarios). How can such high concurrency be achieved?
In the select/poll era, the server process had to hand all of these connections to the operating system on every call (copying the handle data structures from user space to kernel space) and let the kernel poll them for events. After polling, the kernel copied the handle data back to user space, and the server application scanned it to find the network events that had occurred. This process consumes a great deal of resources.
The design and implementation of epoll are completely different from select. epoll sets up a simple file system inside the Linux kernel (what data structure is a file system usually built on? A B+ tree) and splits the original select/poll call into three parts:
1) Call epoll_create() to create an epoll object (resources for this handle object are allocated in the epoll file system)
2) Call epoll_ctl to add the 1 million connected sockets to the epoll object
3) Call epoll_wait to collect the connections on which events have occurred
With this split, the scenario above only requires creating one epoll object when the process starts and then adding connections to it or removing them from it as needed. epoll_wait is also very efficient: a call to epoll_wait does not copy the handle data for all 1 million connections to the operating system, and the kernel does not need to traverse all the connections.
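The three steps can be sketched in user space as follows. This is a minimal illustration under stated assumptions: watch_readable and demo_epoll are hypothetical names, two pipes stand in for client sockets, and epoll_create1(0) is used as the modern equivalent of epoll_create().

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Step 2 helper (hypothetical name): register fd for read readiness. */
int watch_readable(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    assert(epfd >= 0 && fd >= 0);
    ev.events = EPOLLIN;   /* level-triggered unless EPOLLET is added */
    ev.data.fd = fd;       /* handed back unchanged by epoll_wait */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Only the second pipe receives data.
 * Returns the number of ready events, or -1 on any failure. */
int demo_epoll(void)
{
    int a[2], b[2], result = -1;
    struct epoll_event events[8];

    if (pipe(a) < 0)
        return -1;
    if (pipe(b) < 0) {
        close(a[0]); close(a[1]);
        return -1;
    }

    int epfd = epoll_create1(0);                /* step 1: once at startup */
    if (epfd >= 0 &&
        watch_readable(epfd, a[0]) == 0 &&      /* step 2: once per connection */
        watch_readable(epfd, b[0]) == 0 &&
        write(b[1], "x", 1) == 1) {
        /* Step 3: no descriptor list goes in; only ready events come out. */
        int n = epoll_wait(epfd, events, 8, 100);
        if (n == 1 && events[0].data.fd == b[0])
            result = n;
    }

    if (epfd >= 0)
        close(epfd);
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return result;
}
```

Registration happens once per connection, so the per-call work of epoll_wait depends on the number of ready events rather than on the number of registered descriptors.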
Let's look at how the Linux kernel actually implements epoll.
When a process calls epoll_create, the Linux kernel creates an eventpoll structure. Two members of this structure are closely tied to how epoll is used. The eventpoll structure is shown below:
struct eventpoll {
    ....
    /* Root of the red-black tree, which stores every event epoll has been asked to monitor */
    struct rb_root rbr;
    /* Doubly linked list holding the ready events that epoll_wait will return to the user */
    struct list_head rdlist;
    ....
};
Each epoll object has its own eventpoll structure, which stores the events added to the epoll object through epoll_ctl. These events are attached to the red-black tree, so an event that is added a second time can be detected efficiently (insertion into a red-black tree takes O(log n) time, where n is the number of nodes in the tree).
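The duplicate detection is visible from user space: adding a descriptor that is already registered fails with EEXIST. The sketch below assumes a pipe as the monitored descriptor; add_readable and errno_of_duplicate_add are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register fd with epfd for read readiness. */
int add_readable(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    assert(epfd >= 0 && fd >= 0);
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Adds the same descriptor twice and returns the errno of the second
 * attempt (0 if it unexpectedly succeeded, -1 on setup failure). */
int errno_of_duplicate_add(void)
{
    int p[2], result = -1;

    if (pipe(p) < 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd >= 0) {
        if (add_readable(epfd, p[0]) == 0) {
            /* Second add of the same descriptor: the kernel finds the
             * existing entry and rejects the request. */
            result = (add_readable(epfd, p[0]) == 0) ? 0 : errno;
        }
        close(epfd);
    }
    close(p[0]); close(p[1]);
    return result;
}
```

To change the events of a descriptor that is already registered, EPOLL_CTL_MOD is used instead of a second EPOLL_CTL_ADD.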
Every event added to epoll also establishes a callback relationship with the device (network card) driver: when the corresponding event occurs, the callback is invoked. In the kernel this callback is called ep_poll_callback, and it adds the event to the rdlist doubly linked list.
In epoll, an epitem structure is created for each event, as shown below:
struct epitem {
    struct rb_node rbn;          // red-black tree node
    struct list_head rdllink;    // doubly linked list node
    struct epoll_filefd ffd;     // event handle information
    struct eventpoll *ep;        // points to the eventpoll object this item belongs to
    struct epoll_event event;    // the event types of interest
};
When epoll_wait checks whether any event has occurred, it only needs to check whether the rdlist doubly linked list in the eventpoll object contains any epitem elements. If rdlist is not empty, the events are copied to user space and the number of events is returned to the user.
Schematic diagram of the epoll data structures
As the explanation above shows, epoll owes its efficiency to the red-black tree and doubly linked list data structures, combined with the callback mechanism.
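As a rough illustration of how these pieces fit together, here is a toy user-space model. It is not kernel code: all toy_ names are hypothetical, a flat array stands in for the red-black tree, a singly linked list stands in for rdlist, and the wait function simply drains the list without re-checking readiness.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITEMS 8

/* Toy stand-in for struct epitem: one monitored descriptor. */
struct toy_item {
    int fd;
    int queued;               /* already on the ready list? */
    struct toy_item *next;    /* ready-list link */
};

/* Toy stand-in for struct eventpoll. */
struct toy_eventpoll {
    struct toy_item items[MAX_ITEMS];   /* interest set (array, not a tree) */
    int nitems;
    struct toy_item *ready_head;        /* plays the role of rdlist */
    struct toy_item *ready_tail;
};

void toy_init(struct toy_eventpoll *ep)
{
    ep->nitems = 0;
    ep->ready_head = ep->ready_tail = NULL;
}

/* Plays the role of epoll_ctl(EPOLL_CTL_ADD): done once per descriptor. */
struct toy_item *toy_add(struct toy_eventpoll *ep, int fd)
{
    assert(ep->nitems < MAX_ITEMS);
    struct toy_item *it = &ep->items[ep->nitems++];
    it->fd = fd;
    it->queued = 0;
    it->next = NULL;
    return it;
}

/* Plays the role of ep_poll_callback: append the item to the ready list. */
void toy_callback(struct toy_eventpoll *ep, struct toy_item *it)
{
    if (it->queued)
        return;                /* already waiting to be reported */
    it->queued = 1;
    it->next = NULL;
    if (ep->ready_tail != NULL)
        ep->ready_tail->next = it;
    else
        ep->ready_head = it;
    ep->ready_tail = it;
}

/* Plays the role of epoll_wait: looks only at the ready list, so the cost
 * follows the number of ready items, not the size of the interest set. */
int toy_wait(struct toy_eventpoll *ep, int *out, int max)
{
    int n = 0;

    while (ep->ready_head != NULL && n < max) {
        struct toy_item *it = ep->ready_head;
        ep->ready_head = it->next;
        if (ep->ready_head == NULL)
            ep->ready_tail = NULL;
        it->queued = 0;
        it->next = NULL;
        out[n++] = it->fd;
    }
    return n;
}
```

With five registered descriptors and callbacks fired for two of them, toy_wait touches only those two entries; the other three are never examined.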