Any corporate computer network, even a small one, requires constant attention. No matter how carefully it is configured and how reliable the software installed on its servers and client computers is, you cannot rely on the system administrator's vigilance alone: automated, continuously running tools are needed to monitor the state of the network and to report possible problems in time.
Even accidental hardware or software failures can have very unpleasant consequences. A noticeable slowdown of network services is the mildest of them (although in the worst case even that can go unnoticed for long periods). It is far worse when critical services or applications stop working entirely and this goes undetected for a long time. The kinds of "critical" services vary widely (and, accordingly, require different monitoring methods). The operation of internal corporate applications and of external services important to customers may depend on the correct functioning of web servers and database servers; router failures and malfunctions can break communication between different parts of the company and its branches; internal mail servers and corporate messengers, automatic updates and backups, print servers – any of these elements can suffer from software and hardware failures.
And yet, unintentional hardware and software failures are in most cases one-time, easily correctable events. Far more harm can be caused by deliberate malicious actions from inside or outside the network. Attackers who have found a hole in the system's security can carry out many destructive actions – from simply disabling servers (which, as a rule, is quickly detected and fixed), to infecting the network with viruses (with unpredictable consequences) and stealing confidential data (with consequences that can be disastrous).
Almost all of the scenarios described above (and many similar ones) ultimately lead to serious material losses: disrupted workflows between employees, irretrievable loss of data, loss of customer trust, disclosure of confidential information, and so on. Since the possibility of equipment failure or malfunction cannot be excluded entirely, the solution is to detect problems at the earliest possible stage and to obtain the most detailed information about them. For this, as a rule, network monitoring and management software is used, which can both notify technical specialists promptly about a detected problem and accumulate statistics – available for detailed analysis – on the stability and other operating parameters of servers and services.
Below we consider the basic methods of monitoring network operation and network security.
Network status monitoring methods
The choice of monitoring methods and targets depends on many factors – the configuration of the network, the services running in it, the configuration of servers and the software installed on them, the capabilities of the monitoring software itself, and so on. At the most general level, we can distinguish such elements as:
– checking the availability of equipment;
– checking the status (operability) of services running on the network;
– detailed checks of non-critical but important parameters of network operation: performance, load, etc.;
– checking parameters specific to the services of a particular environment (the presence of certain values in database tables, the contents of log files).
The first level of any check is testing equipment availability (which may be lost because the equipment itself is shut down or because communication channels fail). At a minimum this means an availability check over the ICMP protocol (ping), and it is desirable to check not only that a response arrives, but also the round-trip time and the number of lost requests: abnormal readings of these values usually signal serious problems in the network configuration. Some such problems are easy to track down with route tracing (traceroute) – this too can be automated if "reference routes" are available.
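A basic availability check of this kind can be automated with a short script. The sketch below (a minimal illustration, not a production tool) runs the Linux `ping` utility and parses its summary for packet loss and average round-trip time; the loss and latency limits are arbitrary example values.

```python
import re
import subprocess

def parse_ping_stats(output: str):
    """Extract packet-loss percentage and average round-trip time (ms)
    from the summary lines of Linux `ping` output."""
    loss_m = re.search(r"(\d+(?:\.\d+)?)% packet loss", output)
    rtt_m = re.search(r"= [\d.]+/([\d.]+)/", output)  # min/avg/max/mdev
    loss = float(loss_m.group(1)) if loss_m else None
    avg_rtt = float(rtt_m.group(1)) if rtt_m else None
    return loss, avg_rtt

def check_host(host: str, count: int = 4,
               loss_limit: float = 25.0, rtt_limit: float = 200.0) -> bool:
    """Ping the host and report whether loss and latency stay within
    the given (illustrative) limits."""
    proc = subprocess.run(["ping", "-c", str(count), host],
                          capture_output=True, text=True)
    if proc.returncode != 0:
        return False  # host unreachable
    loss, avg_rtt = parse_ping_stats(proc.stdout)
    return (loss is not None and loss <= loss_limit
            and (avg_rtt or 0.0) <= rtt_limit)
```

Run periodically against each monitored host, such a check flags both total unavailability (non-zero exit code) and the "abnormal readings" mentioned above (high loss or latency).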
The next stage is checking the basic operability of critical services. As a rule, this means a TCP connection to the appropriate port of the server where the service should be running, and possibly the execution of a test request (for example, authentication on a mail server via SMTP or POP, or requesting a test page from a web server).
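The TCP-level check described above amounts to attempting a connection and, optionally, reading the service's greeting. A minimal stdlib sketch:

```python
import socket

def tcp_service_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def read_banner(host: str, port: int, timeout: float = 3.0) -> str:
    """Connect and read the service greeting line (for example, an SMTP
    server answers with a '220 ...' banner), as a simple test request."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.settimeout(timeout)
        return s.recv(256).decode(errors="replace").strip()
```

For example, `tcp_service_up("mail.example.com", 25)` checks that the SMTP port accepts connections, while `read_banner` additionally verifies that the service actually answers.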
In most cases it makes sense to check not only whether a service responds but also how quickly – which leads to the next most important task: load checking. Besides the response times of devices and services, different types of servers have other fundamentally important checks: memory and CPU utilization (web server, database server), free disk space (file server), and more specific ones – for example, printer status on a print server.
The methods of checking these values vary, but one of the main ones, almost always available, is polling over the SNMP protocol. In addition, you can use tools provided by the OS of the equipment being monitored: for example, modern server versions of Windows expose so-called performance counters at the system level, from which quite detailed information about the state of the computer can be read.
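Querying SNMP or Windows performance counters requires platform-specific libraries, but some of these metrics are also available directly through the OS. As one illustration, the free-disk-space check mentioned above can be done locally with Python's standard library (the 10% threshold is an arbitrary example):

```python
import shutil

def disk_free_percent(path: str) -> float:
    """Free disk space at `path`, as a percentage of total capacity."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100.0

def disk_ok(path: str, min_free_percent: float = 10.0) -> bool:
    """True while free space stays above the given threshold."""
    return disk_free_percent(path) >= min_free_percent
```

The same threshold pattern applies to CPU or memory readings obtained over SNMP or from performance counters.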
Finally, many environments require specific checks: database queries that verify the operation of a particular application; checks of report files or configuration values; watching for the presence of a certain file (for example, one created when the system crashes).
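Two of these environment-specific checks can be sketched as follows. The heartbeat table name and schema here are hypothetical – each application defines its own convention – and SQLite stands in for whatever database the application uses:

```python
import os
import sqlite3
import time

def crash_marker_present(path: str) -> bool:
    """Some applications leave a marker file (a core dump, a stale lock
    file) when they crash; its mere presence is an alert condition."""
    return os.path.exists(path)

def heartbeat_fresh(db_path: str, max_age_seconds: float = 300.0) -> bool:
    """Query a hypothetical heartbeat table – heartbeat(ts REAL) – and
    check that the application has written to it recently."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("SELECT MAX(ts) FROM heartbeat").fetchone()
    finally:
        conn.close()
    return row[0] is not None and time.time() - row[0] <= max_age_seconds
```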
Network security monitoring
The security of a computer network (in the sense of its protection from malicious actions) is ensured by two methods: audit and control. A security audit checks network settings (open ports, accessibility of "internal" applications from outside, reliability of user authentication); audit methods and tools are beyond the scope of this article.
The essence of security control is identifying abnormal events in the operation of the network. It is assumed that the basic means of ensuring and controlling security (authentication, filtering requests by client address, overload protection, etc.) are built into all server software. However, firstly, this assumption cannot always be trusted; secondly, such protection is not always sufficient. For full confidence in the security of the network, in most cases additional, external tools must be used. As a rule, the following parameters are checked:
– load on server software and hardware: abnormally high CPU usage, a sudden drop in free disk space, or a sharp increase in network traffic are often signs of a network attack;
– logs and error reports: occasional error messages in server log files or in the server OS event log are normal, but accumulating and analyzing such messages helps to identify unexpectedly frequent or systematic failures;
– the state of potentially vulnerable objects – for example, those whose security is difficult to control directly (unreliable third-party software, a changed or unverified network configuration): unwanted changes in access rights to a resource, or in file contents, may indicate an intruder's penetration.
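Detecting "unwanted changes in file contents" is commonly done by recording a cryptographic fingerprint of each watched file and comparing against it later. A minimal sketch of that idea:

```python
import hashlib
from pathlib import Path

def fingerprint(path: str) -> str:
    """SHA-256 digest of a file's contents; any change alters the digest."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def detect_changes(baseline: dict) -> list:
    """Compare watched files against a previously recorded baseline of
    path -> digest; return the paths that changed or disappeared."""
    changed = []
    for path, digest in baseline.items():
        try:
            if fingerprint(path) != digest:
                changed.append(path)
        except FileNotFoundError:
            changed.append(path)
    return changed
```

The baseline is built once, when the configuration is known to be good, and `detect_changes` is then run on a schedule; a non-empty result is an anomaly worth alerting on.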
In many cases, anomalies noticed during monitoring require an immediate response from technical specialists, so a network monitoring tool should offer ample alerting options (messages over the local network, by e-mail, via instant messaging). Changes in other monitored parameters do not require a reaction, but should be recorded for subsequent analysis. Often both are needed – continuous collection of statistics plus an immediate reaction to "outliers": for example, record every case of CPU utilization above 80%, and immediately notify specialists when it exceeds 95%. A full-fledged monitoring tool should make it possible to set up all these (and more complex) scenarios.
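The two-level scenario from the example above (record above 80%, alert above 95%) can be expressed as a small monitor object; the `notify` callback stands in for whatever delivery channel the monitoring tool provides (e-mail, messenger, etc.):

```python
class ThresholdMonitor:
    """Accumulate out-of-range samples for later analysis and fire an
    alert callback when the critical level is crossed. Threshold values
    follow the example in the text: record above 80%, alert above 95%."""

    def __init__(self, notify, record_at: float = 80.0, alert_at: float = 95.0):
        self.notify = notify          # called with a message on critical load
        self.record_at = record_at
        self.alert_at = alert_at
        self.history = []             # (value, level) pairs for statistics

    def observe(self, value: float) -> str:
        if value >= self.alert_at:
            level = "alert"
            self.notify(f"CPU utilization critical: {value:.0f}%")
        elif value >= self.record_at:
            level = "record"
        else:
            level = "ok"
        if level != "ok":
            self.history.append((value, level))
        return level
```

Each sample is classified on arrival: normal readings are discarded, elevated ones accumulate in `history` for analysis, and critical ones additionally trigger an immediate notification.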