ELK Monitoring – Part 1 – Understanding
This is the first blog article in the ELK stack series. Here we will cover some basic theoretical knowledge about the ELK stack.
Let’s start hunting the ELK 🙂
Fun fact: the elk is a large species of deer!
What is ELK?
ELK – stands for Elasticsearch, Logstash and Kibana.
Elasticsearch (often shortened to Elastic) – Search and analytics engine that stores data as documents categorized under indexes
Logstash – Data processor that processes the source data and ingests it into an engine like Elasticsearch using pipelines
Kibana – Visualization tool to visualize the data using graphs, charts, etc.
There is an additional small guy in the ELK stack called Beats. Why do I call him small? Because he is a lightweight data shipper that can ship data into either Logstash or Elasticsearch.
So, the entire workflow for the ELK stack is: Beats ship the data from the source systems, Logstash processes and enriches it, Elasticsearch stores and indexes it, and Kibana visualizes it.
You can see that Elasticsearch is the heart of the ELK stack, storing the data as documents.
Let’s have a quick overview of each component.
ELK Family Components
1. Beats
– Open source server-side data shipper that has its own ecosystem. It is lightweight.
– Beats can either ship the logs to Logstash or send them directly to Elasticsearch.
Important note: Beats have to be installed on the host machine from which you need to ship the data!
Now wear a system administrator hat and think more technically!
What is the data we need to effectively monitor any application?
We can name a few –
- Log entries
- System metrics like CPU usage, memory, etc.
- Application HTTP performance data like latency, errors
- System availability monitoring data
The list can go on.
In the Beats ecosystem, you have different types of beats that serve different purposes.
a) Filebeat
This is specially used to ship log files from any application server. It can monitor a log file location on the running system, tail the log file continuously and ship the entries to either Logstash or Elasticsearch.
In Pega, we know that different log entries – application, alert, security – are stored in different log files. The most commonly used log file is PegaRULES.log.
So Filebeat can help us in shipping the Pega log entries to Elasticsearch.
Note: Since Elasticsearch saves the data as JSON documents, we need to facilitate storing the log files in JSON. Just a sneak peek – we will see more in detail in the coming posts.
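To make this concrete, below is a minimal sketch of what a filebeat.yml could look like for this use case – the log path, host and port are illustrative assumptions, not values from a real Pega installation:

```yaml
# Hypothetical Filebeat configuration sketch
filebeat.inputs:
  - type: log                          # tail plain log files
    enabled: true
    paths:
      - /opt/pega/logs/PegaRULES.log   # assumed Pega log location

# Ship to Logstash for enrichment...
output.logstash:
  hosts: ["localhost:5044"]
# ...or replace this with output.elasticsearch to send the data
# straight to Elasticsearch (only one output can be active at a time).
```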
We will also quickly skim through the other types of beats.
b) Metricbeat
Can be deployed on Windows, Linux and Mac systems to gather system information like CPU usage, memory, IO statistics, etc.
c) Packetbeat
Monitors the network traffic and helps to ensure a high level of performance and security
d) Auditbeat
Can be deployed on Linux machines. It works in close conjunction with the auditd Linux daemon, which writes audit records to disk. This beat can help in shipping that audit data seamlessly to the Elasticsearch server.
e) Heartbeat
This beat helps in checking whether a system is alive or not. It can ping the listed URLs and send the status of the server to Elasticsearch.
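As a quick illustration, a minimal Heartbeat monitor could look like the sketch below – the URL and interval are made-up examples:

```yaml
# Hypothetical Heartbeat monitor definition
heartbeat.monitors:
  - type: http                                  # probe over HTTP
    urls: ["http://myapp.example.com/health"]   # assumed health endpoint
    schedule: '@every 10s'                      # check every 10 seconds

output.elasticsearch:
  hosts: ["http://localhost:9200"]
```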
There are a few more as well, and the Beats ecosystem keeps on growing 🙂. No population control 😉
2. Logstash
Open source server-side data processor.
It uses pipelines that can receive input data from multiple sources, transform it and send it to any type of stash or data engine.
The main work of Logstash is parsing the incoming data, identifying the fields, enriching the data dynamically and sending it out to any stash.
The pipeline can use a variety of plugins to perform the stashing operation. There are three stages in the pipeline, supported by three plugin categories (a sketch of a complete pipeline follows the list):
a) Input plugins – Enable specific source of input events to be read by Logstash.
b) Filter plugins – Enable the intermediate processing of the event.
c) Output plugins – Sends the event to a particular destination.
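Putting the three stages together, a minimal pipeline for our Pega scenario could look like this sketch – the grok pattern and index name are assumptions for illustration, not the actual Pega log format:

```
# Hypothetical Logstash pipeline: Beats in, grok filter, Elasticsearch out
input {
  beats {
    port => 5044                         # matches the Filebeat output above
  }
}

filter {
  grok {
    # parse "<timestamp> <level> <message>" style lines (assumed format)
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:logmessage}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "pega-logs-%{+YYYY.MM.dd}"  # daily index (assumed naming)
  }
}
```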
We will see in more detail how to set up the Logstash pipeline in a separate post. For now, the understanding is that you use different sets of plugins to build a Logstash pipeline that can enrich the data and send it to the desired output 🙂
3. Elasticsearch
Open source search and analytics engine
Built on Apache Lucene, it uses simple REST APIs for communication and is mainly used for searching, analytics and monitoring
As we saw before, raw data from Beats or enriched data from Logstash can reach Elasticsearch. All this data gets INDEXED in Elasticsearch. The index is the boss in the Elastic world.
What is an Elasticsearch index?
It is a collection of documents that are related to each other.
Note: It is somewhat analogous to indexes in database tables.
All the incoming data gets indexed and stored as documents inside the index. If you think of a log file, each entry line in the log file is stored as a document in Elasticsearch. Each document contains a list of fields that provide more details.
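For instance, a single PegaRULES.log entry might end up stored as a document like this – the field names here are a hypothetical illustration:

```json
{
  "@timestamp": "2021-05-04T10:15:30.000Z",
  "level": "ERROR",
  "thread": "http-nio-8080-exec-1",
  "message": "The SOAP service failed with an error code 404"
}
```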
To be honest, Elasticsearch is a very vast topic, so I am only going to touch on a few more basics here. We will also learn more about Elasticsearch in a dedicated post, so stay cool!
We will talk about 5 main modules in Elasticsearch.
- Ingest node
- Analyzer
- Mapper
- Inverted Indexer
- Sharding
Ingest node
A node is a common term in the IT world, where it can refer to a single server in a cluster environment – something similar to our Pega nodes.
Let’s say an Elastic cluster is using 3 nodes; all nodes can act as ingest nodes by default. So, what does an ingest node do?
It pre-processes the documents before the indexing happens.
It uses a pipeline with processors to enrich the data.
On hearing enrich, do you recognise that Logstash also does the same thing?
Logstash, as a separate component, has a wide variety of features to offer, while an ingest node has some limitations and can be compared to performing a subset of the operations that Logstash does.
Note: Beats can directly feed the data to Elasticsearch, skipping Logstash. In such cases, you can use an ingest node to enrich the data.
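To get a feel for it, here is a hypothetical ingest pipeline with a single grok processor – the pipeline name and pattern are illustrative assumptions:

```
PUT _ingest/pipeline/pega-logs
{
  "description": "Parse Pega log lines before indexing (assumed format)",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:logmessage}"]
      }
    }
  ]
}
```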
On the Elastic blog, you will find more details on when to go for ingest nodes or Logstash:
https://www.elastic.co/blog/should-i-use-logstash-or-elasticsearch-ingest-nodes
Analyzer
It is responsible for executing the following operations.
Let’s say we are logging a message from Pega – “The SOAP service failed with an error code 404” – and ingesting this message into Elasticsearch. When you execute a search query against Elasticsearch such as “SOAP service fail” or “error code”, it should return the document (log entry). Technically, it should do a full-text search to get the right document.
Tokenizer
This is made possible by using a tokenizer. A tokenizer breaks the entire message into individual tokens (words):
The, SOAP, service, failed, with, an, error, code, 404. You see, the message resulted in 9 tokens.
It also uses token filters that can filter out a few common terms like – the, an, with, etc.
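You can try this out yourself with Elasticsearch’s _analyze API – a quick sketch using the standard tokenizer plus the built-in lowercase and stop token filters:

```
GET _analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "stop"],
  "text": "The SOAP service failed with an error code 404"
}
```

This returns the tokens soap, service, failed, error, code and 404 – the stop filter drops common words like “the”, “with” and “an”.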
Normalizer
Let’s say the developer executes the search query as “SOAP fail”. But if you look, the token value is “failed” and not “fail”. Normalizers help in normalizing the words. For example – failing, failed and fails can all be normalized to fail.
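In Elasticsearch, this kind of normalization is typically done with a stemming token filter. A minimal sketch using the built-in porter_stem filter:

```
GET _analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "porter_stem"],
  "text": "failing failed fails"
}
```

All three words come back as the single stem “fail”, so a search for “fail” can match them all.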
Elasticsearch comes with standard analyzers by default, but you can also customize an analyzer of your own.
Inverted Indexing
Elasticsearch uses a data structure called Inverted Index, which helps in full-text searches.
– It stores all the unique words across the documents in an inverted index format
– It identifies all the documents in which each word occurs
Let’s say there are two messages in Elasticsearch:
Document 1 – “The SOAP service failed”
Document 2 – “The SOAP request is malformed”
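After analysis (lowercasing and dropping stop words like “the” and “is”), the inverted index for these two documents would look something like this simplified sketch:

```
Term       | Documents
-----------|----------
soap       | 1, 2
service    | 1
failed     | 1
request    | 2
malformed  | 2
```

A search for “SOAP” can now jump straight to documents 1 and 2 instead of scanning every document.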
This inverted index is built based on the analyzer we use in Elasticsearch, and it is also used in ranking the documents by checking how frequently a word appears in each document.
This is the reason why Elasticsearch returns full-text search results within seconds.
Mapper
– It defines how a document and its fields are stored in elasticsearch
– You can format the fields in a variety of ways
– You can do dynamic mapping or explicit mapping
For example – if your message contains a field that stores a timestamp, then the mapper dynamically maps it to the data type Date.
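An explicit mapping, by contrast, is defined up front when the index is created. A hypothetical sketch for our log documents:

```
PUT pega-logs
{
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "level":      { "type": "keyword" },
      "message":    { "type": "text" }
    }
  }
}
```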
Shards
This is a slightly more complex term – it is mostly used in conjunction with the replication factor.
We know that data in Elasticsearch is organized into indexes. Each index is made up of one or more shards, and all the data is written into the shards. Elasticsearch distributes the shards across the nodes of the cluster.
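Shard and replica counts are configured per index at creation time – for example, a hypothetical index with 3 primary shards and 1 replica of each (settings and mappings can also be combined in a single create request):

```
PUT pega-logs
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
```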
Follow the Elastic blog to learn more about sharding and how it can impact performance:
https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
Let’s go to the final discussion topic
4. Kibana
– Open source visualization tool for the Elastic data
– It uses the REST API provided by Elasticsearch to query the data.
– It ships with a nice user interface to create visualizations and dashboards on the Elasticsearch data.
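Under the hood, every Kibana visualization boils down to Elasticsearch queries like this hypothetical full-text search against our example index:

```
GET pega-logs/_search
{
  "query": {
    "match": { "message": "SOAP service fail" }
  }
}
```

The match query runs the search text through the same analysis chain we saw earlier and looks the resulting terms up in the inverted index – which is why the results come back so quickly.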
We will see Kibana more in action in separate posts.
As a summary, in this blog article we saw the 4 components of the ELK stack:
a) Beats – lightweight shipper that can ship the data into either Logstash or Elasticsearch
b) Logstash – data processor that transforms the data and sends it to Elasticsearch
c) Elasticsearch – search and analytics engine used for searching, analysing and monitoring. It exposes a set of REST APIs to perform these operations
d) Kibana – visualization tool to inspect the Elastic data
Hope you learned something new from this article!