In this post we will describe what is needed to get started with managing your EAP 6 logs with ElasticSearch, Logstash and Kibana. There are several reasons why you would want to collect your logging output in a central place:
- Aggregate (output from multiple applications / hosts)
- Correlate events in different systems
- Analyze (more than grep)
- Backup
- Integrate into monitoring
- Gather statistics
A common solution that supports all of these use cases is provided by the ELK stack. It consists of ElasticSearch (ES), Logstash and Kibana. ElasticSearch provides persistence and analytics, Logstash provides the pipeline that brings your logs into ES, and Kibana provides a GUI for querying and dashboards.
As usual there are tradeoffs to handle, e.g. reliability vs. performance. If you must ensure that no log message is lost, you cannot use unreliable protocols like UDP to send your data. If, on the other hand, your logging should never hinder your production system, a blocking solution can be problematic. If you have hard requirements on reliability, e.g. when auditing financial transactions, Logstash cannot help, but you can still run it side by side with your existing solution.
EAP 6 Logging
Using EAP 6 you have several choices regarding the logging API. There is JBoss Logging, slf4j, log4j and plain Java logging. We usually use slf4j as it fits our needs and is kind of a standard. In all cases JBoss Logging is the backend.
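To make this concrete, here is a minimal sketch of application code logging through slf4j; the class and message are made up for illustration. JBoss Logging picks this output up and routes it to the handlers configured in the logging subsystem below.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical service class, only to illustrate logging through slf4j.
public class OrderService {

    private static final Logger LOG = LoggerFactory.getLogger(OrderService.class);

    public void placeOrder(String orderId) {
        // The slf4j call is handled by JBoss Logging, which forwards it
        // to the handlers configured in the EAP 6 logging subsystem.
        LOG.info("Order {} placed", orderId);
    }
}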
Configure JBoss Logging
The easiest way is to use the org.apache.log4j.net.SocketAppender. It can be configured like any other appender and will send log messages to a remote log4j server. The remote endpoint will be provided by Logstash. Another option is to let Logstash parse the server log file. In this case you will have to find some regular expressions that extract the relevant information from your log file (like timestamp, log level, exceptions, class/method name, …). This is not as easy as it may sound, which is why we stick to the SocketAppender, which gives us most of this for free.
Add the following log handler to your EAP 6 configuration. You will have to add Remotelog4j to your root logger as well. You are free to use the CLI or the web console instead (a CLI sketch follows the XML below).
<custom-handler name="Remotelog4j" class="org.apache.log4j.net.SocketAppender" module="org.apache.log4j">
    <level name="INFO"/>
    <properties>
        <property name="RemoteHost" value="localhost"/>
        <property name="Port" value="4712"/>
        <property name="BufferSize" value="1000"/>
        <property name="Blocking" value="false"/>
    </properties>
</custom-handler>
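If you prefer the CLI, a rough equivalent of the XML above could look like the following sketch (a standalone setup is assumed; adjust the profile for domain mode):

/subsystem=logging/custom-handler=Remotelog4j:add(class="org.apache.log4j.net.SocketAppender", module="org.apache.log4j", level="INFO", properties={"RemoteHost" => "localhost", "Port" => "4712", "BufferSize" => "1000", "Blocking" => "false"})
/subsystem=logging/root-logger=ROOT:add-handler(name="Remotelog4j")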
Logstash
Logstash can pull data from various sources (like the filesystem, message queues or databases) or it can provide network endpoints for various network protocols (like syslog, gelf or log4j sockets). In some cases, e.g. if you want to access log files on the local filesystem, the Logstash process will probably run on the same machine. For the other setups you have the option to run the Logstash process anywhere, as long as the necessary ports are reachable. In general it is preferable to have one Logstash instance for each source, because you can then change your configuration without affecting other systems.
As already mentioned, Logstash provides a pipeline that can process our logs and finally put them into some kind of sink (ES in our case). Logstash itself is quite versatile and can be configured to accept various inputs. The documentation shows the complete list. In general it is preferable to use inputs that already provide structured data. But that is not always possible, because not every system can provide structured data or it is not possible to change its configuration. In that case you will need to use a filter to extract that information. After a log message is received from one of these endpoints, it can be processed (manipulated, filtered, enriched) by one or more filters. A typical job for a filter is to extract information like timestamps and error levels from the log message. Another use case is to add information like the host name or geo information for an IP address. Afterwards the message can be pushed into one or more outputs. We will focus on ES, but various other options are possible as well.
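To give an idea of what a filter looks like, here is a small sketch. It is not needed for the SocketAppender setup below, where the log4j input already delivers structured events, and the grok pattern is only an assumption about a typical server.log line that would have to be adapted:

filter {
  grok {
    # assumed line layout: "2015-07-07 10:15:30,123 INFO [com.example.Foo] message"
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{JAVACLASS:logger}\] %{GREEDYDATA:msg}" }
  }
  date {
    # use the extracted timestamp as the event timestamp
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
  }
}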
Configure Logstash
Create the file ./log4j.conf with the following content:
input {
  log4j {
    mode => "server"
    host => "0.0.0.0"
    port => 4712
    type => "log4j"
  }
}
output {
  elasticsearch {
    host => "127.0.0.1"
    cluster => "elasticsearch-logstash"
    index => "logstash-%{+YYYY.MM.dd}"
  }
  #stdout { codec => rubydebug }
}
We define a log4j socket endpoint and send everything without filtering to our ES instance. Start Logstash with
./logstash-1.5.2/bin/logstash -f log4j.conf
ElasticSearch
ElasticSearch is a distributed search and analytics server. It is based on Apache Lucene and can be scaled horizontally. Its API is based on HTTP and JSON. Besides full-text search, it provides analytic capabilities, both in near real time. In our scenario we won’t talk to it directly, but will write through Logstash and read through Kibana.
Configure ElasticSearch
Edit elasticsearch-1.6.0/config/elasticsearch.yml and change the following options:
- cluster.name: elasticsearch-logstash
- discovery.zen.ping.multicast.enabled: false
This sets a name for our ES cluster, even though we will have only one node. The second option prevents ES from sending traffic into your local network to find other cluster nodes. To start ElasticSearch in the background use
./elasticsearch-1.6.0/bin/elasticsearch -d
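Once it is running, you can check the node (and, later on, the Logstash indices) via the HTTP API on the default port 9200:

curl http://localhost:9200/
curl http://localhost:9200/_cat/indices?v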
Kibana
Kibana provides a browser based graphical user interface for ElasticSearch. It can be used to formulate queries and to generate charts. Charts can be placed on a dashboard, providing you a realtime overview. Kibana comes with its own HTTP-Server and can be run anywhere, as long as it can access ElasticSearch through the HTTP interface.
Configure Kibana
To see if our log messages reach ElasticSearch, we need to download Kibana (I used version 4.1.1). Start it with
./kibana-4.1.1-darwin-x64/bin/kibana
Open http://localhost:5601/ in your browser. When you run this for the first time, you have to select an index pattern. This is because Logstash creates one index per day, following the pattern logstash-YYYY.MM.DD; the default pattern logstash-* matches all of them.
Now you can start exploring the logs. Click on Discover in the main menu. It makes sense to deploy an application and generate some messages beforehand. Here are some tips to get you started:
- Select time frame: In the upper part you can see a bar chart that shows the number of messages over time. Here you can zoom in with your mouse. If you click on the clock icon in the top right corner, you can select predefined time intervals. The chart always reflects your current search results.
- grep: A typical thing I do with log files is to grep for some string. This can be emulated by entering the string to match in the search field in the upper part. It will do a string search over all fields. Whether it finds only exact matches or also matches within strings depends on how the field is indexed. The message field, which contains the text part of the log message, is always indexed in a way that lets you match against parts of strings. (Some example queries follow after this list.)
- Select fields to display: On the left side you can choose which fields should be shown in the search results. If you click on one of the selected fields on the left, you can filter for messages that have a certain value.
- Count messages by severity over time: In the following screenshot you can see how to create a chart that shows the message count over time, grouped by severity.
- Count messages for a Java package over time: In the following screenshot you can see how to create a chart that shows the message count over time for loggers that match “org.hornetq*”.
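As mentioned in the grep tip above, the search field accepts Lucene query syntax. A few example queries of that kind (the field names priority and logger_name are what the log4j input typically produces; check the field list in Discover for the exact names in your setup):

priority:ERROR
logger_name:org.hornetq* AND priority:WARN
"Connection refused"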
This is really only a tiny part of what can be done with Kibana. For instance, I completely left out how you can build dashboards using charts and queries.
Conclusion
I hope this guide gives you a start with the ELK stack. It takes some time to get used to working with Kibana and ES, but once you start to feel comfortable, you won’t want to go back. Before you go into production, you should at least consider the following points:
- Use the async handler provided by EAP 6. It helps to reduce the impact of lags while writing to Logstash. In the end you will have to decide what should happen if Logstash cannot process the messages as fast as your application produces them. The async handler supports both blocking and discarding (a sketch follows after this list).
- In setups that involve several logging sources, it is a good idea to insert additional message queues in order to handle peaks. Common solutions are built with Redis or Kafka (see the Redis sketch below).
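As a sketch of the first point, an async handler wrapping the Remotelog4j handler could look roughly like this in the logging subsystem (queue length and overflow action are example values; the root logger would then reference AsyncRemote instead of Remotelog4j):

<async-handler name="AsyncRemote">
    <level name="INFO"/>
    <queue-length value="1024"/>
    <overflow-action value="block"/>
    <subhandlers>
        <handler name="Remotelog4j"/>
    </subhandlers>
</async-handler>

For the second point, a minimal Redis buffer could be sketched as two Logstash stages: the instance close to the application writes to Redis, and a second instance reads from Redis and writes to ElasticSearch. Host and key are assumptions:

# shipping instance: replace the elasticsearch output with
output {
  redis { host => "127.0.0.1" data_type => "list" key => "logstash" }
}

# indexing instance: read from Redis instead of the log4j socket
input {
  redis { host => "127.0.0.1" data_type => "list" key => "logstash" }
}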
