1. Data Choices

The Data Choices module collects and publishes anonymous usage statistics to https://stats.opennms.org.

When a user with the Admin role logs into the system for the first time, they are asked whether they want to opt in to publishing these statistics. Statistics are only published once an Administrator has opted in.

Usage statistics can later be disabled by accessing the 'Data Choices' link in the 'Admin' menu.

When enabled, the following anonymous statistics will be collected and published on system startup and every 24 hours after:

  • System ID (a randomly generated UUID)

  • OpenNMS Horizon Release

  • OpenNMS Horizon Version

  • OS Architecture

  • OS Name

  • OS Version

  • Number of Alarms in the alarms table

  • Number of Events in the events table

  • Number of IP Interfaces in the ipinterface table

  • Number of Nodes in the node table

  • Number of Nodes, grouped by System OID

2. User Management

Users are entities with login accounts in the OpenNMS Horizon system. Ideally, each user corresponds to a person. An OpenNMS Horizon User represents an actor which may be granted permissions in the system by associating Security Roles. By default, OpenNMS Horizon stores User information and credentials in local, embedded, file-based storage. Credentials and user details, e.g. contact information, descriptions or Security Roles, can be managed through the Admin Section in the Web User Interface.

Besides local Users, an external LDAP service and SSO can be configured, but they are out of scope for this section. The following paragraphs describe how to manage the embedded Users and Security Roles in OpenNMS Horizon.

2.1. Users

Managing Users is done through the Web User Interface and requires logging in as a User with administrative permissions. By default, the admin user is used to initially create and modify Users. The User, Password and other details are persisted in the users.xml file. It is not required to restart OpenNMS Horizon when User attributes are changed.

To delegate administrative tasks to a User, assign the Security Role named ROLE_ADMIN.

Don’t delete the admin and rtc users. The rtc user is used for the communication of the Real-Time Console on the start page to calculate the node and service availability.
Change the default admin password to a secure password.
How to set a new password for any user
  1. Log in as a User with administrative permissions

  2. Choose Configure OpenNMS from the user-specific main navigation, which is named after your login user name

  3. Choose Configure Users, Groups and On-Call roles and select Configure Users

  4. Click the Modify icon next to an existing User and select Reset Password

  5. Set a new Password, Confirm Password and click OK

  6. Click Finish to persist and apply the changes

How users can change their own password
  1. Log in with your user name and old password

  2. Choose Change Password from the user-specific main navigation, which is named after your login user name

  3. Select Change Password

  4. Identify yourself with the old password, then set and confirm the new password

  5. Click Submit

  6. Log out and log in with your new password

How to create or modify a user
  1. Log in as a User with administrative permissions

  2. Choose Configure OpenNMS from the user-specific main navigation, which is named after your login user name

  3. Choose Configure Users, Groups and On-Call roles and select Configure Users

  4. Use Add new user and type in a login name as User ID and a Password with confirmation, or click Modify next to an existing User

  5. Optional: Fill in detailed User Information to provide more context information around the new user in the system

  6. Optional: Assign Security Roles to give or remove permissions in the system

  7. Optional: Provide Notification Information, which is used in Notification targets to send messages to the User

  8. Optional: Set a schedule when a User should receive Notifications

  9. Click Finish to persist and apply the changes

By default, a new User is assigned a Security Role similar to ROLE_USER. Acknowledging and working with Alarms and Notifications is possible. The Configure OpenNMS administration menu is not available.
How to delete an existing user
  1. Log in as a User with administrative permissions

  2. Choose Configure OpenNMS from the user-specific main navigation, which is named after your login user name

  3. Choose Configure Users, Groups and On-Call roles and select Configure Users

  4. Use the trash bin icon next to the User to delete

  5. Confirm the delete request with OK

2.2. Security Roles

A Security Role is a set of permissions that can be assigned to a User. Security Roles regulate access to the Web User Interface and the ReST API to exchange monitoring and inventory information. In case of a distributed installation, the Minion or Remote Poller instances interact with OpenNMS Horizon and require specific permissions which are defined in the Security Roles ROLE_MINION and ROLE_REMOTING. The following Security Roles are available:

Table 1. Functions and existing system roles in OpenNMS Horizon
Security Role Name | Description
anyone | In case the opennms-webapp-remoting package is installed, any user can download the Java Webstart installation package for the remote poller from http://opennms.server:8980/opennms-remoting/webstart/app.jnlp.
ROLE_ANONYMOUS | Allows HTTP OPTIONS requests to show the allowed HTTP methods on a ReST resource, and access to the login and logout page of the Web User Interface.
ROLE_ADMIN | Permissions to create, read, update and delete in the Web User Interface and the ReST API.
ROLE_ASSET_EDITOR | Permissions to just update the asset records from nodes.
ROLE_DASHBOARD | Allows users to access only the Dashboard.
ROLE_DELEGATE | Allows actions (such as acknowledging an alarm) to be performed on behalf of another user.
ROLE_JMX | Allows retrieving JMX metrics but does not allow executing MBeans of the OpenNMS Horizon JVM, even if they just return simple values.
ROLE_MINION | Minimal amount of permissions required for a Minion to operate.
ROLE_MOBILE | Allows the user to use the OpenNMS COMPASS mobile application to acknowledge Alarms and Notifications via the ReST API.
ROLE_PROVISION | Allows the user to use the Provisioning System and configure SNMP in OpenNMS Horizon to access management information from devices.
ROLE_READONLY | Limited to just reading information in the Web User Interface, with no possibility to change Alarm states or Notifications.
ROLE_REMOTING | Permissions to allow access from a Remote Poller instance to exchange monitoring information.
ROLE_REST | Allows users to interact with the whole ReST API of OpenNMS Horizon.
ROLE_RTC | Exchange information with the OpenNMS Horizon Real-Time Console for availability calculations.
ROLE_USER | Default permissions of a newly created user to interact with the Web User Interface, which allow escalating and acknowledging Alarms and Notifications.

How to manage Security Roles for Users:
  1. Log in as a User with administrative permissions

  2. Choose Configure OpenNMS from the user-specific main navigation, which is named after your login user name

  3. Choose Configure Users, Groups and On-Call roles and select Configure Users

  4. Modify an existing User by clicking the modify icon next to the User

  5. Select the Role from Available Roles in the Security Roles section

  6. Use Add and Remove to assign or remove the Security Role from the User

  7. Click Finish to persist and apply the changes

  8. Log out and log in to apply the new Security Role settings

How to add custom roles
  • Create a file called $OPENNMS_HOME/etc/security-roles.properties.

  • Add a property called roles, and for its value, a comma separated list of the custom roles, for example:

roles=operator,stage
  • After following the procedure to associate the security roles with users, the new custom roles will be available as shown in the following image:

custom roles
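
As a quick sanity check, the roles property can be read back with a few lines of Python. This is a minimal sketch and not part of OpenNMS Horizon: the function name parse_custom_roles is made up for illustration, and the code assumes the simple key=value layout shown in the example.

```python
def parse_custom_roles(properties_text):
    """Return the custom role names listed in the 'roles' property.

    Hypothetical helper; assumes one key=value pair per line.
    """
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines in the properties file.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "roles":
            # Split the comma separated list and drop empty entries.
            return [role.strip() for role in value.split(",") if role.strip()]
    return []


print(parse_custom_roles("roles=operator,stage"))  # ['operator', 'stage']
```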

2.3. Web UI Pre-Authentication

It is possible to configure OpenNMS Horizon to run behind a proxy that provides authentication, and then pass the pre-authenticated user to the OpenNMS Horizon webapp using a header.

The pre-authentication configuration is defined in $OPENNMS_HOME/jetty-webapps/opennms/WEB-INF/spring-security.d/header-preauth.xml. This file is automatically included in the Spring Security context, but is not enabled by default.

DO NOT configure OpenNMS Horizon in this manner unless you are certain the web UI is only accessible to the proxy and not to end-users. Otherwise, malicious attackers can craft queries that include the pre-authentication header and get full control of the web UI and ReST APIs.

2.3.1. Enabling Pre-Authentication

Edit the header-preauth.xml file, and set the enabled property:

<beans:property name="enabled" value="true" />

2.3.2. Configuring Pre-Authentication

There are a number of other properties that can be set to change the behavior of the pre-authentication plugin.

Property | Description | Default
enabled | Whether the pre-authentication plugin is active. | false
failOnError | If true, disallow login if the header is not set or the user does not exist. If false, fall through to other mechanisms (basic auth, form login, etc.) | false
userHeader | The HTTP header that will specify the user to authenticate as. | X-Remote-User
credentialsHeader | A comma-separated list of additional credentials (roles) the user should have. |
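
The interaction between userHeader and failOnError can be modelled in a few lines. The following Python sketch is only an illustration of the behaviour described in the table, assuming the plugin is enabled; it is not the plugin's code, and the function name preauth_decision and the known_users set are hypothetical.

```python
def preauth_decision(headers, known_users, fail_on_error=False,
                     user_header="X-Remote-User"):
    """Model the outcome of pre-authentication for one request.

    Returns "authenticated", "rejected" or "fall-through".
    Hypothetical helper; known_users stands in for the user store.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    user = normalised.get(user_header.lower())
    if user and user in known_users:
        return "authenticated"
    # Header missing or user unknown: failOnError decides what happens next.
    return "rejected" if fail_on_error else "fall-through"


users = {"admin", "drv4doe"}
print(preauth_decision({"X-Remote-User": "admin"}, users))
print(preauth_decision({}, users, fail_on_error=True))
```

With failOnError left at false, a request without the header simply continues to the other login mechanisms, which is why the proxy must be the only path to the web UI.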

3. Administrative Webinterface

3.1. Surveillance View

When networks are larger and contain devices of different priority, it becomes interesting to show at a glance how the "whole system" is working. The surveillance view aims to do that. By using categories, you can define a matrix that aggregates monitoring results. Imagine you have 10 servers with 10 internet connections and some 5 PCs with DSL lines:

 | Servers | Internet Connections
Super important | 1 of 10 | 0 of 10
Slightly important | 0 of 10 | 0 of 10
Vanity | 4 of 10 | 0 of 10

The whole idea is to give somebody a hint, at a glance, about where the trouble is. The matrix-type display allows significantly higher aggregation than a simple list. In addition, the surveillance view shows nodes rather than services, an important detail when you look at categories. At a glance, you want to know how many of your servers have an issue rather than how many services in this category have an issue.

01 surveillance view
Figure 1. Example of a configured Surveillance View

The visual indication for outages in the surveillance view cells is defined as follows:

  • No services down: green as normal

  • One (1) service down: yellow as warning

  • More than one (1) service down: red as critical
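
The colour rule above can be written down as a small function. This is a sketch for illustration only; the function name cell_status is hypothetical and the code does not reflect how OpenNMS Horizon implements the view.

```python
def cell_status(services_down):
    """Map the number of services down in a cell to (colour, meaning)."""
    if services_down < 0:
        raise ValueError("services_down must not be negative")
    if services_down == 0:
        return ("green", "normal")
    if services_down == 1:
        return ("yellow", "warning")
    # Two or more services down.
    return ("red", "critical")


print(cell_status(0))  # ('green', 'normal')
print(cell_status(3))  # ('red', 'critical')
```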

This Surveillance View model also builds the foundation of the Dashboard View.

3.1.1. Default Surveillance View Configuration

Surveillance Views are defined in the surveillance-views.xml file. This file resides in the OpenNMS Horizon etc directory.

This file can be modified in a text editor and is reread every time the Surveillance View page is loaded. Thus, changes to this file do not require OpenNMS Horizon to be restarted.

The default configuration looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<surveillance-view-configuration
  xmlns:this="http://www.opennms.org/xsd/config/surveillance-views"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.opennms.org/xsd/config/surveillance-views http://www.opennms.org/xsd/config/surveillance-views.xsd"
  default-view="default" >
  <views >
    <view name="default" refresh-seconds="300" >
      <rows>
        <row-def label="Routers" >
          <category name="Routers"/>
        </row-def>
        <row-def label="Switches" >
          <category name="Switches" />
        </row-def>
        <row-def label="Servers" >
          <category name="Servers" />
        </row-def>
      </rows>
      <columns>
        <column-def label="PROD" >
          <category name="Production" />
        </column-def>
        <column-def label="TEST" >
          <category name="Test" />
        </column-def>
        <column-def label="DEV" >
          <category name="Development" />
        </column-def>
      </columns>
    </view>
  </views>
</surveillance-view-configuration>
Please note that the old report-category attribute is deprecated and is no longer supported.
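
To check which rows and columns a configuration defines, the file can be read with Python's standard xml.etree.ElementTree module. The sketch below is an illustration, not an OpenNMS Horizon tool: the function name summarize_views is hypothetical, and the embedded sample is a shortened copy of the default configuration with the namespace attributes left out for brevity.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the default configuration (assumption:
# namespace attributes omitted; elements themselves are unprefixed).
SAMPLE = """
<surveillance-view-configuration default-view="default">
  <views>
    <view name="default" refresh-seconds="300">
      <rows>
        <row-def label="Routers"><category name="Routers"/></row-def>
        <row-def label="Switches"><category name="Switches"/></row-def>
        <row-def label="Servers"><category name="Servers"/></row-def>
      </rows>
      <columns>
        <column-def label="PROD"><category name="Production"/></column-def>
        <column-def label="TEST"><category name="Test"/></column-def>
        <column-def label="DEV"><category name="Development"/></column-def>
      </columns>
    </view>
  </views>
</surveillance-view-configuration>
"""


def summarize_views(xml_text):
    """Return (default view name, {view name: row and column labels})."""
    root = ET.fromstring(xml_text.strip())
    summary = {}
    for view in root.iter("view"):
        rows = [r.get("label") for r in view.findall("./rows/row-def")]
        cols = [c.get("label") for c in view.findall("./columns/column-def")]
        summary[view.get("name")] = {"rows": rows, "columns": cols}
    return root.get("default-view"), summary


default_view, views = summarize_views(SAMPLE)
print(default_view, views[default_view])
```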

3.1.2. Configuring Surveillance Views

The Surveillance View configuration can also be modified using the Surveillance View Configurations editor on the OpenNMS Horizon Admin page.

02 surveillance view config ui
Figure 2. The Surveillance View Configurations UI

This page gives an overview of the configured Surveillance Views and allows the user to edit, remove or even preview the defined Surveillance View. Furthermore, the default Surveillance View can be selected using the checkbox in the DEFAULT column.

When editing a Surveillance View, the user has to define the view’s title and the time in seconds between successive refreshes. The defined rows are listed on the left side of this dialog and the defined columns on the right side. Besides adding new entries, a user can modify or delete existing entries. Furthermore, the position of an entry can be modified using the up/down buttons.

03 surveillance view config ui edit
Figure 3. Editing a Surveillance View

Editing a row or column definition requires choosing a unique label for the entry and at least one OpenNMS Horizon category. When finished, you can hit the Save button to persist your modified configuration or Cancel to close this dialog.

3.1.3. Categorizing Nodes

To categorize nodes in the Surveillance View, choose a node and click Edit beside Surveillance Category Memberships. Based on your Surveillance View, choose two categories that represent a column and a row, for example, Servers and Test, then click Add.

3.1.4. Creating Views for Users and Groups

You can use user and group names for Surveillance Views. When the Surveillance View page is invoked, the following criteria select the proper Surveillance View to be displayed. The first matching item wins:

  1. Surveillance View name equal to the user name they used when logging into OpenNMS Horizon.

  2. Surveillance View name equal to the user’s assigned OpenNMS Horizon group name

  3. Surveillance View name equal to the default-view attribute in the surveillance-views.xml configuration file.
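
The lookup order above can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the function name select_view is hypothetical, and a user with several groups is assumed to be matched in the order the groups are given.

```python
def select_view(user_name, group_names, view_names, default_view):
    """Pick the Surveillance View name for a user; first match wins."""
    # 1. A view named like the login user name.
    if user_name in view_names:
        return user_name
    # 2. A view named like one of the user's groups (order is an assumption).
    for group in group_names:
        if group in view_names:
            return group
    # 3. Fall back to the default-view attribute.
    return default_view


views = {"default", "drv4doe", "network-ops"}
print(select_view("drv4doe", ["network-ops"], views, "default"))  # drv4doe
print(select_view("alice", ["network-ops"], views, "default"))    # network-ops
print(select_view("bob", [], views, "default"))                   # default
```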

3.2. Dashboard

In Network Operation Centers (NOC), an overview of issues in the network is important and is often provided by Dashboards. Large networks have people (Operators) with different responsibilities, and the Dashboard should show only information for a given monitoring context. Network or Server operators need to customize or filter the information on the Dashboard. A Dashboard as an at-a-glance overview is also often used as an entry point for more detailed diagnosis through the information provided by the monitoring system. The Surveillance View reduces the visible information by selecting rows, columns and cells to quickly limit the amount of information to navigate through.

3.2.1. Components

The Dashboard is built with five components:

  • Surveillance View: Models a monitoring context for the Dashboard.

  • Alarms: Shows unacknowledged Alarms which should be escalated by an Operator.

  • Notifications: Shows outstanding and unacknowledged notifications sent to Engineers.

  • Node Status: Shows all ongoing network Outages.

  • Resource Graph Viewer: Shows performance time series reports for performance diagnosis.

The following screenshot shows a configured Dashboard and which information is displayed in the components.

01 dashboard overall
Figure 4. Dashboard with configured surveillance view and current outage

The following sections describe the information shown in each component. All other components display information based on the Surveillance View.

Surveillance View

The Surveillance View has multiple functions.

  • Models the monitoring context and shows service and node Outages in a compact matrix view.

  • Limits the amount of information in the Dashboard by selecting rows, columns and cells.

You can select columns, rows, single cells and of course all entries in a Surveillance View. Please refer to the Surveillance View Section for details on how to configure Surveillance Views.

02 dashboard surveillance view
Figure 5. The Surveillance View forms the basis for the Dashboard page.
Alarms

The Alarms component gives an overview of all unacknowledged Alarms with a severity higher than Normal(1). Acknowledged Alarms will be removed from the responsibility of the Operator. The following information is shown:

03 dashboard alarms
Figure 6. Information displayed in the Alarms component
  1. Node: Node label of the node the Alarm is associated with

  2. Severity: Severity of the Alarm

  3. UEI: Shows the UEI of the Alarm

  4. Count: Number of Alarms deduplicated by the reduction key of the Alarm

  5. Last Time: Time for the last occurrence of the Alarm

  6. Log Msg: The log message from the Event which is the source for this Alarm. It is specified in the event configuration file in <logmsg />

The Alarms component shows the most recent Alarms and allows the user to scroll through the last 100 Alarms.
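
The selection rule of the Alarms component (unacknowledged, severity above Normal, most recent first, at most 100 entries) can be sketched in Python. This is an illustration only: the dictionary keys and the function name dashboard_alarms are hypothetical, and the severity names are taken from the severity list in the Alarm Details table of this guide.

```python
from datetime import datetime

# Severity names above Normal, as listed for alarms in this guide.
ABOVE_NORMAL = {"Warning", "Minor", "Major", "Critical"}


def dashboard_alarms(alarms, limit=100):
    """Return the alarms the component would list, newest first."""
    visible = [
        alarm for alarm in alarms
        if not alarm["acknowledged"] and alarm["severity"] in ABOVE_NORMAL
    ]
    # Most recent occurrence first, then cut off at the limit.
    visible.sort(key=lambda alarm: alarm["last_time"], reverse=True)
    return visible[:limit]


sample = [
    {"node": "router-1", "severity": "Major", "acknowledged": False,
     "last_time": datetime(2024, 1, 1, 10, 0)},
    {"node": "server-1", "severity": "Normal", "acknowledged": False,
     "last_time": datetime(2024, 1, 1, 11, 0)},
    {"node": "switch-1", "severity": "Critical", "acknowledged": True,
     "last_time": datetime(2024, 1, 1, 12, 0)},
    {"node": "server-2", "severity": "Warning", "acknowledged": False,
     "last_time": datetime(2024, 1, 1, 13, 0)},
]
print([alarm["node"] for alarm in dashboard_alarms(sample)])
```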

Notifications

Notifications are used to inform people on a duty schedule and to prompt immediate action to fix or reconfigure systems. In OpenNMS Horizon, notifications can be acknowledged, which shows who is working on a specific issue. The Dashboard should show outstanding notifications in the NOC to provide an overview and allow for intervention.

04 dashboard notifications
Figure 7. Information displayed in the Notifications component
  1. Node: Label of the monitored node the notification is associated with

  2. Service: Name of the service the notification is associated with

  3. Message: Message of the notification

  4. Sent Time: Time when the notification was sent

  5. Responder: User name who acknowledged the notification

  6. Response Time: Time when the user acknowledged the notification

The Notifications component shows the most recent unacknowledged notifications and allows the user to scroll through the last 100 Notifications.

Node Status

An acknowledged Alarm doesn’t necessarily mean the outage is solved. To give an overview of ongoing Outages in the network, the Dashboard shows an outage list in the Node Status component.

05 dashboard outages
Figure 8. Information displayed in the Node Status component
  1. Node: Label of the monitored node with ongoing outages.

  2. Current Outages: Number of services on the node with outages and total number of monitored services, e.g. with the natural meaning of "3 of 3 services are affected".

  3. 24 Hour Availability: Availability of all services provided by the node, calculated over the last 24 hours.

Resource Graph Viewer

To give a quick entry point for diagnosing performance issues, the Resource Graph Viewer allows navigation to time series data reports, which are filtered in the context of the Surveillance View.

06 dashboard resource graphs
Figure 9. Show time series based performance with the Resource Graph Viewer

It allows sequential navigation through resource graphs provided by nodes, filtered by the Surveillance View context and selection, and shows one graph report at a time.

3.2.2. Advanced configuration

The Surveillance View component allows modeling multiple views for different monitoring contexts. This makes it possible to create special views, for example for network operators or server operators. The Dashboard shows only one configured Surveillance View. To let different users use a Surveillance View that fits their requirements, a logged-in user can be mapped to a given Surveillance View used in the Dashboard.

The nodes selected from the Surveillance View also respect the User Restriction Filter. If a group of users should see just a subset of nodes, the Surveillance View will filter out nodes which are not related to the assigned user group.

The Dashboard is designed to focus, and therefore also restrict, a user’s view to devices of their interest. To do this, a new role was added that can be assigned to a user that restricts them to viewing only the Dashboard if that is intended.

Using the Dashboard role

The following example illustrates how this Dashboard role can be used. For instance, the user drv4doe is assigned the dashboard role. When logging in as drv4doe, the user is taken directly to the Dashboard page and is presented with a custom Dashboard based on the drv4doe Surveillance View definition.

Step 1: Create a user

The following example assigns a Dashboard to the user "drv4doe" (a router and switch jockey) and restricts the user from navigating to any other link in the OpenNMS Horizon WebUI.

07 dashboard add user
Figure 10. Creating the user drv4doe using the OpenNMS Horizon WebUI
Step 2: Change Security Roles

Now, add the ROLE_DASHBOARD role to the user through the WebUI or by manually editing the users.xml file in the /opt/opennms/etc directory for the user drv4doe.

08 dashboard user roles
Figure 11. Adding dashboard role to the user drv4doe using the OpenNMS Horizon WebUI
<user>
    <user-id>drv4doe</user-id>
    <full-name>Dashboard User</full-name>
    <password salt="true">6FOip6hgZsUwDhdzdPUVV5UhkSxdbZTlq8M5LXWG5586eDPa7BFizirjXEfV/srK</password>
    <role>ROLE_DASHBOARD</role>
</user>
Step 3: Define Surveillance View

Edit the $OPENNMS_HOME/etc/surveillance-views.xml file to add a definition for the user drv4doe, which you created in step 1.

<?xml version="1.0" encoding="UTF-8"?>
<surveillance-view-configuration
  xmlns:this="http://www.opennms.org/xsd/config/surveillance-views"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.opennms.org/xsd/config/surveillance-views http://www.opennms.org/xsd/config/surveillance-views.xsd"
  default-view="default" >
  <views >
    <view name="drv4doe" refresh-seconds="300" >
      <rows>
        <row-def label="Servers" >
          <category name="Servers"/>
        </row-def>
      </rows>
      <columns>
        <column-def label="PROD" >
          <category name="Production" />
        </column-def>
        <column-def label="TEST" >
          <category name="Test" />
        </column-def>
      </columns>
    </view>
   <!-- default view here -->
    <view name="default" refresh-seconds="300" >
      <rows>
        <row-def label="Routers" >
          <category name="Routers"/>
        </row-def>
        <row-def label="Switches" >
          <category name="Switches" />
        </row-def>
        <row-def label="Servers" >
          <category name="Servers" />
        </row-def>
      </rows>
      <columns>
        <column-def label="PROD" >
          <category name="Production" />
        </column-def>
        <column-def label="TEST" >
          <category name="Test" />
        </column-def>
        <column-def label="DEV" >
          <category name="Development" />
        </column-def>
      </columns>
    </view>
  </views>
</surveillance-view-configuration>

This configuration and proper assignment of node categories will produce a default Dashboard for all users other than drv4doe.

You can hide the upper navigation on any page by adding ?quiet=true to the end of the OpenNMS Horizon URL. This is very handy when using the dashboard on a large monitor or TV screen for office-wide viewing.
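
When such URLs are generated by a script, for example for a wall display, the parameter can be appended with Python's urllib.parse. This is a small sketch; the function name with_quiet and the host name opennms.example.org are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_quiet(url):
    """Return the URL with quiet=true set exactly once in the query."""
    parts = urlsplit(url)
    # Keep existing parameters, but drop any earlier quiet=... value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "quiet"]
    query.append(("quiet", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical host name, used only for this example.
print(with_quiet("http://opennms.example.org:8980/opennms/dashboard.jsp"))
```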

However, when logging in as drv4doe, the user is taken directly to the Dashboard page and is presented with a Dashboard based on the custom Surveillance View definition.

The drv4doe user is not allowed to navigate to URLs other than the dashboard.jsp URL. Doing so will result in an Access Denied error.
Anonymous dashboards

You can modify the configuration files for the security framework to give you access to one or more dashboards without logging in. At the end you’ll be able to point a browser at a special URL like http://…/opennms/dashboard1 or http://…/opennms/dashboard2 and see a dashboard without any authentication.

First, configure surveillance views and create dashboard users as above. For example, make two dashboards and two users called dashboard1 and dashboard2. Test that you can log in as each of the new users and see the correct dashboard.

Now create some aliases you can use to distinguish between dashboards. In /opt/opennms/jetty-webapps/opennms/WEB-INF, edit web.xml. Just before the first <servlet-mapping> tag, add the following servlet entries:

  <servlet>
       <servlet-name>dashboard1</servlet-name>
       <jsp-file>/dashboard.jsp</jsp-file>
  </servlet>

  <servlet>
       <servlet-name>dashboard2</servlet-name>
       <jsp-file>/dashboard.jsp</jsp-file>
  </servlet>

Just before the first <error-page> tag, add the following servlet-mapping entries:

  <servlet-mapping>
       <servlet-name>dashboard1</servlet-name>
       <url-pattern>/dashboard1</url-pattern>
  </servlet-mapping>

  <servlet-mapping>
       <servlet-name>dashboard2</servlet-name>
       <url-pattern>/dashboard2</url-pattern>
  </servlet-mapping>

After the last <filter-mapping> tag, add the following filter-mapping entries:

  <filter-mapping>
    <filter-name>AddRefreshHeader-120</filter-name>
    <url-pattern>/dashboard.jsp</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>AddRefreshHeader-120</filter-name>
    <url-pattern>/dashboard1</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>AddRefreshHeader-120</filter-name>
    <url-pattern>/dashboard2</url-pattern>
  </filter-mapping>

Next edit applicationContext-acegi-security.xml to enable anonymous authentication for the /dashboard1 and /dashboard2 aliases. Near the top of the file, find <bean id="filterChainProxy" …​>. Below the entry for /rss.jsp*, add an entry for each of the dashboard aliases:

  <bean id="filterChainProxy" class="org.acegisecurity.util.FilterChainProxy">
    <property name="filterInvocationDefinitionSource">
      <value>
        CONVERT_URL_TO_LOWERCASE_BEFORE_COMPARISON
        PATTERN_TYPE_APACHE_ANT
        /rss.jsp*=httpSessionContextIntegrationFilter,logoutFilter,authenticationProcessingFilter,basicProcessingFilter,securityContextHolderAwareRequestFilter,anonymousProcessingFilter,basicExceptionTranslationFilter,filterInvocationInterceptor
        /dashboard1*=httpSessionContextIntegrationFilter,logoutFilter,securityContextHolderAwareRequestFilter,dash1AnonymousProcessingFilter,filterInvocationInterceptor
        /dashboard2*=httpSessionContextIntegrationFilter,logoutFilter,securityContextHolderAwareRequestFilter,dash2AnonymousProcessingFilter,filterInvocationInterceptor
        /**=httpSessionContextIntegrationFilter,logoutFilter,authenticationProcessingFilter,basicProcessingFilter,securityContextHolderAwareRequestFilter,anonymousProcessingFilter,exceptionTranslationFilter,filterInvocationInterceptor

...

About halfway through the file, look for <bean id="filterInvocationInterceptor" …​>. Below the entry for /dashboard.jsp, add an entry for each of the aliases:

  <bean id="filterInvocationInterceptor" class="org.acegisecurity.intercept.web.FilterSecurityInterceptor">

...

        /frontpage.htm=ROLE_USER,ROLE_DASHBOARD
        /dashboard.jsp=ROLE_USER,ROLE_DASHBOARD
        /dashboard1=ROLE_USER,ROLE_DASHBOARD
        /dashboard2=ROLE_USER,ROLE_DASHBOARD
        /gwt.js=ROLE_USER,ROLE_DASHBOARD

...

Finally, near the bottom of the file, add a new instance of AnonymousProcessingFilter for each alias.

  <!-- Set the anonymous username to dashboard1 so the dashboard page
       can match it to a surveillance view of the same name. -->
  <bean id="dash1AnonymousProcessingFilter" class="org.acegisecurity.providers.anonymous.AnonymousProcessingFilter">
    <property name="key"><value>foobar</value></property>
    <property name="userAttribute"><value>dashboard1,ROLE_DASHBOARD</value></property>
  </bean>

  <bean id="dash2AnonymousProcessingFilter" class="org.acegisecurity.providers.anonymous.AnonymousProcessingFilter">
    <property name="key"><value>foobar</value></property>
    <property name="userAttribute"><value>dashboard2,ROLE_DASHBOARD</value></property>
  </bean>

Restart OpenNMS Horizon and you should be able to bring up a dashboard at http://…/opennms/dashboard1 without logging in.

There’s no way to switch dashboards without closing the browser (or deleting the JSESSIONID session cookie).
If you accidentally click a link that requires full user privileges (e.g. Node List), you’ll be given a login form. Once you get to the login form, there’s no going back to the dashboard without restarting the browser. If this problem bothers you, you can set ROLE_USER in addition to ROLE_DASHBOARD in your userAttribute property. However, this will give full user access to anonymous browsers.

3.3. Grafana Dashboard Box

Grafana provides an API key which gives access to third-party applications like OpenNMS Horizon. The Grafana Dashboard Box on the start page shows dashboards related to OpenNMS Horizon. To filter relevant dashboards, you can use a tag for dashboards and make them accessible. If no tag is provided, all dashboards from Grafana will be shown.

The feature is deactivated by default and is configured through opennms.properties. Please note that this feature works with the Grafana API v2.5.0.

Quick access to Grafana dashboards from the OpenNMS Horizon start page

01 grafana box

Table 2. Grafana Dashboard configuration properties
Name | Type | Description | Default
org.opennms.grafanaBox.show | Boolean | This setting controls whether a Grafana box showing the available dashboards is placed on the landing page. The two valid options are true or false. | false
org.opennms.grafanaBox.hostname | String | If the box is enabled, you also need to specify the hostname of the Grafana server. | localhost
org.opennms.grafanaBox.port | Integer | The port of the Grafana server ReST API. | 3000
org.opennms.grafanaBox.basePath | String | The Grafana base path to be used. |
org.opennms.grafanaBox.apiKey | String | The API key is needed for the ReST calls to work. |
org.opennms.grafanaBox.tag | String | When a tag is specified, only dashboards with this tag will be displayed. When no tag is given, all dashboards will be displayed. |
org.opennms.grafanaBox.protocol | String | The protocol for the ReST call can also be specified. | http
org.opennms.grafanaBox.connectionTimeout | Integer | Timeout in milliseconds for getting information from the Grafana server. | 500
org.opennms.grafanaBox.soTimeout | Integer | Socket timeout. | 500
org.opennms.grafanaBox.dashboardLimit | Integer | Maximum number of entries to be displayed (0 for unlimited). | 0

If you have Grafana behind a proxy, it is important that the org.opennms.grafanaBox.hostname is reachable. This host name is used to generate links to the Grafana dashboards.

The process to generate a Grafana API Key can be found in the HTTP API documentation. Copy the API Key to opennms.properties as org.opennms.grafanaBox.apiKey.
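
To verify that the configured values point at a reachable address, it can help to assemble the base address from the properties. The sketch below is only an approximation: it assumes the address is composed as protocol://hostname:port/basePath, which this guide does not state explicitly, and the function name grafana_base_url and the host name grafana.example.org are hypothetical. The fallback values are the defaults from Table 2.

```python
def grafana_base_url(props):
    """Build a Grafana base URL from grafanaBox properties.

    Assumption: protocol://hostname:port/basePath; the exact composition
    used by OpenNMS Horizon may differ.
    """
    prefix = "org.opennms.grafanaBox."
    protocol = props.get(prefix + "protocol", "http")
    hostname = props.get(prefix + "hostname", "localhost")
    port = props.get(prefix + "port", "3000")
    # basePath has no default in Table 2, so it is treated as optional here.
    base_path = props.get(prefix + "basePath", "").strip("/")
    url = f"{protocol}://{hostname}:{port}"
    return f"{url}/{base_path}" if base_path else url


print(grafana_base_url({}))  # http://localhost:3000
print(grafana_base_url({
    "org.opennms.grafanaBox.hostname": "grafana.example.org",
    "org.opennms.grafanaBox.basePath": "/grafana",
}))  # http://grafana.example.org:3000/grafana
```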

3.4. Operator Board

In a network operation center (NOC), the Ops Board can be used to visualize monitoring information. The monitoring information for various use cases is arranged in configurable Dashlets. To address different user groups, it is possible to create multiple Ops Boards.

There are two visualisation components to display Dashlets:

  • Ops Panel: Shows multiple Dashlets on one screen, e.g. on a NOC operator’s workstation

  • Ops Board: Shows one Dashlet at a time in rotation, e.g. for a screen wall in a NOC

01 opspanel concept
Figure 12. Concept of Dashlets displayed in Ops Panel
02 opsboard concept
Figure 13. Concept to show Dashlets in rotation on the Ops Board

3.4.1. Configuration

Administration permissions are required to create and configure Ops Boards. The configuration section is in the admin area of OpenNMS Horizon and is named Ops Board Config Web Ui.

Figure 14. Navigation to the Ops Board configuration

Creating or modifying Ops Boards is described in the following screenshot.

Figure 15. Adding a Dashlet to an existing Ops Board
  1. Create a new Ops Board to organize and arrange different Dashlets

  2. The name to identify the Ops Board

  3. Add a Dashlet to show OpenNMS Horizon monitoring information

  4. Show a preview of the whole Ops Board

  5. List of available Dashlets

  6. Priority for this Dashlet in the Ops Board rotation; a lower priority means it is displayed more often

  7. Duration in seconds for this Dashlet in the Ops Board rotation

  8. Change Priority if the Dashlet is in alert state; this is optional and may not be available in all Dashlets

  9. Change Duration if the Dashlet is in alert state; this is optional and may not be available in all Dashlets

  10. Configuration properties for this Dashlet

  11. Remove this Dashlet from the Ops Board

  12. Order Dashlets for the rotation on the Ops Board and the tile view in the Ops Panel

  13. Show a preview for the whole Ops Board

The configured Ops Board can be used by navigating in the main menu to Dashboard → Ops Board.

Figure 16. Navigation to use the Ops Board

3.4.2. Dashlets

Visualization of information is implemented in Dashlets. The different Dashlets are described in this section with all available configuration parameters.

To filter the displayed information, a Dashlet can be configured with a generic Criteria Builder.

Alarm Details

This Alarm-Details Dashlet shows a table with alarms and some detailed information.

Table 3. Information of the alarms
Field Description

Alarm ID

OpenNMS Horizon ID for the alarm

Severity

Alarm severity (Cleared, Indeterminate, Normal, Warning, Minor, Major, Critical)

Node label

Node label of the node where the alarm occurred

Alarm count

Alarm count based on reduction key for deduplication

Last Event Time

Last time the alarm occurred

Log Message

Reason and detailed log message of the alarm

The Alarm Details Dashlet can be configured with the following parameters.

Boost support

Boosted Severity

Configuration

Criteria Builder

Alarms

This Alarms Dashlet shows a table with a short alarm description.

Table 4. Information of the alarm
Field Description

Time

Absolute time since the alarm appeared

Node label

Node label of the node where the alarm occurred

UEI

OpenNMS Horizon Unique Event Identifier for this alarm

The Alarms Dashlet can be configured with the following parameters.

Boost support

Boosted Severity

Configuration

Criteria Builder

Charts

This Dashlet displays an existing Chart.

Boost support

false

Chart

Name of the existing chart to display

Maximize Width

Rescale the image to fill display width

Maximize Height

Rescale the image to fill display height

Grafana

This Dashlet shows a Grafana Dashboard for a given time range. The Grafana Dashboard Box configuration defined in the opennms.properties file is used to access the Grafana instance.

Boost support

false

title

Title of the Grafana dashboard to be displayed

uri

URI to the Grafana Dashboard to be displayed

from

Start of time range

to

End of time range

Image

This Dashlet displays an image by a given URL.

Boost support

false

imageUrl

URL with the location of the image to show in this Dashlet

maximizeHeight

Rescale the image to fill display height

maximizeWidth

Rescale the image to fill display width

KSC

This Dashlet shows an existing KSC report. The view is exactly the same as the KSC report is built with regard to order, columns and time spans.

Boost support

false

KSC-Report

Name of the KSC report to show in this Dashlet

Map

This Dashlet displays the geographical map.

Boost support

false

search

Predefined search for a subset of nodes shown in the geographical map in this Dashlet

RRD

This Dashlet shows one or multiple RRD graphs. It is possible to arrange and order the RRD graphs in multiple columns and rows. All RRD graphs are normalized with a given width and height.

Boost support

false

Columns

Number of columns within the Dashlet

Rows

Number of rows within the Dashlet

KSC Report

Import RRD graphs from an existing KSC report and re-arrange them.

Graph Width

Generic width for all RRD graphs in this Dashlet

Graph Height

Generic height for all RRD graphs in this Dashlet

Timeframe value

Number of the given Timeframe type

Timeframe type

Minute, Hour, Day, Week, Month and Year for all RRD graphs

RTC

This Dashlet shows the configured SLA categories from the OpenNMS Horizon start page.

Boost support

false

-

-

Summary

This Dashlet shows a trend of incoming alarms in a given time frame.

Boost support

Boosted Severity

timeslot

Time slot in seconds to evaluate the trend for alarms by severity and UEI.

Surveillance

This Dashlet shows a given Surveillance View.

Boost support

false

viewName

Name of the configured Surveillance View

Topology

This Dashlet shows a Topology Map. The Topology Map can be configured with the following parameters.

Boost support

false

focusNodes

Which node(s) are in focus for the topology

provider

Which topology should be displayed, e.g. Linkd, VMware

szl

Set the zoom level for the topology

URL

This Dashlet shows the content of a web page or other web application, e.g. other monitoring systems by a given URL.

Boost support

false

password

Optional password if a basic authentication is required

url

URL to the web application or web page

username

Optional username if a basic authentication is required

3.4.3. Boosting Dashlet

Boosting describes the behavior of a Dashlet that shows critical monitoring information: it can raise its priority in the Ops Board rotation to indicate a problem. This behavior is configured with the parameters Boost Priority and Boost Duration. These two configuration parameters affect the behavior of the Ops Board in rotation.

  • Boost Priority: Absolute priority of the Dashlet with critical monitoring information.

  • Boost Duration: Absolute duration in seconds of the Dashlet with critical monitoring information.
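
The word "absolute" above can be read as: the boost values replace the normal priority and duration while the Dashlet is in alert state, rather than being added to them. The following Python sketch shows that reading. The DashletSpec class and the effective_settings function are hypothetical names for illustration and do not reflect the actual OpenNMS Horizon implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DashletSpec:
    """Hypothetical model of one Dashlet entry on an Ops Board."""
    name: str
    priority: int                         # normal rotation priority
    duration: int                         # normal display time in seconds
    boost_priority: Optional[int] = None  # optional, not all Dashlets support it
    boost_duration: Optional[int] = None  # optional, not all Dashlets support it


def effective_settings(spec: DashletSpec, boosted: bool) -> Tuple[int, int]:
    """Return (priority, duration) used for the rotation.

    Assumption: boost values are absolute replacements, and a missing
    boost value falls back to the normal one.
    """
    if not boosted:
        return spec.priority, spec.duration
    priority = spec.priority if spec.boost_priority is None else spec.boost_priority
    duration = spec.duration if spec.boost_duration is None else spec.boost_duration
    return priority, duration


if __name__ == "__main__":
    alarms = DashletSpec("Alarms", priority=5, duration=10,
                         boost_priority=1, boost_duration=30)
    print(effective_settings(alarms, boosted=False))
    print(effective_settings(alarms, boosted=True))
```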

3.4.4. Criteria Builder

The Criteria Builder is a generic component to filter the information shown in a Dashlet. Some Dashlets use this component to filter the displayed information for certain use cases. It is possible to combine multiple Criteria to display just a subset of information in a given Dashlet.

Table 5. Generic Criteria Builder configuration possibilities
Restriction Property Value 1 Value 2 Description

Asc

-

-

-

ascending order

Desc

-

-

-

descending order

Between

database attribute

String

String

Subset of data between value 1 and value 2

Contains

database attribute

String

-

Select all data which contains a given text string in a given database attribute

Distinct

database attribute

-

-

Select a single instance

Eq

database attribute

String

-

Select data where attribute equals (==) a given text string

Ge

database attribute

String

-

Select data where attribute is greater than or equal to (>=) a given text value

Gt

database attribute

String

-

Select data where attribute is greater than (>) a given text value

Ilike

database attribute

String

-

unknown

In

database attribute

String

-

unknown

Iplike

database attribute

String

-

Select data where attribute matches a given IPLIKE expression

IsNull

database attribute

-

-

Select data where attribute is null

IsNotNull

database attribute

-

-

Select data where attribute is not null

Le

database attribute

String

-

Select data where attribute is less than or equal to (<=) a given text value

Lt

database attribute

String

-

Select data where attribute is less than (<) a given text value

Like

database attribute

String

-

Select data where attribute matches a given text pattern, similar to SQL LIKE

Limit

-

Integer

-

Limit the result set by a given number

Ne

database attribute

String

-

Select data where attribute does not equal (!=) a given text value

Not

database attribute

String

-

unknown difference between Ne

OrderBy

database attribute

-

-

Order the result set by a given attribute
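
To make the combination of several Criteria more concrete, the following Python sketch applies a list of restrictions from the table above to in-memory rows. It is only an illustration of the semantics described in the table: the function apply_criteria, the tuple layout and the sample rows are hypothetical, Between is assumed to be inclusive as in SQL, and the restrictions marked "unknown" in the table are deliberately left out.

```python
import operator

# Restrictions that compare one attribute against Value 1.
COMPARISONS = {
    "Eq": operator.eq, "Ne": operator.ne,
    "Ge": operator.ge, "Gt": operator.gt,
    "Le": operator.le, "Lt": operator.lt,
}


def apply_criteria(rows, criteria):
    """Filter, order and limit rows (dicts) by (restriction, property, value1, value2)."""
    result = list(rows)
    order_by, descending, limit = None, False, None
    for restriction, prop, value1, value2 in criteria:
        if restriction in COMPARISONS:
            compare = COMPARISONS[restriction]
            result = [r for r in result
                      if r.get(prop) is not None and compare(r[prop], value1)]
        elif restriction == "Between":
            result = [r for r in result
                      if r.get(prop) is not None and value1 <= r[prop] <= value2]
        elif restriction == "Contains":
            result = [r for r in result if value1 in str(r.get(prop, ""))]
        elif restriction == "IsNull":
            result = [r for r in result if r.get(prop) is None]
        elif restriction == "IsNotNull":
            result = [r for r in result if r.get(prop) is not None]
        elif restriction == "OrderBy":
            order_by = prop
        elif restriction == "Asc":
            descending = False
        elif restriction == "Desc":
            descending = True
        elif restriction == "Limit":
            limit = int(value1)
        else:
            raise ValueError("unsupported restriction: " + restriction)
    if order_by is not None:
        result.sort(key=lambda r: r[order_by], reverse=descending)
    if limit is not None:
        result = result[:limit]
    return result


if __name__ == "__main__":
    alarms = [
        {"id": 1, "severity": 7}, {"id": 2, "severity": 3},
        {"id": 3, "severity": 5}, {"id": 4, "severity": 6},
    ]
    criteria = [
        ("Ge", "severity", 5, None),
        ("OrderBy", "id", None, None),
        ("Desc", None, None, None),
        ("Limit", None, 2, None),
    ]
    print(apply_criteria(alarms, criteria))
```

In this sketch, ordering and limiting are applied after all filters, so the position of Limit in the list does not change the result. Whether the Criteria Builder behaves the same way is not stated in this guide.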

3.5. JMX Configuration Generator

OpenNMS Horizon implements the JMX protocol to collect long-term performance data for Java applications. A huge variety of metrics is available, and administrators have to select which information should be collected. The JMX Configuration Generator Tool is built to help generate valid, complex JMX data collection configurations and RRD graph definitions for OpenNMS Horizon.

The tool is available as a CLI and as a web based version.

3.5.1. Web based utility

Complex JMX data collection configurations can be generated from a web based tool. It collects all available MBean Attributes or Composite Data Attributes from a JMX enabled Java application.

The workflow of the tool is:

  1. Connect via JMX or JMXMP to an MBean Server provided by a Java application

  2. Retrieve all MBean and Composite Data from the application

  3. Select specific MBeans and Composite Data objects which should be collected by OpenNMS Horizon

  4. Generate the JMX Collectd configuration file and RRD graph definitions for OpenNMS Horizon as a downloadable archive

The following connection settings are supported:

  • Ability to connect to MBean Server with RMI based JMX

  • Authentication credentials for JMX connection

  • Optional: JMXMP connection

The web based configuration tool is available in the OpenNMS Horizon Web Application in the administration section under Admin → JMX Configuration Generator.

Configure JMX Connection

At the beginning the connection to an MBean Server of a Java application has to be configured.

Figure 17. JMX connection configuration window
  • Service name: The name of the service to bind the JMX data collection for Collectd

  • Host: IP address or FQDN connecting to the MBean Server to load MBeans and Composite Data into the generation tool

  • Port: Port to connect to the MBean Server

  • Authentication: Enable / Disable authentication for JMX connection with username and password

  • Skip non-number values: Skip attributes with non-number values

  • JMXMP: Enable / Disable JMX Messaging Protocol instead of using JMX over RMI

Clicking the arrow ( > ) retrieves the MBeans and Composite Data with the given connection settings. The data is loaded into the MBeans Configuration screen, which allows you to select metrics for the data collection configuration.

Select MBeans and Composite

The MBeans Configuration section is used to assign the MBean and Composite Data attributes to RRD domain specific data types and data source names.

Figure 18. Select MBeans or Composite Data for OpenNMS Horizon data collection

The left sidebar shows the tree with the JMX Domain, MBeans and Composite Data hierarchy retrieved from the MBean Server. To select or deselect all attributes use Mouse right click → select/deselect.

The right panel shows the MBean Attributes with the RRD specific mapping and allows you to select or deselect specific MBean Attributes or Composite Data Attributes for the data collection configuration.

Figure 19. Configure MBean attributes for data collection configuration

Figure 20. Configure Composite attributes for data collection configuration
  • MBean Name or Composite Alias: Identifies the MBean or the Composite Data object

  • Selected: Enable/Disable the MBean attribute or Composite Member to be included in the data collection configuration

  • Name: Name of the MBean attribute or Composite Member

  • Alias: the data source name for persisting measurements in RRD or JRobin file

  • Type: Gauge or Counter data type for persisting measurements in RRD or JRobin file

The MBean Name, Composite Alias and Name are validated against special characters. Alias inputs are validated to be no longer than 19 characters and have to be unique in the data collection configuration.
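
The two Alias rules can be checked ahead of time for a list of planned data source names. The following Python sketch is a stand-alone illustration of these rules, not the validation code used by the web tool. The function name validate_aliases and the sample aliases are hypothetical.

```python
MAX_ALIAS_LENGTH = 19  # limit stated above for data source names


def validate_aliases(aliases, max_length=MAX_ALIAS_LENGTH):
    """Return a dict mapping each problematic alias to a list of problems."""
    problems = {}
    seen = set()
    for alias in aliases:
        if len(alias) > max_length:
            problems.setdefault(alias, []).append("too long")
        if alias in seen:
            # Every alias has to be unique within the configuration.
            problems.setdefault(alias, []).append("duplicate")
        seen.add(alias)
    return problems


if __name__ == "__main__":
    planned = ["LoadedClassCnt", "TotalLoadedClassCount", "LoadedClassCnt"]
    for alias, issues in validate_aliases(planned).items():
        print(alias, "->", ", ".join(issues))
```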

Download and include configuration

The last step is generating the following configuration files for OpenNMS Horizon:

  • collectd-configuration.xml: Generated sample configuration assigned to a service with a matching data collection group

  • jmx-datacollection-config.xml: Generated JMX data collection configuration with the selected MBeans and Composite Data

  • snmp-graph.properties: Generated default RRD graph definition files for all selected metrics

The content of the configuration files can be copied and pasted or downloaded as a ZIP archive.

If the content of a configuration file exceeds 2,500 lines, the files can only be downloaded as a ZIP archive.

3.5.2. CLI based utility

The command line (CLI) based tool is not installed by default. It is available as a Debian and as an RPM package in the official repositories.

Installation
RHEL based installation with Yum
yum install opennms-jmx-config-generator
Debian based installation with apt
apt-get install opennms-jmx-config-generator
Installation from source

The Java 8 Development Kit and Apache Maven have to be installed, and the mvn binary has to be in the PATH environment variable. After cloning the repository, enter the source folder and compile an executable JAR.

cd opennms/features/jmx-config-generator
mvn package

Inside the newly created target folder a file named jmxconfiggenerator-<VERSION>-onejar.jar is present. This file can be invoked by:

java -jar target/jmxconfiggenerator-23.0.0-SNAPSHOT-onejar.jar
Usage

After installing the JMX Config Generator, the tool’s wrapper script is located in the ${OPENNMS_HOME}/bin directory.

$ cd /path/to/opennms/bin
$ ./jmx-config-generator
When invoked without parameters the usage and help information is printed.

The JMX Config Generator uses sub-commands for the different configuration generation tasks. Each of these sub-commands provides different options and parameters. The command line tool accepts the following sub-commands.

Sub-command Description

query

Queries a MBean Server for certain MBeans and attributes.

generate-conf

Generates a valid jmx-datacollection-config.xml file.

generate-graph

Generates a RRD graph definition file with matching graph definitions for a given jmx-datacollection-config.xml.

The following global options are available in each of the sub-commands of the tool:

Option/Argument Description Default

-h (--help)

Show help and usage information.

false

-v (--verbose)

Enables verbose mode for debugging purposes.

false

Sub-command: query

This sub-command is used to query an MBean Server for its available MBean objects. The following example queries the server myserver with the credentials myusername/mypassword on port 7199 for MBean objects in the java.lang domain.

./jmx-config-generator query --host myserver --username myusername --password mypassword --port 7199 "java.lang:*"
java.lang:type=ClassLoading
	description: Information on the management interface of the MBean
	class name: sun.management.ClassLoadingImpl
	attributes: (5/5)
		TotalLoadedClassCount
			id: java.lang:type=ClassLoading:TotalLoadedClassCount
			description: TotalLoadedClassCount
			type: long
			isReadable: true
			isWritable: false
			isIs: false
		LoadedClassCount
			id: java.lang:type=ClassLoading:LoadedClassCount
			description: LoadedClassCount
			type: int
			isReadable: true
			isWritable: false
			isIs: false

<output omitted>

The following command line options are available for the query sub-command.

Option/Argument Description Default

<filter criteria>

A filter criteria to query the MBean Server for. The format is <objectname>[:attribute name]. The <objectname> accepts the default JMX object name pattern to identify the MBeans to be retrieved. If null all domains are shown. If no key properties are specified, the domain’s MBeans are retrieved. To execute for certain attributes, you have to add :<attribute name>. The <attribute name> accepts regular expressions. When multiple <filter criteria> are provided they are OR concatenated.

-

--host <host>

Hostname or IP address of the remote JMX host.

-

--ids-only

Only show the ids of the attributes.

false

--ignore <filter criteria>

Set <filter criteria> to ignore while running.

-

--include-values

Include attribute values.

false

--jmxmp

Use JMXMP and not JMX over RMI.

false

--password <password>

Password for JMX authentication.

-

--port <port>

Port of JMX service.

 -

--show-domains

Only lists the available domains.

true

--show-empty

Includes MBeans even if they do not have attributes, either because of the <filter criteria> or because there are none.

false

--url <url>

Custom connection URL
<hostname>:<port>
service:jmx:<protocol>:<sap>
service:jmx:remoting-jmx://<hostname>:<port>

-

--username <username>

Username for JMX authentication.

 -

-h (--help)

Show help and usage information.

false

-v (--verbose)

Enables verbose mode for debugging purposes.

false

Sub-command: generate-conf

This sub-command can be used to generate a valid jmx-datacollection-config.xml for a given set of MBean objects queried from a MBean Server.

The following example generates a configuration file myconfig.xml for MBean objects in the java.lang domain of the server myserver on port 7199 with the credentials myusername/mypassword. You have to define either a URL or a hostname and port to connect to a JMX server.

jmx-config-generator generate-conf --host myserver --username myusername --password mypassword --port 7199 "java.lang:*" --output myconfig.xml
Dictionary entries loaded: '18'

The following options are available for the generate-conf sub-command.

Option/Argument Description Default

<attribute id>

A list of attribute Ids to be included for the generation of the configuration file.

 -

--dictionary <file>

Path to a dictionary file for replacing attribute names and part of MBean attributes. The file should have for each line a replacement, e.g. Auxillary:Auxil.

-

--host <host>

Hostname or IP address of JMX host.

-

--jmxmp

Use JMXMP and not JMX over RMI.

false

--output <file>

Output filename to write generated jmx-datacollection-config.xml.

-

--password <password>

Password for JMX authentication.

 -

--port <port>

Port of JMX service

-

--print-dictionary

Prints the used dictionary to STDOUT. May be used with --dictionary

false

--service <value>

The Service Name used as JMX data collection name.

anyservice

--skipDefaultVM

Skip default JavaVM Beans.

false

--skipNonNumber

Skip attributes with non-number values

false

--url <url>

Custom connection URL
<hostname>:<port>
service:jmx:<protocol>:<sap>
service:jmx:remoting-jmx://<hostname>:<port>

-

--username <username>

Username for JMX authentication

-

-h (--help)

Show help and usage information.

false

-v (--verbose)

Enables verbose mode for debugging purposes.

false

The option --skipDefaultVM offers the ability to ignore the MBeans provided as standard by the JVM and just create configurations for the MBeans provided by the Java Application itself. This is particularly useful if an optimized configuration for the JVM already exists. If the --skipDefaultVM option is not set the generated configuration will include the MBeans of the JVM and the MBeans of the Java Application.
Check the file and see if there are alias names with more than 19 characters. These errors are marked with NAME_CRASH_AS_19_CHAR_VALUE.
Sub-command: generate-graph

This sub-command generates a RRD graph definition file for a given configuration file. The following example generates a graph definition file mygraph.properties using the configuration in file myconfig.xml.

./jmx-config-generator generate-graph --input myconfig.xml --output mygraph.properties
reports=java.lang.ClassLoading.MBeanReport, \
java.lang.ClassLoading.0TotalLoadeClassCnt.AttributeReport, \
java.lang.ClassLoading.0LoadedClassCnt.AttributeReport, \
java.lang.ClassLoading.0UnloadedClassCnt.AttributeReport, \
java.lang.Compilation.MBeanReport, \
<output omitted>

The following options are available for this sub-command.

Option/Argument Description Default

--input <jmx-datacollection.xml>

Configuration file to use as input to generate the graph properties file

 -

--output <file>

Output filename for the generated graph properties file.

 -

--print-template

Prints the default template.

 false

--template <file>

Template file using Apache Velocity template engine to be used to generate the graph properties.

 -

-h (--help)

Show help and usage information.

false

-v (--verbose)

Enables verbose mode for debugging purposes.

false

Graph Templates

The JMX Config Generator uses a template file to generate the graphs. It is possible to use a user-defined template. The option --template followed by a file lets the JMX Config Generator use the external template file as base for the graph generation. The following example illustrates how a custom template mytemplate.vm is used to generate the graph definition file mygraph.properties using the configuration in file myconfig.xml.

./jmx-config-generator generate-graph --input myconfig.xml --output mygraph.properties --template mytemplate.vm

The template file has to be an Apache Velocity template. The following sample represents the template that is used by default:

reports=#foreach( $report in $reportsList )
${report.id}#if( $foreach.hasNext ), \
#end
#end

#foreach( $report in $reportsBody )

#[[###########################################]]#
#[[##]]# $report.id
#[[###########################################]]#
report.${report.id}.name=${report.name}
report.${report.id}.columns=${report.graphResources}
report.${report.id}.type=interfaceSnmp
report.${report.id}.command=--title="${report.title}" \
 --vertical-label="${report.verticalLabel}" \
#foreach($graph in $report.graphs )
 DEF:${graph.id}={rrd${foreach.count}}:${graph.resourceName}:AVERAGE \
 AREA:${graph.id}#${graph.coloreB} \
 LINE2:${graph.id}#${graph.coloreA}:"${graph.description}" \
 GPRINT:${graph.id}:AVERAGE:" Avg \\: %8.2lf %s" \
 GPRINT:${graph.id}:MIN:" Min \\: %8.2lf %s" \
 GPRINT:${graph.id}:MAX:" Max \\: %8.2lf %s\\n" \
#end

#end

The JMX Config Generator generates different types of graphs from the jmx-datacollection-config.xml. The different types are listed below:

Type Description

AttributeReport

For each attribute of any MBean a graph will be generated. Composite attributes will be ignored.

MBeanReport

For each MBean a combined graph with all attributes of the MBeans is generated. Composite attributes will be ignored.

CompositeReport

For each composite attribute of every MBean a graph is generated.

CompositeAttributeReport

For each composite member of every MBean a combined graph with all composite attributes is generated.

3.6. Heatmap

The Heatmap can be used either to display unacknowledged alarms or to display ongoing outages of nodes. Each of these visualizations can be applied to categories, foreign sources or services of nodes. The sizing of an entity is calculated by counting the services inside the entity. Thus, a node with fewer services will appear in a smaller box than a node with more services.

The feature is deactivated by default and is configured through opennms.properties.

Heatmap visualizations of alarms

Table 6. Heatmap dashboard configuration properties
Name Type Description Default

org.opennms.heatmap.defaultMode

String

The Heatmap can be used in two modes: alarms and outages. This option configures which mode is displayed by default.

alarms

org.opennms.heatmap.defaultHeatmap

String

This option defines which Heatmap is displayed by default. Valid options are categories, foreignSources and monitoredServices.

categories

org.opennms.heatmap.categoryFilter

String

The following option is used to filter for categories to be displayed in the Heatmap. This option uses the Java regular expression syntax. The default is .* so all categories will be displayed.

.*

org.opennms.heatmap.foreignSourceFilter

String

The following option is used to filter for foreign sources to be displayed in the Heatmap. This option uses the Java regular expression syntax. The default is .* so all foreign sources will be displayed.

.*

org.opennms.heatmap.serviceFilter

String

The following option is used to filter for services to be displayed in the Heatmap. This option uses the Java regular expression syntax. The default is .* so all services will be displayed.

.*

org.opennms.heatmap.onlyUnacknowledged

Boolean

This option configures whether only unacknowledged alarms will be taken into account when generating the alarm-based version of the Heatmap.

false

org.opennms.web.console.centerUrl

String

You can also place the Heatmap on the landing page by setting this option to /heatmap/heatmap-box.jsp.

/surveillance-box.jsp

You can use negative lookahead expressions for excluding categories you wish not to be displayed in the heatmap, e.g. by using an expression like ^(?!XY).* you can filter out entities with names starting with XY.
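
The effect of such a filter expression can be tried out before it is written to opennms.properties. The following Python sketch uses Python's re module, whose negative lookahead syntax matches Java's for this expression. The function name filter_entities and the sample category names are hypothetical, and matching the whole name is an assumption about how the filter is applied.

```python
import re


def filter_entities(names, pattern):
    """Return the names that are fully matched by the regular expression."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


if __name__ == "__main__":
    categories = ["Routers", "XY-Lab", "Servers", "XYZ"]
    # Default filter: everything is displayed.
    print(filter_entities(categories, r".*"))
    # Negative lookahead: drop every name starting with XY.
    print(filter_entities(categories, r"^(?!XY).*"))
```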

3.7. Trend

The Trend feature allows you to display small inline charts of database-based statistics. These charts are accessible in the Status menu of the OpenNMS web application. Furthermore, it is also possible to configure these charts to be displayed on the OpenNMS landing page. To achieve this, alter the org.opennms.web.console.centerUrl property to also include the entry /trend/trend-box.htm.

Trend chart structure

These charts can be configured and defined in the trend-configuration.xml file in the etc directory of your OpenNMS installation. The following sample defines a Trend chart for displaying nodes with ongoing outages.

Sample Trend chart XML definition for displaying nodes with outages
    <trend-definition name="nodes">
        <title>Nodes</title> (1)
        <subtitle>w/ Outages</subtitle> (2)
        <visible>true</visible> (3)
        <icon>glyphicon-fire</icon> (4)
        <trend-attributes> (5)
            <trend-attribute key="sparkWidth" value="100%"/>
            <trend-attribute key="sparkHeight" value="35"/>
            <trend-attribute key="sparkChartRangeMin" value="0"/>
            <trend-attribute key="sparkLineColor" value="white"/>
            <trend-attribute key="sparkLineWidth" value="1.5"/>
            <trend-attribute key="sparkFillColor" value="#88BB55"/>
            <trend-attribute key="sparkSpotColor" value="white"/>
            <trend-attribute key="sparkMinSpotColor" value="white"/>
            <trend-attribute key="sparkMaxSpotColor" value="white"/>
            <trend-attribute key="sparkSpotRadius" value="3"/>
            <trend-attribute key="sparkHighlightSpotColor" value="white"/>
            <trend-attribute key="sparkHighlightLineColor" value="white"/>
        </trend-attributes>
        <descriptionLink>outage/list.htm?outtype=current</descriptionLink> (6)
        <description>${intValue[23]} NODES WITH OUTAGE(S)</description> (7)
        <query> (8)
            <![CDATA[
                select (
                    select
                        count(distinct nodeid)
                    from
                        outages o, events e
                    where
                        e.eventid = o.svclosteventid
                        and iflostservice < E
                        and (ifregainedservice is null
                            or ifregainedservice > E)
                ) from (
                    select
                        now() - interval '1 hour' * (O + 1) AS S,
                        now() - interval '1 hour' * O as E
                    from
                        generate_series(0, 23) as O
                ) I order by S;
            ]]>
        </query>
    </trend-definition>
1 title of the Trend chart, see below for supported variable substitutions
2 subtitle of the Trend chart, see below for supported variable substitutions
3 defines whether the chart is visible by default
4 icon for the chart, see Glyphicons for viable options
5 options for inline chart, see jQuery Sparklines for viable options
6 the description link
7 the description text, see below for supported variable substitutions
8 the SQL statement for querying the chart’s values
Don’t forget to limit the SQL query’s return values!
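
The sample query returns one value per hour for the last 24 hours, which is why the description refers to ${intValue[23]}: with the rows ordered by S, index 23 is the most recent hour. The following Python sketch reproduces only the time window logic of the inner select and the outage condition of the where clause. The function names hourly_windows and outage_open_at are hypothetical, and no database is involved.

```python
from datetime import datetime, timedelta


def hourly_windows(now, count=24):
    """Mirror generate_series(0, 23): one (S, E) window per hour, ordered by S."""
    windows = []
    for offset in range(count):
        start = now - timedelta(hours=offset + 1)  # S
        end = now - timedelta(hours=offset)        # E
        windows.append((start, end))
    windows.sort(key=lambda window: window[0])     # order by S
    return windows


def outage_open_at(lost, regained, end):
    """Mirror the where clause: the outage began before E and was not resolved by E."""
    return lost < end and (regained is None or regained > end)


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0)
    windows = hourly_windows(now)
    print("oldest window:", windows[0])
    print("newest window:", windows[23])
    lost = datetime(2024, 1, 2, 9, 30)
    print([outage_open_at(lost, None, end) for _, end in windows[-4:]])
```

Because the number of windows is fixed by generate_series, the query always returns 24 rows, which is one way to follow the advice to limit the return values.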

It is possible to use values or aggregated values in the title, subtitle and description fields. The following table describes the available variable substitutions.

Table 7. Variables usable in definition’s title, subtitle and description fields
Name Type Description

${intMax}

Integer

integer maximum value

${doubleMax}

Double

maximum value

${intMin}

Integer

integer minimum value

${doubleMin}

Double

minimum value

${intAvg}

Integer

integer average value

${doubleAvg}

Double

average value

${intSum}

Integer

integer sum of values

${doubleSum}

Double

sum of values

${intValue[]}

Integer

array of integer result values for the given SQL query

${doubleValue[]}

Double

array of result values for the given SQL query

${intValueChange[]}

Integer

array of integer value changes for the given SQL query

${doubleValueChange[]}

Double

array of value changes for the given SQL query

${intLastValue}

Integer

last integer value

${doubleLastValue}

Double

last value

${intLastValueChange}

Integer

last integer value change

${doubleLastValueChange}

Double

last value change
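
As a rough illustration of how these variables relate to the values returned by the SQL query, the following Python sketch derives a few of them from a list of numbers. The function name trend_variables is hypothetical, and the reading of "value change" as the difference between two consecutive values is an assumption that this guide does not confirm.

```python
def trend_variables(values):
    """Compute a subset of the substitution variables from query result values."""
    if not values:
        raise ValueError("the query returned no values")
    # Assumption: a value change is the difference to the previous value.
    last_change = values[-1] - values[-2] if len(values) > 1 else 0.0
    return {
        "doubleMax": max(values),
        "doubleMin": min(values),
        "doubleSum": sum(values),
        "doubleAvg": sum(values) / len(values),
        "doubleLastValue": values[-1],
        "doubleLastValueChange": last_change,
    }


if __name__ == "__main__":
    for name, value in trend_variables([2.0, 4.0, 3.0, 7.0]).items():
        print(name, value)
```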

You can also display a single graph in your JSP files by including the file /trend/single-trend-box.jsp and specifying the name parameter.

Sample JSP snippet to include a single Trend chart with name 'example'
<jsp:include page="/trend/single-trend-box.jsp" flush="false">
    <jsp:param name="name" value="example"/>
</jsp:include>

4. Service Assurance

This section covers the basic functionality OpenNMS Horizon uses to test whether a service or device is available and to measure its latency.

In OpenNMS Horizon this task is provided by a Service Monitor framework. The main component is Pollerd which provides the following functionalities:

  • Track the status of a management resource or an application for availability calculations

  • Measure response times for service quality

  • Correlation of node and interface outages based on a Critical Service

The following image shows the model and representation of availability and response time.

Figure 21. Representation of latency measurement and availability

This information is based on Service Monitors which are scheduled and executed by Pollerd. A Service can have any arbitrary name and is associated with a Service Monitor. For example, we can define two Services with the name HTTP and HTTP-8080, both are associated with the HTTP Service Monitor but use a different TCP port configuration parameter. The following figure shows how Pollerd interacts with other components in OpenNMS and applications or agents to be monitored.

The availability is calculated over the last 24 hours and is shown in the Surveillance Views, SLA Categories and the Node Detail Page. Response times are displayed as Resource Graphs of the IP Interface on the Node Detail Page. Configuration parameters of the Service Monitor can be seen in the Service Page by clicking on the Service Name on the Node Detail Page. The status of a Service can be Up or Down.

When a Service Monitor detects an outage, Pollerd sends an Event which is used to create an Alarm. Events can also be used to generate Notifications for on-call network or server administrators. The following image shows the interaction of Pollerd in OpenNMS Horizon.

Figure 22. Service assurance with Pollerd in OpenNMS platform

Pollerd can generate the following Events in OpenNMS Horizon:

| Event name | Description |
|---|---|
| uei.opennms.org/nodes/nodeLostService | Critical Services are still up; only this service is lost. |
| uei.opennms.org/nodes/nodeRegainedService | The service came back up. |
| uei.opennms.org/nodes/interfaceDown | The Critical Service on an IP interface is down, or all services are down. |
| uei.opennms.org/nodes/interfaceUp | The Critical Service on that interface came back up. |
| uei.opennms.org/nodes/nodeDown | All Critical Services on all IP interfaces of the node are down. The whole host is unreachable over the network. |
| uei.opennms.org/nodes/nodeUp | Some of the Critical Services came back online. |

The behavior to generate interfaceDown and nodeDown events is described in the Critical Service section.

This assumes that node-outage processing is enabled.

4.1. Pollerd Configuration

Table 8. Configuration and log files related to Pollerd.

| File | Description |
|---|---|
| $OPENNMS_HOME/etc/poller-configuration.xml | Configuration file for monitors and global daemon configuration |
| $OPENNMS_HOME/logs/poller.log | Log file for all monitors and the global Pollerd |
| $OPENNMS_HOME/etc/response-graph.properties | RRD graph definitions for service response time measurements |
| $OPENNMS_HOME/etc/events/opennms.events.xml | Event definitions for Pollerd, e.g. nodeLostService, interfaceDown or nodeDown |

To change the behavior for service monitoring, the poller-configuration.xml can be modified. The configuration file is structured in the following parts:

  • Global daemon config: Define the size of the used Thread Pool to run Service Monitors in parallel. Define and configure the Critical Service for Node Event Correlation.

  • Polling packages: Package to allow grouping of configuration parameters for Service Monitors.

  • Downtime Model: Configure how Pollerd retests a Service when an Outage is detected.

  • Monitor service association: Based on the name of the service, the implementation for application or network management protocols is assigned.

Global configuration parameters for Pollerd
<poller-configuration threads="30" (1)
                      pathOutageEnabled="false" (2)
                      serviceUnresponsiveEnabled="false"> (3)
1 Size of the Thread Pool to run Service Monitors in parallel
2 Enable or Disable Path Outage functionality based on a Critical Node in a network path
3 If a service is unresponsive, a serviceUnresponsive event is generated instead of an outage. This prevents the Downtime Model from being applied to retest the service after 30 seconds and avoids false alarms.

Configuration changes are applied by restarting OpenNMS and Pollerd. Alternatively, an Event can be sent to Pollerd to reload the configuration. The Event can be sent from the CLI or the Web User Interface.

Send configuration reload event on CLI
cd $OPENNMS_HOME/bin
./send-event.pl uei.opennms.org/internal/reloadDaemonConfig --parm 'daemonName Pollerd'
Figure 23. Send configuration reload event with the Web User Interface

If you define new services in poller-configuration.xml, a restart of OpenNMS is necessary.

4.2. Critical Service

Monitoring services on an IP network can be resource expensive, especially in cases where many of these services are not available. When a service is offline or unreachable, the monitoring system spends most of its time waiting for retries and timeouts.

In order to improve efficiency, OpenNMS Horizon deems all services on an interface to be Down if the critical service is Down. By default OpenNMS Horizon uses ICMP as the critical service.

The following image shows how a Critical Service is used to generate these events.

Figure 24. Service assurance with Pollerd in OpenNMS Horizon platform

  • (1) The Critical Services are all Up on the Node and only a nodeLostService is sent.

  • (2) The Critical Service on one of several IP Interfaces is Down and an interfaceDown is sent. All other services are not tested and no events are sent; the services are assumed to be unreachable.

  • (3) All Critical Services on the Node are Down and only a nodeDown is sent. All other services on the other IP Interfaces are not tested and no events are sent; these services are assumed to be unreachable.

The Critical Service is used to correlate outages from Services to a nodeDown or interfaceDown event. It is a global configuration of Pollerd defined in poller-configuration.xml. The OpenNMS Horizon default configuration enables this behavior.
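The three correlation cases can be expressed as a small decision function. The following Python sketch is a simplified model and not the Pollerd implementation; the data layout and the function name are hypothetical, and it assumes that every IP interface carries the Critical Service.

```python
def correlate(node, critical="ICMP"):
    """Decide which outage events a poll result would produce.

    node maps each IP interface to a dict of {service name: is_up}.
    Returns a list of (event, interface, service) tuples.
    """
    with_critical = [ip for ip, svcs in node.items() if critical in svcs]
    critical_down = [ip for ip in with_critical if not node[ip][critical]]

    # Case (3): the Critical Service is down on every interface.
    if with_critical and len(critical_down) == len(with_critical):
        return [("nodeDown", None, None)]

    events = []
    for ip, svcs in node.items():
        if ip in critical_down:
            # Case (2): one interfaceDown, other services are not reported.
            events.append(("interfaceDown", ip, None))
            continue
        # Case (1): Critical Service is up, report each lost service.
        for name, up in svcs.items():
            if not up:
                events.append(("nodeLostService", ip, name))
    return events

node = {
    "192.168.1.1": {"ICMP": False, "HTTP": False},
    "10.0.0.1": {"ICMP": True, "SSH": True},
}
print(correlate(node))
```

In this example only an interfaceDown for 192.168.1.1 is produced; the HTTP outage on the same interface is not reported separately.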

Critical Service Configuration in Pollerd
<poller-configuration threads="30"
                      pathOutageEnabled="false"
                      serviceUnresponsiveEnabled="false">

  <node-outage status="on" (1)
               pollAllIfNoCriticalServiceDefined="true"> (2)
    <critical-service name="ICMP" /> (3)
  </node-outage>
1 Enable Node Outage correlation based on a Critical Service
2 Optional: Controls the behavior for nodes without a Critical Service. If set to true, all services are polled. If set to false, the first service in the package that exists on the node is polled until service is restored, and then polling resumes for all services.
3 Define Critical Service for Node Outage correlation

4.3. Downtime Model

By default the monitoring interval for a service is 5 minutes. To also detect short service outages, caused for example by automatic network rerouting, the downtime model can be used. When a service outage is detected, the interval is reduced to 30 seconds for 5 minutes. If the service comes back within 5 minutes, a shorter outage is documented and the impact on service availability can be less than 5 minutes. This behavior is called the Downtime Model and is configurable.

Figure 25. Downtime model with resolved and ongoing outage

Figure 25 shows two outages. The first is a short outage; the service was detected as up again after 90 seconds. The second outage is still unresolved: the monitor did not detect an available service during the first 5 minutes (10 polls at 30 second intervals), so the scheduler changed the polling interval back to 5 minutes.

Example default configuration of the Downtime Model
<downtime interval="30000" begin="0" end="300000" /><!-- 30s, 0, 5m -->(1)
<downtime interval="300000" begin="300000" end="43200000" /><!-- 5m, 5m, 12h -->(2)
<downtime interval="600000" begin="43200000" end="432000000" /><!-- 10m, 12h, 5d -->(3)
<downtime interval="3600000" begin="432000000" /><!-- 1h, 5d -->(4)
1 From 0 seconds after an outage is detected until 5 minutes, the polling interval is set to 30 seconds
2 After 5 minutes of an ongoing outage until 12 hours, the polling interval is set to 5 minutes
3 After 12 hours of an ongoing outage until 5 days, the polling interval is set to 10 minutes
4 After 5 days of an ongoing outage, the service is polled only once an hour
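The way the downtime entries select a polling interval can be sketched as a lookup over the outage age. The following Python example mirrors the default values shown above; the function name is hypothetical, and treating begin as inclusive and end as exclusive is an assumption, not something the configuration states.

```python
# (interval_ms, begin_ms, end_ms); end_ms is None for the open-ended entry
DOWNTIME_MODEL = [
    (30_000, 0, 300_000),                # 30s polling from 0 to 5m
    (300_000, 300_000, 43_200_000),      # 5m polling from 5m to 12h
    (600_000, 43_200_000, 432_000_000),  # 10m polling from 12h to 5d
    (3_600_000, 432_000_000, None),      # 1h polling after 5d
]

def polling_interval(outage_age_ms, model=DOWNTIME_MODEL):
    """Return the polling interval in ms for an outage of the given age."""
    for interval, begin, end in model:
        if outage_age_ms >= begin and (end is None or outage_age_ms < end):
            return interval
    raise ValueError("no downtime entry covers this outage age")

print(polling_interval(90_000))               # 90 seconds into the outage
print(polling_interval(13 * 60 * 60 * 1000))  # 13 hours into the outage
```

The first call falls into the 30 second window, the second into the 10 minute window.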

4.4. Path Outages

An outage of a central network component can cause a lot of node outages. Path Outages can be used to suppress Notifications based on how Nodes depend on each other in the network, as defined in a Critical Path. The Critical Path needs to be configured from the network perspective of the monitoring system. By default the Path Outage feature is disabled and has to be enabled in poller-configuration.xml.

The following image shows an example network topology.

Figure 26. Path Outage example

From the perspective of the monitoring system, a Router named default-gw-01 is on the Critical Path to reach two networks. If Router default-gw-01 is down, it is not possible to reach any node in the two networks behind it, and they will all be unreachable as well. In this case an administrator would like to have just one notification for default-gw-01 and not for all the other Nodes behind it. Building this configuration in OpenNMS Horizon requires the following information:

  • Parent Foreign Source: The Foreign Source where the parent node is defined.

  • Parent Foreign ID: The Foreign ID of the parent Node on which this node depends.

  • The IP Interface selected as Primary is used as Critical IP

In this example we have created all Nodes in a Provisioning Requisition named Network-ACME and we use the Node Label as the Foreign ID.

In the Web UI go to Admin → Configure OpenNMS → Manage Provisioning Requisitions → Edit the Requisition → Edit the Node → Path Outage to configure the network path by setting the Parent Foreign Source, Parent Foreign ID and Provisioned Node.

Table 9. Provisioning for Topology Example

| Parent Foreign Source | Parent Foreign ID | Provisioned Node |
|---|---|---|
| not defined | not defined | default-gw-01 |
| Network-ACME | default-gw-01 | node-01 |
| Network-ACME | default-gw-01 | node-02 |
| Network-ACME | default-gw-01 | default-gw-02 |
| Network-ACME | default-gw-02 | node-03 |
| Network-ACME | default-gw-02 | node-04 |

The IP Interface which is set to Primary is selected as the Critical IP. In this example it is important that the IP interface on default-gw-01 in the network 192.168.1.0/24 is set as the Primary interface. On default-gw-02, the IP interface in the network 172.23.42.0/24 is set as the Primary interface.
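The effect of the parent relationships in Table 9 on notifications can be sketched as follows. This Python example is a simplified model and not the OpenNMS Horizon implementation: the function names are hypothetical, and walking the complete chain of parents is an assumption made for illustration.

```python
# Parent Foreign ID per provisioned node, as listed in Table 9.
PARENT = {
    "default-gw-01": None,
    "node-01": "default-gw-01",
    "node-02": "default-gw-01",
    "default-gw-02": "default-gw-01",
    "node-03": "default-gw-02",
    "node-04": "default-gw-02",
}

def critical_path(node, parent=PARENT):
    """Return the chain of parents between the node and the monitoring system."""
    path = []
    current = parent.get(node)
    while current is not None:
        path.append(current)
        current = parent.get(current)
    return path

def should_notify(node, down_nodes, parent=PARENT):
    """Notify only if no node on the critical path is down itself."""
    return not any(p in down_nodes for p in critical_path(node, parent))

down = {"default-gw-01", "node-01", "node-03"}
for name in sorted(down):
    print(name, should_notify(name, down))
```

Only default-gw-01 yields True here, which corresponds to the single notification an administrator would like to receive.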

4.5. Poller Packages

To define more complex monitoring configurations it is possible to group Service configurations into Polling Packages. They allow different Service Configurations to be assigned to Nodes. To assign a Polling Package to nodes, the Rules/Filters syntax can be used. Each Polling Package can have its own Downtime Model configuration.

Multiple packages can be configured, and an interface can exist in more than one package. This gives great flexibility in how the service levels are determined for a given device.

Polling package assigned to Nodes with Rules and Filters
<package name="example1">(1)
  <filter>IPADDR != '0.0.0.0'</filter>(2)
  <include-range begin="1.1.1.1" end="254.254.254.254" />(3)
  <include-range begin="::1" end="ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff" />(3)
1 Unique name of the polling package.
2 The filter can be based on IP address, categories or asset attributes of Nodes, using the Rules/Filters syntax. The filter is evaluated first and is required. This package is used for all IP Interfaces which don’t have 0.0.0.0 as an assigned IP address.
3 Specifies that the configuration of Services is applied to a range of IP Interfaces (IPv4 or IPv6).

Instead of the include-range it is possible to add one or more specific IP Interfaces with:

Defining a specific IP Interface
<specific>192.168.1.59</specific>

It is also possible to exclude IP Interfaces with:

Exclude IP Interfaces
<exclude-range begin="192.168.0.100" end="192.168.0.104"/>
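How include-range, exclude-range and specific entries combine can be illustrated with Python's ipaddress module. This is a sketch under stated assumptions: the function names are hypothetical, the filter expression is not modeled, and letting an exclude-range take precedence over the other entries is an assumption about the evaluation order.

```python
import ipaddress

def in_range(ip, begin, end):
    """True if ip lies between begin and end (inclusive) of the same IP version."""
    ip, begin, end = (ipaddress.ip_address(x) for x in (ip, begin, end))
    # Addresses of different IP versions never match each other.
    return ip.version == begin.version == end.version and begin <= ip <= end

def package_matches(ip, include_ranges=(), exclude_ranges=(), specifics=()):
    """Check an interface address against the range entries of a package."""
    if any(in_range(ip, b, e) for b, e in exclude_ranges):
        return False
    if ipaddress.ip_address(ip) in {ipaddress.ip_address(s) for s in specifics}:
        return True
    return any(in_range(ip, b, e) for b, e in include_ranges)

includes = [
    ("1.1.1.1", "254.254.254.254"),
    ("::1", "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff"),
]
excludes = [("192.168.0.100", "192.168.0.104")]

print(package_matches("192.168.0.50", includes, excludes))
print(package_matches("192.168.0.102", includes, excludes))
```

The first address is covered by the IPv4 include-range, while the second falls into the excluded block.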

4.5.1. Response Time Configuration

The definition of Polling Packages allows similar services to be configured with different polling intervals. All response time measurements are persisted in RRD files and require a definition. Each Polling Package contains an RRD definition.

RRD configuration for Polling Package example1
<package name="example1">
  <filter>IPADDR != '0.0.0.0'</filter>
  <include-range begin="1.1.1.1" end="254.254.254.254" />
  <include-range begin="::1" end="ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff" />
  <rrd step="300">(1)
    <rra>RRA:AVERAGE:0.5:1:2016</rra>(2)
    <rra>RRA:AVERAGE:0.5:12:1488</rra>(3)
    <rra>RRA:AVERAGE:0.5:288:366</rra>(4)
    <rra>RRA:MAX:0.5:288:366</rra>(5)
    <rra>RRA:MIN:0.5:288:366</rra>(6)
  </rrd>
1 The polling interval for all services in this Polling Package is reflected in the step size of 300 seconds. All services in this package have to be polled at a 5 minute interval, otherwise response time measurements are not persisted correctly.
2 1 step size is persisted 2016 times: 1 * 5 min * 2016 = 7 d, 5 min accuracy for 7 d.
3 12 steps average persisted 1488 times: 12 * 5 min * 1488 = 62 d, aggregated to 60 min for 62 d.
4 288 steps average persisted 366 times: 288 * 5 min * 366 = 366 d, aggregated to 24 h for 366 d.
5 288 steps maximum from 24 h persisted for 366 d.
6 288 steps minimum from 24 h persisted for 366 d.
The RRD configuration and the service polling interval have to be aligned. Otherwise the persisted response time data is not displayed correctly in the response time graph.
If the polling interval is changed afterwards, existing RRD files need to be recreated with the new definitions.
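The retention arithmetic in the callouts above can be reproduced with a few lines of Python. The helper name is hypothetical; the sketch only splits the RRA string into its fields and multiplies step, steps and rows.

```python
def rra_retention(step_seconds, rra):
    """Return (consolidation function, resolution in s, retention in s)."""
    _, cf, _xff, steps, rows = rra.split(":")
    resolution = step_seconds * int(steps)   # seconds covered by one row
    return cf, resolution, resolution * int(rows)

DAY = 24 * 60 * 60
for rra in ("RRA:AVERAGE:0.5:1:2016",
            "RRA:AVERAGE:0.5:12:1488",
            "RRA:AVERAGE:0.5:288:366"):
    cf, resolution, retention = rra_retention(300, rra)
    print(f"{cf}: one value per {resolution // 60} min, kept for {retention // DAY} d")
```

For a step of 300 seconds this yields 7, 62 and 366 days, matching the values given for callouts 2 to 4.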

4.5.2. Overlapping Services

With the possibility of specifying multiple Polling Packages it is possible to use the same Service, like ICMP, multiple times. The order in which Polling Packages are defined in poller-configuration.xml is important when IP Interfaces match multiple Polling Packages with the same Service configuration.

The following example shows which configuration is applied for a specific service:

Overwriting
<package name="less-specific">
  <filter>IPADDR != '0.0.0.0'</filter>
  <include-range begin="1.1.1.1" end="254.254.254.254" />
  <include-range begin="::1" end="ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff" />
  <rrd step="300">(1)
    <rra>RRA:AVERAGE:0.5:1:2016</rra>
    <rra>RRA:AVERAGE:0.5:12:1488</rra>
    <rra>RRA:AVERAGE:0.5:288:366</rra>
    <rra>RRA:MAX:0.5:288:366</rra>
    <rra>RRA:MIN:0.5:288:366</rra>
  </rrd>
  <service name="ICMP" interval="300000" user-defined="false" status="on">(2)
    <parameter key="retry" value="5" />(3)
    <parameter key="timeout" value="10000" />(4)
    <parameter key="rrd-repository" value="/var/lib/opennms/rrd/response" />
    <parameter key="rrd-base-name" value="icmp" />
    <parameter key="ds-name" value="icmp" />
  </service>
  <downtime interval="30000" begin="0" end="300000" />
  <downtime interval="300000" begin="300000" end="43200000" />
  <downtime interval="600000" begin="43200000" end="432000000" />
</package>

<package name="more-specific">
  <filter>IPADDR != '0.0.0.0'</filter>
  <include-range begin="192.168.1.1" end="192.168.1.254" />
  <include-range begin="2600::1" end="2600::ffff" />
  <rrd step="30">(1)
    <rra>RRA:AVERAGE:0.5:1:20160</rra>
    <rra>RRA:AVERAGE:0.5:12:14880</rra>
    <rra>RRA:AVERAGE:0.5:288:3660</rra>
    <rra>RRA:MAX:0.5:288:3660</rra>
    <rra>RRA:MIN:0.5:288:3660</rra>
  </rrd>
  <service name="ICMP" interval="30000" user-defined="false" status="on">(2)
    <parameter key="retry" value="2" />(3)
    <parameter key="timeout" value="3000" />(4)
    <parameter key="rrd-repository" value="/var/lib/opennms/rrd/response" />
    <parameter key="rrd-base-name" value="icmp" />
    <parameter key="ds-name" value="icmp" />
  </service>
  <downtime interval="10000" begin="0" end="300000" />
  <downtime interval="300000" begin="300000" end="43200000" />
  <downtime interval="600000" begin="43200000" end="432000000" />
</package>
1 The polling intervals in the packages are 300 seconds and 30 seconds
2 Different polling interval for the service ICMP
3 Different retry settings for the service ICMP
4 Different timeout settings for the service ICMP

The last matching Polling Package for the service is applied. This can be used to define a less specific catch-all filter for a default configuration. A more specific Polling Package can be used to overwrite the default setting. In the example above all IP Interfaces in 192.168.1.0/24 or 2600::/64 will be monitored with ICMP with different polling, retry and timeout settings.
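The overwriting behavior can be sketched as a loop in which a later matching package replaces an earlier one. The following Python example is an illustration only: the data structure and function name are hypothetical, only the IPv4 ranges of the two example packages are modeled, and the filter expression is ignored.

```python
import ipaddress

# Packages in the same order as in poller-configuration.xml.
PACKAGES = [
    {"name": "less-specific", "range": ("1.1.1.1", "254.254.254.254"),
     "services": {"ICMP": {"interval": 300000, "retry": 5, "timeout": 10000}}},
    {"name": "more-specific", "range": ("192.168.1.1", "192.168.1.254"),
     "services": {"ICMP": {"interval": 30000, "retry": 2, "timeout": 3000}}},
]

def effective_config(ip, service, packages=PACKAGES):
    """Return (package name, service settings) of the last matching package."""
    addr = ipaddress.ip_address(ip)
    chosen = None
    for package in packages:
        begin, end = (ipaddress.ip_address(x) for x in package["range"])
        if begin <= addr <= end and service in package["services"]:
            chosen = (package["name"], package["services"][service])
    return chosen

print(effective_config("192.168.1.10", "ICMP"))
print(effective_config("10.0.0.1", "ICMP"))
```

The address 192.168.1.10 matches both packages and ends up with the settings of more-specific, while 10.0.0.1 only matches less-specific.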

Which Polling Packages are applied to the IP Interface and Service can be found in the Web User Interface. The IP Interface and Service page show which Polling Package and Service configuration is applied for this specific service.

Figure 27. Polling Package applied to IP interface and Service

4.5.3. Test Services manually

For troubleshooting it is possible to run a test via the Karaf Shell:

ssh -p 8101 admin@localhost

Once in the shell, you can show the command's help as follows:

opennms> poller:test --help
DESCRIPTION
        poller:test

        Execute a poller test from the command line using current settings from poller-configuration.xml

SYNTAX
        poller:test [options]

OPTIONS
        -s, --service
                Service name
        -p, --param
                Service parameter ~ key=value
        -i, --ipaddress
                IP Address to test
        -P, --package
                Poller Package
        -c, --class
                Monitor Class
        --help
                Display this help message

The following example runs the ICMP monitor on a specific IP Interface.

Run ICMP monitor configuration defined in specific Polling Package
opennms> poller:test -i 10.23.42.1 -s ICMP -P example1

The output is verbose, which allows debugging of Monitor configurations. The important output lines are shown below:

Important output when testing a service on the CLI
Checking service ICMP on IP 10.23.42.1 (1)
Package: example1 (2)
Monitor: org.opennms.netmgt.poller.monitors.IcmpMonitor (3)
Parameter ds-name : icmp (4)
Parameter rrd-base-name : icmp (4)
Parameter rrd-repository : /var/lib/opennms/rrd/response (4)
Parameter retry : 2 (5)
Parameter timeout : 3000 (5)

Available ? true (status Up[1])
1 Service and IP Interface to run the test
2 Applied Service configuration from Polling Package for this test
3 Service Monitor used for this test
4 RRD configuration for response time measurement
5 Retry and timeout settings for this test

4.5.4. Test filters on Karaf Shell

Filters are ubiquitous in OpenNMS Horizon configurations with <filter> syntax. The Karaf shell can be used to verify filters. For more info, refer to Filters.

ssh -p 8101 admin@localhost

Once in the shell, print the command help as follows:

opennms> filters:filter --help
DESCRIPTION
        filters:filter
	Enumerates nodes/interfaces that match a give filter
SYNTAX
        filters:filter filterRule
ARGUMENTS
        filterRule
                A filter Rule

For example, run a filter rule that matches a location:

filters:filter  "location='MINION'"

Output is displayed as follows

nodeId=2 nodeLabel=00000000-0000-0000-0000-000000ddba11 location=MINION
	IpAddresses:
		127.0.0.1

As another example, run a filter that matches a node location and a given IP address range. Refer to IPLIKE for more info on using IPLIKE syntax.

filters:filter "location='Default' & (IPADDR IPLIKE 172.*.*.*)"

Output is displayed as follows

nodeId=3 nodeLabel=label1 location=Default
	IpAddresses:
		172.10.154.1
		172.20.12.12
		172.20.2.14
		172.01.134.1
		172.20.11.15
		172.40.12.18

nodeId=5 nodeLabel=label2 location=Default
	IpAddresses:
		172.17.0.111

nodeId=6 nodeLabel=label3 location=Default
	IpAddresses:
		172.20.12.22
		172.17.0.123
The node info displayed includes nodeId, nodeLabel, location and optional fields like foreignId, foreignSource and categories when they exist.
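To illustrate what an IPLIKE pattern such as 172.*.*.* selects, the following Python sketch matches IPv4 addresses octet by octet. It is a simplified stand-in and not the IPLIKE implementation: the function name is hypothetical, and support for ranges (0-3) and comma lists (1,2,3) per octet is an assumption about the pattern syntax. An octet that mixes both forms is not handled.

```python
def iplike(ip, pattern):
    """Match a dotted IPv4 address against a simplified IPLIKE pattern."""
    octets, parts = ip.split("."), pattern.split(".")
    if len(octets) != 4 or len(parts) != 4:
        return False
    for octet, part in zip(octets, parts):
        if part == "*":
            continue  # wildcard matches any value
        if "-" in part:
            low, high = part.split("-")
            if not int(low) <= int(octet) <= int(high):
                return False
        elif int(octet) not in {int(p) for p in part.split(",")}:
            return False
    return True

addresses = ["172.20.12.12", "127.0.0.1", "172.17.0.111"]
print([ip for ip in addresses if iplike(ip, "172.*.*.*")])
```

Only the two addresses starting with 172 pass the filter in this example.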

5. Performance Management

In OpenNMS Horizon collection of performance data is done by the Collectd daemon. Management Agents and protocols to access performance data are implemented in Collectors. These Collectors are scheduled and run in parallel in a globally defined Thread Pool in Collectd.

This section describes how to configure Collectd for performance data collection with all available Collectors coming with OpenNMS Horizon.

5.1. Collectd Configuration

Table 10. Configuration and log files related to Collectd

| File | Description |
|---|---|
| $OPENNMS_HOME/etc/collectd-configuration.xml | Configuration file for global Collectd daemon and Collectors configuration |
| $OPENNMS_HOME/logs/collectd.log | Log file for all Collectors and the global Collectd daemon |
| $OPENNMS_HOME/etc/snmp-graph.properties | RRD graph definitions to render performance data measurements in the Web UI |
| $OPENNMS_HOME/etc/snmp-graph.properties.d | Directory with RRD graph definitions for devices and applications to render performance data measurements in the Web UI |
| $OPENNMS_HOME/etc/events/opennms.events.xml | Event definitions for Collectd, e.g. dataCollectionSucceeded and dataCollectionFailed |
| $OPENNMS_HOME/etc/resource-types.d | Directory to store generic resource type definitions. |

To change the behavior for performance data collection, the collectd-configuration.xml file can be modified. The configuration file is structured in the following parts:

  • Global daemon config: Define the size of the used Thread Pool to run Collectors in parallel.

  • Collection packages: Packages to allow the grouping of configuration parameters for Collectors.

  • Collection service association: Based on the name of the collection service, the implementation for application or network management protocols is assigned.

Figure 28. Collectd overview for associated files and configuration

The global behavior, especially the size of the Thread Pool for Collectd, is configured in the collectd-configuration.xml.

Global configuration parameters for Collectd
<collectd-configuration
        threads="50"> (1)
1 Size of the Thread Pool to run Collectors in parallel

5.1.1. Resource Types

Resource Types are used to group sets of performance data measurements for persisting, indexing, and display in the Web UI. Each resource type has a unique name, label definitions for display in the Web UI, and strategy definitions for archiving the measurements for long term analysis.

There are two labels for a resource type. The first, label, defines a string to display in the Web UI. The second, resourceLabel, defines the template used when displaying the name of each unique group of measurements for the resource type.

There are two types of strategy definitions for resource types: persistence selector and storage strategies. The persistence selector strategy filters the group indexes down to a subset for storage on disk. The storage strategy is used to convert an index into a resource path label for persistence. There are two special resource types that do not have a resource-type definition. They are node and ifIndex.

Resource Types can be defined inside files in either $OPENNMS_HOME/etc/resource-types.d or $OPENNMS_HOME/etc/datacollection, with the latter being specific for SNMP.

Here is the diskIOIndex resource type definition from $OPENNMS_HOME/etc/datacollection/netsnmp.xml:

<resourceType name="diskIOIndex" label="Disk IO (UCD-SNMP MIB)" resourceLabel="${diskIODevice} (index ${index})">
  <persistenceSelectorStrategy class="org.opennms.netmgt.collectd.PersistRegexSelectorStrategy">
    <parameter key="match-expression" value="not(#diskIODevice matches '^(loop|ram).*')" />
  </persistenceSelectorStrategy>
  <storageStrategy class="org.opennms.netmgt.dao.support.SiblingColumnStorageStrategy">
    <parameter key="sibling-column-name" value="diskIODevice" />
    <parameter key="replace-all" value="s/^-//" />
    <parameter key="replace-all" value="s/\s//" />
    <parameter key="replace-all" value="s/:\\.*//" />
  </storageStrategy>
</resourceType>
Persistence Selector Strategies
Table 11. Persistence Selector Strategies

| Class | Description |
|---|---|
| org.opennms.netmgt.collectd.PersistAllSelectorStrategy | Persist All indexes |
| org.opennms.netmgt.collectd.PersistRegexSelectorStrategy | Persist indexes based on JEXL evaluation |

PersistRegexSelectorStrategy

The PersistRegexSelectorStrategy class takes a single parameter, match-expression, which defines a JEXL expression. On evaluation, this expression should return either true (persist the index to storage) or false (discard the data).
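The effect of the match-expression used in the diskIOIndex definition above can be imitated in Python. This is only an analogy in a different language: the function name is hypothetical, and it assumes that the matches operator has to match the complete device name.

```python
import re

# Same pattern as in the diskIOIndex match-expression.
LOOP_OR_RAM = re.compile(r"^(loop|ram).*")

def persist(disk_io_device):
    """True means the index is persisted, False means the data is discarded."""
    return LOOP_OR_RAM.fullmatch(disk_io_device) is None

for device in ("sda", "loop0", "ram1", "nvme0n1"):
    print(device, persist(device))
```

Loop and RAM devices are discarded, while devices such as sda are persisted.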

Storage Strategies
Table 12. Storage Strategies

| Class | Storage Path Value |
|---|---|
| org.opennms.netmgt.collection.support.IndexStorageStrategy | Index |
| org.opennms.netmgt.collection.support.JexlIndexStorageStrategy | Value after JexlExpression evaluation |
| org.opennms.netmgt.collection.support.ObjectNameStorageStrategy | Value after JexlExpression evaluation |
| org.opennms.netmgt.dao.support.FrameRelayStorageStrategy | interface label + '.' + dlci |
| org.opennms.netmgt.dao.support.HostFileSystemStorageStrategy | Uses the value from the hrStorageDescr column in the hrStorageTable, cleaned up for unix filesystems. |
| org.opennms.netmgt.dao.support.SiblingColumnStorageStrategy | Uses the value from an SNMP lookup of OID in sibling-column-name parameter, cleaned up for unix filesystems. |
| org.opennms.protocols.xml.collector.XmlStorageStrategy | Index, but cleaned up for unix filesystems. |

IndexStorageStrategy

The IndexStorageStrategy takes no parameters.

JexlIndexStorageStrategy

The JexlIndexStorageStrategy takes two parameters, index-format which is required, and clean-output which is optional.

| Parameter | Description |
|---|---|
| index-format | The JexlExpression to evaluate |
| clean-output | Boolean to indicate whether the index value is cleaned up. |

If the index value will be cleaned up, then it will have all whitespace, colons, forward and back slashes, and vertical bars replaced with underscores. All equal signs are removed.

This class can be extended to create custom storage strategies by overriding the updateContext method to set additional key/value pairs to use in your index-format template.

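import org.apache.commons.jexl2.JexlContext;
import org.opennms.netmgt.collection.api.CollectionResource;
import org.opennms.netmgt.collection.support.JexlIndexStorageStrategy;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ExampleStorageStrategy extends JexlIndexStorageStrategy {

    private static final Logger LOG = LoggerFactory.getLogger(ExampleStorageStrategy.class);

    public ExampleStorageStrategy() {
        super();
    }

    // Expose the resource instance as ${Example} in the index-format template.
    @Override
    public void updateContext(JexlContext context, CollectionResource resource) {
        context.set("Example", resource.getInstance());
    }
}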
ObjectNameStorageStrategy

The ObjectNameStorageStrategy extends the JexlIndexStorageStrategy, so its requirements are the same. Extra key/value pairs are added to the JexlContext which can then be used in the index-format template. The original index string is converted to an ObjectName and can be referenced as ${ObjectName}. The domain from the ObjectName can be referenced as ${domain}. All key properties from the ObjectName can also be referenced by ${key}.

This storage strategy is meant to be used with JMX MBean datacollections where multiple MBeans can return the same set of attributes. As of OpenNMS Horizon 20, this is only supported using an HTTP to JMX proxy and using the XmlCollector, as the JmxCollector does not yet support indexed groups.

Given an MBean like java.lang:type=MemoryPool,name=Survivor Space, and a storage strategy like this:

<storageStrategy class="org.opennms.netmgt.collection.support.ObjectNameStorageStrategy">
  <parameter key="index-format" value="${domain}_${type}_${name}" />
  <parameter key="clean-output" value="true" />
</storageStrategy>

Then the index value would be java_lang_MemoryPool_Survivor_Space.

FrameRelayStorageStrategy

The FrameRelayStorageStrategy takes no parameters.

HostFileSystemStorageStrategy

The HostFileSystemStorageStrategy takes no parameters. This class is marked as deprecated, and can be replaced with:

<storageStrategy class="org.opennms.netmgt.dao.support.SiblingColumnStorageStrategy">
  <parameter key="sibling-column-name" value="hrStorageDescr" />
  <parameter key="replace-first" value="s/^-$/_root_fs/" />
  <parameter key="replace-all" value="s/^-//" />
  <parameter key="replace-all" value="s/\\s//" />
  <parameter key="replace-all" value="s/:\\\\.*//" />
</storageStrategy>
SiblingColumnStorageStrategy
| Parameter | Description |
|---|---|
| sibling-column-name | Alternate string value to use for index |
| replace-first | Regex Pattern, replaces only the first match |
| replace-all | Regex Pattern, replaces all matches |

Values for replace-first and replace-all must match the pattern s/regex/replacement/ or an error will be thrown.
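How a chain of replace-first and replace-all values transforms an index string can be sketched in Python. This is an illustration, not the SiblingColumnStorageStrategy code: the function names are hypothetical, Python's regex dialect differs from Java's in details, and the simple parser assumes that neither the regex nor the replacement contains a forward slash.

```python
import re

def parse_substitution(expr):
    """Split an s/regex/replacement/ expression into its two parts."""
    match = re.fullmatch(r"s/(.*)/(.*)/", expr)
    if match is None:
        raise ValueError(f"not of the form s/regex/replacement/: {expr!r}")
    return match.group(1), match.group(2)

def apply_rules(value, rules):
    """Apply (kind, expression) rules in the order in which they are listed."""
    for kind, expr in rules:
        pattern, replacement = parse_substitution(expr)
        count = 1 if kind == "replace-first" else 0  # 0 replaces all matches
        value = re.sub(pattern, replacement, value, count=count)
    return value

RULES = [
    ("replace-first", r"s/^-$/_root_fs/"),
    ("replace-all", r"s/^-//"),
    ("replace-all", r"s/\s//"),
    ("replace-all", r"s/:\\.*//"),
]

for raw in ("-", "-var-log", "C:\\ Label"):
    print(repr(raw), "->", repr(apply_rules(raw, RULES)))
```

A value that does not follow the s/regex/replacement/ form raises a ValueError in this sketch, mirroring the error mentioned above.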

XmlStorageStrategy

The XmlStorageStrategy takes no parameters. The index value will have all whitespace, colons, forward and back slashes, and vertical bars replaced with underscores. All equal signs are removed.
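The clean-up rule stated for XmlStorageStrategy and for clean-output can be written as two string operations. The following Python sketch follows only the wording of that rule; the function name is hypothetical, and replacing every matching character individually (rather than collapsing runs) is an assumption.

```python
import re

def clean_index(value):
    """Remove equal signs, then replace whitespace, colons, slashes,
    backslashes and vertical bars with underscores."""
    value = value.replace("=", "")
    return re.sub(r"[\s:/\\|]", "_", value)

print(clean_index("Survivor Space"))
print(clean_index("a:b/c\\d|e=f"))
```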

5.2. Collection Packages

To define more complex collection configurations it is possible to group Service configurations which provide performance metrics into Collection Packages. They allow different Service Configurations to be assigned to Nodes in order to differentiate the collection of performance metrics and connection settings. To assign a Collection Package to nodes, the Rules/Filters syntax can be used.

Multiple packages can be configured, and an interface can exist in more than one package. This gives great flexibility in how the service levels are determined for a given device. The order in which Collection Packages are defined is important when IP Interfaces match multiple Collection Packages with the same Service configuration. The last matching Collection Package for the service is applied. This can be used to define a less specific catch-all filter for a default configuration. A more specific Collection Package can be used to overwrite the default setting.

Collection Package Attributes
<package name="package1">(1)
  <filter>IPADDR != '0.0.0.0'</filter>(2)
  <include-range begin="1.1.1.1" end="254.254.254.254"/>(3)
  <include-range begin="::1" end="ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff"/>(4)
1 Unique name of the collection package.
2 Apply this package to all IP interfaces with a configured IPv4 address (not equal to 0.0.0.0)
3 Evaluate IPv4 rule to collect for all IPv4 interfaces in the given range
4 Evaluate IPv6 rule to collect for all IPv6 interfaces in the given range

5.2.1. Service Configurations

Service Configurations define which Collector to use and which performance metrics need to be collected. Service Configurations contain common Service Attributes as well as Collector-specific parameters.

Service Configuration Attributes
<service name="SNMP"(1)
         interval="300000"(2)
         user-defined="false"(3)
         status="on">(4)
  <parameter key="collection" value="default"/>(5)
  <parameter key="thresholding-enabled" value="true"/>(6)
</service>

<collector service="SNMP" class-name="org.opennms.netmgt.collectd.SnmpCollector"/>(7)
1 Service Configuration name which is mapped to a specific Collector implementation.
2 The interval at which the service is to be collected (in milliseconds).
3 Marker to say if service is user defined, used specifically for UI purposes.
4 Service is collected only if on.
5 Assign performance data collection metric groups named default.
6 Enable threshold evaluation for metrics provided by this service.
7 Run the SnmpCollector implementation for the service named SNMP
Figure 29. Configuration overview for data collection with Collectd
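To check which services a package actually collects, the service attributes described above can be read with Python's standard XML parser. This is a sketch for illustration: the function name is hypothetical, and the embedded snippet is a cut-down package without the XML namespace that a real collectd-configuration.xml may declare, in which case the tag names would need the namespace prefix.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<package name="package1">
  <service name="SNMP" interval="300000" user-defined="false" status="on">
    <parameter key="collection" value="default"/>
    <parameter key="thresholding-enabled" value="true"/>
  </service>
  <service name="JMX" interval="300000" user-defined="false" status="off"/>
</package>
"""

def active_services(xml_text):
    """Map each service with status="on" to (interval in seconds, parameters)."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for service in root.iter("service"):
        if service.get("status") != "on":
            continue  # services that are not switched on are not collected
        params = {p.get("key"): p.get("value") for p in service.findall("parameter")}
        result[service.get("name")] = (int(service.get("interval")) // 1000, params)
    return result

print(active_services(SNIPPET))
```

Only the SNMP service is listed, with an interval of 300 seconds and its two parameters.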

6. Events

Events are central to the operation of the OpenNMS Horizon platform, so it’s critical to have a firm grasp of this topic.

Whenever something in OpenNMS Horizon appears to work by magic, it’s probably events working behind the curtain.

6.1. Anatomy of an Event

Events are structured historical records of things that happen in OpenNMS Horizon and the nodes, interfaces, and services it manages. Every event has a number of fixed fields and zero or more parameters.

Mandatory Fields
UEI (Universal Event Identifier)

A string uniquely identifying the event’s type. UEIs are typically formatted in the style of a URI, but the only requirement is that they start with the string uei..

Event Label

A short, static label summarizing the gist of all instances of this event.

Description

A long-form description describing all instances of this event.

Log Message

A long-form log message describing this event, optionally including expansions of fields and parameters so that the value is tailored to the event at hand.

Severity

A severity for this event type. Possible values range from Cleared to Critical.

Event ID

A numeric identifier used to look up a specific event in the OpenNMS Horizon system.

Notable Optional Fields
Operator Instruction

A set of instructions for an operator to respond appropriately to an event of this type.

Alarm Data

If this field is provided for an event, OpenNMS Horizon will create, update, or clear alarms for events of that type according to the alarm-data specifics.

6.2. Sources of Events

Events may originate within OpenNMS Horizon itself or from outside.

Internally-generated events can be the result of the platform’s monitoring and management functions (e.g. a monitored node becoming totally unavailable results in an event with the UEI uei.opennms.org/nodes/nodeDown) or they may act as inputs or outputs of housekeeping processes.

The following subsections summarize the mechanisms by which externally-created events can arrive.

6.3. The Event Bus

At the heart of OpenNMS Horizon lies an event bus. Any OpenNMS Horizon component can publish events to the bus, and any component can subscribe to receive events of interest that have been published on the bus. This publish-subscribe model enables components to use events as a mechanism to send messages to each other. For example, the provisioning subsystem of OpenNMS Horizon publishes a node-added event whenever a new node is added to the system. Other subsystems with an interest in new nodes subscribe to the node-added event and automatically receive these events, so they know to start monitoring and managing the new node if their configuration dictates. The publisher and subscriber components do not need to have any knowledge of each other, allowing for a clean division of labor and lessening the programming burden to add entirely new OpenNMS Horizon subsystems or modify the behavior of existing ones.

6.3.1. Associate an Event to a given node

There are two ways to associate an existing node with an event before sending it to the Event Bus:

  • Set the nodeId of the node in question on the event.

  • For requisitioned nodes, set _foreignSource and _foreignId as parameters on the event. Any incoming event that carries these two parameters but no nodeId triggers a lookup in the database; if a node is found, the nodeId attribute is set on the event dynamically, regardless of which method was used to send it to the Event Bus.
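As a rough illustration of the second method, an event carrying the two parameters might look like the fragment below. This is a sketch rather than a complete message: the UEI, the foreign source name Servers, and the foreign ID 1001 are made-up values, the parms/parm layout is assumed from the usual Eventd XML event format, and the surrounding envelope and the remaining event fields are omitted.

```xml
<event>
  <!-- Hypothetical UEI. No nodeid is set, so the node has to be looked up. -->
  <uei>uei.example.org/app/backupFailed</uei>
  <parms>
    <!-- Name of the requisition (foreign source) the node belongs to; example value -->
    <parm>
      <parmName>_foreignSource</parmName>
      <value type="string" encoding="text">Servers</value>
    </parm>
    <!-- ID of the node within that foreign source; example value -->
    <parm>
      <parmName>_foreignId</parmName>
      <value type="string" encoding="text">1001</value>
    </parm>
  </parms>
</event>
```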

6.4. Event Configuration

The back-end configuration surrounding events is broken into two areas: the configuration of Eventd itself, and the configuration of all types of events known to OpenNMS Horizon.

6.4.1. The eventd-configuration.xml file

The overall behavior of Eventd is configured in the file OPENNMS_HOME/etc/eventd-configuration.xml. This file does not need to be changed in most installations. The configurable items include:

TCPAddress

The IP address to which the Eventd XML/TCP listener will bind. Defaults to 127.0.0.1.

TCPPort

The TCP port number on TCPAddress to which the Eventd XML/TCP listener will bind. Defaults to 5817.

UDPAddress

The IP address to which the Eventd XML/UDP listener will bind. Defaults to 127.0.0.1.

UDPPort

The UDP port number on UDPAddress to which the Eventd XML/UDP listener will bind. Defaults to 5817.

receivers

The number of threads allocated to service the event intake work done by Eventd.

queueLength

The maximum number of events that may be queued for processing. Additional events will be dropped. Defaults to unlimited.

getNextEventID

An SQL query statement used to retrieve the ID of the next new event. Changing this setting is not recommended.

socketSoTimeoutRequired

Whether to set a timeout value on the Eventd receiver socket.

socketSoTimeoutPeriod

The socket timeout, in milliseconds, to set if socketSoTimeoutRequired is set to yes.

logEventSummaries

Whether to log a simple (terse) summary of every event at level INFO. Useful when troubleshooting event processing on busy systems where DEBUG logging is not practical.
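For orientation, a minimal eventd-configuration.xml using the items above might look like the following sketch. The addresses and ports are the defaults stated above; the root element name and the values for receivers, getNextEventID, and the socket timeout are assumptions recalled from a typical installation and should be checked against the file shipped with your release.

```xml
<!-- Sketch only: queueLength and logEventSummaries are left out, so their defaults apply. -->
<!-- Values other than the addresses and ports are illustrative assumptions. -->
<EventdConfiguration
   TCPAddress="127.0.0.1"
   TCPPort="5817"
   UDPAddress="127.0.0.1"
   UDPPort="5817"
   receivers="5"
   getNextEventID="SELECT nextval('eventsNxtId')"
   socketSoTimeoutRequired="yes"
   socketSoTimeoutPeriod="3000">
</EventdConfiguration>
```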

6.4.2. The eventconf.xml file and its tributaries

The set of known events is configured in OPENNMS_HOME/etc/eventconf.xml. This file opens with a <global> element, whose <security> child element defines which event fields may not be overridden in the body of an event submitted via any Eventd listener. This mechanism stops a malicious actor from, for instance, sending an event whose operator-action field amounts to a phishing attack.

After the <global> element, this file consists of a series of <event-file> elements. The content of each <event-file> element specifies the path of a tributary file whose contents will be read and incorporated into the event configuration. These paths are resolved relative to the OPENNMS_HOME/etc directory; absolute paths are not allowed.
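The overall shape of eventconf.xml described above can be sketched as follows. The doNotOverride element name and the list of protected fields are assumptions based on a typical shipped file, and the tributary file names are hypothetical placeholders, so treat this as an outline rather than a file to copy.

```xml
<events xmlns="http://xmlns.opennms.org/xsd/eventconf">
  <global>
    <security>
      <!-- Fields that a submitted event may not override (assumed typical defaults) -->
      <doNotOverride>logmsg</doNotOverride>
      <doNotOverride>operaction</doNotOverride>
      <doNotOverride>autoaction</doNotOverride>
      <doNotOverride>tticket</doNotOverride>
      <doNotOverride>script</doNotOverride>
    </security>
  </global>
  <!-- Paths are resolved relative to OPENNMS_HOME/etc; both names are hypothetical -->
  <event-file>events/example.custom.events.xml</event-file>
  <event-file>events/example.vendor.events.xml</event-file>
</events>
```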

Each tributary file contains a top-level <events> element with one or more <event> child elements. Consider the following event definition:

   <event>
      <uei>uei.opennms.org/nodes/nodeLostService</uei>
      <event-label>OpenNMS-defined node event: nodeLostService</event-label>
      <descr>&lt;p>A %service% outage was identified on interface
            %interface% because of the following condition: %parm[eventReason]%.&lt;/p> &lt;p>
            A new Outage record has been created and service level
            availability calculations will be impacted until this outage is
            resolved.&lt;/p></descr>
      <logmsg dest="logndisplay">
            %service% outage identified on interface %interface%.
        </logmsg>
      <severity>Minor</severity>
      <alarm-data reduction-key="%uei%:%dpname%:%nodeid%:%interface%:%service%" alarm-type="1" auto-clean="false"/>
   </event>

Every event definition has this same basic structure. See Anatomy of an Event for a discussion of the structural elements.

A word about severities

When setting severities of events, it’s important to consider each event in the context of your infrastructure as a whole. Events whose severity is critical at the zoomed-in level of a single device may not merit a Critical severity in the zoomed-out view of your entire enterprise. Since an event with Critical severity can never have its alarms escalated, this severity level should usually be reserved for events that unequivocally indicate a truly critical impact to the business. Rock legend Nigel Tufnel offered some wisdom on the subject.

Replacement tokens

Various tokens can be included in the description, log message, operator instruction and automatic actions for each event. These tokens will be replaced by values from the current event when the text for the event is constructed. Not all events will have values for all tokens, and some refer specifically to information available only in events derived from SNMP traps.

%eventid%

The event’s numeric database ID.

%uei%

The Universal Event Identifier for the event.

%source%

The source of the event (which OpenNMS Horizon service daemon created it).

%time%

The time of the event.

%dpname%

The ID of the Minion (formerly distributed poller) that the event was received on.

%nodeid%

The numeric node ID of the device that caused the event, if any.

%nodelabel%

The node label for the node given in %nodeid% if available.

%host%
%interface%

The IP interface associated with the event, if any.

%interfaceresolv%

Does a reverse lookup on the %interface% and returns its name if available.

%service%

The service associated with the event, if any.

%severity%

The severity of the event.

%snmphost%

The host of the SNMP agent that generated the event.

%id%

The SNMP Enterprise OID for the event.

%idtext%

The decoded (human-readable) SNMP Enterprise OID for the event (?).

%ifalias%

The interface's SNMP ifAlias.

%generic%

The Generic trap-type number for the event.

%specific%

The Specific trap-type number for the event.

%community%

The community string for the trap.

%version%

The SNMP version of the trap.

%snmp%

The SNMP information associated with the event.

%operinstruct%

The operator instructions for the event.

%mouseovertext%

The mouse over text for the event.

Parameter tokens

Many events carry additional information in parameters (see Anatomy of an Event). These parameters may start life as SNMP trap variable bindings, or varbinds for short. You can access event parameters using the parm replacement token, which takes several forms:

%parm[all]%

Space-separated list of all parameter names and values in the form parmName1="parmValue1" parmName2="parmValue2" and so on.

%parm[values-all]%

Space-separated list of all parameter values (without their names) associated with the event.

%parm[names-all]%

Space-separated list of all parameter names (without their values) associated with the event.

%parm[<name>]%

Will return the value of the parameter named <name> if it exists.

%parm[##]%

Will return the total number of parameters as an integer.

%parm[#<num>]%

Will return the value of parameter number <num> (one-indexed).

%parm[name-#<num>]%

Will return the name of parameter number <num> (one-indexed).
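To show how these tokens combine in practice, here is a small sketch of an event definition that uses several of them. The UEI, the label, and the parameter name fanIndex are invented for illustration; only the token syntax is taken from the lists above.

```xml
<event>
  <!-- Hypothetical event type -->
  <uei>uei.example.org/hardware/fanFailure</uei>
  <event-label>Example event: fan failure</event-label>
  <!-- Named parameter, positional parameter, and the full name="value" list -->
  <descr>&lt;p>Fan %parm[fanIndex]% on node %nodelabel% reported a failure.
         The first parameter is named %parm[name-#1]% and has the value %parm[#1]%.&lt;/p>
         &lt;p>All parameters: %parm[all]%&lt;/p></descr>
  <!-- %parm[##]% expands to the number of parameters on the event -->
  <logmsg dest="logndisplay">
        Fan %parm[fanIndex]% failed on %interface% (%parm[##]% parameters received).
    </logmsg>
  <severity>Major</severity>
</event>
```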

The structure of the eventconf.xml tributary files

The ordering of event definitions is very important, as an incoming event is matched against them in order. It is possible and often useful to have several event definitions which could match variant forms of a given event, for example based on the values of SNMP trap variable bindings.

The tributary files included via the <event-file> tag have been broken up by vendor. When OpenNMS Horizon starts, each tributary file is loaded in order. The ordering of events inside each tributary file is also preserved.

The tributary files listed at the very end of eventconf.xml contain catch-all event definitions. When slotting your own event definitions, take care not to place them below these catch-all files; otherwise your definitions will be effectively unreachable.
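The effect of ordering can be sketched with two definitions for the same SNMP trap. The mask, maskelement, and varbind elements are not described in this section; they are assumed here from the eventconf schema. The enterprise OID, UEIs, and values are hypothetical.

```xml
<events xmlns="http://xmlns.opennms.org/xsd/eventconf">
  <!-- More specific definition first: same trap, but only when varbind 1 has the value 2 -->
  <event>
    <mask>
      <maskelement><mename>id</mename><mevalue>.1.3.6.1.4.1.99999</mevalue></maskelement>
      <maskelement><mename>generic</mename><mevalue>6</mevalue></maskelement>
      <maskelement><mename>specific</mename><mevalue>1</mevalue></maskelement>
      <varbind><vbnumber>1</vbnumber><vbvalue>2</vbvalue></varbind>
    </mask>
    <uei>uei.example.org/traps/linkStateCritical</uei>
    <event-label>Example trap: link state critical</event-label>
    <descr>&lt;p>Link state trap with status %parm[#1]%.&lt;/p></descr>
    <logmsg dest="logndisplay">Critical link state on %interface%.</logmsg>
    <severity>Major</severity>
  </event>
  <!-- General definition second: any other value of varbind 1 falls through to here -->
  <event>
    <mask>
      <maskelement><mename>id</mename><mevalue>.1.3.6.1.4.1.99999</mevalue></maskelement>
      <maskelement><mename>generic</mename><mevalue>6</mevalue></maskelement>
      <maskelement><mename>specific</mename><mevalue>1</mevalue></maskelement>
    </mask>
    <uei>uei.example.org/traps/linkStateChanged</uei>
    <event-label>Example trap: link state changed</event-label>
    <descr>&lt;p>Link state trap with status %parm[#1]%.&lt;/p></descr>
    <logmsg dest="logndisplay">Link state changed on %interface%.</logmsg>
    <severity>Warning</severity>
  </event>
</events>
```

If the two definitions were swapped, the general one would match every instance of the trap and the specific one would never be reached.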

A few tips
  • To save memory and shorten startup times, you may wish to remove event definition files that you know you do not need.

  • If you need to customize some events in one of the default tributary files, you may wish to make a copy of the file containing only the customized events, and slot the copy above the original; this practice will make it easier to maintain your customizations in case the default file changes in a future release of OpenNMS Horizon.

6.4.3. Reloading the event configuration

After making manual changes to OPENNMS_HOME/etc/eventconf.xml or any of its tributary files, you can trigger a reload of the event configuration by issuing the following command on the OpenNMS Horizon server:

OPENNMS_HOME/bin/send-event.pl uei.opennms.org/internal/reloadDaemonConfig -p 'daemonName Eventd'

6.5. Debugging

When debugging events, it may be helpful to lower the minimum severity at which Eventd will log from the default level of WARN. To change this setting, edit OPENNMS_HOME/etc/log4j2.xml and locate the following line:

        <KeyValuePair key="eventd"               value="WARN" />

Changes to log4j2.xml take effect within 60 seconds with no extra action needed. At level DEBUG, Eventd will log a verbose description of every event it handles to OPENNMS_HOME/logs/eventd.log. On busy systems, this setting may create so much noise as to be impractical. In these cases, you can get terse event summaries by setting Eventd to log at level INFO and setting logEventSummaries="yes" in OPENNMS_HOME/etc/eventd-configuration.xml. Note that changes to eventd-configuration.xml require a full restart of OpenNMS Horizon.
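The terse-summary setup for busy systems amounts to the following pair of changes, shown here as a sketch. The root element name EventdConfiguration is an assumption about the shipped file, and its remaining attributes are omitted for brevity; keep whatever your file already contains.

```xml
<!-- OPENNMS_HOME/etc/log4j2.xml: change the eventd level from WARN to INFO -->
<KeyValuePair key="eventd"               value="INFO" />

<!-- OPENNMS_HOME/etc/eventd-configuration.xml: add logEventSummaries to the root element. -->
<!-- Other existing attributes are omitted here and should be left as they are. -->
<EventdConfiguration
   TCPAddress="127.0.0.1"
   TCPPort="5817"
   logEventSummaries="yes">
</EventdConfiguration>
```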

7. Alarms

7.1. Alarm Notes

OpenNMS Horizon creates an Alarm for issues in the network. When working in a team, it is helpful to share information about a current Alarm. Alarm Notes can be used to attach comments to a specific Alarm or to a whole class of Alarms. The figure Alarm Detail View shows the component used to add this information as Memos to the Alarm.

Alarm Detail View

01 alarm notes

Alarm Notes lets you add two types of notes to an existing Alarm or Alarm Class:

  • Sticky Memo: A user-defined note for a specific instance of an Alarm. Deleting the Alarm will also delete the Sticky Memo.

  • Journal Memo: A user-defined note for a whole class of Alarms based on the resolved reduction key. The Journal Memo is shown for all Alarms matching a specific reduction key. Deleting an Alarm doesn’t remove the Journal Memo; it can be removed by pressing the "Clear" button on an Alarm that has the Journal Memo.

If an Alarm has a sticky and/or a Journal Memo it is indicated with two icons on the "Alarm list Summary" and "Alarm List Detail".

7.2. Alarm Sounds

Often users want an audible indication of a change in alarm state. The OpenNMS Horizon alarm list page has the optional ability to generate a sound either on each new alarm or (more annoyingly) on each change to an alarm event count on the page.

The figure Alarm Sounds View shows the alarm list page when alarm sounds are enabled.

Alarm Sounds View

01 alarm sound

By default the alarm sound feature is disabled. System Administrators must activate the sound feature and also set the default sound setting for all users. However, users can modify the default sound setting for the duration of their logged-in session using a drop-down menu with the following options:

  • Sound off: no sounds generated by the page.

  • Sound on new alarm: sounds generated for every new alarm on the page.

  • Sound on new alarm count: sounds generated for every increase in alarm event count for alarms on the page.

7.3. Flashing Unacknowledged Alarms

By default OpenNMS Horizon displays the alarm list page with acknowledged and unacknowledged alarms listed in separate search tabs. In a number of operational environments it is useful to see all of the alarms on the same page with unacknowledged alarms flashing to indicate that they haven’t yet been noticed by one of the team. This allows everyone to see at a glance the real time status of all alarms and which alarms still need attention.

The figure Alarm Sounds View also shows the alarm list page when flashing unacknowledged alarms are enabled. Alarms which are unacknowledged flash steadily. Alarms which have been acknowledged do not flash and also have a small tick beside the selection check box. All alarms can be selected to be escalated, cleared, acknowledged and unacknowledged.

7.4. Configuring Alarm Sounds and Flashing

By default OpenNMS Horizon does not enable alarm sounds or flashing alarms. The default settings are included in opennms.properties. However, rather than editing the default opennms.properties file, the system administrator should enable these features by creating a new file in opennms.properties.d and applying the following settings:

${OPENNMS_HOME}/etc/opennms.properties.d/alarm.listpage.properties

Configuration properties related to Alarm sound and flashing visualization
# ###### Alarm List Page Options ######
# Several options are available to change the default behaviour of the Alarm List Page.
# <opennms url>/opennms/alarm/list.htm
#
# The alarm list page has the ability to generate a sound either on each new alarm
# or (more annoyingly) on each change to an alarm event count on the page.
#
# Turn on the sound feature. Set true and Alarm List Pages can generate sounds in the web browser.
opennms.alarmlist.sound.enable=true
#
# Set the default setting for how the Alarm List Pages generates sounds. The default setting can be
# modified by users for the duration of their logged-in session using a drop down menu .
#    off = no sounds generated by the page.
#    newalarm = sounds generated for every new alarm in the page
#    newalarmcount = sounds generated for every increase in alarm event count for alarms on the page
#
opennms.alarmlist.sound.status=off

# By default the alarm list page displays acknowledged and unacknowledged alarms in separate search tabs
# Some users have asked to be able to see both on the same page. This option allows the alarm list page
# to display acknowledged and unacknowledged alarms on the same list but unacknowledged alarms
# flash until they are acknowledged.
#
opennms.alarmlist.unackflash=true

The sound played is determined by the contents of the following file: ${OPENNMS_HOME}/jetty-webapps/opennms/sounds/alert.wav

To change the sound, create a new WAV file with your desired sound, name it alert.wav, and replace the default file in the same directory.

8. Notifications

8.1. Introduction

OpenNMS Horizon uses notifications to make users aware of an event. Common notification methods are email and paging, but notification mechanisms also exist for:

  • Arbitrary HTTP GET and POST operations

  • Arbitrary external commands

  • Asterisk call origination

  • IRCcat Internet Relay Chat bot

  • SNMP Traps

  • Slack, Mattermost, and other API-compatible team chat platforms

  • Twitter, GNU Social, and other API-compatible microblog services

  • User-provided scripts in any JSR-223 compatible language

  • XMPP

The notification daemon Notifd creates and sends notifications according to configured rules when selected events occur in OpenNMS Horizon.

8.2. Getting Started

The status of notifications is indicated by an icon at the top right of the web UI’s navigation bar. OpenNMS Horizon installs with notifications globally disabled by default.

8.2.1. Enabling Notifications

To enable notifications in OpenNMS Horizon, log in to the web UI as a user with administrator privileges. Hover over the user icon and click the Configure OpenNMS link. The controls for global notification status appear in the top-level configuration menu as Notification Status. Click the On radio button and then the Update button. Notifications are now globally enabled.

The web workflow above is functionally equivalent to editing the notifd-configuration.xml file and setting status="on" in the top-level notifd-configuration element. This configuration file change is picked up on the fly with no need to restart or send an event.
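In file terms, the change amounts to the attribute shown in this sketch. Only the status attribute is taken from the text above; the other attributes and child elements of the real file are omitted and should be left untouched.

```xml
<!-- notifd-configuration.xml (sketch): existing attributes and child elements omitted -->
<notifd-configuration status="on">
  <!-- queue and auto-acknowledge settings from the shipped file stay unchanged -->
</notifd-configuration>
```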

8.2.2. Configuring Destination Paths

To configure notification destination paths in OpenNMS Horizon, navigate to Configure OpenNMS and, in the Event Management section, choose Configure Notifications. In the resulting dialog choose Configure Destination Paths.

The destination paths configuration is stored in the destinationPaths.xml file. Changes to this file are picked up on the fly with no need to restart or send an event.

8.2.3. Configuring Event Notifications

To configure notifications for individual events in OpenNMS Horizon, navigate to Configure OpenNMS and, in the Event Management section, choose Configure Notifications. Then choose Configure Event Notifications.

The event notification configuration is stored in the notifications.xml file. Changes to this file are picked up on the fly with no need to restart or send an event.
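A single entry in notifications.xml might look roughly like the sketch below. The element names are recalled from a typical shipped file and should be verified against your own; the destination path name Email-Admin and the message texts are example values.

```xml
<notification name="nodeDown" status="on">
  <!-- The event type that triggers this notification -->
  <uei>uei.opennms.org/nodes/nodeDown</uei>
  <!-- Filter rule limiting which nodes or interfaces qualify (example rule) -->
  <rule>IPADDR != '0.0.0.0'</rule>
  <!-- Name of a destination path defined in destinationPaths.xml (example name) -->
  <destinationPath>Email-Admin</destinationPath>
  <text-message>All services are down on node %nodelabel%.</text-message>
  <subject>Node %nodelabel% is down</subject>
</notification>
```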

8.3. Concepts

Notifications are how OpenNMS Horizon informs users about an event that happened in the network, without the users having to log in and look at the UI. The core concepts required to understand notifications are:

  • Events and UEIs

  • Users, Groups, and On-Call Roles

  • Duty Schedules

  • Destination Paths

  • Notification Commands

These concepts fit together to form an Event Notification Definition. Also related, but presently only loosely coupled to notifications, are Alarms and Acknowledgments.

8.3.1. Events and UEIs

As discussed in the chapter on Events, events are central to the operation of OpenNMS Horizon. Almost everything that happens in the system is the result of, or the cause of, one or more events. Every notification is triggered by exactly one event. A good understanding of events is therefore essential to a working knowledge of notifications.

Every event has a UEI (Universal Event Identifier), a string uniquely identifying the event’s type. UEIs are typically formatted in the style of a URI, but the only requirement is that they start with the string uei.. Most notifications are triggered by an exact UEI match (though they may also be triggered with partial UEI matches using regular expression syntax).

8.3.2. Users, Groups, and On-Call Roles

Users are entities with login accounts in the OpenNMS Horizon system. Ideally each user corresponds to a person. They are used to control access to the web UI, but also carry contact information (e-mail addresses, phone numbers, etc.) for the people they represent. A user may receive a notification either individually or as part of a Group or On-Call Role. Each user has several technology-specific contact fields, which must be filled if the user is to receive notifications by the associated method.

Groups are lists of users. In large systems with many users it is helpful to organize them into Groups. A group may receive a notification, which is often more convenient than targeting individual users. Groups also make it possible to assign a set of users to On-Call Roles to build more complex notification workflows.

How to create or modify membership of Users in a Group
  1. Login as a User with administrative permissions

  2. Choose Configure OpenNMS from the user specific main navigation which is named as your login user name

  3. Choose Configure Users, Groups and On-Call roles and select Configure Groups

  4. Create a new Group with Add new group or modify an existing Group by clicking the Modify icon next to the Group

  5. Select users from Available Users and use >> to add them to Currently in Group, or select users in the Currently in Group list and use << to remove them from the list.

  6. Click Finish to persist and apply the changes

The order of the Users in the group is relevant and is used as the order for Notifications when this group is used as Target in a Destination Path.
How to delete a Group
  1. Login as a User with administrative permissions

  2. Choose Configure OpenNMS from the user specific main navigation which is named as your login user name

  3. Choose Configure Users, Groups and On-Call roles and select Configure Groups

  4. Use the trash bin icon next to the Group to delete

  5. Confirm delete request with OK

On-Call Roles are an overlay on top of groups, designed to enable OpenNMS Horizon to target the appropriate user or users according to a calendar configuration. A common use case is to have System Engineers in On-Call rotations with a given schedule. On-Call Roles make it possible to assign a predefined Duty Schedule to an existing Group of Users. For each On-Call Role, a User is assigned as a Supervisor to be responsible for the group of people in this On-Call Role.

How to assign a Group to an On-Call Role
  1. Login as a User with administrative permissions

  2. Choose Configure OpenNMS from the user specific main navigation which is named as your login user name

  3. Choose Configure Users, Groups and On-Call roles and select Configure On-Call Roles

  4. Use Add New On-Call Role and set a Name for this On-Call Role, assign an existing Group and give a meaningful description

  5. Click Save to persist

  6. Define a Duty Schedule in the calendar for the given date by clicking the Plus (+) icon of the day and providing a notification time for a specific User from the associated Group

  7. Click Save to persist the Schedule

  8. Click Done to apply the changes

8.3.3. Duty Schedules

Every User and Group may have a Duty Schedule, which specifies that user’s (or group’s) weekly schedule for receiving notifications. If a notification should be delivered to an individual user, but that user is not on duty at the time, the notification will never be delivered to that user. In the case of notifications targeting a user via a group, the logic differs slightly. If the group is on duty at the time the notification is created, then all users who are also on duty will be notified. If the group is on duty, but no member user is currently on duty, then the notification will be queued and sent to the next user who comes on duty. If the group is off duty at the time the notification is created, then the notification will never be sent.

8.3.4. Destination Paths

A Destination Path is a named, reusable set of rules for sending notifications. Every destination path has an initial step and zero or more escalation steps.

Each step in a destination path has an associated delay which defaults to zero seconds. The initial step’s delay is called the initial delay, while an escalation step’s delay is simply called its delay.

Each step has one or more targets. A target may be a user, a group, an on-call role, or a one-off e-mail address.

While it may be tempting to use one-off e-mail addresses any time an individual user is to be targeted, it’s a good idea to reserve one-off e-mail addresses for special cases. If a user changes her e-mail address, for instance, you’ll need to update it in every destination path where it appears. The use of one-off e-mail addresses is meant for situations where a vendor or other external entity is assisting with troubleshooting in the short term.

When a step targets one or more groups, a delay may also be specified for each group. The default is zero seconds, in which case all group members are notified simultaneously. If a longer delay is set, the group members will be notified in alphabetical order of their usernames.

Avoid using the same name for a group and a user. The destination path configuration does not distinguish between users and groups at the step level, so the behavior is undefined if you have both a user and a group named admin. It is for this reason that the default administrators group is called Admin (with a capital A) — case matters.

Within a step, each target is associated with one or more notification commands. If multiple commands are selected, they will execute simultaneously.

Each step also has an auto-notify switch, which may be set to off, on, or auto. This switch specifies the logic used when deciding whether or not to send a notice for an auto-acknowledged notification to a target that was not on duty at the time the notification was first created. If off, notices will never be sent to such a target; if on, they will always be sent; if auto, the system employs heuristics aimed at "doing the right thing".
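Put together, a destination path with an initial step and one escalation step might be stored in destinationPaths.xml roughly as sketched below. The element and attribute names are assumptions recalled from a typical file, where the auto-notify setting appears on each target; the path name, the group name oncall-supervisors, and the delays are example values.

```xml
<path name="Email-Admin" initial-delay="0s">
  <!-- Initial step: notify the Admin group by e-mail, all members at once -->
  <target interval="0s">
    <name>Admin</name>
    <autoNotify>auto</autoNotify>
    <command>javaEmail</command>
  </target>
  <!-- Escalation step: 15 minutes later, notify a hypothetical second group, -->
  <!-- one member every 5 minutes, using two commands per member -->
  <escalate delay="15m">
    <target interval="5m">
      <name>oncall-supervisors</name>
      <autoNotify>auto</autoNotify>
      <command>javaEmail</command>
      <command>javaPagerEmail</command>
    </target>
  </escalate>
</path>
```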

8.3.5. Notification Commands

A Notification Command is a named, reusable execution profile for a Java class or external program command used to convey notices to targets. The following notification commands are included in the default configuration:

callHomePhone, callMobilePhone, and callWorkPhone

Ring one of the phone numbers configured in the user’s contact information. All three are implemented using the in-process Asterisk notification strategy, and differ only in which contact field is used.

ircCat

Conveys a notice to an instance of the IRCcat Internet Relay Chat bot. Implemented by the in-process IRCcat notification strategy.

javaEmail and javaPagerEmail

By far the most commonly used commands, these deliver a notice to a user’s email or pagerEmail contact field value. By configuring a user’s pagerEmail contact field value to target an email-to-SMS gateway, SMS notifications are trivially easy to configure. Both are implemented using the in-process JavaMail notification strategy.

microblogDM, microblogReply, and microblogUpdate

Sends a notice to a user as a direct message, at a user via an at-reply, or to everybody as an update via a microblog service with a Twitter v1-compatible API. Each command is implemented with a separate, in-process notification strategy.

numericPage and textPage

Sends a notice to a user’s numeric or alphanumeric pager. Implemented as an external command using the qpage utility.

xmppGroupMessage and xmppMessage

Sends a message to an XMPP group or user. Implemented with the in-process XMPP notification strategy.

Notification commands are customizable and extensible by editing the notificationCommands.xml file.

Use external binary notification commands sparingly to avoid fork-bombing your OpenNMS Horizon system. Originally, all notification commands were external. Today only the numericPage and textPage commands use external programs to do their work.
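For reference, an in-process command entry in notificationCommands.xml looks roughly like the sketch below. It is reproduced from memory of the shipped javaEmail definition, so the class name and argument switches are assumptions to verify against your own file before using it as a template.

```xml
<!-- binary="false" marks an in-process Java class rather than an external program -->
<command binary="false">
  <name>javaEmail</name>
  <execute>org.opennms.netmgt.notifd.JavaMailNotificationStrategy</execute>
  <comment>class for sending email notifications</comment>
  <!-- Each argument names a value handed to the strategy: subject, address, text message -->
  <argument streamed="false">
    <switch>-subject</switch>
  </argument>
  <argument streamed="false">
    <switch>-email</switch>
  </argument>
  <argument streamed="false">
    <switch>-tm</switch>
  </argument>
</command>
```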

9. Provisioning

9.1. Introduction

The introduction of OpenNMS version 1.8 empowers enterprises and service providers like never before with a new service daemon for maintaining the managed entity inventory in OpenNMS. This new daemon, Provisiond, unifies all previous entity control mechanisms available in 1.6 (Capsd and the Importer) into a new and improved, massively parallel, policy-based provisioning system. System integrators should note that Provisiond comes complete with a RESTful Web Service API for easy integration with external systems such as CRM or external inventory systems, as well as an adapter API for interfacing with other management systems such as configuration management.

OpenNMS 1.0, introduced almost a decade ago now, provided a capabilities scanning daemon, Capsd, as the mechanism for provisioning managed entities. Capsd, deprecated with the release of 1.8.0, provided a rich automatic provisioning mechanism that simply required an IP address to seed its algorithm for creating and maintaining the managed entities (nodes, interfaces, and IP based services). Version 1.2 added an XML-RPC API as a more controlled (directed) strategy for provisioning services that was mainly used by non-telco service providers (i.e. managed hosting companies). Version 1.6 followed this up with yet another and more advanced mechanism called the Importer service daemon. The Importer provided large service providers with the ability to strictly control the OpenNMS entity provisioning with an XML based API for completely defining and controlling the entities where no discovery and service scanning was feasible.

The Importer service improved OpenNMS' scalability for maintaining managed entity databases by an order of magnitude. This daemon, while very simple in concept, is an extremely powerful and flexible provisioning improvement and has blazed the trail for Provisiond. The Importer service has been in production for 3 years in service provider networks maintaining entity counts of more than 50,000 node-level entities on a single instance of OpenNMS. It is a rock solid provisioning tool.

Provisiond begins a new era of managed entity provisioning in OpenNMS.

9.2. Concepts

Provisioning is a term that is familiar to service providers (a.k.a. operators, a.k.a. telephone companies) and OSS systems but not so much in the non OSS enterprises.

Provisiond receives "requests" for adding managed entities via 2 basic mechanisms, the OpenNMS Horizon traditional "New Suspect" event, typically via the Discovery daemon, and the import requisition (XML definition of node entities) typically via the Provisioning Groups UI. If you are familiar with all previous releases of OpenNMS, you will recognize the New Suspect Event based Discovery to be what was previously the Capsd component of the auto discovery behavior. You will also recognize the import requisition to be of the Model Importer component of OpenNMS. Provisiond now unifies these two separate components into a massively parallel advanced policy based provisioning service.

9.2.1. Terminology

The following terms are used with respect to the OpenNMS Horizon provisioning system and are essential for understanding the material presented in this guide.

Entity

Entities are managed objects in OpenNMS Horizon such as Nodes, IP interfaces, SNMP Interfaces, and Services.

Foreign Source and Foreign ID

The Importer service from 1.6 introduced the idea of foreign sources and foreign IDs. The Foreign Source uniquely identifies a provisioning source and is still a basic attribute of importing node entities into OpenNMS Horizon. The concept is to provide an external (foreign) system with a way to uniquely identify itself and any node entities that it is requesting (via a requisition) to be provisioned into OpenNMS Horizon.

The Foreign ID is the unique node ID maintained in the foreign system, and the foreign source uniquely identifies the external system in OpenNMS Horizon.

OpenNMS Horizon uses the combination of the foreign source and foreign ID as the unique foreign key when synchronizing the set of nodes from each source with the nodes in the OpenNMS Horizon DB. This way the foreign system doesn’t have to keep track of the OpenNMS Horizon node IDs that are assigned when a node is first created. This is how Provisiond can decide if a node entity from an import requisition is new, has been changed, or needs to be deleted.

Foreign Source Definition

Additionally, the foreign source has been extended to also contain specifications for how entities should be discovered and managed on the nodes from each foreign source. The name of the foreign source has become pervasive within the provisioning system and is used to simplify some of the complexities by weaving this name into:

  • the name of the provisioning group in the Web-UI

  • the name of the file containing the persisted requisition (as well as the pending requisition if it is in this state)

  • the foreign-source attribute value inside the requisition (the file name doesn’t necessarily have to equal the value of this attribute, but keeping them the same is highly recommended as an OpenNMS Horizon best practice)

  • the building attribute of the node defined in the requisition (this value is called “site” in the Web-UI and is assigned to the building column of the node’s asset record by Provisiond and is the default value used in the Site Status View feature)

Import Requisition

Import requisition is the terminology OpenNMS Horizon uses to represent the set of nodes, specified in XML, to be provisioned from a foreign source into OpenNMS Horizon. The requisition schema (XSD) can be found at the following location: http://xmlns.opennms.org/xsd/config/model-import

Auto Discovery

Auto discovery is the term used by OpenNMS Horizon to characterize the automatic provisioning of node entities. Currently, OpenNMS Horizon uses an ICMP ping sweep to find IP addresses on the network. For the IPs that respond and that are not currently in the DB, OpenNMS Horizon generates a new suspect event. When this event is received by Provisiond, it creates a node and begins a node scan based on the default foreign source definition.

Directed Discovery

Provisiond takes over for the Model Importer found in version 1.6, which implemented a unique, first of its kind, controlled mechanism for specifying managed entities directly into OpenNMS Horizon from one or more data sources. These data sources were often in the form of an in-house developed inventory, a stand-alone provisioning system, or even a set of element management systems. Using this mechanism, OpenNMS Horizon is directed to add, update, or delete a node entity exactly as defined by the external source. No discovery process is used for finding more interfaces or services.

Enhanced Directed Discovery

Directed discovery is enhanced with the capability to scan directed nodes for entities (interfaces).

Policy Based Discovery

Policy based Directed Discovery is a term that represents the latest step in the evolution of OpenNMS Horizon provisioning and best describes the new provisioning architecture now in OpenNMS Horizon for maintaining its inventory of managed entities. The term describes the control over the Provisioning system that is given to OpenNMS Horizon users for managing the behavior of the NMS with respect to the new entities that are being discovered. Current behaviors include persistence, data collection, service monitoring, and categorization policies.

9.2.2. Addressing Scalability

The explosive growth and density of the IT systems being deployed today to support non-traditional IP services is impacting management systems like never before and is demanding tremendous amounts of scalability from them. The scalability of a management system is defined by its capacity for maintaining large numbers of managed entities coupled with its efficiency in managing those entities.

Today, it is not uncommon for OpenNMS Horizon deployments to find node entities with tens of thousands of physical interfaces being reported by SNMP agents due to virtualization (virtual hosts, interfaces, as well as networks). An NMS must be capable of using the full capacity of every resource of its computing platform (hardware and OS) as effectively as possible in order to manage these environments. Scripts and single threaded applications can no longer do the work required of an NMS when dealing with the scalability challenges facing systems and systems administrators working in this domain.

Parallelization and Non-Blocking I/O

Squeezing out every ounce of power from a management system’s platform (hardware and OS) is absolutely required to complete all the work of a fully functional NMS such as OpenNMS Horizon. Fortunately, the hardware and CPU architecture of a modern computing platform provides multiple CPUs with multiple cores having instruction sets that include support for atomic operations. While commodity systems now provide these very powerful resources, developing applications that use them is orders of magnitude more complex than developing applications that do not. However, because of the scalability demands of our complex IT environments, multi-threaded NMS applications are now essential, and this has fully exposed the complex issues of concurrency in software development.

OpenNMS Horizon has stepped up to this challenge with its new concurrency strategy. This strategy is based on a technique that combines the efficiency of parallel (asynchronous) operations (traditionally used most effectively by single threaded applications) with the power of a fully concurrent, non-blocking, multi-threaded design. The non-blocking component of this new concurrency strategy added greater complexity, but OpenNMS Horizon gained orders of magnitude in increased scalability.

Java Runtimes based on the Sun JVM provide implementations of processor based atomic operations, and these are the basis for OpenNMS Horizon’s non-blocking concurrency algorithms.

Provisioning Policies

Just because you can, doesn’t mean you should! The massively parallel operations created for Provisiond allow tremendous numbers of nodes, interfaces, and services to be discovered and persisted very rapidly, but that doesn’t mean they should be. A policy API was created for Provisiond that allows implementations to be developed that can be applied to control the behavior of Provisiond. The 1.8 release includes a set of flexible provisioning policies that control the persistence of entities and their attributes and constrain monitoring behavior.

When nodes are imported or re-scanned, a set of zero or more provisioning policies is applied. The policies are defined in the foreign source’s definition. The policies for an auto-discovered node, or for nodes from provisioning groups that don’t have a foreign source definition, are the policies defined in the default foreign source definition.

The Default Foreign Source Definition

Contained in the libraries of the Provisioning service is the "template" or default foreign source. The template stored in the library is used until the OpenNMS Horizon admin user alters the default from the Provisioning Groups WebUI. Upon edit, this template is exported to the OpenNMS Horizon etc/ directory with the file name: default-foreign-source.xml.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<foreign-source date-stamp="2009-10-16T18:04:12.844-05:00"
                name="default"
                xmlns="http://xmlns.opennms.org/xsd/config/foreign-source">
    <scan-interval>1d</scan-interval>
    <detectors>
      <detector class="org.opennms.netmgt.provision.detector.datagram.DnsDetector" name="DNS"/>
      <detector class="org.opennms.netmgt.provision.detector.simple.FtpDetector" name="FTP"/>
      <detector class="org.opennms.netmgt.provision.detector.simple.HttpDetector" name="HTTP"/>
      <detector class="org.opennms.netmgt.provision.detector.simple.HttpsDetector" name="HTTPS"/>
      <detector class="org.opennms.netmgt.provision.detector.icmp.IcmpDetector" name="ICMP"/>
      <detector class="org.opennms.netmgt.provision.detector.simple.ImapDetector" name="IMAP"/>
      <detector class="org.opennms.netmgt.provision.detector.simple.LdapDetector" name="LDAP"/>
      <detector class="org.opennms.netmgt.provision.detector.simple.NrpeDetector" name="NRPE"/>
      <detector class="org.opennms.netmgt.provision.detector.simple.Pop3Detector" name="POP3"/>
      <detector class="org.opennms.netmgt.provision.detector.radius.RadiusAuthDetector" name="Radius"/>
      <detector class="org.opennms.netmgt.provision.detector.simple.SmtpDetector" name="SMTP"/>
      <detector class="org.opennms.netmgt.provision.detector.snmp.SnmpDetector" name="SNMP"/>
      <detector class="org.opennms.netmgt.provision.detector.ssh.SshDetector" name="SSH"/>
    </detectors>
    <policies/>
</foreign-source>

Automatic Rescanning

The default foreign source defines a scan-interval of 1d, which will cause all nodes in the requisition to be scanned daily. You may set the scan interval using any combination of the following signifiers:

  • w: Weeks

  • d: Days

  • h: Hours

  • m: Minutes

  • s: Seconds

  • ms: Milliseconds

For example, to rescan every 6 days and 53 minutes, you would set the scan-interval to 6d 53m.
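
As a reading aid, the following Python sketch converts a scan-interval string into a duration using the signifiers listed above. How Provisiond parses the value internally is not shown in this guide, so the function name, the tolerance for whitespace, and the error handling are assumptions.

```python
import re
from datetime import timedelta

# Unit signifiers from the list above, mapped to timedelta keyword arguments.
_UNITS = {
    "w": "weeks",
    "d": "days",
    "h": "hours",
    "m": "minutes",
    "s": "seconds",
    "ms": "milliseconds",
}
# "ms" is tried before "m" and "s" so that "500ms" is not read as minutes.
_TOKEN = re.compile(r"(\d+)\s*(ms|w|d|h|m|s)")
_WHOLE = re.compile(r"\s*(?:\d+\s*(?:ms|w|d|h|m|s)\s*)+")


def parse_scan_interval(text):
    """Convert a scan-interval string such as '6d 53m' into a timedelta.

    A value of '0' is treated as "automatic rescan disabled" and
    returns timedelta(0).
    """
    text = text.strip()
    if text == "0":
        return timedelta(0)
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"unrecognized scan-interval: {text!r}")
    total = timedelta(0)
    for amount, unit in _TOKEN.findall(text):
        total += timedelta(**{_UNITS[unit]: int(amount)})
    return total
```

For instance, `parse_scan_interval("6d 53m")` yields the same duration as `timedelta(days=6, minutes=53)`.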

Don’t forget, for the new scan interval to take effect, you will need to import the requisition one more time so that the foreign source becomes active.

Disabling Rescan

For a large number of devices, you may want to set the scan-interval to 0 to disable automatic rescan altogether. OpenNMS Horizon will not attempt to rescan the nodes in the requisition unless you trigger a manual (forced) rescan through the web UI or Provisioning ReST API.

9.3. Getting Started

An NMS is of no use until it is set up for monitoring and entities are added to the system. OpenNMS Horizon installs with a base configuration that is sufficient to get service level monitoring and performance management up and running quickly. As soon as managed entities are provisioned, the base configuration will automatically begin monitoring and reporting.

Generally speaking, there are two methods of provisioning in OpenNMS Horizon: Auto Discovery and Directed Discovery. We’ll start with Auto Discovery, but first, we should quickly review the configuration of SNMP so that newly discovered devices can be immediately scanned for entities as well as have reporting and thresholding available.

9.3.1. Provisioning the SNMP Configuration

OpenNMS Horizon requires SNMP configuration to be set up properly for your network in order to understand Network and Node topology as well as to automatically enable performance data collection. Network topology is updated as nodes (a.k.a. devices or hosts) are provisioned. Navigate to the Admin/Configure SNMP Community Names by IP address as shown below.

Figure: Configuring SNMP community names

Provisiond includes an option to add community information in the Single Node provisioning interface. This is equivalent to entering a single IP address in the screen, with the convenience of setting the community string at the same time a node is provisioned. See the Quick Node Add feature below for more details about this capability.

This screen sets up SNMP within OpenNMS Horizon for agents listening on IP addresses 10.1.1.1 through 10.254.254.254. These settings are optimized into the snmp-config.xml file. Optimization means that the minimal configuration possible will be written. Any IP addresses already configured that are eclipsed by this range will be removed. Here is the resulting configuration.

Sample snmp-config.xml
<?xml version="1.0" encoding="UTF-8"?>
<snmp-config xmlns="http://xmlns.opennms.org/xsd/config/snmp"
             port="161"
             retry="3"
             timeout="800"
             read-community="public"
             version="v1"
             max-vars-per-pdu="10">

    <definition retry="1" timeout="2000" read-community="public" version="v2c">
        <specific>10.12.23.32</specific>
    </definition>
</snmp-config>

However, if an IP address is then configured that is within the range, the range will be split into two separate ranges and a specific entry will be added. For example, if a configuration was added through the same UI for the IP 10.12.23.32 having the community name public, then the resulting configuration will be:

<?xml version="1.0" encoding="UTF-8"?>
<snmp-config xmlns="http://xmlns.opennms.org/xsd/config/snmp"
             port="161"
             retry="3"
             timeout="800"
             read-community="public"
             version="v1"
             max-vars-per-pdu="10">

    <definition retry="1" timeout="2000" read-community="YrusoNoz" version="v2c">
        <range begin="10.1.1.1" end="10.12.23.31"/>
        <range begin="10.12.23.33" end="10.254.254.254"/>
    </definition>

    <definition retry="1" timeout="2000" read-community="public" version="v2c">
        <specific>10.12.23.32</specific>
    </definition>
</snmp-config>

The range boundaries 10.12.23.31 and 10.12.23.33 show where the range was split, and the specific entry with community name "public" was added.
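
The range split described above can be sketched in Python with the standard `ipaddress` module. This only illustrates the arithmetic; it is not the OpenNMS Horizon implementation, and the function name is an assumption.

```python
import ipaddress


def split_range(begin, end, specific):
    """Split the inclusive range begin..end around one specific address.

    Returns a list of (begin, end) string pairs. If the specific address
    lies outside the range, the range is returned unchanged.
    """
    lo = ipaddress.IPv4Address(begin)
    hi = ipaddress.IPv4Address(end)
    ip = ipaddress.IPv4Address(specific)
    if not lo <= ip <= hi:
        return [(str(lo), str(hi))]
    parts = []
    if ip > lo:
        parts.append((str(lo), str(ip - 1)))  # addresses below the specific IP
    if ip < hi:
        parts.append((str(ip + 1), str(hi)))  # addresses above the specific IP
    return parts
```

Calling `split_range("10.1.1.1", "10.254.254.254", "10.12.23.32")` produces the two ranges that appear in the configuration above.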

Now, with SNMP configuration provisioned for our 10 networks, we are ready to begin adding nodes. Our first example will be to automatically discover and add all managed entities (nodes, IP interfaces, SNMP Interfaces, and Monitored IP based Services). We will then give an example of how to be more directed and deliberate about your discovery by using Provisioning Groups.

Automatically discovered entities are analyzed, persisted to the relational data store, and then managed based on the policies defined in the default foreign source definition. This is very similar to the way that entities were previously handled by the (now obsolete) Capsd daemon, but with a finer-grained sense of control.

9.3.2. Automatic Discovery

Currently in OpenNMS Horizon, ICMP is used to automatically provision node entities into OpenNMS Horizon. This functionality has been in OpenNMS since its 1.0 release; however, in 1.8, a few of the use cases have been updated with Provisiond’s replacement of Capsd.

Separation of Concerns

Version 1.8 Provisiond separates what was called Capsd scanning into three distinct phases: entity scanning, service detection, and node merging. These phases are now managed separately by Provisiond. Immediately following the import of a node entity, tasks are created for scanning a node to discover the node entity’s interfaces (SNMP and IP). As interfaces are found, they are persisted and tasks are scheduled for service detection of each IP interface.

For auto discovered nodes, a node merging phase is scheduled; nodes that have been directly provisioned will not be included in the node merging process. Merging will only occur when two automatically discovered nodes appear to be the same node.

The use case and redesign of node merging is still an outstanding issue with the 1.8.0 release.

9.3.3. Enhanced Directed Discovery

This new form of provisioning first appears in OpenNMS with version 1.8 and the new Provisiond service. It combines the benefits of the Importer’s strictly controlled methodology of directed provisioning (from version 1.6) with OpenNMS’ robustly flexible auto discovery. Enhanced Directed discovery begins with an enhanced version of the same import requisition used in directed provisioning and completes with a policy influenced persistence phase that sorts through the details of all the entities and services found during the entity and service scanning phase.

If you are planning to use this form of provisioning, it is important to understand the conceptual details of how Provisiond manages entities it is directed to provision. This knowledge will enable administrators and systems integrators to better plan, implement, and resolve any issues involved with this provisioning strategy.

Understanding the Process

There are 3 phases involved with directing entities to be discovered: import, node scan, and service scan. The import phase also has sub phases: marshal, audit, limited SNMP scan, and re-parent.

Marshal and Audit Phases

It is important to understand that the nodes requisitioned from each foreign source are managed as a complete set. Nodes defined in requisitions from the foreign sources CRM and CMDB, for example, will be managed separately from each other even if they should contain exactly the same node definitions. To OpenNMS Horizon, these are individual entities and they are managed as a set.

Requisitions are referenced via a URL. Currently, the URL can be specified as one of the following protocols: FILE, HTTP, HTTPS, and DNS. Each protocol has a protocol handler that is used to stream the XML from a foreign source, e.g. http://inv.corp.org/import.cgi?customer=acme or file:/opt/opennms/etc/imports/acme.xml. The DNS protocol is a special handler developed for Provisioning sets of nodes as a foreign-source from a corporate DNS server. See DNS Protocol Handler for details.

Upon the import request (either on schedule or on demand via an Event) the requisition is marshaled into Java objects for processing. The nodes defined in the requisition represent what OpenNMS Horizon should have as the current set of managed entities from that foreign source. The audit phase determines for each node defined (or not defined) in the requisition which are to be processed as an Add, Update, or Delete operation during the Import Phase. This determination is made by comparing the set of foreign IDs of the nodes in the requisition with the set of foreign IDs of currently managed entities in OpenNMS Horizon.

The intersection of the IDs from each set will become the Update operations, the extra set of foreign IDs that are in the requisition become the Add operations, and the extra set of foreign IDs from the managed entities become the Delete operations. This implies that the foreign IDs from each foreign source must be unique.

Naturally, the first time an import request is processed from a foreign source there will be zero (0) node entities from the set of nodes currently being managed and each node defined in the requisition will become an Add Operation. If a requisition is processed with zero (0) node definitions, all the currently managed nodes from that foreign source will become Delete operations (all the nodes, interfaces, outages, alarms, etc. will be removed from OpenNMS Horizon).
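
The audit described above is plain set arithmetic on foreign IDs. The following Python sketch shows the idea for a single foreign source; the function name and the sample IDs are assumptions, and it is not the Provisiond implementation.

```python
def audit(requisition_ids, managed_ids):
    """Classify foreign IDs of one foreign source into import operations.

    requisition_ids: foreign IDs defined in the requisition
    managed_ids:     foreign IDs of nodes currently managed for that source
    """
    requisition = set(requisition_ids)
    managed = set(managed_ids)
    return {
        "add": sorted(requisition - managed),     # only in the requisition
        "update": sorted(requisition & managed),  # in both sets (intersection)
        "delete": sorted(managed - requisition),  # only in the managed set
    }
```

With this sketch, a first import such as `audit(["a", "b"], [])` yields only Add operations, while an empty requisition such as `audit([], ["a", "b"])` yields only Delete operations, matching the two edge cases described above.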

When nodes are provisioned using the Provisioning Groups Web-UI, the requisitions are stored on the local file system and the file protocol handler is used to reference the requisition. Each Provisioning Group is a separate foreign source and unique foreign IDs are generated by the Web-UI. An MSP might use Provisioning Groups to define the set of nodes to be managed by customer name, where each customer’s set of nodes is maintained in a separate Provisioning Group.

Import Phase

The import phase begins when Provisiond receives a request to import a requisition from a URL. The first step in this phase is to load the requisition and marshal all the node entities defined in the requisition into Java objects.

If any syntactical or XML structural problems occur in the requisition, the entire import is abandoned and no import operations are completed.

Once the requisition is marshaled, the requisition nodes are audited against the persisted node entities. The set of requisitioned nodes is compared with a subset of persisted nodes, and this subset is generated from a database query using the foreign source defined in the requisition. The audit generates one of three operations for each requisition node: insert, update, or delete, based on each requisitioned node’s foreign ID. Delete operations are created for any nodes that are not in the requisition but are in the DB subset, update operations are created for requisition nodes that match a persisted node from the subset (the intersection), and insert operations are created from the remaining requisition nodes (nodes in the requisition that are not in the DB subset).

If a requisition node has an interface defined as the Primary SNMP interface, then during the update and insert operations the node will be scanned for minimal SNMP attribute information. This scan finds the node and SNMP interface details required for complete SNMP support of the node and only the IP interfaces defined in the requisition.

This is not the same as the Provisiond SNMP discovery scan phases: node scan and interface scan.

Node Scan Phase

Enhanced directed discovery begins where directed discovery leaves off: once all the import operations have completed, directed discovery is finished and enhanced directed discovery takes over. The requisitioned nodes are scheduled for node scans where details about the node are discovered and interfaces that were not directly provisioned are also discovered. All physical (SNMP) and logical (IP) interfaces are discovered and persisted based on any Provisioning Policies that may have been defined for the foreign source associated with the import requisition.

Service Scan (detection) Phase

Additionally, the new Provisiond enhanced directed discovery mechanism follows interface discovery with service detection on each IP interface entity. This is very similar to the Capsd plugin scanning found in all former releases of OpenNMS except that the foreign source definition is used to define what services should be detected on these interfaces found for nodes in the import requisition.

9.4. Import Handlers

The new Provisioning service in OpenNMS Horizon is continuously improving and adapting to the needs of the community.

One of the most recent enhancements to the system is built upon the very flexible and extensible API of referencing an import requisition’s location via a URL. Most commonly, these URLs are files on the file system (e.g. file:/opt/opennms/etc/imports/<my-provisioning-group.xml>) as requisitions created by the Provisioning Groups UI. However, these same requisitions for adding, updating, and deleting nodes (based on the original model importer) can also come from other kinds of URLs. For example, a requisition can be retrieved using the HTTP protocol: http://myinventory.server.org/nodes.cgi

In addition to the standard protocols supported by Java, we provide a series of custom URL handlers to help retrieve requisitions from external sources.

9.4.1. Generic Handler

The generic handler is made available using URLs of the form: requisition://type?param=1;param=2

Using these URLs various type handlers can be invoked, both locally and via a Minion.

In addition to the type specific parameters, the following parameters are supported:

Table 13. General parameters

Parameter  Description                                                                   Required  Default value
location   The name of the location at which the handler should be run                   optional  Default
ttl        The maximum number of milliseconds to wait for the handler when run remotely  optional  20000

See the relevant sections below for additional details on the supported types.

The provision:show-import command available via the Karaf Shell can be used to show the results of an import (without persisting or triggering the import):

provision:show-import -l MINION http url=http://127.0.0.1:8000/req.xml

9.4.2. File Handler

Examples:

Simple
file:///path/to/my/requisition.xml
Using the generic handler
requisition://file?path=/path/to/my/requisition.xml;location=MINION

9.4.3. HTTP Handler

Examples:

Simple
http://myinventory.server.org/nodes.cgi
Using the generic handler
requisition://http?url=http%3A%2F%2Fmyinventory.server.org%2Fnodes.cgi
When using the generic handler, the URL should be "URL encoded".
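
A minimal Python sketch of building such a generic handler URL is shown below. The helper name and its arguments are assumptions for illustration; only the `url`, `location`, and `ttl` parameter names and the `;` separator come from this guide.

```python
from urllib.parse import quote


def http_requisition_url(url, location=None, ttl_ms=None):
    """Build a requisition://http URL with a percent-encoded url parameter."""
    params = [("url", url)]
    if location is not None:
        params.append(("location", location))
    if ttl_ms is not None:
        params.append(("ttl", str(ttl_ms)))
    # safe="" forces ':' and '/' to be encoded as %3A and %2F.
    query = ";".join(f"{name}={quote(value, safe='')}" for name, value in params)
    return f"requisition://http?{query}"


print(http_requisition_url("http://myinventory.server.org/nodes.cgi"))
# prints: requisition://http?url=http%3A%2F%2Fmyinventory.server.org%2Fnodes.cgi
```

Encoding the inner URL keeps its own `:`, `/`, `?`, and `;` characters from being confused with the delimiters of the outer `requisition://` URL.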

9.4.4. DNS Handler

The DNS handler makes a Zone Transfer (AXFR) request to a DNS server. The A records are recorded and used to build an import requisition. This is handy for organizations that use DNS (possibly coupled with an IP management tool) as the database of record for nodes in the network. So, rather than ping sweeping the network or entering the nodes manually into the OpenNMS Horizon Provisioning UI, nodes can be managed via one or more DNS servers.

The format of the URL for this new protocol handler is: dns://<host>[:port]/<zone>[/<foreign-source>/][?expression=<regex>]

DNS Import Examples:

Simple
dns://my-dns-server/myzone.com

This URL will import all A records from the host my-dns-server on port 53 (default port) from zone "myzone.com" and since the foreign source (a.k.a. the provisioning group) is not specified it will default to the specified zone.

Using a Regular Expression Filter
dns://my-dns-server/myzone.com/portland/?expression=^por-.*

This URL will import nodes from the same server and zone but will only manage the nodes in the zone that match the regular expression ^por-.*, and they will be assigned a unique foreign source (provisioning group), portland, so that they can be managed as a subset of the nodes within the specified zone.
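
To make the URL format concrete, the following Python sketch splits a dns:// import URL into its parts and applies the defaults described above (port 53, foreign source equal to the zone). It is only an illustration, not the actual protocol handler; the class and function names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote, urlsplit


@dataclass
class DnsImportUrl:
    host: str
    port: int
    zone: str
    foreign_source: str
    expression: Optional[str]


def parse_dns_url(url):
    """Split dns://<host>[:port]/<zone>[/<foreign-source>/][?expression=<regex>]."""
    parts = urlsplit(url)
    if parts.scheme != "dns" or not parts.hostname:
        raise ValueError(f"not a dns:// import URL: {url!r}")
    segments = [s for s in parts.path.split("/") if s]
    if not 1 <= len(segments) <= 2:
        raise ValueError(f"expected /<zone>[/<foreign-source>/] in {url!r}")
    zone = segments[0]
    # The foreign source defaults to the zone when it is not given.
    foreign_source = segments[1] if len(segments) == 2 else zone
    expression = None
    for pair in parts.query.split("&"):
        name, _, value = pair.partition("=")
        if name == "expression":
            expression = unquote(value)  # decodes %3F back to '?'
    return DnsImportUrl(parts.hostname, parts.port or 53, zone,
                        foreign_source, expression)
```

For the filtered example above, the sketch reports the zone myzone.com, the foreign source portland, and the expression ^por-.*.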

If your expression requires URL encoding (for example you need to use a ? in the expression) it must be properly encoded.

dns://my-dns-server/myzone.com/portland/?expression=^por[0-9]%3F

DNS Setup

Currently, the DNS server must be set up to allow a zone transfer from the OpenNMS Horizon server. It is recommended that a secondary DNS server is running on OpenNMS Horizon and that the OpenNMS Horizon server be allowed to request a zone transfer. A quick way to test if zone transfers are working is:

dig -t AXFR @<dnsServer> <zone>

Configuration

The configuration of the Provisioning system has moved from a properties file (model-importer.properties) to an XML based configuration container. The configuration is now extensible to allow the definition of zero or more import requisitions, each with its own cron based schedule for automatic importing from various sources (intended for integration with external URLs such as http and this new dns protocol handler).

A default configuration is provided in the OpenNMS Horizon etc/ directory and is called provisiond-configuration.xml. This default configuration has an example for scheduling an import from a DNS server running on the localhost, requesting nodes from the zone localhost; the nodes will be imported once per day at the stroke of midnight. Not very practical, but it is a good example.

<?xml version="1.0" encoding="UTF-8"?>
<provisiond-configuration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                          xsi:schemaLocation="http://xmlns.opennms.org/xsd/config/provisiond-configuration"
                          foreign-source-dir="/opt/opennms/etc/foreign-sources"
                          requistion-dir="/opt/opennms/etc/imports"
                          importThreads="8"
                          scanThreads="10"
                          rescanThreads="10"
                          writeThreads="8">

    <!-- http://www.quartz-scheduler.org/documentation/quartz-1.x/tutorials/crontrigger
         Field Name     Allowed Values      Allowed Special Characters
         Seconds        0-59                , - * /
         Minutes        0-59                , - * /
         Hours          0-23                , - * /
         Day-of-month   1-31                , - * ? / L W C
         Month          1-12 or JAN-DEC     , - * /
         Day-of-Week    1-7 or SUN-SAT      , - * ? / L C #
         Year (Opt)     empty, 1970-2099    , - * /
    -->

    <requisition-def import-name="localhost"
                     import-url-resource="dns://localhost/localhost">
        <cron-schedule>0 0 0 * * ? *</cron-schedule> <!-- daily, at midnight -->
    </requisition-def>
</provisiond-configuration>

Configuration Reload

Like many of the daemon configurations in the 1.7 branch, this configuration can be reloaded without having to restart OpenNMS Horizon, using the reloadDaemonConfig UEI:

/opt/opennms/bin/send-event.pl uei.opennms.org/internal/reloadDaemonConfig --parm 'daemonName Provisiond'

This means that you don’t have to restart OpenNMS Horizon every time you update the configuration.

9.5. Provisioning Examples

Here are a few practical examples of enhanced directed discovery to help with your understanding of this feature.

9.5.1. Basic Provisioning

This example adds three nodes and requires no OpenNMS Horizon configuration other than specifying the node entities to be provisioned and managed in OpenNMS Horizon.

Defining the Nodes via the Web-UI

Using the Provisioning Groups Web-UI, three nodes are created, each given a single IP address. Navigate to the Admin Menu, click Provisioning Groups in the list of Admin options, and create the group Bronze.

Figure: Creating a new Provisioning Group

Clicking the Add New Group button will create the group and will redisplay the page including this new group among the list of any group(s) that have already been created.

At this point, the XML structure for holding the new provisioning group (a.k.a. an import requisition) has been persisted to the '$OPENNMS_ETC/imports/pending' directory.

Clicking the Edit link will bring you to the screen where you can begin the process of defining node entities that will be imported into OpenNMS Horizon. Clicking the Add Node button begins the node entity creation process. Fill in the node label and click the Save button.

Figure: Creating a new Node definition in the Provisioning Group

At this point, the provisioning group contains the basic structure of a node entity, but it is not complete until the interface(s) and interface service(s) have been defined. After the Save button has been clicked, as we did above, the Web-UI presents the options Add Interface, Add Node Category, and Add Node Asset. Click the Add Interface link to add an interface entity to the node.

Figure: Adding an Interface to the node definition

Enter the IP address for this interface entity and a description, specify the Primary attribute as P (Primary), S (Secondary), N (Not collected), or C (Collected), and click the Save button. Now the node entity has an interface for which services can be defined, and the Web-UI presents the Add Service link. Add two services (ICMP, SNMP) via this link.

Figure: A complete node definition with all required elements defined

Now the node entity definition contains all the elements necessary for importing this requisition into OpenNMS Horizon. At this point, all the interfaces that are required for the node should be added. For example, NAT interfaces should be specified if there are services that they provide, because they will not be discovered during the Scan Phase.

Two more node definitions will be added for the benefit of this example.

Figure: The completed requisition for the example Bronze Provisioning Group

This set of nodes represents an import requisition for the Bronze provisioning group. As this requisition is being edited via the Web-UI, changes are being persisted into the OpenNMS Horizon configuration directory '$OPENNMS_ETC/imports/pending' as an XML file having the name bronze.xml.

The name of the XML file containing the import requisition is the same as the provisioning group name. Therefore naming your provisioning group without the use of spaces makes them easier to manage on the file system.
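
To give a feel for what such a requisition file contains, the Python sketch below generates a small requisition document. The element and attribute names (model-import, node, interface, monitored-service, foreign-id, node-label, ip-addr, snmp-primary, service-name) are written from memory of the model-import schema and should be checked against the XSD referenced earlier. The foreign ID and the helper name are hypothetical; the node label and IP address are taken from the example in this section.

```python
import xml.etree.ElementTree as ET

# Namespace of the requisition schema referenced earlier in this guide.
NS = "http://xmlns.opennms.org/xsd/config/model-import"


def build_requisition(foreign_source, nodes):
    """Return requisition XML for the given nodes as a string.

    Each node is a dict with the keys: foreign_id, label, ip, services.
    """
    root = ET.Element("model-import",
                      {"xmlns": NS, "foreign-source": foreign_source})
    for n in nodes:
        node = ET.SubElement(root, "node", {
            "foreign-id": n["foreign_id"],
            "node-label": n["label"],
        })
        # "P" marks the interface as the Primary SNMP interface.
        iface = ET.SubElement(node, "interface",
                              {"ip-addr": n["ip"], "snmp-primary": "P"})
        for service in n["services"]:
            ET.SubElement(iface, "monitored-service",
                          {"service-name": service})
    ET.indent(root, space="    ")  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


print(build_requisition("Bronze", [
    {"foreign_id": "1001",  # hypothetical; the Web-UI generates its own IDs
     "label": "barbrady.opennms.org",
     "ip": "10.1.1.2",
     "services": ["ICMP", "SNMP"]},
]))
```

The Web-UI writes and maintains this file itself; a script like this is only relevant when requisitions are produced by an external system.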

Click the Done button to return to the Provisioning Groups list screen. The details of the “Bronze” group now indicate that there are 3 nodes in the requisition and that there are no nodes in the DB from this group (a.k.a. foreign source). Additionally, you can see the time the requisition was last modified and the time it was last imported (the time stamps are stored as attributes inside the requisition and are not the file system time stamps). These details are indicative of how well the DB represents what is in the requisition.

You can tell that this is a pending requisition for two reasons: 1) there are 3 nodes defined and 0 nodes in the DB, 2) the requisition has been modified since the last import (in this case never).

Import the Nodes

In this example, you see that there are 3 nodes in the pending requisition and 0 in the DB. Click the Import button to submit the requisition to the provisioning system (what actually happens is that the Web-UI sends an event to the Provisioner telling it to begin the Import Phase for this group).

Do not refresh this page to check the values of these details. To refresh the details to verify the import, click the Provisioning Groups bread crumb item.

You should be able to immediately verify the importation of this provisioning group because the import happens very quickly. Provisiond has several threads ready for processing the import operations of the nodes defined in this requisition.

A few SNMP packets are sent and received to get the SNMP details of the node and the interfaces defined in the requisition. Upon receipt of these packets (or not) each node is inserted as a DB transaction.

The nodes are now added to OpenNMS Horizon and are under management.


Following the import of a node with thousands of interfaces, you will be able to refresh the Interface table browser on the Node page and see that interfaces and services are being discovered and added in the background. This is the discovery component of directed discovery.

Adding a Node

To direct that another node be added from a foreign source (in this example the Bronze Provisioning Group) simply add a new node definition and re-import. It is important to remember that all the node definitions will be re-imported and the existing managed nodes will be updated, if necessary.

Changing a Node

To direct changes to an existing node, simply add, change, or delete elements or attributes of the node definition and re-import. This is a great benefit of having directed specific elements of a node in the requisition: those attributes will simply be changed. For example, to change the IP address of the Primary SNMP interface for the node barbrady.opennms.org, just change the requisition and re-import.

Each element in the Web-UI has an associated Edit icon. Click this icon to change the IP address for barbrady.opennms.org, click Save, and then click the Done button.

Changing the IP address of barbrady.opennms.org from 10.1.1.2 to 192.168.1.1


The Web-UI will return you to the Provisioning Groups screen, where the time stamps show that the requisition’s last modification is more recent than the last import time.

The Provisioning Group must be re-imported


This provides an indication that the group must be re-imported for the changes made to the requisition to take effect. The IP Interface will simply be updated and all the required events (messages) will be sent to communicate this change within OpenNMS Horizon.

The IP interface for barbrady.opennms.org is immediately updated


Deleting a Node

Barbrady has not been behaving as one might expect, so it is time to remove him from the system. Edit the provisioning group, click the delete button next to the node barbrady.opennms.org, and then click the Done button.

Bronze Provisioning Group definition indicates a node has been removed and requires an import to delete the node entity from the OpenNMS Horizon system


Click the Import button for the Bronze group and the Barbrady node and its interfaces, services, and any other related data will be immediately deleted from the OpenNMS Horizon system. All the required Events (messages) will be sent by Provisiond to provide indication to the OpenNMS Horizon system that the node Barbrady has been deleted.

Barbrady has been deleted


Deleting all the Nodes

There is a convenient way to delete all the nodes that have been provided from a specific foreign source. From the main Admin/Provisioning Groups screen in the Web-UI, click the Delete Nodes button. This button deletes all the nodes defined in the Bronze requisition. It is very important to note that once this is done, it cannot be undone from the Web-UI! It can only be undone if you’ve been good about keeping a backup copy of your '$OPENNMS_ETC/' directory tree. If you’ve made a mistake, before you re-import the requisition, restore the bronze.xml requisition from your backup copy to the '$OPENNMS_ETC/imports' directory.

All node definitions have been removed from the Bronze requisition. The Web-UI indicates an import is now required to remove them from OpenNMS Horizon.


Clicking the Import button will cause the Audit Phase of Provisiond to determine that all the nodes from the Bronze group (foreign source) should be deleted from the DB, and it will create Delete operations. At this point, if you are satisfied that the nodes have been deleted and that you will no longer require nodes to be defined in this group, you will see that the Delete Nodes button has changed to the Delete Group button. The Delete Group button is displayed when there are no node entities from that group (foreign source) in OpenNMS Horizon.

When no node entities from the group exist in OpenNMS Horizon, then the Delete Group button is displayed.

9.5.2. Advanced Provisioning Example

In the previous example, we provisioned 3 nodes and let Provisiond complete all of its import phases using a default foreign source definition. Each Provisioning Group can have a separate foreign source definition that controls:

  • The rescan interval

  • The services to be detected

  • The policies to be applied

This example will demonstrate how to create a foreign source definition and how it is used to control the behavior of Provisiond when importing a Provisioning Group/foreign source requisition.

First let’s simply provision the node and let the default foreign source definition apply.

The node definition used for the Advanced Provisioning Example


Following the import, all the IP and SNMP interfaces, in addition to the interface specified in the requisition, have been discovered and added to the node entity. The default foreign source definition has no policies controlling which discovered interfaces get persisted or managed by OpenNMS Horizon.


Logical and Physical interface and Service entities directed and discovered by Provisiond.


Service Detection

As IP interfaces are found during the node scan process, service detection tasks are scheduled for each IP interface. The service detectors defined in the foreign source determine which services are to be detected and how (i.e. the values of the parameters that control how the service is detected: port, timeout, etc.).

Applying a New Foreign Source Definition

This example node has been provisioned using the Default foreign source definition. By navigating to the Provisioning Groups screen in the OpenNMS Horizon Web-UI and clicking the Edit Foreign Source link of a group, you can create a new foreign source definition that defines service detection and policies. The policies determine entity persistence and/or set attributes on the discovered entities that control OpenNMS Horizon management behaviors.

When creating a new foreign source definition, the default definition is used as a template.


In this UI, new Detectors can be added, changed, and removed. For this example, we will remove detection of all services except ICMP and DNS, change the timeout of ICMP detection, and add a new Service detection for the OpenNMS Horizon Web-UI.

Custom foreign source definition created for NMS Provisioning Group (foreign source).


Click the Done button and re-import the NMS Provisioning Group. During this and any subsequent re-imports or re-scans, the OpenNMS Horizon detector will be active, and the detectors that have been removed will no longer test for the related services on the interfaces of nodes managed in the provisioning group (requisition). However, the currently detected services will not be removed. There are 2 ways to delete the previously detected services:

  1. Delete the node in the provisioning group, re-import, define it again, and finally re-import again

  2. Use the ReST API to delete unwanted services. Use this command to remove each unwanted service from each interface, iteratively:

    curl -X DELETE -H "Content-Type: application/xml" -u admin:admin http://localhost:8980/opennms/rest/nodes/6/ipinterfaces/172.16.1.1/services/DNS

There is a sneaky way to do #1. Edit the provisioning group and just change the foreign ID. That will make Provisiond think that a node was deleted and a new node was added in the same requisition! Use this hint with caution and a full understanding of the impact of deleting an existing node.

Provisioning with Policies

The Policy API in Provisiond allows you to control the persistence of discovered IP and SNMP Interface entities and Node Categories during the Scan phase.

Matching IP Interface Policy

The Matching IP Interface policy controls whether discovered interfaces are to be persisted and if they are to be persisted, whether or not they will be forced to be Managed or Unmanaged.

Continuing with this example Provisioning Group, we are going to define a few policies that:

  1. Prevent discovered 10 network addresses from being persisted

  2. Force 192.168 network addresses to be unmanaged

From the foreign source definition screen, click the Add Policy button and the definition of a new policy will begin with a field for naming the policy and a drop down list of the currently installed policies. Name the policy no10s, make sure that the Match IP Interface policy is specified in the class list and click the Save button. This action will automatically add all the parameters required for the policy.

The two required parameters for this policy are action and matchBehavior.

The action parameter can be set to DO_NOT_PERSIST, MANAGE, or UNMANAGE.

Creating a policy to prevent persistence of 10 network IP interfaces.

The DO_NOT_PERSIST action does just what it indicates: it prevents discovered IP interface entities from being added to OpenNMS Horizon when the matchBehavior is satisfied. The MANAGE and UNMANAGE values for this action allow the IP interface entity to be persisted but control whether or not that interface should be managed by OpenNMS Horizon.

The matchBehavior parameter is a boolean control that determines how the optional parameters will be evaluated. Setting this parameter’s value to ALL_PARAMETERS causes Provisiond to evaluate the optional parameters with boolean AND logic, and the value ANY_PARAMETERS causes OR logic to be applied.

Now we will add one of the optional parameters to filter the 10 network addresses. The Matching IP Interface policy supports two additional parameters, hostName and ipAddress. Click the Add Parameter link and choose ipAddress as the key. The value for either of the optional parameters can be an exact or regular expression match. As in most configurations in OpenNMS Horizon where regular expression matching can be optionally applied, prefix the value with the ~ character.
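The matching rules described above can be sketched as follows. This is not Provisiond's implementation: the function names are hypothetical, the sample pattern `~^10\..*` is made up for illustration, and treating a regular expression as a whole-value match is an assumption.

```python
import re


def value_matches(pattern: str, value: str) -> bool:
    """Exact match, or regular expression match when the pattern starts with '~'."""
    if pattern.startswith("~"):
        # Assumption: the expression has to match the whole value.
        return re.fullmatch(pattern[1:], value) is not None
    return pattern == value


def policy_matches(match_behavior: str, parameters: dict, entity: dict) -> bool:
    """Combine optional parameters with AND (ALL_PARAMETERS) or OR (ANY_PARAMETERS)."""
    results = [value_matches(pattern, entity.get(key, ""))
               for key, pattern in parameters.items()]
    if match_behavior == "ALL_PARAMETERS":
        return all(results)
    if match_behavior == "ANY_PARAMETERS":
        return any(results)
    raise ValueError("unknown matchBehavior: " + match_behavior)


# Hypothetical parameter set for the no10s policy.
no10s = {"ipAddress": r"~^10\..*"}
print(policy_matches("ALL_PARAMETERS", no10s, {"ipAddress": "10.1.1.1"}))     # True
print(policy_matches("ALL_PARAMETERS", no10s, {"ipAddress": "192.168.1.1"}))  # False
```

With a single optional parameter, ALL_PARAMETERS and ANY_PARAMETERS give the same result; the distinction only matters once both hostName and ipAddress are set.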

Example Matching IP Interface Policy to not Persist 10 Network addresses


Any subsequent scan of the node or re-import of the NMS provisioning group will force this policy to be applied. Existing IP Interface entities that match this policy will not be deleted. Existing interfaces can be deleted by recreating the node in the Provisioning Groups screen (simply change the foreign ID and re-import the group) or by using the ReST API:

curl -X DELETE -H "Content-Type: application/xml" -u admin:admin http://localhost:8980/opennms/rest/nodes/6/ipinterfaces/10.1.1.1

The next step in this example is to define a policy that sets discovered 192.168 network addresses to be unmanaged (not managed) in OpenNMS Horizon. Again, click the Add Policy button and let’s call this policy noMgt192168s. Again, choose the Match IP Interface policy and this time set the action to UNMANAGE.

Policy to not manage IP interfaces from 192.168 networks


The UNMANAGE behavior will be applied to existing interfaces.

Matching SNMP Interface Policy

Like the Matching IP Interface Policy, this policy controls whether discovered SNMP interface entities are to be persisted and whether or not OpenNMS Horizon should collect performance metrics from the SNMP agent for the interface’s index (MIB2 ifIndex).

In this example, we are going to create a policy that doesn’t persist AAL5 over ATM interfaces, which have type 49 (ifType). Following the same steps as when creating an IP Management Policy, edit the foreign source definition and create a new policy. Let’s call it noAAL5s. We’ll use the Match SNMP Interface class for the policy and add a parameter with ifType as the key and 49 as the value.

Matching SNMP Interface Policy example for Persistence and Data Collection


At the appropriate time during the scanning phase, Provisiond will evaluate the policies in the foreign source definition and take appropriate action. If during the policy evaluation process any policy matches for a “DO_NOT_PERSIST” action, no further policy evaluations will happen for that particular entity (IP Interface, SNMP Interface).
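The short-circuit behavior described above can be pictured with a few lines of code. This is a simplified sketch, not Provisiond's code: policies are modeled as hypothetical (predicate, action) pairs and entities as plain dictionaries.

```python
def apply_policies(policies, entity):
    """Apply (predicate, action) pairs in order; return None when the entity is dropped."""
    for predicate, action in policies:
        if not predicate(entity):
            continue
        if action == "DO_NOT_PERSIST":
            return None  # stop here: later policies are never evaluated
        if action == "MANAGE":
            entity = {**entity, "managed": True}
        elif action == "UNMANAGE":
            entity = {**entity, "managed": False}
    return entity


policies = [
    (lambda e: e["ipAddress"].startswith("10."), "DO_NOT_PERSIST"),
    (lambda e: e["ipAddress"].startswith("192.168."), "UNMANAGE"),
]
print(apply_policies(policies, {"ipAddress": "10.1.1.1"}))
print(apply_policies(policies, {"ipAddress": "192.168.1.1"}))
```

The first call yields no entity at all, while the second returns the interface with its managed flag cleared.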
Node Categorization Policy

With this policy, node entities will automatically be assigned categories. The policy is defined in the same manner as the IP and SNMP interface policies. Click the Add Policy button, name the policy cisco, and choose the Set Node Category class. Edit the required category key and set the value to Cisco. Add a policy parameter and choose the sysObjectId key with ~^\.1\.3\.6\.1\.4\.1\.9\..* as the value.
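To see which system object IDs the expression selects, it can be tried outside of OpenNMS Horizon. The check below is a sketch using Python's regular expression engine, which may differ in detail from the one Provisiond uses; the sample OIDs are illustrative values only.

```python
import re

# The policy value from above, without the leading "~" marker.
CISCO_OID = re.compile(r"^\.1\.3\.6\.1\.4\.1\.9\..*")


def is_cisco(sys_object_id: str) -> bool:
    """Return True when the sysObjectId lies below enterprise number 9."""
    return CISCO_OID.match(sys_object_id) is not None


for oid in (".1.3.6.1.4.1.9.1.122", ".1.3.6.1.4.1.8072.3.2.10", ".1.3.6.1.4.1.99.1"):
    print(oid, is_cisco(oid))
```

The escaped dot after the 9 is what keeps other enterprise numbers that merely start with the digit 9, such as 99, from matching.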

Example: Node Category setting policy


Script Policy

This policy allows you to use Groovy scripts to modify provisioned node data. These scripts have to be placed in the OpenNMS Horizon etc/script-policies directory. An example would be changing the node’s primary interface or location. The script will be invoked for each matching node. The following example shows the source code for setting the interface in the 192.168.100.0/24 network to PRIMARY while all remaining interfaces are set to SECONDARY. Furthermore, the node’s location is set to Minneapolis.

import org.opennms.netmgt.model.OnmsIpInterface;
import org.opennms.netmgt.model.monitoringLocations.OnmsMonitoringLocation;
import org.opennms.netmgt.model.PrimaryType;

// Mark interfaces in 192.168.100.0/24 as PRIMARY and all others as SECONDARY.
for(OnmsIpInterface iface : node.getIpInterfaces()) {
    if (iface.getIpAddressAsString().matches("^192\\.168\\.100\\..*")) {
        LOG.warn(iface.getIpAddressAsString() + " set to PRIMARY")
        iface.setIsSnmpPrimary(PrimaryType.PRIMARY)
    } else {
        LOG.warn(iface.getIpAddressAsString() + " set to SECONDARY")
        iface.setIsSnmpPrimary(PrimaryType.SECONDARY)
    }
}

// Assign the node to the Minneapolis monitoring location.
node.setLocation(new OnmsMonitoringLocation("Minneapolis", ""));

// Hand the modified node back to Provisiond.
return node;

New Import Capabilities

Several new XML entities have been added to the import requisition since the introduction of the OpenNMS Importer service in version 1.6. So, in addition to provisioning the basic node, interface, service, and node categories, you can now also provision asset data.

Provisiond Configuration

The configuration of the Provisioning system has moved from a properties file (model-importer.properties) to an XML based configuration container. The configuration is now extensible to allow the definition of 0 or more import requisitions, each with its own Cron based schedule for automatic importing from various sources (intended for integration with external URLs such as HTTP and the new DNS protocol handler).

A default configuration is provided in the OpenNMS Horizon etc/ directory and is called provisiond-configuration.xml. This default configuration has an example for scheduling an import from a DNS server running on the localhost, requesting nodes from the zone localhost, to be imported once per day at the stroke of midnight. Not very practical, but it is a good example.

<?xml version="1.0" encoding="UTF-8"?>
<provisiond-configuration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.opennms.org/xsd/config/provisiond-configuration"
    foreign-source-dir="/opt/opennms/etc/foreign-sources"
    requistion-dir="/opt/opennms/etc/imports"
    importThreads="8"
    scanThreads="10"
    rescanThreads="10"
    writeThreads="8" >
    <!--
        http://www.quartz-scheduler.org/documentation/quartz-1.x/tutorials/crontrigger
        Field Name     Allowed Values      Allowed Special Characters
        Seconds        0-59                , - * /
        Minutes        0-59                , - * /
        Hours          0-23                , - * /
        Day-of-month   1-31                , - * ? / L W C
        Month          1-12 or JAN-DEC     , - * /
        Day-of-Week    1-7 or SUN-SAT      , - * ? / L C #
        Year (Opt)     empty, 1970-2099    , - * /
    -->

    <requisition-def import-name="NMS"
                     import-url-resource="file:/opt/opennms/etc/imports/NMS.xml">
        <cron-schedule>0 0 0 * * ? *</cron-schedule> <!-- daily, at midnight -->
    </requisition-def>
</provisiond-configuration>
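The cron-schedule value in the configuration above follows the field order listed in its comment. As a reading aid, the sketch below labels each field of such an expression; the helper name is hypothetical and it does not validate the allowed values or special characters.

```python
# Quartz cron expressions have six mandatory fields and an optional seventh (year).
FIELDS = ("seconds", "minutes", "hours", "day-of-month",
          "month", "day-of-week", "year")


def describe_cron(expression: str) -> dict:
    """Label each whitespace-separated field of a Quartz cron expression."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 or 7 fields, got %d" % len(parts))
    return dict(zip(FIELDS, parts))


for name, value in describe_cron("0 0 0 * * ? *").items():
    print(name, value)
```

Read this way, the example schedule fires at second 0, minute 0, hour 0 of every day, which is the daily midnight import mentioned in the XML comment.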
Configuration Reload

Like many of the daemon configurations in the 1.7 branch, Provisiond’s configuration is re-loadable without having to restart OpenNMS. Send the reloadDaemonConfig UEI:

/opt/opennms/bin/send-event.pl uei.opennms.org/internal/reloadDaemonConfig --parm 'daemonName Provisiond'

This means that you don’t have to restart OpenNMS Horizon every time you update the configuration!

Provisioning Asset Data

The Provisioning Groups Web-UI has been updated to expose the ability to add Node Asset data in an import requisition. Click the Add Node Asset link to select from a drop down list of all the possible node asset attributes that can be defined.


After an import, you can navigate to the Node Page and click the Asset Info link and see the asset data that was just provided in the requisition.


External Requisition Sources

Because Provisiond takes a URL as the location of import requisitions, OpenNMS Horizon can be easily extended to support sources in addition to the native URL handling provided by Java: file://, http://, and https://. When you configure Provisiond to import requisitions on a schedule, you specify the source using a URL Resource. For requisitions created by the Provisioning Groups WebUI, you can specify a file based URL.

Provisioning Nodes from DNS

The new Provisioning service in OpenNMS Horizon is continuously improving and adapting to the needs of the community. One of the most recent enhancements to the system is built upon the very flexible and extensible API of referencing an import requisition’s location via a URL. Most commonly, these URLs are files on the file system (i.e. file:/opt/opennms/etc/imports/<my-provisioning-group.xml>) as requisitions created by the Provisioning Groups UI. However, these same requisitions for adding, updating, and deleting nodes (based on the original model importer) can also come from URLs specifying the HTTP protocol (for example http://myinventory.server.org/nodes.cgi).

Now, using Java’s extensible protocol handling specification, a new protocol handler was created so that a URL can be specified for requesting a Zone Transfer (AXFR) from a DNS server. The A records are recorded and used to build an import requisition. This is handy for organizations that use DNS (possibly coupled with an IP management tool) as the database of record for nodes in the network. So, rather than ping sweeping the network or entering the nodes manually into the OpenNMS Horizon Provisioning UI, nodes can be managed via 1 or more DNS servers. The format of the URL for this new protocol handler is:

dns://<host>[:port]/<zone>[/<foreign-source>/][?expression=<regex>]
Simple Example
dns://my-dns-server/myzone.com

This will import all A records from the host my-dns-server on port 53 (default port) from zone myzone.com and since the foreign source (a.k.a. the provisioning group) is not specified it will default to the specified zone.

Using a Regular Expression Filter

You can also specify a subset of the A records from the zone transfer using a regular expression:

dns://my-dns-server/myzone.com/portland/?expression=^por-.*

This will import all nodes from the same server and zone, but will only manage the nodes in the zone matching the regular expression ^por-.*, and they will be assigned a unique foreign source (provisioning group), portland, for managing these nodes as a subset of nodes from within the specified zone.

URL Encoding

If your expression requires URL encoding (for example you need to use a ? in the expression) it must be properly encoded.

dns://my-dns-server/myzone.com/portland/?expression=^por[0-9]%3F
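To make the URL format and its defaults concrete, the sketch below takes such a URL apart and builds one with the expression percent-encoded. It uses Python's standard urllib.parse and is not the OpenNMS protocol handler; the function names are hypothetical.

```python
from urllib.parse import parse_qs, quote, urlsplit


def parse_dns_url(url: str) -> dict:
    """Split a dns:// requisition URL into the parts described above."""
    parts = urlsplit(url)
    if parts.scheme != "dns":
        raise ValueError("not a dns:// URL: " + url)
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        raise ValueError("missing zone in " + url)
    zone = segments[0]
    # parse_qs URL-decodes the value (and also turns '+' into a space).
    expression = parse_qs(parts.query).get("expression", [None])[0]
    return {
        "host": parts.hostname,
        "port": parts.port or 53,  # 53 is the default port
        "zone": zone,
        # The foreign source defaults to the zone when it is not given.
        "foreign_source": segments[1] if len(segments) > 1 else zone,
        "expression": expression,
    }


def build_dns_url(host: str, zone: str, foreign_source: str, expression: str) -> str:
    """Build a dns:// URL, percent-encoding the regular expression."""
    return "dns://%s/%s/%s/?expression=%s" % (
        host, zone, foreign_source, quote(expression, safe=""))


print(parse_dns_url("dns://my-dns-server/myzone.com"))
print(build_dns_url("my-dns-server", "myzone.com", "portland", "^por[0-9]?"))
```

Note that quote() with safe="" encodes more characters than strictly required (for example ^ and the brackets), which is harmless because decoding restores the original expression.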
DNS Setup

Currently, the DNS server must be set up to allow a zone transfer from the OpenNMS Horizon server. It is recommended that a secondary DNS server is running on OpenNMS Horizon and that the OpenNMS Horizon server be allowed to request a zone transfer. A quick way to test if zone transfers are working is:

dig -t AXFR @<dnsServer> <zone>

9.6. Integrating with Provisiond

The ReST API should be used for integration from other provisioning systems with OpenNMS Horizon. The ReST API provides an interface for defining foreign sources and requisitions.

9.6.1. Provisioning Groups of Nodes

Just as with the WebUI, groups of nodes can be managed via the ReST API from an external system. The steps are:

  1. Create a Foreign Source (if not using the default) for the group

  2. Update the SNMP configuration for each node in the group

  3. Create/Update the group of nodes
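For an external system written in Python, the three steps could be prepared as shown in this sketch. It only builds the requests and does not send them; the endpoint paths are the ones used in the curl examples of this section, while the helper name and the minimal XML payloads are placeholders that a real integration would replace with complete documents.

```python
import base64
import urllib.request

BASE = "http://localhost:8980/opennms/rest"


def rest_request(method: str, path: str, xml: str,
                 user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Prepare (but do not send) an authenticated XML request."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    return urllib.request.Request(
        BASE + path,
        data=xml.encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/xml",
                 "Authorization": "Basic " + token},
    )


# Placeholder payloads, one request per step.
steps = [
    rest_request("POST", "/foreignSources",
                 '<foreign-source name="customer-a" '
                 'xmlns="http://xmlns.opennms.org/xsd/config/foreign-source"/>'),
    rest_request("PUT", "/snmpConfig/10.1.1.1",
                 "<snmp-info><community>yRuSonoZ</community></snmp-info>"),
    rest_request("POST", "/requisitions",
                 '<model-import foreign-source="customer-a" '
                 'xmlns="http://xmlns.opennms.org/xsd/config/model-import"/>'),
]
for request in steps:
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would actually send it.
```

Keeping request construction separate from sending makes it easy to log or review what an integration is about to change before it touches the server.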

9.6.2. Example

Step 1 - Create a Foreign Source

If policies for this group of nodes are going to be specified differently than the default policy, then a foreign source should be created for the group. Using the ReST API, a foreign source can be provided. Here is an example:

The XML can be embedded in the curl command option -d or be referenced from a file if the @ prefix is used with the file name, as in this case.

The XML file: customer-a.foreign-source.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<foreign-source date-stamp="2009-10-12T17:26:11.616-04:00" name="customer-a" xmlns="http://xmlns.opennms.org/xsd/config/foreign-source">
    <scan-interval>1d</scan-interval>
    <detectors>
        <detector class="org.opennms.netmgt.provision.detector.icmp.IcmpDetector" name="ICMP"/>
        <detector class="org.opennms.netmgt.provision.detector.snmp.SnmpDetector" name="SNMP"/>
    </detectors>
    <policies>
        <policy class="org.opennms.netmgt.provision.persist.policies.MatchingIpInterfacePolicy" name="no-192-168">
            <parameter value="UNMANAGE" key="action"/>
            <parameter value="ALL_PARAMETERS" key="matchBehavior"/>
            <parameter value="~^192\.168\..*" key="ipAddress"/>
        </policy>
    </policies>
</foreign-source>

Here is an example curl command used to create the foreign source with the foreign source specification above:

curl -v -u admin:admin -X POST -H 'Content-type: application/xml' -d '@customer-a.foreign-source.xml' http://localhost:8980/opennms/rest/foreignSources

Now that you’ve created the foreign source, it needs to be deployed by Provisiond. Here is an example using the curl command to deploy the foreign source:

curl -v -u admin:admin http://localhost:8980/opennms/rest/foreignSources/pending/customer-a/deploy -X PUT

The current API doesn’t strictly follow the ReST design guidelines and will be updated in a later release.

Step 2 - Update the SNMP configuration

The implementation only supports a PUT request because it is an implied "Update" of the configuration, since it requires an IP address and all IPs have a default configuration. This request is passed to the SNMP configuration factory in OpenNMS Horizon for optimization of the configuration store snmp-config.xml. This example changes the community string for the IP address 10.1.1.1 to yRuSonoZ.

Community string is the only required element.

curl -v -X PUT -H "Content-Type: application/xml" -H "Accept: application/xml" -d "<snmp-info><community>yRuSonoZ</community><port>161</port><retries>1</retries><timeout>2000</timeout><version>v2c</version></snmp-info>" -u admin:admin http://localhost:8980/opennms/rest/snmpConfig/10.1.1.1

Step 3 - Create/Update the Requisition

This example adds 2 nodes to the Provisioning Group customer-a. Note that the foreign-source attribute typically has a 1 to 1 relationship to the name of the Provisioning Group requisition. There is a direct relationship between the foreign-source attribute in the requisition and the foreign source policy specification. Also, typically, the name of the provisioning group will be the same. In the following example, the ReST API will automatically create a provisioning group based on the value of the foreign-source attribute specified in the XML requisition.

curl -X POST -H "Content-Type: application/xml" -d '<?xml version="1.0" encoding="UTF-8"?><model-import xmlns="http://xmlns.opennms.org/xsd/config/model-import" date-stamp="2009-03-07T17:56:53.123-05:00" last-import="2009-03-07T17:56:53.117-05:00" foreign-source="customer-a"><node node-label="p-brane" foreign-id="1" ><interface ip-addr="10.0.1.3" descr="en1" status="1" snmp-primary="P"><monitored-service service-name="ICMP"/><monitored-service service-name="SNMP"/></interface><category name="Production"/><category name="Routers"/></node><node node-label="m-brane" foreign-id="2" ><interface ip-addr="10.0.1.4" descr="en1" status="1" snmp-primary="P"><monitored-service service-name="ICMP"/><monitored-service service-name="SNMP"/></interface><category name="Production"/><category name="Routers"/></node></model-import>' -u admin:admin http://localhost:8980/opennms/rest/requisitions

A provisioning group file called etc/imports/customer-a.xml will be found on the OpenNMS Horizon system following the successful completion of this curl command and will also be visible via the WebUI.
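When the requisition is generated by another system, building the XML programmatically avoids quoting problems on the command line. The following is a minimal sketch using Python's standard ElementTree; the function name and the input dictionary layout are hypothetical, and only the elements and attributes shown in the example above are produced.

```python
import xml.etree.ElementTree as ET

NS = "http://xmlns.opennms.org/xsd/config/model-import"


def build_requisition(foreign_source: str, nodes: list) -> str:
    """Serialize a model-import document; foreign IDs must be unique."""
    foreign_ids = [n["foreign_id"] for n in nodes]
    if len(set(foreign_ids)) != len(foreign_ids):
        raise ValueError("duplicate foreign-id in requisition")
    ET.register_namespace("", NS)  # emit the namespace as the default xmlns
    root = ET.Element("{%s}model-import" % NS, {"foreign-source": foreign_source})
    for n in nodes:
        node = ET.SubElement(root, "{%s}node" % NS,
                             {"node-label": n["label"], "foreign-id": n["foreign_id"]})
        iface = ET.SubElement(node, "{%s}interface" % NS,
                              {"ip-addr": n["ip"], "status": "1", "snmp-primary": "P"})
        for service in n["services"]:
            ET.SubElement(iface, "{%s}monitored-service" % NS, {"service-name": service})
        for category in n["categories"]:
            ET.SubElement(node, "{%s}category" % NS, {"name": category})
    return ET.tostring(root, encoding="unicode")


nodes = [
    {"label": "p-brane", "foreign_id": "1", "ip": "10.0.1.3",
     "services": ["ICMP", "SNMP"], "categories": ["Production", "Routers"]},
    {"label": "m-brane", "foreign_id": "2", "ip": "10.0.1.4",
     "services": ["ICMP", "SNMP"], "categories": ["Production", "Routers"]},
]
print(build_requisition("customer-a", nodes))
```

The resulting string can be written to a file and posted with curl using the @ prefix, in the same way as the foreign source definition in Step 1.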

Add, Update, and Delete operations are handled via the ReST API in the same manner as described in the detailed specification.

9.7. Provisioning Single Nodes (Quick Add Node)

Adding a Node to a Current Requisition

Often, it is requested that a single node add/update be completed for an already defined provisioning group. There is a ReST API for the Add Node implementation found in the OpenNMS Horizon Web-UI. For this to work, the provisioning group must already exist in the system even if there are no nodes defined in the group.

  1. Create a foreign source (if required)

  2. Specify SNMP configuration

  3. Provide a single node with the following specification

9.8. Fine Grained Provisioning Using provision.pl

provision.pl provides an example command-line interface to the provisioning-related OpenNMS Horizon REST API endpoints.

The script has many options, but the first 3 optional parameters are described here:

--username (default: admin)
--password (default: admin)
--url (default: http://localhost:8980/opennms/rest)

Pass --help to the script to see all the available options.

9.8.1. Create a new requisition

provision.pl provides easy access to the requisition REST service using the requisition option:

${OPENNMS_HOME}/bin/provision.pl requisition customer1

This command will create a new, empty (containing no nodes) requisition in OpenNMS Horizon.

The new requisition starts life in the pending state. This allows you to iteratively build the requisition and then later actually import the nodes in the requisition into OpenNMS Horizon. This handles all adds/changes/deletes at once. So, you could be making changes all day and then at night either have a schedule in OpenNMS Horizon that imports the group automatically or you can send a command through the REST service from an outside system to have the pending requisition imported/reimported.

You can get a list of all existing requisitions with the list option of the provision.pl script:

${OPENNMS_HOME}/bin/provision.pl list
Create a new Node
${OPENNMS_HOME}/bin/provision.pl node add customer1 1 node-a

This command creates a node element in the requisition customer1 called node-a using the script’s node option. The node’s foreign-ID is 1 but it can be any alphanumeric value as long as it is unique within the requisition. Note the node has no interfaces or services yet.

Add an Interface Element to that Node
${OPENNMS_HOME}/bin/provision.pl interface add customer1 1 127.0.0.1

This command adds an interface element to the node element using the interface option to the provision.pl command and it can now be seen in the pending requisition by running provision.pl requisition list customer1.

Add a Couple of Services to that Interface
${OPENNMS_HOME}/bin/provision.pl service add customer1 1 127.0.0.1 ICMP
${OPENNMS_HOME}/bin/provision.pl service add customer1 1 127.0.0.1 SNMP

This adds the 2 services to the 127.0.0.1 interface; they are now in the pending requisition.

Set the Primary SNMP Interface
${OPENNMS_HOME}/bin/provision.pl interface set customer1 1 127.0.0.1 snmp-primary P

This sets the 127.0.0.1 interface to be the node’s Primary SNMP interface.

Add a couple of Node Categories
${OPENNMS_HOME}/bin/provision.pl category add customer1 1 Routers
${OPENNMS_HOME}/bin/provision.pl category add customer1 1 Production

This adds the two categories to the node; they are now in the pending requisition.

These categories are case-sensitive but do not have to be already defined in OpenNMS Horizon. They will be created on the fly during the import if they do not already exist.

Setting Asset Fields on a Node
${OPENNMS_HOME}/bin/provision.pl asset add customer1 1 serialnumber 9999

This will add the value 9999 to the asset field serialnumber.

Deploy the Import Requisition (Creating the Group)
${OPENNMS_HOME}/bin/provision.pl requisition import customer1

This will cause OpenNMS Horizon Provisiond to import the pending customer1 requisition. The formerly pending requisition will move into the deployed state inside OpenNMS Horizon.
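When many nodes have to be added, the sequence of provision.pl calls shown above can be generated rather than typed. The sketch below only assembles the argument lists; the function name is hypothetical, and /opt/opennms as the value of OPENNMS_HOME is an assumption.

```python
def provision_commands(requisition, foreign_id, label, ip, services, categories,
                       opennms_home="/opt/opennms"):
    """Return provision.pl argument lists for one node, in the order used above."""
    script = opennms_home.rstrip("/") + "/bin/provision.pl"
    commands = [
        [script, "node", "add", requisition, foreign_id, label],
        [script, "interface", "add", requisition, foreign_id, ip],
    ]
    for service in services:
        commands.append([script, "service", "add", requisition, foreign_id, ip, service])
    commands.append([script, "interface", "set", requisition, foreign_id, ip,
                     "snmp-primary", "P"])
    for category in categories:
        commands.append([script, "category", "add", requisition, foreign_id, category])
    commands.append([script, "requisition", "import", requisition])
    return commands


for command in provision_commands("customer1", "1", "node-a", "127.0.0.1",
                                  ["ICMP", "SNMP"], ["Routers", "Production"]):
    print(" ".join(command))
```

Each list can be passed to subprocess.run(command, check=True) to execute it; the requisition itself is expected to exist already.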

Deleting a Node from a Requisition

Deleting is very much the same as adding, except that only a single delete command and a re-import are required. The audit phase is run by Provisiond, which determines that a node has been removed from the requisition; the node is then deleted from the DB and all services stop activities related to it.

${OPENNMS_HOME}/bin/provision.pl node delete customer1 1 node-a
${OPENNMS_HOME}/bin/provision.pl requisition import customer1
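Conceptually, the audit compares the foreign IDs in the requisition with those already in the database for the same foreign source. The sketch below illustrates that comparison with plain sets; it is a simplified model with hypothetical names, not Provisiond's actual code.

```python
def audit(requisition_ids: set, database_ids: set) -> dict:
    """Classify foreign IDs into the operations an import would need."""
    return {
        "insert": sorted(requisition_ids - database_ids),
        "update": sorted(requisition_ids & database_ids),
        "delete": sorted(database_ids - requisition_ids),
    }


# Node 1 was removed from the requisition but is still in the database.
print(audit({"2", "3"}, {"1", "2", "3"}))
```

This is also why changing a node's foreign ID behaves like a delete followed by an add: the old ID is only in the database and the new ID is only in the requisition.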

This completes the life cycle of managing a node element, iteratively, in an import requisition.

9.9. Yet Other API Examples

List the Nodes in a Provisioning Group

The provision.pl script doesn’t supply this feature but you can get it via the REST API. Here is an example using curl:

#!/bin/bash
REQ=$1
curl -X GET -H "Content-Type: application/xml" -u admin:admin http://localhost:8980/opennms/rest/requisitions/$REQ 2>/dev/null | xmllint --format -
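If the node list is needed in a program rather than on the terminal, the XML returned by this request can be parsed directly. The sketch below works on a sample document instead of a live response; the function name is hypothetical and the sample contains only the attributes used in this chapter.

```python
import xml.etree.ElementTree as ET

NS = {"r": "http://xmlns.opennms.org/xsd/config/model-import"}


def list_nodes(requisition_xml: str) -> list:
    """Return (foreign-id, node-label) pairs from a requisition document."""
    root = ET.fromstring(requisition_xml)
    return [(n.get("foreign-id"), n.get("node-label"))
            for n in root.findall("r:node", NS)]


sample = (
    '<model-import xmlns="http://xmlns.opennms.org/xsd/config/model-import" '
    'foreign-source="customer1">'
    '<node node-label="node-a" foreign-id="1"/>'
    '</model-import>'
)
print(list_nodes(sample))
```

ET.fromstring also accepts the raw bytes of an HTTP response, so the output of the GET request can be passed in without decoding it first.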

10. Business Service Monitoring

While OpenNMS Horizon detects issues in your network by device, interface or service, the Business Service Monitoring (BSM) takes it one step further. The BSM components allow you to monitor and model high level Business Services (BS) and help quickly identify the most critical problems affecting these. With the BSM feature it is possible to model a high level BS context around the technical Service Monitors provided in OpenNMS Horizon. To indicate which BS is affected, an Operational Status is calculated.

As an example, let’s assume a company runs an online store. Customers enter through a login system, select items, place them in the shopping cart and check out using a payment system. The whole service is provided by a few web servers that access data from databases. To monitor the status of the databases, a SQL service monitor on each database server is configured. For testing the web servers, an HTTP service monitor is used for each of them. To cover the overall functionality, a Page Sequence Monitor (PSM) is used to test the login, shop and payment workflow through the provided web portal. A possible representation of the whole system hierarchy is shown in figure Example scenario for a web shop.

Example scenario for a web shop


To be able to model this scenario, the BSM functions can be used. Business Service Monitoring (BSM) includes the following components:

  • Business Service Monitoring Daemon (BSMD): Maintains and drives the state of all BS

  • Business Service Editor: Web application which allows you to create, update or delete BS

  • Topology View for Business Services: Visual representation of the Business Service Hierarchy as a component of the Topology User Interface.

  • BSM ReST API: ReST based API to create, read, update or delete BS

10.1. Business Service Hierarchy

BS can depend on each other and together form a Business Service Hierarchy. It can be visualized using the Topology User Interface with the Business Services View. The Operational Status of a BS is ultimately calculated from Alarms and their Severity. To define the class of Alarms, a Reduction Key is used, which is represented as an Edge of a BS. To give more granularity than just Up or Down, the Operational Status uses the Severities, i.e. Normal, Warning, Minor, Major, Critical.

Based on the hierarchy, the Operational Status is calculated with Map and Reduce Functions. A Map Function influences which Severity from the Edge is used as an input to the BS. A Reduce Function takes the mapped Severities from all Edges of a BS as inputs and reduces them to a single Severity, which is the Operational Status.

The Topology User Interface allows users to traverse Business Service Hierarchies using the Semantic Zoom Level (SZL, pronounced as 'sizzle'). The SZL defines how many Neighbors are shown related to the elements which are in Focus. The number can be interpreted as how many Hops from the Focus should be shown on the Topology User Interface.

Figure 30. Business Service Hierarchy components
1 A top-level Business Service which depends on other Business Services, Monitored Services and Alarms (referenced by Reduction Key)
2 Business Service as child; its Operational Status is used as input for the top-level Business Service
3 IP Service Edge used as an input with auto generated Reduction Keys for node down, interface down and node lost service
4 Reduction Key Edge used as an input to the top-level BS, which references just a node lost service of a Page Sequence Monitor for the user login

To add a selected BS or Edge to the Focus, or to remove it, use Add To Focus or Remove From Focus in the context menu. If you want to have a specific BS or Edge as the single focus, use Set as Focal Point. The Eye icon highlights all elements in the Topology UI which are set to Focus.

10.2. Operational status

Every Business Service maintains an Operational Status that represents the overall status calculated by the Map and Reduce Functions from the Edges. The Operational Status uses the Severities known from Events and Alarms.

Table 14. Operational Status representation
Name | Numerical code | Color / Code | Description
Critical | 7 | Purple / #c00 | This event means that a severe service affecting event has occurred.
Major | 6 | Red / #f30 | Indicates serious disruption or malfunction of a service or system.
Minor | 5 | Orange / #f90 | Used for troubles that have no immediate effect on service or system performance.
Warning | 4 | Yellow / #fc0 | An event has occurred that may require action. This severity can also be used to indicate a condition that should be noted (logged) but does not require immediate action.
Normal | 3 | Dark green / #360 | Informational message. No action required.
Cleared | 2 | Grey / #eee | This severity is reserved for use in alarms to indicate that an alarm describing a self-clearing error condition has been corrected and service is restored. This severity should never be used in event definitions. Please use "Normal" severity for events that clear an alarm.
Indeterminate | 1 | Light green / #990 | No Severity could be associated with this event.

If a Business Service changes its Operational Status, an OpenNMS event of the type uei.opennms.org/bsm/serviceOperationalStatusChanged is generated and sent to the OpenNMS Event Bus. If the Operational Status changes from Normal to a higher Severity, an Event of the type uei.opennms.org/bsm/serviceProblem is generated and has the Severity of the BS. When the BS goes back to Normal, an Event of the type uei.opennms.org/bsm/serviceProblemResolved is generated.

The Service Problem and Service Problem Resolved events can be used for notifications or ticketing integration.
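The event rules above can be summarized in a small Python sketch. It is only an illustration of the text, not the bsmd implementation: the function name is hypothetical, and transitions between two non-Normal severities are assumed to produce only the status-changed event because the text does not state otherwise.

```python
# Numerical codes as listed in the Operational Status table.
SEVERITY = {
    "Indeterminate": 1, "Cleared": 2, "Normal": 3, "Warning": 4,
    "Minor": 5, "Major": 6, "Critical": 7,
}

UEI_CHANGED = "uei.opennms.org/bsm/serviceOperationalStatusChanged"
UEI_PROBLEM = "uei.opennms.org/bsm/serviceProblem"
UEI_RESOLVED = "uei.opennms.org/bsm/serviceProblemResolved"


def events_for_change(prev_label, new_label):
    """Return the UEIs expected for a status transition (hypothetical helper)."""
    if prev_label == new_label:
        return []  # no status change, no event
    events = [UEI_CHANGED]
    normal = SEVERITY["Normal"]
    prev, new = SEVERITY[prev_label], SEVERITY[new_label]
    if prev == normal and new > normal:
        events.append(UEI_PROBLEM)    # Normal -> higher Severity
    elif new == normal and prev > normal:
        events.append(UEI_RESOLVED)   # back to Normal
    return events


if __name__ == "__main__":
    print(events_for_change("Normal", "Major"))
```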

The log messages of the events contain the following information:

  • Business Service Name: businessServiceName

  • Business Service Identifier: id

  • Previous Severity Identifier: prevSeverityId

  • Previous Severity Label: prevSeverityLabel

  • New Severity Identifier: newSeverityId

  • New Severity Label: newSeverityLabel

The BSM events are not associated with a Node, Interface or Service.

10.3. Root Cause and Impact Analysis

The Root Cause operation can be used to quickly identify the underlying Reduction Keys as Edges that contribute to the current Operational Status of an element. The Impact Analysis operation, converse to the Root Cause operation, can be used to identify all of the BS affected by a given element. Both of these options are available in the context menu of the Topology User Interface when visualizing BS.

The following example shows how to identify the Root Cause of the critical status of the Shop service. Use the Context Menu on the BS to investigate the Root Cause shown in figure View before performing Root Cause Analysis.

View before performing Root Cause Analysis

The Topology UI sets only elements to Focus which are the reason for the Operational Status of the selected BS. In figure View after performing Root Cause Analysis the Page Sequence Monitor which tests the user login is down and has set the BS to a critical status.

View after performing Root Cause Analysis

Similar to identifying a root cause for a BS, it is also possible to identify which Business Services are affected by a specific Edge. Use the Context Menu on a specific Edge element and select Impact Analysis, as shown in figure View before performing Impact Analysis.

View before performing Impact Analysis

In figure View after performing Impact Analysis the Business Services for Login, Shop and Payment are affected if this HTTP service is unavailable.

View after performing Impact Analysis

Because the service PSM Shop already causes the critical status of the Business Service Shop, the HTTP service has no impact on the Operational Status of the PSM Shop and is not shown.

10.4. Simulation Mode

To visualize if the configured behavior works as expected, the Simulation Mode can be used to manually set an Alarm status of an Edge element. The Operational Status is calculated with the given Map and Reduce Functions. This allows users to validate and tune their Business Service Hierarchies until the desired status propagation is achieved.

To enter Simulation Mode, open the Business Service View in the Topology User Interface and toggle the Simulation Mode option in the Simulate menu at the top of the screen. The Info Panel on the left hand side allows you to set the Severity of the selected Edge element. Figure BSM Simulation Mode shows the menu and the Severity setting.

BSM Simulation Mode

The Info Panel can be hidden with the Arrow button in the top left corner.

The Simulate menu also offers the options Inherit State and Reset State. With Inherit State, the current Severities and Operational Status from monitoring are used for the Simulation Mode. Selecting Reset State sets all states to Normal for the simulation.

10.5. Share View

In some cases it is useful to share a specific view on a Business Service Hierarchy. For this purpose the Share menu function generates a link to the current view, which can be copied and sent to another user. In figure Share Business Service View the Share menu item was used and a link was generated. The recipient can open the link to get exactly the same configured Business Service View.

Share Business Service View

The user receiving the link needs an account in OpenNMS to be able to see the Business Service View.

10.6. Change Icons

Each element in the Business Service View has an icon which is assigned to a BS or an Edge. To customize the Business Service View, the icon of each element can be changed. Select the element in the Business Service View and choose Change Icon from the Context Menu. As shown in figure Change Icon for Business Service or Edges, select the new icon for the selected element and click Ok to permanently assign the new icon to the element.

Change Icon for Business Service or Edges

It is also possible to create custom Icon Sets, which is described in the Business Service Monitoring section of the Developer Guide.

10.7. Business Service Definition

The status of Service Monitors and any kind of Alarm can be used to drive the Operational Status of a BS. A BS is defined with the following components:

  • Business Service Name: A unique name used to identify the BS

  • Edges: A set of elements on which this BS relies which can include other BS, or Reduction Keys.

  • Reduce Function: Function used to aggregate the Operational Status from all the Edges. Specific functions may take additional parameters.

  • Attributes: Optional key/value pairs that can be used to tag or enrich the Business Service with additional information.

Each Business Service can contain a list of optional key/value attributes. These can be used to identify or tag the BS, and may be referenced in other workflows. These attributes do not affect the dependencies or the status calculation of the BS.

Attributes can be used to filter BS in Ops Board dashlets.

The Business Service Editor is used to manage and model the Business Services and their hierarchy. It is required to have administrative permissions and is available in "Login Name → Configure OpenNMS → Manage Business Services" in the Service Monitoring section.

Managing Business Services with the Business Service Editor

1 Create a new Business Service definition
2 Collapse tree view for all Business Services in the view
3 Expand tree view for all Business Services in the view
4 Reload all Business Services in the view with current Business Services from the system
5 Reload the Business Service Monitoring Daemon to use the Business Service definition as configured
6 Business Service dependency hierarchy as tree view
7 Show the current Business Service with dependencies in the Topology UI
8 Edit and delete existing Business Service definitions

As shown in figure Managing Business Services with the Business Service Editor the Business Services can be created or changed. The hierarchy is created by assigning an existing Business Service as Child Service.

10.8. Edges

Edges link the Alarm status monitored by OpenNMS Horizon to a Business Service.

The following types can be used:

  • Child Service: A reference to an existing Business Service on which to depend

  • IP Service: A convenient way to refer to the alarms that can be generated by a monitored IP Service. This will automatically provide edges for the nodeLostService, interfaceDown and nodeDown reduction keys of the specified service.

  • Reduction Key: A resolved Reduction Key used to refer to a specific Alarm, e.g. generated by an SNMP Trap or Threshold violation

If you need help determining the reduction key used by an alarm, trigger the alarm in question and copy the reduction key from the Alarm details page.

All edge types have the following parameters:

  • Map Function: The associated Map Function for this Edge

  • Weight: The relative Weight of this edge. Used by certain Reduce Functions.

Both IP Service and Reduction Key type edges also support a Friendly Name parameter which gives the user control on how the edge is labeled in the Topology User Interface. The editor changing the Edge attributes is shown in figure Editor to add Business Service Edges.

Editor to add Business Service Edges

10.8.1. Child Services

To create a hierarchy of Business Services, the Business Services need to be created first. The hierarchy is built by selecting an existing Business Service as Child Service, i.e. as a dependency.

10.8.2. IP Services

The IP Service is a predefined set of Reduction Keys which makes it easy to assign a specific Monitored Service to the given BS. As an example, you have multiple servers with a Monitored Service SMTP and you want to model a BS named Mail Communication. If just the Reduction Key for a nodeLostService is assigned, the BS would not be affected in case the IP Interface or the whole Node goes down. OpenNMS generates Alarms with different UEIs, which need to be assigned to the BS as well. To make it easier to model this use case, the IP Service generates the following Reduction Keys automatically:

  • uei.opennms.org/nodes/nodeLostService:%nodeId%:%ipAddress%:%serviceName%: Matches Alarms when the given Monitored Service goes down

  • uei.opennms.org/nodes/interfaceDown:%nodeId%:%ipAddress%: Matches Alarms when the given IP Interface of the Monitored Service goes down

  • uei.opennms.org/nodes/nodeDown:%nodeId%: Matches Alarms when the given Node of the Monitored Service goes down
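As an illustration, the three keys listed above can be generated from a node id, IP address and service name. This is a minimal Python sketch that follows the patterns exactly as printed in this section; the function name and sample values are hypothetical, and the authoritative key should always be taken from the Alarm details page.

```python
def ip_service_reduction_keys(node_id, ip_address, service_name):
    """Build the three Reduction Keys covered by an IP Service edge.

    The key layout follows the patterns printed in this section; verify it
    against a real alarm before relying on it.
    """
    return [
        f"uei.opennms.org/nodes/nodeLostService:{node_id}:{ip_address}:{service_name}",
        f"uei.opennms.org/nodes/interfaceDown:{node_id}:{ip_address}",
        f"uei.opennms.org/nodes/nodeDown:{node_id}",
    ]


if __name__ == "__main__":
    # Hypothetical SMTP service on node 42
    for key in ip_service_reduction_keys(42, "192.0.2.10", "SMTP"):
        print(key)
```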

10.8.3. Custom Reduction Key

The Reduction Key edge is used to refer to a specific instance of an alarm. When an alarm with the given Reduction Key is present, the alarm's severity is used to calculate the Operational Status of the BS. To give a better explanation, a Friendly Name can be set, which is used in the Business Service View. The format of the Reduction Key is built from a set of attributes, each enclosed in % and separated by :, i.e. %attribute%:%attribute%.

Example of a Reduction Key for a specific nodeLostService
%uei.opennms.org/nodes/nodeLostService%:%nodeId%:%ipAddress%:%serviceName%

10.9. Map Functions

The Map Functions define how the Severity of the edge will be used in the Reduce Function of the parent when calculating the Operational Status.

The available Map Functions are:

Table 15. Calculation of the Operational Status with Map Functions
Name | Description
Identity | Use the same Severity as Operational Status of the BS
Increase | Increase the Severity by one level and use it as Operational Status of the BS
Decrease | Decrease the Severity by one level and use it as Operational Status of the BS
SetTo | Set the Operational Status to a constant Severity value
Ignore | The input of the Edge is ignored for Operational Status calculation
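The effect of the five Map Functions can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and clamping at the lowest and highest severity is an assumption, since the table does not say what happens at the boundaries.

```python
# Severities ordered from lowest to highest, as in the Operational Status table.
SEVERITIES = ["Indeterminate", "Cleared", "Normal", "Warning", "Minor", "Major", "Critical"]


def apply_map(function, severity, set_to=None):
    """Return the Severity an Edge contributes, or None if it is ignored."""
    index = SEVERITIES.index(severity)
    if function == "Identity":
        return severity
    if function == "Increase":
        # Assumption: Critical stays Critical
        return SEVERITIES[min(index + 1, len(SEVERITIES) - 1)]
    if function == "Decrease":
        # Assumption: the lowest severity stays unchanged
        return SEVERITIES[max(index - 1, 0)]
    if function == "SetTo":
        return set_to
    if function == "Ignore":
        return None
    raise ValueError(f"unknown map function: {function}")


if __name__ == "__main__":
    print(apply_map("Increase", "Minor"))
```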

10.10. Reduce Functions

A Reduce Function is used to aggregate the Operational Status for the BS. The Alarm Severities from the Edges are used as input for the Reduce Function. The following Reduce Functions are available:

Table 16. Status calculation Reduce Functions
Name Description

Highest Severity

Uses the value of the highest severity, Weight is ignored.

Threshold

Uses the highest severity found more often than the given threshold, e.g. 0.26 can also be seen as 26%, which means at least 2 of 4 Alarms need to be raised to change the BS.

Highest Severity Above

Uses the highest severity greater than the given threshold severity.

Exponential Propagation

This reduce function computes the sum of the given child severities based on a base number. For this computation the severities are mapped to numbers:

$\mathrm{WARNING}=0,\ \mathrm{MINOR}=1,\ \mathrm{MAJOR}=2,\ \mathrm{CRITICAL}=3$

All other severities are ignored.

For the aggregation the following formula will be used to compute the resulting Business Service severity from its n child entities based on the base number b:

$\mathit{severity} = \left\lfloor \log_{b}\left( \sum_{i=1}^{n} b^{\mathit{childSeverity}_{i}} \right) \right\rfloor$

In summary, the base value defines how many items of a severity x will result in a severity x+1. Results lower than 0 are treated as NORMAL and results higher than 3 are treated as CRITICAL. If all input values are of severity INDETERMINATE, the result is INDETERMINATE.

For example if the Business Service depends on four child entities with the severities WARNING, WARNING, NORMAL and NORMAL and the base defined by the number 2 the following computation will be made:

$\mathit{severity} = \lfloor \log_{2}( 2^{0} + 2^{0} + 0 + 0 ) \rfloor = \lfloor \log_{2}( 1 + 1 + 0 + 0 ) \rfloor = \lfloor \log_{2}( 2 ) \rfloor = \lfloor 1 \rfloor = 1$

which corresponds to the severity MINOR. The same computation with the base value of 3 results in:

$\mathit{severity} = \lfloor \log_{3}( 3^{0} + 3^{0} + 0 + 0 ) \rfloor = \lfloor \log_{3}( 1 + 1 + 0 + 0 ) \rfloor = \lfloor \log_{3}( 2 ) \rfloor = \lfloor 0.63 \rfloor = 0$

which means WARNING.
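The two worked examples can be reproduced with a short Python sketch of the Exponential Propagation formula. It is an illustration of the description above, not the OpenNMS implementation: the function name is hypothetical, the base is assumed to be greater than 1, and the floor of the logarithm is computed with a loop instead of a floating point logarithm to avoid rounding issues.

```python
# Severity to number mapping as given above; all other severities are ignored.
NUMBER = {"WARNING": 0, "MINOR": 1, "MAJOR": 2, "CRITICAL": 3}
LABEL = {number: label for label, number in NUMBER.items()}


def exponential_propagation(child_severities, base):
    """Reduce child severities to one severity (sketch, assumes base > 1)."""
    if base <= 1:
        raise ValueError("base must be greater than 1")
    if child_severities and all(s == "INDETERMINATE" for s in child_severities):
        return "INDETERMINATE"
    # Sum b^childSeverity over all children that take part in the computation
    total = sum(base ** NUMBER[s] for s in child_severities if s in NUMBER)
    if total < 1:
        return "NORMAL"  # nothing contributed, the logarithm would be below 0
    # floor(log_b(total)), capped at 3 (CRITICAL)
    level = 0
    while level < 3 and base ** (level + 1) <= total:
        level += 1
    return LABEL[level]


if __name__ == "__main__":
    children = ["WARNING", "WARNING", "NORMAL", "NORMAL"]
    print(exponential_propagation(children, 2))
    print(exponential_propagation(children, 3))
```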

The following table shows the status calculation with Edges assigned to an IP Service. The IP Service is driven by the monitoring of the ICMP service for three web servers. In the table below you find a configuration where Web Server 3 is weighted 3 times higher than the others and a threshold of 0.33 (33%) is configured.

Table 17. Example for status calculation using the Threshold function
Name | Weight | Weight Factor | Input Severity | Operational Status | Critical | Major | Minor | Warning | Normal
Web-ICMP-1 | 1 | 0.2 | Critical | Critical | 0.2 | 0.2 | 0.2 | 0.2 | 0.2
Web-ICMP-2 | 1 | 0.2 | Normal | Normal | 0 | 0 | 0 | 0 | 0.2
Web-ICMP-3 | 3 | 0.6 | Warning | Warning | 0 | 0 | 0 | 0.6 | 0.6
Total | | 1.0 | | | 0.2 | 0.2 | 0.2 | 0.8 | 1
Percentage | | 100% | | | 20% | 20% | 20% | 80% | 100%

The Operational Status Severity is evaluated from left to right; the first value higher than the configured Threshold is used. In this case the Operational Status is set to Warning because the first column which exceeds 33% is Warning with 80%.
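The calculation in the table can be reproduced with the following Python sketch. It is a reading of the example rather than the actual Reduce Function: the function name is hypothetical, and the strict "greater than" comparison is an assumption derived from the wording "higher than the configured Threshold".

```python
# Severity columns in the order they are evaluated (left to right).
LEVELS = ["Critical", "Major", "Minor", "Warning", "Normal"]


def threshold_reduce(edges, threshold):
    """edges is a list of (weight, severity) pairs; returns the resulting Severity."""
    total_weight = sum(weight for weight, _ in edges)
    for position, level in enumerate(LEVELS):
        # An edge counts for its own column and every column to the right of it
        at_or_above = set(LEVELS[: position + 1])
        share = sum(w for w, sev in edges if sev in at_or_above) / total_weight
        if share > threshold:
            return level
    return "Normal"


if __name__ == "__main__":
    web_servers = [(1, "Critical"), (1, "Normal"), (3, "Warning")]
    print(threshold_reduce(web_servers, 0.33))
```

With four equally weighted edges and a threshold of 0.26, a single Critical edge (25%) does not change the result, while two Critical edges (50%) do.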

10.11. Business Service Daemon

The calculation of the Operational Status of the BS is driven by the Business Service Monitoring Daemon (bsmd). The daemon is responsible for tracking the operational status of all BS and for sending events in case of operational status changes.

To calculate the Operational Status, the reduction key associated with a Business Service is used. The reduction key is obtained from an alarm generated by OpenNMS Horizon. This means that the reduction key of an alarm used in a defined Business Service must not change afterwards; otherwise bsmd is not able to calculate the Operational Status correctly. The same applies to removing the alarm data from events associated with Business Services.

In addition, the child type "IP Service" from the Business Service Configuration Page requires the following events to be defined with their default reduction keys:

  • uei.opennms.org/nodes/nodeLostService

  • uei.opennms.org/nodes/nodeDown

  • uei.opennms.org/nodes/interfaceDown

Every time the configuration of a Business Service is changed a reload of the daemon’s configuration is required. This includes changes like the name of the Business Service or its attributes as well as changes regarding the Reduction Keys, contained Business Services or IP Services. The bsmd configuration can be reloaded with the following mechanisms:

  • Click the Reload Daemon button in the Business Service Editor

  • Send the reloadDaemonConfig event using send-event.pl or use the WebUI in Manually Send an Event with parameter daemonName bsmd

  • Use the ReST API to perform a POST request to /opennms/api/v2/business-services/daemon/reload

Once the configuration has been reloaded, an event of type uei.opennms.org/internal/reloadDaemonConfigSuccessful is fired.

Example reloading bsmd configuration from CLI
$OPENNMS_HOME/bin/send-event.pl -p 'daemonName bsmd' uei.opennms.org/internal/reloadDaemonConfig
Example reloading bsmd configuration through ReST POST
curl -X POST -u admin:admin -v http://localhost:8980/opennms/api/v2/business-services/daemon/reload

11. Topology Map

This section describes how to configure the Topology Map.

11.1. Properties

The Topology Map supports the following properties, which can be influenced by changing the file etc/org.opennms.features.topology.app.cfg:

Property | Type | Default | Description
showHeader | Boolean | true | Defines if the OpenNMS Horizon header is shown.
autoRefresh.enabled | Boolean | false | If enabled, auto refresh is enabled by default.
autoRefresh.interval | Integer | 60 | Defines the auto refresh interval in seconds.
hiddenCategoryPrefix | String | empty String | A String which allows hiding categories. For example a value of server will hide all categories starting with server. Be aware that this setting is case-sensitive, so Servers will be shown. The resolution is only enabled if no longitude/latitude information is available.

11.2. Icons

Each Vertex on the Topology Map is represented by an icon. The default icon is configured in the icon mapping file: ${OPENNMS_HOME}/etc/org.opennms.features.topology.app.icons.<topology-namespace>.cfg. If an icon mapping file does not exist for a Topology Provider, the provider does not support customization.

List of available icon mapping files (may not be complete)
org.opennms.features.topology.app.icons.default.cfg (1)
org.opennms.features.topology.app.icons.application.cfg (2)
org.opennms.features.topology.app.icons.bsm.cfg (3)
org.opennms.features.topology.app.icons.linkd.cfg (4)
org.opennms.features.topology.app.icons.vmware.cfg (5)
1 Default icon mapping
2 Icon mapping for the Application Topology Provider
3 Icon mapping for the Business Services Topology Provider
4 Icon mapping for the Linkd Topology Provider
5 Icon mapping for the Vmware Topology Provider

Each file contains mappings in the form <icon key> = <icon id>.

Icon key

A Topology Provider dependent string which maps to an icon id. An icon key consists of one to multiple segments. Each segment must contain only numbers or characters. If multiple segments exist they must be separated by ., e.g. my.custom.key. Any existing default icon keys are not configurable and should not be changed.

Icon id

The icon id is a unique icon identifier to reference an icon within one of the available SVG icons located in ${OPENNMS_HOME}/jetty-webapps/opennms/svg. For more details see Add new icons.

Icon key and icon id specification using BNF
icon key ::= segment["."segment]*
segment ::= text+ [( "-" | "_" | ":" ) text]*
text ::= (char | number)+
char ::= A | B | ... | Z | a | b | ... | z
number ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
icon id ::= segment
Example icon mapping file
# Business Service Topology
bsm.business-service = business_service (1)
bsm.ip-service = IP_service (2)
bsm.reduction-key = reduction_key (3)
1 Icon definition for Business Services
2 Icon definition for IP Services
3 Icon definition for Reduction Keys
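The grammar and the file format above can be combined into a small validation sketch in Python. The regular expressions are a direct translation of the BNF; the function name is hypothetical and the parser only handles the simple "key = id" lines and # comments shown in the example, not the full properties file syntax.

```python
import re

# segment ::= text+ [( "-" | "_" | ":" ) text]*   with text ::= (char | number)+
SEGMENT = r"[A-Za-z0-9]+(?:[-_:][A-Za-z0-9]+)*"
ICON_KEY = re.compile(rf"{SEGMENT}(?:\.{SEGMENT})*")  # segments joined by "."
ICON_ID = re.compile(SEGMENT)                         # a single segment

EXAMPLE = """# Business Service Topology
bsm.business-service = business_service
bsm.ip-service = IP_service
bsm.reduction-key = reduction_key
"""


def parse_icon_mapping(text):
    """Parse 'icon key = icon id' lines and validate both sides."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, icon_id = (part.strip() for part in line.partition("="))
        if not ICON_KEY.fullmatch(key) or not ICON_ID.fullmatch(icon_id):
            raise ValueError(f"invalid icon mapping: {line!r}")
        mapping[key] = icon_id
    return mapping


if __name__ == "__main__":
    print(parse_icon_mapping(EXAMPLE))
```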

11.2.1. Icon resolution

The icon of a vertex is resolved as follows:

  • If a vertex id to icon id mapping is defined, the icon referenced by the icon id is used

  • If a mapping for the icon key determined by the Topology Provider for the vertex is defined, the icon referenced by the icon id is used

    • If no mapping exists and the icon key has more than one segment, reduce the icon key by the last segment and try resolving that icon key

  • If no mapping is defined, the fallback icon key default is used.

The following example icon mapping is defined for the Linkd Topology Provider to illustrate this behaviour.

linkd.system.snmp.1.3.6.1.4.1.9.1.485 = server1
linkd.system.snmp.1.3.6 = server2

If the Enterprise OID of a node is 1.3.6.1.4.1.9.1.485 the icon with id server1 is used. If the Enterprise OID of a node is 1.3.6 the icon with id server2 is used. However, if the Enterprise OID of a node is 1.3.6.1.4.1.9.1.13 the icon with id server2 is used.
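The fallback behaviour for icon keys can be sketched in Python as follows. This only illustrates the key reduction described above and leaves out the vertex id lookup; the function name and the icon id generic used for the default key are hypothetical.

```python
def resolve_icon(icon_key, mapping, fallback_key="default"):
    """Resolve an icon id by dropping the last segment until a mapping matches."""
    key = icon_key
    while key:
        if key in mapping:
            return mapping[key]
        if "." not in key:
            break  # single segment left, nothing more to reduce
        key = key.rsplit(".", 1)[0]  # reduce the key by its last segment
    return mapping.get(fallback_key)


# Mapping from the example above plus a hypothetical default entry
MAPPING = {
    "linkd.system.snmp.1.3.6.1.4.1.9.1.485": "server1",
    "linkd.system.snmp.1.3.6": "server2",
    "default": "generic",
}

if __name__ == "__main__":
    print(resolve_icon("linkd.system.snmp.1.3.6.1.4.1.9.1.13", MAPPING))
```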

Linkd Topology Provider

The Linkd Topology Provider uses the Enterprise OID from each node to determine the icon of a vertex.

11.2.2. Change existing icon mappings

The easiest way to change the icon representation of an existing Vertex is to use the Icon Selection Dialog from the Vertex' context menu in the Topology Map. This creates a custom icon key to icon id mapping in the Topology Provider specific icon mapping file. The Vertex id is used as the icon key. This allows each Vertex to have its own icon.

If a more generic approach is preferred the icon mapping file can be modified manually.

Do NOT remove the default mappings and do NOT change the icon keys in the default mappings.

11.2.3. Add new icons

All available icons are stored in SVG files located in ${OPENNMS_HOME}/jetty-webapps/opennms/svg. To add new icons, either add definitions to an existing SVG file or create a new SVG file in that directory.

Regardless of how new icons are added to OpenNMS, it is important that each new icon id describes a set of icons rather than a single icon. The following example illustrates this.

Example SVG file with a custom icon with id my-custom
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg id="icons" xmlns="http://www.w3.org/2000/svg">
  <g id="my-custom_icon"> (1)
      <g id="my-custom_active"> (2)
          <!-- rect, path, circle, etc elements, supported by SVG -->
      </g>
      <g id="my-custom_rollover"> (3)
          <!-- rect, path, circle, etc elements, supported by SVG -->
      </g>
      <g id="my-custom"> (4)
          <!-- rect, path, circle, etc elements, supported by SVG -->
      </g>
  </g>
  <!-- Additional groups ... -->
</svg>
1 Each icon must be in a SVG group with the id <icon id>_icon. Each SVG <icon id>_icon group must contain three sub groups with the ids: <icon id>_active, <icon id>_rollover and <icon id>.
2 The icon to use when the Vertex is selected.
3 The icon to use when the Vertex is moused over.
4 The icon to use when the Vertex is not selected or not moused over (just visible).
It is important that each icon id is unique across all SVG files. This means there cannot be another my-custom icon id in any other SVG file.
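Before deploying a new SVG file, the required group structure can be checked with a short script. The following Python sketch is a hypothetical helper, not an OpenNMS tool; it only verifies that the <icon id>_icon group and its three sub groups exist and uses a reduced sample document without the XML declaration and DOCTYPE.

```python
import xml.etree.ElementTree as ET

# Reduced sample following the structure of the example above
SAMPLE_SVG = """<svg id="icons" xmlns="http://www.w3.org/2000/svg">
  <g id="my-custom_icon">
    <g id="my-custom_active"/>
    <g id="my-custom_rollover"/>
    <g id="my-custom"/>
  </g>
</svg>"""


def missing_icon_groups(svg_text, icon_id):
    """Return the ids of required groups that are missing for icon_id."""
    root = ET.fromstring(svg_text)
    wrapper_id = f"{icon_id}_icon"
    wrapper = next((el for el in root.iter() if el.get("id") == wrapper_id), None)
    if wrapper is None:
        return [wrapper_id]
    # Only direct children of the wrapper group are considered sub groups
    child_ids = {child.get("id") for child in wrapper}
    required = [f"{icon_id}_active", f"{icon_id}_rollover", icon_id]
    return [group for group in required if group not in child_ids]


if __name__ == "__main__":
    print(missing_icon_groups(SAMPLE_SVG, "my-custom"))
```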

If the new icons should be selectable from the Topology Map’s Icon Selection Dialog, an entry with the new icon id must be added to the file ${OPENNMS_HOME}/etc/org.opennms.features.topology.app.icons.list.

Snippet of org.opennms.features.topology.app.icons.list
access_gateway (1)
accesspoint
cloud
fileserver
linux_file_server
opennms_server
printer
router
workgroup_switch
my-custom (2)
1 Already existing icon ids
2 New icon id
The order of the entries in org.opennms.features.topology.app.icons.list determines the order in the Icon Selection Dialog in the Topology Map.

12. Database Reports

Reporting on information from the OpenNMS Horizon monitoring system is important for strategic or operational decisions. Database Reports give access to the embedded JasperReports engine and allow you to create and customize report templates. These reports can be executed on demand or on a pre-defined schedule within OpenNMS Horizon.

Originally Database Reports were introduced to create reports working on data stored in the OpenNMS Horizon database only. This is no longer a restriction; performance data can be used as well. Theoretically the reports do not necessarily need to be OpenNMS Horizon related.
The OpenNMS Horizon Report Engine allows the creation of various kinds of reports and also supports distributed report repositories. At the moment these features are not covered by this documentation. Only reports using JasperReports are described here.

12.1. Overview

The OpenNMS Horizon Report Engine uses the JasperReport library to create reports in various output formats. Each report template must be a *.jrxml file. The OpenNMS Horizon Report Engine passes a JDBC Connection to the OpenNMS Horizon Database to each report on execution.

Table 18. Feature overview
Supported Output Formats | PDF, CSV
JasperReport Version | 6.3.0

For more details on how JasperReports works, please have a look at the official documentation of Jaspersoft Studio.

12.2. Modify existing reports

All default reports of OpenNMS Horizon are located in $OPENNMS_HOME/etc/report-templates. Each .jrxml file located there can be modified and the changes are applied the next time a report is created by OpenNMS Horizon.

When a subreport has been modified OpenNMS Horizon will detect a change based on the report’s lastModified time and will recompile the report. A compiled version of the report is represented by a .jasper file with the same name as the .jrxml file. Subreports are located in $OPENNMS_HOME/etc/report-templates/subreports.

If unsure, simply delete all .jasper files and OpenNMS Horizon will automatically compile the subreports if needed.

12.3. Add a custom report

To add a new JasperReport report to the Local OpenNMS Horizon Report Repository, the following steps are required.

First, a new entry must be created in the file $OPENNMS_HOME/etc/database-reports.xml.

<report
  id="MyReport" (1)
  display-name="My Report" (2)
  online="true" (3)
  report-service="jasperReportService" (4)
  description="This is an example description. It shows up in the web ui when creating an online report"  (5)
/>
1 A unique identifier.
2 The name of the report. Is shown when using the web ui.
3 Defines if this report can be executed on demand, otherwise only scheduling is possible.
4 The report service implementation to use. In most cases this is jasperReportService.
5 A description of the report. Is shown when using the web ui.

In addition a new entry in the file $OPENNMS_HOME/etc/jasper-reports.xml must be created.

<report
  id="MyReport" (1)
  template="My-Report.jrxml" (2)
  engine="jdbc" (3)
/>
1 The identifier defined in the previous step. This identifier must exist in $OPENNMS_HOME/etc/database-reports.xml.
2 The name of the template. The template must be located in $OPENNMS_HOME/etc/report-templates.
3 The engine to use. It is either jdbc or null.
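Because the identifier must match in both files, a quick consistency check can help after editing them. The following Python sketch is a hypothetical helper; the wrapper elements in the sample documents are placeholders, and only the report elements and their id attribute are taken from this section.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for database-reports.xml and jasper-reports.xml;
# the <reports> wrapper element is a placeholder, not the real root element.
DATABASE_REPORTS = """<reports>
  <report id="MyReport" display-name="My Report" online="true"
          report-service="jasperReportService" description="Example"/>
</reports>"""

JASPER_REPORTS = """<reports>
  <report id="MyReport" template="My-Report.jrxml" engine="jdbc"/>
  <report id="OtherReport" template="Other-Report.jrxml" engine="jdbc"/>
</reports>"""


def report_ids(xml_text):
    """Collect the id of every report element, ignoring XML namespaces."""
    root = ET.fromstring(xml_text)
    return {
        el.get("id")
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "report" and el.get("id")
    }


def unregistered_templates(database_reports_xml, jasper_reports_xml):
    """Return ids defined for JasperReports but missing in database-reports.xml."""
    return sorted(report_ids(jasper_reports_xml) - report_ids(database_reports_xml))


if __name__ == "__main__":
    print(unregistered_templates(DATABASE_REPORTS, JASPER_REPORTS))
```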

12.4. Usage of Jaspersoft Studio

When developing new reports it is recommended to use the Jaspersoft Studio application. It can be downloaded here.

We recommend always to use the same Jaspersoft Studio version as the JasperReport library OpenNMS Horizon uses. Currently OpenNMS Horizon uses version 6.3.0.

12.4.1. Connect to the OpenNMS Horizon Database

To create SQL statements against the OpenNMS Horizon database, a database Data Adapter must be created. The official Jaspersoft Studio documentation and wiki cover this aspect.

12.4.2. Use Measurements Datasource and Helpers

To use the Measurements API, the Measurements Datasource library must be added to the build path of Jaspersoft Studio. To do so, right-click in the Project Explorer and select Configure Buildpath.

1 configure build path 1
  1. Switch to the Libraries tab.

  2. Click Add External JARs and select the opennms-jasperstudio-extension-23.0.0-SNAPSHOT-jar-with-dependencies.jar file located in $OPENNMS_HOME/contrib/jasperstudio-extension.

  3. Close the file selection dialog.

2 configure build path 2
  1. Close the dialog.

  2. The Measurements Datasource and Helpers should now be available.

  3. Go to the Dataset and Query Dialog in Jaspersoft Studio and select a language called measurement.

3 dataset query dialog
Even though no Read Fields functionality is available, the Data preview can be used. This requires that the Measurements API is reachable using the connection parameters MEASUREMENT_URL, MEASUREMENT_USERNAME and MEASUREMENT_PASSWORD. The Supported Fields section gives more details.

12.5. Accessing Performance Data

Before OpenNMS Horizon 17 and OpenNMS Meridian 2016 it was possible to access the performance data stored in .rrd or .jrobin files directly by using the jrobin language extension provided by the RrdDataSource. This is no longer possible and the Measurements Datasource has to be used.

To access performance data within reports we created a custom Measurement Datasource which allows to query the Measurements API and process the returned data in your reports. Please refer to the official Measurements API documentation on how to use the _Measurements API_.

When using the Measurements Datasource within a report, an HTTP connection to the Measurements API is only established if the report is NOT running within OpenNMS Horizon, e.g. when used with Jaspersoft Studio.

To receive data from the Measurements API simply create a query as follows:

Sample queryString to receive data from the Measurements API
<query-request step="300000" start="$P{startDateTime}" end="$P{endDateTime}" maxrows="2000"> (1)
  <source aggregation="AVERAGE" label="IfInOctets" attribute="ifHCInOctets" transient="false" resourceId="node[$P{nodeId}].interfaceSnmp[$P{interface}]"/>
  <source aggregation="AVERAGE" label="IfOutOctets" attribute="ifHCOutOctets" transient="false" resourceId="node[$P{nodeId}].interfaceSnmp[$P{interface}]"/>
</query-request>
1 The query language. In our case measurement, but JasperReports supports a lot out of the box, such as sql, xpath, etc.
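The structure of the query above can also be assembled programmatically, which is handy when testing a query outside of a report. The following is a minimal sketch using only the Python standard library; the function name build_query_request and its arguments are hypothetical and not part of OpenNMS Horizon. Inside a report, JasperReports substitutes the $P{...} placeholders instead.

```python
import xml.etree.ElementTree as ET


def build_query_request(resource_id, start_ms, end_ms, step_ms=300000, maxrows=2000):
    """Assemble a query-request document shaped like the sample above (illustrative only)."""
    root = ET.Element("query-request", {
        "step": str(step_ms),
        "start": str(start_ms),
        "end": str(end_ms),
        "maxrows": str(maxrows),
    })
    # One <source> element per attribute; transient="false" exposes the label as a field.
    for label, attribute in (("IfInOctets", "ifHCInOctets"), ("IfOutOctets", "ifHCOutOctets")):
        ET.SubElement(root, "source", {
            "aggregation": "AVERAGE",
            "label": label,
            "attribute": attribute,
            "transient": "false",
            "resourceId": resource_id,
        })
    return ET.tostring(root, encoding="unicode")


# Example values are made up for illustration.
xml_text = build_query_request("node[1].interfaceSnmp[en0-005e607e9e00]", 0, 3600000)
print(xml_text)
```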

12.5.1. Fields

Each datasource should return a number of fields, which then can be used in the report. The Measurement Datasource supports the following fields:

Field name Field type Field description

<label>

java.lang.Double

Each Source defined as transient=false can be used as a field. The name of the field is the label, e.g. IfInOctets

timestamp

 java.util.Date

The timestamp of the sample.

step

java.lang.Long

The Step size of the Response. Returns the same value for all rows.

start

java.lang.Long

The Start timestamp in milliseconds of the Response. Returns the same value for all rows.

end

java.lang.Long

The End timestamp in milliseconds of the Response. Returns the same value for all rows.

For more details about the Response, please refer to the official Measurement API documentation.
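To make the field list more concrete, the sketch below turns a Measurements API style response into rows that carry the fields described above. The response layout used here (step, start, end, timestamps, labels, columns with values) and the sample numbers are assumptions made for illustration; check the Measurements API documentation for the authoritative format. The function name response_to_rows is hypothetical.

```python
from datetime import datetime, timezone


def response_to_rows(response):
    """Build one row per sample, repeating step/start/end on every row."""
    rows = []
    for index, timestamp_ms in enumerate(response["timestamps"]):
        row = {
            "timestamp": datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc),
            "step": response["step"],
            "start": response["start"],
            "end": response["end"],
        }
        # Each non-transient source label becomes a field holding a floating point value.
        for label, column in zip(response["labels"], response["columns"]):
            row[label] = column["values"][index]
        rows.append(row)
    return rows


# Invented sample data, assumed response shape.
sample = {
    "step": 300000,
    "start": 0,
    "end": 600000,
    "timestamps": [0, 300000],
    "labels": ["IfInOctets", "IfOutOctets"],
    "columns": [{"values": [1.0, 2.0]}, {"values": [3.0, 4.0]}],
}
rows = response_to_rows(sample)
for row in rows:
    print(row)
```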

12.5.2. Parameters

In addition to the queryString the following JasperReports parameters are supported.

Parameter name Required Description

MEASUREMENT_URL

yes

The URL of the Measurements API, e.g. http://localhost:8980/opennms/rest/measurements

MEASUREMENT_USERNAME

no

If authentication is required, specify the username, e.g. admin

MEASUREMENT_PASSWORD

no

If authentication is required, specify the password, e.g. admin
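The following sketch shows how the three parameters could map onto an HTTP request when testing connectivity outside of a report. It assumes that the Measurements API accepts a POST of the XML query and HTTP Basic authentication; both are assumptions to verify against the Measurements API documentation. The function name build_measurements_request is hypothetical, and the request is only built, not sent.

```python
import base64
import urllib.request


def build_measurements_request(url, query_xml, username=None, password=None):
    """Prepare (but do not send) a request from the MEASUREMENT_* parameter values."""
    request = urllib.request.Request(
        url,
        data=query_xml.encode("utf-8"),
        headers={"Content-Type": "application/xml", "Accept": "application/json"},
        method="POST",
    )
    # Credentials are optional, mirroring the "Required: no" entries in the table.
    if username is not None and password is not None:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


request = build_measurements_request(
    "http://localhost:8980/opennms/rest/measurements",  # MEASUREMENT_URL
    "<query-request/>",                                  # placeholder query
    username="admin",                                    # MEASUREMENT_USERNAME
    password="admin",                                    # MEASUREMENT_PASSWORD
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would perform the call against a running system.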

12.6. Helper methods

There are a couple of helper methods that help you create reports in OpenNMS Horizon.

These helpers come along with the Measurement Datasource.

Table 19. supported helper methods
Helper class Helper Method Description

org.opennms.netmgt.jasper.helper.MeasurementsHelper

getNodeOrNodeSourceDescriptor(nodeId, foreignSource, foreignId)

Generates a node source descriptor according to the input parameters. Either node[nodeId] or nodeSource[foreignSource:foreignId] is returned. nodeSource[foreignSource:foreignId] is only returned if both foreignSource and foreignId are neither empty nor null. Otherwise node[nodeId] is always returned.

nodeId : String, the id of the node
foreignSource: String, the foreign source of the node, may be null
foreignId: String, the foreign id of the node, may be null.

For more details check out Usage of the node source descriptor.

org.opennms.netmgt.jasper.helper.MeasurementsHelper

getInterfaceDescriptor(snmpifname, snmpifdescr, snmpphysaddr)

Returns the interface descriptor of a given interface, e.g. en0-005e607e9e00. The input parameters are prioritized. If a snmpifdescr is specified, it is used instead of the snmpifname. If a snmpphysaddr is defined, it will be appended to snmpifname/snmpifdescr.

snmpifname: String, the interface name of the interface, e.g. en0. May be null.
snmpifdescr: String, the description of the interface, e.g. en0. May be null.
snmpphysaddr: String, the MAC address of the interface, e.g. 005e607e9e00. May be null.
Although each input parameter may be null, they must not all be null at the same time. At least one input parameter has to be defined.

For more details check out Usage of the interface descriptor.

12.6.1. Usage of the node source descriptor

A node is addressed by a node source descriptor. The node source descriptor references the node either via the foreign source and foreign id or by the node id.

If store by foreign source is enabled only addressing the node via foreign source and foreign id is possible.

In order to make report creation easier, there is a helper method to create the node source descriptor.

For more information about store by foreign source, please have a look at our Wiki.

The following example shows the usage of that helper.

jrxml report snippet to visualize the use of the node source descriptor.
<parameter name="nodeResourceDescriptor" class="java.lang.String" isForPrompting="false">
  <defaultValueExpression><![CDATA[org.opennms.netmgt.jasper.helper.MeasurementsHelper.getNodeOrNodeSourceDescriptor(String.valueOf($P{nodeid}), $P{foreignsource}, $P{foreignid})]]></defaultValueExpression>
</parameter>
<queryString language="Measurement">
  <![CDATA[<query-request step="300000" start="$P{startDateTime}" end="$P{endDateTime}" maxrows="2000">
<source aggregation="AVERAGE" label="IfInOctets" attribute="ifHCInOctets" transient="false" resourceId="$P{nodeResourceDescriptor}.interfaceSnmp[en0-005e607e9e00]"/>
<source aggregation="AVERAGE" label="IfOutOctets" attribute="ifHCOutOctets" transient="false" resourceId="$P{nodeResourceDescriptor}.interfaceSnmp[en0-005e607e9e00]"/>
</query-request>]]>
</queryString>

Depending on the input parameters you either get a node resource descriptor or a foreign source/foreign id resource descriptor.
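The selection rule described for getNodeOrNodeSourceDescriptor can be sketched in a few lines. This is a Python re-statement of the documented behaviour for illustration only, not the Java implementation shipped with OpenNMS Horizon; the function name is hypothetical.

```python
def node_or_node_source_descriptor(node_id, foreign_source, foreign_id):
    """Return nodeSource[fs:fid] only when both foreign values are set, else node[id]."""
    # Empty strings and None are both treated as "not set".
    if foreign_source and foreign_id:
        return f"nodeSource[{foreign_source}:{foreign_id}]"
    return f"node[{node_id}]"


print(node_or_node_source_descriptor("42", None, None))
print(node_or_node_source_descriptor("42", "Routers", "core-1"))
```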

12.6.2. Usage of the interface descriptor

An interfaceSnmp is addressed with the exact interface descriptor. To allow easy access to the interface descriptor a helper tool is provided. The following example shows the usage of that helper.

jrxml report snippet to visualize the use of the interface descriptor
<parameter name="interface" class="java.lang.String" isForPrompting="false">
  <parameterDescription><![CDATA[]]></parameterDescription>
  <defaultValueExpression><![CDATA[org.opennms.netmgt.jasper.helper.MeasurementsHelper.getInterfaceDescriptor($P{snmpifname}, $P{snmpifdescr}, $P{snmpphysaddr})]]></defaultValueExpression>
</parameter>
<queryString language="Measurement">
  <![CDATA[<query-request step="300000" start="$P{startDateTime}" end="$P{endDateTime}" maxrows="2000">
<source aggregation="AVERAGE" label="IfInOctets" attribute="ifHCInOctets" transient="false" resourceId="node[$P{nodeId}].interfaceSnmp[$P{interface}]"/>
<source aggregation="AVERAGE" label="IfOutOctets" attribute="ifHCOutOctets" transient="false" resourceId="node[$P{nodeId}].interfaceSnmp[$P{interface}]"/>
</query-request>]]>
</queryString>

Which interface descriptor is returned depends on the input parameters.

12.6.3. Use HTTPS

To establish a secure connection to the Measurements API, the public certificate of the running OpenNMS Horizon must be imported into the Java Trust Store. In addition, OpenNMS Horizon must be configured to use that Java Trust Store. Please follow the instructions in this chapter to set up the Java Trust Store correctly.

In addition, please set the property org.opennms.netmgt.jasper.measurement.ssl.enable in $OPENNMS_HOME/etc/opennms.properties to true to ensure that only secure connections are established.

If org.opennms.netmgt.jasper.measurement.ssl.enable is set to false, an insecure connection to the Measurements API location can be established accidentally. An SSL-secured connection can still be established even if org.opennms.netmgt.jasper.measurement.ssl.enable is set to false.

12.7. Limitations

  • Only a JDBC Datasource to the OpenNMS Horizon Database connection can be passed to a report, or no datasource at all. One does not have to use the datasource, though.

13. Enhanced Linkd

Enhanced Linkd (Enlinkd) has been designed to discover connections between nodes using data generated by various link discovery protocols and accessible via SNMP. Enlinkd gathers this data at a regular interval and creates a snapshot of a device’s neighbors from its perspective. The connections discovered by Enlinkd are called Links. The term Link, within the context of Enlinkd, is not synonymous with the term "link" when used with respect to the network OSI Layer 2 domain, whereby a link only indicates a Layer 2 connection. A Link in the context of Enlinkd is a more abstract concept and is used to describe any connection between two OpenNMS Horizon Nodes. These Links are discovered based on information provided by an agent’s understanding of connections at the OSI Layer 2, Layer 3, or other OSI layers.

The following sections describe the Enlinkd daemon and its configuration. They also describe the supported Link discovery implementations and list the SNMP MIBs that the SNMP agents must expose in order for Enlinkd to gather Links between Nodes. Detailed information about a node’s connections (discovered Links) and supporting link data can be seen on the Node detail page within the OpenNMS Horizon Web-UI.

13.1. Enlinkd Daemon

Essentially Enlinkd asks each device the following question: "What is the network topology from your point of view?" Because of this perspective, it only provides local topology discovery features. It does not attempt to discover global topology or to do any correlation with the data coming from other nodes.

For large environments the behavior of Enlinkd can be configured. During the Link discovery process informational and error output is logged to a global log file.

Table 20. Global log and configuration files for Enlinkd
File Location Description

enlinkd-configuration.xml

$OPENNMS_HOME/etc

Global configuration for the daemon process

enlinkd.log

$OPENNMS_HOME/logs

Global Enlinkd log file

log4j2.xml

$OPENNMS_HOME/etc

Configuration file to set the log level for Enlinkd

Configuration file for Enlinkd
<?xml version="1.0" encoding="ISO-8859-1"?>
<enlinkd-configuration threads="5"
                     initial_sleep_time="60000"
                     rescan_interval="86400000"
                     use-cdp-discovery="true"
                     use-bridge-discovery="true"
                     use-lldp-discovery="true"
                     use-ospf-discovery="true"
                     use-isis-discovery="true"
                     />
Table 21. Description of global configuration parameters
Attribute Type Default Description

threads

Integer

5

Number of parallel threads used to discover the topology.

initial_sleep_time

Integer

60000

Time in milliseconds to wait after OpenNMS Horizon is started before discovering the topology.

rescan_interval

Integer

86400000

Interval to rediscover and update the topology in milliseconds.

use-cdp-discovery

Boolean

true

Enable or disable topology discovery based on CDP information.

use-bridge-discovery

Boolean

true

Enable or disable algorithm to discover the topology based on the Bridge MIB information.

use-lldp-discovery

Boolean

true

Enable or disable topology discovery based on LLDP information.

use-ospf-discovery

Boolean

true

Enable or disable topology discovery based on OSPF information.

use-isis-discovery

Boolean

true

Enable or disable topology discovery based on IS-IS information.

If multiple protocols are enabled, the links will be discovered for each enabled discovery protocol. The topology WebUI will visualize Links for each discovery protocol. For example if you start CDP and LLDP discovery, the WebUI will visualize a CDP Link and an LLDP Link.
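Because the timing attributes are given in milliseconds, it can help to convert them when reviewing a configuration. The sketch below parses a configuration like the one shown above with the Python standard library and reports the intervals in friendlier units together with the enabled protocols. The function name summarize_enlinkd_config is hypothetical, and the XML declaration is omitted from the embedded sample for brevity.

```python
import xml.etree.ElementTree as ET

SAMPLE_CONFIG = """
<enlinkd-configuration threads="5"
                     initial_sleep_time="60000"
                     rescan_interval="86400000"
                     use-cdp-discovery="true"
                     use-bridge-discovery="true"
                     use-lldp-discovery="true"
                     use-ospf-discovery="true"
                     use-isis-discovery="true"
                     />
"""


def summarize_enlinkd_config(xml_text):
    """Return thread count, timings in readable units and the enabled protocols."""
    root = ET.fromstring(xml_text.strip())
    prefix, suffix = "use-", "-discovery"
    protocols = sorted(
        name[len(prefix):-len(suffix)]
        for name, value in root.attrib.items()
        if name.startswith(prefix) and name.endswith(suffix) and value == "true"
    )
    return {
        "threads": int(root.get("threads")),
        "initial_sleep_seconds": int(root.get("initial_sleep_time")) / 1000,
        "rescan_hours": int(root.get("rescan_interval")) / 3_600_000,
        "protocols": protocols,
    }


summary = summarize_enlinkd_config(SAMPLE_CONFIG)
print(summary)
```

With the default values this reports an initial delay of 60 seconds and a rescan every 24 hours.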

Enlinkd is able to discover Layer 2 network links based on the following protocols:

  • Link Layer Discovery Protocol (LLDP)

  • Cisco Discovery Protocol (CDP)

  • Transparent Bridge Discovery

This information is provided by SNMP agents with appropriate MIB support. For this reason a working SNMP configuration is required. The following sections describe the MIBs that the SNMP agent must provide to allow Link discovery.

13.2.1. LLDP Discovery

The Link Layer Discovery Protocol (LLDP) is a vendor-neutral link layer protocol. It is used by network devices for advertising their identity, capabilities, and neighbors. LLDP performs functions similar to several proprietary protocols, such as the Cisco Discovery Protocol (CDP), Extreme Discovery Protocol, Foundry Discovery Protocol (FDP), Nortel Discovery Protocol (also known as SONMP), and Microsoft’s Link Layer Topology Discovery (LLTD)[1].

Only nodes with a running LLDP process can be part of the link discovery. The data is similar to running a show lldp neighbor command on the device. Linux and Windows servers don’t have an LLDP process running by default and will not be part of the link discovery.

The following OIDs are supported to discover and build the LLDP network topology.

Table 22. Supported OIDs from LLDP-MIB
Name OID Description

lldpLocChassisIdSubtype

.1.0.8802.1.1.2.1.3.1.0

The type of encoding used to identify the chassis associated with the local system. Possible values can be:
chassisComponent(1)
interfaceAlias(2)
portComponent(3)
macAddress(4)
networkAddress(5)
interfaceName(6)
local(7)

lldpLocChassisId

.1.0.8802.1.1.2.1.3.2.0

The string value used to identify the chassis component associated with the local system.

lldpLocSysName

.1.0.8802.1.1.2.1.3.3.0

The string value used to identify the system name of the local system. If the local agent supports IETF RFC 3418, lldpLocSysName object should have the same value of sysName object.

lldpLocPortIdSubtype

.1.0.8802.1.1.2.1.3.7.1.2

The type of port identifier encoding used in the associated lldpLocPortId object.

lldpLocPortId

.1.0.8802.1.1.2.1.3.7.1.3

The string value used to identify the port component associated with a given port in the local system.

lldpLocPortDesc

.1.0.8802.1.1.2.1.3.7.1.4

The string value used to identify the 802 LAN station’s port description associated with the local system. If the local agent supports IETF RFC 2863, lldpLocPortDesc object should have the same value of ifDescr object.

lldpRemChassisIdSubtype

.1.0.8802.1.1.2.1.4.1.1.4

The type of encoding used to identify the chassis associated with the remote system. Possible values can be:
chassisComponent(1)
interfaceAlias(2)
portComponent(3)
macAddress(4)
networkAddress(5)
interfaceName(6)
local(7)

lldpRemChassisId

.1.0.8802.1.1.2.1.4.1.1.5

The string value used to identify the chassis component associated with the remote system.

lldpRemPortIdSubtype

.1.0.8802.1.1.2.1.4.1.1.6

The type of port identifier encoding used in the associated lldpRemPortId object.

interfaceAlias(1)
the octet string identifies a particular instance of the ifAlias object (defined in IETF RFC 2863). If the particular ifAlias object does not contain any values, another port identifier type should be used.

portComponent(2)
the octet string identifies a particular instance of the entPhysicalAlias object (defined in IETF RFC 2737) for a port or backplane component.

macAddress(3)
this string identifies a particular unicast source address (encoded in network byte order and IEEE 802.3 canonical bit order) associated with the port (IEEE Std 802-2001).

networkAddress(4)
this string identifies a network address associated with the port. The first octet contains the IANA AddressFamilyNumbers enumeration value for the specific address type, and octets 2 through N contain the networkAddress address value in network byte order.

interfaceName(5)
the octet string identifies a particular instance of the ifName object (defined in IETF RFC 2863). If the particular ifName object does not contain any values, another port identifier type should be used.

agentCircuitId(6)
this string identifies an agent-local identifier of the circuit (defined in RFC 3046)

local(7)
this string identifies a locally assigned port ID.

lldpRemPortId

.1.0.8802.1.1.2.1.4.1.1.7

The string value used to identify the port component associated with the remote system.

lldpRemPortDesc

.1.0.8802.1.1.2.1.4.1.1.8

The string value used to identify the description of the given port associated with the remote system.

lldpRemSysName

.1.0.8802.1.1.2.1.4.1.1.9

The string value used to identify the system name of the remote system.

Generic information about the LLDP process can be found in the LLDP Information box on the Node Detail Page of the device. Information gathered from these OIDs will be stored in the following database table:

lldp database
Figure 31. Database tables related to LLDP discovery
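When inspecting raw LLDP data, the chassis ID can only be interpreted together with its subtype. The following sketch maps the subtype values listed in the table above to their names and renders the chassis ID accordingly. Rendering MAC and network addresses as hex and everything else as text is a simplification chosen for this example, and the function name format_chassis_id is hypothetical.

```python
CHASSIS_ID_SUBTYPES = {
    1: "chassisComponent",
    2: "interfaceAlias",
    3: "portComponent",
    4: "macAddress",
    5: "networkAddress",
    6: "interfaceName",
    7: "local",
}


def format_chassis_id(subtype, raw):
    """Return (subtype name, printable value) for a chassis ID given as bytes."""
    name = CHASSIS_ID_SUBTYPES.get(subtype, f"unknown({subtype})")
    if name in ("macAddress", "networkAddress"):
        # Binary encodings are shown as colon separated hex octets.
        value = ":".join(f"{octet:02x}" for octet in raw)
    else:
        value = raw.decode("utf-8", errors="replace")
    return name, value


print(format_chassis_id(4, bytes.fromhex("005e607e9e00")))
print(format_chassis_id(7, b"switch-01"))
```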

13.2.2. CDP Discovery

The Cisco Discovery Protocol (CDP) is a proprietary link layer protocol from Cisco. It is used by network devices to advertise identity, capabilities and neighbors. CDP performs functions similar to several other protocols, such as the vendor-neutral Link Layer Discovery Protocol (LLDP), Extreme Discovery Protocol, Foundry Discovery Protocol (FDP), Nortel Discovery Protocol (also known as SONMP), and Microsoft’s Link Layer Topology Discovery (LLTD). The CDP discovery uses information provided by the CISCO-CDP-MIB and CISCO-VTP-MIB.

Only nodes with a running CDP process can be part of the link discovery. The data is similar to running a show cdp neighbor command on the IOS CLI of the device. Linux and Windows servers don’t have a CDP process running by default and will not be part of the link discovery.

The following OIDs are supported to discover and build the CDP network topology.

Table 23. Supported OIDs from the IF-MIB
Name OID Description

ifDescr

.1.3.6.1.2.1.2.2.1.2

A textual string containing information about the interface. This string should include the name of the manufacturer, the product name and the version of the interface hardware/software.

Table 24. Supported OIDs from the CISCO-CDP-MIB to discover links
Name OID Description

cdpInterfaceName

.1.3.6.1.4.1.9.9.23.1.1.1.1.6

The name of the local interface as advertised by CDP in the Port-ID TLV.

cdpCacheEntry

.1.3.6.1.4.1.9.9.23.1.2.1.1

An entry (conceptual row) in the cdpCacheTable, containing the information received via CDP on one interface from one device. Entries appear when a CDP advertisement is received from a neighbor device. Entries disappear when CDP is disabled on the interface, or globally.

cdpCacheAddressType

.1.3.6.1.4.1.9.9.23.1.2.1.1.3

An indication of the type of address contained in the corresponding instance of cdpCacheAddress.

cdpCacheAddress

.1.3.6.1.4.1.9.9.23.1.2.1.1.4

The (first) network-layer address of the device’s SNMP-agent as reported in the Address TLV of the most recently received CDP message. For example, if the corresponding instance of cdpCacheAddressType had the value ip(1), then this object would be an IP-address.

cdpCacheVersion

.1.3.6.1.4.1.9.9.23.1.2.1.1.5

The Version string as reported in the most recent CDP message. The zero-length string indicates no Version field (TLV) was reported in the most recent CDP message.

cdpCacheDeviceId

.1.3.6.1.4.1.9.9.23.1.2.1.1.6

The Device-ID string as reported in the most recent CDP message. The zero-length string indicates no Device-ID field (TLV) was reported in the most recent CDP message.

cdpCacheDevicePort

.1.3.6.1.4.1.9.9.23.1.2.1.1.7

The Port-ID string as reported in the most recent CDP message. This will typically be the value of the ifName object (e.g., Ethernet0). The zero-length string indicates no Port-ID field (TLV) was reported in the most recent CDP message.

cdpCachePlatform  

.1.3.6.1.4.1.9.9.23.1.2.1.1.8

The Device’s Hardware Platform as reported in the most recent CDP message. The zero-length string indicates that no Platform field (TLV) was reported in the most recent CDP message.

cdpGlobalRun

.1.3.6.1.4.1.9.9.23.1.3.1.0

An indication of whether the Cisco Discovery Protocol is currently running. Entries in cdpCacheTable are deleted when CDP is disabled.

cdpGlobalDeviceId

.1.3.6.1.4.1.9.9.23.1.3.4.0

The device ID advertised by this device. The format of this device id is characterized by the value of cdpGlobalDeviceIdFormat object.

cdpGlobalDeviceIdFormat

.1.3.6.1.4.1.9.9.23.1.3.7.0

An indication of the format of Device-Id contained in the corresponding instance of cdpGlobalDeviceId. User can only specify the formats that the device is capable of as denoted in cdpGlobalDeviceIdFormatCpb object.
serialNumber(1): indicates that the value of cdpGlobalDeviceId object is in the form of an ASCII string containing the device serial number.
macAddress(2): indicates that the value of cdpGlobalDeviceId object is in the form of Layer 2 MAC address.
other(3): indicates that the value of cdpGlobalDeviceId object is in the form of a platform specific ASCII string containing info that identifies the device. For example: ASCII string contains serialNumber appended/prepended with system name.

Table 25. Supported OIDs from the CISCO-VTP-MIB
Name OID Description

vtpVersion

.1.3.6.1.4.1.9.9.46.1.1.1.0

The version of VTP in use on the local system. A device will report its version capability and not any particular version in use on the device. If the device does not support VTP, the version is none(3).

ciscoVtpVlanState

.1.3.6.1.4.1.9.9.46.1.3.1.1.2

The state of this VLAN. The state mtuTooBigForDevice indicates that this device cannot participate in this VLAN because the VLAN’s MTU is larger than the device can support.
The state mtuTooBigForTrunk indicates that while this VLAN’s MTU is supported by this device, it is too large for one or more of the device’s trunk ports.
operational(1), suspended(2), mtuTooBigForDevice(3), mtuTooBigForTrunk(4)

ciscoVtpVlanType

.1.3.6.1.4.1.9.9.46.1.3.1.1.3

The type of this VLAN.
ethernet(1), fddi(2), tokenRing(3), fddiNet(4), trNet(5), deprecated(6)

ciscoVtpVlanName

.1.3.6.1.4.1.9.9.46.1.3.1.1.4

The name of this VLAN. This name is used as the ELAN-name for an ATM LAN-Emulation segment of this VLAN.

Generic information about the CDP process can be found in the CDP Information box on the Node Detail Page of the device. Information gathered from these OIDs will be stored in the following database table:

cdp database
Figure 32. Database tables related to CDP discovery
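When walking the cdpCacheEntry columns by hand, each returned OID carries the column number followed by the row index. The sketch below splits such an OID, assuming the row index consists of the local interface index followed by a device index as defined by the CISCO-CDP-MIB; verify this against the MIB loaded on your system. The function name split_cdp_cache_oid and the example OID are made up for illustration.

```python
CDP_CACHE_ENTRY = ".1.3.6.1.4.1.9.9.23.1.2.1.1"


def split_cdp_cache_oid(oid):
    """Split a cdpCacheEntry instance OID into (column, ifIndex, deviceIndex)."""
    prefix = CDP_CACHE_ENTRY + "."
    if not oid.startswith(prefix):
        raise ValueError(f"not a cdpCacheEntry instance: {oid}")
    parts = [int(part) for part in oid[len(prefix):].split(".")]
    if len(parts) != 3:
        raise ValueError(f"unexpected index length in: {oid}")
    column, if_index, device_index = parts
    return column, if_index, device_index


# Column 6 is cdpCacheDeviceId according to the table above.
print(split_cdp_cache_oid(".1.3.6.1.4.1.9.9.23.1.2.1.1.6.10101.1"))
```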

13.2.3. Transparent Bridge Discovery

Discovering Layer 2 network links using the Bridge Forwarding table requires a special algorithm. To discover Links, an algorithm based on the scientific paper Topology Discovery for Large Ethernet Networks is implemented. The gathered information is used to classify Links as either macLink or bridgeLink. A macLink represents a Link to a workstation or server identified by a MAC address. A bridgeLink is a connection between backbone ports.

Transparent bridging is not loop free. If the network contains loops, the Spanning Tree Protocol has to be enabled; it detects loops and puts some ports into a blocking state to avoid them. Deriving Links from this data requires some additional calculation. The following MIBs must be supported by the SNMP agent to allow Transparent Bridge Discovery.

Table 26. Supported OIDs from the CISCO-VTP-MIB

Name

OID

Description

vtpVersion

.1.3.6.1.4.1.9.9.46.1.1.1.0

The version of VTP in use on the local system. A device will report its version capability and not any particular version in use on the device. If the device does not support VTP, the version is none(3).

Table 27. Supported OIDs from the IP-MIB

Name

OID

Description

ipNetToMediaIfIndex

.1.3.6.1.2.1.4.22.1.1

The interface on which this entry’s equivalence is effective. The layer-2 interface identified by a particular value of this index is the same interface as identified by the same value of ifIndex.

ipNetToMediaPhysAddress

.1.3.6.1.2.1.4.22.1.2

The media-dependent physical address.

ipNetToMediaNetAddress

.1.3.6.1.2.1.4.22.1.3

The IpAddress corresponding to the media-dependent physical address.

ipNetToMediaType

.1.3.6.1.2.1.4.22.1.4

The type of mapping. Setting this object to the value invalid(2) has the effect of invalidating the corresponding entry in the ipNetToMediaTable. That is, it effectively disassociates the interface identified with said entry from the mapping identified with said entry. It is an implementation-specific matter as to whether the agent removes an invalidated entry from the table. Accordingly, management stations must be prepared to receive tabular information from agents that corresponds to entries not currently in use. Proper interpretation of such entries requires examination of the relevant ipNetToMediaType object.

Table 28. Supported OIDs from the BRIDGE-MIB

Name

OID

Description

dot1dBaseBridgeAddress

.1.3.6.1.2.1.17.1.1.0

The MAC address used by this bridge when it must be referred to in a unique fashion. It is recommended that this be the numerically smallest MAC address of all ports that belong to this bridge. However it is only required to be unique. When concatenated with dot1dStpPriority a unique BridgeIdentifier is formed which is used in the Spanning Tree Protocol.

dot1dBaseNumPorts

.1.3.6.1.2.1.17.1.2.0

The number of ports controlled by this bridging entity.

dot1dBaseType

.1.3.6.1.2.1.17.1.3.0

Indicates what type of bridging this bridge can perform. If a bridge is actually performing a certain type of bridging this will be indicated by entries in the port table for the given type.

dot1dBasePort

.1.3.6.1.2.1.17.1.4.1.1

The port number of the port for which this entry contains bridge management information.

dot1dPortIfIndex

.1.3.6.1.2.1.17.1.4.1.2

The value of the instance of the ifIndex object, defined in MIB-II, for the interface corresponding to this port.

dot1dStpProtocolSpecification

.1.3.6.1.2.1.17.2.1.0

An indication of what version of the Spanning Tree Protocol is being run. The value decLb100(2) indicates the DEC LANbridge 100 Spanning Tree protocol. IEEE 802.1d implementations will return ieee8021d(3). If future versions of the IEEE Spanning Tree Protocol are released that are incompatible with the current version a new value will be defined.

dot1dStpPriority

.1.3.6.1.2.1.17.2.2

The value of the writeable portion of the Bridge ID, i.e., the first two octets of the (8 octet long) Bridge ID. The other (last) 6 octets of the Bridge ID are given by the value of dot1dBaseBridgeAddress.

dot1dStpDesignatedRoot

.1.3.6.1.2.1.17.2.5

The bridge identifier of the root of the spanning tree as determined by the Spanning Tree Protocol as executed by this node. This value is used as the Root Identifier parameter in all configuration Bridge PDUs originated by this node.

dot1dStpRootCost

.1.3.6.1.2.1.17.2.6

The cost of the path to the root as seen from this bridge.

dot1dStpRootPort

.1.3.6.1.2.1.17.2.7

The port number of the port which offers the lowest cost path from this bridge to the root bridge.

dot1dStpPort

.1.3.6.1.2.1.17.2.15.1.1

The port number of the port for which this entry contains Spanning Tree Protocol management information.

dot1dStpPortPriority

.1.3.6.1.2.1.17.2.15.1.2

The value of the priority field which is contained in the first (in network byte order) octet of the (2 octet long) Port ID. The other octet of the Port ID is given by the value of dot1dStpPort.

dot1dStpPortState

.1.3.6.1.2.1.17.2.15.1.3

The port’s current state as defined by application of the Spanning Tree Protocol. This state controls what action a port takes on reception of a frame. If the bridge has detected a port that is malfunctioning it will place that port into the broken(6) state. For ports which are disabled (see dot1dStpPortEnable), this object will have a value of disabled(1).

dot1dStpPortEnable

.1.3.6.1.2.1.17.2.15.1.4

The enabled/disabled status of the port.

dot1dStpPortPathCost

.1.3.6.1.2.1.17.2.15.1.5

The contribution of this port to the path cost of paths towards the spanning tree root which include this port. 802.1D-1990 recommends that the default value of this parameter be in inverse proportion to the speed of the attached LAN.

dot1dStpPortDesignatedRoot

.1.3.6.1.2.1.17.2.15.1.6

The unique Bridge Identifier of the Bridge recorded as the Root in the Configuration BPDUs transmitted by the Designated Bridge for the segment to which the port is attached.

dot1dStpPortDesignatedCost

.1.3.6.1.2.1.17.2.15.1.7

The path cost of the Designated Port of the segment connected to this port. This value is compared to the Root Path Cost field in received bridge PDUs.

dot1dStpPortDesignatedBridge

.1.3.6.1.2.1.17.2.15.1.8

The Bridge Identifier of the bridge which this port considers to be the Designated Bridge for this port’s segment.

dot1dStpPortDesignatedPort

.1.3.6.1.2.1.17.2.15.1.9

The Port Identifier of the port on the Designated Bridge for this port’s segment.

dot1dTpFdbAddress

.1.3.6.1.2.1.17.4.3.1.1

A unicast MAC address for which the bridge has forwarding and/or filtering information.

dot1dTpFdbPort

.1.3.6.1.2.1.17.4.3.1.2

Either the value '0', or the port number of the port on which a frame having a source address equal to the value of the corresponding instance of dot1dTpFdbAddress has been seen. A value of '0' indicates that the port number has not been learned but that the bridge does have some forwarding/filtering information about this address (e.g. in the dot1dStaticTable). Implementors are encouraged to assign the port value to this object whenever it is learned even for addresses for which the corresponding value of dot1dTpFdbStatus is not learned(3).

dot1dTpFdbStatus

.1.3.6.1.2.1.17.4.3.1.3

The status of this entry. The meanings of the values are:
other(1): none of the following. This would include the case where some other MIB object (not the corresponding instance of dot1dTpFdbPort, nor an entry in the dot1dStaticTable) is being used to determine if and how frames addressed to the value of the corresponding instance of dot1dTpFdbAddress are being forwarded.
invalid(2): this entry is no longer valid (e.g., it was learned but has since aged out), but has not yet been flushed from the table.
learned(3): the value of the corresponding instance of dot1dTpFdbPort was learned, and is being used.
self(4): the value of the corresponding instance of dot1dTpFdbAddress represents one of the bridge’s addresses. The corresponding instance of dot1dTpFdbPort indicates which of the bridge’s ports has this address.
mgmt(5): the value of the corresponding instance of dot1dTpFdbAddress is also the value of an existing instance of dot1dStaticAddress.

Table 29. Supported OIDs from the Q-BRIDGE-MIB

Name

OID

Description

dot1qTpFdbPort

.1.3.6.1.2.1.17.7.1.2.2.1.2

Either the value 0, or the port number of the port on which a frame having a source address equal to the value of the corresponding instance of dot1qTpFdbAddress has been seen. A value of 0 indicates that the port number has not been learned but that the device does have some forwarding/filtering information about this address (e.g., in the dot1qStaticUnicastTable). Implementors are encouraged to assign the port value to this object whenever it is learned, even for addresses for which the corresponding value of dot1qTpFdbStatus is not learned(3).

dot1qTpFdbStatus

.1.3.6.1.2.1.17.7.1.2.2.1.3

The status of this entry. The meanings of the values are:
other(1): none of the following. This may include the case where some other MIB object (not the corresponding instance of dot1qTpFdbPort, nor an entry in the dot1qStaticUnicastTable) is being used to determine if and how frames addressed to the value of the corresponding instance of dot1qTpFdbAddress are being forwarded.
invalid(2): this entry is no longer valid (e.g., it was learned but has since aged out), but has not yet been flushed from the table.
learned(3): the value of the corresponding instance of dot1qTpFdbPort was learned and is being used.
self(4): the value of the corresponding instance of dot1qTpFdbAddress represents one of the device’s addresses. The corresponding instance of dot1qTpFdbPort indicates which of the device’s ports has this address.
mgmt(5): the value of the corresponding instance of dot1qTpFdbAddress is also the value of an existing instance of dot1qStaticAddress.

Generic information about the bridge link discovery process can be found in the Bridge Information box on the Node Detail Page of the device. Information gathered from these OIDs will be stored in the following database table:

bridge database
Figure 33. Database tables related to transparent bridge discovery
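The descriptions of dot1dStpPriority and dot1dBaseBridgeAddress above state that the 8 octet Bridge ID consists of the two priority octets followed by the six octets of the bridge MAC address. The following sketch composes that identifier from the two values; the function name bridge_identifier and the example values are illustrative assumptions.

```python
def bridge_identifier(stp_priority, base_bridge_address):
    """Concatenate the 2 octet priority and the 6 octet MAC into a hex Bridge ID."""
    mac = bytes.fromhex(base_bridge_address.replace(":", "").replace("-", ""))
    if len(mac) != 6:
        raise ValueError("a bridge address must be exactly 6 octets long")
    if not 0 <= stp_priority <= 0xFFFF:
        raise ValueError("the priority must fit into 2 octets")
    # The priority comes first in network byte order, followed by the MAC address.
    return (stp_priority.to_bytes(2, "big") + mac).hex()


print(bridge_identifier(32768, "00:5e:60:7e:9e:00"))
```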

Enlinkd can also discover links from network routing applications. The routing protocols that can be used to discover links based on Layer 3 information, OSPF and IS-IS, are described in the subsections below.

This information is provided by SNMP agents with appropriate MIB support, so a working SNMP configuration is required. The link data discovered by Enlinkd is shown in the Topology User Interface and on the detail page of a node.

13.3.1. OSPF Discovery

The relevant MIBs for OSPF topology are OSPF-MIB and OSPF-TRAP-MIB. These MIBs define the objects used to find OSPF links, specifically:

  • The Router ID which, in OSPF, has the same format as an IP address but identifies the router independently of its IP address.

All interfaces are also identified by their IP addresses. The OSPF links come from the SNMP ospfNbrTable defined in OSPF-MIB, and this table is in practice persisted in the ospfLink table:

Table 30. Supported OIDs from OSPF-MIB
Name OID Description

ospfRouterId

.1.3.6.1.2.1.14.1.1.0

A 32-bit integer uniquely identifying the router in the Autonomous System. By convention, to ensure uniqueness, this should default to the value of one of the router’s IP interface addresses. This object is persistent and when written the entity should save the change to non-volatile storage.

ospfAdminStat

.1.3.6.1.2.1.14.1.2.0

The administrative status of OSPF in the router. The value enabled denotes that the OSPF Process is active on at least one interface; disabled disables it on all interfaces. This object is persistent and when written the entity should save the change to non-volatile storage.

ospfVersionNumber

.1.3.6.1.2.1.14.1.3.0

The current version number of the OSPF protocol is 2.

ospfAreaBdrRtrStatus

.1.3.6.1.2.1.14.1.4.0

A flag to note whether this router is an Area Border Router.

ospfAreaASBdrRtrStatus

.1.3.6.1.2.1.14.1.5.0

A flag to note whether this router is configured as an Autonomous System Border Router. This object is persistent and when written the entity should save the change to non-volatile storage.

ospfIfIpAddress

.1.3.6.1.2.1.14.7.1.1

The IP address of this OSPF interface.

ospfAddressLessIf

.1.3.6.1.2.1.14.7.1.2

For the purpose of easing the instancing of addressed and addressless interfaces; this variable takes the value 0 on interfaces with IP addresses and the corresponding value of ifIndex for interfaces having no IP address.

ospfNbrIpAddr

.1.3.6.1.2.1.14.10.1.1

The IP address this neighbor is using in its IP source address. Note that, on addressless links, this will not be 0.0.0.0 but the address of another of the neighbor’s interfaces.

ospfNbrAddressLessIndex

.1.3.6.1.2.1.14.10.1.2

On an interface having an IP address, zero. On addressless interfaces, the corresponding value of ifIndex in the Internet Standard MIB. On row creation, this can be derived from the instance.

ospfNbrRtrId

.1.3.6.1.2.1.14.10.1.3

A 32-bit integer (represented as a type IpAddress) uniquely identifying the neighboring router in the Autonomous System.

Table 31. Supported OIDs from IP-MIB
Name OID Description

ipAdEntIfIndex

.1.3.6.1.2.1.4.20.1.2

The index value which uniquely identifies the interface to which this entry is applicable. The interface identified by a particular value of this index is the same interface as identified by the same value of the IF-MIB’s ifIndex.

ipAdEntNetMask

.1.3.6.1.2.1.4.20.1.3

The subnet mask associated with the IPv4 address of this entry. The value of the mask is an IPv4 address with all the network bits set to 1 and all the hosts bits set to 0.

Generic information about the OSPF link discovery process can be found in the OSPF Information box on the Node Detail Page of the device. Information gathered from these OIDs will be stored in the following database table:

ospf database
Figure 34. Database tables related to OSPF discovery

13.3.2. IS-IS Discovery

IS-IS links are found in the isisISAdjTable defined in ISIS-MIB (mib-rfc4444.txt). This table holds the information needed to find the adjacent Intermediate System. The information about IS-IS is stored in two tables: isisElement and isisLink. isisElement contains the ISISSysID, a unique identifier of the "Intermediate System" (the name for a router in ISO protocols). Each entry in the SNMP MIB table represents a unidirectional link from the queried Intermediate System to an adjacent Intermediate System that runs IS-IS and is "peering" with the source router. If two routers IS-A and IS-B support ISIS-MIB, Enlinkd creates two link entries in OpenNMS Horizon: one from IS-A to IS-B (from the adjacency table of IS-A) and the complementary link back from IS-B to IS-A (from the adjacency table of IS-B). IS-IS links are represented in the ISIS-MIB as follows:

Table 32. Supported OIDs from ISIS-MIB
Name OID Description

isisSysID

.1.3.6.1.2.1.138.1.1.1.3.0

The ID for this Intermediate System. This value is appended to each of the area addresses to form the Network Entity Titles. The derivation of a value for this object is implementation specific. Some implementations may automatically assign values and not permit an SNMP write, while others may require the value to be set manually. Configured values must survive an agent reboot.

isisSysAdminState

.1.3.6.1.2.1.138.1.1.1.8.0

The administrative state of this Intermediate System. Setting this object to the value on when its current value is off enables the Intermediate System. Configured values must survive an agent reboot.

isisSysObject

.1.3.6.1.2.1.138.1.1.1

isisSysObject

isisCircIfIndex

.1.3.6.1.2.1.138.1.3.2.1.2

The value of ifIndex for the interface to which this circuit corresponds. This object cannot be modified after creation.

isisCircAdminState

.1.3.6.1.2.1.138.1.3.2.1.3

The administrative state of the circuit.

isisISAdjState

.1.3.6.1.2.1.138.1.6.1.1.2

The state of the adjacency.

isisISAdjNeighSNPAAddress

.1.3.6.1.2.1.138.1.6.1.1.4

The SNPA address of the neighboring system.

isisISAdjNeighSysType

.1.3.6.1.2.1.138.1.6.1.1.5

The type of the neighboring system.

isisISAdjNeighSysID

.1.3.6.1.2.1.138.1.6.1.1.6

The system ID of the neighboring Intermediate System.

isisISAdjNbrExtendedCircID

.1.3.6.1.2.1.138.1.6.1.1.7

The 4-byte Extended Circuit ID learned from the Neighbor during 3-way handshake, or 0.

Generic information about the IS-IS link discovery process can be found in the IS-IS Information box on the Node Detail Page of the device. Information gathered from these OIDs will be stored in the following database table:

is is database
Figure 35. Database tables related to IS-IS discovery
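
The unidirectional nature of IS-IS links described at the start of this section can be sketched in Python as follows. This is only an illustration under simplified assumptions: the function name and the dictionary layout (system ID mapped to the neighbor system IDs found in its adjacency table) are hypothetical and do not reflect how Enlinkd stores or processes the data internally.

```python
from typing import Dict, List, Tuple


def directed_links(adjacencies: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return one (source, neighbor) pair per adjacency table entry."""
    links = []
    for source, neighbors in adjacencies.items():
        for neighbor in neighbors:
            # Each adjacency entry yields a link in one direction only.
            links.append((source, neighbor))
    return links


# Hypothetical adjacency data for two routers that peer with each other.
links = directed_links({"IS-A": ["IS-B"], "IS-B": ["IS-A"]})
```

With both routers reporting each other as neighbors, the result holds two entries, one per direction, matching the two link entries described above.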

14. Operation

14.1. HTTPS / SSL

This chapter describes how to configure OpenNMS Horizon to protect web sessions with HTTPS and how to configure OpenNMS Horizon to establish secure connections as a client.

HTTPS setup relies on the Java command line tool keytool, which ships with every JRE installation. More details about keytool can be found in the official documentation.

14.1.1. Standalone HTTPS with Jetty

To configure OpenNMS Horizon to protect web sessions with HTTPS please refer to the official OpenNMS Horizon Wiki article Standalone HTTPS with Jetty.

14.1.2. OpenNMS Horizon as HTTPS client

To establish secure HTTPS connections within Java, a so-called Java Trust Store has to be set up.

The Java Trust Store contains all certificates a Java application should trust when making connections as a client to a server.

Setup Java Trust Store

To set up the Java Trust Store, issue the following command.

If you do not have a Java Trust Store yet, it is created automatically.
Import a certificate to the Java Trust Store
keytool \
  -import \ (1)
  -v \ (2)
  -trustcacerts \ (3)
  -alias localhost \ (4)
  -file localhost.cert \ (5)
  -keystore /$OPENNMS_HOME/etc/trust-store.jks  (6)
1 Define to import a certificate or a certificate chain
2 Use verbose output
3 Define to trust certificates from cacerts
4 The alias for the certificate to import, e.g. the common name
5 The certificate to import
6 The location of the Java Trust Store

If you create a new Java Trust Store you are asked for a password to protect the Java Trust Store. If you update an already existing Java Trust Store please enter the password you chose when creating the Java Trust Store initially.

Download existing public certificate

To download an existing public certificate, issue the following command.

Download an existing public certificate
openssl \
  s_client \ (1)
  -showcerts \ (2)
  -connect localhost:443 \ (3)
  -servername localhost \ (4)
  < /dev/null \ (5)
  > localhost.cert (6)
1 Use SSL/TLS client functionality of openssl.
2 Show all certificates in the chain
3 HOST:PORT to connect to, e.g. localhost:443
4 This is optional, but if you are serving multiple certificates under a single IP address you may define a server name; otherwise the default certificate for HOST:PORT is returned, which may not match the requested server name (mail.domain.com, opennms.domain.com, dns.domain.com)
5 No input
6 Where to store the certificate.
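
The file written by the command above holds the complete s_client session output, with the certificates embedded as PEM blocks. The following Python sketch shows one way to pull out only those blocks, should a file containing nothing but certificates be wanted. The function name is hypothetical, the sample text is made up, and the sketch assumes the usual BEGIN CERTIFICATE / END CERTIFICATE markers.

```python
import re
from typing import List

# Matches one PEM certificate block, including its marker lines.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def extract_certificates(s_client_output: str) -> List[str]:
    """Return every PEM certificate block found in the text, in order."""
    return PEM_BLOCK.findall(s_client_output)


# Made-up sample standing in for the content of localhost.cert.
sample = (
    "CONNECTED(00000003)\n"
    "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
    "some other output\n"
    "-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n"
)
certificates = extract_certificates(sample)
```

Each returned string can then be written to its own file and used as the input of the import command.
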
Configure OpenNMS Horizon to use the defined Java Trust Store

To set up OpenNMS Horizon to use the defined Java Trust Store, the corresponding javax.net.ssl.trustStore* properties have to be set. Open $OPENNMS_HOME/etc/opennms.properties and add the properties javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword as shown below.

$OPENNMS_HOME/etc/opennms.properties snippet to define a Java Trust Store
javax.net.ssl.trustStore=/$OPENNMS_HOME/etc/trust-store.jks (1)
javax.net.ssl.trustStorePassword=change-me (2)
1 The location of the Java Trust Store
2 The password of the Java Trust Store

For more details on the Java built-in SSL system properties, see chapter Debugging / Properties.

Each time you modify the Java Trust Store you have to restart OpenNMS Horizon to have the changes take effect.

14.1.3. Differences between Java Trust Store and Java Key Store

The Java Trust Store is used to determine whether a remote connection should be trusted or not, e.g. whether a remote party is who it claims to be (client use case).

The Java Key Store is used to decide which authentication credentials should be sent to the remote host for authentication during SSL handshake (server use case).

For more details, please check the JSSE Reference Guide.

14.1.4. Debugging / Properties

If you encounter issues while using HTTPS, it might be useful to enable debugging or use one of the built-in Java System Properties to configure the proper use of SSL.

Table 33. Java built-in System Properties (Source)
System Property Name Description

javax.net.ssl.keyStore

Location of the Java keystore file containing an application process’s own certificate and private key. On Windows, the specified pathname must use forward slashes, /, in place of backslashes, \.

javax.net.ssl.keyStorePassword

Password to access the private key from the keystore file specified by javax.net.ssl.keyStore. This password is used twice: to unlock the keystore file (store password) and to decrypt the private key stored in the keystore (key password). In other words, the JSSE framework requires these passwords to be identical.

javax.net.ssl.keyStoreType

(Optional) For Java keystore file format, this property has the value jks (or JKS). You do not normally specify this property, because its default value is already jks.

javax.net.ssl.trustStore

Location of the Java keystore file containing the collection of CA certificates trusted by this application process (trust store). On Windows, the specified pathname must use forward slashes, /, in place of backslashes, \. If a trust store location is not specified using this property, the Sun JSSE implementation searches for and uses a keystore file in the following locations (in order): $JAVA_HOME/lib/security/jssecacerts and $JAVA_HOME/lib/security/cacerts

javax.net.ssl.trustStorePassword

Password to unlock the keystore file (store password) specified by javax.net.ssl.trustStore.

javax.net.ssl.trustStoreType

(Optional) For Java keystore file format, this property has the value jks (or JKS). You do not normally specify this property, because its default value is already jks.

javax.net.debug

To switch on logging for the SSL/TLS layer, set this property to ssl. More details about possible values can be found here.

14.2. Request Logging

HTTP request logs for Jetty can be enabled by uncommenting the following snippet in etc/jetty.xml:

<!-- NCSA Request Logging
 <Item>
     <New id="RequestLog" class="org.eclipse.jetty.server.handler.RequestLogHandler">
       <Set name="requestLog">
         <New id="RequestLogImpl" class="org.eclipse.jetty.server.NCSARequestLog">
           <Arg>logs/jetty-requests-yyyy_mm_dd.log</Arg>
           <Set name="retainDays">90</Set>
           <Set name="append">true</Set>
           <Set name="extended">true</Set>
           <Set name="logTimeZone">US/Central</Set>
         </New>
       </Set>
     </New>
 </Item>
-->
If you do not have a jetty.xml in the etc directory, you can start by copying the example from etc/examples/jetty.xml.

If you would like to include the usernames associated with the requests in the log file, you must also uncomment the following snippet in jetty-webapps/opennms/WEB-INF/web.xml:

<!-- Enable this filter mapping when using NCSA request logging
<filter-mapping>
  <filter-name>jettyUserIdentityFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
-->

After restarting OpenNMS Horizon, request logs of the following form should be available in logs/jetty-requests-*.log:

127.0.0.1 - - [02/Jun/2017:09:16:38 -0500] "GET / HTTP/1.1" 302 0 "-" "Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
127.0.0.1 - anonymousUser [02/Jun/2017:09:16:39 -0500] "GET /opennms/ HTTP/1.1" 302 0 "-" "Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
127.0.0.1 - admin [02/Jun/2017:09:16:46 -0500] "POST /opennms/rest/datachoices?action=enable HTTP/1.1" 200 0 "http://127.0.0.1:8980/opennms/index.jsp" "Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
127.0.0.1 - rtc [02/Jun/2017:09:16:45 -0500] "POST /opennms/rtc/post/DNS+and+DHCP+Servers HTTP/1.1" 200 35 "-" "Java/1.8.0_121"
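
To show how the fields of such a line are laid out, the following Python sketch splits one request log line into named parts. It is a minimal sketch that assumes the extended format seen in the sample lines above (host, identity, user, timestamp, request, status, size, referer, user agent); the function and field names are hypothetical.

```python
import re
from typing import Dict, Optional

# One request log line in the extended format shown above.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_request_log_line(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


entry = parse_request_log_line(
    '127.0.0.1 - rtc [02/Jun/2017:09:16:45 -0500] '
    '"POST /opennms/rtc/post/DNS+and+DHCP+Servers HTTP/1.1" 200 35 "-" "Java/1.8.0_121"'
)
```

The user field is the one filled in by the jettyUserIdentityFilter mapping; without it, the sample lines show a dash in that position.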

14.3. Geocoder Service

The Geocoder Service is used to resolve geolocation information within OpenNMS Horizon. By default, the Google Map API is used to resolve the geolocation information, if available. To configure the Google Map API, the following properties in etc/org.opennms.features.geocoder.google.cfg are supported:

Property Type Default Description

clientId

String

empty string

The Google Map API Client ID. This is required if you exceed the free Google Map API usage. Please refer to the official documentation for more information.

clientKey

String

empty string

The Google Map API Key. This is required if you exceed the free Google Map API usage. Please refer to the official documentation for more information.

timeout

Integer

500

The connection timeout, in milliseconds, that applies when the Geocoder tries to resolve a single geolocation.

14.4. Newts

This section describes how to configure OpenNMS Horizon to use Newts and how to use OpenNMS Horizon to monitor your Cassandra cluster.

14.4.1. Configuration

Enabling Newts

OpenNMS Horizon can be configured to use Newts by setting the following property in ${OPENNMS_HOME}/etc/opennms.properties:

org.opennms.timeseries.strategy=newts

It is also highly recommended that resources stored in Newts are referenced by their foreign source and foreign ID, as opposed to their database ID. To this end, the following property should also be set in the same file:

org.opennms.rrd.storeByForeignSource=true

With these set, OpenNMS Horizon will begin persisting metrics using the Newts engine when restarted.

Additional configuration options are presented in the next section.

Configuration Reference

The following properties, found in ${OPENNMS_HOME}/etc/opennms.properties, can be used to configure and tune Newts.

General
Name Default Description

org.opennms.newts.config.keyspace

newts

Name of the keyspace to use.

org.opennms.newts.config.hostname

localhost

IP address or hostnames of the Cassandra nodes. Multiple hosts can be separated by a comma.

org.opennms.newts.config.port

9042

CQL port used to connect to the Cassandra nodes.

org.opennms.newts.config.username

cassandra

Username to use when connecting to Cassandra via CQL.

org.opennms.newts.config.password

cassandra

Password to use when connecting to Cassandra via CQL.

org.opennms.newts.config.ssl

false

Enable/disable SSL when connecting to Cassandra.

org.opennms.newts.config.read_consistency

ONE

Consistency level used for read operations. See Configuring data consistency for a list of available options.

org.opennms.newts.config.write_consistency

ANY

Consistency level used for write operations. See Configuring data consistency for a list of available options.

org.opennms.newts.config.max_batch_size

16

Maximum number of records to insert in a single transaction. Limited by the size of the Cassandra cluster’s batch_size_fail_threshold_in_kb property.

org.opennms.newts.config.ring_buffer_size

8192

Maximum number of records that can be held in the ring buffer. Must be a power of two.

org.opennms.newts.config.writer_threads

16

Number of threads used to pull samples from the ring buffer and insert them into Newts.

org.opennms.newts.config.ttl

31540000

Number of seconds after which samples will automatically be deleted. Defaults to one year.

org.opennms.newts.config.resource_shard

604800

Duration in seconds for which samples will be stored at the same key. Defaults to 7 days in seconds.

org.opennms.newts.query.minimum_step

300000

Minimum step size in milliseconds. Used to prevent large queries.

org.opennms.newts.query.interval_divider

2

If no interval is specified in the query, the step will be divided into this many intervals when aggregating values.

org.opennms.newts.query.heartbeat

450000

Duration in milliseconds. Used when no heartbeat is specified. Should generally be 1.5x your largest collection interval.

org.opennms.newts.query.parallelism

Number of cores

Maximum number of threads that can be used to compute aggregates. Defaults to the number of available cores.

org.opennms.newts.config.cache.strategy

See below

Canonical name of the class used for resource level caching. See the table below for all of the available options.

org.opennms.newts.config.cache.max_entries

8192

Maximum number of records to keep in the cache when using an in-memory caching strategy.

org.opennms.newts.nan_on_counter_wrap

false

Disables the processing of counter wraps, replacing these with NaNs instead.

org.opennms.newts.config.cache.priming.disable

false

Disables the cache primer, which pre-emptively loads the cache with indexed resources on start-up.

org.opennms.newts.config.cache.priming.block_ms

120000

Block startup for this many milliseconds while waiting for the cache to be primed. Set this value to -1 to disable blocking. Set this value to 0 to block indefinitely waiting for all of the records to be read.

Available caching strategies include:

Name Class Default

In-Memory Cache

org.opennms.netmgt.newts.support.GuavaSearchableResourceMetadataCache

Y

Redis-based Cache

org.opennms.netmgt.newts.support.RedisResourceMetadataCache

N

Redis Cache

When enabled, the following options can be used to configure the Redis-based cache.

Name Default Description

org.opennms.newts.config.cache.redis_hostname

localhost

IP address or hostname of the Redis server.

org.opennms.newts.config.cache.redis_port

6379

TCP port used to connect to the Redis server.

Recommendations

You will likely want to change the values of cache.max_entries and the ring_buffer_size to suit your installation.

Meta-data related to resources are cached in order to avoid writing redundant records in Cassandra. If you are collecting data from a large number of resources, you should increase the cache.max_entries to reflect the number of resources you are collecting from, with a suitable buffer.

The samples gathered by the collectors are temporarily stored in a ring buffer before they are persisted to Cassandra using Newts. The value of the ring_buffer_size should be increased if you expect large peaks of collectors returning at once or latency in persisting these to Cassandra. However, note that the memory used by the ring buffer is reserved, and larger values may require an increased heap size.

Cache priming is used to help reduce the number of records that need to be indexed after restarting OpenNMS Horizon. This works by rebuilding the cache using the index data that has already been persisted in Cassandra. If you continue to see large spikes of index related inserts after rebooting you may want to consider increasing the amount of time spent priming the cache.
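
Two of the constraints from the configuration reference lend themselves to a quick check before editing opennms.properties: ring_buffer_size must be a power of two, and the heartbeat should generally be 1.5x the largest collection interval. The following Python sketch illustrates both calculations. The helper names are hypothetical, and the values in the usage note are examples only.

```python
def is_power_of_two(value: int) -> bool:
    """True if value is a positive power of two (1, 2, 4, 8, ...)."""
    return value > 0 and (value & (value - 1)) == 0


def next_power_of_two(value: int) -> int:
    """Smallest power of two that is greater than or equal to value."""
    if value < 1:
        raise ValueError("value must be at least 1")
    return 1 << (value - 1).bit_length()


def heartbeat_ms(largest_collection_interval_ms: int) -> int:
    """1.5x the largest collection interval, in milliseconds."""
    return largest_collection_interval_ms * 3 // 2
```

For example, a planned ring buffer of 10000 records is not a valid setting, and next_power_of_two(10000) gives 16384 as the nearest valid size above it. A largest collection interval of 300000 ms gives a heartbeat of 450000 ms, which matches the default listed in the configuration reference.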

14.4.2. Cassandra Monitoring

This section describes some of the metrics OpenNMS Horizon collects from a Cassandra cluster.

JMX must be enabled on the Cassandra nodes and made accessible from OpenNMS Horizon in order to collect these metrics. See Enabling JMX authentication for details.

The data collection is bound to the agent IP interface with the service name JMX-Cassandra. The JMXCollector is used to retrieve the MBean entities from the Cassandra node.

Client Connections

The number of active client connections from org.apache.cassandra.metrics.Client are collected:

Name Description

connectedNativeClients

Metrics for connected native clients

connectedThriftClients

Metrics for connected thrift clients

Compaction Bytes

The following compaction manager metrics from org.apache.cassandra.metrics.Compaction are collected:

Name Description

BytesCompacted

Number of bytes compacted since node started

Compaction Tasks

The following compaction manager metrics from org.apache.cassandra.metrics.Compaction are collected:

Name Description

CompletedTasks

Estimated number of completed compaction tasks

PendingTasks

Estimated number of pending compaction tasks

Storage Load

The following storage load metrics from org.apache.cassandra.metrics.Storage are collected:

Name Description

Load

Total disk space (in bytes) used by this node

Storage Exceptions

The following storage exception metrics from org.apache.cassandra.metrics.Storage are collected:

Name Description

Exceptions

Number of unhandled exceptions since start of this Cassandra instance

Dropped Messages

Measurement of messages that were DROPPABLE. These messages were only processed after the timeout set for their message type had passed, so they were thrown away. In JMX these are accessible via org.apache.cassandra.metrics.DroppedMessage. The number of dropped messages in the different message queues is a good indicator of whether a cluster can handle its load.

Name Stage Description

Mutation

MutationStage

If a write message is processed after its timeout (write_request_timeout_in_ms), it either sent a failure to the client, or it met its requested consistency level and will rely on hinted handoff and read repairs to do the mutation if it succeeded.

Counter_Mutation

MutationStage

If a write message is processed after its timeout (write_request_timeout_in_ms), it either sent a failure to the client, or it met its requested consistency level and will rely on hinted handoff and read repairs to do the mutation if it succeeded.

Read_Repair

MutationStage

Times out after write_request_timeout_in_ms.

Read

ReadStage

Times out after read_request_timeout_in_ms. There is no point in servicing reads after that point, since an error would already have been returned to the client.

Range_Slice

ReadStage

Times out after range_request_timeout_in_ms.

Request_Response

RequestResponseStage

Times out after request_timeout_in_ms. The response was completed and sent back, but not before the timeout.

Thread pools

Apache Cassandra is based on a so-called Staged Event Driven Architecture (SEDA). This separates different operations into stages, and these stages are loosely coupled using a messaging service. Each of these components uses queues and thread pools to group and execute its tasks. The documentation for Cassandra thread pool monitoring originates from the Pythian Guide to Cassandra Thread Pools.

Table 34. Collected metrics for Thread Pools
Name Description

ActiveTasks

Tasks that are currently running

CompletedTasks

Tasks that have been completed

CurrentlyBlockedTasks

Tasks that have been blocked due to a full queue

PendingTasks

Tasks queued for execution

Memtable FlushWriter

Sorts and writes memtables to disk; the metrics come from org.apache.cassandra.metrics.ThreadPools. In the vast majority of cases, a backlog here is caused by overrunning the disk capability. Sorting can cause issues as well, however. When sorting is the problem, it is usually accompanied by high load but only a small number of actual flushes (seen in cfstats). This can be caused by huge rows with large column names, i.e. something inserting many large values into a CQL collection. If disk capabilities are being overrun, it is recommended to add nodes or tune the configuration.

Alerts: pending > 15 || blocked > 0
Memtable Post Flusher

Operations after flushing the memtable: discarding commit log files whose data is all in SSTables, and flushing non-cf backed secondary indexes.

Alerts: pending > 15 || blocked > 0
Anti Entropy Stage

Repairing consistency. Handle repair messages like merkle tree transfer (from Validation compaction) and streaming.

Alerts: pending > 15 || blocked > 0
Gossip Stage

Post 2.0.3 there should no longer be an issue with pending tasks. Instead, monitor the logs for the message:

Gossip stage has {} pending tasks; skipping status check ...

Before that change, and in particular in older versions of 1.2, running a lot of nodes (100+) with vnodes could cause a lot of CPU-intensive work that made the stage fall behind. This has also been known to be caused by out-of-sync schemas. Check that NTP is working correctly and attempt nodetool resetlocalschema or, more drastically, delete the system column family folder.

Alerts: pending > 15 || blocked > 0
Migration Stage

Making schema changes

Alerts: pending > 15 || blocked > 0
MiscStage

Snapshotting, replicating data after node remove completed.

Alerts: pending > 15 || blocked > 0
Mutation Stage

Performing a local write, including:

  • insert/updates

  • Schema merges

  • commit log replays

  • hints in progress

Similar to ReadStage, an increase in pending tasks here can be caused by disk issues, overloading a system, or poor tuning. If messages are backed up in this stage, you can add nodes, tune hardware and configuration, or update the data model and use case.

Alerts: pending > 15 || blocked > 0
Read Stage

Performing a local read. Also includes deserializing data from the row cache. If there are pending values, this can cause increased read latency. This can spike due to disk problems, poor tuning, or overloading your cluster. In many cases (not disk failure) this is resolved by adding nodes or tuning the system.

Alerts: pending > 15 || blocked > 0
Request Response Stage

When a response to a request is received this is the stage used to execute any callbacks that were created with the original request.

Alerts: pending > 15 || blocked > 0
Read Repair Stage

Performing read repairs. The chance of them occurring is configurable per column family with read_repair_chance. This stage is more likely to back up when using CL.ONE (and, to a lesser extent, possibly other non-CL.ALL queries) for reads and when using multiple data centers. The repair is then kicked off asynchronously outside of the query's feedback loop. Note that this is not very likely to be a problem, since it does not happen on all queries and is fast given good connectivity between replicas. The repair being droppable also means that it will be thrown away after write_request_timeout_in_ms, which further mitigates this. If pending grows, attempt to lower the rate for high-read CFs.

Alerts: pending > 15 || blocked > 0
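
The alert rule repeated for each stage above can be written as a small predicate. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that pending and blocked refer to the PendingTasks and CurrentlyBlockedTasks metrics from Table 34.

```python
def thread_pool_alert(pending: int, blocked: int, pending_threshold: int = 15) -> bool:
    """Evaluate the rule "pending > 15 || blocked > 0" for one thread pool."""
    return pending > pending_threshold or blocked > 0
```

With this rule, a stage with 16 pending tasks raises an alert, as does a stage with a single blocked task, while 15 pending tasks and no blocked tasks do not.
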
JVM Metrics

Some key metrics from the running Java virtual machine are also collected:

java.lang:type=Memory

The memory system of the Java virtual machine. This includes heap and non-heap memory

java.lang:type=GarbageCollector,name=ConcurrentMarkSweep

Metrics for the garbage collection process of the Java virtual machine

If you use Apache Cassandra for running Newts you can also enable additional metrics for the Newts keyspace.

14.4.3. Newts Monitoring

This section describes the metrics OpenNMS Horizon collects for monitoring the Newts keyspace from org.apache.cassandra.metrics.Keyspace on a Cassandra node.

JMX must be enabled on the Cassandra nodes and made accessible from OpenNMS Horizon in order to collect these metrics. See Enabling JMX authentication for details.

The data collection is bound to the agent IP interface with the service name JMX-Cassandra-Newts. The JMXCollector is used to retrieve the MBean entities from the Cassandra node.

All Memory Table Data Size
Name Description

AllMemtablesLiveDataSize

Total amount of live data stored in the memtables (2i and pending flush memtables included) that resides off-heap, excluding any data structure overhead

AllMemtablesOffHeapDataSize

Total amount of data stored in the memtables (2i and pending flush memtables included) that resides off-heap.

AllMemtablesOnHeapDataSize

Total amount of data stored in the memtables (2i and pending flush memtables included) that resides on-heap.

Memtable Switch Count
Name Description

MemtableSwitchCount

Number of times flush has resulted in the memtable being switched out.

Memtable Columns Count
Name Description

MemtableColumnsCount

Total number of columns present in the memtable.

Memory Table Data Size
Name Description

MemtableLiveDataSize

Total amount of live data stored in the memtable, excluding any data structure overhead

MemtableOffHeapDataSize

Total amount of data stored in the memtable that resides off-heap, including column related overhead and partitions overwritten.

MemtableOnHeapDataSize

Total amount of data stored in the memtable that resides on-heap, including column related overhead and partitions overwritten.

Read and Write Latency
Name Description

ReadTotalLatency

Local read metrics.

WriteTotalLatency

Local write metrics.

Range Latency
Name Description

RangeLatency 99th Percentile

Local range slice metrics 99th percentile.

Latency
Name Description

CasCommitTotalLatency

CasPrepareTotalLatency

CasProposeTotalLatency

Bloom Filter Disk Space
Name Description

BloomFilterDiskSpaceUsed

Disk space used by bloom filter

Bloom Filter Off Heap Memory
Name Description

BloomFilterOffHeapMemoryUsed

Off heap memory used by bloom filter

Newts Memory Used
Name Description

CompressionMetadataOffHeapMemoryUsed

Off heap memory used by compression meta data

IndexSummaryOffHeapMemoryUsed

Off heap memory used by index summary

Pending
Name Description

PendingCompactions

Estimate of number of pending compactions for this column family

PendingFlushes

Estimated number of tasks pending for this column family

Disk Space
Name Description

TotalDiskSpaceUsed

Total disk space used by SSTables belonging to this column family including obsolete ones waiting to be garbage collected.

LiveDiskSpaceUsed

Disk space used by SSTables belonging to this column family

15. System Properties

The global behavior of OpenNMS is configured with property files. The configuration can also affect the Java Virtual Machine underneath OpenNMS. Changes in these property files require a restart of OpenNMS. The configuration files can be found in ${OPENNMS_HOME}/etc.

The priority for Java system properties is as follows:

  1. Those set via the Java command line i.e. in opennms.conf via ADDITIONAL_MANAGER_OPTIONS

  2. opennms.properties.d/*.properties

  3. opennms.properties

  4. libraries.properties

  5. rrd-configuration.properties

  6. bootstrap.properties

Property files in opennms.properties.d/ are sorted alphabetically.

To avoid conflicts with customized configurations, all custom properties can be added to one or more files in ${OPENNMS_HOME}/etc/opennms.properties.d/. It is recommended not to modify the OpenNMS property files from the default installation. Instead, create dedicated files with your customized properties in opennms.properties.d/.
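The recommendation above can be sketched as a small shell helper. This is only an illustration: the function name set_custom_property and the file name custom.properties are hypothetical and not part of OpenNMS. The property used in the usage note is taken from the Minion chapter of this guide.

```shell
# set_custom_property <etc-dir> <file-name> <key> <value>
# Hypothetical helper (not shipped with OpenNMS): appends key=value to a
# dedicated file under <etc-dir>/opennms.properties.d/ and leaves the
# default opennms.properties untouched.
set_custom_property() {
    props_dir="$1/opennms.properties.d"
    mkdir -p "$props_dir"
    printf '%s=%s\n' "$3" "$4" >> "$props_dir/$2"
}
```

For example, set_custom_property "$OPENNMS_HOME/etc" custom.properties org.opennms.core.ipc.sink.initialSleepTime 60000 adds the property to custom.properties. As with any change to these files, OpenNMS must be restarted afterwards.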

16. Ticketing

The ticketing integration allows OpenNMS Horizon to create trouble tickets in external systems. Tickets can be created and updated in response to new and/or resolved alarms.

To activate the ticketing integration, the following properties in ${OPENNMS_HOME}/etc/opennms.properties must be set accordingly:

Property | Default | Description
opennms.ticketer.plugin | NullTicketerPlugin | The plugin implementation to use. Each ticketer integration should define which value to set. The NullTicketerPlugin does nothing when attempting to create/update/delete tickets.
opennms.alarmTroubleTicketEnabled | false | Defines if the integration is enabled. If enabled, various links to control the issue state are shown on the alarm details page.
opennms.alarmTroubleTicketLinkTemplate | ${id} | A template to generate a link to the issue, e.g. http://issues.opennms.org/browse/${id}
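As a minimal sketch, the three properties could be written to a dedicated file rather than edited in place, following the advice in the System Properties chapter. The helper name write_ticketer_config and the file name ticketer.properties are assumptions made for this example; the plugin value must be taken from the documentation of the ticketer integration you use.

```shell
# write_ticketer_config <etc-dir> <plugin> <link-template>
# Hypothetical helper: writes the ticketing properties to
# <etc-dir>/opennms.properties.d/ticketer.properties (file name is an assumption).
write_ticketer_config() {
    props_dir="$1/opennms.properties.d"
    mkdir -p "$props_dir"
    {
        printf 'opennms.ticketer.plugin=%s\n' "$2"
        printf 'opennms.alarmTroubleTicketEnabled=true\n'
        printf 'opennms.alarmTroubleTicketLinkTemplate=%s\n' "$3"
    } > "$props_dir/ticketer.properties"
}
```

When calling the helper, pass the link template in single quotes, for example 'http://issues.opennms.org/browse/${id}', so that the shell does not expand ${id} before it reaches the file.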

17. Enabling RMI

By default, the RMI port in the OpenNMS Horizon server is disabled, for security reasons. If you wish to enable it so you can access OpenNMS Horizon through jconsole, remote-manage OpenNMS Horizon, or use the remote poller over RMI, you will have to add some settings to the default OpenNMS Horizon install.

17.1. Enabling RMI

To enable the RMI port in OpenNMS Horizon, you will have to add the following to the ${OPENNMS_HOME}/etc/opennms.conf file. If you do not have an opennms.conf file, you can create it.

# Configure remote JMX
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Dcom.sun.management.jmxremote.port=18980"
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Dcom.sun.management.jmxremote.local.only=false"
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Dcom.sun.management.jmxremote.authenticate=true"
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Dcom.sun.management.jmxremote.ssl=false"

# Listen on all interfaces
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Dopennms.poller.server.serverHost=0.0.0.0"
# Accept remote RMI connections on this interface
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Djava.rmi.server.hostname=<your-server-ip-address>"

This tells OpenNMS Horizon to listen for RMI on port 18980, and to listen on all interfaces. (Originally, RMI was only used for the Remote Poller, so despite the property name mentioning the "opennms poller server" it applies to RMI as a whole.) Note that you must include the -Djava.rmi.server.hostname= option; otherwise OpenNMS Horizon will accept connections on the RMI port but will not be able to complete a valid connection.

Authentication will only be allowed for users that are in the admin role (i.e. ROLE_ADMIN), or the jmx role (i.e. ROLE_JMX). To make a user an admin, be sure to add only the ROLE_ADMIN role to the user in users.xml. To add the jmx role to the user, add the ROLE_JMX role to the user in users.xml, and also the ROLE_USER role if it is required to provide access to the WebUI.

Make sure $OPENNMS_HOME/etc/jmxremote.access has the appropriate settings:

admin   readwrite
jmx     readonly

The possible types of access are:

readwrite

Allows retrieving JMX metrics as well as executing MBeans.

readonly

Allows retrieving JMX metrics but does not allow executing MBeans, even if they just return simple values.
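The following sketch checks that every entry in an access file uses one of the two access types described above. The function name check_jmx_access is hypothetical, and the check deliberately only knows about readwrite and readonly; it is not a full parser for the file format.

```shell
# check_jmx_access <path-to-jmxremote.access>
# Hypothetical sanity check: every non-comment, non-blank line must have
# the form "<role> readwrite" or "<role> readonly".
check_jmx_access() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }
        NF != 2 || ($2 != "readwrite" && $2 != "readonly") {
            printf "line %d: unexpected entry: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Running check_jmx_access "$OPENNMS_HOME/etc/jmxremote.access" returns a non-zero exit status and prints the offending lines if an entry does not match.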

17.2. Enabling SSL

To enable SSL on the RMI port, you will need to have an existing keystore for the OpenNMS Horizon server. For information on configuring a keystore, please refer to the official OpenNMS Horizon Wiki article Standalone HTTPS with Jetty.

You will need to change the com.sun.management.jmxremote.ssl option to true, and tell OpenNMS Horizon where your keystore is.

# Configure remote JMX
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Dcom.sun.management.jmxremote.port=18980"
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Dcom.sun.management.jmxremote.local.only=false"
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Dcom.sun.management.jmxremote.authenticate=true"
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Dcom.sun.management.jmxremote.ssl=true"

# Configure SSL Keystore
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Djavax.net.ssl.keyStore=/opt/opennms/etc/opennms.keystore"
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Djavax.net.ssl.keyStorePassword=changeit"

# Listen on all interfaces
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Dopennms.poller.server.serverHost=0.0.0.0"
# Accept remote RMI connections on this interface
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Djava.rmi.server.hostname=<your-server-ip-address>"

17.3. Connecting to RMI over SSL

Note that if you are using a self-signed or otherwise untrusted certificate, you will need to configure a truststore on the client side when you attempt to connect over SSL-enabled RMI. To create a truststore, follow the example in the HTTPS client instructions in the operator section of the manual. You may then use the truststore to connect to your OpenNMS Horizon RMI server.

For example, when using jconsole to connect to the OpenNMS Horizon RMI interface to get JVM statistics, you would run:

jconsole -J-Djavax.net.ssl.trustStore=/path/to/opennms.truststore -J-Djavax.net.ssl.trustStorePassword=changeit

18. Minion

18.1. Using Kafka for Sink (Traps and Syslog)

By default, OpenNMS Horizon uses the embedded ActiveMQ broker to communicate with Minions. This broker is used for both issuing remote procedure calls (RPCs, e.g. ping this host) and for transporting unsolicited messages such as SNMP traps and syslog messages.

Apache Kafka can be used as an alternative to ActiveMQ for transporting the unsolicited messages.

Kafka must be enabled on both OpenNMS Horizon and Minion to function.

The Kafka server must be compatible with Kafka client version 1.0.1.

18.1.1. Consumer Configuration

Enable and configure the Kafka consumer on OpenNMS Horizon by using the following commands. The initialSleepTime property will ensure that messages are not consumed from Kafka until the OpenNMS Horizon system has fully initialized.

echo 'org.opennms.core.ipc.sink.initialSleepTime=60000' > "$OPENNMS_HOME/etc/opennms.properties.d/sink-initial-sleep-time.properties"
echo 'org.opennms.core.ipc.sink.strategy=kafka
org.opennms.core.ipc.sink.kafka.bootstrap.servers=127.0.0.1:9092' >> "$OPENNMS_HOME/etc/opennms.properties.d/kafka.properties"

Restart OpenNMS Horizon to apply the changes.

Additional Kafka consumer options can be set by defining additional system properties prefixed with org.opennms.core.ipc.sink.kafka. For example, you can customize the group ID using org.opennms.core.ipc.sink.kafka.group.id=MyOpenNMS.

A list of all the available options can be found in the Kafka documentation under New Consumer Configs.

18.1.2. Producer Configuration

Enable the Kafka producer on Minion using:

echo '!opennms-core-ipc-sink-camel
opennms-core-ipc-sink-kafka' >> "$MINION_HOME/etc/featuresBoot.d/kafka.boot"

The snippet above prevents the opennms-core-ipc-sink-camel feature from starting when Minion is started, and loads the opennms-core-ipc-sink-kafka feature instead.

Next, configure the Kafka producer on Minion using:

echo 'bootstrap.servers=127.0.0.1:9092
acks=1' > "$MINION_HOME/etc/org.opennms.core.ipc.sink.kafka.cfg"

Restart Minion to apply the changes.

Additional Kafka producer options can be set directly in the org.opennms.core.ipc.sink.kafka.cfg file referenced above. A list of all the available options can be found in the Kafka documentation under Producer Configs.

18.2. Using Kafka for RPC

By default, OpenNMS Horizon uses the embedded ActiveMQ broker to communicate with Minions. Enabling Kafka for RPC allows ActiveMQ to be replaced if needed.

Kafka must be enabled on both OpenNMS Horizon and Minion to function.

The Kafka server must be compatible with Kafka client version 1.0.1.

For Kafka RPC, the number of partitions should always be greater than the number of Minions at a location. When there are multiple locations, partitions >= max(number of Minions at a location).
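As a worked example of the multi-location rule, the sketch below takes the number of Minions at each location and prints the largest value, which is the lower bound for the partition count. The function name min_partitions is hypothetical.

```shell
# min_partitions <minions-at-location-1> [<minions-at-location-2> ...]
# Prints max(number of Minions at a location), i.e. the lower bound
# for the number of partitions.
min_partitions() {
    max=0
    for count in "$@"; do
        if [ "$count" -gt "$max" ]; then
            max=$count
        fi
    done
    echo "$max"
}
```

With three locations running 2, 5 and 3 Minions, min_partitions 2 5 3 prints 5. Choosing a larger partition count than this lower bound also satisfies the "greater than" wording of the rule.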

18.2.1. Client (OpenNMS) configuration

Enable and configure Kafka on OpenNMS Horizon using the following commands:

echo 'org.opennms.core.ipc.rpc.strategy=kafka
org.opennms.core.ipc.rpc.kafka.bootstrap.servers=127.0.0.1:9092' >> "$OPENNMS_HOME/etc/opennms.properties.d/kafka.properties"

Restart OpenNMS Horizon to apply the changes. Additional Kafka producer/consumer options can be set by defining additional system properties prefixed with org.opennms.core.ipc.rpc.kafka.

The default time to live (the time after which a request expires) is 30 seconds. It can be changed by setting the system property org.opennms.core.ipc.rpc.kafka.ttl.

18.2.2. Server (Minion) configuration

Enable Kafka on Minion using:

echo '!opennms-core-ipc-rpc-jms
opennms-core-ipc-rpc-kafka' >> "$MINION_HOME/etc/featuresBoot.d/kafka.boot"

The snippet above prevents the opennms-core-ipc-rpc-jms feature from starting when Minion is started, and loads the opennms-core-ipc-rpc-kafka feature instead.

Next, configure Kafka on Minion using:

echo 'bootstrap.servers=127.0.0.1:9092
acks=1' > "$MINION_HOME/etc/org.opennms.core.ipc.rpc.kafka.cfg"

A list of all the available options for Kafka producer and consumer configuration can be found in the Kafka documentation under Producer Configs and New Consumer Configs.

18.3. Using AWS SQS

By default, OpenNMS Horizon uses an ActiveMQ broker to communicate with Minions. This broker is used for both issuing remote procedure calls (RPCs, e.g. ping this host) and for transporting unsolicited messages such as SNMP traps and syslog messages.

AWS SQS can be used as an alternative to ActiveMQ for both remote procedure calls and transporting the unsolicited messages.

AWS SQS must be enabled on both OpenNMS Horizon and Minion to function.

18.3.1. OpenNMS Horizon Configuration

Enable and configure AWS SQS on OpenNMS Horizon by using the following commands. The initialSleepTime property will ensure that messages are not consumed from AWS SQS until the OpenNMS Horizon system has fully initialized.

echo 'org.opennms.core.ipc.rpc.strategy=sqs
org.opennms.core.ipc.sink.strategy=sqs
org.opennms.core.ipc.sink.initialSleepTime=60000
org.opennms.core.ipc.aws.sqs.aws_region=us-east-1' > "$OPENNMS_HOME/etc/opennms.properties.d/aws-sqs.properties"

AWS Credentials are required in order to access SQS. The default credential provider chain looks for credentials in this order:

  • Environment Variables (i.e. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY)

  • Java system properties (i.e. aws.accessKeyId and aws.secretKey. These keys can be added to $OPENNMS_HOME/etc/opennms.conf)

  • Default credential profiles file (i.e. ~/.aws/credentials)

  • Amazon ECS container credentials (i.e. AWS_CONTAINER_CREDENTIALS_RELATIVE_URI)

  • Instance profile credentials (i.e. through the metadata service when running on EC2)
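As an illustration of the Java system properties option, the two keys can be passed on the Java command line through ADDITIONAL_MANAGER_OPTIONS, in the same way as other command line properties in this guide. The following is a sketch of a fragment for $OPENNMS_HOME/etc/opennms.conf; the XXXXXXXXXXX values are placeholders.

```shell
# Fragment for $OPENNMS_HOME/etc/opennms.conf (placeholder values).
# Passes the AWS credentials to the JVM as Java system properties.
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Daws.accessKeyId=XXXXXXXXXXX"
ADDITIONAL_MANAGER_OPTIONS="$ADDITIONAL_MANAGER_OPTIONS -Daws.secretKey=XXXXXXXXXXX"
```

Keep in mind that credentials stored this way are readable by anyone with access to opennms.conf.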

Alternatively, the credentials can be specified inside the aws-sqs.properties file:

echo 'org.opennms.core.ipc.aws.sqs.aws_access_key_id=XXXXXXXXXXX
org.opennms.core.ipc.aws.sqs.aws_secret_access_key=XXXXXXXXXXX' >> "$OPENNMS_HOME/etc/opennms.properties.d/aws-sqs.properties"

When running OpenNMS inside AWS, it is possible to use the default provider chain with an IAM Role to avoid hard-coding the AWS credentials in a configuration file. The following shows an example of the role that should be associated with the EC2 instance on which OpenNMS is going to run:

[Figure: aws iam role]

If you require consistent ordering of the messages, you should use FIFO queues instead of Standard queues. You can enable FIFO queues by adding the following parameter to the aws-sqs.properties file referenced above:

org.opennms.core.ipc.aws.sqs.sink.FifoQueue=true

Restart OpenNMS Horizon to apply the changes.

18.3.2. Minion Configuration

Enable AWS SQS on Minion using:

echo '!minion-jms
!opennms-core-ipc-rpc-jms
!opennms-core-ipc-sink-camel
opennms-core-ipc-rpc-aws-sqs
opennms-core-ipc-sink-aws-sqs' > "$MINION_HOME/etc/featuresBoot.d/aws-sqs.boot"

The snippet above prevents the default JMS-related features from starting and loads the SQS-related features instead.

Next, configure AWS SQS on Minion using:

echo 'aws_region=us-east-1
aws_access_key_id=XXXXXXXXXXX
aws_secret_access_key=XXXXXXXXXXX' > "$MINION_HOME/etc/org.opennms.core.ipc.aws.sqs.cfg"

The AWS credentials are required. If they are not specified in the configuration file, the default credentials provider chain (explained above) will be used instead.

If you require consistent ordering of the messages, you should use FIFO queues instead of Standard queues. You can enable FIFO queues by adding the following parameter to the org.opennms.core.ipc.aws.sqs.cfg file referenced above:

sink.FifoQueue=true

Restart Minion to apply the changes.

AWS credentials are required when the Minion is not running inside a VPC.

The Minion SQS settings must match the settings currently configured on OpenNMS. This is particularly critical for the FifoQueue setting.
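One rough way to check the FifoQueue requirement is to compare the sink setting in both files. The sketch below assumes that copies of both files are available on the same machine and that entries are written as key=value without surrounding spaces; the function names prop_value and fifo_settings_match are hypothetical.

```shell
# prop_value <file> <key> <default>
# Prints the last value assigned to <key> in <file>, or <default> if unset.
prop_value() {
    awk -F= -v key="$2" -v def="$3" '
        $1 == key { val = $2; found = 1 }
        END { print (found ? val : def) }
    ' "$1"
}

# fifo_settings_match <aws-sqs.properties> <org.opennms.core.ipc.aws.sqs.cfg>
# Succeeds when the sink FifoQueue setting is the same on both sides.
# A missing entry is treated as false, the documented default.
fifo_settings_match() {
    onms=$(prop_value "$1" org.opennms.core.ipc.aws.sqs.sink.FifoQueue false)
    minion=$(prop_value "$2" sink.FifoQueue false)
    [ "$onms" = "$minion" ]
}
```

The same comparison can be repeated for the rpc.FifoQueue setting by swapping the key names.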

18.3.3. SQS Configuration Settings

The following tables list parameters from the Amazon SQS documentation which can be added to either Minion (via $MINION_HOME/etc/org.opennms.core.ipc.aws.sqs.cfg) or OpenNMS Horizon (via $OPENNMS_HOME/etc/opennms.properties.d/aws-sqs.properties), along with the correct syntax for each environment.

Sink Settings

Queues used for reception of unsolicited messages (e.g. SNMP traps, syslog messages) are configured by setting properties with sink prepended to the SQS parameter name:

Parameter | Notes | OpenNMS Horizon | Minion
DelaySeconds | Default: 0 seconds | org.opennms.core.ipc.aws.sqs.sink.DelaySeconds | sink.DelaySeconds
MaximumMessageSize | Default: 262144 bytes | org.opennms.core.ipc.aws.sqs.sink.MaximumMessageSize | sink.MaximumMessageSize
MessageRetentionPeriod | Default: 1209600 seconds | org.opennms.core.ipc.aws.sqs.sink.MessageRetentionPeriod | sink.MessageRetentionPeriod
ReceiveMessageWaitTimeSeconds | Default: 10 seconds (for OpenNMS) | org.opennms.core.ipc.aws.sqs.sink.ReceiveMessageWaitTimeSeconds | sink.ReceiveMessageWaitTimeSeconds
VisibilityTimeout | Default: 30 seconds | org.opennms.core.ipc.aws.sqs.sink.VisibilityTimeout | sink.VisibilityTimeout
Policy | - | org.opennms.core.ipc.aws.sqs.sink.Policy | sink.Policy
RedrivePolicy | - | org.opennms.core.ipc.aws.sqs.sink.RedrivePolicy | sink.RedrivePolicy
KmsMasterKeyId | - | org.opennms.core.ipc.aws.sqs.sink.KmsMasterKeyId | sink.KmsMasterKeyId
KmsDataKeyReusePeriodSeconds | - | org.opennms.core.ipc.aws.sqs.sink.KmsDataKeyReusePeriodSeconds | sink.KmsDataKeyReusePeriodSeconds
FifoQueue | Default: false | org.opennms.core.ipc.aws.sqs.sink.FifoQueue | sink.FifoQueue
ContentBasedDeduplication | Valid only when sink.FifoQueue is true | org.opennms.core.ipc.aws.sqs.sink.ContentBasedDeduplication | sink.ContentBasedDeduplication

RPC Settings

Queues used for provisioning, service polling, data collection, and other concerns apart from unsolicited message reception are configured by setting properties with rpc prepended to the SQS parameter name:

Parameter | Notes | OpenNMS Horizon | Minion
DelaySeconds | Default: 0 seconds | org.opennms.core.ipc.aws.sqs.rpc.DelaySeconds | rpc.DelaySeconds
MaximumMessageSize | Default: 262144 bytes | org.opennms.core.ipc.aws.sqs.rpc.MaximumMessageSize | rpc.MaximumMessageSize
MessageRetentionPeriod | Default: 1209600 seconds | org.opennms.core.ipc.aws.sqs.rpc.MessageRetentionPeriod | rpc.MessageRetentionPeriod
ReceiveMessageWaitTimeSeconds | Default: 10 seconds (for OpenNMS) | org.opennms.core.ipc.aws.sqs.rpc.ReceiveMessageWaitTimeSeconds | rpc.ReceiveMessageWaitTimeSeconds
VisibilityTimeout | Default: 30 seconds | org.opennms.core.ipc.aws.sqs.rpc.VisibilityTimeout | rpc.VisibilityTimeout
Policy | - | org.opennms.core.ipc.aws.sqs.rpc.Policy | rpc.Policy
RedrivePolicy | - | org.opennms.core.ipc.aws.sqs.rpc.RedrivePolicy | rpc.RedrivePolicy
KmsMasterKeyId | - | org.opennms.core.ipc.aws.sqs.rpc.KmsMasterKeyId | rpc.KmsMasterKeyId
KmsDataKeyReusePeriodSeconds | - | org.opennms.core.ipc.aws.sqs.rpc.KmsDataKeyReusePeriodSeconds | rpc.KmsDataKeyReusePeriodSeconds
FifoQueue | Default: false | org.opennms.core.ipc.aws.sqs.rpc.FifoQueue | rpc.FifoQueue
ContentBasedDeduplication | Valid only when rpc.FifoQueue is true | org.opennms.core.ipc.aws.sqs.rpc.ContentBasedDeduplication | rpc.ContentBasedDeduplication

When FIFO queues are not required, there is no need to add FifoQueue=false to the configuration files, as this is the default behavior.

18.3.4. Managing Multiple Environments

In order to support multiple OpenNMS Horizon environments in a single AWS region, the aws_queue_name_prefix property can be used to prefix the queue names.

For example, if we set this property to be "PROD", the queue names will resemble PROD-OpenNMS-Sink-Heartbeat, instead of OpenNMS-Sink-Heartbeat.

This property must be configured on both the OpenNMS Horizon side and the Minion side.

18.3.5. AWS Credentials

The credentials (i.e. the Access Key ID and the Secret Access Key) are required on both sides, OpenNMS and Minion.

In order to create credentials just for accessing SQS resources, follow this procedure:

  • From the AWS Console, choose the appropriate region.

  • Open the IAM Dashboard and click on "Add user".

  • Choose a name for the user, for example opennms-minion.

  • Check only Programmatic access for the Access type.

  • On the permissions, click on Attach existing policies directly.

  • In the search bar, type SQS, and then check AmazonSQSFullAccess.

  • Click on Create User.

[Figure: aws minion user]

Finally, either click on Download .csv or click on "Show" to grab a copy of the Access key ID, and the Secret access key.

18.3.6. Limitations

There are a number of limitations when using AWS SQS, in particular:

  • A message can include only XML, JSON, and unformatted text. The following Unicode characters are allowed: #x9 | #xA | #xD | #x20 to #xD7FF | #xE000 to #xFFFD | #x10000 to #x10FFFF. Any characters not included in this list are rejected.

  • The minimum message size is 1 byte (1 character). The maximum is 262,144 bytes (256 KB).

  • Without batching, FIFO queues can support up to 300 messages per second (300 send, receive, or delete operations per second).

See Amazon SQS Limits for further details.

Location names

Queue names in AWS SQS are limited to 80 characters. When issuing remote procedure calls, the target location is used as part of the queue name. For this reason, it is important that:

  • The length of the location name and queue name prefix (if used) must not exceed 32 characters in aggregate.

  • Both the location name and queue name prefix (if used) may only contain alphanumeric characters, hyphens (-), and underscores (_).
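The two rules above can be checked before a location or prefix is rolled out. The sketch below is only an illustration: the function name check_sqs_names is hypothetical, and it assumes that the aggregate length is simply the length of the location name plus the length of the prefix.

```shell
# check_sqs_names <location-name> [<queue-name-prefix>]
# Hypothetical check for the two naming rules listed above.
check_sqs_names() {
    location=$1
    prefix=${2:-}
    combined="$location$prefix"
    # Rule 1: location name and prefix together must not exceed 32 characters.
    if [ "${#combined}" -gt 32 ]; then
        echo "combined length is ${#combined}, maximum is 32" >&2
        return 1
    fi
    # Rule 2: only alphanumeric characters, hyphens and underscores are allowed.
    case "$combined" in
        *[!A-Za-z0-9_-]*)
            echo "only alphanumeric characters, '-' and '_' are allowed" >&2
            return 1
            ;;
    esac
    return 0
}
```

For example, check_sqs_names Default PROD succeeds, while a location name containing a space is rejected.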

19. Plugin Manager

With the introduction of Karaf as an OSGi application container, OpenNMS Horizon now has the ability to install or upgrade features on top of a running instance of OpenNMS Horizon. In addition, the new distributed OSGi architecture allows an OpenNMS Horizon system to be deployed as multiple software modules each running in their own Karaf instance.

The OpenNMS Horizon Plugin Manager provides a unified interface for managing the lifecycle of optional OSGi plugins installed in OpenNMS Horizon or in any Karaf instances which it manages. This need not be limited to Karaf instances running OpenNMS Horizon but can also be used to deploy modules to Karaf instances running user applications.

In addition to managing the installation of OSGi features, the Plugin Manager also allows the installation of licence keys which can be used to enable features for a particular instance of OpenNMS Horizon. Although the OpenNMS Horizon platform remains open source, this provides a mechanism for third parties developing features on top of the OpenNMS Horizon platform to manage access to their software.

The Plugin Manager also provides a mechanism for a separate 'app-store' or Available Plugins Server to be used to deliver these new features and / or licences into a particular OpenNMS Horizon instance. It is also possible to deliver software without access to the internet using the traditional Karaf Kar/RPM deployment model. (Kar files are a form of zip file containing bundles and feature definitions which can be deployed in the Karaf /deploy directory. These can be placed in the /deploy directory directly or installed there using an RPM.) In this case a number of features can be delivered together in a single software package but each only enabled at run time using the Plugin Manager.

OpenNMS Horizon plugins are standard Karaf features with additional metadata which describes the feature and the licence (if any) required. A plugin requiring a licence will not start if a valid licence string is not also installed.

In addition to options described in the licence metadata which is publicly accessible, licences can also contain encrypted secret properties which can only be decrypted when the licence is authenticated. After licence authentication, these properties are then available to a plugin as properties of its licenceAuthenticator object.

Note that Karaf's features mechanism has not been modified in any way. The Plugin Manager simply provides a user front end and additional metadata for features. Plugin features can be installed from the internal features repository, remote maven repositories or from Kar files placed in the deploy directory depending on how the Karaf configuration is set up. The standard OpenNMS Horizon configuration has no remote maven access enabled for Karaf and external features must be locally provisioned as a Kar or an RPM before being enabled with the Plugin Manager.

This guide describes how to deploy and manage plugins using the Plugin Manager. A separate plugin developer’s guide is provided for those wishing to write their own plugins or generate licences.

19.1. Plugin Manager UI

The Plugin Manager page is split into four quadrants.

The top left quadrant is a panel for setting access properties for each of the managed Karaf instances, including the local OpenNMS Horizon instance. In order to access any information through the Plugin Manager, users must enter the URL, the admin or ReST username, and the password of the remote Karaf being managed by editing its entry in the Karaf instance list. This is done by selecting the required Karaf entry and selecting the edit karaf instance button. The local OpenNMS Horizon system is designated by the localhost entry, which cannot be removed. Note that the localhost entry in the Karaf instance list also needs a username and password matching an admin or ReST user of the localhost system for anything to work.

The top right quadrant is a panel for displaying response messages to any action performed. When any operation is performed in the plugin manager, the result is displayed. The full error message associated with any failures can also be viewed.

The bottom right quadrant allows a remote plugin repository and shopping cart to be set up.

The bottom left quadrant contains panels for showing the installed plugins, for setting up a plugin manifest, selecting locally or remotely hosted plugins to be installed and for controlling the installed licences.

19.2. Plugin Manager UI panel

The Plugin Manager is accessed as an entry in the Additional Tools panel of the OpenNMS Horizon Admin GUI.

[Figure: 01 adminPageEntry]

The Plugin Manager administration page is split into the following main areas, as illustrated below.

  1. Top Left is the Karaf Instance data panel which lists the Karaf instances known to the Plugin Manager. When a Karaf instance is selected, the data on the rest of the page refers to the selected instance.

  2. Bottom Left is the Available Plugins Server Panel which is used to set the address and passwords to access the Available Plugins Server and / or the list of locally available plugins provided by a Kar or RPM.

  3. Top Right, just below the main OpenNMS Horizon menu bar are links to three diagnostic pages which can help test the ReST interface to remote Karaf Instances.

  4. Middle Right is a messages panel which reports the status of any operations. If an operation fails, the full error message can be viewed by pressing the error message button.

  5. Bottom Right is a tabbed panel which reflects the status of the plugins and licences installed in the Karaf instance selected by the Karaf Instance data panel.

[Figure: 02 pluginmgrFullPage]

19.3. Setting Karaf Instance Data

The Karaf instances known to the Plugin Manager are listed in the Karaf Instance data panel. Localhost refers to the local OpenNMS Horizon server and is always an option in the panel. The Karaf instance data is persisted locally and should be refreshed from remote sources using the reload Karaf instance data button before changes are made.

Please note that the Localhost configuration in the Plugin Manager by default uses admin for both the username and the password. This will not work in a production OpenNMS where you have changed the admin user password. You should edit the Localhost configuration using the edit instance list button to match your local configuration.

Each Karaf instance must have a unique system ID which is used to update its configuration and also to validate its licences. The system ID must be unique and include a checksum. A new random system ID can be generated for a Karaf instance using a button on the panel.

In most situations, the remote Karaf instance can be accessed from the OpenNMS Horizon Plugin Manager. However, in many cases the remote Karaf will be behind a firewall, in which case it must initiate the communications to request its configuration and supply an update on its status.

The Remote is Accessible field tells the Plugin Manager which mode of operation is in use.

Remote request of configuration is not yet fully implemented and will be completed in a future release.

Table 35. Karaf Instance Fields
Field Name Description

Instance Name

Host name of the Karaf instance

Karaf URL

URL used to access the Karaf Plugin Manager ReST API

Current Instance System ID

The system ID currently installed in the Karaf system

Manifest System ID

The system ID to be provisioned in the Karaf system

Remote is Accessible

If ticked (i.e. true), the Plugin Manager will try to contact the remote Karaf instance using the URL. If not ticked (i.e. false), the remote Karaf instance must request its configuration.

Allow Status Update from Remote

Allow the remote Karaf instance to request an update to its remote configuration from the locally held manifest and at the same time to update its status.

[Figure: 03 karafInstanceData]

19.4. Manually adding a managed Karaf instance

The list of Karaf instances can be modified using the Karaf instance editor illustrated below. The same fields apply as above.

[Figure: 04 karafinstanceeditor]

19.5. Installed Plugins

Under plugin settings, the Installed Plugins tab lists which plugins are currently installed in the Karaf instance selected in the Karaf instance data panel. Installed plugins can be uninstalled by selecting the plugin on the list and selecting 'uninstall', or reinstalled by selecting the reinstall button. However, plugins designated as System Plugins (i.e. the System Plugin checkbox is ticked and grayed out) cannot be uninstalled through the UI. (The Plugin Manager is itself a system plugin.)

Each plugin has metadata associated with it which is used to identify and describe the plugin.

Table 36. Plugin Metadata Fields
Plugin Metadata Description

Product ID

The unique key used to identify the name and version of the feature. (Same as Karaf Feature Name/Version)

Licence Key Required

If true (ticked), this plugin needs a licence key to start

Licence Validated

If a licence key is required, a green text label will indicate if the licence has been installed and validated. Otherwise a red text label will indicate an invalid licence

System Plugin

If true (ticked) this is a system plugin and cannot be removed.

Packaging Descriptor

This describes the packaging mechanism by which the feature was delivered. This will refer to a Kar if the feature was manually installed as a Kar/RPM on the host server.

Feature Repository URL

The URL identifying the feature repository (Same as Karaf Feature Repository URL)

Product Description

A textual description of the functionality provided by the plugin.

Product URL

A URL to point to the plugin’s documentation / web site

Licence Type

A description of the licence applied to the plugin (May be GPL if the plugin is not subject to an EULA)

Organisation

The organisation issuing the plugin and/or licence.

[Figure: 05 installedPlugins]

The Installed Plugins tab shows the data retrieved the last time the Reload Karaf Instance data button was pressed. (This allows us to maintain a record of offline Karaf instances.) However, it also means that the localhost data may not be up to date with the local Karaf instance. You should always reload to get an accurate picture of what is currently installed.

19.6. Available Plugins Server

Plugins which are available to be installed in OpenNMS Horizon are listed in either the Local Available Plugins tab or the Remote Available Plugins tab. Local Available Plugins are plugins which are packaged as standard with the OpenNMS Horizon build.

The Plugin Manager gets this list from the local system using the ReST interface with the admin user and password.

The Plugin Manager obtains the list of remotely available plugins from the Available Plugins Server.

The Available Plugins Server can be part of an externally hosted plugin shopping cart or it can simply be a URL serving the internal list of available plugins as described in the section on Internal Plugins.

In order for externally downloaded plugins to be installed, the Available Plugins Server must have a related maven repository from which Karaf can download the feature. By default, feature download is not enabled in OpenNMS Horizon. To enable Karaf external feature download, the address of the maven repository should be entered in the org.ops4j.pax.url.mvn.cfg file in the OpenNMS Horizon /etc directory.

Alternatively, the Plugin Manager can list the available plugins which have been installed on the local machine as bundled plugin Kars (using the Karaf Kar deploy mechanism) along with any internal plugins bundled with OpenNMS Horizon. In this case, the Plugin Server URL should be pointed at http://localhost:8980/opennms.

The admin username and passwords are used to access the Available Plugins Server. If a shopping cart is provided for obtaining licences, the URL of the shopping cart should be filled in.

[Figure: 06 availablePluginsServer]

19.7. Installing Available Plugins

The Available Plugins panel lists the plugins which are available and listed by the Available Plugins Server. These can be directly installed into the selected Karaf instance or can be posted to a manifest for later installation. If a plugin is installed, the system will try to start it. However, if a corresponding licence is required and not installed, the features will be loaded but not started. You must restart the feature if you later install a licence key.

[Figure: 07 availablePlugins]

19.8. Plugins Manifest

The Plugins Manifest for a given Karaf instance lists the target plugins which the Karaf instance should install when it next contacts the licence manager. If the Plugin Manager can communicate with the remote server, then a manifest can be selected for immediate installation. A manual manifest entry can also be created for a feature. This can be used to install features which are not listed in the Available Features list.

08 pluginManifest

19.9. Installing Internal Plugins

OpenNMS Horizon is packaged with an internal repository of plugins which are shipped with the OpenNMS Horizon distribution. These plugins can be installed in the local OpenNMS Horizon Karaf instance and activated by a user using the Plugin Manager in the same way it could be used to download and install external plugins.

The internal-plugin-descriptor feature maintains a list of internal plugins which are packaged with OpenNMS Horizon. This list of internal plugins can be accessed by the Plugin Manager through the Local Available Plugins Panel.

The same list can also be accessed through the remote plugins panel if the Available Plugins Server entry is set to point to the local OpenNMS Horizon instance.
To do this, set Plugin Server URL to the address of the local OpenNMS Horizon (i.e. http://localhost:8980/opennms) and set the Plugin Server Username and Plugin Server Password to match the OpenNMS Horizon ReST or admin username and password.

Clicking Reload available plugins will then add the list of available internal plugins to the Available Plugins Tab where they can be installed and started by the user as described previously.

The internal plugins included with this OpenNMS Horizon release are documented in a later section.

19.10. Installed Licences Panel

Each licence has a licence ID which is the Karaf feature ID of the feature to which the licence refers. Many licences can be installed on a system but only one licence string is allowed per feature ID.

Licence Strings are used to validate that a particular feature can be run on a given Karaf instance. The Plugin Manager will not allow a feature to run if its licence cannot be validated using a private key encoded in the feature bundle.

Licences are associated with specific Product IDs and specific Karaf instances. Several Karaf instances can be listed in a licence, allowing a feature to run on more than one system using the same licence. When a licence is installed, the licence metadata is decoded and displayed.

A licence may be installed before or after its associated feature is installed. If a licence is installed after the feature, the feature must be restarted before the licence will be read.
09 installedLicences

19.11. Adding a New Licence

New licences are added using the add licence panel. Licences are obtained from the App Store, where they can be generated by a user for a given set of system IDs.

A licence must be copied and pasted from the App Store into the add licence panel. The Validate licence button should be used to check that the licence string has been entered correctly. Note that this only checks the integrity of the licence string. A licence is only authenticated once it is installed and the corresponding feature bundle checks it on start-up.

10 addLicence

The system provides a robust licensing mechanism which would be difficult to spoof and is considered sufficient for most applications. However, it should not be considered cryptographically secure to the extent that a determined hacker could not break the system.

19.12. Licence Structure

The following note describes how licence keys are structured.

Licence keys contain machine readable metadata which can be accessed without decryption. The metadata is digitally signed with an encrypted hash which must be decrypted in order to verify the licence data.

Licence keys may also contain additional encrypted secret properties. The secret properties are intended to securely convey application specific secrets such as passwords or keys needed to access remote services.

The entire licence has a Crc32 Checksum appended to ensure it is conveyed intact when it is installed.

The licence keys consist of hexadecimal strings of printable ASCII characters in three (or four) sections separated by the ':' character, followed by the checksum, as follows:

With secret properties:
<licenceMetadataHexStr>:<encryptedHashStr>:<aesSecretKeyStr>:<encryptedSecretPropertiesStr>-<Crc32Checksum>

Without secret properties:
<licenceMetadataHexStr>:<encryptedHashStr>:<aesSecretKeyStr>-<Crc32Checksum>

The licenceMetadataHexStr is a hexadecimal encoded version of the XML licence metadata. This section is not encrypted and may be read without decryption, so that the key features of the licence can be displayed without access to the licence keys.

The encryptedHashStr is a Hexadecimal version of the encrypted hash of the licenceMetadataHexStr. If the hash can be decrypted and the resulting hash matches a hash of the licenceMetadataHexStr, then the licence is deemed to be validated. Note that the start time and duration of the licence and the unique system id in the licence must match the local context for the licence to be fully activated.

If the encryptedSecretPropertiesStr is present, it contains an encrypted version of the secret properties (as name-value pairs) supplied with the licence. Note that the size of the original properties is limited to 245 bytes by the encrypting algorithm.

Private key encryption is used to encrypt the metadata hash and the secret properties. The AesSymetricKeyCipher uses a 128-bit key, which is the longest key length permitted without government export authorisation or additional cryptographic extensions.

The encryption key is held in the licence creation server as part of the licence specification. The decryption key is held in the remote licence authenticator where the licence is verified.

However the key held in the licence authenticator is itself encrypted and must first also be decrypted using the aesSecretKeyStr supplied with the licence. This means that a licence can only be validated and the secret properties decrypted if the remote licence authenticator is itself unlocked by the licence.
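The layout described in this section can be illustrated with a short Python sketch that splits a licence key into its sections and decodes the metadata. This is not the Plugin Manager's implementation: the helper name parse_licence_key, the UTF-8 decoding of the metadata and the sample key (including its XML element names) are assumptions made for illustration. The sketch deliberately does not verify the checksum or the encrypted hash, because the exact inputs to those steps are not specified here.

```python
from typing import NamedTuple, Optional


class LicenceKey(NamedTuple):
    metadata_xml: str                            # decoded licenceMetadataHexStr
    encrypted_hash: str                          # encryptedHashStr (left encrypted)
    aes_secret_key: str                          # aesSecretKeyStr
    encrypted_secret_properties: Optional[str]   # None when the section is absent
    crc32_checksum: str                          # trailing checksum (not verified)


def parse_licence_key(key: str) -> LicenceKey:
    """Split a licence key into its sections (hypothetical helper)."""
    # The checksum follows the last '-'; hex sections contain no '-' or ':'.
    body, separator, checksum = key.strip().rpartition("-")
    if not separator:
        raise ValueError("licence key has no '-<Crc32Checksum>' suffix")
    sections = body.split(":")
    if len(sections) not in (3, 4):
        raise ValueError("expected three or four ':' separated sections")
    # Assumption: the XML metadata is UTF-8 text before hex encoding.
    metadata_xml = bytes.fromhex(sections[0]).decode("utf-8")
    secret = sections[3] if len(sections) == 4 else None
    return LicenceKey(metadata_xml, sections[1], sections[2], secret, checksum)


# Made-up key for illustration only; real keys come from the licence server.
sample_metadata = "<licenceMetadata><productId>example</productId></licenceMetadata>"
sample_key = sample_metadata.encode("utf-8").hex() + ":ab12:cd34-5e6f7a8b"
parsed = parse_licence_key(sample_key)
print(parsed.metadata_xml)
```

Because the metadata section is only hex encoded, it can be displayed in this way before any decryption takes place.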

20. Internal Plugins

20.1. Internal Plugins supplied with OpenNMS Horizon

OpenNMS Horizon includes a number of plugins which can be installed by the Plugin Manager UI or directly from the Karaf console. Plugins are simply Karaf features which have additional metadata describing the Plugin and possibly defining that the Plugin also needs a licence installed to run.

Once installed, the plugins will always start when OpenNMS is restarted. If the plugins appear not to be working properly, you should check the /data/log/karaf.log file for problems.

Each internal plugin supplied with OpenNMS Horizon is described in its own section below.

20.2. Installing Plugins with the Karaf Console

The easiest way to install a plugin is to use the Plugin Manager UI described in the Plugin Manager section. However, plugins can also be installed using the Karaf console. To use the Karaf console, you need to open the Karaf command prompt using:

ssh -p 8101 admin@localhost
(or ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no if no host checking is wanted)

To install or remove a feature in Karaf use

karaf@root> feature:install <feature name>
karaf@root> feature:uninstall <feature name>

You can see which plugins are installed using

karaf@root> product-reg:list

21. Special Cases and Workarounds

21.1. Overriding SNMP Client Behavior

By default, the SNMP subsystem in OpenNMS Horizon does not treat any RFC 3416 error-status as fatal. Instead, it will attempt to continue the request, if possible. However, only a subset of errors will cause OpenNMS Horizon’s SNMP client to attempt retries. The default SNMP error-status handling behavior is as follows:

Table 37. Default SNMP Error Status Behavior
error-status              Fatal?   Retry?
noError(0)                false    false
tooBig(1)                 false    true
noSuchName(2)             false    true
badValue(3)               false    false
readOnly(4)               false    false
genErr(5)                 false    true
noAccess(6)               false    true
wrongType(7)              false    false
wrongLength(8)            false    false
wrongEncoding(9)          false    false
wrongValue(10)            false    false
noCreation(11)            false    false
inconsistentValue(12)     false    false
resourceUnavailable(13)   false    false
commitFailed(14)          false    false
undoFailed(15)            false    false
authorizationError(16)    false    true
notWritable(17)           false    false
inconsistentName(18)      false    false

You can override this behavior by setting a property inside ${OPENNMS_HOME}/etc/opennms.properties in the form:

org.opennms.netmgt.snmp.errorStatus.[statusCode].[type]

For example, to make authorizationError(16) abort and not retry, you would set:

org.opennms.netmgt.snmp.errorStatus.16.fatal=true
org.opennms.netmgt.snmp.errorStatus.16.retry=false
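
When several status codes need overriding, the property lines can be generated rather than typed by hand. The following Python sketch only formats lines in the form shown above; the helper name error_status_overrides is hypothetical and is not part of OpenNMS Horizon.

```python
from typing import List


def error_status_overrides(status_code: int, fatal: bool, retry: bool) -> List[str]:
    """Build the two opennms.properties lines for one SNMP error-status code."""
    prefix = f"org.opennms.netmgt.snmp.errorStatus.{status_code}"
    return [
        f"{prefix}.fatal={str(fatal).lower()}",
        f"{prefix}.retry={str(retry).lower()}",
    ]


# authorizationError(16): abort and do not retry, as in the example above.
lines = error_status_overrides(16, fatal=True, retry=False)
print("\n".join(lines))
```

The generated lines would then be added to ${OPENNMS_HOME}/etc/opennms.properties.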

22. IFTTT Integration

The free web-based service IFTTT allows you to combine web applications using simple conditional instructions. Each supported service has several triggers that can be used to trigger actions of other services. This makes it possible, for example, to change the brightness and color of a smart bulb, or to send messages or data to IoT devices.

The OpenNMS Horizon integration makes use of the so-called "Webhooks" service, which triggers actions when a specific web request is received. The basic operation is as follows: OpenNMS Horizon polls for alarms that have an associated node and match a given category filter. For the resulting alarm set, the maximum severity and the total count are computed. If one of these values has changed compared to the last poll, one or more events specified for the computed maximum severity are sent to IFTTT.

22.1. IFTTT Configuration

In order to use the IFTTT integration in OpenNMS Horizon you need an IFTTT account. With this account you are able to create so-called applets that combine a trigger with an action. In our case we use the "Webhooks" service as the trigger and define the event name OpenNMS. After this step you can combine this trigger with any of the possible supported services and their actions.

Webhooks service trigger definition

trigger definition small

In your account service settings for the "Webhooks" service you find your key in the given service URL. In the following example this key is X71dfUZsH4Wkl6cjsLjdV.

Webhooks service settings

webhooks settings small

On the OpenNMS Horizon side, you need a configuration that defines which event names to send on an alarm count or severity change. The configuration file ifttt-config.xml contains so-called trigger packages.

The operation is as follows: OpenNMS Horizon retrieves all alarms that have an associated node. Each trigger package defines whether only unacknowledged alarms should be taken into account. It then computes the maximum severity and alarm count for each trigger package's category filter. After that, it triggers all events defined in the corresponding trigger sets for the computed maximum severity. The category filter accepts Java regular expressions. Using an empty category filter will use all unacknowledged alarms with an associated node.
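
The per-package computation described above can be sketched in Python. This is a simplified model, not the daemon's code: the Alarm type, the function names, the whole-string matching of category names and the choice of NORMAL for an empty alarm set are all assumptions made for illustration.

```python
import re
from typing import Iterable, NamedTuple, Optional, Tuple

# Severities in ascending order, matching the severity-based trigger set names.
SEVERITIES = ["NORMAL", "WARNING", "MINOR", "MAJOR", "CRITICAL"]


class Alarm(NamedTuple):
    severity: str
    categories: Tuple[str, ...]   # categories of the associated node
    acknowledged: bool


def summarize(alarms: Iterable[Alarm], category_filter: str,
              only_unacknowledged: bool) -> Tuple[str, int]:
    """Return (maximum severity, alarm count) for one trigger package."""
    # Assumption: an empty filter matches every node category.
    pattern = re.compile(category_filter) if category_filter else None
    selected = [
        alarm for alarm in alarms
        if not (only_unacknowledged and alarm.acknowledged)
        and (pattern is None
             or any(pattern.fullmatch(c) for c in alarm.categories))
    ]
    if not selected:
        return "NORMAL", 0   # assumption: no matching alarms means NORMAL
    max_severity = max((a.severity for a in selected), key=SEVERITIES.index)
    return max_severity, len(selected)


def trigger_set_to_fire(previous: Tuple[str, int],
                        current: Tuple[str, int]) -> Optional[str]:
    """Name of the trigger set to fire, or None if nothing changed."""
    return current[0] if previous != current else None


alarms = [
    Alarm("MINOR", ("Routers",), False),
    Alarm("CRITICAL", ("Servers",), False),   # filtered out by category
    Alarm("MAJOR", ("Switches",), True),      # filtered out: acknowledged
]
current = summarize(alarms, "Routers|Switches", only_unacknowledged=True)
print(current, trigger_set_to_fire(("NORMAL", 0), current))
```

In this model, a trigger set fires only when the severity or the count differs from the previous poll, which mirrors the change detection described in the text.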

Each trigger inside a trigger set defines the event name to be triggered and three additional values. These values can be used to set additional attributes for the corresponding IFTTT applet action. The following trigger sets can be defined:

Name       Execution
ON         on start of the IFTTT alarm polling daemon to switch on a device
OFF        on stop of the IFTTT alarm polling daemon to switch off a device
NORMAL     if severity is NORMAL
WARNING    if severity is WARNING
MINOR      if severity is MINOR
MAJOR      if severity is MAJOR
CRITICAL   if severity is CRITICAL

The ON and OFF trigger sets are also available for the trigger set definition. The ON event is sent when the polling daemon is started and the OFF event when it is stopped. These events can be used to power devices up or down and to initialize them.

22.2. OpenNMS Configuration

IFTTT alarm polling is enabled by setting the attribute enabled to true in the ifttt-config.xml file. It is also possible to configure the polling interval. Each trigger package defines trigger sets, which in turn define a sequence of events to be triggered at IFTTT. Each trigger defines the eventName and an additional delay. The delay makes it possible to defer the execution of the next trigger in a trigger set.

22.3. Example

The following example shows the configuration file for a WiFi light bulb controlled via IFTTT. The defined applets use value1 for setting the color and value2 for setting the brightness. The third value demonstrates the use of placeholders. For the severity-based trigger sets, the following placeholders can be used in the three value fields: %os%/%oldSeverity% for old severity, %ns%/%newSeverity% for new severity, %oc%/%oldCount% for old alarm count and %nc%/%newCount% for new alarm count. This is useful for sending messages or operating LED displays via IFTTT.

<ifttt-config enabled="true" key="X71dfUZsH4Wkl6cjsLjdV" pollInterval="30">
    <trigger-package categoryFilter="Routers|Switches" onlyUnacknowledged="true">
        <trigger-set name="ON">
            <trigger eventName="on" delay="0">
                <value1></value1>
                <value2></value2>
                <value3></value3>
            </trigger>
        </trigger-set>

        <trigger-set name="OFF">
            <trigger eventName="off" delay="0">
                <value1></value1>
                <value2></value2>
                <value3></value3>
            </trigger>
        </trigger-set>

        <trigger-set name="NORMAL">
            <trigger eventName="OpenNMS" delay="0">
                <value1>#336600</value1>
                <value2>0.40</value2>
                <value3>%os%,%ns%,%oc%,%nc%</value3>
            </trigger>
        </trigger-set>

        <trigger-set name="WARNING">
            <trigger eventName="OpenNMS" delay="0">
                <value1>#FFCC00</value1>
                <value2>0.50</value2>
                <value3>%os%,%ns%,%oc%,%nc%</value3>
            </trigger>
        </trigger-set>

        <trigger-set name="MINOR">
            <trigger eventName="OpenNMS" delay="0">
                <value1>#FF9900</value1>
                <value2>0.60</value2>
                <value3>%os%,%ns%,%oc%,%nc%</value3>
            </trigger>
        </trigger-set>

        <trigger-set name="MAJOR">
            <trigger eventName="OpenNMS" delay="0">
                <value1>#CC3300</value1>
                <value2>0.70</value2>
                <value3>%os%,%ns%,%oc%,%nc%</value3>
            </trigger>
        </trigger-set>

        <trigger-set name="CRITICAL">
            <trigger eventName="OpenNMS" delay="0">
                <value1>#FF0000</value1>
                <value2>0.80</value2>
                <value3>%os%,%ns%,%oc%,%nc%</value3>
            </trigger>
        </trigger-set>
    </trigger-package>
</ifttt-config>
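
To illustrate how the placeholders in the value fields might be expanded, here is a minimal Python sketch. It is a plain string substitution written for this guide, not the code used by OpenNMS Horizon; the function name expand_placeholders and the sample values are hypothetical.

```python
def expand_placeholders(template: str, old_severity: str, new_severity: str,
                        old_count: int, new_count: int) -> str:
    """Replace the short and long placeholder forms in one value field."""
    values = {
        "os": old_severity, "oldSeverity": old_severity,
        "ns": new_severity, "newSeverity": new_severity,
        "oc": str(old_count), "oldCount": str(old_count),
        "nc": str(new_count), "newCount": str(new_count),
    }
    for name, value in values.items():
        template = template.replace(f"%{name}%", value)
    return template


# value3 from the example configuration, after a change from NORMAL to MAJOR.
print(expand_placeholders("%os%,%ns%,%oc%,%nc%", "NORMAL", "MAJOR", 0, 3))
```

With these sample values, the value3 field of the MAJOR trigger set would be sent as NORMAL,MAJOR,0,3.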

23. Telemetry Daemon

The telemetry daemon (telemetryd) provides an extensible framework that can be used to handle sensor data pushed to OpenNMS Horizon. The extensible framework is used to implement support for a variety of applications which use different protocols to transfer metrics. In telemetryd an operator can define a series of protocols, each of which has at least one Listener, and at least one Adapter.

telemetryd overview
Figure 36. Generic component overview of protocol implementations in Telemetryd

The Listener and Adapter, together with their configuration, form a Protocol for an application.

23.1. What is a Listener

A Listener is responsible for receiving sensor data from some external source. For example, this may include listening for packets from a UDP socket, retrieving messages from an MQTT topic, etc. It is possible to configure multiple Listeners.

23.2. What is an Adapter

An Adapter is responsible for processing the byte streams dispatched by the Listeners. For example, this may include decoding a specific JSON format, persisting metrics and/or generating events.

The framework does not make any assumptions about the data being received or processed, leaving this up to the Listener and Adapter implementation.

If you have multiple Adapters, they are executed in the order in which they are defined in telemetryd-configuration.xml.
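
The relationship between a Listener and its Adapters can be pictured with a small conceptual model in Python. This is not the telemetryd API: the Listener class, the on_message method and the two adapter functions are hypothetical names used only to show that each received payload is handed to the adapters in their configured order.

```python
from typing import Callable, List, Tuple

# An adapter is modelled as any callable that accepts the raw payload.
Adapter = Callable[[bytes], None]


class Listener:
    """Receives payloads and dispatches them to its adapters in order."""

    def __init__(self, adapters: List[Adapter]) -> None:
        self.adapters = list(adapters)

    def on_message(self, payload: bytes) -> None:
        for adapter in self.adapters:
            adapter(payload)


handled: List[Tuple[str, str]] = []


def json_adapter(payload: bytes) -> None:
    # A real adapter might decode a specific JSON format and persist metrics.
    handled.append(("json", payload.decode("utf-8")))


def event_adapter(payload: bytes) -> None:
    # A real adapter might generate events from the same payload.
    handled.append(("event", payload.decode("utf-8")))


listener = Listener([json_adapter, event_adapter])
listener.on_message(b'{"temperature": 21.5}')
print([name for name, _ in handled])
```

The model keeps the listener unaware of the payload format, which matches the statement that interpretation is left to the Listener and Adapter implementations.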

23.3. What are Protocols

A Protocol is composed of at least one Listener and at least one Adapter, together with their configuration. With a Protocol, it is possible to process sensor data from Juniper Telemetry Interface (JTI) or Netflow v5.

23.4. Push Sensor Data through Minion

Listeners may run on either OpenNMS Horizon or Minion, whereas adapters run exclusively on OpenNMS Horizon. If a listener is running on Minion, the received messages will be automatically dispatched to the associated adapter(s) running in OpenNMS Horizon.

telemetryd minion
Figure 37. Running Listener on a Minion forwarding packets using the messaging system

24. Elasticsearch Integration

OpenNMS Horizon persists/forwards certain data to Elasticsearch.

The following chapters describe the configuration possibilities as well as the available features.

Internally all Elasticsearch integrations use the Jest library to access the Elasticsearch ReST interface.

24.1. Configuration

The configuration is feature-dependent and therefore must take place in the feature configuration file in ${OPENNMS_HOME}/etc/org.opennms.features.flows.persistence.elastic.cfg.

The following properties can be set:

Property Description Required default

elasticUrl

URL(s) to Elasticsearch nodes. Can either point directly to ReST API or seed nodes. The format is: <host>:<port>. Comma separate multiple values.

required

http://localhost:9200

elasticIndexStrategy

Index strategy for data, allowed values yearly, monthly, daily, hourly

optional

daily

globalElasticUser

Username to use for all nodes, when X-Pack Security is configured.

optional

-

globalElasticPassword

Password to use for all nodes, when X-Pack Security is configured.

optional

-

defaultMaxTotalConnectionPerRoute

Sets the default max connections per route. If a negative value is given, the value is ignored.

optional

<available processors> * 2

maxTotalConnection

Sets the default max total connections. If a negative value is given, the value is ignored.

optional

<max connections per route> * 3

nodeDiscovery

Enable/Disable node discovery. Valid values are true|false.

optional

false

nodeDiscoveryFrequency

Defines the frequency in seconds in which the nodes are re-discovered. Must be set if nodeDiscovery=true.

optional

-

proxy

Allows defining a HTTP proxy. Only accepts valid URLs.

optional

-

retries

Defines how many times an operation is retried before considered failed.

optional

0

retryCooldown

Defines the cooldown in ms to wait before retrying. Value of 0 means no cooldown. Value must be >= 0.

optional

500

connTimeout

Defines the connection timeout in ms.

optional

5000

readTimeout

Defines the read timeout in ms.

optional

30000

bulkRetryCount

Defines the number of retries performed before a bulk operation is considered as failed. When bulk operations fail, only the failed items are retried.

optional

5

settings.index.number_of_shards

The number of primary shards that an index should have. Refer to Elasticsearch Reference → Index Modules for more details.

optional

-

settings.index.number_of_replicas

The number of replicas each primary shard has. Refer to Elasticsearch Reference → Index Modules for more details.

optional

-

settings.index.refresh_interval

How often to perform a refresh operation, which makes recent changes to the index visible to search. Refer to Elasticsearch Reference → Index Modules for more details.

optional

-

settings.index.routing_partition_size

The number of shards a custom routing value can go to. Refer to Elasticsearch Reference → Index Modules for more details.

optional

-

If a configuration management tool is used, the properties file can be created and is used as startup configuration.

If credentials are provided, preemptive auth is used for all defined Elasticsearch nodes.

Configuration Example to access Elasticsearch
elasticUrl=http://elastic:9200
elasticIndexStrategy=daily
globalElasticUser=elastic
globalElasticPassword=changeme
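
One possible reading of the retries and retryCooldown properties from the table above is sketched below in Python. It is an interpretation for illustration, not the behavior of the underlying client library: the function with_retries and the simulated flaky operation are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(operation: Callable[[], T], retries: int = 0,
                 retry_cooldown_ms: int = 500,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying up to `retries` times with a cooldown between."""
    if retry_cooldown_ms < 0:
        raise ValueError("retryCooldown must be >= 0")
    attempts = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempts >= retries:
                raise              # retries exhausted: considered failed
            attempts += 1
            if retry_cooldown_ms > 0:
                sleep(retry_cooldown_ms / 1000.0)


calls = {"count": 0}


def flaky() -> str:
    # Simulated operation that fails twice before succeeding.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated failure")
    return "ok"


result = with_retries(flaky, retries=2, retry_cooldown_ms=0)
print(result, calls["count"])
```

Under this reading, the default of retries=0 means a single attempt with no retry.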

24.2. Credentials

It is possible to define credentials for each Elasticsearch node individually. Credentials for each node must be stored in ${OPENNMS_HOME}/etc/elastic-credentials.xml.

Custom credentials
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<elastic-credentials>
    <credentials url="http://localhost:9200" username="ulf" password="ulf" />
    <credentials url="https://10.10.0.1:9333" username="ulf" password="flu" />
</elastic-credentials>
Credentials are globally defined and will be used by each feature.
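
To check a credentials file before deploying it, the XML can be read with a few lines of Python. This sketch only parses the structure shown in the example above; the function name load_credentials is hypothetical and the script is not part of OpenNMS Horizon.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

CREDENTIALS_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<elastic-credentials>
    <credentials url="http://localhost:9200" username="ulf" password="ulf" />
    <credentials url="https://10.10.0.1:9333" username="ulf" password="flu" />
</elastic-credentials>
"""


def load_credentials(xml_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each Elasticsearch node URL to its (username, password) pair."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        entry.get("url"): (entry.get("username"), entry.get("password"))
        for entry in root.findall("credentials")
    }


creds = load_credentials(CREDENTIALS_XML)
print(sorted(creds))
```

To validate a real file, read the contents of ${OPENNMS_HOME}/etc/elastic-credentials.xml and pass them to the function instead of the inline sample.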

24.3. Features

24.3.1. Version Matrix

Not all features are enabled by default and may require a certain version of Elasticsearch. Therefore the following table provides a version overview.

Name                        Supported Elastic Version   Enabled by default   Feature
Event and Alarm Forwarder   >= 5.0                      no                   opennms-es-rest
Flow Support                >= 6.2.4                    yes                  opennms-flows

24.3.2. Alarm and Event Forwarder

The Alarm and Event Forwarder (formerly known as the Elasticsearch ReST plugin) forwards events and alarms to Elasticsearch. In combination with the Alarm Change Notifier Plugin, it also forwards alarm change events.

The events and alarms in Elasticsearch can then be used for indexing, long-term archival, plotting with Grafana and browsing with Kibana.

This feature uses the Elasticsearch ReST interface and can interact with cloud-hosted Elasticsearch instances.

If you use Kibana, make sure you are using the version that is compatible with your version of Elasticsearch.

Configuration

The configuration is held in ${OPENNMS_HOME}/etc/org.opennms.plugin.elasticsearch.rest.forwarder.cfg. Please refer to section Configuring Elasticsearch in order to configure Elasticsearch connection settings.

Besides the general Elasticsearch connection settings, the following properties are supported to configure the Alarm and Event Forwarder:

Parameter Default Value Required Description

logEventDescription

true

optional

Whether to forward the event description field to Elasticsearch. It can be disabled because it contains a long text field that can be redundant with the rest of the metadata included in the event.

archiveRawEvents

true

optional

Archive events.

archiveAlarms

true

optional

Archive alarms.

archiveAlarmChangeEvents

true

optional

Archive alarm change events.

archiveOldAlarmValues

true

optional

For alarm change events, we can choose to archive the detailed alarm values but this is expensive. Set false in production.

archiveNewAlarmValues

true

optional

archiveAssetData

true

optional

If true, the following attributes, which represent useful node asset fields from the node asset table, are included in archived events and alarms. They are included only where the values in the table are not null or empty strings.

(asset-latitude,asset-longitude,asset-region,asset-building,asset-floor,asset-room,asset-rack,asset-slot,asset-port,asset-category,asset-displaycategory,asset-notifycategory,asset-pollercategory,asset-thresholdcategory,asset-managedobjecttype,asset-managedobjectinstance,asset-manufacturer,asset-vendor,asset-modelnumber,parent-nodelabel,parent-nodeid,parent-foreignsource,parent-foreignid)

groupOidParameters

false

optional

If true, all OIDs from the event parameters are stored in a single array p_oids instead of a flattened structure.

logAllEvents

false

optional

If changed to true, then archive all events even if they have not been persisted in the OpenNMS Horizon database.

batchSize

200

optional

Increase this value to enable batch inserts into Elasticsearch. This is the maximum size of a batch of events that is sent to Elasticsearch in a single connection.

batchInterval

500

optional

The maximum time interval in milliseconds between batch events (recommended: 500ms) when a batchSize value greater than 1 is being used.

Once you are sure everything is correctly configured, you can activate the Event & Alarm Forwarder by logging into the OSGi console and installing the feature: opennms-es-rest.

OSGi login and installation of the Elasticsearch forwarder
ssh admin@localhost -p 8101
feature:install opennms-es-rest

Loading Historical Events

It is possible to load historical OpenNMS Horizon events into Elasticsearch from the OpenNMS Horizon database using a Karaf console command. The command uses the OpenNMS Horizon Events ReST interface to retrieve a set number of historical events and forward them to Elasticsearch. Because the ReST interface is used, it is also possible to contact a remote OpenNMS Horizon and download its events into Elasticsearch by using the correct remote URL and credentials.

The following example sends historic events to Elasticsearch using the Karaf console:

# open karaf command prompt using
# ssh -p 8101 admin@localhost
karaf> elasticsearch:send-historic-events --username admin --password admin --url http://localhost:8980 --limit 10 --offset 0
For more details, consult the --help option of the command.

Index Definitions

Three indices are created: one for alarms, one for alarm change events and one for raw events. Alarms and alarm change events are only saved if the Alarm Change Notifier Plugin is also installed to generate alarm change events from the OpenNMS Horizon alarms table. The index names are of the form (assuming an index strategy of monthly): opennms-<name>-<index-strategy>/type/id

For example

a) Alarms

opennms-alarms-2017-01/alarmdata/1823

b) Alarm Change Events

opennms-events-alarmchange-2017-01/eventdata/11549

c) Raw OpenNMS Horizon events (not including alarm change events)

opennms-events-raw-2017-01/eventdata/11549
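
The naming scheme can be made concrete with a short Python sketch that derives an index name from a timestamp. Only the monthly form (for example 2017-01) appears in the examples above; the date patterns used here for the yearly, daily and hourly strategies are assumptions, and the function name index_name is hypothetical.

```python
from datetime import datetime, timezone

# Assumed date patterns per index strategy; only "monthly" is confirmed
# by the examples in this section.
_PATTERNS = {
    "yearly": "%Y",
    "monthly": "%Y-%m",
    "daily": "%Y-%m-%d",
    "hourly": "%Y-%m-%d-%H",
}


def index_name(name: str, strategy: str, when: datetime) -> str:
    """Build an index name of the form opennms-<name>-<index-strategy>."""
    return f"opennms-{name}-{when.strftime(_PATTERNS[strategy])}"


when = datetime(2017, 1, 15, 9, 30, tzinfo=timezone.utc)
print(index_name("alarms", "monthly", when))
print(index_name("events-raw", "monthly", when))
```

For the sample timestamp, the monthly strategy yields opennms-alarms-2017-01 and opennms-events-raw-2017-01, matching the examples above.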
Viewing events using Kibana Sense

Kibana Sense is a Kibana app which allows you to run queries directly against Elasticsearch. (https://www.elastic.co/guide/en/sense/current/installing.html)

If you install Kibana Sense, you can use the following commands to view the alarms and events sent to Elasticsearch. You should review the Elasticsearch ReST API documentation to understand how searches are specified. (See https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html)

Example searches to use in Kibana Sense (you can copy the whole contents of this panel into Kibana Sense as a set of examples)

# Search all the alarms indexes

GET /opennms-alarms-*/_search

# Get all of the alarms indexes

GET /opennms-alarms-*/

# Get a specific alarm id from the 2017.01 index

GET opennms-alarms-2017-01/alarmdata/1823

# Delete all alarm indexes

DELETE /opennms-alarms-*/

# Search all the events indexes

GET /opennms-events-*/_search

# Search all the raw events indexes

GET /opennms-events-raw*/_search

# Delete all the events indexes

DELETE /opennms-events-*/

# Get all the raw events indexes

GET /opennms-events-raw*/

# Get all the alarmchange event indexes

GET /opennms-events-alarmchange-*/

# Search all the alarm change event indexes

GET opennms-events-alarmchange-*/_search

# Get a specific alarm change event

GET opennms-events-alarmchange-2016-08/eventdata/11549

Mapping of Alarms and Events to Elasticsearch

Overview of index mapping

In OpenNMS Horizon, Alarm and Event table entries contain references to associated node, asset, service and journal message tables. In Elasticsearch, we must flatten these entries into a single index entry for each insertion. Thus each index entry contains more context information than would be found in the actual OpenNMS Horizon event or alarm. This context information includes the associated node and asset table information which was current when (but may have changed since) the event was archived.

In the Table of Index Mappings below we have example alarm and event JSON entries retrieved using a Sense command. The table helps illustrate how OpenNMS Horizon saves data in Elasticsearch.

Internal Elasticsearch fields always begin with an underscore character. The internal fields _id, _index and _type are combined to give the unique identifier for an entry as described above under Index Definitions. All of the fields under _source represent the stored alarm or event (Elasticsearch documentation refers to _source entries as indexed documents). The ID of each event is included in the _source id field and also duplicated in the internal _id.

Events in the OpenNMS Horizon events table (i.e. those corresponding to logs or traps) are copied directly to the opennms-events-raw- indexes. In OpenNMS Horizon, alarms and events can contain parameters, which are key-value pairs referencing additional data stored when the event or alarm is created. In Elasticsearch, these parameters are always stored in separate fields in the index with names beginning with p_.

Alarm change events created by the Alarm Change Notifier Plugin have an identical format to raw events but are only copied to the opennms-events-alarmchange- indexes. These alarm change events are also used to change the state of alarms in the opennms-alarms- indexes. Thus alarm entries in the opennms-alarms- indexes reflect the current state of alarms as notified by OpenNMS Horizon through alarm change events.

The parameters included with each type of Alarm Change Event are listed in the Alarm Change Notifier Plugin section. Each parameter in the index will have a p_ prefix (i.e. %parm[newalarmvalues]% becomes p_newalarmvalues).
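
The p_ prefix rule can be shown with a few lines of Python. This is a sketch of the naming convention only, not the forwarder's code: the function name to_index_document and the sample event are hypothetical, and nested structures and the groupOidParameters option are not modelled.

```python
from typing import Any, Dict


def to_index_document(event_fields: Dict[str, Any],
                      parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Merge event fields with parameters renamed to carry a p_ prefix."""
    document = dict(event_fields)
    for name, value in parameters.items():
        document[f"p_{name}"] = value
    return document


# Hypothetical event with two parameters.
document = to_index_document(
    {"id": "1110", "eventsource": "AlarmChangeNotifier"},
    {"alarmid": "30", "newalarmvalues": "{...}"},
)
print(sorted(document))
```

The resulting document holds the event's own fields next to p_alarmid and p_newalarmvalues, so parameters cannot collide with the standard field names.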

Alarms and Events have severity fields defined as integers (long) and also corresponding severity_text fields which give the text equivalent (Critical, Major, Minor, Normal, Cleared).

Additional Alarm Fields

The id of each alarm is included in the _source alarmid field and also duplicated in the internal _id reference for the alarms index. Alarm Change Events reference their associated alarm using the p_alarmid parameter. To make it easier to search for alarm change events associated with the same alarm, alarms also have a _source p_alarmid parameter which matches alarmid. Thus we should be able to search for an alarm in the opennms-alarms index and find its complete lifecycle from alarm raise to deletion in the opennms-events-alarmchange index.

The alarms index is enriched with additional data to allow the alarm entries to be used in SLA calculations.

Additional Alarm Field   Description
alarmackduration         Calculated time in milliseconds from the first event which created the alarm to the latest alarm acknowledgement.
alarmclearduration       Calculated time in milliseconds from the first event which created the alarm to the latest alarm clear.
initialseverity          The final state of any given alarm in an alarm index should be cleared and deleted. Therefore we also include an initial severity.
initialseverity_text     The initial severity as a text field.
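
As a worked example of the duration fields, the following Python sketch computes the elapsed milliseconds between the first event and a later acknowledgement. The timestamps and the helper name duration_ms are made up for illustration; the actual calculation is performed by the forwarder.

```python
from datetime import datetime, timedelta, timezone


def duration_ms(first_event_time: datetime, later_time: datetime) -> int:
    """Whole milliseconds elapsed between two timestamps."""
    return (later_time - first_event_time) // timedelta(milliseconds=1)


# Hypothetical alarm: first event at 14:24:59.282 UTC, acknowledged at 14:30:00.
first_event = datetime(2017, 3, 2, 14, 24, 59, 282000, tzinfo=timezone.utc)
acknowledged = datetime(2017, 3, 2, 14, 30, 0, 0, tzinfo=timezone.utc)

alarmackduration = duration_ms(first_event, acknowledged)
print(alarmackduration)
```

Here the acknowledgement arrives 5 minutes and 0.718 seconds after the first event, so the value is 300718 milliseconds. The same calculation with the latest clear time would give alarmclearduration.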

Table of Index Mapping

The following table describes the mapping of simple OpenNMS Horizon events to the Raw Events Index and the mapping of Alarm Change Events to the Alarm Change Events index and to the Alarms index. Note that fields that begin with an underscore (_) are internal to Elasticsearch.

Alarm Index Fields Event Index Fields (Alarm change and raw events) Description

Example Alarm JSON

Alarm Field

Event Field

Example Event JSON

Type

Description

{

{

{

{

"_index": "opennms-alarms-2017.03",

"_index":

"_index":

"_index": "opennms-events-alarmchange-2017.03",

string

_index is the index in which this alarm or event is stored.

"_type": "alarmdata",

"_type":

"_type":

"_type": "eventdata",

string

_type either alarmdata or eventdata

"_id": "31",

"_id":

"_id":

"_id": "1110",

string

_id field matches the event or alarm ID, if present.

"_score": 1,

"_score":

"_score":

"_score": 1,

long

Internal Elasticsearch ranking of the search result.

"_source": {

"_source":

"_source":

"_source": {

string

_source contains the data of the index entry.

"@timestamp": "2017-03-03T12:44:21.210Z",

"@timestamp":

"@timestamp":

"@timestamp": "2017-03-02T15:20:56.861Z",

date

For Alarms, @timestamp is alarm creation time based on the first event time. For Events, @timestamp is event time from event.getTime().

"dom": "3",

"dom":

"dom":

"dom": "2",

long

Day of month from @timestamp.

"dow": "6",

"dow":

"dow":

"dow": "5",

long

Day of week from @timestamp.

"hour": "12",

"hour":

"hour":

"hour": "15",

long

Hour of day from @timestamp.

"eventdescr":

"eventdescr": "<p>Alarm <a href=\"/opennms/alarm/detail.htm?id=30\">30</a> Cleared<p>…​",

string

Event description.

"eventseverity":

"eventseverity": "3",

long

Event severity.

Alarm Change Events:

All events have severity normal.

"eventseverity_text":

"eventseverity_text": "Normal",

string

Text representation of severity value.

"eventsource":

"eventsource": "AlarmChangeNotifier",

string

OpenNMS event source.

Alarm Change Events:

All events have the event source AlarmChangeNotifier.

"eventuei":

"eventuei": "uei.opennms.org/plugin/AlarmChangeNotificationEvent/AlarmCleared",

string

OpenNMS universal event identifier (UEI) of the event.

"id":

"id": "1110",

string

Event ID.

"interface":

"interface": "127.0.0.1",

string

IP address of the event.

"ipaddr":

"ipaddr": "/127.0.0.1",

string

IP address of the event.

"logmsg":

"logmsg": "<p>Alarm <a href=\"/opennms/alarm/detail.htm?id=30\">30</a> Cleared<p>",

string

Log message of the event.

Alarm Change Events:

Log messages contain a link to the alarm.

"logmsgdest":

"logmsgdest": "logndisplay",

string

Log Destination of the Event.

"p_newalarmvalues":

"p_newalarmvalues": "{
\"suppressedtime\":\"2017-03-02T14:24:59.282Z\",
\"systemid\":\"00000000-0000-0000-0000-000000000000\",
\"suppresseduntil\":\"2017-03-02T14:24:59.282Z\",
\"description\":\"<p>SNMP data collection on interface 127.0.0.1\\n failed.<\\/p>\",
\"mouseovertext\":null,
\"x733probablecause\":0,
\"lasteventid\":1072,
\"lasteventtime\":\"2017-03-02T14:24:59.282Z\",
\"managedobjectinstance\":null,
\"alarmacktime\":null,
\"qosalarmstate\":null,
\"ipaddr\":\"127.0.0.1\",
\"alarmackuser\":null,
\"nodeid\":88,
\"firsteventtime\":\"2017-03-02T14:24:59.282Z\",
\"severity\":2,
\"ifindex\":null,
\"alarmtype\":1,
\"x733alarmtype\":null,
\"logmsg\":\"SNMP data collection on interface 127.0.0.1 failed with Unexpected exception when collecting SNMP data for interface 127.0.0.1 at location Default.'.\",
\"tticketid\":null,
\"firstautomationtime\":null,
\"clearkey\":null,
\"managedobjecttype\":null,
\"eventuei\":\"uei.opennms.org\\/nodes\\/dataCollectionFailed\",
\"counter\":1,
\"applicationdn\":null,
\"operinstruct\":null,
\"ossprimarykey\":null,
\"stickymemo\":null,
\"tticketstate\":null,
\"alarmid\":30,
\"serviceid\":5,
\"reductionkey\":\"uei.opennms.org\\/nodes\\/dataCollectionFailed::88\",
\"suppresseduser\":null,
\"lastautomationtime\":null,
\"eventparms\":\"reason=Unexpected exception when collecting SNMP data for interface 127.0.0.1 at location Default.(string,text)\"}",

string

Alarm and event parameters are key-value pairs which can be associated with alarms or events. All parameters in Alarms or Events are stored in Elasticsearch in separate index fields with names beginning with p_.

Alarm Change Events:

Parameters p_oldalarmvalues and p_newalarmvalues contain a JSON string representing the alarm fields before and after the alarm change, respectively.

The p_newalarmvalues values are copied into the alarms index entry of the corresponding alarm (given by alarmid in p_newalarmvalues and by p_alarmid).

"p_oldalarmvalues":

"p_oldalarmvalues": "{ …​. }",

string

See p_newalarmvalues.

"p_oldseverity":

"p_oldseverity": "5",

long

Alarm Change Events:

Contains the old severity of the alarm before this alarm change event.

"alarmackduration": "2132249",

"alarmackduration":

long

Time in milliseconds from first event which created the alarm to the latest alarm acknowledgement.

"alarmacktime": "2017-03-03T13:19:53.351Z",

"alarmacktime":

"p_alarmacktime":

"p_alarmacktime": "2017-03-03T13:19:53.351Z",

date

AlarmChangeNotificationEvent/AlarmAcknowledged Events:

Time that the alarm was acknowledged.

"alarmackuser": "admin",

"alarmackuser":

"p_alarmackuser"

"p_alarmackuser": "admin",

AlarmChangeNotificationEvent/AlarmAcknowledged Events:

Name of the user who acknowledged the alarm.

"alarmclearduration": "2175014"

"alarmclearduration":

long

Time in milliseconds from first event which created the alarm to the latest alarm clear.

"alarmcleartime": "2017-03-03T13:20:36.224Z",

"alarmcleartime":

"p_alarmcleartime":

"p_alarmcleartime": "2017-03-03T13:20:36.224Z",

date

AlarmChangeNotificationEvent/AlarmCleared Events:

Time that the alarm was cleared.

"alarmid": "31",

"alarmid":

"p_alarmid":

"p_alarmid": "30",

string

Alarm Change Events:

The alarm ID of the alarm that has changed.

"alarmtype": "1",

"alarmtype":

"p_alarmtype":

"p_alarmtype": "1",

string

Alarm Change Events:

Corresponds to the alarm’s type.

"applicationdn": null,

"applicationdn":

string

"asset-category": "Power",

"asset-category":

"asset-category":

"asset-category": "Power",

string

All asset- entries correspond to fields in the asset table of the node referenced in the event. These fields are only present if they are populated in the asset table.

"asset-building": "55",

"asset-building":

"asset-building":

"asset-building": "55",

string

"asset-room": "F201",

"asset-room":

"asset-room":

"asset-room": "F201",

string

"asset-floor": "Gnd",

"asset-floor":

"asset-floor":

"asset-floor": "Gnd",

string

"asset-rack": "2101",

"asset-rack":

"asset-rack":

"asset-rack": "2101",

string

"categories": "",

"categories":

"categories":

"categories": "",

string

categories corresponds to the node categories table. It is a comma-separated list of the categories associated with this node ID. This field is indexed so that separate values can be searched.

"clearkey": null,

"clearkey":

string

"counter": "1",

"counter":

string

"description": "<p>SNMP data collection on interface 127.0.0.1\n failed.</p>",

"description":

string

"eventuei": "uei.opennms.org/nodes/dataCollectionFailed",

"eventuei":

"p_eventuei":

"p_eventuei": "uei.opennms.org/nodes/dataCollectionFailed",

string

Alarm Change Events:

Corresponds to the alarm’s event UEI.

"firstautomationtime": null,

"firstautomationtime":

date

"firsteventtime": "2017-03-03T12:44:21.210Z",

"firsteventtime":

date

"foreignid": "1488375237814",

"foreignid":

"foreignid":

"foreignid": "1488375237814",

string

Foreign ID of the node associated with the alarm or event.

"foreignsource": "LocalTest",

"foreignsource":

"foreignsource":

"foreignsource": "LocalTest",

string

Foreign source of the node associated with the alarm or event.

"ifindex": null,

"ifindex":

string

"ipaddr": "127.0.0.1",

"ipaddr":

string

"lastautomationtime": null,

"lastautomationtime":

"lasteventid": "1112",

"lasteventid":

string

"lasteventtime": "2017-03-03T12:44:21.210Z",

"lasteventtime":

"logmsg": "SNMP data collection on interface 127.0.0.1 failed with 'Unexpected exception when collecting SNMP data for interface 127.0.0.1 at location Default.'.",

"logmsg":

"p_logmsg":

"p_logmsg": "SNMP data collection on interface 127.0.0.1 failed with 'Unexpected exception when collecting SNMP data for interface 127.0.0.1 at location Default.'.",

string

"managedobjectinstance": null,

"managedobjectinstance":

string

"managedobjecttype": null,

"managedobjecttype":

string

"mouseovertext": null,

"mouseovertext":

string

"nodeid": "88",

"nodeid":

"nodeid":

"nodeid": "88",

string

Node ID of the node associated with the alarm or event.

"nodelabel": "localhost",

"nodelabel":

"nodelabel":

"nodelabel": "localhost",

string

Node label of the node associated with the alarm or event.

"nodesyslocation": "Unknown (edit /etc/snmp/snmpd.conf)",

"nodesyslocation":

"nodesyslocation":

"nodesyslocation": "Unknown (edit /etc/snmp/snmpd.conf)",

string

SNMP syslocation of the node associated with the alarm or event.

"nodesysname": "localhost.localdomain",

"nodesysname":

"nodesysname":

"nodesysname": "localhost.localdomain",

string

SNMP sysname of the node associated with the alarm or event.

"operatingsystem": null,

"operatingsystem":

string

"operinstruct": null,

"operinstruct":

string

"ossprimarykey": null,

"ossprimarykey":

string

"p_alarmid": "31",

"p_alarmid":

string

The Elasticsearch alarms index has a field p_alarmid which corresponds to the alarmid of the alarm and also the p_alarmid field in Alarm Change Events. This allows Alarm and Alarm Change Event indexes to be easily searched together for all Alarm Change Events corresponding to an alarm.

"p_reason": "Unexpected exception when collecting SNMP data for interface 127.0.0.1 at location Default.",

"p_reason":

string

All parameters in Alarms or Events are stored in Elasticsearch in separate index fields with names beginning with p_. p_reason is an example parameter injected by the uei.opennms.org/nodes/dataCollectionFailed event in OpenNMS.

"qosalarmstate": null,

"qosalarmstate":

string

"reductionkey": "uei.opennms.org/nodes/dataCollectionFailed::88",

"reductionkey":

"p_reductionkey":

"p_reductionkey": "uei.opennms.org/nodes/dataCollectionFailed::88",

string

Alarm Change Events:

Corresponds to alarm reductionkey.

"serviceid": "5",

"serviceid":

"p_serviceid":

"p_serviceid": "5"

string

Alarm Change Events:

Corresponds to the alarm’s service ID.

"severity": "2",

"severity":

"p_alarmseverity":

"p_alarmseverity": "2",

string

Alarm Change Events:

Corresponds to the alarm’s severity.

"severity_text": "Cleared",

"severity_text":

string

"stickymemo": null,

"stickymemo":

"p_stickymemo"

"p_stickymemo": null,

string

AlarmChangeNotificationEvent/StickyMemoAdded Events:

Content of current sticky memo for the alarm.

AlarmChangeNotificationEvent/StickyMemoUpdate Events:

These events have parameters:

  • p_author: author of stickymemo

  • p_body: content of sticky memo

AlarmChangeNotificationEvent/JournalMemoUpdate Events:

These events have parameters:

  • p_author: user who authored the memo

  • p_body: content of the memo

  • p_reductionkey: reduction key associated with memo (corresponds to alarm reduction key)

Note that journal memos do not have an entry in the alarm index but are only referenced by reduction key.

"suppressedtime": "2017-03-03T12:44:21.210Z",

"suppressedtime":

"p_suppressedtime":

"p_suppressedtime": "2017-03-02T14:24:59.282Z",

date

AlarmChangeNotificationEvent/AlarmSuppressed Events:

Corresponds to the alarm’s suppressed time.

"suppresseduntil": "2017-03-03T12:44:21.210Z",

"suppresseduntil":

"p_suppresseduntil":

"p_suppresseduntil": "2017-03-02T14:24:59.282Z",

date

AlarmChangeNotificationEvent/AlarmSuppressed Events:

Corresponds to the alarm’s suppressed until time.

"suppresseduser": null,

"suppresseduser":

"p_suppresseduser":

"p_suppresseduser": null,

string

AlarmChangeNotificationEvent/AlarmSuppressed Events:

Corresponds to the alarm’s suppressed user.

"systemid": "00000000-0000-0000-0000-000000000000",

"systemid":

"p_systemid":

"p_systemid": "00000000-0000-0000-0000-000000000000",

string

Alarm Change Events:

Corresponds to the alarm’s system ID.

"tticketid": null,

"p_tticketid":

"p_tticketid":

"p_tticketid": null,

string

AlarmChangeNotificationEvent/TroubleTicketStateChange Events:

Corresponds to the alarm’s trouble ticket ID.

"tticketstate": null,

"p_tticketstate":

"p_tticketstate":

"p_tticketstate": null,

string

AlarmChangeNotificationEvent/TroubleTicketStateChange Events:

Corresponds to the alarm’s trouble ticket state.

"x733alarmtype": null,

"x733alarmtype":

string

"x733probablecause": "0",

"x733probablecause":

string

}

}

}

}
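
Because p_alarmid appears both in the alarms index and in the Alarm Change Events, a single search can retrieve an alarm together with its change history. The Python sketch below only builds an Elasticsearch search request; it does not send it. The index patterns are inferred from the example index names in the table, and the term query assumes that p_alarmid is indexed as an exact value, which may not hold for your mapping.

```python
import json

# Index patterns inferred from the example index names; adjust to your installation.
INDEX_PATTERNS = ["opennms-alarms-*", "opennms-events-alarmchange-*"]


def alarm_history_query(alarm_id: str) -> dict:
    """Build a search body matching every document that carries the given p_alarmid."""
    return {
        "query": {"term": {"p_alarmid": alarm_id}},
        "sort": [{"@timestamp": {"order": "asc"}}],
    }


search_path = "/" + ",".join(INDEX_PATTERNS) + "/_search"
body = alarm_history_query("31")
print(search_path)
print(json.dumps(body, indent=2))
```

Sorting by @timestamp returns the alarm entry and its change events in chronological order.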

24.3.3. Flow Support

Flow Support is described in detail in the Flow Support chapter.

When persisting flows into Elasticsearch, every flow is represented by a single document.

The following table describes a subset of the fields in the flow document:

Field | Description
@timestamp | Timestamp in milliseconds at which the flow was sent by the exporter.
location | Monitoring location at which the flow was received. This will be Default unless you are using Minion.
netflow.bytes | Number of bytes transferred in the flow.
netflow.direction | ingress or egress
netflow.first_switched | Timestamp in milliseconds at which the first packet of the flow was transferred.
netflow.last_switched | Timestamp in milliseconds at which the last packet of the flow was transferred.
netflow.input_snmp | SNMP interface index on which packets related to this flow were received.
netflow.output_snmp | SNMP interface index on which packets related to this flow were forwarded.
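
To show how these fields can be combined, the following Python sketch derives the average throughput of a flow from its byte count and its first and last switched timestamps. It assumes that the timestamps are numeric epoch milliseconds and that the document is available as a dictionary with flattened field names; both are assumptions, and the sample values are made up.

```python
def flow_rate_bps(flow: dict) -> float:
    """Average bits per second over the lifetime of a flow document."""
    duration_ms = flow["netflow.last_switched"] - flow["netflow.first_switched"]
    if duration_ms <= 0:
        # A flow can start and end within the same millisecond.
        return 0.0
    return flow["netflow.bytes"] * 8 * 1000 / duration_ms


# Hypothetical document using flattened field names.
example_flow = {
    "netflow.first_switched": 1_500_000_000_000,
    "netflow.last_switched": 1_500_000_002_000,
    "netflow.bytes": 250_000,
    "netflow.direction": "ingress",
}
print(flow_rate_bps(example_flow))  # 1000000.0
```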

25. Flow Support

25.1. Introduction

OpenNMS Horizon supports receiving, decoding and persisting flow information sent via NetFlow v5, NetFlow v9, IPFIX and sFlow. While flows offer a great breadth of information, the current support in OpenNMS Horizon focuses on:

  • Network diagnostics: Being able to view the top protocols and top talkers within the context of a particular network interface.

  • Forensic analysis: Persisting the flows for long term storage.

25.1.1. How it works

At a high level:

  • telemetryd is used to receive and decode flows on both OpenNMS Horizon and Minion.

  • The telemetryd adapters convert the flows to a canonical flow model and dispatch these to the flow repository.

  • The flow repository enriches the flows and persists them to Elasticsearch:

    • Flows are tagged with an application name via the Classification Engine.

    • Metadata related to associated nodes, such as IDs and categories, is also added to the flows.

  • The REST API supports generating both summaries and time series data from the flows stored in the flow repository.

  • OpenNMS Helm is used to visualize the flow data using the flow datasource that interfaces with the OpenNMS Horizon REST API.

25.2. Setup

Here we assume that you already have an Elasticsearch cluster and an instance of OpenNMS Helm available.

25.2.1. Configuring Elasticsearch persistence

From a Karaf shell on your OpenNMS Horizon instance, start by configuring the flow persistence to use your Elasticsearch cluster:

$ ssh -p 8101 admin@localhost
...
admin@opennms()> config:edit org.opennms.features.flows.persistence.elastic
admin@opennms()> config:property-set elasticUrl http://elastic:9200
admin@opennms()> config:update

This configuration is stored in ${OPENNMS_HOME}/etc/org.opennms.features.flows.persistence.elastic.cfg. See General Elasticsearch Configuration for a complete set of options.

25.2.2. Enabling a protocol

Next, enable one or more of the protocols you would like to handle in ${OPENNMS_HOME}/etc/telemetryd-configuration.xml.

In this example we enable the NetFlow v5 protocol, but the same process can be repeated for any of the other flow related protocols.

Enable NetFlow v5 in telemetryd-configuration.xml

<protocol name="Netflow-5" description="Listener for Netflow 5 UDP packets" enabled="true">
    <listener name="Netflow-5-UDP-8877" class-name="org.opennms.netmgt.telemetry.listeners.udp.UdpListener">
        <parameter key="port" value="8877"/>
    </listener>

    <adapter name="Netflow-5-Parser" class-name="org.opennms.netmgt.telemetry.adapters.netflow.Netflow5Adapter">
    </adapter>
</protocol>

Apply the changes without restarting by sending a reloadDaemonConfig event via the CLI:

Send a reloadDaemonConfig event through CLI
${OPENNMS_HOME}/bin/send-event.pl -p 'daemonName Telemetryd' uei.opennms.org/internal/reloadDaemonConfig

This will open a UDP socket bound to 0.0.0.0:8877 to which NetFlow v5 messages can be forwarded.
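
To check that the listener is reachable, a synthetic NetFlow v5 datagram can be sent to it. The Python sketch below packs a 24-byte NetFlow v5 header and one 48-byte flow record using the standard export format. The helper names, addresses and counters are illustrative assumptions, and whether the resulting flow shows up downstream depends on the rest of your setup.

```python
import socket
import struct
import time

HEADER = struct.Struct("!HHIIIIBBH")                # 24-byte NetFlow v5 header
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")  # 48-byte flow record


def build_netflow5_packet(src: str, dst: str, src_port: int, dst_port: int,
                          packets: int, octets: int) -> bytes:
    """Build a NetFlow v5 datagram containing a single TCP flow record."""
    now = time.time()
    # version, record count, sysUptime (ms), unix secs, unix nsecs,
    # flow sequence, engine type, engine id, sampling interval
    header = HEADER.pack(5, 1, 60_000, int(now), int((now % 1) * 1e9), 0, 0, 0, 0)
    record = RECORD.pack(
        socket.inet_aton(src), socket.inet_aton(dst), socket.inet_aton("0.0.0.0"),
        1, 2,             # input / output SNMP interface index
        packets, octets,
        1_000, 2_000,     # sysUptime (ms) at first and last packet of the flow
        src_port, dst_port,
        0, 0x18, 6, 0,    # padding, TCP flags, protocol 6 (TCP), type of service
        0, 0, 0, 0, 0,    # source/destination AS, source/destination mask, padding
    )
    return header + record


def send_test_flow(host: str = "127.0.0.1", port: int = 8877) -> None:
    """Send one datagram to the listener; host and port are assumptions."""
    packet = build_netflow5_packet("10.0.0.5", "10.0.0.1", 60123, 8980, 10, 1500)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

Calling send_test_flow() from a machine that can reach the OpenNMS Horizon host sends a single flow record to port 8877.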

25.2.3. Linking to OpenNMS Helm in the Web UI

In order to access flow related graphs from the OpenNMS Horizon web interface, you must configure a link to your instance of OpenNMS Helm.

$ ssh -p 8101 admin@localhost
...
admin@opennms()> config:edit org.opennms.netmgt.flows.rest
admin@opennms()> config:property-set flowGraphUrl 'http://grafana:3000/dashboard/flows?node=$nodeId&interface=$ifIndex'
admin@opennms()> config:update
This URL can optionally point to other tools as well. It supports placeholders for $nodeId, $ifIndex, $start and $end.
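
The effect of the placeholders can be pictured with a small Python sketch. The substitution itself is performed by OpenNMS Horizon; the helper below is hypothetical and only illustrates what a configured URL looks like once values are filled in.

```python
from string import Template

FLOW_GRAPH_URL = "http://grafana:3000/dashboard/flows?node=$nodeId&interface=$ifIndex"


def expand_flow_graph_url(template: str, **values) -> str:
    """Replace $name placeholders; placeholders without a value are left untouched."""
    return Template(template).safe_substitute(**values)


print(expand_flow_graph_url(FLOW_GRAPH_URL, nodeId=88, ifIndex=2))
# http://grafana:3000/dashboard/flows?node=88&interface=2
```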

Once configured, an icon will appear on the top right corner of a resource graph for an SNMP interface if there is flow data for that interface.

Configuring a listener on a Minion (Optional)

In this example we’ll look at enabling a generic listener for the NetFlow v5 protocol on Minion.

NetFlow v5 uses the generic UDP listener, but other protocols require a specific listener. See the examples in ${OPENNMS_HOME}/etc/telemetryd-configuration.xml, or Telemetryd Listener Reference for details.

To enable and configure a UDP listener for NetFlow v5 on Minion, connect to the Karaf console and set the following properties:

$ ssh -p 8201 admin@localhost
...
admin@minion()> config:edit org.opennms.features.telemetry.listeners-udp-8877
admin@minion()> config:property-set name Netflow-5
admin@minion()> config:property-set class-name org.opennms.netmgt.telemetry.listeners.udp.UdpListener
admin@minion()> config:property-set listener.port 8877
admin@minion()> config:update

If a configuration management tool is used, the same properties can be written to ${MINION_HOME}/etc/org.opennms.features.telemetry.listeners-udp-8877.cfg, which is then used as the startup configuration:

name = Netflow-5
class-name = org.opennms.netmgt.telemetry.listeners.udp.UdpListener
listener.port = 8877

The associated protocol, in this case Netflow-5, must also be enabled on OpenNMS Horizon for the messages to be processed.

25.2.4. Node cache configuration (Optional)

By default, each flow document is enriched with node information if the node is known to OpenNMS Horizon. To reduce the number of database queries, this data is cached.

The following cache properties can be set in ${OPENNMS_HOME}/etc/org.opennms.features.flows.persistence.elastic.cfg:

Property | Description | Required | Default
nodeCache.maximumSize | The maximum size of the cache | false | 1000
nodeCache.expireAfterWrite | Number of seconds until an entry in the node cache is evicted. Set to 0 to disable eviction. | false | 300
nodeCache.recordStats | Defines if cache statistics are exposed via JMX. Set to false to disable statistic recording. | false | true
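
The interplay of nodeCache.maximumSize and nodeCache.expireAfterWrite can be pictured with the following Python sketch of a bounded cache whose entries expire a fixed time after being written. It is an illustration of the semantics only, not the cache used by OpenNMS Horizon; the class name and the oldest-first eviction policy are assumptions.

```python
import time
from collections import OrderedDict


class ExpireAfterWriteCache:
    """Bounded cache whose entries expire a fixed time after being written."""

    def __init__(self, maximum_size=1000, expire_after_write=300, clock=time.monotonic):
        self.maximum_size = maximum_size
        self.expire_after_write = expire_after_write  # seconds; 0 disables expiry
        self.clock = clock
        self._entries = OrderedDict()  # key -> (value, time written)

    def put(self, key, value):
        self._entries.pop(key, None)
        self._entries[key] = (value, self.clock())
        while len(self._entries) > self.maximum_size:
            self._entries.popitem(last=False)  # drop the oldest entry

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, written = entry
        if self.expire_after_write and self.clock() - written >= self.expire_after_write:
            del self._entries[key]  # too old: the caller must query the database again
            return None
        return value
```

A lower expiry picks up node changes sooner at the cost of more database queries; a larger maximum size trades memory for fewer lookups.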

25.2.5. Classification Exporter Filter cache configuration (Optional)

A rule in the Classification Engine may define an exporterFilter. To determine whether the filter criteria match the address of an exporter, a database query is executed. A cache can be configured to store the results and improve performance.

The following cache properties can be set in ${OPENNMS_HOME}/etc/org.opennms.features.flows.classification.cfg:

Property | Description | Required | Default
cache.classificationFilter.enabled | Enables or disables the cache. | false | false
cache.classificationFilter.maxSize | The maximum size of the cache | false | 5000
cache.classificationFilter.expireAfterRead | Number of seconds until an entry in the node cache is evicted. Set to 0 to disable eviction. The timer is reset every time an entry is read. | false | 300
nodeCache.recordStats | Defines if cache statistics are exposed via JMX. Set to false to disable statistic recording. | false | true

25.3. Classification Engine

The Classification Engine applies a set of user- and/or system-defined rules to each flow to classify it. This allows users to group flows by application, e.g. by marking all flows to port 80 as http.

In order to classify a flow, a rule must be defined. A rule defines at least a name, which the flow is classified with, and additional parameters which must match for a successful classification.

25.3.1. Rule definition

A rule has the following fields:

Name | Mandatory | Description
name | mandatory | The name the flow is classified with, e.g. http
dstPort | optional | The dstPort of the flow must match this port. May be a range or list of ports, e.g. 80,8080,8980, or 8000-9000.
dstAddress | optional | The dstAddress of the flow must match this address. May contain wildcards.
srcPort | optional | The srcPort of the flow must match this port. See dstPort for more details.
srcAddress | optional | The srcAddress of the flow must match this address. See dstAddress for more details.
exporterFilter | optional | The exporter of the flow must match these criteria. It supports all capabilities of the OpenNMS Horizon Filters API.
protocol | optional | The IP protocol of the flow must match these criteria.

Although all fields besides name are optional, at least one of them must be defined for the rule to be considered valid. A list of pre-defined rules already exists; these rules are inspired by the IANA Service Name and Transport Protocol Port Number Registry. New rules can be defined using the Classification UI, which can be found in the Admin menu: Admin → Configure OpenNMS → Manage Flow Classification.
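
The port expressions accepted by dstPort and srcPort can be sketched in Python as follows. The function names are hypothetical, and allowing lists and ranges to be mixed in one expression is an assumption; the Classification Engine may be stricter.

```python
def parse_ports(spec: str) -> set:
    """Expand a port expression such as '80,8080,8980' or '8000-9000' into a set."""
    ports = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(p) for p in part.split("-", 1))
            ports.update(range(low, high + 1))  # ranges are inclusive
        elif part:
            ports.add(int(part))
    return ports


def port_matches(spec: str, port: int) -> bool:
    """True if the port of a flow is covered by the rule's port expression."""
    return port in parse_ports(spec)


print(port_matches("80,8080,8980", 8080))  # True
print(port_matches("8000-9000", 80))       # False
```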

25.3.2. Rule Priority

User-defined rules always have a higher priority than the pre-defined rules. For example, if the user defines a new rule http with a dstPort of 8980, that rule has a higher priority than the pre-defined rule www-alt.

The priorities are as follows:

Field | Priority
srcAddress | +9
dstAddress | +9
srcPort | +3
dstPort | +3
protocol | +1
exporterFilter | +1

The priority is added for each field which is defined according to the table above. This means a rule with a srcAddress or dstAddress has a priority of at least 9 and is always higher than a rule with a srcPort or dstPort, etc.
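
The additive scheme can be sketched in a few lines of Python. This is an illustration based on the table above, not the actual implementation; the rule dictionaries are hypothetical, and the precedence of user-defined over pre-defined rules is not modelled.

```python
FIELD_PRIORITY = {
    "srcAddress": 9, "dstAddress": 9,
    "srcPort": 3, "dstPort": 3,
    "protocol": 1, "exporterFilter": 1,
}


def rule_priority(rule: dict) -> int:
    """Sum the priority of every field the rule defines with a non-empty value."""
    return sum(weight for field, weight in FIELD_PRIORITY.items() if rule.get(field))


address_only = {"name": "by-address", "dstAddress": "10.0.0.1"}
everything_else = {"name": "by-ports", "srcPort": "1024", "dstPort": "80",
                   "protocol": "tcp", "exporterFilter": "categoryName == 'Exporters'"}
print(rule_priority(address_only))     # 9
print(rule_priority(everything_else))  # 8
```

Even a rule that defines both ports, the protocol and an exporter filter only reaches 8, which is why any rule with an address ranks above it.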

The calculation of the priority is implemented here.

At the moment it is not possible to manually define a priority. This may be implemented at a later time. See issue HZN-1265.

25.3.3. Verification

With a more complex set of rules, it is not always easy to verify that everything is configured correctly. To make this easier, the Classification UI allows you to test a classification. To do so, navigate to the Classification UI (Admin → Configure OpenNMS → Manage Flow Classification) and select the Test Classification action in the top right. This lets you simulate a flow with certain fields being sent to the Classification Engine.

25.3.4. Example

Let’s assume the following rules are defined:

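name | srcAddress | srcPort | dstAddress | dstPort | protocol | exporterFilter
OpenNMS | - | - | 10.0.0.1 | 8980 | tcp,udp | -
http | - | - | - | 80,8980,8080,9000 | udp,tcp | -
https | - | - | - | 443 | - | -
Exporters | - | - | - | - | - | categoryName == 'Exporters'

A dash (-) marks a field that the rule does not define.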

The following flows are sent to OpenNMS Horizon and, with the rules defined above, classified accordingly.

Flow | Classification
protocol: tcp, srcAddress: 10.0.0.5, srcPort: 60123, dstAddress: 54.246.188.65, dstPort: 80, exporterAddress: 10.0.0.55 | http
protocol: tcp, srcAddress: 10.0.0.5, srcPort: 60123, dstAddress: 54.246.188.65, dstPort: 443, exporterAddress: 10.0.0.55 | https
protocol: tcp, srcAddress: 10.0.0.5, srcPort: 60123, dstAddress: 10.0.0.1, dstPort: 8980, exporterAddress: 10.0.0.55 | OpenNMS
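
The example can be replayed with a simplified Python model of the matching and priority logic. This is a sketch under several assumptions: the assignment of the rule values to dstAddress and dstPort is inferred from the classification results, only exact values and comma-separated lists are matched, and the Exporters rule is omitted because evaluating an exporterFilter requires the Filters API.

```python
PRIORITY = {"srcAddress": 9, "dstAddress": 9, "srcPort": 3, "dstPort": 3, "protocol": 1}

RULES = [
    {"name": "OpenNMS", "dstAddress": "10.0.0.1", "dstPort": "8980", "protocol": "tcp,udp"},
    {"name": "http", "dstPort": "80,8980,8080,9000", "protocol": "udp,tcp"},
    {"name": "https", "dstPort": "443"},
]


def matches(rule: dict, flow: dict) -> bool:
    """A rule matches when every field it defines accepts the flow's value."""
    for field in PRIORITY:
        if field in rule and str(flow[field]) not in rule[field].split(","):
            return False
    return True


def classify(flow: dict):
    """Return the name of the highest-priority matching rule, or None."""
    candidates = [rule for rule in RULES if matches(rule, flow)]
    if not candidates:
        return None
    best = max(candidates, key=lambda rule: sum(PRIORITY[f] for f in PRIORITY if f in rule))
    return best["name"]


base = {"protocol": "tcp", "srcAddress": "10.0.0.5", "srcPort": 60123}
print(classify({**base, "dstAddress": "54.246.188.65", "dstPort": 80}))   # http
print(classify({**base, "dstAddress": "54.246.188.65", "dstPort": 443}))  # https
print(classify({**base, "dstAddress": "10.0.0.1", "dstPort": 8980}))      # OpenNMS
```

The third flow matches both the OpenNMS and the http rule; OpenNMS wins because its dstAddress raises its priority to 13, compared with 4 for http.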

26. Kafka Producer

26.1. Overview

The Kafka Producer feature allows events, alarms, nodes and metrics from OpenNMS Horizon to be forwarded to Kafka.

These objects are stored in different topics and the payloads are encoded using Google Protocol Buffers (GPB). See opennms-kafka-producer.proto and collectionset.proto in the corresponding source distribution for the model definitions.

26.1.1. Events

The Kafka Producer listens for all events on the event bus and forwards these to a Kafka topic. The records are keyed by event UEI and contain a GPB encoded model of the event.

By default, all events are forwarded to a topic named events.

The name of the topic used can be configured, and an optional filtering expression can be set to help control which events are sent to the topic.

26.1.2. Alarms

The Kafka Producer listens for changes made to the current set of alarms and forwards the resulting alarms to a Kafka topic. The records are keyed by alarm reduction key and contain a GPB encoded model of the alarm. When an alarm is deleted, a null value is sent with the corresponding reduction key. Publishing records in this fashion allows the topic to be used as a KTable. The Kafka Producer will also perform periodic synchronization tasks to ensure that the contents of the Kafka topic reflect the current state of alarms in the OpenNMS Horizon database.

By default, all alarms (and subsequent updates) are forwarded to a topic named alarms.

The name of the topic used can be configured, and an optional filtering expression can be set to help control which alarms are sent to the topic.
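
The KTable semantics described above can be sketched as follows: a consumer keeps only the latest record per reduction key and treats a null value as a deletion. The Python below is a conceptual model, not Kafka client code; the reduction keys are made up and plain dictionaries stand in for the GPB encoded alarms.

```python
def apply_records(records):
    """Fold (reduction_key, value) records into the latest state per key.

    A value of None is a tombstone and removes the key.
    """
    table = {}
    for key, value in records:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


records = [
    ("uei.opennms.org/nodes/nodeDown::1", {"severity": 6, "counter": 1}),
    ("uei.opennms.org/nodes/nodeDown::2", {"severity": 6, "counter": 1}),
    ("uei.opennms.org/nodes/nodeDown::1", {"severity": 6, "counter": 2}),  # update
    ("uei.opennms.org/nodes/nodeDown::2", None),                          # alarm deleted
]
print(apply_records(records))
# {'uei.opennms.org/nodes/nodeDown::1': {'severity': 6, 'counter': 2}}
```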

26.1.3. Nodes

If an event or alarm being forwarded references a node, then the corresponding node is also forwarded. The records are keyed by "node criteria" (see below) and contain a GPB encoded model of the node. A caching mechanism is in place to help avoid forwarding nodes that have already been forwarded successfully and have not changed since.

The name of the topic used can be configured.

The node topic is not intended to include all of the nodes in the system; it only includes records for nodes that relate to events or alarms that have been forwarded.

Node Criteria

The node criteria is a string representation of the unique identifier for a given node. If the node is associated with a foreign source (fs) and foreign id (fid), the resulting node criteria will be the name of the foreign source, followed by a colon (:) and then the foreign id, i.e. fs:fid. If the node is not associated with both a foreign source and a foreign id, then the node id (database id) will be used.
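
The rule can be expressed as a short Python function. The function name is hypothetical; the sample values are the foreign source, foreign id and node id used in the Elasticsearch examples earlier in this guide.

```python
from typing import Optional


def node_criteria(node_id: int, foreign_source: Optional[str] = None,
                  foreign_id: Optional[str] = None) -> str:
    """Return 'fs:fid' when both are set, otherwise the database node id."""
    if foreign_source and foreign_id:
        return f"{foreign_source}:{foreign_id}"
    return str(node_id)


print(node_criteria(88, "LocalTest", "1488375237814"))  # LocalTest:1488375237814
print(node_criteria(88))                                  # 88
```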

26.1.4. Metrics

The Kafka Producer can be used to write metrics to Kafka either exclusively, or in addition to an existing persistence strategy such as RRD or Newts. The metrics are written in the form of "collection sets" which correspond to the internal representation used by the existing collectors and persistence strategies. The records are keyed by Node ID, or by IP address if no Node ID is available, and contain a GPB encoded version of the collection sets. The records are keyed in this fashion to help ensure that collection sets related to the same resources are written to the same partitions.

When enabled (this functionality is disabled by default), the metrics are written to a topic named metrics.

When exclusively writing to Kafka, no metrics or resource graphs will be available on the OpenNMS Horizon instance.
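
Why keying by Node ID keeps related collection sets together can be illustrated with a small Python sketch. The helper names are hypothetical, and CRC32 is only a stand-in for the hash a Kafka partitioner would apply; the point is that equal keys always map to the same partition.

```python
import zlib
from typing import Optional


def record_key(node_id: Optional[int], ip_address: str) -> str:
    """Key by node ID when known, otherwise fall back to the IP address."""
    return str(node_id) if node_id is not None else ip_address


def partition_for(key: str, partitions: int) -> int:
    """Stand-in for a key-hashing partitioner (CRC32 here, not Kafka's own hash)."""
    return zlib.crc32(key.encode("utf-8")) % partitions


key = record_key(88, "127.0.0.1")
print(key)                                                # 88
print(partition_for(key, 6) == partition_for("88", 6))    # True
```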

26.2. Enabling the Kafka Producer

The Kafka Producer is disabled by default and can be enabled as follows.

First, log in to the Karaf shell of your OpenNMS Horizon instance and configure the Kafka client settings to point to your Kafka broker. See Producer Configs for a complete list of available options.

$ ssh -p 8101 admin@localhost
...
admin@opennms()> config:edit org.opennms.features.kafka.producer.client
admin@opennms()> config:property-set bootstrap.servers 127.0.0.1:9092
admin@opennms()> config:update

Next, install the opennms-kafka-producer feature from that same shell using:

admin@opennms()> feature:install opennms-kafka-producer

To ensure that the feature remains installed on subsequent restarts, add opennms-kafka-producer to the featuresBoot property in ${OPENNMS_HOME}/etc/org.apache.karaf.features.cfg.

26.3. Configuring the Kafka Producer

The Kafka Producer exposes the following options to help fine tune its behavior.

Name | Default Value | Description
eventTopic | events | Name of the topic used for events. Set this to an empty string to disable forwarding events.
alarmTopic | alarms | Name of the topic used for alarms. Set this to an empty string to disable forwarding alarms.
nodeTopic | nodes | Name of the topic used for nodes. Set this to an empty string to disable forwarding nodes.
metricTopic | metrics | Name of the topic used for metrics.
eventFilter | - | A Spring SpEL expression (see below) used to filter events. Set this to an empty string to disable filtering, and forward all events.
alarmFilter | - | A Spring SpEL expression (see below) used to filter alarms. Set this to an empty string to disable filtering, and forward all alarms.
forward.metrics | false | Set this value to true to enable forwarding of metrics.
nodeRefreshTimeoutMs | 300000 (5 minutes) | Number of milliseconds to wait before looking up a node in the database again. Decrease this value to improve accuracy at the cost of additional database lookups.
alarmSyncIntervalMs | 300000 (5 minutes) | Interval in milliseconds at which the contents of the alarm topic will be synchronized with the local database. Decrease this to improve accuracy at the cost of additional database lookups. Set this value to 0 to disable alarm synchronization.

26.3.1. Configuring Filtering

Filtering can be used to selectively forward events and/or alarms to the Kafka topics.

Filtering is performed using a Spring SpEL expression which is evaluated against each object to determine if it should be forwarded. The expression must return a boolean value i.e. true or false.

Enabling Event Filtering

To enable event filtering, set the value of the eventFilter property to a valid SpEL expression.

$ ssh -p 8101 admin@localhost
...
admin@opennms()> config:edit org.opennms.features.kafka.producer
admin@opennms()> config:property-set eventFilter 'getUei().equals("uei.opennms.org/internal/discovery/newSuspect")'
admin@opennms()> config:update

In the example above, the filter is configured such that only events with the given UEI are forwarded. Consult the source code of the org.opennms.netmgt.xml.event.OnmsEvent class in your distribution for a complete list of available properties.

Enabling Alarm Filtering

To enable alarm filtering, set the value of the alarmFilter property to a valid SpEL expression.

$ ssh -p 8101 admin@localhost
...
admin@opennms()> config:edit org.opennms.features.kafka.producer
admin@opennms()> config:property-set alarmFilter 'getTTicketId() != null'
admin@opennms()> config:update

In the example above, the filter is configured such that only alarms that are associated with a ticket id are forwarded. Consult the source code of the org.opennms.netmgt.model.OnmsAlarm class in your distribution for a complete list of available properties.

26.3.2. Enabling Metric Forwarding

To enable metric forwarding, set the value of the forward.metrics property to true.

$ ssh -p 8101 admin@localhost
...
admin@opennms()> config:edit org.opennms.features.kafka.producer
admin@opennms()> config:property-set forward.metrics true
admin@opennms()> config:update

Enabling Exclusive Metric Forwarding

Once metric forwarding is enabled, you can use it as the exclusive persistence strategy by setting the following system property:

echo 'org.opennms.timeseries.strategy=osgi' > "$OPENNMS_HOME/etc/opennms.properties.d/kafka-for-metrics.properties"

26.3.3. Configuring Topic Names

By default, three topics are created: events, alarms and nodes. To change these, you can use:

$ ssh -p 8101 admin@localhost
...
admin@opennms()> config:edit org.opennms.features.kafka.producer
admin@opennms()> config:property-set eventTopic ""
admin@opennms()> config:property-set nodeTopic "opennms-nodes"
admin@opennms()> config:update

In the example above, we disable event forwarding by setting an empty topic name and change the node topic name to opennms-nodes.

26.4. Shell Commands

The Kafka Producer also provides a series of shell commands to help administer and debug the service.

26.4.1. kafka-producer:list-alarms

The list-alarms command can be used to enumerate the reduction keys and show the associated event labels for the alarms that are present in the topic. This command leverages functionality used by the alarm synchronization process, so alarm synchronization must be enabled for this command to function.

$ ssh -p 8101 admin@localhost
...
admin@opennms> kafka-producer:list-alarms
uei.opennms.org/alarms/trigger:n33:0.0.0.0:HTTPS_POOLs
        Alarm: Generic Trigger

26.4.2. kafka-producer:sync-alarms

The sync-alarms command can be used to manually trigger the alarm synchronization process.

$ ssh -p 8101 admin@localhost
...
admin@opennms> kafka-producer:sync-alarms
Performing synchronization of alarms from the database with those in the ktable.
Executed 1 updates in 47ms.

Number of reduction keys in ktable: 4
Number of reduction keys in the db: 4 (4 alarms total)
Reduction keys added to the ktable: (None)
Reduction keys deleted from the ktable: (None)
Reduction keys updated in the ktable:
        uei.opennms.org/nodes/nodeLostService::1:127.0.0.1:Minion-RPC

26.4.3. kafka-producer:evaluate-filter

The evaluate-filter command can be used to test arbitrary SpEL filtering expressions against alarms or events.

Evaluating filters against alarms

To test a filter against an alarm, specify the database id of the alarm and the expression to test:

admin@opennms> kafka-producer:evaluate-filter --alarm-id 57 "getReductionKey().contains('n33')"
SPEL Expression: getReductionKey().contains('n33')
Alarm with ID 57 has reduction key: uei.opennms.org/alarms/trigger:n33:0.0.0.0:HTTPS_POOLs
Result: true

Evaluating filters against events

To test a filter against an event, specify the UEI of the event and the expression to test:

admin@opennms> kafka-producer:evaluate-filter --event-uei uei.opennms.org/alarms/trigger "getUei().contains('alarm')"
SPEL Expression: getUei().contains('alarm')
Event has UEI: uei.opennms.org/alarms/trigger
Result: true

In this case, a new event will be created with the given UEI, and the filter will be evaluated against this new event object. At this time, existing events cannot be referenced by this tool, so this functionality only serves to help make sure the expressions are syntactically valid.

