Extending Cribl: Building Custom Functions

One constant in log use cases is that you can’t plan for what you’re going to find at customers. Whether it’s multiple levels of encapsulation, like JSON-in-XML-in-Pipe-Separated (yes, we’ve seen this), a need to radically transform the structure of events in a way we haven’t seen, or a need to reach out to an external system we’ve never worked with before, we knew going into this market that we’d need to provide an easily extensible product. When a customer’s need can’t be met by chaining our flexible out-of-the-box functions like Drop, Eval, or Mask, our customers or we can easily drop in a custom function which meets their needs.

One of the reasons that we chose JavaScript was its rich ecosystem of libraries and its emergence as a universal runtime with WebAssembly. Cribl allows you to easily drop in your own code, interpreted or compiled, and get full access to your log data in motion. Even before we had a UI or any out-of-the-box functions, we proved out all our use case ideas through custom functions. We provide a very simple API for working with data, requiring you to implement only two methods: init and process. Configuration for custom functions works the same way as with out-of-the-box functions: you provide a JSON Schema and a UI schema, as implemented by React JSON Schema Form. With this simple schema definition language, which you may know as the one behind Swagger, the UI will automatically render the forms properly, allowing you to make even sophisticated configuration available to your end users.

Lastly, one other advantage we get from JavaScript as a language is the ability to let users configure functions with JavaScript, in the form of JavaScript Expressions. Along with the library of functions we ship for Masking and Encoding, powerful transformations and operations are possible through one-liner JavaScript expressions included in Cribl configurations.

With this post, we will walk you through how to build some custom functions with Cribl, first by starting with some examples from the functions we ship and concluding with a common use case: doing a DNS lookup against IP Addresses found in raw data.

What is a function?

First, we should define what a function is in Cribl. In Cribl, functions are a combination of code, configuration and data. Each function is a directory of files. Here is the regex_filter function we ship with Cribl:

regex_filter
├── conf.schema.json
├── conf.ui-schema.json
└── index.js

index.js contains our JavaScript code. It can include any built-in Node modules or reference other JavaScript files in its directory. Support for npm modules is on the backlog.

conf.schema.json and conf.ui-schema.json are schema files for React JSON Schema Form, which will be covered in more detail below.

Missing from the current directory but expected to ship in 1.2 is support for shipping sample data with the functions for testing and validation.

To install a function, perhaps from our content repository, simply drop the function directory in $CRIBL_HOME/bin/functions and restart Cribl. After that, the function will be available in the UI. Next, let’s look into the details of how a function is implemented.

Drop: The Simplest Function

Let’s examine a function which Cribl ships with: Drop. Drop is an incredibly simple function. If the Filter expression matches, we’ll drop the event. The Filter expression gets evaluated before the function itself gets called, so Drop is only executed for events which should be dropped.

Let’s look at the code for the function:

exports.name = 'Drop';
exports.version = '0.1';
exports.group = 'Standard';
exports.process = () => null;

Cribl functions are NodeJS modules, and we look for several module variables to be defined, the names of which should seem obvious. name defines how the UI will display the function name, version documents the function’s version, and group is used by the UI to group like functions.

The process method is called for every event. It is passed the event, which is a JavaScript object containing all the key/value pairs from our event. These key/value pairs are sent to our destination systems: in Splunk, they become index-time fields; in Elastic, they become the shape of the event; and for a FileSystem or S3 destination, they are serialized as JSON documents, one per line. In the case of the Drop function, we do not use the contents of the event, so the method simply returns null for every call. When Cribl receives a falsy return value, we will drop the event.
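To make the drop semantics concrete, here is a minimal sketch of how a caller could apply a chain of functions to an event. The runFunctions helper and the Tag function are hypothetical and are not Cribl’s actual pipeline code; the sketch only illustrates the contract described above, where a falsy return value from process means the event is dropped.

```javascript
// Hypothetical stand-ins for two Cribl-style function modules.
const drop = { name: 'Drop', process: () => null };
const tag = {
  name: 'Tag',
  process: (event) => {
    event.tagged = true; // mutate the event in place, then pass it on
    return event;
  },
};

// Hypothetical runner (NOT Cribl's real implementation): calls each
// function's process method in order and stops as soon as one returns
// a falsy value, which signals that the event was dropped.
function runFunctions(functions, event) {
  let current = event;
  for (const fn of functions) {
    current = fn.process(current);
    if (!current) {
      return null; // event dropped, nothing reaches the destination
    }
  }
  return current;
}

console.log(runFunctions([tag], { _raw: 'hello' }));       // { _raw: 'hello', tagged: true }
console.log(runFunctions([tag, drop], { _raw: 'hello' })); // null
```

Because every function receives and returns the same event object, a function can enrich an event, pass it through untouched, or end its journey by returning null.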

Next Example: RegEx Filter

Now let’s introduce a slightly more sophisticated example. The next function, RegEx Filter, will drop an event if a given regular expression matches. This introduces some configuration into the function, allowing the user to input data. It implements both init and process, and ships conf.schema.json and conf.ui-schema.json for defining configurable variables.

First, let’s look at the biggest new item we’ve introduced, JSON Schema. If you’ve never heard of JSON Schema, check out their tutorial. We use React JSON Schema Form to render JSON Schema as forms. You can use their interactive playground to test forms and see what options are available. For RegEx Filter, we’ve introduced a simple schema which defines two config variables: regex, which defines the RegEx we’ll execute against the data, and field, which defines which field we’ll test for a RegEx match. Here’s the Schema JSON, contained in conf.schema.json:

{
  "type": "object",
  "title": "",
  "properties": {
    "regex": {
      "title": "Regex",
      "description": "Regex to test against",
      "type": "string",
      "regexp": true
    },
    "field": {
      "title": "Field",
      "description": "Name of the field to apply the regex on (defaults to _raw)",
      "type": "string",
      "default": "_raw"
    }
  }
}

This should seem straightforward. We are describing an object whose properties, regex and field, have various attributes defined about them, including their title, description, type and default values. Any JSON Schema will work here, including sophisticated examples we’ve seen in Swagger. For some more sophisticated examples in Cribl, look at the Mask or Lookup functions.
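As a rough illustration of what the default keyword means in practice, the sketch below fills in missing top-level properties from a schema like the one above. applyDefaults is a hypothetical helper written for this post; it is not part of Cribl or React JSON Schema Form, and it deliberately ignores nested objects.

```javascript
// A trimmed copy of the schema from conf.schema.json, inlined so the
// example is self-contained.
const schema = {
  type: 'object',
  properties: {
    regex: { title: 'Regex', type: 'string' },
    field: { title: 'Field', type: 'string', default: '_raw' },
  },
};

// Hypothetical helper: copy the user's conf and fill in any top-level
// property that is missing but has a "default" in the schema.
function applyDefaults(schemaDef, conf) {
  const result = Object.assign({}, conf);
  for (const [key, prop] of Object.entries(schemaDef.properties || {})) {
    if (result[key] === undefined && prop.default !== undefined) {
      result[key] = prop.default;
    }
  }
  return result;
}

console.log(applyDefaults(schema, { regex: 'DEBUG' }));
// { regex: 'DEBUG', field: '_raw' }
```

A value supplied by the user always wins over the schema default, which matches how the field setting is described: it falls back to _raw only when nothing is configured.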

React JSON Schema Form also allows us to specify some information that isn’t covered by the schema for the data alone. The UI may need to differentiate a password field from a normal string field, for example. In this case, we’re defining the regex field to use a custom input type which will validate a Regular Expression, in conf.ui-schema.json:

{
  "regex": {
    "ui:widget": "RegexInput",
    "ui:placeholder": "Regular expression"
  }
}

The UI Schema matches a given field name, in this case regex, and tells the form to use a ui:widget of RegexInput. Now, let’s look at the code in index.js:

exports.name = 'Regex Filter';
exports.version = '0.1';
exports.group = 'Standard';

const { NamedGroupRegExp } = C.util; 

let regex;
let field = '_raw';
exports.init = (opts) => {
  const conf = opts.conf || {};
  regex = null;
  field = '_raw';

  if (conf.regex) {
    regex = new NamedGroupRegExp(conf.regex);
  }
  if (conf.field) {
    field = conf.field;
  }
};

exports.process = (event) => {
  if (regex) {
    regex.lastIndex = 0; // common trap of setting "global" flag
    return regex.test(event[field]) ? null : event;
  }
  return event;
};

The function is, again, quite simple. Most of the code is validating inputs to ensure the user has properly filled out regex and field. Let’s look at the new concepts. First, we declare module-level variables:

let regex;
let field = '_raw';

JavaScript is single-threaded, so we can safely declare state at the module level, and it will persist across each invocation of the function’s process method. Next, we declare an init method, which is called with a single object. We name it opts, and its conf property contains the key/value pairs configured by the user.

exports.init = (opts) => {
  const conf = opts.conf || {};
  regex = null;
  field = '_raw';

  if (conf.regex) {
    regex = new NamedGroupRegExp(conf.regex);
  }
  if (conf.field) {
    field = conf.field;
  }
};

React JSON Schema Form validates input provided through the UI, but users can also configure functions via YAML or JSON config files, so we must include validation in our functions as well to ensure we are not misconfigured. The majority of the code in init is validating that the user has supplied regex and field in the configuration. Now, let’s look at process:

exports.process = (event) => {
  if (regex) {
    regex.lastIndex = 0; // common trap of setting "global" flag
    return regex.test(event[field]) ? null : event;
  }
  return event;
};

Here again, we’re simply testing the value in field to see if it matches regex. If so, we return null; if not, we return the event unmodified.
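To try this logic outside of Cribl, here is a standalone sketch of the same filter. It makes a few assumptions: the built-in RegExp stands in for C.util.NamedGroupRegExp (which only exists inside Cribl), the names initFilter and processEvent are ours, the "g" flag is added on purpose to show why lastIndex is reset, and an invalid pattern is silently ignored, which is one possible design choice rather than what the shipped function does.

```javascript
// Module-level state, as in the Regex Filter function.
let regex = null;
let field = '_raw';

// Defensive init: only accept non-empty strings and survive bad patterns.
function initFilter(opts) {
  const conf = (opts && opts.conf) || {};
  regex = null;
  field = '_raw';
  if (typeof conf.regex === 'string' && conf.regex.length > 0) {
    try {
      regex = new RegExp(conf.regex, 'g'); // "g" makes test() stateful
    } catch (err) {
      regex = null; // invalid pattern: fall back to passing events through
    }
  }
  if (typeof conf.field === 'string' && conf.field.length > 0) {
    field = conf.field;
  }
}

function processEvent(event) {
  if (regex) {
    // Without this reset, a "g" regex resumes searching where the
    // previous match ended and can miss a match in the next event.
    regex.lastIndex = 0;
    return regex.test(event[field]) ? null : event;
  }
  return event;
}

initFilter({ conf: { regex: 'DEBUG', field: 'message' } });
console.log(processEvent({ message: 'DEBUG cache warm' })); // null (dropped)
console.log(processEvent({ message: 'DEBUG cache cold' })); // null (dropped, thanks to the reset)
console.log(processEvent({ message: 'ERROR disk full' }));  // { message: 'ERROR disk full' }

initFilter({ conf: { regex: '(' } });                       // invalid pattern
console.log(processEvent({ _raw: 'anything' }));            // { _raw: 'anything' }
```

If you remove the lastIndex reset, the second DEBUG event slips through, because the search starts at index 5 where the previous match ended.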

Reaching Out: Enriching Data using DNS

Lastly, let’s look at an example which shows a few more capabilities: asynchronous execution, reaching out to a third-party system, and modifying an event. This really shows the power of Cribl’s extensibility. Custom user code can take information from the event, look up related information elsewhere, and modify the event with the result. Even though Cribl does not ship with it, we can meaningfully extend it to implement a use case which is currently difficult to do in all logging systems: doing a DNS lookup at ingestion time instead of read time. This function is hosted in our content repo, under dns.

To keep it simple, our function has no configuration; it simply enriches the event with a reverse DNS lookup for every IPv4 address it finds in the _raw field. Our function also does not support cache expiry and a few other features we’d likely implement for more than a demo. We can and should enhance it to make it more full featured, but this again shows the value of letting users extend the product: they can get by with a less full-featured implementation than Cribl would need to build in order to ship a generic version. Let’s look at the code:

exports.name = 'DNS Lookup';
exports.version = '0.1';
exports.group = 'Demo Functions';

const dns = require('dns');

const ipv4Regex = /(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/gm;
const cache = {};

function reverse(IP, midx) {
  // Field name depends on the match position in this event: dns, dns2, dns3, etc
  const fieldName = `dns${midx !== 1 ? midx.toString() : ''}`;
  if (!cache[IP]) {
    // Cache the promise itself so in-flight and completed lookups share one DNS query.
    // Only the hostnames are cached; the field name is computed per call.
    cache[IP] = new Promise((resolve) => {
      dns.reverse(IP, (err, hostnames) => {
        // Resolve with undefined on error so process can filter the entry out
        resolve(err ? undefined : hostnames.join(' '));
      });
    });
  }
  return cache[IP].then(hostnames => (hostnames === undefined ? undefined : [fieldName, hostnames]));
}

exports.disabled = 0;
exports.asyncTimeout = 500; // ms
exports.process = (event) => {
  const promises = [];
  let matches;
  let matchIdx = 1;
  ipv4Regex.lastIndex = 0; // ensure this is properly reset
  while (matches = ipv4Regex.exec(event._raw)) {
    const midx = matchIdx;
    const IP = matches[0];
    promises.push(reverse(IP, midx));
    matchIdx++;
  }
  if (promises.length === 0) {
    return event;
  }
  return Promise.all(promises)
    .then((entries) => {
      entries.filter(e => e !== undefined).forEach(e => {
        event[e[0]] = e[1];
      });
      return event;
    })
    .catch(() => {
      return event;
    });
};

Our function defines a few module-level variables: it imports Node’s dns module, sets up a cache variable, and defines a RegEx we will use for matching IPv4 addresses. Let’s look at our process implementation:

exports.process = (event) => {
  const promises = [];
  let matches;
  let matchIdx = 1;
  ipv4Regex.lastIndex = 0; // ensure this is properly reset
  while (matches = ipv4Regex.exec(event._raw)) {
    const midx = matchIdx;
    const IP = matches[0];
    promises.push(reverse(IP, midx));
    matchIdx++;
  }
  if (promises.length === 0) {
    return event;
  }
  return Promise.all(promises)
    .then((entries) => {
      entries.filter(e => e !== undefined).forEach(e => {
        event[e[0]] = e[1];
      });
      return event;
    })
    .catch(() => {
      return event;
    });
};

We first find every match of the IPv4 regex in the _raw field, which is hard-coded for this function. For each match, we add a promise to an array which we then pass to Promise.all. With Promise.all, our function waits for all DNS resolutions to complete before calling our .then() implementation, which merges the DNS responses back into the event object before returning it. The meat of the logic is in the reverse function we’ve implemented, which wraps Node’s dns.reverse in a promise:

function reverse(IP, midx) {
  // Field name depends on the match position in this event: dns, dns2, dns3, etc
  const fieldName = `dns${midx !== 1 ? midx.toString() : ''}`;
  if (!cache[IP]) {
    // Cache the promise itself so in-flight and completed lookups share one DNS query.
    // Only the hostnames are cached; the field name is computed per call.
    cache[IP] = new Promise((resolve) => {
      dns.reverse(IP, (err, hostnames) => {
        // Resolve with undefined on error so process can filter the entry out
        resolve(err ? undefined : hostnames.join(' '));
      });
    });
  }
  return cache[IP].then(hostnames => (hostnames === undefined ? undefined : [fieldName, hostnames]));
}

This function first checks our module-level cache object, called cache. If the IP is already there, it returns a promise for the cached result. If not, it creates a new promise, which resolves when the asynchronous dns.reverse call returns. On error the promise still resolves, just without a hostname, so a single failed lookup never rejects the whole Promise.all.
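Since real DNS answers vary by network, here is a sketch of the same flow that runs offline. fakeReverse is a stub standing in for Node’s dns.reverse, the addresses and hostnames are made up, the IPv4 pattern is simplified, and the names cachedReverse and enrich are ours. It shows the two ideas discussed above: caching the promise so repeated IPs share one lookup, and using Promise.all to merge results into the event.

```javascript
// Stub standing in for Node's dns.reverse so the sketch runs offline.
const fakeZone = { '10.0.0.1': ['web-1.example.com'], '10.0.0.2': ['db-1.example.com'] };
let lookupCalls = 0;
function fakeReverse(ip, callback) {
  lookupCalls++;
  setTimeout(() => {
    if (fakeZone[ip]) callback(null, fakeZone[ip]);
    else callback(new Error('ENOTFOUND'));
  }, 5);
}

// Cache the promise itself, keyed by IP, so in-flight and finished
// lookups for the same address share a single query.
const lookups = {};
function cachedReverse(ip) {
  if (!lookups[ip]) {
    lookups[ip] = new Promise((resolve) => {
      fakeReverse(ip, (err, hostnames) => resolve(err ? undefined : hostnames.join(' ')));
    });
  }
  return lookups[ip];
}

function enrich(event) {
  const ipv4 = /\b\d{1,3}(?:\.\d{1,3}){3}\b/g; // simplified pattern for the sketch
  const ips = event._raw.match(ipv4) || [];
  // Promise.all resolves once every lookup has resolved, preserving order.
  return Promise.all(ips.map(ip => cachedReverse(ip))).then((hosts) => {
    hosts.forEach((host, i) => {
      if (host !== undefined) {
        event[i === 0 ? 'dns' : `dns${i + 1}`] = host; // dns, dns2, dns3, ...
      }
    });
    return event;
  });
}

const enriched = enrich({ _raw: 'src=10.0.0.1 dst=10.0.0.2 via=10.0.0.1 gw=10.0.0.9' });
enriched.then((event) => {
  console.log(event.dns, event.dns2, event.dns3, event.dns4, lookupCalls);
  // web-1.example.com db-1.example.com web-1.example.com undefined 3
});
```

Four addresses appear in the event, but only three lookups run, because the repeated 10.0.0.1 reuses the promise created for its first occurrence. The unknown address resolves without a hostname and simply adds no field.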

As you can see, this is fairly straightforward. In fewer than 60 lines, we’ve implemented a meaningful extension to Cribl’s functionality.
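Cache expiry, one of the features the demo leaves out, could be added in a few more lines. The sketch below is one possible approach and not code from the dns function in our repo: the TtlCache class, the ttlMs parameter and the injectable now clock are all hypothetical names chosen for illustration.

```javascript
// Hypothetical TTL cache: entries are forgotten once they are older
// than ttlMs, so the caller performs a fresh lookup next time.
class TtlCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for testing
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, storedAt: this.now() });
  }
}

// Usage with a fake clock so expiry is deterministic.
let fakeTime = 0;
const ttlCache = new TtlCache(1000, () => fakeTime);
ttlCache.set('10.0.0.1', 'host-a.example.com');
console.log(ttlCache.get('10.0.0.1')); // host-a.example.com
fakeTime = 1500;
console.log(ttlCache.get('10.0.0.1')); // undefined (entry expired)
```

The stored value could just as well be the lookup promise used by the DNS function, in which case an expired entry would trigger a new dns.reverse call the next time that IP is seen.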

Conclusion

There are hundreds of different use cases which can be easily implemented as Cribl functions. We don’t want everyone to invent their own implementations, so we’re launching a shared repo of functions users have built which solve various use cases. In a version coming soon, you’ll be able to point Cribl at a URL for a repo on GitHub or BitBucket and import a function with a single click. For now, it’s simple to clone these repos and copy the functions into $CRIBL_HOME/bin/functions, and they’ll show up in your UI upon restart.

What would you like Cribl to do that it doesn’t do today? We’d love to collaborate on publishing a new extension to our content repo. We want everyone to be able to conceive of and easily ship their own ideas and share them with the community. We’d love to see your contributions, or file an issue and we’ll build you an implementation!
