ElasticSearch Datasource Integration

With Knowi you can natively connect to data in your ElasticSearch cluster, create visualizations, perform joins across multiple indexes, use Knowi's search-based analytics feature (also known as natural language querying), and more.

Overview

  1. Connect, extract and transform data from your ElasticSearch, using one of the following options:

    a. Through our UI to connect directly.

    b. Using our Cloud9Agent. This can securely pull data inside your network. See agent configuration for more details.

  2. Visualize and Automate your Reporting instantly.

UI Based Approach

Connecting

  1. Log in to Knowi and select Queries from the left sidebar.

  2. Click on New Datasource + button and select ElasticSearch from the list of datasources.

  3. After navigating to the New Datasource page, either use the pre-configured settings for Cloud9 Charts' own demo ElasticSearch database, or follow the prompts and configure the following details to set up connectivity to your own ElasticSearch database:
    a. Datasource Name: Enter a name for your datasource.
    b. Elasticsearch URL: Elasticsearch URL to connect to.
    c. User ID: Enter the user ID to connect.
    d. Password: Enter the password to connect to the database.
    e. Custom Headers: Headers for additional connection properties.
    f. Version: Select the Elasticsearch server version.

  4. Establish Network connectivity and click on the Test Connection button.

    Note: The connection validity of the network can be tested only if it has been established via Direct Connectivity or an SSH tunnel. For more information on connectivity and datasources, please refer to the documentation on Connectivity & Datasources.

  5. Click on Save and start Querying.

[Image: adding-elasticsearch]

Query

Set up a Query using the visual builder or the query editor.

Visual Builder

Step 1: After connecting to the ElasticSearch datasource, Knowi will pull out a list of indexes along with field samples. Using these indexes, you can automatically generate queries through our visual builder in a no-code environment by either dragging and dropping fields or making your selections through the drop-down.

[Image: visual-builder]

Tip: You can also write queries directly in the Query Editor, a versatile text editor that offers more advanced editing functionalities like ElasticSearch JSON Query, support for multiple language modes, Cloud9QL, and more.

Step 2: Define data execution strategy by using any of the following two options:

  • Direct Execution: Directly execute the Query on the original Datasource, without any storage in between. In this case, when a widget is displayed, it will fetch the data in real time from the underlying Datasource.

  • Non-Direct Execution: For non-direct queries, results are stored in Knowi's Elastic Store. Benefits include support for long-running queries, reduced load on your database, and more.

Non-direct execution can be set to run the Query once or at scheduled intervals. For more information, see the documentation on Defining Data Execution Strategy.

[Image: data-strategy]

Step 3: Click on the Preview button to analyze the results of your Query and fine-tune the desired output, if required.

[Image: data-strategy]

The result of your Query is called a Dataset. After reviewing the results, name your dataset and then hit the Create & Run button.

[Image: create-and-run]

Query Editor

A versatile text editor designed for editing code. It comes with a number of language modes, including Couchbase Query Language (CQL), and add-ons like Cloud9QL and the AI Assistant, which provide powerful transformation and analysis capabilities such as prediction modeling and cohort analysis.

[Image: Create and Run]

AI Assistant

The AI Assistant query generator automatically generates queries from plain-English statements for searching the connected databases and retrieving information. The goal is to simplify and speed up the search process by automatically generating relevant and specific queries, reducing the need for manual input, and improving the probability of finding relevant information.

Step 1: Select Generate Query from the AI Assistant dropdown and enter the details of the query you'd like to generate in plain English. Details can include table or collection names, fields, filters, etc.

Example: Elasticsearch query to Show me name, grades, cuisine from table restaurant

Note: The AI Assistant uses OpenAI to generate a query; only the question is sent to OpenAI APIs, not the data.
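For the example prompt above, the generated query might resemble the following sketch (illustrative only; the actual AI output can vary, and the index mapping is assumed):

```json
{
  "_source": ["name", "grades", "cuisine"],
  "query": {
    "bool": {
      "must": [
        { "match_all": {} }
      ]
    }
  },
  "size": 10
}
```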

[Image: Create and Run]

Step 2: Define data execution strategy by using any of the following two options:

  • Direct Execution: Directly execute the Query on the original Datasource, without any storage in between. In this case, when a widget is displayed, it will fetch the data in real time from the underlying Datasource.

  • Non-Direct Execution: For non-direct queries, results are stored in Knowi's Elastic Store. Benefits include support for long-running queries, reduced load on your database, and more.

Non-direct execution can be set to run the Query once or at scheduled intervals. For more information, see the documentation on Defining Data Execution Strategy.

[Image: data-strategy]

Step 3: Click on the Preview button to analyze the results of your Query and fine-tune the desired output, if required.

[Image: data-strategy]

Note 1: OpenAI integration must be enabled by the admin before using the AI Query Generator.

{Account Settings > Customer Settings > OpenAI Integration}

Note 2: You can copy the API key from your personal OpenAI account and use it, or use the default key provided by Knowi.

Furthermore, AI Assistant offers you additional features that can be performed on top of the generated query as listed below:

  • Explain Query
  • Find Issues
  • Syntax Help

Explain Query

Provides explanations for your existing query. For example, requesting an explanation for the generated query returned this description:

This query is asking Elasticsearch to return 10 documents with the fields "name", "grades", and "cuisine". The query is using a "bool" query, which is a combination of other queries. The "must" clause is a list of queries that must all match for a document to be returned. In this case, the only query in the "must" clause is a "match_all" query, which will match all documents in the index.

Find Issues

Helps in debugging and troubleshooting the query. For example, finding issues in the generated query returned this error: The grade is misspelled (should be "grade").

Syntax Help

Ask questions about query syntax for this datasource. For example, requesting the syntax for a query returned the response: {"query":{"bool":{"must":[{"match_all":{}}]}}}

Highlighting

Users can pull ElasticSearch highlights from one or more fields in an ElasticSearch query. Requesting highlights (see: ElasticSearch Highlighting Syntax) in the editor view of an ElasticSearch query will return a _highlight column in your query output containing highlighted fields and fragments.

Sample query that searches for Costco records and highlights the customer field:

{
  "query": {
    "match_phrase": {
      "customer": "Costco"
    }
  },
  "highlight": {
    "fields": {
      "customer": {}
    }
  },
  "size": 10000
}

This will return a separate field named _highlight containing the highlighted field and fragments. See the documentation on ElasticSearch Highlight for more information.
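For the sample query above, each matching hit would carry its fragments in the _highlight field, with matched terms wrapped in <em> tags by Elasticsearch's default highlighter (the record values here are illustrative):

```json
{
  "customer": "Costco Wholesale",
  "_highlight": {
    "customer": ["<em>Costco</em> Wholesale"]
  }
}
```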

Cloud9Agent Configuration

As an alternative to the UI based connectivity above, you can use Cloud9Agent inside your network to pull from ElasticSearch securely. See Cloud9Agent to download your agent along with instructions to run it.

Highlights:

  • Pull data using ElasticSearch queries, with optional Cloud9QL post-processing.
  • Execute queries on a schedule, or one time.

The agent contains a datasource_example_elasticsearch.json and query_example_elasticsearch.json under the examples folder of the agent installation to get you started.

  • Edit these to point to your database and modify the queries to pull your data.
  • Move them into the config directory (datasource_XXX.json files first, if the Agent is running).

Datasource Configuration:

Parameter Comments
name Unique Datasource Name.
datasource Set value to elasticsearch.
url URL to connect to. Example for ElasticSearch: http://localhost:9200
userId User ID to connect, where applicable.
password Password, where applicable.

Query Configuration:

Query Config Params Comments
entityName Dataset Name Identifier
identifier A unique identifier for the dataset. Either identifier or entityName must be specified.
dsName Name of the datasource name configured in the datasource_XXX.json file to execute the query against. Required.
queryStr ElasticSearch query to execute. Required.
frequencyType One of minutes, hours, days, weeks, months. If this is not specified, this is treated as a one-time query, executed upon Cloud9Agent startup (or when the query is first saved).
frequency Indicates the frequency, if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes.
startTime Optional; can be used to specify when the query should run for the first time. If set, the frequency will be determined from that time onwards. For example, if a weekly run is scheduled to start at 07/01/2014 13:30, the first run will occur on 07/01 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported date formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss, HH:mm, mm
c9QLFilter Optional post-processing of the results using Cloud9QL. Typically uncommon against SQL-based datastores.
overrideVals This enables data storage strategies to be specified. If this is not defined, the results of the query are added to the existing dataset. To replace all data for this dataset within Knowi, specify {"replaceAll":true}. To upsert data, specify "replaceValuesForKey":["fieldA","fieldB"]. This will replace all existing records in Knowi that have the same fieldA and fieldB with the current data, and insert records where they are not present.

Examples

Datasource Example:

[
  {
    "name":"demoElasticSearch",
    "url":"http://54.205.52.21:9200",
    "datasource":"elasticsearch",
    "userId":"a",
    "password":"b"
  }
]

Query Example:

[
    {
        "entityName": "ElasticSearch Demo",
        "indexes":"sendingactivity",
        "queryStr": "{\"size\":1000,\"query\": {\"query_string\": {\"query\": \"Transactional*\"}}} ",
        "c9QLFilter": "select sum(sent) as sent, sum(opened) as Opened, date(date) as Sent Date group by date(date)",
        "dsName": "demoElasticSearch",
        "overrideVals": {
            "replaceAll": true
        }
    }
]

This query has no frequencyType, so it runs once when the Agent starts (or when the query is first saved), and the overrideVals setting replaces all data for that dataset in Knowi.
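To upsert instead of replacing all data, you could swap in an overrideVals block along these lines (the key field names here are illustrative):

```json
"overrideVals": {
    "replaceValuesForKey": ["customer", "date"]
}
```

This replaces existing records in Knowi whose customer and date match the incoming data, and inserts records where they are not present.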

Query Elasticsearch Template

Knowi supports querying Elasticsearch templates by allowing end-user input to be passed as parameters into the search template. This prevents your query-building logic from being exposed to the end user.

Elasticsearch templates make it easy to convert the user input into Elasticsearch queries. Simply modify the template for id and parameter(s) based on your requirements and run the query against it.

Knowi automatically detects Elasticsearch templates in the code editor and allows you to query it as shown below:

[Image: create-and-run]
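As a sketch, a template invocation in the editor might look like the following, using Elasticsearch's standard search-template request body (the template id and parameter names here are hypothetical):

```json
{
  "id": "customer_search_template",
  "params": {
    "customer_name": "Costco",
    "result_size": 100
  }
}
```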

Direct Passthrough Query

A Direct Passthrough Query allows you to query an Elasticsearch index right from your dashboard through the use of dashboard filters. Currently, Elasticsearch is the only datasource supported.

The primary use case for this would be when you have a large amount of data (billions of records) that you'd like to aggregate in real-time. For example, take an index containing 800 million records. Instead of pulling all 800 million records in before manually filtering and grouping at the widget level, you can perform aggregation operations and query against your Elasticsearch index directly from your dashboard.

This is achieved by utilizing Knowi's existing Direct Query tokens. When a user makes a dashboard filter selection using these specific "template" query tokens (defined below), Knowi will translate these selections into Elasticsearch and Cloud9QL queries, execute the query, and display the resulting dataset in the passthrough query widgets.

Here is a summary of the flow:
1. Create a direct passthrough query against an Elasticsearch datasource (see template below)
2. Save the query
3. Add the newly created widget to a dashboard
4. Configure dashboard filters using the template tokens (ƒ) to set metrics and dimensions

Creating a Direct Passthrough Query

Query your Elasticsearch datasource by following the steps outlined in this document.

Query Template

To create a Direct Passthrough Query, we will have to manually add the template in the query Editor mode. Currently, there is no way to auto-generate this template, but we may add this functionality later on.

In the Elasticsearch JSON Query section, define your table (index), metric field, dimension field, and query tokens. Note that these tokens follow the format defined in the Runtime Query Parameters:

  1. $c9_table$()$: This should be the name of the index that you are querying against. Currently, you can only query one index per passthrough query.
  2. $c9_metrics$[]${field|Metric}$: This will be shown on the dashboard/widget as a field which you can use to set the metric.
  3. $c9_dimensions$[]${field|Dimension}$: This will be shown on the dashboard/widget as a field which you can use to set the dimension.
  4. $c9_query$()$: This will be replaced by the auto-generated ElasticSearch query based on your metric and dimension selections.

Example:

/*  
Query Template  
Table: $c9_table$(activities)$  
Metrics: $c9_metrics$[Metrics]${field|Metric}$  
Dimensions: $c9_dimensions$[Dimensions]${field|Dimension}$  
*/
$c9_query$({})$

In the Cloud9QL Transformations section, you can define how the auto-generated C9QL query should be placed by setting:

  1. $c9_c9ql$()$: This will be replaced by the auto-generated C9QL query based on your metric and dimension dashboard filter selections.

$c9_c9ql$()$

[Image: create-and-run]

Configuring the Direct Passthrough Query Filters

Step 1: Click on the Filters icon on the dashboard toolbar.

[Image: create-and-run]

Step 2: Click on the + Add button and select Filter from the dropdown list.

[Image: create-and-run]

Step 3: Configure the filter by defining the values for the following parameters:
  • Fields to filter (Metrics or Dimensions)
  • Condition type
  • Value
  • Label
  • Operation

Configuring the Metrics

  • Fields to filter: If you select Metrics, then the Value field will be converted to the list of fields contained in the table.
    Note: (ƒ) for field types, (T) for text, and (#) for numeric fields.

  • Condition type: Set to equals. This must be set to equals in order to pass the query parameters properly.

  • Value: List of fields from the index specified
    Note: This provides the ability to select one field from multiple fields. The field selected will be highlighted, and the Label and Operation will be based on the selected field.

  • Label: This is the field name which can be set to an alias.
    Note: This applies to the field selected (highlighted) above in the Value box.

  • Operation: This is dependent on the field's data type. For example, if the field data type is string then the operation dropdown listing includes None or Count. If the field data type is numeric, then the operation dropdown listing includes Sum, Min, Max, Avg, Median, and Count.
    Note: This applies to the field selected (highlighted) above in the field box.

[Image: create-and-run]

Configuring the Dimensions for Filter

Dimension is used to group the data based on the fields you have selected/aggregated in the Metrics. For example, if you select a field 'customer' here, then the data table will aggregate your metrics by your 'customer' field.

  • Fields to filter: If you select Dimensions then the Value field will be converted to the list of attributes contained in the table.
    Note: (ƒ) for field types, (T) for text, and (#) for numeric fields.
  • Condition type: Set to equals. This must be set to equals in order to pass the query parameters properly.

  • Value: List of fields from the index specified. These will automatically be added as columns in your result.

[Image: create-and-run]

Step 4: Hit the Save button and the filter will be added to the right-hand filter panel.

[Image: create-and-run]

Editing the Filters

You can edit your Metrics (aggregations) and Dimensions (groupings) via the dashboard filter bar. For your Metrics, click on existing fields to alter their respective Label and/or Operation. Add new fields by selecting additional fields in the Value dropdown. For Dimensions, add or remove groupings by selecting additional fields in the Value dropdown.

[Image: create-and-run]

Regular Field Filters

Selecting a field that exists on this index will be translated and passed through the query directly.

  • Field to filter: Index field or column name
  • Condition Type: Condition that defines 'Field to filter' and 'Value' relationship (Equals, Greater than, Not equals)
  • Value: Filter value

Example:

  • Field to filter: Sent
  • Condition Type: Greater Than
  • Value: 100
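A filter like this could translate into an Elasticsearch range clause along these lines (a sketch only; the exact query Knowi generates may differ):

```json
{
  "query": {
    "range": {
      "Sent": { "gt": 100 }
    }
  }
}
```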