Knowi enables analysis, visualization, warehousing and reporting automation from file based data across the following file formats & Channels:
Note, that data from these sources will be created as widgets directly.
Use our agent inside your network to connect and extract data from various file based data sources securely. See Cloud9Agent to download your agent along with instructions to run it. Under the examples folder, the agent contains datasource_example_csv.json and query_example_csv.json to get you started.
Highlights:
Datasource Configuration:
Parameter | Comments |
---|---|
name | Unique Datasource Name. |
datasource | Set value to csv |
url | File path. Example: /usr/local/examples/demoFile.txt |
quoteChar | Quote Character for String values, if applicable. Defaults to double quotes if not specified. |
delimChar | Delimiter character. Defaults to comma if not specified |
nullFindString | Optional, searches for fields to replace with nullReplaceWith. Useful to replace values such as 'NA', 'None' etc in a number field |
nullReplaceWith | Optional, Replaces the string specified in nullFindString with this |
columnTypes | Nested JSON that defines datatypes for fields. Defaults to String if a type is not specified. Datatypes supported: String, Date( |
Date Formats for Date Fields in columnTypes:
Letter | Date/Time Component |
---|---|
y | Year |
M | Month |
w | Week of Year |
W | Week in month |
D | Day in Year |
d | Day in Month |
F | Day of Week in Month |
a | AM/PM marker |
H | Hour in day (0-23) |
h | Hour in am/pm (1-12) |
m | Minute in hour |
s | Second in minute |
S | Millisecond |
z | Time zone |
Z | Time zone |
Example:
... "columnTypes":{ "Week":"Date(MM/dd/yy)", "Date":"Date(MM/dd/yy HH:mm)" } ...
Query Configuration:
Query Config Params | Comments |
---|---|
entityName | Dataset Name Identifier |
identifier | A unique identifier for the dataset. Either identifier or entityName must be specified. |
dsName | Name of the datasource name configured in the datasource_XXX.json file to execute the query against. Required. |
queryStr | Cloud9QL query to extract and manipulate data using. Use select * to select all. See Cloud9QL docs |
runOnSourceChange | The query is run when the file has been modified, if set to true. Possible values: true or false. |
frequencyType | Alternative to runOnSourceChange, pick up the file on a schedule. One of minutes, hours, days,weeks,months. If this is not specified, this is treated as a one time query, executed upon Cloud9Agent startup (or when the query is first saved) |
frequency | Indicates the frequency, if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes |
startTime | Optional, can be used to specify when the query should be run for the first time. If set, the the frequency will be determined from that time onwards. For example, is a weekly run is scheduled to start at 07/01/2014 13:30, the first run will run on 07/01 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported Date Formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss,HH:mm,mm |
overrideVals | This enables data storage strategies to be specified. If this is not defined, the results of the query is added to the existing dataset. To replace all data for this dataset within Knowi, specify {"replaceAll":true}. To upsert data specify "replaceValuesForKey":["fieldA","fieldB"]. This will replace all existing records in Knowi with the same fieldA and fieldB with the the current data and insert records where they are not present. |
Datasource Example:
[
{
"name":"demoCSV",
"url":"/usr/local/examples/demo-data.csv",
"datasource":"csv",
"quoteChar":"\"",
"delimChar":",",
"eolSymbols":"\n",
"nullFindString":"None",
"nullReplaceWith":"0",
"columnTypes":{
"Week":"Date(MM/dd/yy)",
"Date":"Date(MM/dd/yy HH:mm)"
}
}
]
Query Example:
[
{
"entityName":"CSV Demo",
"dsName":"demoCSVFile",
"queryStr":"select count(*) where someField like abc",
"runOnSourceChange":true,
"overrideVals":{
"replaceAll":true
}
}
]
Highlights:
Datasource Configuration:
Parameter | Comments |
---|---|
name | Unique Datasource Name. |
datasource | Set value to json |
url | File path. Example: /usr/local/examples/demoFile.json |
Query Configuration:
Query Config Params | Comments |
---|---|
entityName | Dataset Name Identifier |
identifier | A unique identifier for the dataset. Either identifier or entityName must be specified. |
dsName | Name of the datasource name configured in the datasource_XXX.json file to execute the query against. Required. |
queryStr | Cloud9QL query to extract and manipulate data using. Use select * to select all. See Cloud9QL docs. Nested queries and objects are supported. | .
runOnSourceChange | Optional. The query is run when the file has been modified, if set to true. Possible values: true or false. |
frequencyType | Alternative to runOnSourceChange, pick up the file on a schedule. One of minutes, hours, days,weeks,months. If this is not specified, this is treated as a one time query, executed upon Cloud9Agent startup (or when the query is first saved) |
frequency | Indicates the frequency, if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes |
startTime | Optional, can be used to specify when the query should be run for the first time. If set, the the frequency will be determined from that time onwards. For example, is a weekly run is scheduled to start at 07/01/2014 13:30, the first run will run on 07/01 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported Date Formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss,HH:mm,mm |
overrideVals | This enables data storage strategies to be specified. If this is not defined, the results of the query is added to the existing dataset. To replace all data for this dataset within Knowi, specify {"replaceAll":true}. To upsert data specify "replaceValuesForKey":["fieldA","fieldB"]. This will replace all existing records in Knowi with the same fieldA and fieldB with the the current data and insert records where they are not present. |
Datasource Example:
[
{
"name":"demoJSON",
"url":"examples/demo-data.json",
"datasource":"json"
},
{
"name":"demoJSON-Nested",
"url":"examples/demo-nested.json",
"datasource":"json"
}
]
Query Example:
[
{
"entityName":"JSON Demo - 1",
"dsName":"demoJSON",
/* Using Cloud9SQL to query JSON file */
"queryStr":"select sum(sent) as Sent, customer as Customer group by Customer",
"runOnSourceChange":true,
"overrideVals":{
"replaceAll":true
}
},
{
"entityName":"JSON Nested Demo",
"dsName":"demoJSON-Nested",
"queryStr":"select nestedObj.a, nestedArr[0], sent where nestedArr[0]=150",
"runOnSourceChange":true,
"overrideVals":{
"replaceAll":true
}
}
]
Highlights:
Note: The examples folder of the agent contains datasourceexampleregex.json and queryexampleregex.json sample configurations.
Datasource Configuration:
Parameter | Comments |
---|---|
name | Unique Datasource Name. |
datasource | Set value to regex |
url | File path. Example: /usr/local/examples/demoFile.json |
logFormat | Format of the log entries to process.
|
Query Configuration:
Query Config Params | Comments |
---|---|
entityName | Dataset Name Identifier |
identifier | A unique identifier for the dataset. Either identifier or entityName must be specified. |
dsName | Name of the datasource name configured in the datasource_XXX.json file to execute the query against. Required. |
queryStr | Cloud9QL query to extract and manipulate data using. Use select * to select all. See Cloud9QL docs. Nested queries and objects are supported. | .
runOnSourceChange | Optional. The query is run when the file has been modified, if set to true. Possible values: true or false. |
frequencyType | Alternative to runOnSourceChange, pick up the file on a schedule. One of minutes, hours, days,weeks,months. If this is not specified, this is treated as a one time query, executed upon Cloud9Agent startup (or when the query is first saved) |
frequency | Indicates the frequency, if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes |
startTime | Optional, can be used to specify when the query should be run for the first time. If set, the the frequency will be determined from that time onwards. For example, is a weekly run is scheduled to start at 07/01/2014 13:30, the first run will run on 07/01 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported Date Formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss,HH:mm,mm |
overrideVals | This enables data storage strategies to be specified. If this is not defined, the results of the query is added to the existing dataset. To replace all data for this dataset within Knowi, specify {"replaceAll":true}. To upsert data specify "replaceValuesForKey":["fieldA","fieldB"]. This will replace all existing records in Knowi with the same fieldA and fieldB with the the current data and insert records where they are not present. |
Datasource Example:
[
{
"name":"demoRegex",
"url":"examples/demo-regex.log",
"datasource":"regex",
"logFormat":"${date:DATE(MMM d HH:mm:ss)}${host:WORD()}\\S+:${msg}"
}
]
Query Example:
[
{
"entityName":"Sample Regex Query",
"dsName":"demoRegex",
"queryStr":"select date(date) as date, count(*) as count group by date(date)",
"runOnSourceChange":true,
"overrideVals":{
"replaceAll":true
}
}
]
Highlights:
Note: The examples folder of the agent contains datasourceexamplexml.json and queryexamplexml.json sample configurations.
Datasource Configuration:
Parameter | Comments |
---|---|
name | Unique Datasource Name. |
datasource | Set value to xml |
url | File path. Example: /usr/local/examples/demoFile.txt |
dataNode | The base node to traverse from |
columnTypes | Nested JSON that defines datatypes for fields. For nested elements, use the dot notation. Example: Count.Sent
Defaults to String if a type is not specified. Datatypes supported: String, Date( |
Date Formats for Date Fields in columnTypes:
Letter | Date/Time Component |
---|---|
y | Year |
M | Month |
w | Week of Year |
W | Week in month |
D | Day in Year |
d | Day in Month |
F | Day of Week in Month |
a | AM/PM marker |
H | Hour in day (0-23) |
h | Hour in am/pm (1-12) |
m | Minute in hour |
s | Second in minute |
S | Millisecond |
z | Time zone |
Z | Time zone |
Example:
... "columnTypes":{ "Week":"Date(MM/dd/yy)", "Date":"Date(MM/dd/yy HH:mm)", "Count.Sent":"Integer" } ...
Query Configuration:
Query Config Params | Comments |
---|---|
entityName | Dataset Name Identifier |
identifier | A unique identifier for the dataset. Either identifier or entityName must be specified. |
dsName | Name of the datasource name configured in the datasource_XXX.json file to execute the query against. Required. |
queryStr | Cloud9QL query to extract and manipulate data using. Use select * to select all. See Cloud9QL docs |
runOnSourceChange | The query is run when the file has been modified, if set to true. Possible values: true or false. |
frequencyType | Alternative to runOnSourceChange, pick up the file on a schedule. One of minutes, hours, days,weeks,months. If this is not specified, this is treated as a one time query, executed upon Cloud9Agent startup (or when the query is first saved) |
frequency | Indicates the frequency, if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes |
startTime | Optional, can be used to specify when the query should be run for the first time. If set, the the frequency will be determined from that time onwards. For example, is a weekly run is scheduled to start at 07/01/2014 13:30, the first run will run on 07/01 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported Date Formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss,HH:mm,mm |
overrideVals | This enables data storage strategies to be specified. If this is not defined, the results of the query is added to the existing dataset. To replace all data for this dataset within Knowi, specify {"replaceAll":true}. To upsert data specify "replaceValuesForKey":["fieldA","fieldB"]. This will replace all existing records in Knowi with the same fieldA and fieldB with the the current data and insert records where they are not present. |
Datasource Example:
[
{
"name":"demoXML",
"url":"examples/demo-data.xml",
"datasource":"xml",
"dataNode":"data.item",
"columnTypes":{
"id":"Integer",
"Week":"Date(MM/dd/yy)",
"Date":"Date(MM/dd/yy HH:mm)",
"Count.Sent":"Integer"
}
}
]
Query Example:
[
{
"entityName":"XML Demo - 1",
"dsName":"demoXML",
"queryStr":"select *",
"runOnSourceChange":true,
"overrideVals":{
"replaceAll":true
}
},
{
"entityName":"XML Demo - 2",
"dsName":"demoXML",
"queryStr":"select sum(Count.Sent) as Sent, Customer where recipients.recipient[0] = [email protected]",
"runOnSourceChange":true,
"overrideVals":{
"replaceAll":true
}
}
]
The first query returns the entire document as is with nested elements intact. The second traverses nested relationships to build a simple aggregation, based on specific conditions.
Highlights:
Datasource and Query configurations are the same as the CSV/JSON processing, except for a customProcessors attribute that enables FTP.
Example CSV Datasource using FTP to pull file:
[
{
"name":"pageViewsLogs",
"url":"/export/home/akjd/file.csv",
"datasource":"csv",
"quoteChar":"\"",
"delimChar":"|",
"eolSymbols":"\n",
"nullFindString":"None",
"nullReplaceWith":"0",
"columnTypes":{
"startCount":"int",
"endCount":"int"
},
"customProcessors":[
{
"className":"com.cloud9.connector.processors.FTPProcessor",
"processOrder":"timed",
"frequencyType":"days",
"frequency":1,
"startTime":"03:15",
"params":{
"host":"ftp.box.com",
"user":"[email protected]",
"password":"somePass",
"remoteFile":"/BoxDir/\${date:yyyyMMdd}/page_views_\${date:yyyyMMdd-1d}000000_\${date:yyyyMMdd}000000.csv",
"localFile":"/export/home/akjd/file.csv",
"overwriteOnlyWhenChanged":true
}
}
]
}
]
FTP Processor params configuration:
FTP Processor Parameter | Comments |
---|---|
host | FTP Host to connect to |
port | Port number, optional |
user | user to connect to the FTP site. Connects as anonymous if not specified |
password | Password to connect. Connects as anonymous if user is not specified |
mode | Optional security mode. Default is empty. Values: ftps or ftpes | remoteFile | Remote File Name. For files with dates in them, use \${date:dateFormat} to for current date. You can also add or subtract days or weeks. Example: \${date:yyyyMMdd-1d} |
localFile | Where to put the file pulled from the FTP site. Typically, this is set to the same as url parameter in the main datasource configuration |
overwriteOnlyWhenChanged | If set to true, overwrites local copy only if the remote file is different than the local copy. |
This can be used to turn existing text reports sent via email into dashboards widgets and dashboards, with storage and trends.
Highlights:
Contact us for the mailbox to CC your report into and we'll take care of the rest.
Highlights:
Datasource Configuration:
Parameter | Comments |
---|---|
name | Unique Datasource Name. |
datasource | Set value to excel |
url | File path. Example: /usr/local/examples/demoFile.txt |
Query Configuration:
Query Config Params | Comments |
---|---|
entityName | Dataset Name Identifier |
identifier | A unique identifier for the dataset. Either identifier or entityName must be specified. |
dsName | Name of the datasource name configured in the datasource_XXX.json file to execute the query against. Required. |
queryStr | Cloud9QL query to extract and manipulate data from the worksheet. Use select * to select everything from the worksheet. See Cloud9QL docs |
excelWorksheetName | Worksheet name to process data from |
runOnSourceChange | The query is run when the file has been modified, if set to true. Possible values: true or false. |
frequencyType | Alternative to runOnSourceChange, pick up the file on a schedule. One of minutes, hours, days,weeks,months. If this is not specified, this is treated as a one time query, executed upon Cloud9Agent startup (or when the query is first saved) |
frequency | Indicates the frequency, if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes |
startTime | Optional, can be used to specify when the query should be run for the first time. If set, the the frequency will be determined from that time onwards. For example, is a weekly run is scheduled to start at 07/01/2014 13:30, the first run will run on 07/01 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported Date Formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss,HH:mm,mm |
overrideVals | This enables data storage strategies to be specified. If this is not defined, the results of the query is added to the existing dataset. To replace all data for this dataset within Knowi, specify {"replaceAll":true}. To upsert data specify "replaceValuesForKey":["fieldA","fieldB"]. This will replace all existing records in Knowi with the same fieldA and fieldB with the the current data and insert records where they are not present. |
Datasource Example:
[
{
"name":"demoExcel",
"url":"/usr/local/abc/files/demo-data.xlsx",
"datasource":"excel"
}
]
Query Example:
[
{
"entityName":"Excel Data - Demo Sent",
"dsName":"demoExcel",
"queryStr":"select sum(sent) as Sent, Customer group by customer",
"runOnSourceChange":true,
"excelWorksheetName":"demo-data",
"overrideVals":{
"replaceAll":true
}
}
]
Highlights:
Datasource Example:
[
{
"name":"demoHttpPublic",
"url":"https://s3.amazonaws.com/datatesting/demo-data-public.csv",
"datasource":"http",
"dataType":"csv",
"localFile":"./downloadedFile.csv"
}
]
Query Example:
[
{
"entityName":"HTTP File Demo",
"dsName":"demoHttpPublic",
"queryStr":"select date(date) as Date, sum(sent) as Total Opened, sum(opened) as Opened group by date(date)",
"overrideVals":{
"replaceAll":true
}
}
]
Highlights:
For files that require authentication/authorization, create credentials from AWS:
Datasource Configuration:
Parameter | Comments |
---|---|
name | Unique Datasource Name. |
datasource | Set value to s3 |
s3RegionName | Region associated to the bucket. Example: us-east-1 |
s3BucketName | S3 Bucket Name/Identifier |
s3FileKey | S3 File Name |
datatype | File format. Supported: json/csv |
s3FileKey | S3 File Name |
datatype | File format. Supported: json/csv. Defaults to csv if not specified. |
localFile | Location and name of the local file upon download. Optional - if not specified, it'll pull same name as the S3 key into the base agent directory. |
authAccessToken | S3 Credentials key obtained for the user during the auth step |
authAccessTokenSecret | S3 Credentials secret obtained for the user using the auth step |
Query Configuration:
Query Config Params | Comments |
---|---|
entityName | Dataset Name Identifier |
identifier | A unique identifier for the dataset. Either identifier or entityName must be specified. |
dsName | Name of the datasource name configured in the datasource_XXX.json file to execute the query against. Required. |
queryStr | Cloud9QL query to extract and manipulate data from the worksheet. Use select * to select everything from the worksheet. See Cloud9QL docs |
runOnSourceChange | The query is run when the file has been modified, if set to true. Possible values: true or false. |
frequencyType | Alternative to runOnSourceChange, pick up the file on a schedule. One of minutes, hours, days,weeks,months. If this is not specified, this is treated as a one time query, executed upon Cloud9Agent startup (or when the query is first saved). |
frequency | Indicates the frequency, if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes. |
startTime | Optional, can be used to specify when the query should be run for the first time. If set, the the frequency will be determined from that time onwards. For example, is a weekly run is scheduled to start at 07/01/2014 13:30, the first run will run on 07/01 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported Date Formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss,HH:mm,mm |
overrideVals | This enables data storage strategies to be specified. If this is not defined, the results of the query is added to the existing dataset. To replace all data for this dataset within Knowi, specify {"replaceAll":true}. To upsert data specify "replaceValuesForKey":["fieldA","fieldB"]. This will replace all existing records in Knowi with the same fieldA and fieldB with the the current data and insert records where they are not present. |
Datasource Example:
[
{
"name":"demoS3CSV",
"datasource":"s3",
"s3RegionName":"us-east-1",
"s3BucketName":"datatesting",
"s3FileKey":"demo-data.csv",
"dataType":"csv",
"authAccessToken":"someToken",
"authAccessTokenSecret":"someSecret",
"localFile":"./downloadedFile.csv"
},
{
"name":"demoS3JSON",
"datasource":"s3",
"s3RegionName":"us-east-1",
"s3BucketName":"datatesting",
"s3FileKey":"demo-data.json",
"dataType":"json",
"authAccessToken":"someToken",
"authAccessTokenSecret":"someSecret"
}
]
For processing s3BucketName or s3FileKey that has a date based name, use ${date: <dateFormat>} to process based on current date.
Examples:
i. If the current date is 08/01/2014 and today's data is identified by 20140801-file.csv in S3, this can be defined as ${date:yyyyMMdd}-file.csv for processing.
ii. If the current date is 08/01/2014 and the file has multiple days worth of data, identified by 20140801-20140727-file.csv, define it as ${date:yyyyMMdd}-${date:yyyyMMdd-5d}-file.csv
Query Example:
[
{
"entityName":"S3 CSV",
"dsName":"demoS3CSV",
"queryStr":"select * limit 10",
"overrideVals":{
"replaceAll":true
}
},
{
"entityName":"S3 JSON",
"dsName":"demoS3JSON",
"queryStr":"select * where sent > 10000",
"overrideVals":{
"replaceAll":true
}
}
]
For public files on S3 that does not require S3 credentials, see our HTTP File Downloads section