Kafka – Part 4 – Data sets in Pega
In this blog article, we will cover the basics of the data set rule, using YouTube as the data source.
This tutorial is implemented using Pega Personal Edition 8.4, but the core concepts remain the same in higher versions as well.
It is recommended to go through previous blog articles on Kafka fundamentals before proceeding here. You can go through the series in order.
https://myknowtech.com/tag/kafka
What is a data set rule?
– The name speaks for itself. It is a collection or a set of data.
– The source data can be from different systems and can be of different formats.
– A data set is a rule instance that belongs to the Data Model category and is of class Rule-Decision-DataSet.
As the picture above shows, different types of data set rule are available, and they can source data from different systems such as Kafka, Hadoop, Amazon Kinesis, social network platforms, and database tables.
In the later part of the Kafka series, we will have a real business scenario to make use of the Kafka data set rule. For now, let’s have a fun scenario 🙂
Scenario: Create a YouTube data set to filter the YouTube videos that are related to Chelsea Football Club.
How to create a new data set rule?
Step 1: Create a new data set rule
Records -> Data Model -> Data Set -> Create new
You see that the data set types fall into five categories:
- DBM (database management)
- File system
- General category
- Social
- Stream
The configuration points of the data set rule vary based on the data set type.
I would recommend you go through the other OOTB decisioning data set rules to see different configuration points for different data set types.
For now, we will use the YouTube Channel.
Important note: When you use Social category, your applies to class should be Data-Social-YouTube/Facebook
Step 2: Fill out the configuration points in the YouTube data set.
Access details
In this block, you can provide the YouTube access details.
Google API key –
YouTube is part of Google's world, so we need a Google API key to access YouTube services.
Search on Google and you will find many good articles on how to get a Google API key.
In short, the steps are:
Log in to the Google developer console – Create a new project – Enable the YouTube Data API – Create credentials.
Input the API key in the data set rule form.
Retrieve Video URL & Retrieve comments – the names are self-explanatory. Use these checkboxes to retrieve the video URL and the comments.
For now, I am interested only in the video URL and not in the comments.
Keywords
In this block, you can add one or more keywords, that can be used to filter the YouTube videos.
As per my requirement, I am going to filter only the video metadata that has the keyword “Chelsea FC”.
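To get a feel for what a keyword search against YouTube looks like outside Pega, here is a minimal Python sketch that builds a search request URL for the public YouTube Data API v3 `search` endpoint. It is only an illustration and not how the Pega data set is implemented internally. The function name `build_search_url` and the placeholder key `YOUR_API_KEY` are assumptions made for this example, and the code only builds the URL without calling the API.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Public search endpoint of the YouTube Data API v3.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(api_key: str, keyword: str, max_results: int = 50) -> str:
    """Build a search URL for videos matching a keyword (hypothetical helper)."""
    params = {
        "part": "snippet",          # return basic video metadata
        "q": keyword,               # the search keyword, e.g. "Chelsea FC"
        "type": "video",            # restrict results to videos
        "maxResults": max_results,  # the API accepts values from 0 to 50
        "key": api_key,             # the Google API key created earlier
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# "YOUR_API_KEY" is a placeholder, not a real key.
url = build_search_url("YOUR_API_KEY", "Chelsea FC")

# Parse the URL back to confirm that the keyword was encoded correctly.
query = parse_qs(urlparse(url).query)
print(url)
print(query["q"])
```

With a real key, opening the printed URL in a browser should return a JSON response containing the matching video metadata.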
Authors
In this block, you can specify the user names whose videos or comments should be ignored.
On social platforms you will see all types of videos and comments, so it is always good to have an option to ignore a few authors in your data set.
For my use case, I keep it empty.
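The idea behind the Authors block can be sketched in a few lines of Python. This is only a conceptual illustration under stated assumptions: the function `drop_ignored_authors`, the case-insensitive matching, and the sample records are all made up for this example, and Pega may apply the filter differently.

```python
def drop_ignored_authors(records, ignored_authors):
    """Return only the records whose author is not on the ignore list.

    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    ignored = {name.casefold() for name in ignored_authors}
    return [
        record
        for record in records
        if record.get("Author", "").casefold() not in ignored
    ]


# Made-up sample records, only for illustration.
sample = [
    {"Author": "ChannelA", "Title": "Match highlights"},
    {"Author": "SpamChannel", "Title": "Click here"},
]

kept = drop_ignored_authors(sample, ["spamchannel"])
print([record["Author"] for record in kept])
```

An empty ignore list, as in my use case, leaves every record in place.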
Step 3: Save the rule.
Step 4: Run the rule manually.
In the run window, you can browse the first 50 results.
There you see the data set results for Chelsea FC videos 🙂
You see some interesting attributes like Author, authorhandle, publisherid etc. You can verify all those details in the Google YouTube data API documentation.
We saw before that data sets fall under different categories and types (like YouTube, Facebook etc).
In the App Explorer, navigate to Data-Admin-DataSet and check the child classes that inherit from it.
Apart from the available data set types, you see three more standard implementations, which Pega created as defaults.
Important note: Do not create any new data types in these three classes!!
- pxAdaptiveAnalytics – Represents adaptive inputs.
- pxEventSummary – Used to read and write data created in the event catalog.
- pxInteractionHistory – Represents the interaction history results
The interaction history data set is mostly used by the decisioning and marketing frameworks to make decisions.
In summary,
– A data set is a collection or set of data.
– A data set can source data from different systems/applications like Kafka, Hadoop, database tables, Kinesis, Facebook, YouTube etc.
– The configuration points for the data set rule vary based on the data set type.
– There are a lot of OOTB decisioning data sets available for your exploration.
Please explore the other data set type options on your own. Everything is easy when you spend some time investigating how it works. Happy learning.