Kafka – Part 4 – Data sets in Pega

December 10, 2024 Code Vault Curators

In this blog article, we will look at the basics of the data set rule, using YouTube as the data source.

This tutorial is implemented using Pega Personal Edition 8.4, but the core concepts remain the same in later versions.

It is recommended to go through previous blog articles on Kafka fundamentals before proceeding here. You can go through the series in order.

https://myknowtech.com/tag/kafka

What is a data set rule?

– The name speaks for itself. It is a collection, or set, of data.

– The source data can be from different systems and can be of different formats.

– A data set is a rule instance of the class Rule-Decision-DataSet and belongs to the Data Model category.

In the above picture, you can see the different types of data set rule that are available. They can source data from different systems such as Kafka, Hadoop, Amazon Kinesis, social network platforms, database tables, etc.

In the later part of the Kafka series, we will have a real business scenario to make use of the Kafka data set rule. For now, let’s have a fun scenario 🙂

Scenario: Create a YouTube data set to filter the YouTube videos that are related to Chelsea Football club.

How to create a new data set rule?

Step 1: Create a new data set rule

Records -> Data Model -> Data Set -> Create new

You can see that the data set types fall into the following categories:

DBM (database management)
File system
General category
Social
Stream

The configuration points of the data set rule vary based on the data set type.

I would recommend you go through the other OOTB decisioning data set rules to see different configuration points for different data set types.

For now, we will use the YouTube Channel.

Important note: When you use the Social category, the Applies to class should be the matching social class, Data-Social-YouTube or Data-Social-Facebook.

Step 2: Fill out the configuration points in the YouTube data set.

Access details

In this block, you can provide the YouTube access details.

Google API key –

YouTube is part of Google, so we need a Google API key to access YouTube services.

A quick Google search turns up many good articles on how to get a Google API key.

In short, the steps are:

Log in to the Google developer console -> Create a new project -> Enable the YouTube Data API -> Create credentials.

Input the API key in the data set rule form.
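To make the role of the API key more concrete, here is a minimal Python sketch that builds a request URL for the `search.list` method of the YouTube Data API v3. It only illustrates what such a key is used for; it is not how Pega calls YouTube internally, which this article does not cover. The helper name `build_search_url` and the placeholder key are assumptions made for the example.

```python
from urllib.parse import urlencode

# Public endpoint of the YouTube Data API v3 search.list method.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(api_key: str, keyword: str, max_results: int = 50) -> str:
    """Return a search.list URL for videos matching `keyword`.

    Hypothetical helper for illustration only; not part of Pega.
    """
    params = {
        "part": "snippet",          # ask for title, channel and publish date
        "q": keyword,               # search term, e.g. "Chelsea FC"
        "type": "video",            # restrict the results to videos
        "maxResults": max_results,  # search.list accepts values up to 50
        "key": api_key,             # the Google API key created above
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a real key.
    print(build_search_url("YOUR_API_KEY", "Chelsea FC"))
```

Sending a GET request to the resulting URL with a valid key should return a JSON list of matching videos, which is a quick way to check that the key works before entering it in the rule form.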

Retrieve Video URL & Retrieve comments – the names explain themselves. Use these checkboxes to retrieve the video URL and the comments.

For now, I am interested only in the video URL and not in the comments.
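As a rough illustration of where a video URL comes from, the sketch below derives a watch URL from one result item shaped like a YouTube Data API v3 `search.list` response. The sample item is made up, the helper name `video_url` is hypothetical, and whether Pega builds its video URL property this way is an assumption.

```python
from typing import Optional


def video_url(item: dict) -> Optional[str]:
    """Build a watch URL from one search.list result item.

    Returns None when the item carries no video ID, for example
    when the result is a channel or a playlist.
    """
    video_id = item.get("id", {}).get("videoId")
    if not video_id:
        return None
    return f"https://www.youtube.com/watch?v={video_id}"


# Made-up sample, shaped like a search.list result item.
sample_item = {
    "id": {"kind": "youtube#video", "videoId": "abc123XYZ_0"},
    "snippet": {"title": "Chelsea FC highlights", "channelTitle": "Some Channel"},
}

print(video_url(sample_item))  # https://www.youtube.com/watch?v=abc123XYZ_0
```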

Keywords

In this block, you can add one or more keywords that are used to filter the YouTube videos.

As per my requirement, I am going to keep only the video metadata that contains the keyword “Chelsea FC”.

Authors

In this block, you can specify the user names whose videos or comments are to be ignored.

Social platforms carry all kinds of videos and comments, so it is useful to be able to ignore certain authors in your data set.

For my use case, I keep it empty.
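The combined effect of the Keywords and Authors blocks can be pictured with a small Python sketch. It is only a conceptual model, assuming records are plain dictionaries with hypothetical `title` and `author` fields; how Pega applies these settings internally is not covered here.

```python
from typing import Iterable, List


def filter_records(
    records: Iterable[dict],
    keywords: List[str],
    ignored_authors: List[str],
) -> List[dict]:
    """Keep records that match any keyword and are not from an ignored author.

    Matching is case-insensitive. An empty keyword list keeps every title,
    and an empty ignore list blocks nobody.
    """
    wanted = [k.lower() for k in keywords]
    blocked = {a.lower() for a in ignored_authors}
    kept = []
    for record in records:
        if record.get("author", "").lower() in blocked:
            continue  # author is on the ignore list
        title = record.get("title", "").lower()
        if wanted and not any(k in title for k in wanted):
            continue  # no keyword found in the title
        kept.append(record)
    return kept


# Made-up sample records for illustration.
videos = [
    {"title": "Chelsea FC match highlights", "author": "FanChannel"},
    {"title": "Cooking pasta at home", "author": "FoodieTV"},
    {"title": "Chelsea FC transfer rumours", "author": "SpamUser"},
]

# Authors list left empty, as in this tutorial: both Chelsea FC videos are kept.
print(len(filter_records(videos, ["Chelsea FC"], [])))  # 2
```

Adding `"SpamUser"` to the ignore list would drop the third record as well, which is the kind of clean-up the Authors block is meant for.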

Step 3: Save the rule.

Step 4: Run the rule manually.

In the run window, you can browse the first 50 results.

There you see the data set results for Chelsea FC videos 🙂

You can see some interesting attributes such as Author, authorhandle and publisherid. You can verify these details against the Google YouTube Data API documentation.

We saw earlier that data sets fall under different categories and types (such as YouTube, Facebook, etc.).