How do I specify which enrichment fields to use when enriching source data

I have loaded some data from Active Directory into HBase from a CSV file using flatfile_loader.sh. My AD data has 20+ columns, covering all the attributes relevant to user accounts in my environment. I am likely to need this enrichment data for several sensors, with different AD fields relevant in different cases. I have pasted the extractor JSON definition at the bottom of this post.

After reading a number of documents about enriching source data, it is still unclear to me how I would select specific fields for an enrichment. For example, for one sensor I might want to enrich with the physical address fields; for another I might want data on the organizational hierarchy (e.g. department and manager).

I have got as far as this with my sensor configuration:

{
  "enrichment": {
    "fieldMap": {
      "hbaseEnrichment": [
        "user"
      ]
    },
    "fieldToTypeMap": {
      "user": [
        "active_directory_by_user"
      ]
    },
    "config": {}
  },
  ...
}

What I want is to pull only the department field from the active_directory_by_user enrichment type. As far as I can tell, my current configuration will pull in all fields from the AD data, which I don't want: it would double the size of each event.
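
To make the difference concrete, here is a minimal Python sketch that models, rather than reproduces, the behavior described above: the AD record stored for one user is treated as a flat dict of column values, and an enrichment step copies either every column or only a chosen subset into the event. The helper name `enrich`, the `<field>.<type>.<column>` naming scheme, and the sample values are all hypothetical; Metron's own field naming and merge logic may differ.

```python
from typing import Dict, Iterable, Optional

# Hypothetical AD record for one user, as it might be returned for the
# active_directory_by_user enrichment type (trimmed to a few columns).
AD_RECORD: Dict[str, str] = {
    "cn": "Jane Doe",
    "company": "Example Corp",
    "department": "Finance",
    "description": "Accounts payable",
    "user": "jdoe",
}


def enrich(event: Dict[str, str], record: Dict[str, str],
           field: str, enrichment_type: str,
           wanted: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Return a copy of `event` with record columns added under an
    assumed '<field>.<enrichment_type>.<column>' naming scheme.

    If `wanted` is None, every column is copied (what the current config
    appears to do); otherwise only the listed columns are copied.
    """
    columns = record.keys() if wanted is None else wanted
    enriched = dict(event)
    for column in columns:
        if column in record:  # silently skip columns the record lacks
            enriched[f"{field}.{enrichment_type}.{column}"] = record[column]
    return enriched


event = {"user": "jdoe", "action": "logon"}
everything = enrich(event, AD_RECORD, "user", "active_directory_by_user")
dept_only = enrich(event, AD_RECORD, "user", "active_directory_by_user",
                   wanted=["department"])

print(len(everything) - len(event))  # 5 columns added
print(len(dept_only) - len(event))   # 1 column added
print(dept_only["user.active_directory_by_user.department"])  # Finance
```

With the full 20+ column AD record, the unfiltered case adds every column to each event, while the filtered case corresponds to the department-only result being asked for here.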

I can see in the documentation that Stellar can be used for more complex use cases (I was hoping this wasn't one of them), but it isn't clear how I would use it to select a specific field from the AD data.

I also see that the (currently empty) "config": {} section could be useful, but I haven't found out how to use it.

Can anyone help me get just the department field added from the AD data based on user name?


Extractor JSON

In case it helps, here is the JSON describing the CSV data, etc. used in the import to HBase:

{
  "config": {
    "columns": {
      "cn": 0,
      "company": 1,
      "department": 2,
      "description": 3,
      ...
      "user": 18,
      "userAccountControl": 19,
      "userPrincipalName": 20,
      "whenChanged": 21,
      "whenCreated": 22
    },
    "indicator_column": "user",
    "type": "active_directory_by_user",
    "separator": ","
  },
  "extractor": "CSV"
}
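
For readers less familiar with the extractor format, the following Python sketch approximates what a CSV extractor configured like the one above does with each line: split it on `separator`, use the `columns` map to name each position, take the value in `indicator_column` as the lookup key, and tag the result with `type`. This is an illustration under assumptions, not Metron's actual CSV extractor. The column layout below is a hypothetical four-column subset (the real mapping is partly elided above), and keeping the indicator column inside the value map is itself an assumption.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical, trimmed extractor config in the same shape as the post's.
CONFIG = {
    "columns": {"cn": 0, "company": 1, "department": 2, "user": 3},
    "indicator_column": "user",
    "type": "active_directory_by_user",
    "separator": ",",
}


def extract(line: str, config: dict) -> Tuple[str, str, Dict[str, str]]:
    """Turn one CSV line into (type, indicator, value map)."""
    # csv.reader copes with quoted fields such as "Doe, Jane".
    tokens: List[str] = next(csv.reader(io.StringIO(line),
                                        delimiter=config["separator"]))
    # Name each position using the columns map; every mapped column is kept.
    values = {name: tokens[idx] for name, idx in config["columns"].items()}
    indicator = values[config["indicator_column"]]
    return config["type"], indicator, values


etype, key, values = extract('"Doe, Jane",Example Corp,Finance,jdoe', CONFIG)
print(etype, key, values["department"])
# active_directory_by_user jdoe Finance
```

Every mapped AD column ends up in the value map stored for that user, which is consistent with the concern above that an unfiltered lookup brings back all 20+ fields.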