
cloudera envelope - Checknulls Issue - fieldIndex on a Row without schema


New Contributor

 

Hi,

 

I have a DAT file as input, and I am running a dq check for nulls:

mydata {
  input {
    type = filesystem
    path = "hdfs:///TEST.DAT"
    format = text
    header = true
    translator {
      type = delimited
      delimiter = "|"
      field.names = [RECORD_ID, REC_KEY]
      field.types = [string, string]
    }
  }
  print.schema.enabled = true
}

dqcheck {
  dependencies = [mydata]
  deriver {
    type = dq
    scope = row
    rules {
      r1 {
        type = checknulls
        fields = ["RECORD_ID", "REC_KEY"]
      }
    }
  }
}

It throws the error below:

 

java.lang.UnsupportedOperationException: fieldIndex on a Row without schema is undefined.
    at org.apache.spark.sql.Row$class.fieldIndex(Row.scala:342)
    at org.apache.spark.sql.catalyst.expressions.GenericRow.fieldIndex(rows.scala:166)
    at com.cloudera.labs.envelope.derive.dq.CheckForNullsRowRule.check(CheckForNullsRowRule.java:42)

 

 

 


Re: cloudera envelope - Checknulls Issue - fieldIndex on a Row without schema

Rising Star

Hi,

 

Thanks for sending in that error. You have hit a bug with the 'text' format of the Filesystem input. The good news is that this will be fixed in the Envelope 0.7.0 release that is due out very soon.

 

The better news is that you will probably find that the 'csv' format for the Filesystem input is easier to use for your data files. You should be able to adapt this for your pipeline:

 

steps {
  mydata {
    input {
      type = filesystem
      path = "TEST.DAT"
      format = csv
      separator = "|"
      field.names = [record_id, rec_key]
      field.types = [string, string]
    }
    print.data.enabled = true
  }

  dqcheck {
    dependencies = [mydata]
    deriver {
      type = dq
      scope = row
      rules {
        r1 {
          type = checknulls
          fields = ["record_id", "rec_key"]
        }
      }
    }
    print.data.enabled = true
  }
}
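
For anyone wondering why the error mentions a schema: the stack trace shows the checknulls row rule calling fieldIndex, which looks a field up by name, and Spark can only do that when the row carries a schema. Below is a rough Python sketch of that idea. It is not Envelope's or Spark's actual code; the Row class, field_index and check_nulls are hypothetical names made up for illustration, and NotImplementedError stands in for Java's UnsupportedOperationException.

```python
from typing import Optional, Sequence


class Row:
    """Toy stand-in for a Spark Row: values plus an optional list of field names."""

    def __init__(self, values: Sequence[object], schema: Optional[Sequence[str]] = None):
        self.values = list(values)
        self.schema = list(schema) if schema is not None else None

    def field_index(self, name: str) -> int:
        # Without a schema there is no way to map a field name to a position.
        if self.schema is None:
            raise NotImplementedError("fieldIndex on a Row without schema is undefined.")
        return self.schema.index(name)


def check_nulls(row: Row, fields: Sequence[str]) -> bool:
    """Return True when none of the named fields is null (None)."""
    return all(row.values[row.field_index(f)] is not None for f in fields)


with_schema = Row(["1", None], schema=["record_id", "rec_key"])
without_schema = Row(["1", None])

print(check_nulls(with_schema, ["record_id"]))             # True
print(check_nulls(with_schema, ["record_id", "rec_key"]))  # False
try:
    check_nulls(without_schema, ["record_id"])
except NotImplementedError as err:
    print(err)
```

In this sketch the same values pass or fail the check only when field names are attached to the row, which is the situation the stack trace points to for rows coming out of the 'text' format.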

Jeremy


Re: cloudera envelope - Checknulls Issue - fieldIndex on a Row without schema

Master Collaborator
What is Envelope? I noticed it as a concept two years ago. Is it part of the "product"?

Re: cloudera envelope - Checknulls Issue - fieldIndex on a Row without schema

New Contributor

Thanks, Jeremy. We will try csv.