vys
New Contributor
Posts: 1
Registered: ‎04-16-2019

cloudera envelope - Checknulls Issue - fieldIndex on a Row without schema

Hi,

I have a DAT file as input and I am running a DQ check for nulls:

mydata {
  input {
    type = filesystem
    path = "hdfs:///TEST.DAT"
    format = text
    header = true
    translator {
      type = delimited
      delimiter = "|"
      field.names = [RECORD_ID, REC_KEY]
      field.types = [string, string]
    }
  }
  print.schema.enabled = true
}

dqcheck {
  dependencies = [mydata]
  deriver {
    type = dq
    scope = row
    rules {
      r1 {
        type = checknulls
        fields = ["RECORD_ID", "REC_KEY"]
      }
    }
  }
}

It throws the error below:

java.lang.UnsupportedOperationException: fieldIndex on a Row without schema is undefined.
    at org.apache.spark.sql.Row$class.fieldIndex(Row.scala:342)
    at org.apache.spark.sql.catalyst.expressions.GenericRow.fieldIndex(rows.scala:166)
    at com.cloudera.labs.envelope.derive.dq.CheckForNullsRowRule.check(CheckForNullsRowRule.java:42)

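For readers hitting the same trace: it suggests the null check looks up each field by name on a row that carries values but no schema. The sketch below is a plain-Python model of that behaviour, not Spark or Envelope code. The `Row` class, `field_index` method, and `SchemalessRowError` are hypothetical names invented for illustration.

```python
class SchemalessRowError(Exception):
    """Hypothetical stand-in for the UnsupportedOperationException in the trace."""


class Row:
    """Toy model of a row: values plus an optional list of field names."""

    def __init__(self, values, field_names=None):
        self.values = list(values)
        # A row built without field names models a row without a schema.
        self.field_names = list(field_names) if field_names is not None else None

    def field_index(self, name):
        # Resolving a name to a position needs the schema; without it the
        # lookup cannot succeed, whatever values the row holds.
        if self.field_names is None:
            raise SchemalessRowError(
                "fieldIndex on a Row without schema is undefined."
            )
        return self.field_names.index(name)


with_schema = Row(["1", "K1"], ["RECORD_ID", "REC_KEY"])
without_schema = Row(["1", "K1"])

position = with_schema.field_index("REC_KEY")  # resolves by name

try:
    without_schema.field_index("REC_KEY")
    lookup_failed = False
except SchemalessRowError:
    lookup_failed = True
```

In this model the values are identical in both rows; only the missing field names make the by-name lookup fail, which matches the wording of the error message.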
 

Cloudera Employee
Posts: 50
Registered: ‎08-26-2015

Re: cloudera envelope - Checknulls Issue - fieldIndex on a Row without schema

Hi,

Thanks for sending in that error. You have hit a bug with the 'text' format of the Filesystem input. The good news is that this will be fixed in the Envelope 0.7.0 release that is due out very soon.

The better news is that you will probably find that the 'csv' format for the Filesystem input is easier to use for your data files. You should be able to adapt this for your pipeline:

steps {
  mydata {
    input {
      type = filesystem
      path = "TEST.DAT"
      format = csv
      separator = "|"
      field.names = [record_id, rec_key]
      field.types = [string, string]
    }
    print.data.enabled = true
  }

  dqcheck {
    dependencies = [mydata]
    deriver {
      type = dq
      scope = row
      rules {
        r1 {
          type = checknulls
          fields = ["record_id", "rec_key"]
        }
      }
    }
    print.data.enabled = true
  }
}

Jeremy
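
As a rough illustration of what the pipeline above is expected to compute, here is a small plain-Python sketch rather than Envelope itself. It assumes a checknulls rule passes a row only when none of the listed fields is null, and that an empty pipe-delimited field is read as null. Both are assumptions, and `read_rows`, `check_nulls`, and the sample data are hypothetical.

```python
import csv
import io

FIELD_NAMES = ["record_id", "rec_key"]


def read_rows(text, separator="|"):
    """Parse pipe-delimited text into dicts keyed by FIELD_NAMES."""
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    rows = []
    for values in reader:
        # Assumption: an empty field stands for a null value.
        rows.append(
            {name: (value if value != "" else None)
             for name, value in zip(FIELD_NAMES, values)}
        )
    return rows


def check_nulls(row, fields):
    """Return True only if none of the listed fields is null."""
    return all(row.get(field) is not None for field in fields)


# Made-up sample data standing in for TEST.DAT.
sample = "1|A\n2|\n|C\n"
results = [check_nulls(row, FIELD_NAMES) for row in read_rows(sample)]
```

Here `results` holds one boolean per input row, so rows with a missing `record_id` or `rec_key` can be filtered or reported separately.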

Master
Posts: 430
Registered: ‎07-01-2015

Re: cloudera envelope - Checknulls Issue - fieldIndex on a Row without schema

What is Envelope? I noticed it as a concept two years ago. Is it part of the "product"?

New Contributor
Posts: 4
Registered: ‎04-19-2019

Re: cloudera envelope - Checknulls Issue - fieldIndex on a Row without schema

Thanks, Jeremy. We will try csv.