<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Kudu Columns Null Everywhere in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Columns-Null-Everywhere/m-p/90579#M12176</link>
    <description>&lt;P&gt;Mystery solved: &lt;STRONG&gt;the columns are not null&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;I am new to all this. What happened was that I ran queries in the Impala CLI like so:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;SELECT * FROM table_name WHERE pk_col = some_value;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;This returns the values of the PK followed by a lot of blank space, so I thought all of those columns were NULL.&lt;/P&gt;&lt;P&gt;Even this query behaves the same way:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;SELECT col1, col2 FROM table_name WHERE pk_col = some_value;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;The col1 value gets printed, but not col2. But this works as expected:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;SELECT col2 FROM table_name WHERE pk_col = some_value;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;What I am learning here is that Kudu is a columnar data store and (for some reason) you cannot query more than one column at a time.&lt;/P&gt;&lt;P&gt;But &lt;STRONG&gt;the data is there, and if you query it one column at a time, you can see it&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;I am not sure why Impala behaves this way when querying a Kudu table (i.e. throws no errors and gives no hint of what is going on), but I am new to all of this.&lt;/P&gt;</description>
    <pubDate>Thu, 16 May 2019 21:55:09 GMT</pubDate>
    <dc:creator>RaduManolescu</dc:creator>
    <dc:date>2019-05-16T21:55:09Z</dc:date>
    <item>
      <title>Kudu Columns Null Everywhere</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Columns-Null-Everywhere/m-p/90576#M12174</link>
      <description>&lt;P&gt;Using Spark, I am trying to copy a table from an Oracle DB into a Kudu-backed Impala table with the same structure. This is intended to be a 1-to-1 copy of the data from Oracle to Impala. All the rows have been copied, but every column except the PK (the primary key, which is also the hash partition key) is null everywhere. Why would that happen?&lt;/P&gt;
&lt;P&gt;I have extracted the Oracle schema of the source table and created a target Impala table with the same structure (same column names, converted to lower case, and a reasonable mapping of data types). The PK of the Oracle table is the PK and "PARTITION BY HASH" of the Impala table. The Impala table is "STORED AS KUDU".&lt;/P&gt;
&lt;P&gt;We used Spark to read the data from Oracle, then a KuduContext to insert it into the Impala table. No errors were raised, the row counts match, and we can find the same PK values in Oracle and in Impala. But in Impala, every column except the PK is null everywhere. The PK (which is populated correctly) is a NUMBER in Oracle and an int64 in Impala; other columns of the same type end up null. &lt;STRONG&gt;How can we troubleshoot this?&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Attaching the (anonymized) code we used. You can also see &lt;A href="https://stackoverflow.com/questions/56156382/spark-dataframe-cast-column-for-kudu-compatibility/56171112#56171112" target="_blank" rel="noopener"&gt;details at StackOverflow&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 14:23:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kudu-Columns-Null-Everywhere/m-p/90576#M12174</guid>
      <dc:creator>RaduManolescu</dc:creator>
      <dc:date>2022-09-16T14:23:43Z</dc:date>
    </item>
    <item>
      <title>Re: Kudu Columns Null Everywhere</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Columns-Null-Everywhere/m-p/90577#M12175</link>
      <description>&lt;P&gt;Attachments don't seem to work, so here is the code:&lt;/P&gt;&lt;PRE&gt;import com.myco.util.config.AppConfig
import com.myco.util.jdbc.{ExtractStructure, Table}
import org.apache.kudu.spark.kudu._
import org.apache.spark.sql.{Dataset, SparkSession}

object TableNameToKudu {

  def main(args: Array[String]): Unit = {
    val appConfig: AppConfig = AppConfig()
    val dataExport = appConfig.dataExport
    val dataModel: Map[String, Table] = ExtractStructure.dataModel(appConfig.config)
    val tableNameTable: Table = dataModel("table_name")
    val colNamesLower: Seq[String] = tableNameTable.columnNames.map(_.toLowerCase)
    val customSchema: String = tableNameTable.toSparkSchema.mkString(", ")

    val spark = SparkSession.builder.appName("Save TableName in Kudu format")
      .config("spark.master", "local")
      .config("spark.sql.warehouse.dir", "hdfs://server.name.myco.com:1234/user/hive/warehouse")
      .getOrCreate()
    val kuduContext = new KuduContext(appConfig.kuduMaster, spark.sparkContext)

    val minId = 1L
    val maxId = 400000000000L
    val step = (maxId - minId) / dataExport.getInt("numParts")

    val partitions = Range.Long(minId, maxId, step).map { start =&amp;gt;
      val end = start + step
      s"$start &amp;lt;= equity_uid AND equity_uid &amp;lt; $end"
    }.toArray

    val props = appConfig.jdbcProps
    props.put("customSchema", customSchema)
    val startTime = System.currentTimeMillis()

    // Need this import to get an implicit encoder to make this compile: ".as[case_class_for_table]"
    import spark.implicits._

    val df: Dataset[case_class_for_table] = spark.read
      .option("fetchsize", appConfig.fetchsize.toString)
      .option("driver", appConfig.jdbcDriver)
      .jdbc(appConfig.dbURL, "schema_name.table_name", partitions, props)
      .as[case_class_for_table]
    kuduContext.insertRows(df.toDF(colNamesLower: _*), "impala::schema_name.table_name")

    val endTime = System.currentTimeMillis()
    println("TOTAL TIME = " + (endTime - startTime))
  }
}&lt;/PRE&gt;</description>
      <pubDate>Thu, 16 May 2019 21:08:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kudu-Columns-Null-Everywhere/m-p/90577#M12175</guid>
      <dc:creator>RaduManolescu</dc:creator>
      <dc:date>2019-05-16T21:08:31Z</dc:date>
    </item>
    <item>
      <title>Re: Kudu Columns Null Everywhere</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Columns-Null-Everywhere/m-p/90579#M12176</link>
      <description>&lt;P&gt;Mystery solved: &lt;STRONG&gt;the columns are not null&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;I am new to all this. What happened was that I ran queries in the Impala CLI like so:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;SELECT * FROM table_name WHERE pk_col = some_value;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;This returns the values of the PK followed by a lot of blank space, so I thought all of those columns were NULL.&lt;/P&gt;&lt;P&gt;Even this query behaves the same way:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;SELECT col1, col2 FROM table_name WHERE pk_col = some_value;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;The col1 value gets printed, but not col2. But this works as expected:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;SELECT col2 FROM table_name WHERE pk_col = some_value;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;What I am learning here is that Kudu is a columnar data store and (for some reason) you cannot query more than one column at a time.&lt;/P&gt;&lt;P&gt;But &lt;STRONG&gt;the data is there, and if you query it one column at a time, you can see it&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;I am not sure why Impala behaves this way when querying a Kudu table (i.e. throws no errors and gives no hint of what is going on), but I am new to all of this.&lt;/P&gt;</description>
      <pubDate>Thu, 16 May 2019 21:55:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kudu-Columns-Null-Everywhere/m-p/90579#M12176</guid>
      <dc:creator>RaduManolescu</dc:creator>
      <dc:date>2019-05-16T21:55:09Z</dc:date>
    </item>
    <item>
      <title>Re: Kudu Columns Null Everywhere</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Columns-Null-Everywhere/m-p/90599#M12177</link>
      <description>&lt;P&gt;You can most certainly project more than one column at a time in an Impala query, be it from a table in Kudu or from HDFS. Based on your problem description, it almost sounds like a problem with your terminal, or with the impala-shell configuration. Have you looked at the&amp;nbsp;&lt;A href="https://www.cloudera.com/documentation/enterprise/latest/topics/impala_shell_options.html" target="_self"&gt;impala-shell configuration options&lt;/A&gt;? Maybe something there can help solve the problem.&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 04:52:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kudu-Columns-Null-Everywhere/m-p/90599#M12177</guid>
      <dc:creator>adar</dc:creator>
      <dc:date>2019-05-17T04:52:51Z</dc:date>
    </item>
  </channel>
</rss>

