Member since: 02-11-2019
81 Posts · 3 Kudos Received · 0 Solutions
11-11-2020
08:15 AM
I have a list of 100+ SQL count queries to run against a Hive data table, and I'm looking for the most efficient way to run them.
The queries are read at runtime from a list stored in another Hive table, generated by a different process.
The queries all look like these, each with a different, complex WHERE clause:
1. SELECT COUNT(1) AS count1 FROM MyTable WHERE (... complex where clause here ...)
2. SELECT COUNT(1) AS count1 FROM MyTable WHERE (... where clause here ...)
3. etc.
Environment:
Cloudera CDH 6.2
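One common pattern, since every query hits the same table: fold all the clauses into a single scan with conditional aggregation. A minimal Spark-Scala sketch, assuming the stored text is just the WHERE predicate and lives in a string column where_clause of a Hive table query_table (both names hypothetical; if full queries are stored, the predicate would need to be extracted first):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{expr, sum, when}

val spark = SparkSession.builder()
  .appName("batched-counts")
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._

// Read the generated WHERE clauses from the other Hive table.
val clauses = spark.table("query_table").select("where_clause").as[String].collect()

// Each clause becomes one conditional-sum column, so MyTable is
// scanned once instead of once per query.
val countCols = clauses.zipWithIndex.map { case (clause, i) =>
  sum(when(expr(clause), 1).otherwise(0)).as(s"count_$i")
}

spark.table("MyTable").agg(countCols.head, countCols.tail: _*).show(false)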
04-15-2020
12:27 PM
Thanks @pauldefusco. I would like to do it in Spark (Scala).
04-15-2020
11:30 AM
I have a source table like:

ID  USER   DEPT
1   User1  Admin
2   User1  Accounts
3   User2  Finance
4   User3  Sales
5   User3  Finance

I want to generate a DataFrame like this:

ID  USER   DEPARTMENT
1   User1  Admin,Accounts
2   User2  Finance
3   User3  Sales,Finance
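A minimal Spark-Scala sketch of one way to do this, using groupBy with collect_list. Note the output ID is re-sequenced with row_number to match the example, and collect_list does not guarantee element order:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_list, concat_ws, min, row_number}

val spark = SparkSession.builder().appName("concat-depts").getOrCreate()
import spark.implicits._

val src = Seq(
  (1, "User1", "Admin"), (2, "User1", "Accounts"),
  (3, "User2", "Finance"), (4, "User3", "Sales"), (5, "User3", "Finance")
).toDF("ID", "USER", "DEPT")

// Comma-join each user's departments, then re-number the rows
// by each user's first appearance, matching the desired output.
val result = src
  .groupBy($"USER")
  .agg(min($"ID").as("firstId"),
       concat_ws(",", collect_list($"DEPT")).as("DEPARTMENT"))
  .withColumn("ID", row_number().over(Window.orderBy($"firstId")))
  .select($"ID", $"USER", $"DEPARTMENT")

result.show()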
Labels: Apache Spark
03-21-2020
09:35 AM
Still struggling with this... see the exception stack below.

2020-03-21 12:27:31,694 ERROR [IPC Server handler 10 on 45536] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1584785234978_6403_m_000000_0 - exited : com.teradata.connector.common.exception.ConnectorException: index outof boundary
at com.teradata.connector.teradata.converter.TeradataConverter.convert(TeradataConverter.java:179)
at com.teradata.connector.common.ConnectorOutputFormat$ConnectorFileRecordWriter.write(ConnectorOutputFormat.java:111)
at com.teradata.connector.common.ConnectorOutputFormat$ConnectorFileRecordWriter.write(ConnectorOutputFormat.java:70)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at com.teradata.connector.common.ConnectorMMapper.map(ConnectorMMapper.java:134)
at com.teradata.connector.common.ConnectorMMapper.run(ConnectorMMapper.java:122)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
03-20-2020
09:37 AM
Thanks, EricL. At least I know it will work.
03-18-2020
01:04 PM
I'm trying to export data from an HDFS location to Teradata. I have created a table with the same schema in Teradata.

Export command:

sqoop export --connect jdbc:teradata://teradataserver/Database=dbname --username xxxx --password xxxx --table teradataTbl --export-dir /hdfs/parquet/files/path/

Exception:

2020-03-18 14:32:00,754 ERROR [IPC Server handler 3 on 41836] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1584475869533_13501_m_000002_0 - exited : com.teradata.connector.common.exception.ConnectorException: index outof boundary
at com.teradata.connector.teradata.converter.TeradataConverter.convert(TeradataConverter.java:179)
at com.teradata.connector.common.ConnectorOutputFormat$ConnectorFileRecordWriter.write(ConnectorOutputFormat.java:111)
at com.teradata.connector.common.ConnectorOutputFormat$ConnectorFileRecordWriter.write(ConnectorOutputFormat.java:70)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at com.teradata.connector.common.ConnectorMMapper.map(ConnectorMMapper.java:134)
at com.teradata.connector.common.ConnectorMMapper.run(ConnectorMMapper.java:122)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
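For what it's worth, a ConnectorException out of TeradataConverter.convert often accompanies a column-count or type mismatch between the source files and the target table; that's an assumption worth checking, not a confirmed diagnosis here. A hedged alternative sketch in Spark Scala that bypasses the connector entirely and writes the parquet data over plain JDBC (the URL, table, and credentials are the placeholders from the command above; the Teradata JDBC driver jar is assumed to be on the classpath):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-to-teradata").getOrCreate()

// Read the parquet files sqoop was pointed at.
val df = spark.read.parquet("/hdfs/parquet/files/path/")

// Append into the pre-created Teradata table over plain JDBC.
df.write
  .format("jdbc")
  .option("url", "jdbc:teradata://teradataserver/Database=dbname")
  .option("driver", "com.teradata.jdbc.TeraDriver") // driver jar assumed available
  .option("dbtable", "teradataTbl")
  .option("user", "xxxx")
  .option("password", "xxxx")
  .mode("append")
  .save()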
Labels: Apache Hive, Apache Sqoop
03-06-2020
07:52 AM
Thanks a million. I got the same issue after re-adding a host that had been removed prior to the upgrade from CDH 5.6 to CDH 6.2. Fixed it by deleting /var/lib/cloudera-scm-agent/cm_guid on the node.
01-27-2020
02:04 PM
Hi,
I'm getting the error below while trying to submit a Scala jar built in IntelliJ with Maven.
Spark version: 2.3.0
Scala version: 2.11.11
Command used:
spark2-submit --master="yarn" --deploy-mode="cluster" --queue root.myyarnqueue --executor-memory 12G --driver-memory 12G --class MyClassName /projects/myscala.jar arg_1 arg_2
Error message:

java.lang.ClassNotFoundException: MyClassName
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:239)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/01/27 15:54:47 INFO util.ShutdownHookManager: Shutdown hook called
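A common cause of this (an assumption here, not confirmed from the post): --class must be the fully qualified name, package included, and the class must actually end up in the jar. A minimal sketch of what the entry point would look like; com.example is a hypothetical package, so the submit flag would be --class com.example.MyClassName:

// Hypothetical package; spark-submit must reference it:
//   --class com.example.MyClassName
package com.example

object MyClassName {
  def main(args: Array[String]): Unit = {
    // Verifying the jar contents first also helps:
    //   jar tf /projects/myscala.jar | grep MyClassName
    println(s"started with args: ${args.mkString(", ")}")
  }
}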
Labels: Apache Spark
01-18-2020
10:39 PM
What is the most efficient way to get counts of records meeting different search criteria from a Hive table?
1. count all records where column-a is NULL
2. count all records where column-b is in [1, 3, 5]
3. count all records where column-c = 'xxx'
etc.
There are a couple hundred of these counts, in groups of 3 or 4.
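One pattern that keeps this to a single table scan: conditional aggregation, one SUM(CASE ...) per criterion. A minimal sketch via Spark SQL, where the table name my_table is a placeholder and the hyphens in the column names are assumed to be underscores in the real schema; the same statement also runs directly in Hive:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("criteria-counts").enableHiveSupport().getOrCreate()

// One scan of the table; each criterion becomes a conditional sum.
// Table and column names are guesses from the post.
spark.sql("""
  SELECT
    SUM(CASE WHEN column_a IS NULL      THEN 1 ELSE 0 END) AS a_is_null,
    SUM(CASE WHEN column_b IN (1, 3, 5) THEN 1 ELSE 0 END) AS b_in_1_3_5,
    SUM(CASE WHEN column_c = 'xxx'      THEN 1 ELSE 0 END) AS c_eq_xxx
  FROM my_table
""").show(false)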
Labels: Apache Hive
01-14-2020
07:43 AM
I'm having the same issue... everything works fine, but these config warnings show on all services in Cloudera Manager.