Member since
10-28-2016
12-18-2017
11:27 PM
I'm setting up Kafka MirrorMaker to transfer data between two clusters, and MirrorMaker does not seem to be able to pull data from the Producer cluster. Here is the command used:
$CONFLUENT/bin/kafka-mirror-maker --consumer.config $CONFLUENT/mprops/mmConsumer-qa.properties --producer.config $CONFLUENT/mprops/mmProducer-qa.properties --whitelist="mmtest" --new.consumer --num.streams 4
mmConsumer-qa.properties :
bootstrap.servers=localhost:7092, localhost:7082,localhost:7072
group.id=mmtest
client.id=mm_consumer
mmProducer-qa.properties :
bootstrap.servers=localhost:8092
acks=1
batch.size=100
client.id=mm_producer
linger.ms=5
Any ideas on how to debug or fix this?
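One thing worth ruling out, offered as a sketch rather than a confirmed diagnosis: with the new consumer, a group that has no committed offsets starts at the latest offset unless `auto.offset.reset` says otherwise, so MirrorMaker can look idle when the source topic only holds older messages. The helper below parses the consumer properties shown above and flags that case along with malformed broker addresses. `parse_properties` and `check_mm_consumer` are hypothetical names for illustration, not part of Kafka.

```python
def parse_properties(text):
    """Parse simple key=value lines; skips blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_mm_consumer(props):
    """Return human-readable warnings for a MirrorMaker consumer config."""
    warnings = []
    servers = [s.strip() for s in props.get("bootstrap.servers", "").split(",") if s.strip()]
    if not servers:
        warnings.append("bootstrap.servers is missing or empty")
    for server in servers:
        host, sep, port = server.rpartition(":")
        if not sep or not host or not port.isdigit():
            warnings.append(f"malformed broker address: {server!r}")
    if "group.id" not in props:
        warnings.append("group.id is not set")
    if "auto.offset.reset" not in props:
        warnings.append("auto.offset.reset is not set; a new group starts at the latest offset")
    return warnings


# The consumer properties from the post above
consumer_props = parse_properties("""
bootstrap.servers=localhost:7092, localhost:7082,localhost:7072
group.id=mmtest
client.id=mm_consumer
""")
for warning in check_mm_consumer(consumer_props):
    print(warning)
```

If the offset warning applies and the group has never committed offsets, setting `auto.offset.reset=earliest` in mmConsumer-qa.properties would make a fresh group read the topic from the beginning. If the group already has committed offsets, this setting has no effect and the cause lies elsewhere.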
Labels: Apache Kafka
12-14-2017
11:46 PM
thanks, that helps !
12-14-2017
09:27 PM
Hello - is there a way to get the YARN logs for an applicationId for a specific period? When I use the command
yarn logs -applicationId <applicationId> -log_files spark.log
it does not give me the older log files (e.g. log files that are 2 days old). Is there any way to get these log files without having to go to the consolidated YARN ResourceManager log files? By the way, the YARN log retention is set to 30 days in yarn-site.xml.
Another question: what is the -log_files option used for, and what values can I provide for it?
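Whatever `yarn logs` does return can also be narrowed to a time window after it is fetched. A minimal sketch, assuming each log line starts with a `yy/MM/dd HH:mm:ss` timestamp (the pattern in Spark's default log4j template; change `TS_FORMAT` and `TS_LENGTH` if spark.log uses a different layout). `filter_by_window` is a hypothetical helper, not part of YARN, and it does not explain why older files are missing from the output.

```python
from datetime import datetime

TS_FORMAT = "%y/%m/%d %H:%M:%S"  # assumed timestamp layout at the start of a line
TS_LENGTH = 17                   # len("17/12/12 09:27:00")


def filter_by_window(lines, start, end):
    """Yield lines whose timestamp falls within [start, end].

    Lines without a leading timestamp (such as stack-trace frames) follow
    the decision made for the most recent timestamped line.
    """
    keep = False
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
            keep = start <= stamp <= end
        except ValueError:
            pass  # continuation line: reuse the previous decision
        if keep:
            yield line


# Made-up sample lines, for illustration only
sample = [
    "17/12/12 09:00:00 INFO Driver: started",
    "17/12/13 10:15:00 ERROR Executor: task failed",
    "\tat org.example.Job.run(Job.java:42)",
    "17/12/14 21:27:00 INFO Driver: finished",
]
window = list(filter_by_window(sample, datetime(2017, 12, 13), datetime(2017, 12, 14)))
print("\n".join(window))
```

In practice the `lines` argument could be `sys.stdin`, with the output of the `yarn logs` command piped into the script.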
Labels: Apache YARN
12-12-2017
07:23 AM
@bkosaraju - I re-checked this, and the issue seems to occur when I include the column deviceid (bigint) in the query.
12-12-2017
07:22 AM
FYI - I re-checked this, and the issue seems to occur when I include the column deviceid (bigint) in the query.
12-12-2017
06:54 AM
@bkosaraju - this is the query that was run:
select deviceid, devicename, indicatorname, topic_k, partition_k, offset_k from powerpoll where year=2017 and month=12 and day=11 limit 5;
There is no column called format. Can you please clarify what you meant?
12-12-2017
12:52 AM
@bkosaraju, @mqureshi - any ideas on this ?
12-12-2017
12:32 AM
Hello - I'm getting an error when querying a Hive table (data in Parquet format):
select * from powerpoll_k1 where year=2017 and month=12 and day=11 limit 5;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 102.0 failed 4 times, most recent failure: Lost task 0.3 in stage 102.0 (TID 20602, msc02-jag-dn-016.uat.gdcs.apple.com): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
  at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:52)
  at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:274)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
  at org.apache.spark.scheduler.Task.run(Task.scala:86)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
The table DDL is shown below:
CREATE EXTERNAL TABLE `powerpoll_k1`(
  `topic_k` varchar(255),
  `partition_k` int,
  `offset_k` bigint,
  `timestamp_k` timestamp,
  `deviceid` bigint,
  `devicename` varchar(50),
  `deviceip` varchar(128),
  `peerid` int,
  `objectid` int,
  `objectname` varchar(256),
  `objectdesc` varchar(256),
  `oid` varchar(50),
  `pduoutlet` varchar(50),
  `pluginid` int,
  `pluginname` varchar(255),
  `indicatorid` int,
  `indicatorname` varchar(255),
  `format` int,
  `snmppollvalue` varchar(128) COMMENT 'value in kafka avsc',
  `time` double,
  `clustername` varchar(50) COMMENT 'rpp or power',
  `peerip` varchar(50))
COMMENT 'external table at /apps/hive/warehouse/amp.db/sevone_power'
PARTITIONED BY (`year` int, `month` int, `day` int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://application/apps/hive/warehouse/amp.db/power'
TBLPROPERTIES ('transient_lastDdlTime' = '1513022286')
Any ideas on what the issue is?
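The trace shows `Dictionary.decodeToLong` being called on a `PlainIntegerDictionary`, which suggests (an assumption, not something confirmed in this thread) that some Parquet files store a column as a 32-bit integer while the table declares it `bigint`. A minimal sketch of comparing declared Hive types against the physical types found in the files follows. The `file_types` mapping is hypothetical sample data; in practice it would come from inspecting the Parquet file footers. `find_mismatches` is likewise a hypothetical helper.

```python
# Hive type -> Parquet physical type it is normally stored as (partial map)
EXPECTED_PHYSICAL = {
    "int": "INT32",
    "bigint": "INT64",
    "double": "DOUBLE",
}


def find_mismatches(table_types, file_types):
    """Return (column, declared, expected, actual) for every column whose
    Parquet physical type differs from what the declared Hive type expects."""
    mismatches = []
    for column, declared in table_types.items():
        expected = EXPECTED_PHYSICAL.get(declared)
        actual = file_types.get(column)
        if expected and actual and expected != actual:
            mismatches.append((column, declared, expected, actual))
    return mismatches


# A few columns taken from the DDL above
table_types = {"partition_k": "int", "offset_k": "bigint", "deviceid": "bigint"}
# Hypothetical footer contents: deviceid written as INT32
file_types = {"partition_k": "INT32", "offset_k": "INT64", "deviceid": "INT32"}

for column, declared, expected, actual in find_mismatches(table_types, file_types):
    print(f"{column}: table declares {declared} ({expected}), file has {actual}")
```

If a mismatch like this turns up, the usual options are to declare the column with the type the files actually contain or to rewrite the files with the declared type; which one fits depends on how the data was produced.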
Labels: Apache Hive
12-11-2017
11:42 PM
@bkosaraju I'm getting the following error when querying the table, any ideas?
0: jdbc:hive2://msc02-jag-hve-002.uat.gdcs.ap> select deviceid, devicename, indicatorname, topic_k, partition_k, offset_k from powerpoll where year=2017 and month=12 and day=11 limit 5;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 86.0 failed 4 times, most recent failure: Lost task 0.3 in stage 86.0 (TID 19049, msc02-jag-dn-011.uat.gdcs.apple.com): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
  at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:52)
  at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:274)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
  at org.apache.spark.scheduler.Task.run(Task.scala:86)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
12-11-2017
10:52 PM
thanks @bkosaraju - that worked !