Member since: 04-09-2018
Posts: 35
Kudos Received: 0
Solutions: 0
04-29-2019
10:17 AM
@Andrew Lim Is there any way to convert nested XML to multiple CSVs? Attached is the nested XML file ad.xml.
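For reference, a minimal standalone Python sketch of the idea outside NiFi: walk the parent elements into one CSV, and the nested children into a second CSV keyed back to the parent. The element and field names here are hypothetical, since the attached ad.xml is not reproduced in the thread:

import csv
import xml.etree.ElementTree as ET

# Hypothetical layout: <records><record id="..."><date>...</date><item>...</item></record>...
root = ET.parse("ad.xml").getroot()

# First CSV: one row per parent <record> element.
with open("records.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["id", "date"])
    for rec in root.iter("record"):
        w.writerow([rec.get("id"), rec.findtext("date")])

# Second CSV: one row per nested <item>, keyed back to its parent record.
with open("items.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["record_id", "name", "value"])
    for rec in root.iter("record"):
        for item in rec.iter("item"):
            w.writerow([rec.get("id"), item.findtext("name"), item.findtext("value")])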
04-25-2019
09:42 AM
@Turker TUNALI Is there any way to do this XML-to-CSV conversion?
04-25-2019
09:41 AM
@Matt Burgess Is there any way to do this XML-to-CSV conversion?
04-23-2019
11:20 AM
Hi Andrew, this works for a single XML file, but I have nested XML. How can I convert nested XML to multiple CSVs?
04-22-2019
11:58 AM
Could you please share the NiFi template for this? I am not able to achieve this.
04-05-2019
09:13 AM
Hi Andrew, I am not able to convert XML to CSV by doing the inverse of this, as you suggested. Could you please provide the NiFi XML file so that I can upload the template and check?
04-02-2019
09:47 AM
After restarting the Kafka cluster, one of the nodes is not starting up and throws this error:

ERROR [KafkaServer id=1001] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.NoSuchMethodError: org.apache.kafka.common.metrics.Sensor.add(Lorg/apache/kafka/common/MetricName;Lorg/apache/kafka/common/metrics/MeasurableStat;)Z
at kafka.server.ClientQuotaManager.<init>(ClientQuotaManager.scala:180)
at kafka.server.QuotaFactory$.instantiate(QuotaFactory.scala:64)
at kafka.server.KafkaServer.startup(KafkaServer.scala:231)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:75)
at kafka.Kafka.main(Kafka.scala)
[2019-04-02 12:24:05,939] INFO [KafkaServer id=1001] shutting down (kafka.server.KafkaServer)
[2019-04-02 12:24:05,941] INFO [ZooKeeperClient] Closing. (kafka.zookeeper.ZooKeeperClient)
[2019-04-02 12:24:05,944] INFO [ZooKeeperClient] Closed. (kafka.zookeeper.ZooKeeperClient)
[2019-04-02 12:24:05,947] INFO [KafkaServer id=1001] shut down completed (kafka.server.KafkaServer)
[2019-04-02 12:24:05,948] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
Labels:
- Apache Kafka
03-25-2019
01:49 PM
We need to read the SQL Server transaction logs for CDC instead of hitting the DB over JDBC. Is that possible?
03-25-2019
05:54 AM
Labels:
- Apache NiFi
01-04-2019
11:12 AM
Has anyone found a solution for this?
12-21-2018
01:37 PM
@Sidharth Kumar How did you resolve this? Can you share the exact steps or commands? In my case it shows the following:

hive.server2.zookeeper.namespace=hiveserver2
hive.zookeeper.namespace=hive_zookeeper_namespace
11-27-2018
02:56 PM
I used the below command to connect the Hive Warehouse Connector:

spark-shell --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar

Then I executed the below query to create a table:

>>> spark.sql('create table C_DAMAGE_join stored as orc as select tdc.* from T_DAMAGE_CODE CD LEFT JOIN C_DAMAGE TDC ON CD.ID = TDC.DAMAGE_CODE_ID')

Now the issue is that the table was created in Hive with only metadata, without any data.
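For comparison, a hedged PySpark sketch of running the same CTAS through the HiveWarehouseSession API instead of plain spark.sql() (assumes HDP 3.x with the HWC assembly jar on the classpath and spark.sql.hive.hiveserver2.jdbc.url configured; whether this resolves the empty-table symptom is an assumption, not something confirmed in the thread):

from pyspark_llap import HiveWarehouseSession

# Build an HWC session so the statement executes in Hive's own catalog
# rather than Spark's separate catalog.
hive = HiveWarehouseSession.session(spark).build()
hive.executeUpdate(
    "create table C_DAMAGE_join stored as orc as "
    "select tdc.* from T_DAMAGE_CODE CD "
    "LEFT JOIN C_DAMAGE TDC ON CD.ID = TDC.DAMAGE_CODE_ID")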
Labels:
- Apache Hive
- Apache Spark
11-14-2018
11:31 AM
After setting up LLAP, I am not able to start HiveServer2 Interactive, and it throws the errors below.

WARN cli.LlapStatusServiceDriver: Watch mode enabled and got YARN error. Retrying..
LLAP status unknown
--------------------------------------------------------------------------------
WARN cli.LlapStatusServiceDriver: Watch mode enabled and got YARN error. Retrying..
WARN cli.LlapStatusServiceDriver: Watch mode enabled and got YARN error. Retrying..
WARN cli.LlapStatusServiceDriver: Watch mode enabled and got YARN error. Retrying..
raise Fail("Skipping START of Hive Server Interactive since LLAP app couldn't be STARTED.")
resource_management.core.exceptions.Fail: Skipping START of Hive Server Interactive since LLAP app couldn't be STARTED.
Labels:
- Apache Hive
- Apache YARN
08-14-2018
01:32 PM
It's working now. The format should be "(SELECT * FROM T_DISTRICT_TYPE_test) as abc". Without the alias it does not work.
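For context, a minimal sketch of that working call (assuming the SparkSession and the connection string from the earlier posts in this thread):

# spark.read.jdbc with a parenthesized subquery: the derived table must
# carry an alias ("as abc"), otherwise SQL Server rejects it.
query = "(SELECT * FROM T_DISTRICT_TYPE_test) as abc"
df = spark.read.jdbc(
    "jdbc:sqlserver://10.24.25.25;database=CORE_13_2_TEST;username=core;password=password",
    query)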
08-14-2018
12:40 PM
@Harald Berghoff I am not getting a clear syntax for this. The below code works fine for a table, but not for a SQL query. I want to load from a select query with a where condition, not the complete table. If possible, could you please modify the code to support a SQL query instead of tables?

import os
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext
from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = (SparkSession
    .builder
    .appName("data_import")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    .config("spark.sql.parquet.writeLegacyFormat", "true")
    .enableHiveSupport()
    .getOrCreate())

lst = ["T_DISTRICT_TYPE_test"]
for tbl in lst:
    df = spark.read.jdbc("jdbc:sqlserver://10.24.25.25;database=CORE_13_2_TEST;username=core;password=password", tbl)
    df.write.format("orc").save("/tmp/orc_query_output_" + tbl)
    df.write.mode('append').format('orc').saveAsTable(tbl)
08-14-2018
10:50 AM
Hi, I am running the below code to fetch data from SQL Server tables and load it into Hive tables.

import os
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext
from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = (SparkSession
    .builder
    .appName("data_import")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    .config("spark.sql.parquet.writeLegacyFormat", "true")
    .enableHiveSupport()
    .getOrCreate())

df = spark.read.jdbc("jdbc:sqlserver://10.24.25.25;database=CORE_13_2_TEST;username=core;password=password;table=(select * from T_DISTRICT_TYPE_test)")
df.write.mode('append').format('orc').saveAsTable(test)

But I am getting the below error while running this:

df = spark.read.jdbc("jdbc:sqlserver://10.24.25.25;database=CORE_13_2_TEST;username=core;password=password;table=(select * from T_DISTRICT_TYPE_test)")
TypeError: jdbc() takes at least 3 arguments (2 given)
Labels:
- Apache Spark
08-10-2018
07:15 AM
@Sandeep Nemuri This is also not working. Every time, a new record is inserted instead of updating the existing record, so the table in Hive now has two records with the same ID, both old and new.
08-09-2018
11:46 AM
How can I do an incremental update when loading data from SQL Server to Hive tables using Sqoop, without creating extra temporary tables? Incremental insert works using the options below:

--incremental append --check-column id --last-value 5

But update is not working using the below:

--incremental lastmodified --check-column UPDATE_DATE --last-value '2018-07-19 16:14:38'
Labels:
- Apache Hive
- Apache Sqoop
07-18-2018
11:59 AM
@Felix Albani There is still some issue. The tables exist in Hive, but I am not able to access them. It shows the error below when I do a select * from the table.

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1531811351810_0064_1_00, diagnostics=[Task failed, taskId=task_1531811351810_0064_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://sandbox-hdp.hortonworks.com:8020/apps/hive/warehouse/t_currency/part-00000-2feb31ba-70a4-40a0-a64f-e976b8dd587a-c000.snappy.parquet
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://sandbox-hdp.hortonworks.com:8020/apps/hive/warehouse/t_currency/part-00000-2feb31ba-70a4-40a0-a64f-e976b8dd587a-c000.snappy.parquet
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:674)
at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:633)
at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:405)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:124)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
07-17-2018
12:35 PM
Hi, the below code is not working in Spark 2.3, but it is working in 1.7. Can someone modify the code as per Spark 2.3?

import os
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = (SparkConf()
    .setAppName("data_import")
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.shuffle.service.enabled", "true"))
sc = SparkContext(conf=conf)
sqlctx = HiveContext(sc)
df = sqlctx.load(
    source="jdbc",
    url="jdbc:sqlserver://10.24.40.29;database=CORE;username=user1;password=Passw0rd",
    dbtable="test")

## this is how to write to an ORC file
df.write.format("orc").save("/tmp/orc_query_output")

## this is how to write to a hive table
df.write.mode('overwrite').format('orc').saveAsTable("test")

Error: AttributeError: 'HiveContext' object has no attribute 'load'
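For reference, a minimal Spark 2.3 sketch of the same flow (an assumption, not a tested answer from the thread: in Spark 2.x the SQLContext load() call is replaced by spark.read.format("jdbc")...load(), and the SQL Server JDBC driver jar must be on the classpath):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .appName("data_import")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    .enableHiveSupport()
    .getOrCreate())

# Spark 2.x replacement for sqlctx.load(source="jdbc", ...)
df = (spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://10.24.40.29;database=CORE")
    .option("dbtable", "test")
    .option("user", "user1")
    .option("password", "Passw0rd")
    .load())

## write to an ORC file
df.write.format("orc").save("/tmp/orc_query_output")
## write to a hive table
df.write.mode('overwrite').format('orc').saveAsTable("test")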
Labels:
- Apache Spark
07-10-2018
10:42 AM
Will it work for Spark version 2.3.0? Could you please update it for this version?
04-10-2018
02:25 PM
Where can I get the jar for com.ibm.spss.hive.serde2.xml.XmlInputFormat?
04-09-2018
01:50 PM
Hi Team, I am facing the same issue and am not able to fix it.

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask