Member since: 02-15-2016
Posts: 17
Kudos Received: 4
Solutions: 0
12-16-2016
01:58 PM
Hi Gurus, it may not be a practical question, but I am wondering whether it is possible to load data into a bucketed (non-partitioned) table through INSERT OVERWRITE. I am getting a NullPointerException when I try to do so.

CREATE TABLE my_stg.mytable1 (
employee_id int,
employee_name string,
dept STRING,
country STRING
)
CLUSTERED BY (employee_id) INTO 256 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

set hive.enforce.bucketing = true;

INSERT OVERWRITE TABLE my_stg.mytable1
SELECT employee_id, employee_name, dept, country FROM my_stg.mytable;

FAILED: NullPointerException null

Thanks, Soumya
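For comparison, here is a minimal sketch of the sequence that normally works on Hive 1.x, using the table names from the post; the warehouse path in the last line is an assumption and depends on your configuration:

-- Bucketed writes must be enabled in the same session before the insert
-- (Hive 2.x enforces bucketing automatically and no longer needs this).
set hive.enforce.bucketing = true;

INSERT OVERWRITE TABLE my_stg.mytable1
SELECT employee_id, employee_name, dept, country
FROM my_stg.mytable;

-- Check that 256 bucket files were produced (warehouse path is an assumption):
dfs -ls /apps/hive/warehouse/my_stg.db/mytable1;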
Labels: Apache Hive
12-10-2016
09:55 AM
Thanks. I found the explanation in the Spark SQL programming guide (http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets): "Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail."
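In other words, a file like the employees example works once each line is a single self-contained JSON object (a sketch; employees_lines.json is an illustrative file name):

{"firstName":"John", "lastName":"Doe"}
{"firstName":"Anna", "lastName":"Smith"}
{"firstName":"Peter", "lastName":"Jones"}

>>> df = spark.read.json("/Users/soumyabrata_kole/Documents/spark_test/employees_lines.json")
>>> df.show()

df.show() should then report only the firstName and lastName columns, with no _corrupt_record.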
12-10-2016
07:18 AM
Hi All, I am trying to read a valid JSON file like the one below through Spark SQL.

{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}

My code is as follows:

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession \
... .builder \
... .appName("Python Spark SQL basic example") \
... .config("spark.some.config.option", "some-value") \
... .getOrCreate()
>>> df = spark.read.json("/Users/soumyabrata_kole/Documents/spark_test/employees.json")
>>> df.show()
+---------------+---------+--------+
|_corrupt_record|firstName|lastName|
+---------------+---------+--------+
| {"employees":[| null| null|
| null| John| Doe|
| null| Anna| Smith|
| null| Peter| Jones|
| ]}| null| null|
+---------------+---------+--------+
>>> df.createOrReplaceTempView("employees")
>>> sqlDF = spark.sql("SELECT * FROM employees")
>>> sqlDF.show()
+---------------+---------+--------+
|_corrupt_record|firstName|lastName|
+---------------+---------+--------+
| {"employees":[| null| null|
| null| John| Doe|
| null| Anna| Smith|
| null| Peter| Jones|
| ]}| null| null|
+---------------+---------+--------+
As per my understanding, there should be only two columns, firstName and lastName. Is that understanding wrong? Why is _corrupt_record appearing, and how can I avoid it?

Thanks and Regards, Soumya
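One possible workaround on Spark versions without multi-line JSON support is to read the whole file as a single record and parse it manually; a sketch, assuming each JSON file fits in memory on an executor:

>>> import json
>>> from pyspark.sql import Row
>>> # wholeTextFiles yields (path, content) pairs, one per file
>>> raw = spark.sparkContext.wholeTextFiles("/Users/soumyabrata_kole/Documents/spark_test/employees.json")
>>> employees = raw.flatMap(lambda kv: json.loads(kv[1])["employees"])
>>> df = employees.map(lambda e: Row(**e)).toDF()
>>> df.show()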
Labels: Apache Spark
03-01-2016
08:53 AM
2 Kudos
Hi, I was trying to load a file in Pig which contains data like:

{(3),(mary),(19)}
{(1),(john),(18)}
{(2),(joe),(18)}

The following command is failing:

A = LOAD 'data3' AS (B: bag {T: tuple(t1:int), F:tuple(f1:chararray), G:tuple(g1:int)});

What is the correct way to do it? Thanks, Soumya
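For reference, a Pig bag schema admits only a single tuple schema, so the three tuples cannot be typed individually. A minimal sketch that declares one field wide enough for every value (untested against this exact file):

-- One tuple schema covers all tuples in the bag; chararray can hold
-- both the numeric and the string values.
A = LOAD 'data3' AS (B: bag {T: tuple(t1: chararray)});
DUMP A;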
Labels: Apache Pig
02-18-2016
03:31 PM
1 Kudo
Thanks, Neeraj, for your answer. However, I could not find how to enable ACID transactions from the link https://hortonworks.app.box.com/files/0/f/2070270300/1/f_37967540402, and the other links present on that page are not working either. Could you please tell me the steps to enable ACID transactions? Thanks again! Soumya
02-18-2016
03:01 PM
1 Kudo
Hi Experts, I was trying to do insert, update, and delete in a Hive table. Though insert worked for me, update and delete did not. I set the following properties before executing any DDL/DML:

set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;

Then I created the following table:

CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC
TBLPROPERTIES ('transactional'='true');

The following insert worked:

INSERT INTO TABLE students
VALUES ('AA', 23, 1.28), ('BB', 32, 2.32);

The following update/delete are failing:

UPDATE students SET gpa = 3.12 WHERE name='AA';
DELETE FROM students WHERE age=32;

Could you please help me understand the issue? The Hive version is as below:

[hdfs@sandbox ~]$ hive --version
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/spark/lib/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/spark/lib/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Hive 1.2.1.2.3.2.0-2950
Subversion git://c66-slave-20176e25-6/grid/0/jenkins/workspace/HDP-2.3-maint-centos6/bigtop/build/hive/rpm/BUILD/hive-1.2.1.2.3.2.0 -r c67988138ca472655a6978f50c7423525b71dc27
Compiled by jenkins on Wed Sep 30 19:07:31 UTC 2015

Thanks, Soumya
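A note for anyone hitting this: per-session set commands are often not sufficient, because on HDP 2.3 the transaction manager and compactor settings generally need to live in hive-site.xml so that both HiveServer2 and the metastore pick them up. A minimal sketch of the relevant properties (standard Hive property names, values matching the post; verify against your own setup):

<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>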
Labels: Apache Hive