Member since: 10-07-2015
Posts: 107
Kudos Received: 73
Solutions: 23

My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 2519 | 02-23-2017 04:57 PM |
| 1971 | 12-08-2016 09:55 AM |
| 8853 | 11-24-2016 07:24 PM |
| 3957 | 11-24-2016 02:17 PM |
| 9314 | 11-24-2016 09:50 AM |
05-10-2016
08:58 AM
Try: `select _c0 from xml_table`
05-09-2016
04:36 PM
Well, this is now bash programming. The part between the lines `cat << EOF` and `EOF` is a so-called "here document" that writes the actual Pig script.

Everything starting with `$` is a variable: `$0`, `$1`, ... are predefined in bash, with `$0` containing the script/program name, `$1` the first actual parameter, `$2` the second, and so on; `$@` expands to all provided parameters, joined with a space ' ' by default. Note: variables (e.g. FUN, TAB) are set without `$` and referenced with `$`.

So you can add any logic before `cat << EOF` to set variables based on your input parameters, and reference them inside the here document to produce the Pig script you want. For more, see e.g. http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html and http://www.tldp.org/LDP/abs/html/index.html (among many other resources).
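The mechanics above can be sketched in a few lines of bash (the parameter values `MAX` and `sw` are illustrative, mirroring the Pig example; `set --` just simulates how the script would receive them):

```shell
#!/bin/bash
# Demonstrate bash positional parameters and a "here document".
set -- MAX sw              # simulate calling the script as: ./test.sh MAX sw
FUN=$1                     # set a variable: no $ on the left-hand side
TAB=$2                     # reference it later with $FUN / $TAB
SCRIPT=$(cat << EOF
result = $FUN(iris.$TAB);
all parameters: $@
EOF
)
echo "$SCRIPT"
```

Because the `EOF` delimiter is unquoted, `$FUN`, `$TAB`, and `$@` are expanded inside the here document, so the printed text is `result = MAX(iris.sw);` followed by `all parameters: MAX sw`.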
05-09-2016
02:02 PM
1 Kudo
Have you considered wrapping it into a shell script? Here is a simple example (test.sh):

```shell
#!/bin/bash
TMPFILE="/tmp/script.pig"
FUN=$1   # pass the pig function as first parameter
TAB=$2   # pass the column as second parameter
cat <<EOF > "$TMPFILE"
iris = load '/tmp/iris.data' using PigStorage(',')
    as (sl:double, sw:double, pl:double, pw:double, species:chararray);
by_species = group iris by species;
result = foreach by_species generate group as species, $FUN(iris.$TAB);
dump result;
EOF
pig -x tez "$TMPFILE"
```

You can call it e.g. as `bash ./test.sh MAX sw` to get the maximum of column "sw", or `bash ./test.sh AVG sl` to get the average of column "sl".
04-22-2016
03:27 PM
1 Kudo
As I understand https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions, Hive ACID (which is necessary for DELETE commands) only works on bucketed ORC tables. I would expect that even INSERT wouldn't work when you use 'transactional'='true' without being compliant with the mentioned prerequisites. If you want to have SQL on HBase, I would go for Apache Phoenix (https://phoenix.apache.org/).
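A table definition that meets those prerequisites would look roughly like this (table and column names are illustrative; the bucket count is arbitrary):

```sql
-- Hive ACID requires a bucketed, ORC-backed, transactional table
CREATE TABLE acid_demo (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```

Note that ACID operations also depend on the transaction manager being enabled on the server side (e.g. `hive.txn.manager` set to `org.apache.hadoop.hive.ql.lockmgr.DbTxnManager` and `hive.support.concurrency=true`), as described on the wiki page linked above.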
04-22-2016
03:08 PM
Set the environment variable HIVE_SERVER to your Hive server host name, or use: `beeline -u jdbc:hive2://this.is.my.hiveserver.name:10000/default?hive.execution.engine=tez -n hive`
04-22-2016
03:02 PM
You need to compile it first; spark-submit expects a jar file. See http://spark.apache.org/docs/latest/quick-start.html#self-contained-applications
04-22-2016
02:32 PM
And if you just want to quickly check whether Hive works, you can use beeline on the command line: `beeline -u jdbc:hive2://$HIVE_SERVER:10000/default?hive.execution.engine=tez -n hive`
04-22-2016
02:28 PM
Have you tried restarting ambari-server (on the command line, as root: `ambari-server restart`)? That is the JVM in which, I think, the Hive view runs ...
04-22-2016
02:10 PM
See http://spark.apache.org/docs/latest/quick-start.html#self-contained-applications ("Self-Contained Applications") for examples in all supported languages.
04-22-2016
01:22 PM
1 Kudo
Can you try to change that via the Ambari ZooKeeper config? Ambari overwrites any changes you make directly to the config files on the filesystem.