Member since: 03-23-2015
Posts: 1288
Kudos Received: 114
Solutions: 98

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 3339 | 06-11-2020 02:45 PM
 | 5042 | 05-01-2020 12:23 AM
 | 2841 | 04-21-2020 03:38 PM
 | 3556 | 04-14-2020 12:26 AM
 | 2337 | 02-27-2020 05:51 PM
11-14-2017
03:32 AM
Hi Eric, thanks once again. By updating the view I assume you mean running 'alter view' with the same complex SQL statement via the following process:
1) Run 'show tables' to get all the tables.
2) Run 'describe formatted' on each table to see which ones are views and extract the corresponding complex SQL.
3) For each view, check whether its complex SQL selects from any of the updated tables.
4) For each view that needs updating, run 'alter view' with the original complex SQL extracted in step 2.
I can code this up, but was wondering if there is another way.
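Something like the following rough PySpark sketch would implement those four steps (untested; 'mydb' and 'updated_table' are placeholders, and the layout of 'describe formatted' output differs between Hive/Spark versions, so the parsing below is approximate):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="refresh views")
hive_context = HiveContext(sc)

# Step 1: list every table/view in the database
for name in hive_context.tableNames("mydb"):
    # Step 2: describe it; views report a table type of VIRTUAL_VIEW
    rows = hive_context.sql("DESCRIBE FORMATTED mydb.%s" % name).collect()
    lines = ["\t".join(str(c) for c in r if c is not None) for r in rows]
    if not any("VIRTUAL_VIEW" in l for l in lines):
        continue
    view_sql = ""
    for l in lines:
        if "View Original Text" in l and ":" in l:
            view_sql = l.split(":", 1)[1].strip()
    # Step 3: does this view select from the table that was changed?
    if view_sql and "updated_table" in view_sql:
        # Step 4: re-issue the definition so the view picks up the new schema
        hive_context.sql("ALTER VIEW mydb.%s AS %s" % (name, view_sql))

sc.stop()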
11-03-2017
06:53 AM
1 Kudo
The timestamp column is not "suitable" for a partition column (unless you want thousands and thousands of partitions). What is suitable:
- create a Hive table on top of the current, non-partitioned data,
- create a second Hive table for hosting the partitioned data (the same columns + the partition column),
- then load the data from the first table into the second one using a query that "parses" the timestamp column and extracts a suitable value for the partition column (for example the year, or the year-and-month, ...).
Example:
INSERT INTO TABLE my_partitioned_table PARTITION (part_col_name)
SELECT *, year(to_date(my_timestamp_column))
FROM my_not_partitioned_table;
You don't have to put the partition value in the insert statement if you enable dynamic partitioning in Hive:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
On your sample it's not working properly because you didn't parse the timestamp column, you used it as is. Each unique value creates a partition, and for a timestamp almost every value is unique.
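For completeness, the same flow end-to-end (table and column names here are just examples; this reuses the PySpark HiveContext style seen elsewhere in this thread, but beeline or the Hive shell works just as well):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="partition by year")
hive_context = HiveContext(sc)

# 1) table on top of the existing, non-partitioned data
hive_context.sql("CREATE TABLE IF NOT EXISTS my_not_partitioned_table "
                 "(id INT, my_timestamp_column TIMESTAMP, payload STRING)")

# 2) second table with the same columns plus the partition column
hive_context.sql("CREATE TABLE IF NOT EXISTS my_partitioned_table "
                 "(id INT, my_timestamp_column TIMESTAMP, payload STRING) "
                 "PARTITIONED BY (part_col_name INT)")

# 3) enable dynamic partitions and load, deriving the partition value from the timestamp
hive_context.sql("SET hive.exec.dynamic.partition=true")
hive_context.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
hive_context.sql("INSERT INTO TABLE my_partitioned_table PARTITION (part_col_name) "
                 "SELECT *, year(to_date(my_timestamp_column)) FROM my_not_partitioned_table")

sc.stop()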
11-03-2017
05:31 AM
Thanks so much for the help!
11-02-2017
06:35 AM
1 Kudo
The solution was to put the python script in Hue -> Query -> Editor -> Spark, in the Libs field, with the full path of the python script. Example: Libs: /user/userxyz/myscript.py. Then run the query. Clicking the job_xxxxx link will show whether the script ran successfully or not.
11-01-2017
08:51 PM
This will not work:
set hivevar:tab_dt=substr(date_sub(current_date,1),1,10);
It only sets the variable hivevar:tab_dt to the string "substr(date_sub(current_date,1),1,10)", not to the value that results from evaluating the function call. You will need to get the date string outside of Hive and then pass it in as the variable. So the below will work:
set hivevar:tab_dt=2017_10_01;
create table test_${hivevar:tab_dt} (a int);
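For example, computing the date outside of Hive and passing it in with --hivevar (a rough Python sketch; the same thing is a one-liner in shell, and the table name is just the example from above):

import subprocess
from datetime import date, timedelta

# build the date string outside of Hive, e.g. 2017_10_01 for "yesterday"
tab_dt = (date.today() - timedelta(days=1)).strftime("%Y_%m_%d")

# pass it in with --hivevar; inside the query you can then use ${hivevar:tab_dt}
subprocess.check_call([
    "hive",
    "--hivevar", "tab_dt=%s" % tab_dt,
    "-e", "create table test_${hivevar:tab_dt} (a int);",
])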
10-29-2017
04:07 PM
The good news is that even though the shell script didn't work, I was able to run the same python script using the Spark HiveContext, via the Spark action in Hue->Workflow instead of the Shell action. The shell script is shexample7.sh:
-------------------------------------------------
#!/usr/bin/env bash
export PYTHONPATH=/usr/bin/python
export PYSPARK_PYTHON=/usr/bin/python
echo "starting..."
/usr/bin/spark-submit --master yarn-cluster pyexample.py

The python script is pyexample.py:
-----------------------------------------------
#!/usr/bin/env python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext("local", "pySpark Hive App")

# Create a Hive Context
hive_context = HiveContext(sc)

print "Reading Hive table..."
mytbl = hive_context.sql("SELECT * FROM xyzdb.testdata1")

print "Registering DataFrame as a table..."
mytbl.show()  # Show first rows of dataframe
mytbl.printSchema()

The python job successfully displays the data, but somehow the final status comes back as KILLED, even though the python script ran and got the data back from Hive in stdout.
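One possible cause of the KILLED status (just a guess, not verified against this job): the script hard-codes a "local" master and never stops the context, so YARN may not see a clean shutdown when the job is submitted in yarn-cluster mode. A small variation that lets spark-submit choose the master and stops the context explicitly:

#!/usr/bin/env python
from pyspark import SparkContext
from pyspark.sql import HiveContext

# let spark-submit --master yarn-cluster decide the master instead of hard-coding "local"
sc = SparkContext(appName="pySpark Hive App")
hive_context = HiveContext(sc)

mytbl = hive_context.sql("SELECT * FROM xyzdb.testdata1")
mytbl.show()
mytbl.printSchema()

# stop the context so YARN sees a clean shutdown
sc.stop()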
10-26-2017
03:55 AM
You might want to try mapping parquet tables by index, rather than by column name: SET parquet.column.index.access=true;
10-19-2017
08:14 PM
Glad that we identified the issue!
10-18-2017
06:30 PM
OK, thanks a lot for the very good info. Will get rid of Sqoop2. Thanks!!
10-18-2017
04:21 AM
1 Kudo
ACID is still not considered production ready at this stage by Cloudera, as ACID tables are still experimental. Please see our doc below: https://www.cloudera.com/documentation/enterprise/latest/topics/hive_ingesting_and_querying_data.html