Member since: 03-23-2015
Posts: 1288
Kudos Received: 114
Solutions: 98
11-14-2017
03:32 AM
Hi Eric,
Thanks once again. By updating the view, I assume you mean running 'alter view' with the same complex SQL statement via the following process:
1) Run show tables to get all the tables.
2) Run describe formatted on each table to see which are views and extract the corresponding complex SQL.
3) For each view, iterate over the complex SQL and determine whether it selects from any of the updated tables.
4) For each view that needs updating, run alter view with the original complex SQL extracted in step 2.
I can code this up, but was wondering if there is another way.
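Steps 3 and 4 could be sketched roughly as follows. This is only a minimal illustration, assuming the view definitions from step 2 have already been collected into a dictionary; the view names, table names, and helper functions are hypothetical. The match is a simple token comparison, so an unqualified name such as `sales` will not match `db.sales`, and a table name inside a string literal would count as a false positive.

```python
import re

def views_to_refresh(view_sql, updated_tables):
    """Return the names of views whose SQL text mentions any updated table."""
    wanted = {t.lower() for t in updated_tables}
    hits = []
    for view, sql in view_sql.items():
        # Treat dotted identifiers (db.table) as single tokens, case-insensitively.
        tokens = {t.lower() for t in re.findall(r"[A-Za-z_][\w.]*", sql)}
        if tokens & wanted:
            hits.append(view)
    return sorted(hits)

def alter_view_statements(view_sql, updated_tables):
    """Build one ALTER VIEW ... AS statement per affected view."""
    return [
        f"ALTER VIEW {view} AS {view_sql[view]}"
        for view in views_to_refresh(view_sql, updated_tables)
    ]

# Hypothetical input: view name -> SQL extracted in step 2.
view_sql = {
    "v_sales": "SELECT id, amount FROM db.sales WHERE amount > 0",
    "v_users": "SELECT name FROM db.users",
}
updated = ["db.sales"]

statements = alter_view_statements(view_sql, updated)
for stmt in statements:
    print(stmt)
```

The generated statements would still have to be submitted to Hive separately, for example through beeline.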
11-03-2017
06:53 AM
1 Kudo
A timestamp column is not suitable as a partition column (unless you want thousands and thousands of partitions). A suitable approach is to:
- create a Hive table on top of the current non-partitioned data,
- create a second Hive table to host the partitioned data (the same columns plus the partition column),
- finally, load the data from the first table into the second one using a query that parses the timestamp column and extracts a suitable value for the partition column (for example the year, or the year and month).

Example:
INSERT INTO TABLE my_partitioned_table PARTITION (part_col_name) SELECT *, year(to_date(my_timestamp_column)) FROM my_not_partitioned_table;

You don't have to put the partition value in the insert statement if you enable dynamic partitioning in Hive:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

Your sample is not working properly because you didn't parse the timestamp column; you used it as is. Each unique value creates a partition, and for a timestamp almost every value is unique.
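To make the partition-count point concrete, here is a small sketch outside Hive. The sample data (one row every 7 minutes for roughly 90 days) is purely hypothetical; it only counts how many distinct partition values each choice of key would produce.

```python
from datetime import datetime, timedelta

def partition_counts(timestamps):
    """Count the distinct partition values each candidate key would create."""
    return {
        "raw_timestamp": len(set(timestamps)),
        "year_month": len({ts.strftime("%Y-%m") for ts in timestamps}),
        "year": len({ts.year for ts in timestamps}),
    }

# Hypothetical sample: one row every 7 minutes, starting 2017-01-01.
start = datetime(2017, 1, 1)
sample = [start + timedelta(minutes=7 * i) for i in range(90 * 24 * 60 // 7)]

counts = partition_counts(sample)
print(counts)
```

With this sample, the raw timestamp yields one partition per row, while the year-and-month key yields three partitions and the year key yields one.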
11-03-2017
05:31 AM
Thanks so much for the help!
11-02-2017
06:35 AM
1 Kudo
The solution was to put the Python script in the Libs field under Hue->Query->Editor->Spark, using the full path of the script, for example: Libs: /user/userxyz/myscript.py, and then run the query. Clicking the job_xxxxx link will show whether the script ran successfully.
11-01-2017
08:51 PM
This will not work:
set hivevar:tab_dt= substr(date_sub(current_date,1),1,10);
It only sets the variable hivevar:tab_dt to the literal string "substr(date_sub(current_date,1),1,10)", not to the result of evaluating the function call. You will need to get the date string outside of Hive and then pass it in as the variable. So the following will work:
set hivevar:tab_dt=2017_10_01;
create table test_${hivevar:tab_dt} (a int);
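One way to compute the date outside of Hive is a small wrapper script. The sketch below is only an assumption about how that could look: the script name create_table.hql is hypothetical, and the block only builds the command line (using the Hive CLI's --hivevar and -f options) without running it.

```python
from datetime import date, timedelta

def table_suffix(today):
    """Format yesterday's date as YYYY_MM_DD, which is safe in a table name."""
    return (today - timedelta(days=1)).strftime("%Y_%m_%d")

def hive_command(today, script="create_table.hql"):
    # "create_table.hql" is a hypothetical file containing:
    #   create table test_${hivevar:tab_dt} (a int);
    return ["hive", "--hivevar", f"tab_dt={table_suffix(today)}", "-f", script]

cmd = hive_command(date(2017, 10, 2))
print(" ".join(cmd))
# prints: hive --hivevar tab_dt=2017_10_01 -f create_table.hql
```

In a real job, `date.today()` would replace the fixed date, and the argument list could be handed to `subprocess.run`.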
10-29-2017
04:07 PM
The good news is that even though the shell script didn't work, I was able to run the same Python script with a Spark HiveContext by using the Spark action in Hue->Workflow instead of the Shell action.

The shell script is shexample7.sh:
-------------------------------------------------
#!/usr/bin/env bash
export PYTHONPATH=/usr/bin/python
export PYSPARK_PYTHON=/usr/bin/python
echo "starting..."
/usr/bin/spark-submit --master yarn-cluster pyexample.py

The python script is pyexample.py:
-----------------------------------------------
#!/usr/bin/env python
from pyspark import SparkContext
from pyspark.sql import HiveContext
sc = SparkContext("local", "pySpark Hive App")
# Create a Hive Context
hive_context = HiveContext(sc)
print "Reading Hive table..."
mytbl = hive_context.sql("SELECT * FROM xyzdb.testdata1")
print "Registering DataFrame as a table..."
mytbl.show() # Show first rows of dataframe
mytbl.printSchema()

The Python job successfully displays the data, but somehow the final status comes back as KILLED even though the script ran and got data back from Hive in stdout.
10-26-2017
03:55 AM
You might want to try mapping Parquet table columns by index rather than by column name: SET parquet.column.index.access=true;
10-19-2017
08:14 PM
Glad that we identified the issue!
10-18-2017
06:30 PM
ok thanks a lot for the very good info. Will get rid of Sqoop2 thanks!!
10-18-2017
04:21 AM
1 Kudo
Cloudera does not yet consider ACID production ready at this stage, as it is still experimental. Please see our doc below: https://www.cloudera.com/documentation/enterprise/latest/topics/hive_ingesting_and_querying_data.html