About gimp077

gimp077 · ‎09-27-2019

Hi Eric, My table is partitioned I was expecting that after I do a refresh on the table I would see the most recent data in the table. However sometimes there is a lag from when the refresh completes to when I see the most recent data. I think invalidate metadata would fix this issue but it will be costly to run on a large table. Thanks

gimp077 · ‎09-25-2019

I have an Oozie Workflow where I am have a job job which loads some data into a table, I refresh the table in Impala and then have an Impala query to export the most recent data in this table to a CSV File. My Problem is that even after doing the Impala refresh I do not get the most recent data only the data for the previous load. For Example I have a process that starts running at 1pm spark job finishes at 1:15pm impala refresh is executed 1:20pm then at 1:25 my query to export the data runs but it only shows the data for the previous workflow which run at 12pm and not the data for the workflow which ran at 1pm. I am using Oozie and cdh 5.15.1. Sample Warning Message Read 972.32 MB of data across network that was expected to be local. Block locality metadata for table '..' may be stale. Consider running "INVALIDATE METADATA ... Thanks

gimp077 · ‎02-23-2018

Is it possible to save the output of an impala query to hdfs . Sample query impala-shell --ssl -i "${load_balancer}" -f "${2}" -o "${3}" Would like to have it saved not to local but to hdfs. Thanks

gimp077 · ‎10-10-2017

I am doing something like this create table test2 stored as parquet as select * from t1; And I would like to make sure that only 2 parquet files are created lets say is this possible somehow. As know there is no predictable threshold for how many files will be created. Thanks

gimp077 · ‎08-22-2017

In Impala 5.7 can I do compute incremental stats on dynamic partiitons like compute incremental stats table partition(id>1 and id<10) or with a where clause somewhere. I receive an error requires = identifer not allowed >. Is there way to compute stats for specific partitions and not others. Right now can only do compute incremental stats table partition(iid=1) Thanks

gimp077 · ‎07-28-2017

How do I add comments to a column in an impala table for specific columns after creating it. Thanks

gimp077 · ‎04-02-2017

Properties file # Environment settings queueName = default kerberos_realm = A jobTracker = B:8032 nameNode = hdfs://nameservice1 hive2_server = C hive2_port = 10000 impala_server = D:21000 edge_server = E jobTracker = yarnrm # Project specific paths projectPath = /user/${user.name}/oozie/mediaroom-logs keyTabLocation = /user/${user.name}/keytabs # job path oozie.wf.application.path = ${projectPath}/BXDB/wf # Project specific jars and other libraries oozie.libpath = ${projectPath}/lib,${projectPath}/util # Standard useful properties oozie.use.system.libpath = true oozie.wf.rerun.failnodes = true # Keytab specifics keyTabName = A.keytab keyTabUsername = A focusNodeLoginIng = A focusNodeLogin = A # Email notification list emailList = B xml file <workflow-app xmlns="uri:oozie:workflow:0.4" name="bxdb"> <global> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> </global> <credentials> <credential name="hive2_credentials" type="hive2"> <property> <name>hive2.jdbc.url</name> <value>jdbc:hive2://${hive2_server}:${hive2_port}/default</value> </property> <property> <name>hive2.server.principal</name> <value>hive/${hive2_server}@${kerberos_realm}</value> </property> </credential> </credentials> <start to="sshFileTransfer"/> <action name="sshFileTransfer"> <ssh xmlns="uri:oozie:ssh-action:0.1"> <host>${focusNodeLoginIng}</host>  <command>/A/B/EsdToHDFS.sh</command> <args>A</args> <args> B</args> <args> C</args> <capture-output /> </ssh> <ok to="process-bxdb"/> <error to="sendEmailDQ_SRC"/> </action>    <action name="process-bxdb"> <spark xmlns="uri:oozie:spark-action:0.2"> <master>yarn</master> <mode>cluster</mode> <name>Process BXDB</name> <class>IngestBXDB</class> <jar>bxdb_sproc_cataloguereport-1.0-SNAPSHOT.jar</jar> <spark-opts>--num-executors 8 --executor-cores 2 --executor-memory 4G --driver-memory 4g --driver-cores 2</spark-opts> <arg>${nameNode}/user/hive/warehouse/belltv_lnd.db/bxdb_sproc_cataloguereport</arg> <arg>Hello</arg> <arg>World</arg> </spark> <ok to="impala-refresh-iis"/> <error to="sendEmailDQ_SRC"/> </action>  <action name="impala-refresh-iis"> <shell xmlns="uri:oozie:shell-action:0.3"> <exec>impala-command.sh</exec> <argument>${keyTabName}</argument> <argument>${keyTabUsername}</argument> <argument>${impala_server}</argument> <argument>refresh belltv_expl.bxdb_sproc_cataloguereport</argument> <file>${nameNode}/${keyTabLocation}/${keyTabName}</file> </shell> <ok to="end"/> <error to="fail"/> </action> <action name="sendEmailDQ_SRC"> <email xmlns="uri:oozie:email-action:0.1"> <to>${emailList}</to> <subject>Error in the workflow please verify</subject> <body>BXDB project returned an error please verify</body> </email> <ok to="fail"/> <error to="fail"/> </action> <kill name="fail"> <message>"BXDB ingestion failure"</message> </kill> <end name='end'/> </workflow-app> command to run oozie job -abc.properties -run

gimp077 · ‎03-31-2017

I set up my workflow put it into hdfs as well and I try to run the conf directory the properties file with this syntax. I am really not sure why it is not working if I have a typo in my workflow.xml or job.properties or if I need to modify some config setting. Thanks Error message Here is the link to the error message, https://ibb.co/dkHnJv

gimp077 · ‎02-15-2017

I am not sure what you mean when you say metadata tab as I see no tab named metadata after clicking on the job. Thanks

gimp077 · ‎02-15-2017

thanks for the response really good and detailed could you give a little bit of a lower level response as well say how would I add data from a dataframe in spark to a table in hive effeciently. The goal is to improve the speed by using spark instead of hive or impala for db insertions thanks.

Online	Offline
Last Visited	‎09-27-2019 03:31 PM

Member Since	‎02-01-2017 10:40 AM
Last Visited	‎09-27-2019 03:31 PM
Posts	42
Kudos received	1

Cloudera Community

Re: After Impala Refresh Metadata is still stale

After Impala Refresh Metadata is still stale

Saving output of impala query to hdfs

Limit number of parquet files when doing an insert...

DROP / COMPUTE incremental stats with dynamic part...

Add comments to columns in an impala table

Re: Trying to run an oozie job receiving a null po...

Trying to run an oozie job receiving a null pointe...

Re: View SQL for Hive job

Re: converting hive sql to spark sql