Member since: 02-01-2017
Posts: 42
Kudos Received: 1
Solutions: 0
02-07-2023
04:47 AM
Indeed, there is no metadata tab. It is really annoying that it is nearly impossible to find your own queries in this view, and the filtering options are not great either.
10-06-2019
03:13 PM
@gimp077 , did you mean that "REFRESH" takes some time, and eventually you can see the updated data, just with a delay? How big is the table, in terms of number of partitions and number of files in HDFS? Eric
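If it helps, here is a rough way to check (just a sketch; the database, table, and warehouse path below are placeholders for your own):
  impala-shell -q "SHOW PARTITIONS my_db.my_table"         # number of partitions
  hdfs dfs -count /user/hive/warehouse/my_db.db/my_table   # directory count, file count, total bytes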
09-18-2019
12:01 AM
You can do an ALTER like I mentioned before: ALTER TABLE test CHANGE col1 col1 int COMMENT 'test comment'; But I do not think you can remove the comment entirely; you can only empty it. Cheers Eric
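If the goal is just to blank the comment out, something along these lines should do it (same hypothetical test table and column type as above):
  ALTER TABLE test CHANGE col1 col1 int COMMENT '';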
03-14-2018
03:11 PM
Try hdfs dfs -ls /
02-27-2018
04:42 AM
Hi @gimp077 I think there are two ways to do it: 1- You can put the output of the Impala query into HDFS, once you have it in a local file, with an HDFS put command: sudo -u hdfs hdfs dfs -put "${3}" hdfs_path 2- You can use a direct insert into a result_table (stored in HDFS) just before your select statement: INSERT INTO result_table YOUR_QUERY
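Roughly, the two options look like this (a sketch only; the file paths, table names, and the query itself are placeholders):
  # option 1: dump the query result to a local file, then push that file into HDFS
  impala-shell -B -q "SELECT col1, col2 FROM my_table" -o /tmp/result.txt
  sudo -u hdfs hdfs dfs -put /tmp/result.txt /user/hdfs/results/
  -- option 2: have Impala write the result straight into an HDFS-backed table
  INSERT INTO result_table SELECT col1, col2 FROM my_table;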
10-10-2017
08:10 PM
Another option I forgot to mention: if your table is partitioned, and your insert query uses dynamic partitioning, it will generate 1 file per partition: insert into table2 partition(par1,par2) select col1, col2 .. colN, par1, par2 from table1; ... again up to the max parquet file size currently set, so you can play with that max to achieve 2 files per partition. https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_partitioning.html#partition_static_dynamic
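For example, lowering the Parquet file size limit in the same session before running the insert is one way to push each partition toward more, smaller files (a sketch; the byte value is only an illustration, tune it against your actual partition sizes):
  SET PARQUET_FILE_SIZE=134217728;  -- e.g. 128 MB per file instead of the current maximum
  insert into table2 partition(par1,par2) select col1, col2, ..., colN, par1, par2 from table1;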
08-23-2017
08:59 PM
2 Kudos
https://issues.apache.org/jira/browse/IMPALA-1570 That feature has been available since Impala 2.8 (CDH 5.11).
04-03-2017
04:51 PM
Try adding some arguments to your Oozie run command, like so: $ oozie job -oozie http://localhost:11000/oozie -config job.properties -run If those changes don't work for you, you might try the following: Put your job.properties out in HDFS in the same directory as your workflow, then use the Hue File Browser to execute the workflow and see if that works. To do that, just check the workflow.xml and a button will appear for you to take an action such as submit. Reduce your workflow down to a simple email action, then test... add the SSH action, then test... keep adding and testing along the way. If things fail at the first and most simple test (the email action), then we've eliminated the other actions as the culprit, and likely quite a few of your job.properties variables too.
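For example, something along these lines should put the properties file next to the workflow (the HDFS path is just a placeholder for your own workflow directory):
  hdfs dfs -put job.properties /user/yourname/apps/my-workflow/
  hdfs dfs -ls /user/yourname/apps/my-workflow/   # should now list workflow.xml and job.properties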
02-16-2017
12:38 PM
I don't know specifically, but yes, it is most likely because the libraries used were not built for a distributed system. For instance, if you had three executors running the code in the library, then all three would be reading from the SFTP site and directory, all vying for the same files and copying them to the destination. It would be a mess.
02-15-2017
06:46 AM
Thanks for the response, really good and detailed. Could you give a little bit of a lower-level response as well, say, how would I add data from a DataFrame in Spark to a table in Hive efficiently? The goal is to improve the speed by using Spark instead of Hive or Impala for DB insertions. Thanks.
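To make the question concrete, here is roughly what I am after (just a sketch, assuming Spark 2.x with Hive support enabled; the app name, source path, and table name are made up):
  import org.apache.spark.sql.SparkSession
  val spark = SparkSession.builder().appName("df-to-hive").enableHiveSupport().getOrCreate()
  val df = spark.read.parquet("/some/input/path")         // placeholder source data
  df.write.mode("append").insertInto("my_db.my_table")    // existing Hive table with matching columns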