About ChineduLB

Shu_ashu · ‎04-15-2020

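Hi @ChineduLB, you can use `.groupBy` with `concat_ws(",", collect_list(...))` to build the comma-separated values, and the `row_number` window function to generate `ID`.

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    // toDF works as-is in spark-shell; elsewhere, import spark.implicits._ first
    val df = Seq(("1","User1","Admin"),("2","User1","Accounts"),("3","User2","Finance"),("4","User3","Sales"),("5","User3","Finance")).toDF("ID","USER","DEPT")

    // The post uses w without defining it; ordering by USER is an assumption
    val w = Window.orderBy("USER")

    df.groupBy("USER")
      .agg(concat_ws(",", collect_list("DEPT")).alias("DEPARTMENT"))
      .withColumn("ID", row_number().over(w))
      .select("ID", "USER", "DEPARTMENT")
      .show()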

EricL · ‎03-22-2020

@ChineduLB , Did you mean that you got the same error while trying to export the sample data you provided earlier? Have you tried to update your driver in case it might be old? Cheers Eric

jsensharma · ‎01-31-2020

@ChineduLB Are you able to get this working?

tsk · ‎01-20-2020

@ChineduLB What is your exact query? You can write SQL count queries against a Hive table. In general, you can refer to the articles below:
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/performance-tuning/content/hive_prepare_to_tune_performance.html
https://www.qubole.com/blog/5-tips-for-efficient-hive-queries/
Thanks, Tamil Selvan K

ChineduLB · ‎12-19-2019

We don't have any indexes or collections in this cluster. We are just trying to use it for Solr Search functions now, and we were running the tutorial to validate the Solr service configuration. Any steps we can take to re-initialize the service as new and get rid of any leftover artifacts from 5.14 would also be good, as we don't have anything in the Search service yet. Regards

senthh · ‎12-17-2019

Please check whether the user ID 'solr' is a member of "supergroup"; if not, add solr to supergroup.

ChineduLB · ‎11-28-2019

I ended up creating a new column in a new data frame via withColumn and used a regex to populate the new column with the trimmed values. Thanks.
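The approach described above could look roughly like the following. This is only a sketch, not the poster's actual code: the column names `code` and `code_trimmed`, the sample values, and the exact regex are all assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace}

object StripLeadingZeros {
  // Assumed pattern: strip leading zeros, but leave one "0" if the value is all zeros
  val LeadingZeros = "^0+(?!$)"

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .appName("strip-leading-zeros")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; the real column name is not given in the post
    val df = Seq("000123", "0450", "7", "000").toDF("code")

    // Add a new column holding the trimmed values; the original column is kept
    val trimmed = df.withColumn("code_trimmed", regexp_replace(col("code"), LeadingZeros, ""))

    // code_trimmed holds "123", "450", "7" and "0" for the four rows above
    trimmed.show()

    spark.stop()
  }
}
```

Keeping the original column alongside the trimmed one makes it easy to check the result before dropping or renaming anything.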

tmater · ‎10-17-2019

Hi @ChineduLB, UDFs let you code your own application logic for processing column values during an Impala query. Adding a refresh/invalidate to a UDF could cause unexpected behavior during value processing. The general recommendation for Invalidate metadata/Refresh is to execute it after the ingestion has finished. This way the Impala user does not have to worry about stale metadata. There is a blog post on how to handle "Fast Data" and make it available to Impala in batches: https://blog.cloudera.com/how-to-ingest-and-query-fast-data-with-impala-without-kudu/

Additionally, Invalidate metadata/Refresh can be executed from beeline as well; you just need to connect from beeline to Impala. This blog post has the details: https://www.ericlin.me/2017/04/how-to-use-beeline-to-connect-to-impala/

EricL · ‎09-16-2019

@ChineduLB No, you can't. You can only save data into temp tables, or simply use a sub-query instead. Cheers Eric

Last Visited	‎05-21-2024 09:00 AM

Member Since	‎02-11-2019 07:55 AM
Posts	81
Kudos received	3

Cloudera Community

Re: Get column values in comma separated value

Re: Export Hive Parquet table data to Teradata: In...

Re: ClassNotFoundException Spark2-submit

Re: get counts of rows meeting different filter cr...

Re: Cloudera Search Tutorial Error

Re: Cloudera Search Tutorial Error

Re: Unable to create core test_collection_shard1_r...

Re: Remove Leading zeros from column in Dataframe...

Re: Impala view based on UDF

Re: Store intermediate Query Result to variable in...