Member since: 02-11-2019
Posts: 81 · Kudos Received: 3 · Solutions: 0
04-15-2020
05:01 PM
Hi @ChineduLB , You can use `.groupBy` with `concat_ws(",", collect_list(...))` to build the comma-separated list, and the `row_number` window function to generate the `ID`:

import org.apache.spark.sql.functions.{concat_ws, collect_list, row_number}
import org.apache.spark.sql.expressions.Window

val df = Seq(
  ("1","User1","Admin"),
  ("2","User1","Accounts"),
  ("3","User2","Finance"),
  ("4","User3","Sales"),
  ("5","User3","Finance")
).toDF("ID","USER","DEPT")

// Window used by row_number() to assign a sequential ID to each aggregated row.
val w = Window.orderBy("USER")

df.groupBy("USER").
  agg(concat_ws(",", collect_list("DEPT")).alias("DEPARTMENT")).
  withColumn("ID", row_number().over(w)).
  select("ID","USER","DEPARTMENT").show()
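For the sample data above this should print something like the following (note that the order of values gathered by `collect_list` is not guaranteed):

+--+-----+--------------+
|ID| USER|    DEPARTMENT|
+--+-----+--------------+
| 1|User1|Admin,Accounts|
| 2|User2|       Finance|
| 3|User3| Sales,Finance|
+--+-----+--------------+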
03-22-2020
03:12 PM
@ChineduLB , Did you mean that you got the same error while trying to export the sample data you provided earlier? Have you tried updating your driver, in case it is outdated? Cheers, Eric
01-20-2020
03:17 AM
@ChineduLB What is your exact query? You can write count queries in SQL against a Hive table; see the sketch below. In general, you can refer to these articles: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/performance-tuning/content/hive_prepare_to_tune_performance.html https://www.qubole.com/blog/5-tips-for-efficient-hive-queries/ Thanks, Tamil Selvan K
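As a minimal sketch, a count query against a Hive table can be issued from Spark with Hive support enabled (the names `my_db.my_table` and `some_col` are hypothetical placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("HiveCountExample")
  .enableHiveSupport() // needed so Spark can see the Hive metastore tables
  .getOrCreate()

// Count all rows in a (hypothetical) Hive table.
spark.sql("SELECT COUNT(*) AS total_rows FROM my_db.my_table").show()

// Count rows per group on a (hypothetical) column.
spark.sql("SELECT some_col, COUNT(*) AS cnt FROM my_db.my_table GROUP BY some_col").show()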
12-19-2019
11:56 AM
We don't have any indexes or collections in this cluster; we are just trying to use it for Solr search functions now. We were only running the tutorial to validate the Solr service configuration. Any steps we can take to re-initialize the service as new and get rid of any leftover artifacts from 5.14 would also be welcome, as we don't have anything in the Search service yet. Regards
12-17-2019
10:06 PM
Please check whether the userid 'solr' is a member of "supergroup" (for example, with `hdfs groups solr`); if not, add solr to supergroup.
11-28-2019
07:30 PM
I ended up creating a new column in a new data frame via withColumn, and used a regex to populate the new column with the trimmed values; see the sketch below. Thanks
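A minimal sketch of that approach, assuming a DataFrame `df` with a string column named `raw_val` (both names are hypothetical):

import org.apache.spark.sql.functions.{col, regexp_replace}

// Add a new column "clean_val" containing "raw_val" with leading and
// trailing whitespace stripped via a regular expression.
val trimmedDf = df.withColumn("clean_val", regexp_replace(col("raw_val"), "^\\s+|\\s+$", ""))
trimmedDf.show()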
10-17-2019
01:26 AM
Hi @ChineduLB, UDFs let you code your own application logic for processing column values during an Impala query. Adding a refresh/invalidate to a UDF could cause unexpected behavior during value processing. A general recommendation for INVALIDATE METADATA/REFRESH is to execute it after the ingestion has finished (example statements below). This way the Impala user does not have to worry about the staleness of the metadata. There is a blog post on how to handle "fast data" and make it available to Impala in batches: https://blog.cloudera.com/how-to-ingest-and-query-fast-data-with-impala-without-kudu/ Additionally, just wanted to mention that INVALIDATE METADATA/REFRESH can be executed from beeline as well; you just need to connect from beeline to Impala. This blog post has the details: https://www.ericlin.me/2017/04/how-to-use-beeline-to-connect-to-impala/
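For reference, the statements themselves look like this once a batch ingestion has finished (the table name `my_db.my_table` is a hypothetical placeholder):

REFRESH my_db.my_table;              -- pick up newly added data files in an existing table
INVALIDATE METADATA my_db.my_table;  -- heavier: discard and reload the table's cached metadata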
09-16-2019
03:14 PM
@ChineduLB No, you can't; you can only save data into temp tables, or simply use a sub-query instead. Cheers, Eric