Member since: 03-21-2017
Posts: 18
Kudos Received: 2
Solutions: 0
06-25-2020
02:48 AM
I will check our Spark 2.4.5 application code's compatibility with Spark 2.3.2. Are Ambari and HDP going to be discontinued in the near future as part of the Cloudera and Hortonworks merger? We need to plan our choice of software accordingly.
06-25-2020
01:57 AM
Thanks for the reply. Can we install Hadoop and Spark 2.4.5 packages on a multi-node cluster without using HDP, Ambari, and Cloudera? We already have Spark applications running on Spark 2.4.5 and do not want to go back to older versions; we are even planning to upgrade them to Spark 3 soon because of better Delta Lake compatibility. If we install the Hadoop and Spark packages manually on each node of the cluster, can there be any maintenance issues at a later stage in production?
06-24-2020
02:31 AM
Hi, I need to set up a 5-node cluster with Hadoop 3.1.0 and Spark 2.4.5. Someone recommended using Ambari to do so. I checked Ambari, but it seems it can only be used to install HDP, and the latest HDP does not support Spark 2.4.5. Please advise on the best way to set up the required big data cluster.
Labels:
- Apache Ambari
- Apache Hadoop
- Apache Spark
08-13-2017
02:39 PM
Use mapPartitions if we want to add the header to every output file, or if there is a single partition:

topPriceResultsDF
  .map(x => x.mkString(","))
  .mapPartitions(iter => Iterator(header) ++ iter)
  .saveAsTextFile("/user/sparkuser/myspark/data/output/yahoo_above40resultsWithHeader.csv")

Use mapPartitionsWithIndex if we want to add the header only to the first output file:

topPriceResultsDF
  .map(x => x.mkString(","))
  .repartition(2)
  .mapPartitionsWithIndex({
    case (0, iter) => Iterator(header) ++ iter
    case (_, iter) => iter
  })
  .saveAsTextFile("/user/sparkuser/myspark/data/output/yahoo_above40resultsWithHeader.csv")
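For completeness, here is a minimal sketch of how the header variable used above could be built from the DataFrame's own schema, and how to end up with a single output file. This is my own illustration, assuming the Spark 1.6 APIs from the post above; coalesce(1) is my addition and only makes sense for small result sets.

// Derive the header from the DataFrame columns instead of hard-coding it
val header = topPriceResultsDF.columns.mkString(",")

topPriceResultsDF
  .map(x => x.mkString(","))                       // Row -> CSV line (returns an RDD[String] in Spark 1.6)
  .coalesce(1)                                     // one partition -> one part file -> one header (small data only)
  .mapPartitions(iter => Iterator(header) ++ iter) // prepend the header to the single partition
  .saveAsTextFile("/user/sparkuser/myspark/data/output/yahoo_above40resultsWithHeader.csv")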
08-11-2017
11:32 AM
Hi All,
How can we add a header to Spark SQL query results before saving the results to a text file? The Spark version is 1.6.

val topPriceResultsDF = sqlContext.sql("SELECT * FROM retail_db.yahoo_stock_orc WHERE open_price > 40 AND high_price > 40 ORDER BY date ASC")
topPriceResultsDF.map(x => x.mkString(",")).saveAsTextFile("/user/sparkuser/myspark/data/output/yahoo_above40_results(comma).csv")

This saves only the data, but I need to add a header like (date,open_price,high_price,low_price,close_price,volume,adj_price) as well. Please help if anyone has an idea !! I cannot use the Databricks library. The output should look like:

date,open_price,high_price,low_price,close_price,volume,adj_price
1997-07-09,40.75008,45.12504,40.75008,43.99992,37545600,1.83333

Thanks !!
Labels:
- Apache Spark
04-01-2017
01:40 AM
I got the same issue in the Hortonworks sandbox environment. The script was correct but was throwing the error "Unable to open iterator for alias". I found that the JobHistory server was not running by default. I could not work out the connection between the two, but after starting the history server my Pig script worked in both Tez and MapReduce mode. Try it and see if it works for you as well.

[mapred@sandbox ~]$ cd /usr/hdp/current/hadoop-mapreduce-historyserver/sbin
[mapred@sandbox sbin]$ ls
mr-jobhistory-daemon.sh
[mapred@sandbox sbin]$ mr-jobhistory-daemon.sh start historyserver
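As a quick sanity check that the daemon actually came up (my own addition; 19888 is the default JobHistory web UI port, and your hostname may differ from the sandbox one):

[mapred@sandbox sbin]$ jps | grep JobHistoryServer                                        # the process should now be listed
[mapred@sandbox sbin]$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:19888/   # expect 200 from the JobHistory web UI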
03-30-2017
05:17 AM
2 Kudos
Hi All, I have downloaded the Million Song Subset data from http://static.echonest.com/millionsongsubset_full.tar.gz and tried to load it and print a sample:

songs = LOAD '/user/root/datasets/millionsongsubset_full.tar.gz';
songs_limit = LIMIT songs 10;
DUMP songs_limit;

The records are displayed as below. Please suggest how to load the downloaded data in the right format.

grunt> DUMP songs_limit;
2017-03-30 05:03:24,383 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
2017-03-30 05:03:24,458 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-03-30 05:03:24,459 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-03-30 05:03:24,474 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2017-03-30 05:03:24,479 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-03-30 05:03:24,507 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-03-30 05:03:24,507 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2017-03-30 05:03:24,524 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor [.gz]
2017-03-30 05:03:24,609 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt__0001_m_000001_1' to hdfs://sandbox.technocrafty:8020/tmp/temp-607255022/tmp-1815565156/_temporary/0/task__0001_m_000001
2017-03-30 05:03:24,646 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-03-30 05:03:24,655 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-03-30 05:03:24,656 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(MillionSongSubset/0000755000175000017500000000000011516357374014450 5ustar thierrythierryMillionSongSubset/AdditionalFiles/0000755000175000017500000000000011516366075017501 5ustar thierrythierryMillionSongSubset/AdditionalFiles/subset_unique_tracks.txt0000644000175000017500000317201311516365717024515 0ustar thierrythierryTRAAAAW128F429D538<SEP>SOMZWCG12A8C13C480<SEP>Casual<SEP>I Didn't Mean To)
(TRAAABD128F429CF47<SEP>SOCIWDW12A8C13D406<SEP>The Box Tops<SEP>Soul Deep)
(TRAAADZ128F9348C2E<SEP>SOXVLOJ12AB0189215<SEP>Sonora Santanera<SEP>Amor De Cabaret)
(TRAAAEF128F4273421<SEP>SONHOTT12A8C13493C<SEP>Adam Ant<SEP>Something Girls)
(TRAAAFD128F92F423A<SEP>SOFSOCN12A8C143F5D<SEP>Gob<SEP>Face the Ashes)
(TRAAAMO128F1481E7F<SEP>SOYMRWW12A6D4FAB14<SEP>Jeff And Sheri Easter<SEP>The Moon And I (Ordinary Day Album Version))
(TRAAAMQ128F1460CD3<SEP>SOMJBYD12A6D4F8557<SEP>Rated R<SEP>Keepin It Real (Skit))
(TRAAAPK128E0786D96<SEP>SOHKNRJ12A6701D1F8<SEP>Tweeterfriendly Music<SEP>Drop of Rain)
(TRAAARJ128F9320760<SEP>SOIAZJW12AB01853F1<SEP>Planet P Project<SEP>Pink World)
(TRAAAVG12903CFA543<SEP>SOUDSGM12AC9618304<SEP>Clp<SEP>Insatiable (Instrumental Version))

Thanks !!
Labels:
- Apache Pig
03-21-2017
08:51 AM
Thanks Jay !! I got it. I did not need to make the change in Ambari; I could do it simply through the CLI, and it worked. The change made in Ambari was not reflected in "/etc/hive/conf/hive-site.xml"; I don't know why.

hive> set hive.exec.post.hooks;
hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook, org.apache.atlas.hive.hook.HiveHook
hive> set hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook;
hive> select current_database();
OK
default
Time taken: 3.074 seconds, Fetched: 1 row(s)
03-21-2017
08:22 AM
hive> select current_database();
FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.registerDatabase(HiveMetaStoreBridge.java:109)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.registerTable(HiveMetaStoreBridge.java:270)
at org.apache.atlas.hive.hook.HiveHook.registerProcess(HiveHook.java:309)
at org.apache.atlas.hive.hook.HiveHook.fireAndForget(HiveHook.java:202)
at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:160)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1522)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
03-21-2017
08:18 AM
In Advanced Settings > General, for the property "hive.exec.post.hooks", I removed the "org.apache.atlas.hive.hook.HiveHook" entry. Still the same error !