Member since: 09-25-2015
Posts: 230
Kudos Received: 276
Solutions: 39
My Accepted Solutions
Views | Posted
---|---
25013 | 07-05-2016 01:19 PM
8406 | 04-01-2016 02:16 PM
2101 | 02-17-2016 11:54 AM
5639 | 02-17-2016 11:50 AM
12626 | 02-16-2016 02:08 AM
12-28-2015
07:44 PM
@Suresh Bonam If you want to use the hint, you have to set the parameter below to false (the default is true): set hive.ignore.mapjoin.hint=false;
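For example, a minimal sketch of how the hint would then be used (the fact and dim table names below are made up for illustration, they are not from the original question):
set hive.ignore.mapjoin.hint=false;
select /*+ MAPJOIN(d) */ f.id, d.name
from fact f join dim d on (f.dim_id = d.id);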
12-28-2015
06:21 PM
2 Kudos
@Peter Lasne Please give write access to everyone on the folder /tmp/admin/data as well, not only on the file /tmp/admin/data/trucks.csv.
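A minimal way to do that (assuming plain HDFS permissions, with no Ranger policies involved):
## make the directory itself world-writable, not only the file inside it
hadoop fs -chmod 777 /tmp/admin/data
## verify the permissions on both the directory and the file
hadoop fs -ls /tmp/admin
hadoop fs -ls /tmp/admin/data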
12-23-2015
06:17 PM
7 Kudos
Follow the steps below to run this sample spark-shell code to index a Spark DataFrame into SolrCloud:

1- Install Java 8 in your Sandbox 2.3.2:
yum install java-1.8.0-openjdk*

2- Set Java 8 as your default Java, using alternatives:
alternatives --config java
## choose /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java

3- Log off / log on again:
exit
ssh root@sandbox.hortonworks.com

4- Change the Java version used by Spark, in spark-env.sh:
vi /usr/hdp/current/spark-client/conf/spark-env.sh
## change JAVA_HOME as below:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-0.b17.el6_7.x86_64

5- Download and build spark-solr:
git clone https://github.com/LucidWorks/spark-solr.git
cd spark-solr
mvn clean install -DskipTests

6- Create a sample text file and write it into HDFS:
echo "1,guilherme" > your_text_file
echo "2,isabela" >> your_text_file
hadoop fs -put your_text_file /tmp/

7- Start the sandbox SolrCloud:
cd
./start_solr.sh

8- Create a Solr collection:
/opt/lucidworks-hdpsearch/solr/bin/solr create -c testsparksolr -d data_driven_schema_configs

9- Open spark-shell with the spark-solr package dependency:
spark-shell --packages com.lucidworks.spark:spark-solr:1.2.0-SNAPSHOT

10- Run the Spark/Scala code below:
import com.lucidworks.spark.SolrSupport
import org.apache.solr.common.SolrInputDocument

val input_file = "hdfs:///tmp/your_text_file"
case class Person(id: Int, name: String)
val people_df1 = sc.textFile(input_file).map(_.split(",")).map(p => Person(p(0).trim.toInt, p(1))).toDF()

// map each DataFrame row to a SolrInputDocument, using the id column as the document key
val docs = people_df1.map { doc =>
  val docx = SolrSupport.autoMapToSolrInputDoc(doc.getAs[Int]("id").toString, doc, null)
  docx.setField("scala_s", "supercool")
  docx.setField("name_s", doc.getAs[String]("name"))
  docx
}

// index the documents into the testsparksolr collection (ZooKeeper at sandbox.hortonworks.com:2181) and commit
SolrSupport.indexDocs("sandbox.hortonworks.com:2181", "testsparksolr", 10, docs)
val solrServer = com.lucidworks.spark.SolrSupport.getSolrServer("sandbox.hortonworks.com:2181")
solrServer.setDefaultCollection("testsparksolr")
solrServer.commit(false, false)

11- Check the results.

PS: If you are using Spark Streaming, you can use the method below with a DStream object, instead of indexDocs():
SolrSupport.indexDStreamOfDocs("sandbox.hortonworks.com:2181", "testsparksolr", 10, docs)
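For step 11, one simple way to check the results is to query the collection directly; the URL below is a sketch assuming the sandbox Solr HTTP endpoint on the default port 8983, it is not part of the original steps:
curl "http://sandbox.hortonworks.com:8983/solr/testsparksolr/select?q=*:*&wt=json&indent=true"
## the response should contain the two indexed documents (ids 1 and 2) with the name_s and scala_s fields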
12-22-2015
02:50 PM
3 Kudos
@pooja khandelwal Try this:
1- Create the directory below on all HiveServer2 hosts:
mkdir /usr/hdp/current/hive-server2/auxlib
2- Copy your jar to the folder above on all HiveServer2 hosts.
3- Restart all HiveServer2 services.
4- Create your UDF without the 'using jar' clause (just once):
create function HVDB_Analysis_Archived.UDF_ForLogDatatoXML as 'UDF_ForLogDatatoXML'
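Once the function is registered, it can be called like any built-in UDF. A minimal sketch (the table and column names below are placeholders for illustration, not from the original question):
select HVDB_Analysis_Archived.UDF_ForLogDatatoXML(log_column) from your_log_table limit 10;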
12-22-2015
02:35 PM
1 Kudo
Awesome! Marked as a favorite post now. 🙂
12-21-2015
10:58 PM
3 Kudos
@bhaskaran periasamy Try this:
1- Open your AMI VNC.
2- Open a terminal (icon on your AMI/VNC desktop).
3- Execute the commands below:
ssh root@namenode
## password is: hadoop
service ambari-agent restart
exit
./start_all_services.sh
4- Open Ambari (icon on your AMI/VNC desktop).
12-19-2015
02:34 AM
1 Kudo
When I created a new queue and refreshed the RM, it only executed one AM on the new queue; the others got stuck in the ACCEPTED state. It was solved when I restarted the RM, but when I repeated the same steps it didn't happen again.
12-18-2015
03:34 PM
1 Kudo
About #2: I had similar issues with new sessions being stuck in ACCEPTED status in the YARN UI for a recently created queue. It happened twice; I'm trying to reproduce it, but it seems to be a Capacity Scheduler issue that you can fix by restarting the ResourceManager. After fixing this Capacity Scheduler issue, default queues and sessions per queue work fine. Notice that HiveServer2 starts the sessions one by one, so it may take a while to initialize them all. About #3: I see the same. I will open a support ticket to get engineering on it. I found a few old internal JIRAs related to this, but none for 2.3.2.
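For reference, a minimal sketch of the standard YARN CLI commands for this kind of check (generic commands, not the exact ones from this thread; the ResourceManager restart itself can be done through Ambari):
## push new or changed queue definitions to the ResourceManager without restarting it
yarn rmadmin -refreshQueues
## list applications that are still waiting in the ACCEPTED state
yarn application -list -appStates ACCEPTED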
12-18-2015
01:57 PM
Yes, @Neeraj Sabharwal and I tried this a few times, but we can't see column statistics, only table-level ones.
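For context, this is roughly the kind of statement involved (the table and column names are placeholders, not from the original thread): ANALYZE computes the column statistics, and DESCRIBE FORMATTED on a single column is where we expected to see them:
analyze table your_table compute statistics for columns;
describe formatted your_table your_column;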