Member since: 09-25-2015
Posts: 230
Kudos received: 276
Solutions: 39
My Accepted Solutions
Views | Posted |
---|---|
25400 | 07-05-2016 01:19 PM |
8681 | 04-01-2016 02:16 PM |
2202 | 02-17-2016 11:54 AM |
5821 | 02-17-2016 11:50 AM |
12895 | 02-16-2016 02:08 AM |
12-28-2015
07:44 PM
@Suresh Bonam If you want Hive to use the MAPJOIN hint, you have to set the parameter below to false (the default is true):
set hive.ignore.mapjoin.hint=false;
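As an illustration of what a hinted query looks like once the parameter is set, here is a small Scala sketch that only builds the HiveQL text you would paste into beeline or the Hive CLI. The helper `mapJoinScript` and the table, alias, and column names are hypothetical placeholders, not from the original question.

```scala
// Hypothetical helper, not part of Hive: builds the HiveQL you would run in beeline or the Hive CLI.
// Table, alias, and column names are placeholders.
def mapJoinScript(smallTable: String, bigTable: String, joinKey: String): String =
  Seq(
    // With the default (true), Hive ignores /*+ MAPJOIN(...) */ hints.
    "set hive.ignore.mapjoin.hint=false;",
    // The hint names the alias of the small table that should be loaded into memory.
    s"SELECT /*+ MAPJOIN(s) */ b.*, s.* FROM $bigTable b JOIN $smallTable s ON b.$joinKey = s.$joinKey;"
  ).mkString("\n")

val script = mapJoinScript("dim_country", "fact_sales", "country_id")
println(script)
```

The `set` statement applies to the current session, so both lines must be submitted in the same session.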
12-28-2015
06:21 PM
2 Kudos
@Peter Lasne Please give everyone write access to the folder /tmp/admin/data as well, not only to the file /tmp/admin/data/trucks.csv.
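On the cluster, this would be something like `hdfs dfs -chmod 777 /tmp/admin/data`, run as a user allowed to change that directory's permissions. HDFS follows a POSIX-like permission model in which moving or deleting a file modifies its parent directory, so the directory needs write permission, not just the file. The Scala sketch below demonstrates the same idea on the local filesystem only (it is an analogy, not HDFS code); the temporary directory stands in for /tmp/admin/data.

```scala
import java.nio.file.{Files, Path}
import java.nio.file.attribute.PosixFilePermissions

// Local POSIX analogy only, not HDFS code. The temp directory stands in for /tmp/admin/data.
val dataDir: Path = Files.createTempDirectory("admin-data")
val csv: Path = Files.createFile(dataDir.resolve("trucks.csv"))

// Opening only the file is not enough: moving or deleting an entry changes the directory,
// so the directory itself needs write (and, locally, execute) permission for everyone.
Files.setPosixFilePermissions(csv, PosixFilePermissions.fromString("rw-rw-rw-"))
Files.setPosixFilePermissions(dataDir, PosixFilePermissions.fromString("rwxrwxrwx"))

val filePerms = PosixFilePermissions.toString(Files.getPosixFilePermissions(csv))
val dirPerms = PosixFilePermissions.toString(Files.getPosixFilePermissions(dataDir))
println(s"file: $filePerms, directory: $dirPerms")

// Clean up the temporary file and directory.
Files.delete(csv)
Files.delete(dataDir)
```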
12-23-2015
06:17 PM
7 Kudos
Follow the steps below to run this sample spark-shell code, which indexes a Spark DataFrame into SolrCloud:
1- Install Java 8 in your Sandbox 2.3.2:
yum install java-1.8.0-openjdk*
2- Make Java 8 your default java using alternatives:
alternatives --config java
##choose /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java
3- Log off and log on again:
exit
ssh root@sandbox.hortonworks.com
4- Change the Java version used by Spark in spark-env.sh:
vi /usr/hdp/current/spark-client/conf/spark-env.sh
##change JAVA_HOME like below:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-0.b17.el6_7.x86_64
5- Download and build spark-solr:
git clone https://github.com/LucidWorks/spark-solr.git
cd spark-solr
mvn clean install -DskipTests
6- Create a sample text file and write it into HDFS:
echo "1,guilherme" > your_text_file
echo "2,isabela" >> your_text_file
hadoop fs -put your_text_file /tmp/
7- Start the sandbox SolrCloud:
cd
./start_solr.sh
8- Create a Solr collection:
/opt/lucidworks-hdpsearch/solr/bin/solr create -c testsparksolr -d data_driven_schema_configs
9- Open spark-shell with the spark-solr package dependency (the mvn install in step 5 put it in your local Maven repository):
spark-shell --packages com.lucidworks.spark:spark-solr:1.2.0-SNAPSHOT
10- Run the Spark/Scala code below:
import com.lucidworks.spark.SolrSupport;
import org.apache.solr.common.SolrInputDocument;
// Sample file written to HDFS in step 6
val input_file = "hdfs:///tmp/your_text_file"
case class Person(id: Int, name: String)
// Parse each "id,name" line into a Person and convert the RDD to a DataFrame
val people_df1 = sc.textFile(input_file).map(_.split(",")).map(p => Person(p(0).trim.toInt, p(1))).toDF()
// Build one SolrInputDocument per Row, keyed by id, and add two extra fields
val docs = people_df1.map{doc=>
  val docx=SolrSupport.autoMapToSolrInputDoc(doc.getAs[Int]("id").toString, doc, null)
  docx.setField("scala_s", "supercool")
  docx.setField("name_s", doc.getAs[String]("name"))
  docx
}
// Send the documents to the testsparksolr collection in batches of 10, via the sandbox ZooKeeper
SolrSupport.indexDocs("sandbox.hortonworks.com:2181", "testsparksolr", 10, docs);
// Commit so the indexed documents become searchable
val solrServer = com.lucidworks.spark.SolrSupport.getSolrServer("sandbox.hortonworks.com:2181")
solrServer.setDefaultCollection("testsparksolr")
solrServer.commit(false, false)
11- Check the results.
PS: If you are using Spark Streaming, you can call the method below with a DStream of SolrInputDocument objects instead of indexDocs():
SolrSupport.indexDStreamOfDocs("sandbox.hortonworks.com:2181", "testsparksolr", 10, docs);
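To see what the spark-shell code above does to each input line without a cluster, here is a plain-Scala model with no Spark or SolrJ dependency. It is a sketch under stated assumptions: a `Map[String, String]` stands in for `SolrInputDocument`, and any fields that `autoMapToSolrInputDoc` derives from the Row itself are not modeled; only the id and the two fields set explicitly in the example are.

```scala
// Plain-Scala model of the spark-shell example: no Spark or SolrJ dependency.
// A Map[String, String] stands in for SolrInputDocument; fields that
// autoMapToSolrInputDoc derives from the Row itself are not modeled.
case class Person(id: Int, name: String)

// Same parsing as the map(_.split(",")) step in the example.
def parse(line: String): Person = {
  val p = line.split(",")
  Person(p(0).trim.toInt, p(1))
}

// Fields the example sets explicitly on each document.
def toFields(person: Person): Map[String, String] = Map(
  "id" -> person.id.toString, // document id passed to autoMapToSolrInputDoc
  "scala_s" -> "supercool",   // constant field
  "name_s" -> person.name     // copy of the name column
)

// The two lines written to your_text_file in the sample-file step.
val lines = Seq("1,guilherme", "2,isabela")
val docs = lines.map(parse).map(toFields)
docs.foreach(println)
```

Once indexed, each document in testsparksolr should therefore carry at least `id`, `scala_s`, and `name_s`.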
12-22-2015
02:50 PM
3 Kudos
@pooja khandelwal Try this:
1- Create the directory below on all HiveServer2 hosts:
mkdir /usr/hdp/current/hive-server2/auxlib
2- Copy your jar to the above folder on all HiveServer2 hosts.
3- Restart all HiveServer2 services.
4- Create your UDF without the 'using jar' clause (just once):
create function HVDB_Analysis_Archived.UDF_ForLogDatatoXML as 'UDF_ForLogDatatoXML';
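The key distinction in these steps is that steps 1 to 3 are repeated on every HiveServer2 host, while step 4 runs only once. The Scala sketch below just prints that checklist; the helper `perHostSteps`, the host names, and the jar path are hypothetical placeholders, and `ssh`/`scp` are one possible way to reach each host.

```scala
// Hypothetical checklist helper, not part of Hive or HDP. Host names and the jar path are placeholders.
val auxlib = "/usr/hdp/current/hive-server2/auxlib"

// Steps 1-3 are repeated on every HiveServer2 host.
def perHostSteps(host: String, jar: String): Seq[String] = Seq(
  s"ssh root@$host mkdir -p $auxlib",
  s"scp $jar root@$host:$auxlib/",
  s"restart HiveServer2 on $host (for example, from Ambari)"
)

// Step 4 runs only once, from any Hive client, without a USING JAR clause.
val createFunction =
  "create function HVDB_Analysis_Archived.UDF_ForLogDatatoXML as 'UDF_ForLogDatatoXML';"

val hosts = Seq("hs2-host1.example.com", "hs2-host2.example.com")
val checklist = hosts.flatMap(h => perHostSteps(h, "/tmp/your-udf.jar")) :+ createFunction
checklist.foreach(println)
```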
12-22-2015
02:35 PM
1 Kudo
Awesome! Marked as favorite post now. 🙂
12-21-2015
10:58 PM
3 Kudos
@bhaskaran periasamy Try this:
1- Open your AMI VNC session.
2- Open a terminal (icon on your AMI/VNC desktop).
3- Execute the commands below:
ssh root@namenode
##password is: hadoop
service ambari-agent restart
exit
./start_all_services.sh
4- Open Ambari (icon on your AMI/VNC desktop).
12-19-2015
02:34 AM
1 Kudo
When I created a new queue and refreshed the ResourceManager (RM) queues, only one ApplicationMaster (AM) ran on the new queue; the others got stuck in the ACCEPTED state. Restarting the RM solved it, but when I repeated the same steps, the problem did not happen again.
12-18-2015
03:34 PM
1 Kudo
About #2: I had similar issues with new sessions stuck in ACCEPTED status in the YARN UI for a recently created queue. It happened twice. I'm trying to reproduce it, but it seems to be a Capacity Scheduler issue that you can fix by restarting the ResourceManager. After fixing this Capacity Scheduler issue, default queues and sessions per queue work fine. Notice that HiveServer2 starts the sessions one by one, so it may take a while to initialize them all. About #3: I see the same. I will open a support ticket to get engineering on it. I found a few old internal JIRAs related to this, but not for 2.3.2.
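Assuming "default queues" and "sessions per queue" above refer to the HiveServer2 properties `hive.server2.tez.default.queues` and `hive.server2.tez.sessions.per.default.queue`, the number of Tez sessions started at HiveServer2 startup is the number of listed queues times the per-queue count. The Scala sketch below is a hypothetical model, not Hive code; it only enumerates that pool in order to show why starting them one by one can take a while.

```scala
// Hypothetical model, not Hive code: lists the Tez sessions HiveServer2 would open, one after
// another, for a comma-separated queue list and a per-queue session count, mirroring
// hive.server2.tez.default.queues and hive.server2.tez.sessions.per.default.queue.
def sessionPlan(defaultQueues: String, sessionsPerQueue: Int): Seq[(String, Int)] =
  for {
    queue <- defaultQueues.split(",").map(_.trim).filter(_.nonEmpty).toSeq
    n <- 1 to sessionsPerQueue
  } yield (queue, n)

val plan = sessionPlan("default, etl", 2)
plan.foreach { case (queue, n) => println(s"start Tez session $n on queue $queue") }
```

Each of these sessions is its own YARN application with its own ApplicationMaster, which is why stuck sessions show up as ACCEPTED applications in the YARN UI.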
12-18-2015
01:57 PM
Yes, @Neeraj Sabharwal and I tried this a few times, but we can see only table-level statistics, not column statistics.