Member since: 10-02-2015
Posts: 76
Kudos Received: 80
Solutions: 8
My Accepted Solutions
Title | Views | Posted
---|---|---
| 1972 | 11-15-2016 03:28 PM
| 3393 | 11-15-2016 03:15 PM
| 2113 | 07-25-2016 08:03 PM
| 1741 | 05-11-2016 04:10 PM
| 3604 | 02-02-2016 08:09 PM
11-21-2016
03:26 PM
@Gobi Subramani The driver program is the process that runs the application's main() function and creates the SparkContext. The cluster manager then acquires resources on the cluster, and executor processes are launched on those resources. Tasks are then sent to the individual executors for execution.
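As a minimal sketch of this flow (assuming a standalone application submitted with spark-submit; the object name, app name, and data below are illustrative only):
import org.apache.spark.{SparkConf, SparkContext}
// Hypothetical driver program: main() runs in the driver process and creates the SparkContext
object DriverExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("driver-example")
    val sc = new SparkContext(conf)  // the cluster manager acquires resources and executors are launched
    // the reduce() action is split into tasks that are sent to the executors
    val total = sc.parallelize(1 to 1000).map(_ * 2).reduce(_ + _)
    println(s"total = $total")
    sc.stop()
  }
}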
11-16-2016
03:09 AM
By default, zeppelin.spark.maxResult is set to 1000, which means Zeppelin will display at most 1,000 rows of a Spark SQL result regardless of the size of the dataset. If your notebook hangs, restart the Zeppelin service from Ambari. You can then navigate to your notebook and delete the paragraph manually.
11-16-2016
02:56 AM
Can you paste the output of the stderr log?
11-16-2016
02:06 AM
@Gundrathi babu You should use this package: https://spark-packages.org/package/HyukjinKwon/spark-xml
val selectedData = df.select("author", "_id")
selectedData.write
.format("com.databricks.spark.xml")
.option("rootTag", "books")
.option("rowTag", "book")
.save("newbooks.xml")
11-15-2016
04:44 PM
@Zeeshan Ahmed This needs to be done after the "remove version 1.3.1" step. I personally have not tried it, but you can download the version of Spark you are looking for from here: https://github.com/apache/ambari/tree/2ad42074f1633c5c6f56cf979bdaa49440457566/ambari-server/src/main/resources/common-services/SPARK Create a directory called SPARK in /var/lib/ambari-server/resources/stacks/HDP/2.3/services/ and copy the downloaded contents into it. Restart the Ambari server and you should see this version of Spark as an option in the 'Add Service' menu in Ambari. Follow the steps to install the service.
11-15-2016
03:28 PM
@Zeeshan Ahmed Stop and delete the Spark Ambari service:
$ curl -u admin:admin -H "X-Requested-By:ambari" -i -X PUT -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://AMBARI-URL:8080/api/v1/clusters/CLUSTERNAME/services/SPARK
$ curl -u admin:admin -H "X-Requested-By:ambari" -X DELETE http://AMBARI-URL:8080/api/v1/clusters/CLUSTERNAME/services/SPARK
Stop the Spark 1.3.1 history server:
$ su - spark -c "/usr/hdp/current/spark-client/sbin/stop-history-server.sh"
Remove Spark 1.3.1:
$ yum erase "spark*"
On the node where you want the Spark 1.4.1 history server and client to run:
$ su - root
$ wget -nv http://s3.amazonaws.com/dev.hortonworks.com/HDP/centos6/2.x/BUILDS/2.3.2.0-2950/hdpbn.repo -O /etc/yum.repos.d/Spark141TP.repo
$ yum install spark_2_3_2_0_2950-master -y
$ conf-select create-conf-dir --package spark --stack-version 2.3.2.0-2950 --conf-version 0
$ cp /etc/spark/2.3.0.0-2950/0/* /etc/spark/2.3.2.0-2950/0/
$ conf-select set-conf-dir --package spark --stack-version 2.3.2.0-2950 --conf-version 0
$ hdp-select set spark-client 2.3.2.0-2950
$ hdp-select set spark-historyserver 2.3.2.0-2950
Validate the Spark installation by running the SparkPi example as the spark user:
$ su - spark
$ cd /usr/hdp/current/spark-client
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
11-15-2016
03:15 PM
@Anchika Agarwal
Assuming that reading and writing data from Teradata works like MySQL or PostgreSQL, you will need to include the Teradata JDBC driver on the Spark classpath.
$ SPARK_CLASSPATH=teradata-jdbc.jar bin/spark-shell
Use the following code in the Spark shell, modifying and passing all necessary parameters:
scala> val jdbcUsername = "USER_NAME"
scala> val jdbcPassword = "PASSWORD"
scala> val jdbcHostname = "HOSTNAME"
scala> val jdbcPort = port_num
scala> val jdbcDatabase ="DATABASE"
scala> val jdbcUrl = s"jdbc:teradata://${jdbcHostname}:${jdbcPort}/${jdbcDatabase}?user=${jdbcUsername}&password=${jdbcPassword}"
scala> val connectionProperties = new java.util.Properties()
scala> Class.forName("com.teradata.jdbc.Driver")
scala> import java.sql.DriverManager
scala> val connection = DriverManager.getConnection(jdbcUrl, jdbcUsername, jdbcPassword)
scala> connection.isClosed()
scala> sqlContext.table("jdbcDF").withColumnRenamed("table", "table_number").write.jdbc(jdbcUrl, "tablename", connectionProperties)
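The last line above reads from a temporary table named jdbcDF; a minimal sketch of creating it by first reading the Teradata table (the source table name "tablename" is illustrative, as above):
scala> val teradataDF = sqlContext.read.jdbc(jdbcUrl, "tablename", connectionProperties)
scala> teradataDF.registerTempTable("jdbcDF")
scala> teradataDF.printSchema()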
10-22-2016
07:40 PM
What version of Tableau are you using? The Spark connection only works with Tableau 9.2 and later.
10-12-2016
10:58 PM
1 Kudo
@mohamed sabri marnaoui Can you paste the full stack trace and the code you are trying to run? You can get the Spark job logs from the YARN ResourceManager UI: go to Ambari -> YARN -> Quick Links -> ResourceManager UI. The SparkContext can shut down for many different reasons, including errors in the code.
09-22-2016
01:49 PM
4 Kudos
1. Challenges managing LAS files
- Siloed datasets: working with disparate, complex datasets under a traditional analysis model limits innovation and does not allow for the speed required for unconventionals.
- LAS file volume: a single well could have tens or hundreds of LAS files, making it difficult to provide a consolidated view for analysis. Extrapolating this volume out across thousands of wells requires an automated approach.
- Manual QC process: identifying out-of-range data is time consuming and challenging, even for experienced geoscientists and petrophysicists.
- Management and storage is expensive: what if cost could be reduced from $23/GB to $0.19/GB? 55 GB could cost $1,200 or $10; the delta is 1-2 orders of magnitude.

Download Sample Data Set
The wellbook concept is about a single view of an oil well and its history, something akin to a "Facebook Wall" for oil wells. This repo is built from data collected and made available by the North Dakota Industrial Commission. I used the wellindex.csv file to obtain a list of well file numbers (file_no), scraped their respective Production, Injection, and Scout Ticket web pages and any available LAS-format well log files, and loaded them into HDFS (/user/dev/wellbook/) for analysis. To avoid the HDFS small-files problem I used the Apache Mahout seqdirectory tool to combine my text files into SequenceFiles: the keys are the filenames and the values are the contents of each text file. Then I used a combination of Hive queries and the pyquery Python library to parse relevant fields out of the raw HTML pages.

List of tables:
- wellbook.wells -- well metadata including geolocation and owner
- wellbook.well_surveys -- borehole curve
- wellbook.production -- how much oil, gas, and water was produced for each well on a monthly basis
- wellbook.auctions -- how much was paid for each parcel of land at auction
- wellbook.injections -- how much fluid and gas was injected into each well (for enhanced oil recovery and disposal purposes)
- wellbook.log_metadata -- metadata for each LAS well log file
- wellbook.log_readings -- sensor readings for each depth step in all LAS well log files
- wellbook.log_key -- map of log mnemonics to their descriptions
- wellbook.formations -- manually annotated map of well depths to rock formations
- wellbook.formations_key -- descriptions of rock formations
- wellbook.water_sites -- metadata for water quality monitoring stations in North Dakota

2. Watch the video to get started: Automated Analysis of LAS Files

3. Join with Production / EOR / Auction data (Power BI): get a 360-degree view of the well (Hive tables - Master). a. Predictive analytics (linear regression) b. Visualize the data using YARN Ready applications

4. Dynamic well logs: query for multiple mnemonic readings per well, or for multiple wells in a given region. Normalize and graph data for specific depth steps on the fly (see the query sketch at the end of this post).

5. Dynamic time warping: run the algorithm per well, or for all wells and all mnemonics, and visualize the results to know which readings belong to the same curve class. Using supervised machine learning, enable automatic bucketing of mnemonics belonging to the same curve class.

Build your own: clone the git repo below and follow the steps in the README to create your own demo.
$ git clone https://github.com/vedantja/wellbook.git
For more questions, please contact Vedant Jain. Special thanks to Randy Gelhausen and Ofer Mendelevitch for the work and help put into this.
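As an illustration of the kind of query behind the dynamic well logs (item 4), here is a minimal Spark SQL sketch against the Hive tables listed above; the column names (file_no, mnemonic, depth, value) and the well number are assumptions for illustration, not taken from the repo:
// Join log readings to well metadata for a single well; the schema is assumed, check the repo's DDL
val readings = sqlContext.sql("""
  SELECT w.file_no, r.mnemonic, r.depth, r.value
  FROM wellbook.log_readings r
  JOIN wellbook.wells w ON r.file_no = w.file_no
  WHERE w.file_no = '12345'
""")
readings.show(20)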