Member since: 09-24-2015
Posts: 527
Kudos Received: 136
Solutions: 19
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2267 | 06-30-2017 03:15 PM
 | 3277 | 10-14-2016 10:08 AM
 | 8557 | 09-07-2016 06:04 AM
 | 10333 | 08-26-2016 11:27 AM
 | 1530 | 08-23-2016 02:09 PM
10-06-2016
09:17 AM
Hi: Why is the sqlContext.read.json method so slow? I am trying to read an 8 GB file from HDFS:
df1 = sqlContext.read.json("hdfs://xxxx:8020/tmp/file.json")
Thanks
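Note: read.json with no explicit schema makes an extra pass over the data just to infer the schema, which is expensive on an 8 GB file. A minimal sketch of supplying one up front (the field names here are hypothetical):
from pyspark.sql.types import StructType, StructField, StringType, LongType
# hypothetical fields -- replace with the actual structure of file.json
schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
])
# with an explicit schema, Spark skips the schema-inference pass over the file
df1 = sqlContext.read.schema(schema).json("hdfs://xxxx:8020/tmp/file.json")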
Labels:
- Apache Spark
09-21-2016
03:20 PM
But I want one agent per machine, i.e. host1 runs agent1 and host2 runs agent2, and I can't set that up from Ambari: every configuration change affects all the agents.
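Note: outside of Ambari's own management, a single flume.conf can define several independently named agents, and each host then starts only its own agent by name. A minimal sketch (agent, source, channel, and sink names are hypothetical, and each would still need its type and bindings configured):
# flume.conf -- two independent agents defined in one file
agent1.sources = s1
agent1.channels = c1
agent1.sinks = k1
agent2.sources = s2
agent2.channels = c2
agent2.sinks = k2
Then on host1: flume-ng agent --conf conf --conf-file flume.conf --name agent1
And on host2: flume-ng agent --conf conf --conf-file flume.conf --name agent2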
09-21-2016
03:00 PM
Hi: From Ambari I can't configure two different Flume agents with different configurations. I mean, when I change the config file for one agent, the file changes for both agents. Any help, please?
09-13-2016
07:28 PM
Hi: I am trying to run an H2O cluster, but when I execute this command the shell starts correctly and yet the h2o.scala program doesn't start. Why?
/sparkling-shell --num-executors 5 --executor-memory 3g --driver-memory 2g --master yarn-client -i h2o.scala
The h2o.scala program is:
import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()
import h2oContext._
Please, what is happening? Thanks
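Note: as a cross-check that the script itself is fine, it can also be loaded from inside the already-running shell with the Scala REPL's :load command:
:load h2o.scala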
Labels:
- Apache Spark
09-07-2016
12:27 PM
Thanks. So remember: when a client (producer or consumer) needs to connect to a remote cluster, we need to open the ports on the servers, and we also need to use the same Kafka version on the client and the server. 🙂
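For what it's worth, a quick way to verify from the client machine that a broker port is open before debugging further (host and port are placeholders):
nc -vz xxxxxx 6667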
08-29-2016
01:03 PM
1 Kudo
Hi: I am executing this code:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class productor {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("metadata.broker.list", "xxxxxx:6667");
        properties.put("bootstrap.servers", "xxxxxx:6667");
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer"); // key serializer
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); // value serializer
        properties.put("acks", "1"); // message durability -- "1" means ack once the leader has written; "all" means ack after replication
        properties.put("security.protocol", "PLAINTEXT"); // security protocol to use for communication
        properties.put("batch.size", "16384"); // maximum batch size in bytes (not the maximum message size)
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);
        try {
            for (int i = 0; i < 10; i++) {
                // send a batch of test messages to the "RSI" topic
                producer.send(new ProducerRecord<String, String>("RSI", "sancho" + i));
                System.out.println("RSI" + " sancho" + i);
            }
        } catch (Throwable throwable) {
            System.out.println(throwable.getMessage());
        } finally {
            producer.close();
        }
    }
}
and I am receiving this error. What is happening? run:
[2016-08-29 15:05:09,552] ERROR Uncaught error in kafka producer I/O thread: (org.apache.kafka.clients.producer.internals.Sender:136)
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'brokers': Error reading field 'host': Error reading string of length 27758, only 169 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73)
at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
at java.lang.Thread.run(Thread.java:745)
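Note: this SchemaException while parsing the 'brokers' field is the usual symptom of exactly what the follow-up above describes: a client/broker version mismatch, or a producer pointed at a port that is not a plaintext Kafka listener. One sanity check is to produce with the console tool shipped with the broker itself:
bin/kafka-console-producer.sh --broker-list xxxxxx:6667 --topic RSI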
Labels:
- Apache Kafka
08-26-2016
11:27 AM
1 Kudo
Hi: I resolved it like this:
pyspark --master yarn --deploy-mode client --num-executors 5 --executor-cores 1 --executor-memory 1G --jars ./spark-csv_2.11-1.4.0.jar --jars ./commons-csv-1.4.jar --jars ./univocity-parsers-2.2.1.jar
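Note: as far as I know, when --jars is repeated the later flags override the earlier ones; spark-submit expects a single comma-separated list, so the equivalent one-flag form would be:
pyspark --master yarn --deploy-mode client --num-executors 5 --executor-cores 1 --executor-memory 1G --jars ./spark-csv_2.11-1.4.0.jar,./commons-csv-1.4.jar,./univocity-parsers-2.2.1.jar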
08-26-2016
09:15 AM
Hi: I am behind a proxy, so is there any special configuration needed in the Spark config files?
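One common approach (a sketch; the proxy host and port are placeholders): the --packages resolution runs in the spark-submit JVM, which picks up the SPARK_SUBMIT_OPTS environment variable, so the standard Java proxy properties can be passed there:
export SPARK_SUBMIT_OPTS="-Dhttp.proxyHost=myproxy.host -Dhttp.proxyPort=8080 -Dhttps.proxyHost=myproxy.host -Dhttps.proxyPort=8080"
pyspark --master yarn --deploy-mode client --packages com.databricks:spark-csv_2.11:1.4.0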
08-26-2016
08:49 AM
Hi: I am trying to use the com.databricks.spark.csv class, but it doesn't work. I am behind a proxy, so how can I download it?
pyspark --master yarn --deploy-mode client --num-executors 5 --executor-cores 1 --executor-memory 1G --jars ./spark-csv_2.11-1.4.0.jar --jars ./commons-csv-1.4.jar
It also doesn't work like this:
pyspark --master yarn --deploy-mode client --num-executors 5 --executor-cores 1 --executor-memory 1G --packages com.databricks:spark-csv_2.11-1.4.0
So, any suggestions? Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/hdp/2.4.0.0-169/spark/python/pyspark/sql/readwriter.py", line 137, in load
return self._df(self._jreader.load(path))
File "/usr/hdp/2.4.0.0-169/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/usr/hdp/2.4.0.0-169/spark/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/usr/hdp/2.4.0.0-169/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o45.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at http://spark-packages.org
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.csv.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
at scala.util.Try.orElse(Try.scala:82)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
... 14 more
>>> :: resolution report :: resolve 252496ms :: artifacts dl 0ms
:: modules in use:
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 0 | 0 | 0 || 0 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
module not found: com.databricks#spark-csv_2.10;1.3.0
==== local-m2-cache: tried
file:/root/.m2/repository/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.pom
-- artifact com.databricks#spark-csv_2.10;1.3.0!spark-csv_2.10.jar:
file:/root/.m2/repository/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.jar
==== local-ivy-cache: tried
/root/.ivy2/local/com.databricks/spark-csv_2.10/1.3.0/ivys/ivy.xml
==== central: tried
https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.pom
-- artifact com.databricks#spark-csv_2.10;1.3.0!spark-csv_2.10.jar:
https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.jar
==== spark-packages: tried
http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.pom
-- artifact com.databricks#spark-csv_2.10;1.3.0!spark-csv_2.10.jar:
http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: com.databricks#spark-csv_2.10;1.3.0: not found
::::::::::::::::::::::::::::::::::::::::::::::
:::: ERRORS
Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.pom (java.net.ConnectException: Connection timed out)
Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.jar (java.net.ConnectException: Connection timed out)
Server access error at url http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.pom (java.net.ConnectException: Connection timed out)
Server access error at url http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.jar (java.net.ConnectException: Connection timed out)
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: com.databricks#spark-csv_2.10;1.3.0: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1068)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:287)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
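Note: the --packages coordinate syntax is groupId:artifactId:version, with colons throughout, so the spark-csv coordinate would be:
pyspark --master yarn --deploy-mode client --num-executors 5 --executor-cores 1 --executor-memory 1G --packages com.databricks:spark-csv_2.11:1.4.0
Separately, per the connection timeouts above, the resolver also has to be able to reach Maven Central or the spark-packages repository through the proxy.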
Labels:
- Apache Spark