Member since: 09-24-2015
Posts: 527
Kudos Received: 136
Solutions: 19
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2267 | 06-30-2017 03:15 PM
 | 3277 | 10-14-2016 10:08 AM
 | 8557 | 09-07-2016 06:04 AM
 | 10333 | 08-26-2016 11:27 AM
 | 1530 | 08-23-2016 02:09 PM
10-06-2016
09:17 AM
Hi: Why is the sqlContext.read.json method so slow? I am trying to read an 8 GB file from HDFS:
df1 = sqlContext.read.json("hdfs://xxxx:8020/tmp/file.json")
Thanks
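Note: read.json with no explicit schema makes an extra pass over the data just to infer the schema, which is expensive on an 8 GB file. A minimal sketch of supplying one up front (the field names here are hypothetical):
from pyspark.sql.types import StructType, StructField, StringType, LongType
# hypothetical fields -- replace with the actual structure of file.json
schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
])
# with an explicit schema, Spark skips the schema-inference pass over the file
df1 = sqlContext.read.schema(schema).json("hdfs://xxxx:8020/tmp/file.json")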
Labels:
- Apache Spark
09-21-2016
03:20 PM
But I want one agent per machine, i.e. host1 runs agent1 and host2 runs agent2, and I can't set that up from Ambari: every configuration change affects all the agents.
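Note: outside of Ambari's own management, a single flume.conf can define several independently named agents, and each host then starts only its own agent by name. A minimal sketch (agent, source, channel, and sink names are hypothetical, and each would still need its type and bindings configured):
# flume.conf -- two independent agents defined in one file
agent1.sources = s1
agent1.channels = c1
agent1.sinks = k1
agent2.sources = s2
agent2.channels = c2
agent2.sinks = k2
Then on host1: flume-ng agent --conf conf --conf-file flume.conf --name agent1
And on host2: flume-ng agent --conf conf --conf-file flume.conf --name agent2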
09-21-2016
03:00 PM
Hi: From Ambari I can't configure two different Flume agents with different configurations. I mean, when I change the config file for one agent, the file changes for both agents. Any help, please?
09-13-2016
07:28 PM
Hi: I am trying to run an H2O cluster, but when I execute this command the shell starts correctly and yet the h2o.scala program doesn't start. Why?
/sparkling-shell --num-executors 5 --executor-memory 3g --driver-memory 2g --master yarn-client -i h2o.scala
The h2o.scala program is:
import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()
import h2oContext._
Please, what is happening? Thanks
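Note: as a cross-check that the script itself is fine, it can also be loaded from inside the already-running shell with the Scala REPL's :load command:
:load h2o.scala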
Labels:
- Apache Spark
09-07-2016
12:27 PM
Thanks. So remember: when a client (producer or consumer) needs to connect to a remote cluster, we need to open the ports on the servers, and we also need to use the same Kafka version on the client and the server. 🙂
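For what it's worth, a quick way to verify from the client machine that a broker port is open before debugging further (host and port are placeholders):
nc -vz xxxxxx 6667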
08-29-2016
01:03 PM
1 Kudo
Hi: I am executing this code:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class productor {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("metadata.broker.list", "xxxxxx:6667");
        properties.put("bootstrap.servers", "xxxxxx:6667");
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer"); // key serializer
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); // value serializer
        properties.put("acks", "1"); // message durability -- "1" means ack once the leader has written; "all" means ack after replication
        properties.put("security.protocol", "PLAINTEXT"); // security protocol to use for communication
        properties.put("batch.size", "16384"); // maximum batch size in bytes (not the maximum message size)
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);
        try {
            for (int i = 0; i < 10; i++) {
                // send a batch of test messages to the "RSI" topic
                producer.send(new ProducerRecord<String, String>("RSI", "sancho" + i));
                System.out.println("RSI" + " sancho" + i);
            }
        } catch (Throwable throwable) {
            System.out.println(throwable.getMessage());
        } finally {
            producer.close();
        }
    }
}
and I am receiving this error. What is happening? run:
[2016-08-29 15:05:09,552] ERROR Uncaught error in kafka producer I/O thread: (org.apache.kafka.clients.producer.internals.Sender:136)
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'brokers': Error reading field 'host': Error reading string of length 27758, only 169 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73)
at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
at java.lang.Thread.run(Thread.java:745)
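Note: this SchemaException while parsing the 'brokers' field is the usual symptom of exactly what the follow-up above describes: a client/broker version mismatch, or a producer pointed at a port that is not a plaintext Kafka listener. One sanity check is to produce with the console tool shipped with the broker itself:
bin/kafka-console-producer.sh --broker-list xxxxxx:6667 --topic RSI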
Labels:
- Apache Kafka
08-26-2016
11:27 AM
1 Kudo
Hi: I resolved it like this:
pyspark --master yarn --deploy-mode client --num-executors 5 --executor-cores 1 --executor-memory 1G --jars ./spark-csv_2.11-1.4.0.jar --jars ./commons-csv-1.4.jar --jars ./univocity-parsers-2.2.1.jar
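Note: as far as I know, when --jars is repeated the later flags override the earlier ones; spark-submit expects a single comma-separated list, so the equivalent one-flag form would be:
pyspark --master yarn --deploy-mode client --num-executors 5 --executor-cores 1 --executor-memory 1G --jars ./spark-csv_2.11-1.4.0.jar,./commons-csv-1.4.jar,./univocity-parsers-2.2.1.jar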
08-26-2016
09:15 AM
Hi: I am behind a proxy, so is there any special configuration needed in the Spark config files?
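One common approach (a sketch; the proxy host and port are placeholders): the --packages resolution runs in the spark-submit JVM, which picks up the SPARK_SUBMIT_OPTS environment variable, so the standard Java proxy properties can be passed there:
export SPARK_SUBMIT_OPTS="-Dhttp.proxyHost=myproxy.host -Dhttp.proxyPort=8080 -Dhttps.proxyHost=myproxy.host -Dhttps.proxyPort=8080"
pyspark --master yarn --deploy-mode client --packages com.databricks:spark-csv_2.11:1.4.0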
08-26-2016
08:49 AM
Hi: I am trying to use the com.databricks.spark.csv class, but it doesn't work. I am behind a proxy, so how can I download it?
pyspark --master yarn --deploy-mode client --num-executors 5 --executor-cores 1 --executor-memory 1G --jars ./spark-csv_2.11-1.4.0.jar --jars ./commons-csv-1.4.jar
It also doesn't work like this:
pyspark --master yarn --deploy-mode client --num-executors 5 --executor-cores 1 --executor-memory 1G --packages com.databricks:spark-csv_2.11-1.4.0
So, any suggestions? Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/hdp/2.4.0.0-169/spark/python/pyspark/sql/readwriter.py", line 137, in load
return self._df(self._jreader.load(path))
File "/usr/hdp/2.4.0.0-169/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/usr/hdp/2.4.0.0-169/spark/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/usr/hdp/2.4.0.0-169/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o45.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at http://spark-packages.org
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.csv.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
at scala.util.Try.orElse(Try.scala:82)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
... 14 more
>>> :: resolution report :: resolve 252496ms :: artifacts dl 0ms
:: modules in use:
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 0 | 0 | 0 || 0 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
module not found: com.databricks#spark-csv_2.10;1.3.0
==== local-m2-cache: tried
file:/root/.m2/repository/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.pom
-- artifact com.databricks#spark-csv_2.10;1.3.0!spark-csv_2.10.jar:
file:/root/.m2/repository/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.jar
==== local-ivy-cache: tried
/root/.ivy2/local/com.databricks/spark-csv_2.10/1.3.0/ivys/ivy.xml
==== central: tried
https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.pom
-- artifact com.databricks#spark-csv_2.10;1.3.0!spark-csv_2.10.jar:
https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.jar
==== spark-packages: tried
http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.pom
-- artifact com.databricks#spark-csv_2.10;1.3.0!spark-csv_2.10.jar:
http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: com.databricks#spark-csv_2.10;1.3.0: not found
::::::::::::::::::::::::::::::::::::::::::::::
:::: ERRORS
Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.pom (java.net.ConnectException: Connection timed out)
Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.jar (java.net.ConnectException: Connection timed out)
Server access error at url http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.pom (java.net.ConnectException: Connection timed out)
Server access error at url http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.jar (java.net.ConnectException: Connection timed out)
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: com.databricks#spark-csv_2.10;1.3.0: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1068)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:287)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
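Note: the --packages coordinate syntax is groupId:artifactId:version, with colons throughout, so the spark-csv coordinate would be:
pyspark --master yarn --deploy-mode client --num-executors 5 --executor-cores 1 --executor-memory 1G --packages com.databricks:spark-csv_2.11:1.4.0
Separately, per the connection timeouts above, the resolver also has to be able to reach Maven Central or the spark-packages repository through the proxy.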
Labels:
- Apache Spark