Member since
04-14-2016
54
Posts
9
Kudos Received
2
Solutions
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 17552 | 06-27-2016 07:20 AM |
|  | 1185 | 05-09-2016 10:10 AM |
07-11-2016
12:58 PM
Hello,
I am trying to create an Oozie job with a Sqoop command and I get this error:
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
This is my XML file:
<workflow-app name="exemple_hive" xmlns="uri:oozie:workflow:0.5">
    <global>
        <configuration>
            <property>
                <name>mapreduce.job.queuename</name>
                <value>DES</value>
            </property>
        </configuration>
    </global>
    <start to="sqoop-9fb3"/>
    <kill name="Kill">
        <message>L'action a échoué, message d'erreur[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="sqoop-9fb3">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>sqoop import -Dmapred.job.queue.name=DES --connect "jdbc:jtds:sqlserver://xxxx.xxxx.xxxx.xxxx:xxxx;databaseName=xxxxxxxx;user=xxxxxxxx;password=xxxxxxxx;instance=MSPAREBTP02" --driver net.sourceforge.jtds.jdbc.Driver --username hdp-import --table qvol_ccy --hive-import --hive-table test.qvol_ccy -m 1</command>
            <file>/dev/datalake/app/des/dev/lib/jtds-1.3.1.jar#jtds-1.3.1.jar</file>
            <file>/dev/datalake/app/des/dev/script/hive-site.xml#hive-site.xml</file>
        </sqoop>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
Labels:
- Apache Hive
- Apache Sqoop
07-01-2016
01:55 PM
Thank you.
Is it possible to replace part-00000 with a file name of my choice, for example command.txt?
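A hedged sketch of one way to do this: Spark chooses the part-file names itself, but after the write finishes, the part file can be renamed through the Hadoop FileSystem API that PySpark exposes via the JVM gateway. The paths below are hypothetical placeholders.

```python
# Sketch only: rename the part file that a (coalesced) save produced.
# Paths are hypothetical; assumes an active SparkContext `sc` (PySpark 1.x),
# and uses the JVM Hadoop FileSystem API exposed through py4j.
hadoop = sc._jvm.org.apache.hadoop
fs = hadoop.fs.FileSystem.get(sc._jsc.hadoopConfiguration())
src = hadoop.fs.Path("/dev/datalake/app/des/dev/out/part-00000")   # hypothetical
dst = hadoop.fs.Path("/dev/datalake/app/des/dev/out/command.txt")  # hypothetical
fs.rename(src, dst)
```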
07-01-2016
11:12 AM
Thanks.
But it generates an error:
AttributeError: 'DataFrameWriter' object has no attribute 'text'
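For reference, DataFrameWriter.text() only appeared in Spark 1.6; on earlier versions a common workaround is to go through the underlying RDD. A minimal sketch, assuming `df` is the DataFrame to save and the output path is a placeholder:

```python
# Sketch: on Spark versions before 1.6 (no DataFrameWriter.text), go through
# the underlying RDD; `df` is the DataFrame to save, the path is a placeholder.
lines = df.rdd.map(lambda row: ",".join(str(v) for v in row))
lines.saveAsTextFile("hdfs:///tmp/df_as_text")
```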
07-01-2016
08:35 AM
1 Kudo
Hello, I work with Spark DataFrames and I would like to know how to store the data of a DataFrame in a text file in HDFS. I tried with saveAsTextFile() but it does not work. Thank you.
Labels:
- Apache Hadoop
- Apache Spark
06-30-2016
09:57 AM
Thank you, it works well.
But this runs in local mode, and when I execute it on the cluster with the command:
spark-submit --master yarn-cluster --py-files hdfs:///dev/datalake/app/des/dev/script/lastloader.py --queue DES hdfs:///dev/datalake/app/des/dev/script/return.py
it generates this error in the logs:
Log Type: stdout
Log Upload Time: Thu Jun 30 09:19:20 +0200 2016
Log Length: 3254
Traceback (most recent call last):
File "return.py", line 10, in <module>
df = Lastloader()
File "/DATA/fs6/hadoop/yarn/local/usercache/atsafack/appcache/application_1465374541433_9209/container_e52_1465374541433_9209_02_000001/__pyfiles__/lastloader.py", line 13, in Lastloader
qvol1 = hive_context.table("lake_des_statarbmfvol.qvol_bbg_closes")
File "/DATA/fs6/hadoop/yarn/local/usercache/atsafack/appcache/application_1465374541433_9209/container_e52_1465374541433_9209_02_000001/pyspark.zip/pyspark/sql/context.py", line 565, in table
File "/DATA/fs6/hadoop/yarn/local/usercache/atsafack/appcache/application_1465374541433_9209/container_e52_1465374541433_9209_02_000001/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
File "/DATA/fs6/hadoop/yarn/local/usercache/atsafack/appcache/application_1465374541433_9209/container_e52_1465374541433_9209_02_000001/pyspark.zip/pyspark/sql/utils.py", line 36, in deco
File "/DATA/fs6/hadoop/yarn/local/usercache/atsafack/appcache/application_1465374541433_9209/container_e52_1465374541433_9209_02_000001/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o53.table.
: org.apache.spark.sql.catalyst.analysis.NoSuchTableException
at org.apache.spark.sql.hive.client.ClientInterface$$anonfun$getTable$1.apply(ClientInterface.scala:123)
at org.apache.spark.sql.hive.client.ClientInterface$$anonfun$getTable$1.apply(ClientInterface.scala:123)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.hive.client.ClientInterface$class.getTable(ClientInterface.scala:123)
at org.apache.spark.sql.hive.client.ClientWrapper.getTable(ClientWrapper.scala:61)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:406)
at org.apache.spark.sql.hive.HiveContext$$anon$1.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:410)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:203)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:203)
at org.apache.spark.sql.hive.HiveContext$$anon$1.lookupRelation(HiveContext.scala:410)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:739)
at org.apache.spark.sql.SQLContext.table(SQLContext.scala:735)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
Can you help me please?
Best regards
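One common cause of NoSuchTableException in yarn-cluster mode is that the driver runs on a cluster node where the Hive metastore configuration (hive-site.xml) is not visible, so the HiveContext silently falls back to an empty local metastore. A small diagnostic sketch (the table name is taken from the traceback above; everything else is generic):

```python
# Sketch: list the databases the driver actually sees; if the real ones are
# missing, the yarn-cluster driver is using a local/empty metastore instead
# of the cluster's Hive metastore (hive-site.xml not visible to the driver).
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
hive_context = HiveContext(sc)

for row in hive_context.sql("show databases").collect():
    print(row)

try:
    hive_context.table("lake_des_statarbmfvol.qvol_bbg_closes").show(5)
except Exception as e:
    print("Table lookup failed: %s" % e)
```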
06-30-2016
07:17 AM
Hello! I use PySpark.
06-29-2016
07:43 AM
Hello,
I tried, but I get an error. Code with "r" as a parameter:
df = hive_context.sql(s"select c.`date`, c.blglast from qvol1_temp as c join qvol2_temp as uv on c.udl_id = uv.udl_id where uv.ric =$r and c.`date` >= '2016-06-13 00:00:00' and c.`date` <= '2016-06-17 00:00:00' and c.adj_split = False")
Error: SyntaxError: invalid syntax
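The s"..." prefix is Scala string interpolation and is invalid Python syntax, which explains the SyntaxError. A hedged sketch building the same query with Python's str.format() instead (the value of r is hypothetical):

```python
# Sketch: build the HiveQL string with str.format(), not Scala s"..." syntax.
r = "'EUR='"  # hypothetical RIC value, already quoted for SQL
query = ("select c.`date`, c.blglast "
         "from qvol1_temp as c join qvol2_temp as uv on c.udl_id = uv.udl_id "
         "where uv.ric = {0} "
         "and c.`date` >= '2016-06-13 00:00:00' "
         "and c.`date` <= '2016-06-17 00:00:00' "
         "and c.adj_split = False").format(r)
df = hive_context.sql(query)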
06-28-2016
11:31 AM
For example: df = HiveContext.sql("SELECT * FROM src WHERE col1 = ${VAL1}")
Thanks
Labels:
- Apache Hive
06-28-2016
06:42 AM
Hello Paul Hargis,
Here is the command that I run with the --files parameter, but it generates an error:
bash-4.1$ spark-submit --master yarn-cluster --queue DES --files hdfs://dev/datalake/app/des/dev/script/return.py
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output
My cordial thanks
06-27-2016
02:04 PM
Thank you. I managed to run it. Except that my file is local, and when I specify the path of a file on the cluster, I receive an error:
bash-4.1$ spark-submit --master yarn-client --queue DES hdfs:///dev/datalake/app/des/dev/script/return.py
Error: Only local python files are supported:
Parsed arguments:
master yarn-client
deployMode client
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile /usr/hdp/current/spark-client/conf/spark-defaults.conf
driverMemory null
driverCores null
driverExtraClassPath /usr/hdp/current/share/lzo/0.6.0/lib/hadoop-lzo-0.6.0.jar:/usr/local/jdk-hadoop/ojdbc7.jar:/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar:/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/metrics-core-2.2.0.jar
driverExtraLibraryPath /usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/
driverExtraJavaOptions null
supervise false
queue DES
numExecutors null
files null
pyFiles null
archives null
mainClass null
primaryResource hdfs:///dev/datalake/app/des/dev/script/return.py
name return.py
childArgs []
jars null
packages null
packagesExclusions null
repositories null
verbose false
06-27-2016
12:11 PM
Thank you. But I had already done this step, and I needed to handle multiple files.
This is now solved, thank you.
06-27-2016
07:20 AM
Hello! Thank you very much for your suggestions.
These methods worked, and I also found another very suitable method, which is to use the DataFrame. Cordially
06-27-2016
07:16 AM
Hello,
I would like to know how I can run a Python script that contains Spark commands. Here is the Python script that I would run in a Python environment:
#!/usr/bin/python2.7
from pyspark.sql import HiveContext
from pyspark import SparkContext
from pandas.DataFrame.ix import DataFrame as df
hive_context = HiveContext(sc)
qvol1 = hive_context.table("table")
qvol2 = hive_context.table("table")
qvol1.registerTempTable("qvol1_temp")
qvol2.registerTempTable("qvol2_temp")
df=hive_context.sql("request")
df.show()
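As posted, the script has no SparkContext of its own and the pandas import is not needed. A minimal runnable sketch of the same idea, with placeholder table names and SQL, intended to be launched through spark-submit:

```python
#!/usr/bin/python2.7
# Sketch: a self-contained PySpark script to launch with spark-submit.
# Table names and the SQL text are placeholders; the script must create its
# own SparkContext, and the pandas import from the original is not needed.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive_query_example")
hive_context = HiveContext(sc)

qvol1 = hive_context.table("db.table1")
qvol2 = hive_context.table("db.table2")
qvol1.registerTempTable("qvol1_temp")
qvol2.registerTempTable("qvol2_temp")

df = hive_context.sql("select * from qvol1_temp limit 10")
df.show()
```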
Labels:
- Apache Spark
06-24-2016
07:50 AM
2 Kudos
Hello,
I would like to read a Hive table from a Python script.
Can you help me please?
My cordial thanks
Labels:
- Apache Hive
06-22-2016
01:54 PM
Thank you, but I would like to go directly from the CSV file to the Hive ORC table format without creating the intermediate textfile data. Thanks
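One way to skip the textfile staging table is sketched below, under the assumption that the spark-csv package (com.databricks:spark-csv) is available and that a HiveContext named `hive_context` exists: read the CSV into a DataFrame and write it straight into an ORC-backed Hive table. The path and table name are placeholders.

```python
# Sketch: CSV -> DataFrame -> ORC-backed Hive table, skipping the textfile
# staging table. Assumes the spark-csv package is on the classpath and
# `hive_context` is a HiveContext; the path and table name are placeholders.
df = hive_context.read.format("com.databricks.spark.csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("hdfs:///tmp/input.csv")

df.write.format("orc").saveAsTable("test.my_table_orc")
```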
06-22-2016
01:39 PM
Hello, is it possible to import data from a CSV file into a Hive table in ORC format? Thanks
Labels:
- Apache Hive
06-21-2016
08:30 AM
Hello,
Thank you for the guidance. But I'm new to DataFrames, and what I am trying to do is retrieve the values at indices i and i + 1, for example.
Best regards
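If the goal is to see row i together with row i + 1, window functions (lead/lag) may fit better than explicit iteration. A hedged sketch, assuming the rows can be ordered by a `date` column and `df` is the DataFrame from the question below:

```python
# Sketch: attach the next row's blglast (row i+1) to each row (row i) with lead(),
# assuming a `date` column exists to define the ordering.
from pyspark.sql import Window
from pyspark.sql import functions as F

w = Window.orderBy("date")
df_pairs = df.withColumn("blglast_next", F.lead("blglast", 1).over(w))
df_pairs.show()
```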
06-14-2016
08:43 AM
Hello,
I would like to iterate and compute accumulated values over a column of my DataFrame, but I cannot. Can you help me?
Thank you. Here is the creation of my DataFrame. I would like to calculate an accumulated value of the blglast column and store it in a new column.
from pyspark.sql import HiveContext
from pyspark import SparkContext
from pandas import DataFrame as df
sc =SparkContext()
hive_context = HiveContext(sc)
tab = hive_context.table("table")
tab.registerTempTable("tab_temp")
df=hive_context.sql("SELECT blglast FROM tab_temp AS b limit 50")
df.show()
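For an accumulated (running) total of blglast, a window function avoids manual iteration. A minimal sketch, assuming a `date` column is available to define the order (window functions in Spark 1.x require a HiveContext, which is already used above):

```python
# Sketch: running (accumulated) sum of blglast in a new column; with orderBy
# and no explicit frame, the window runs from the first row up to the current
# row, which gives a cumulative sum. Assumes a `date` column for the ordering.
from pyspark.sql import Window
from pyspark.sql import functions as F

w = Window.orderBy("date")
df_cum = df.withColumn("blglast_cum", F.sum("blglast").over(w))
df_cum.show()
```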
Labels:
- Apache Spark
06-13-2016
11:28 AM
Thank you. By adding the option --map-column-hive Date=Timestamp to Sqoop, everything works.
06-10-2016
02:38 PM
Thanks. But it creates another error: "Hive does not support the SQL type for column date".
06-09-2016
03:38 PM
Hello, I have smalldatetime data in my SQL Server database, and when I import it with Sqoop, this data is stored as a Hive String because smalldatetime does not exist in Hive. This is becoming problematic for my work.
Does anyone know if there is a way to import, through Sqoop, a smalldatetime data type from SQL Server and store it in the timestamp format recognized by Hive? Thanks
Labels:
- Apache Hive
- Apache Spark
- Apache Sqoop
06-08-2016
08:42 AM
Hello, thank you. That works. And I also found the Parquet file format. Currently I am also looking to save as a CSV file and as text, if possible. Cordially
06-07-2016
07:49 AM
I tried with hive_context.write.format("orc").save("test_orc") but I receive this error:
>>> hive_context.write.format("orc").save("hdfs://dev/datalake/app/des/dev/transformer/test_orc")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'HiveContext' object has no attribute 'write'
Thanks
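The .write attribute lives on a DataFrame, not on the HiveContext itself, which explains the AttributeError. A minimal sketch (the query and output path are placeholders):

```python
# Sketch: .write is a DataFrame attribute, not a HiveContext attribute;
# save the result of the query instead. The output path is a placeholder.
df = hive_context.sql("select * from qvol_temp limit 10")
df.write.format("orc").save("hdfs:///dev/datalake/app/des/dev/transformer/test_orc")
```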
06-07-2016
07:36 AM
Thank you. But here are the errors generated by the two methods, saveAsHadoopFile and .write.format: this means that these two methods are not recognized by HiveContext. Thank you!
06-07-2016
07:05 AM
1 Kudo
Hello, I work with HiveContext to load and manipulate data in ORC format. I would now like to know how to save the results of a SQL query to a file in HDFS. Can you help me, please? Here is my HiveContext code; I would like to save the query result in a file on HDFS. Thank you in advance.
from pyspark.sql import HiveContext
from pyspark import SparkContext
sc = SparkContext()
hive_context = HiveContext(sc)
qvol = hive_context.table("<bdd_name>.<table_name>")
qvol.registerTempTable("qvol_temp")
hive_context.sql("select * from qvol_temp limit 10").show()
Labels:
- Apache Hive
06-03-2016
07:07 AM
I created a view with Hive in Hue and I would like to modify its definition without deleting it. Can you help me, please?
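Hive can redefine a view in place with ALTER VIEW ... AS. A hedged sketch, issued here through a HiveContext to stay consistent with the other snippets, though the same statement can be run from the Hue Hive editor; all names are hypothetical:

```python
# Sketch: redefine the view in place with ALTER VIEW ... AS; all names here
# are hypothetical. The same statement can be pasted into the Hue Hive editor.
hive_context.sql("""
    ALTER VIEW my_db.my_view AS
    SELECT col1, col2
    FROM my_db.my_table
    WHERE col1 IS NOT NULL
""")
```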
Labels:
- Apache Hive
- Cloudera Hue
05-23-2016
09:15 AM
Thank you for your suggestions
05-23-2016
09:14 AM
Hi! Thank you. But I found what to do. I just had to add --split-by "colonne_id" in my script.