Member since: 12-02-2019
Posts: 19
Kudos Received: 4
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2988 | 04-08-2020 06:13 AM |
| | 2716 | 03-06-2020 08:28 AM |
09-15-2021
05:05 AM
Hello,
For an application, I need to extract the maximum depth of an HDFS directory tree. I know how to do this in a local shell; we can execute:
find /tmp -type d -printf '%d\n' | sort -rn | head -1
So I wanted to do the same with the find command of hdfs dfs:
hdfs dfs -find /tmp -type d
but the -type argument does not exist for hdfs dfs -find; here is the error:
find: Unexpected argument: -type
Does anyone have a solution or advice for this problem?
PS: my Hadoop version is Hadoop 2.6.0-cdh5.13.
Thanks in advance,
Regards
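A possible workaround (an untested sketch against CDH 5.13): hdfs dfs -ls -R prefixes directory entries with a leading 'd', so you can filter on that instead of -type and count the '/' characters in each path to get its depth:

```shell
# Hypothetical workaround: `hdfs dfs -find` here has no -type flag, but
# `hdfs dfs -ls -R` marks directories with a leading 'd' in the permission
# string. awk's gsub returns the number of substitutions, so replacing '/'
# with '/' counts the slashes in each directory path, i.e. its depth.
max_depth() {
  awk '$1 ~ /^d/ { print gsub("/", "/", $NF) }' | sort -rn | head -1
}
# On the cluster: hdfs dfs -ls -R /tmp | max_depth
```

The depth here is the absolute slash count, matching what find -printf '%d\n' would report relative to '/'.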
Labels:
- Apache Sqoop
- HDFS
04-08-2020
06:13 AM
1 Kudo
Hi, after some research I found a solution to this issue. The problem came from the Hive table definition used to store the data. I was defining the properties of my table like this:
hive.createTable("res_idos_0")
.ifNotExists()
.prop("serialization.encoding","UTF-8")
.prop("escape.delim","\t")
.column("t_date","TIMESTAMP")
But in a writeStream with special characters, the escape.delim property is not supported and the characters cannot be saved correctly. So I removed the escape.delim property from my Hive table definition, and I also added this line to my code to be certain that the files saved in HDFS have the right encoding:
System.setProperty("file.encoding", "UTF-8")
04-06-2020
04:17 AM
Hello,
I'm facing an issue with the display and storage of special characters in Hive.
I'm using Spark to write a stream into Hive like this:
// Write result in hive
val query = trimmedDF.writeStream
//.format("console")
.format("com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource")
.outputMode("append")
.option("metastoreUri", metastoreUri)
.option("database", "dwh_prod")
.option("table", "res_idos_0")
.option("checkpointLocation", "/tmp/idos_LVD_060420_0")
.queryName("test_final")
.option("truncate", "false")
.option("encoding", "UTF-8")
.start()
query.awaitTermination()
but when the data contains a special character, Hive doesn't display it correctly. I have already set the encoding to UTF-8 on the Hive table:
select distinct(analyte) from res_idos_0;
+--------------------------------------------+
| analyte |
+--------------------------------------------+
| D02 |
| E |
| E - Hauteur Int��rieure jupe - 6,75mm |
| Hauteur totale |
| Long tube apparent (embout 408 assembl��) |
| Side streaming - poids apr��s |
| Tenue tube plongeur |
| 1 dose - poids avant |
| Diam��tre 1er joint de sertissage |
| HDS - Saillie Point Mort Bas |
| P - Epaisseur tourette P5 - 0,51mm |
+--------------------------------------------+
But the special characters display correctly when I print the stream to the console with writeStream, or if I use the write function to write into Hive like this:
final_DF.write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
.mode("overwrite")
.option("table","dwh_prod.result_idos_lims3")
.save()
The characters display correctly in Hive:
+-------------------------------------------+
| analyte |
+-------------------------------------------+
| 1 dose |
| 1 dose (moyenne) - Kinf |
| 1 dose (écart type) |
| 1 dose - poids avant |
| 1 dose individuelle (maxi) |
| 1,00mm |
| 1,3,5-trioxane |
I use Spark 2.3.2 and Hive 3.1.0.
Has anyone faced this issue, or does anyone have a clue or a solution for me?
Thanks in advance,
Best Regards
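Not a fix, but a quick local check that may help narrow this down: the paired replacement characters in the output typically mean the bytes on disk are not in the encoding the reader assumes. If you can pull a raw data file down with hdfs dfs -get, iconv can tell valid UTF-8 bytes from single-byte Latin-1 bytes (a sketch; the sample strings below are illustrative):

```shell
# iconv exits non-zero when its input is not valid UTF-8, so it can
# distinguish real UTF-8 bytes from Latin-1 bytes that were mislabeled.
check_utf8() {
  iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1 && echo "valid UTF-8" || echo "not UTF-8"
}
printf 'Int\303\251rieure' | check_utf8   # UTF-8 bytes for "e-acute"
printf 'Int\351rieure'     | check_utf8   # Latin-1 byte for "e-acute"
```

If the file itself turns out to be valid UTF-8, the problem is on the reading side (SerDe or client) rather than in the writeStream.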
03-06-2020
08:28 AM
1 Kudo
Hello @pal_1990, I think your input is something like this:
+----------------------------------------------------+
| semicolon.a |
+----------------------------------------------------+
| 1;13004211,13004211_02_13004212,4000000003378605589,1105,2000 |
+----------------------------------------------------+
1. You need to separate the first value from the others; for this I use the posexplode function:
select pe.i,pe.x from semicolon lateral view posexplode(split(a,';')) pe as i,x;
+-------+----------------------------------------------------+
| pe.i | pe.x |
+-------+----------------------------------------------------+
| 0 | 1 |
| 1 | 13004211,13004211_02_13004212,4000000003378605589,1105,2000 |
+-------+----------------------------------------------------+
2. You only select the rows where pe.i = 1:
select t.x from
(select pe.i,pe.x
from semicolon lateral view posexplode(split(a,';')) pe as i,x) t where t.i=1;
+----------------------------------------------------+
| t.x |
+----------------------------------------------------+
| 13004211,13004211_02_13004212,4000000003378605589,1105,2000 |
+----------------------------------------------------+
3. You split the values into columns:
select split(t.x,',')[0] as col1,
split(t.x,',')[1] as col2,
split(t.x,',')[2] as col3,
split(t.x,',')[3] as col4,
split(t.x,',')[4] as col5
from
(select pe.i,pe.x
from semicolon lateral view posexplode(split(a,';')) pe as i,x) t where t.i=1;
+-----------+-----------------------+----------------------+-------+-------+
| col1 | col2 | col3 | col4 | col5 |
+-----------+-----------------------+----------------------+-------+-------+
| 13004211 | 13004211_02_13004212 | 4000000003378605589 | 1105 | 2000 |
+-----------+-----------------------+----------------------+-------+-------+
I hope it will help you.
Best regards
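For a quick sanity check of the field positions outside Hive, the same extraction can be sketched in plain shell (cut fields are 1-based, unlike the 0-based split() indexes in HiveQL):

```shell
# Sketch of the same extraction: drop the part before the ';', then pick
# comma-separated fields with cut (cut -f is 1-based; HiveQL split()[i] is 0-based).
row='1;13004211,13004211_02_13004212,4000000003378605589,1105,2000'
rest=${row#*;}                    # keep everything after the first ';'
col1=$(echo "$rest" | cut -d',' -f1)
col4=$(echo "$rest" | cut -d',' -f4)
echo "$col1 $col4"                # prints: 13004211 1105
```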