Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 13343 | 02-20-2018 12:33 PM |
| | 1499 | 02-19-2018 05:12 AM |
| | 1858 | 12-28-2017 06:13 AM |
| | 7135 | 09-28-2017 09:25 AM |
| | 12162 | 09-25-2017 11:19 AM |
09-08-2024
10:36 PM
With Hive newer than 2.2, you can use MERGE INTO:

MERGE INTO target_table AS target
USING source_table AS source
ON target.id = source.id
WHEN MATCHED THEN
UPDATE SET
target.name = source.name,
target.age = source.age
WHEN NOT MATCHED THEN
INSERT (id, name, age)
VALUES (source.id, source.name, source.age);
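
Note that Hive's MERGE only works against a transactional (ACID) target table; depending on the Hive version, bucketing and the ACID transaction-manager settings may also be required. A minimal, hedged sketch of a matching target table definition (table and column names are illustrative, mirroring the example above):

CREATE TABLE target_table (
  id INT,
  name STRING,
  age INT
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');  -- MERGE requires an ACID target table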
11-23-2023
02:54 PM
CREATE EXTERNAL TABLE dwsimp.dim_agrupamento (
  id INT,
  agrupamento_nome STRING,
  agrupamento_ordem INT,
  dim_relatorio_id INT,
  agrupamento_campo STRING
)
STORED AS ORC
TBLPROPERTIES (
  org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler,
  mapred.jdbc.driver.class = "oracle.jdbc.OracleDriver",
  mapred.jdbc.url = "jdbc:oracle:thin:@//jdbc:oracle:thin:@//host:port/servicename",
  mapred.jdbc.username = "user",
  mapred.jdbc.password = "password",
  mapred.jdbc.input.table.name = "JDBCTable",
  mapred.jdbc.output.table.name = "JDBCTable",
  mapred.jdbc.hive.lazy.split" = "false");

Error: Error while compiling statement: FAILED: ParseException line 10:2 cannot recognize input near 'org' '.' 'apache' in table properties list (state=42000,code=40000)
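
The ParseException comes from the bare storage-handler class name and the unquoted keys inside TBLPROPERTIES, which the Hive parser cannot accept. A hedged sketch of how DDL for that legacy JdbcStorageHandler is usually written, with the handler class in a STORED BY clause and every property key quoted (connection details are placeholders, and whether this handler is available in your build is an assumption):

CREATE EXTERNAL TABLE dwsimp.dim_agrupamento (
  id INT,
  agrupamento_nome STRING,
  agrupamento_ordem INT,
  dim_relatorio_id INT,
  agrupamento_campo STRING
)
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
  -- property keys must be quoted strings; values below are placeholders
  "mapred.jdbc.driver.class" = "oracle.jdbc.OracleDriver",
  "mapred.jdbc.url" = "jdbc:oracle:thin:@//host:port/servicename",
  "mapred.jdbc.username" = "user",
  "mapred.jdbc.password" = "password",
  "mapred.jdbc.input.table.name" = "JDBCTable",
  "mapred.jdbc.output.table.name" = "JDBCTable",
  "mapred.jdbc.hive.lazy.split" = "false"
);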
06-02-2023
08:59 AM
How can I check for the largest file under a certain tenant's directory in Hadoop?
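
A minimal sketch of one way to do this from the command line, assuming the tenant's data lives under a directory such as /data/tenant_a (the path is illustrative only): list recursively and sort on the size column.

# 5th column of "hdfs dfs -ls" output is the file size in bytes
hdfs dfs -ls -R /data/tenant_a | sort -k5 -n -r | head -n 5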
06-22-2022
08:18 AM
I know it's an old post, but I am getting the same error. In my case it was working fine on CDH 5.13, but then we upgraded our cluster to CDH 6.3.4 and it no longer works. I get the same error as mentioned above while trying to add columns to a Hive table that uses the MultiDelimitSerDe.
06-02-2022
06:59 PM
Have you tried moving out (or deleting) the folder for that partition from HDFS, and then running: msck repair table <tablename>
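
A minimal sketch of that sequence, assuming a hypothetical external table sales partitioned by dt (the path and names are illustrative only):

-- 1) Remove the partition's directory from HDFS, e.g.:
--      hdfs dfs -rm -r /warehouse/tablespace/external/hive/sales/dt=2022-06-01
-- 2) Then ask Hive to reconcile the partition metadata with what is on HDFS:
MSCK REPAIR TABLE sales;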
11-26-2021
04:28 AM
@nikkie_thomas You can set the following if you are using Tez:

set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.smallfiles.avgsize=<some value>;
set hive.merge.size.per.task=<some value>;
set hive.merge.tezfiles=true;
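
For illustration only, a hedged example with placeholder sizes (128 MB average small-file size and 256 MB per merge task, expressed in bytes); the right values depend entirely on the cluster and the data:

set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.tezfiles=true;
-- illustrative sizes in bytes (128 MB / 256 MB); tune for your workload
set hive.merge.smallfiles.avgsize=134217728;
set hive.merge.size.per.task=268435456;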
03-03-2021
09:51 PM
I have a tab-separated file like this:

Copyright details 2021
ID \t NUMBER \t ADDRESS \t ZIPCODE
10 \t 9877777777 \t India \t 400322
13 \t 2983359852 \t AUS \t 84534
26 \t 7832724424
34 \t 8238444294 \t RSA \t 74363

Here the first row is a comment, and the row with ID 26 doesn't have the trailing column values; it doesn't even have a \t at the end. So I need to read the file skipping the first line and handle the missing delimiters at the end. I tried this:

import org.apache.spark.sql.DataFrame
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val data = sc.textFile("sample_badtsv.txt")
val comments = data.first()
val fdata = data.filter(x => x != comments)
val header = fdata.filter(x => x.split("\t")(1) == "NUMBER").collect().mkString
val df = fdata.filter(x => x.split("\t")(1) != "NUMBER")
  .map(x => x.split("\t"))
  .map(x => (x(0), x(1), x(2), x(3)))
  .toDF(header.split("\t"): _*)

Since some lines are missing the trailing \t characters, I am getting an ArrayIndexOutOfBoundsException, because when converting the RDD to a DataFrame some rows have fewer fields than others. Please provide a better solution so that I can skip the first line and read the file correctly (even where the \t characters are missing, the code needs to treat the missing trailing fields as NULL values, like below):

ID    NUMBER        ADDRESS    ZIPCODE
10    9877777777    India      400322
13    2983359852    AUS        84534
26    7832724424    NULL       NULL
34    8238444294    RSA        74363
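
A hedged sketch of one way to handle this, assuming Spark 2.x in spark-shell and the same sample_badtsv.txt file: pad each split line to the expected number of columns so that short rows end up with nulls instead of throwing an exception.

// Sketch only: assumes a SparkSession named spark is available (e.g. spark-shell)
import spark.implicits._

val raw = spark.sparkContext.textFile("sample_badtsv.txt")
val commentLine = raw.first()                       // the copyright/comment line
val noComment = raw.filter(_ != commentLine)

val headerLine = noComment.first()                  // ID \t NUMBER \t ADDRESS \t ZIPCODE
val headerCols = headerLine.split("\t")

val rows = noComment
  .filter(_ != headerLine)
  .map { line =>
    // split with -1 keeps trailing empty fields; padTo fills missing columns with null
    val parts = line.split("\t", -1).padTo(headerCols.length, null.asInstanceOf[String])
    (parts(0), parts(1), parts(2), parts(3))
  }

val df = rows.toDF(headerCols: _*)
df.show()   // row 26 shows null for ADDRESS and ZIPCODE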
01-14-2021
11:54 PM
Hi ravikirandasar1, I also have the same query. Could you please let me know how you automated this job using crontab for the everyday download of the files to HDFS?
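
For reference, a minimal sketch of how such a job is often scheduled, assuming a hypothetical script at /home/user/bin/daily_ingest.sh; the URL, paths, and schedule below are illustrative only:

# crontab -e entry: run the ingest script every day at 02:00
0 2 * * * /home/user/bin/daily_ingest.sh >> /var/log/daily_ingest.log 2>&1

# daily_ingest.sh (illustrative): fetch the file, then push it to HDFS
#!/bin/bash
set -e
wget -q -O /tmp/export.csv "https://example.com/export.csv"
hdfs dfs -mkdir -p /user/hive/landing
hdfs dfs -put -f /tmp/export.csv /user/hive/landing/export_$(date +%F).csv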
07-28-2020
11:31 PM
"I highly recommend skimming quickly over following slides, specially starting from slide 7. http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey" This slide is not there at the path
04-20-2020
09:47 AM
hdfs dfs -ls -R <directory> |grep part-r* |awk '{print $8}' |xargs hdfs dfs -cat | wc -l
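
For context, a hedged reading of that one-liner: it recursively lists the directory, keeps the part-r-* output files, extracts the path column, and streams their contents through wc -l to get a single total record count. A variant that reports the count per file instead of one grand total (the /data/output directory is illustrative only):

# per-file record counts for reducer outputs under a hypothetical /data/output
hdfs dfs -ls -R /data/output | awk '/part-r/ {print $8}' | while read f; do
  printf '%s\t' "$f"
  hdfs dfs -cat "$f" | wc -l
done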