Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 13343 | 02-20-2018 12:33 PM |
| | 1499 | 02-19-2018 05:12 AM |
| | 1858 | 12-28-2017 06:13 AM |
| | 7135 | 09-28-2017 09:25 AM |
| | 12162 | 09-25-2017 11:19 AM |
09-08-2024
10:36 PM
With Hive newer than 2.2, you can use MERGE INTO:

MERGE INTO target_table AS target
USING source_table AS source
ON target.id = source.id
WHEN MATCHED THEN
UPDATE SET
target.name = source.name,
target.age = source.age
WHEN NOT MATCHED THEN
INSERT (id, name, age)
VALUES (source.id, source.name, source.age);
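
Note that Hive's MERGE only works against a transactional (ACID) target table; depending on the Hive version, bucketing and the ACID transaction-manager settings may also be required. A minimal, hedged sketch of a matching target table definition (table and column names are illustrative, mirroring the example above):

CREATE TABLE target_table (
  id INT,
  name STRING,
  age INT
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');  -- MERGE requires an ACID target table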
11-23-2023
02:54 PM
CREATE EXTERNAL TABLE dwsimp.dim_agrupamento (
  id INT,
  agrupamento_nome STRING,
  agrupamento_ordem INT,
  dim_relatorio_id INT,
  agrupamento_campo STRING
)
STORED AS ORC
TBLPROPERTIES (
  org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler,
  mapred.jdbc.driver.class = "oracle.jdbc.OracleDriver",
  mapred.jdbc.url = "jdbc:oracle:thin:@//jdbc:oracle:thin:@//host:port/servicename",
  mapred.jdbc.username = "user",
  mapred.jdbc.password = "password",
  mapred.jdbc.input.table.name = "JDBCTable",
  mapred.jdbc.output.table.name = "JDBCTable",
  mapred.jdbc.hive.lazy.split" = "false");

Error: Error while compiling statement: FAILED: ParseException line 10:2 cannot recognize input near 'org' '.' 'apache' in table properties list (state=42000,code=40000)
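
The ParseException comes from the bare storage-handler class name and the unquoted keys inside TBLPROPERTIES, which the Hive parser cannot accept. A hedged sketch of how DDL for that legacy JdbcStorageHandler is usually written, with the handler class in a STORED BY clause and every property key quoted (connection details are placeholders, and whether this handler is available in your build is an assumption):

CREATE EXTERNAL TABLE dwsimp.dim_agrupamento (
  id INT,
  agrupamento_nome STRING,
  agrupamento_ordem INT,
  dim_relatorio_id INT,
  agrupamento_campo STRING
)
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
  -- property keys must be quoted strings; values below are placeholders
  "mapred.jdbc.driver.class" = "oracle.jdbc.OracleDriver",
  "mapred.jdbc.url" = "jdbc:oracle:thin:@//host:port/servicename",
  "mapred.jdbc.username" = "user",
  "mapred.jdbc.password" = "password",
  "mapred.jdbc.input.table.name" = "JDBCTable",
  "mapred.jdbc.output.table.name" = "JDBCTable",
  "mapred.jdbc.hive.lazy.split" = "false"
);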
06-02-2023
08:59 AM
How can I check for the largest file under a certain tenant's directory in Hadoop?
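
A minimal sketch of one way to do this from the command line, assuming the tenant's data lives under a directory such as /data/tenant_a (the path is illustrative only): list recursively and sort on the size column.

# 5th column of "hdfs dfs -ls" output is the file size in bytes
hdfs dfs -ls -R /data/tenant_a | sort -k5 -n -r | head -n 5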
06-22-2022
08:18 AM
I know it's an old post, but I am getting the same error. In my case it was working fine on CDH 5.13, but then we upgraded our cluster to CDH 6.3.4 and it no longer works. I get the same error as mentioned above while trying to add columns to a Hive table that uses the MultiDelimitSerDe.
06-02-2022
06:59 PM
Have you tried moving out (or deleting) the folder for that partition from HDFS, and then running: msck repair table <tablename>
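
A minimal sketch of that sequence, assuming a hypothetical external table sales partitioned by dt (the path and names are illustrative only):

-- 1) Remove the partition's directory from HDFS, e.g.:
--      hdfs dfs -rm -r /warehouse/tablespace/external/hive/sales/dt=2022-06-01
-- 2) Then ask Hive to reconcile the partition metadata with what is on HDFS:
MSCK REPAIR TABLE sales;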
11-26-2021
04:28 AM
@nikkie_thomas You can set the following if you are using Tez:

set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.smallfiles.avgsize=<some value>;
set hive.merge.size.per.task=<some value>;
set hive.merge.tezfiles=true;
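
For illustration only, a hedged example with placeholder sizes (128 MB average small-file size and 256 MB per merge task, expressed in bytes); the right values depend entirely on the cluster and the data:

set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.tezfiles=true;
-- illustrative sizes in bytes (128 MB / 256 MB); tune for your workload
set hive.merge.smallfiles.avgsize=134217728;
set hive.merge.size.per.task=268435456;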
03-03-2021
09:51 PM
I have a tab-separated file like this:

Copyright details 2021
ID \t NUMBER \t ADDRESS \t ZIPCODE
10 \t 9877777777 \t India \t 400322
13 \t 2983359852 \t AUS \t 84534
26 \t 7832724424
34 \t 8238444294 \t RSA \t 74363

Here the first row is a comment, and the row with ID 26 doesn't have the trailing column values; it doesn't even have a \t at the end. So I need to read the file skipping the first line and handle the missing delimiters at the end. I tried this:

import org.apache.spark.sql.DataFrame
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val data = sc.textFile("sample_badtsv.txt")
val comments = data.first()
val fdata = data.filter(x => x != comments)
val header = fdata.filter(x => x.split("\t")(1) == "NUMBER").collect().mkString
val df = fdata.filter(x => x.split("\t")(1) != "NUMBER")
  .map(x => x.split("\t"))
  .map(x => (x(0), x(1), x(2), x(3)))
  .toDF(header.split("\t"): _*)

Since some lines are missing the trailing \t characters, I am getting an ArrayIndexOutOfBoundsException, because when converting the RDD to a DataFrame some rows have fewer fields than others. Please provide a better solution so that I can skip the first line and read the file correctly (even where the \t characters are missing, the code needs to treat the missing trailing fields as NULL values, like below):

ID    NUMBER        ADDRESS    ZIPCODE
10    9877777777    India      400322
13    2983359852    AUS        84534
26    7832724424    NULL       NULL
34    8238444294    RSA        74363
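
A hedged sketch of one way to handle this, assuming Spark 2.x in spark-shell and the same sample_badtsv.txt file: pad each split line to the expected number of columns so that short rows end up with nulls instead of throwing an exception.

// Sketch only: assumes a SparkSession named spark is available (e.g. spark-shell)
import spark.implicits._

val raw = spark.sparkContext.textFile("sample_badtsv.txt")
val commentLine = raw.first()                       // the copyright/comment line
val noComment = raw.filter(_ != commentLine)

val headerLine = noComment.first()                  // ID \t NUMBER \t ADDRESS \t ZIPCODE
val headerCols = headerLine.split("\t")

val rows = noComment
  .filter(_ != headerLine)
  .map { line =>
    // split with -1 keeps trailing empty fields; padTo fills missing columns with null
    val parts = line.split("\t", -1).padTo(headerCols.length, null.asInstanceOf[String])
    (parts(0), parts(1), parts(2), parts(3))
  }

val df = rows.toDF(headerCols: _*)
df.show()   // row 26 shows null for ADDRESS and ZIPCODE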
01-14-2021
11:54 PM
Hi ravikirandasar1, I also have the same query. Could you please let me know how you automated this job using crontab for the everyday download of the files to HDFS?
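
For reference, a minimal sketch of how such a job is often scheduled, assuming a hypothetical script at /home/user/bin/daily_ingest.sh; the URL, paths, and schedule below are illustrative only:

# crontab -e entry: run the ingest script every day at 02:00
0 2 * * * /home/user/bin/daily_ingest.sh >> /var/log/daily_ingest.log 2>&1

# daily_ingest.sh (illustrative): fetch the file, then push it to HDFS
#!/bin/bash
set -e
wget -q -O /tmp/export.csv "https://example.com/export.csv"
hdfs dfs -mkdir -p /user/hive/landing
hdfs dfs -put -f /tmp/export.csv /user/hive/landing/export_$(date +%F).csv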
07-28-2020
11:31 PM
"I highly recommend skimming quickly over following slides, specially starting from slide 7. http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey" This slide is not there at the path
04-20-2020
09:47 AM
hdfs dfs -ls -R <directory> |grep part-r* |awk '{print $8}' |xargs hdfs dfs -cat | wc -l
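
For context, a hedged reading of that one-liner: it recursively lists the directory, keeps the part-r-* output files, extracts the path column, and streams their contents through wc -l to get a single total record count. A variant that reports the count per file instead of one grand total (the /data/output directory is illustrative only):

# per-file record counts for reducer outputs under a hypothetical /data/output
hdfs dfs -ls -R /data/output | awk '/part-r/ {print $8}' | while read f; do
  printf '%s\t' "$f"
  hdfs dfs -cat "$f" | wc -l
done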