Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4062 | 08-20-2018 08:26 PM |
| | 1957 | 08-15-2018 01:59 PM |
| | 2382 | 08-13-2018 02:20 PM |
| | 4124 | 07-23-2018 04:37 PM |
| | 5041 | 07-19-2018 12:52 PM |
06-11-2018
03:24 PM
".c This is file counter which means the number of files that have been written in the past for this specific partition". The schema is stored in hive metastore. If want native parquet files with schema, why not store on hdfs and create hive external table?
06-11-2018
02:24 PM
Please run a DESCRIBE on the Hive table. If it shows the storage format as Parquet, then you're good. More info on DESCRIBE here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Describe
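One quick way to run that check from Spark, assuming Hive support is enabled; "student_parquet" is a placeholder table name.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: look for the Parquet input format / SerDe in the storage section of the output.
val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

spark.sql("DESCRIBE FORMATTED student_parquet").show(100, truncate = false)
```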
06-11-2018
05:38 AM
Yes, that is it. Basically, inside the iterator you would build one large insert statement:

INSERT INTO films (code, title, did, date_prod, kind) VALUES
('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

Your column names can come from the dataframe and the values come from the dataframe itself, so nothing is hard-coded and you can reuse this code for virtually any database that accepts ANSI SQL inserts.
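A rough sketch of that idea, assuming a hypothetical JDBC URL, credentials, and a target table whose columns match the DataFrame schema (none of these come from this thread); in real code you would prefer a PreparedStatement over string concatenation.

```scala
import java.sql.DriverManager
import org.apache.spark.sql.DataFrame

// Sketch only: build one multi-row ANSI INSERT per partition and send it over JDBC.
// The URL, credentials, and table name are placeholders.
def writeWithMultiRowInserts(df: DataFrame, url: String, table: String,
                             user: String, pass: String): Unit = {
  val columns = df.columns.mkString(", ")
  df.rdd.foreachPartition { rows =>
    if (rows.nonEmpty) {
      val values = rows.map { r =>
        r.toSeq.map {
          case null      => "NULL"
          case s: String => "'" + s.replace("'", "''") + "'"   // naive escaping, for illustration only
          case other     => other.toString
        }.mkString("(", ", ", ")")
      }.mkString(",\n")

      val conn = DriverManager.getConnection(url, user, pass)
      try {
        conn.createStatement().executeUpdate(s"INSERT INTO $table ($columns) VALUES\n$values")
      } finally {
        conn.close()
      }
    }
  }
}
```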
06-10-2018
03:33 AM
Try a mapPartitions and have each partition write several hundred or a few thousand rows to Postgres at a time.
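As a variation on the multi-row insert above, a hedged sketch using a batched PreparedStatement per partition; the Postgres URL, credentials, and the two-column films table are made up for illustration.

```scala
import java.sql.DriverManager
import org.apache.spark.sql.DataFrame

// Sketch: each partition opens one connection and flushes rows to Postgres in JDBC batches.
// Connection details and the "films(code, title)" table are placeholders.
def writeToPostgres(df: DataFrame): Unit = {
  df.rdd.foreachPartition { rows =>
    val conn = DriverManager.getConnection("jdbc:postgresql://pg-host:5432/mydb", "user", "pass")
    val stmt = conn.prepareStatement("INSERT INTO films (code, title) VALUES (?, ?)")
    try {
      var pending = 0
      rows.foreach { r =>
        stmt.setString(1, r.getString(0))
        stmt.setString(2, r.getString(1))
        stmt.addBatch()
        pending += 1
        if (pending % 1000 == 0) stmt.executeBatch()   // write ~1000 rows per round trip
      }
      stmt.executeBatch()                              // flush the remainder
    } finally {
      conn.close()
    }
  }
}
```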
06-10-2018
03:29 AM
Do you have this in your pom?

<dependency>
    <groupId>org.apache.phoenix</groupId>
    <artifactId>phoenix-core</artifactId>
    <version>4.1.0</version>
</dependency>
06-10-2018
03:25 AM
"HDFS /tmp directory mainly used as a temporary storage during mapreduce operation. Mapreduce artifacts, intermediate data etc will be kept under this directory." Also this question has already been answered here: https://community.hortonworks.com/questions/120790/is-it-possible-to-set-the-hadooptmpdir-value-to-hd.html
06-10-2018
03:13 AM
I found a similar question here: https://stackoverflow.com/questions/24805226/keeping-the-order-of-records-in-hive-collect
Hope it helps.
03-27-2017
04:01 PM
3 Kudos
There are many ways to validate a JSON file against an Avro schema to verify all is kosher. Sharing a practice I have been using for a few years.

Objective - Validate that an Avro schema binds correctly to a JSON file.

First you must have an Avro schema and a JSON file. From there, download the latest avro-tools jar; at the moment 1.8.1 is the latest avro-tools version available. Store the Avro schema and the JSON file in the same directory, then issue a wget to fetch the avro-tools jar:

wget http://www.us.apache.org/dist/avro/avro-1.8.1/java/avro-tools-1.8.1.jar

Here is what the directory looks like: the schema, the JSON file, and the avro-tools jar side by side.

Objective details - Validate that the schema student.avsc binds to student.json.

How - Issue the following:

java -jar ./avro-tools-1.8.1.jar fromjson --schema-file YourSchemaFile.avsc YourJsonFile.json > AnyNameForYourBinaryAvro.avro

Using the student files example:

java -jar ./avro-tools-1.8.1.jar fromjson --schema-file student.avsc student.json > student.avro

Validation passed and an Avro binary was created. Now, as a last step, let's break something. Another Avro schema (student2.avsc) is created which does not conform to student.json. Running the same command with student2.avsc shows that avro-tools fails to build an Avro binary: no file is produced and validation errors are reported.
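For a programmatic variation on the same check (not part of the walkthrough above), here is a rough Scala sketch using the Avro Java API; it mirrors the student file names and throws an exception if the JSON does not match the schema.

```scala
import java.io.{File, FileInputStream}
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

// Sketch: parse the schema, then try to decode the JSON record with it.
// An exception from read() means the JSON does not conform to the schema.
object ValidateJsonAgainstSchema {
  def main(args: Array[String]): Unit = {
    val schema  = new Schema.Parser().parse(new File("student.avsc"))
    val reader  = new GenericDatumReader[GenericRecord](schema)
    val decoder = DecoderFactory.get().jsonDecoder(schema, new FileInputStream("student.json"))
    val record  = reader.read(null, decoder)
    println(s"Validation passed: $record")
  }
}
```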
03-24-2017
01:59 AM
@Sriharsha Chintalapani Thank you very much. Found the issue: my SASL port is 6668, not 6667.