Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4062 | 08-20-2018 08:26 PM |
| | 1957 | 08-15-2018 01:59 PM |
| | 2382 | 08-13-2018 02:20 PM |
| | 4124 | 07-23-2018 04:37 PM |
| | 5041 | 07-19-2018 12:52 PM |
06-11-2018
03:24 PM
".c This is file counter which means the number of files that have been written in the past for this specific partition". The schema is stored in hive metastore. If want native parquet files with schema, why not store on hdfs and create hive external table?
06-11-2018
02:24 PM
Please run a DESCRIBE on the Hive table. If it shows the storage format as Parquet, then you're good. More info on DESCRIBE here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Describe
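One quick way to run that check from Spark, assuming Hive support is enabled; "student_parquet" is a placeholder table name.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: look for the Parquet input format / SerDe in the storage section of the output.
val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

spark.sql("DESCRIBE FORMATTED student_parquet").show(100, truncate = false)
```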
06-11-2018
05:38 AM
Yes, that is it. Basically, inside the iterator you would build one large insert statement:

INSERT INTO films (code, title, did, date_prod, kind) VALUES
('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

Your column names can come from the dataframe and the values come from the dataframe itself, so nothing is hard-coded and you can reuse this code for virtually any database that accepts ANSI SQL inserts.
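A rough sketch of that idea, assuming a hypothetical JDBC URL, credentials, and a target table whose columns match the DataFrame schema (none of these come from this thread); in real code you would prefer a PreparedStatement over string concatenation.

```scala
import java.sql.DriverManager
import org.apache.spark.sql.DataFrame

// Sketch only: build one multi-row ANSI INSERT per partition and send it over JDBC.
// The URL, credentials, and table name are placeholders.
def writeWithMultiRowInserts(df: DataFrame, url: String, table: String,
                             user: String, pass: String): Unit = {
  val columns = df.columns.mkString(", ")
  df.rdd.foreachPartition { rows =>
    if (rows.nonEmpty) {
      val values = rows.map { r =>
        r.toSeq.map {
          case null      => "NULL"
          case s: String => "'" + s.replace("'", "''") + "'"   // naive escaping, for illustration only
          case other     => other.toString
        }.mkString("(", ", ", ")")
      }.mkString(",\n")

      val conn = DriverManager.getConnection(url, user, pass)
      try {
        conn.createStatement().executeUpdate(s"INSERT INTO $table ($columns) VALUES\n$values")
      } finally {
        conn.close()
      }
    }
  }
}
```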
06-10-2018
03:33 AM
Try a mapPartitions and have each partition write several hundred or a few thousand rows to Postgres at a time.
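As a variation on the multi-row insert above, a hedged sketch using a batched PreparedStatement per partition; the Postgres URL, credentials, and the two-column films table are made up for illustration.

```scala
import java.sql.DriverManager
import org.apache.spark.sql.DataFrame

// Sketch: each partition opens one connection and flushes rows to Postgres in JDBC batches.
// Connection details and the "films(code, title)" table are placeholders.
def writeToPostgres(df: DataFrame): Unit = {
  df.rdd.foreachPartition { rows =>
    val conn = DriverManager.getConnection("jdbc:postgresql://pg-host:5432/mydb", "user", "pass")
    val stmt = conn.prepareStatement("INSERT INTO films (code, title) VALUES (?, ?)")
    try {
      var pending = 0
      rows.foreach { r =>
        stmt.setString(1, r.getString(0))
        stmt.setString(2, r.getString(1))
        stmt.addBatch()
        pending += 1
        if (pending % 1000 == 0) stmt.executeBatch()   // write ~1000 rows per round trip
      }
      stmt.executeBatch()                              // flush the remainder
    } finally {
      conn.close()
    }
  }
}
```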
06-10-2018
03:29 AM
Do you have this in your pom?

<dependency>
    <groupId>org.apache.phoenix</groupId>
    <artifactId>phoenix-core</artifactId>
    <version>4.1.0</version>
</dependency>
06-10-2018
03:25 AM
"HDFS /tmp directory mainly used as a temporary storage during mapreduce operation. Mapreduce artifacts, intermediate data etc will be kept under this directory." Also this question has already been answered here: https://community.hortonworks.com/questions/120790/is-it-possible-to-set-the-hadooptmpdir-value-to-hd.html
06-10-2018
03:13 AM
I found a similar question here: https://stackoverflow.com/questions/24805226/keeping-the-order-of-records-in-hive-collect
Hope it helps.
03-27-2017
04:01 PM
3 Kudos
There are many ways to validate a JSON file against an Avro schema to verify all is kosher. Sharing a practice I have been using for a few years.

Objective - Validate that an Avro schema binds correctly to a JSON file.

First you must have an Avro schema and a JSON file. From there, download the latest avro-tools jar; at the moment 1.8.1 is the latest avro-tools version available. Store the Avro schema and the JSON file in the same directory, then issue a wget to fetch the avro-tools jar:

wget http://www.us.apache.org/dist/avro/avro-1.8.1/java/avro-tools-1.8.1.jar

Here is what the directory looks like: the schema, the JSON file, and the avro-tools jar side by side.

Objective details - Validate that the schema student.avsc binds to student.json.

How - Issue the following:

java -jar ./avro-tools-1.8.1.jar fromjson --schema-file YourSchemaFile.avsc YourJsonFile.json > AnyNameForYourBinaryAvro.avro

Using the student files example:

java -jar ./avro-tools-1.8.1.jar fromjson --schema-file student.avsc student.json > student.avro

Validation passed and an Avro binary was created. Now, as a last step, let's break something. Another Avro schema (student2.avsc) is created which does not conform to student.json. Running the same command with student2.avsc shows that avro-tools fails to build an Avro binary: no file is produced and validation errors are reported.
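For a programmatic variation on the same check (not part of the walkthrough above), here is a rough Scala sketch using the Avro Java API; it mirrors the student file names and throws an exception if the JSON does not match the schema.

```scala
import java.io.{File, FileInputStream}
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

// Sketch: parse the schema, then try to decode the JSON record with it.
// An exception from read() means the JSON does not conform to the schema.
object ValidateJsonAgainstSchema {
  def main(args: Array[String]): Unit = {
    val schema  = new Schema.Parser().parse(new File("student.avsc"))
    val reader  = new GenericDatumReader[GenericRecord](schema)
    val decoder = DecoderFactory.get().jsonDecoder(schema, new FileInputStream("student.json"))
    val record  = reader.read(null, decoder)
    println(s"Validation passed: $record")
  }
}
```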
03-24-2017
01:59 AM
@Sriharsha Chintalapani Thank you very much. Found the issue: my SASL port is 6668, not 6667.