Member since: 08-03-2019
Posts: 186
Kudos Received: 34
Solutions: 26
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1913 | 04-25-2018 08:37 PM |
 | 5838 | 04-01-2018 09:37 PM |
 | 1550 | 03-29-2018 05:15 PM |
 | 6668 | 03-27-2018 07:22 PM |
 | 1957 | 03-27-2018 06:14 PM |
03-26-2018
02:02 AM
@Adithya Sajjanam sys_extract_utc is not a valid Hive function. You are translating your Oracle query as is, which is why you are getting this error. Refer to the Hive documentation on date functions and translate your query from Oracle syntax to HiveQL accordingly.
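For example, a hedged sketch of how the Oracle call might map to Hive (the time zone string below is an assumption; substitute your cluster's local zone):
-- Oracle: SELECT sys_extract_utc(systimestamp) FROM dual;
-- Hive sketch: convert the current timestamp to UTC
SELECT to_utc_timestamp(current_timestamp, 'America/New_York');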
03-25-2018
09:47 PM
@Sanjay Gurnani Looking at your issue in detail, it seems you are facing a problem committing the data when there is no data; when you do have data, it works fine. I am not sure whether that is a bug, since you are able to persist the data with other formats, but I will check the source code to see why this peculiar behavior happens only with ORC. In the meantime, to get your code working, I would recommend checking whether your dataset has any records before proceeding with the commit to the sink. That way you also avoid the empty-file issue, if you are getting one even with Parquet etc., where the writeStream method works.
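As a minimal, hedged sketch of that workaround (assuming Spark 2.4+ where foreachBatch is available; the rate source, paths, and names are placeholders for your actual pipeline):
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orc-empty-batch-guard").getOrCreate()

# Stand-in for your real streaming source
input_df = spark.readStream.format("rate").load()

def write_non_empty_batch(batch_df, batch_id):
    # Skip the write when the micro-batch has no rows, so no empty ORC files are committed
    if batch_df.take(1):
        batch_df.write.mode("append").orc("/tmp/orc_output")

query = (input_df.writeStream
    .foreachBatch(write_non_empty_batch)
    .option("checkpointLocation", "/tmp/orc_checkpoint")
    .start())

query.awaitTermination()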
03-25-2018
09:39 PM
@Cathy Liu The table you are trying to create already exists 🙂 Either use the existing table, or drop the table and recreate it if you want it afresh, as sketched below. Let me know if you need any other help.
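For instance, a quick sketch of the drop-and-recreate route (the table name and columns are placeholders):
-- Drop the existing table and create it afresh
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table (id INT, name STRING);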
03-25-2018
07:05 PM
1 Kudo
Yes, the schema is available in the Avro data file, and you have to extract it to pass to the Hive DDL. The Hive DDL expects either the schema path (avro.schema.url) or the schema literal (avro.schema.literal) to establish the schema in the metastore.
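For example, a hedged sketch of the schema-literal variant (table name, location, and schema are placeholders):
CREATE EXTERNAL TABLE sample_literal_table
STORED AS AVRO
LOCATION 'hdfs:///user/hive/sample_literal_table'
TBLPROPERTIES ('avro.schema.literal'='{"type":"record","name":"SampleRecord","fields":[{"name":"id","type":"int"},{"name":"name","type":"string"}]}');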
03-25-2018
06:19 PM
@Kareem Amin There are two types of files when we talk about Avro: Avro data files, which hold the data, and avsc files, which hold the Avro schema. It looks like you have the Avro data files but not the Avro schema. Follow the steps below to extract the Avro schema (the avsc file) from your data files and create a table on top of them.
# Take the first few kilobytes of your Avro data file (the schema lives in the file header)
hdfs dfs -cat <your avro file name> | head --bytes 10K > $SAMPLE_FILE
# Extract the Avro schema from the sampled data file
java -jar $AVRO_TOOLS_PATH/avro-tools-1.7.7.jar getschema $SAMPLE_FILE > $AVRO_SCHEMA_FILE
# Upload the schema to HDFS
hdfs dfs -put $AVRO_SCHEMA_FILE $AVRO_SCHEMA_DIR

-- Create the Hive table using the extracted Avro schema
CREATE EXTERNAL TABLE sample_table
STORED AS AVRO
LOCATION 'hdfs:///user/hive/'
TBLPROPERTIES ('avro.schema.url'='<your avro schema path here>');
PS - If you already have the Avro schema files, you can skip the schema extraction steps and simply use the last step to create your table. Let me know if that works for you.
03-24-2018
10:10 PM
@Christian Lunesa If you are talking about a Sqoop import (that's the only tag on your question :) ), it is always highly recommended to use an integral column as the split-by column. But since you have only string/varchar columns in your data source, you can try the following options based on the data that you have (a sketch of option 2 follows this list).
1. Add a surrogate integer primary key and use it as the split column, or
2. Split your data manually using a custom query with a WHERE clause and run Sqoop a few times with num-mappers=1, or
3. Apply some deterministic, non-aggregating integer function to your varchar column, for example cast(substr(...) as int), as the split column.
Let me know if you need any other help!
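As a hedged sketch of option 2 (connection string, credentials, table, column, and paths are all placeholders):
# One import per manual slice, each with a single mapper
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username sqoop_user -P \
  --table orders \
  --where "store_code >= 'A' AND store_code < 'M'" \
  --num-mappers 1 \
  --target-dir /user/data/orders_slice_1
# Repeat with the next WHERE slice (e.g. store_code >= 'M') and a different --target-dir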
03-24-2018
10:05 PM
@Santanu Ghosh This is a typo in the practice exam questions! You are not doing anything wrong 🙂 Keep your spirits up, and all the best for your exam. Let me know if you need any other help!
03-24-2018
10:03 PM
@Vincent van Oudenhoven Any specific reason for using ExecuteStreamCommand for this use case of yours? I would recommend the ExecuteScript or InvokeScriptedProcessor processors instead, with which you can perform all the operations mentioned in your question. For example, a very basic script run from an ExecuteScript processor which reads a flow file and creates an empty flow file carrying all of its attributes:

flowFile = session.get()
# Copy the attributes of the incoming flow file
attrMap = flowFile.getAttributes()
session.remove(flowFile)
# Create a new, empty flow file and transfer it with the copied attributes
newflowFile = session.create()
newflowFile = session.putAllAttributes(newflowFile, attrMap)
session.transfer(newflowFile, REL_SUCCESS)

Or this Groovy script in an ExecuteScript processor, which reads the content of your flow files and routes them to downstream connections accordingly:

import org.apache.commons.io.IOUtils
import java.nio.charset.StandardCharsets

flowFile = session.get()
if(!flowFile) return
def text = ''
def storeID = 0
session.read(flowFile, {inputStream ->
    text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    // The store id is the third pipe-delimited field of the line
    storeID = text.tokenize("|")[2].trim() as Integer
} as InputStreamCallback)
if(storeID >= 1 && storeID <= 10)
    session.transfer(flowFile, REL_SUCCESS)
else if(storeID > 10 && storeID <= 20)
    session.transfer(flowFile, REL_FAILURE)

You can have an external script executed using ExecuteStreamCommand as well, but why maintain code outside NiFi when the built-in flow file handling of a processor like ExecuteScript can achieve the same functionality more easily?
03-23-2018
07:45 PM
@Shantanu kumar You can use the following flow: SplitText -> ExtractText -> RouteOnAttribute. Here is a short description of what these processors should achieve (a sketch of the ExtractText step follows this list).
1. SplitText - Same as the detailed flow above: split each line into a new flow file.
2. ExtractText - Create a new attribute with the value of your StoreID column: write a regex which reads your data, fetches the StoreID column from the nth position, and creates a StoreID attribute out of it.
3. RouteOnAttribute - Use expression language here to redirect your flow files. For example:
${StoreID:ge(1):and(${StoreID:le(10)})} // route to the processor handling stores 1 to 10
${StoreID:ge(11):and(${StoreID:le(20)})} // route to the processor handling stores 11 to 20
And so on. This should redirect your data per your need.
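As a hedged sketch of the ExtractText step, assuming StoreID is the third pipe-delimited field of each line (the attribute name and regex are illustrative): add a dynamic property to ExtractText whose name becomes the attribute and whose value is the regex, for example
StoreID = ^(?:[^|]*\|){2}([^|]+)
The first capture group (the third field) is placed into the StoreID attribute, which RouteOnAttribute can then evaluate.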
03-23-2018
06:22 AM
Sorry, but can you please describe the pattern a bit more clearly? It is a bit hard to understand in the format you posted.