Member since: 09-24-2015
Posts: 178
Kudos Received: 113
Solutions: 28
My Accepted Solutions
Views | Posted
---|---
1505 | 05-25-2016 02:39 AM
2121 | 05-03-2016 01:27 PM
409 | 04-26-2016 07:59 PM
10866 | 03-24-2016 04:10 PM
844 | 02-02-2016 11:50 PM
01-20-2018
12:22 AM
Very helpful response, Yolanda. One minor typo: in 1.a) the path is /etc/yum.repos.d/ambari-hdp-*.repo, for those who are not familiar with the yum repo location. BTW, this solution works for me.
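(Not part of the original comment: a quick sanity check, assuming an Ambari-managed node on a yum-based OS.)
# List the Ambari-managed HDP repo files at the path given above
ls -l /etc/yum.repos.d/ambari-hdp-*.repo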
03-17-2017
11:34 PM
I am attempting to create and use a Phoenix table on top of an HBase table that was originally created from Hive using HBaseStorageHandler. However, I get an error when selecting data from the Phoenix table.
Hive table DDL:
create table MYTBL(
col1 string,
col2 int,
col3 int,
col4 string )
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES
("hbase.columns.mapping" = ":key,
attr:col2,
attr:col3,
attr:col4")
TBLPROPERTIES("hbase.table.name" = "MYTBL");
Phoenix table DDL:
CREATE TABLE "MYTBL" (
pk VARCHAR PRIMARY KEY,
"attr"."col2" INTEGER,
"attr"."col3" INTEGER,
"attr"."col4" VARCHAR ) Once both the tables are created, I insert the data into Hive table using - insert into table MYTBL values ("hive", 1, 2, "m"); At this point, the data is available in Hive table and underlying HBase table. HBase table shown below I can also insert data into Phoenix table and it shows up in underlying HBase table. upsert into "MYTBL" values ('phoenix', 3, 4, 'm+c'); One thing to note here is how the integer values are being stored for the data inserted through Phoenix. When I run a select query from Phoenix, it gives an error while parsing the integer field inserted from Hive -> HBase. Text version of the error below - p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 11.0px Menlo}
p.p2 {margin: 0.0px 0.0px 0.0px 0.0px; font: 11.0px Menlo; color: #c33720}
span.s1 {font-variant-ligatures: no-common-ligatures}
span.Apple-tab-span {white-space:pre} 0: jdbc:phoenix:> select * from MYTBL; Error: ERROR 201 (22000): Illegal data. Expected length of at least 4 bytes, but had 2 (state=22000,code=201) java.sql.SQLException: ERROR 201 (22000): Illegal data. Expected length of at least 4 bytes, but had 2 at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:441) at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145) at org.apache.phoenix.schema.KeyValueSchema.next(KeyValueSchema.java:211) at org.apache.phoenix.schema.KeyValueSchema.iterator(KeyValueSchema.java:165) at org.apache.phoenix.schema.KeyValueSchema.iterator(KeyValueSchema.java:171) at org.apache.phoenix.schema.KeyValueSchema.iterator(KeyValueSchema.java:175) at org.apache.phoenix.expression.ProjectedColumnExpression.evaluate(ProjectedColumnExpression.java:114) at org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:69) at org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:608) at sqlline.Rows$Row.<init>(Rows.java:183) at sqlline.BufferedRows.<init>(BufferedRows.java:38) at sqlline.SqlLine.print(SqlLine.java:1650) at sqlline.Commands.execute(Commands.java:833) at sqlline.Commands.sql(Commands.java:732) at sqlline.SqlLine.dispatch(SqlLine.java:808) at sqlline.SqlLine.begin(SqlLine.java:681) at sqlline.SqlLine.start(SqlLine.java:398) at sqlline.SqlLine.main(SqlLine.java:292)
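(Not part of the original question, but a hedged workaround sketch: Hive's HBaseStorageHandler serializes non-key columns as UTF-8 strings by default, while Phoenix INTEGER expects a 4-byte binary encoding, which matches the length error above. One option, assuming the existing Phoenix TABLE mapping is dropped first and the string encoding is acceptable, is to map the columns as VARCHAR via a Phoenix view and convert at query time.)
-- Sketch only: a view over the existing HBase table (view name must match
-- the HBase table name); all columns typed as VARCHAR to match Hive's
-- default string serialization.
CREATE VIEW "MYTBL" (
    pk VARCHAR PRIMARY KEY,
    "attr"."col2" VARCHAR,
    "attr"."col3" VARCHAR,
    "attr"."col4" VARCHAR
);
-- Convert the string representation to a number at query time.
SELECT pk, TO_NUMBER("col2") AS col2, TO_NUMBER("col3") AS col3, "col4"
FROM "MYTBL";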
Labels:
- Apache HBase
- Apache Phoenix
02-14-2017
06:54 PM
Is it possible to store a timestamp value in a Hive column without the timezone part? For example, Oracle supports a timestamp value without a timezone attached to it: https://docs.oracle.com/cd/B19306_01/server.102/b14225/ch4datetime.htm#i1006760 The requirement is simply to store whatever timestamp value is given in a column. Currently, Hive automatically applies a Daylight Saving Time adjustment based on the timezone value. Any inputs are appreciated. Thanks
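(Not from the original thread: one commonly used workaround, shown as a sketch with hypothetical names, is to store the literal value as a STRING so Hive never applies any timezone conversion, and cast only where timestamp semantics are needed.)
-- Sketch: keep the literal value untouched by storing it as a string.
CREATE TABLE events_raw (
    event_ts STRING  -- e.g. '2017-02-14 06:54:00', stored verbatim
);
-- Cast on read only where timestamp arithmetic is required.
SELECT CAST(event_ts AS TIMESTAMP) AS event_ts FROM events_raw;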
Labels:
- Apache Hive
02-09-2017
05:59 PM
Which timezone setting (OS, Hadoop core, HDFS, Hive, etc.) is used as the default timezone for Hive? And which configuration property should be changed to use a different timezone?
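(For illustration only, not from the thread: Hive generally inherits the JVM default timezone, which in turn follows the OS unless overridden. One hedged way to pin it is the standard JVM user.timezone flag, e.g. in hive-env.sh; whether this is the right lever for a given HDP setup should be verified.)
# Hypothetical hive-env.sh snippet: pin the JVM timezone to UTC
export HADOOP_OPTS="$HADOOP_OPTS -Duser.timezone=UTC"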
Labels:
- Apache Hive
01-10-2017
07:56 PM
1 Kudo
@Raj B I pondered over this when I started using NiFi and realized that this is a good feature; in fact, it is almost necessary to support real-life data flow scenarios. Let us take a simple scenario. Say my group does risk evaluation at a bank and provides services to different LOBs within the bank (consider only the credit and debit transaction groups for this discussion). Assume that the format of the transactions received from these two groups is exactly the same, but the way the data is received differs: the credit group places the data on a Windows share, while the debit group requires the data to be read from their server via FTP. Now the risk group's data ingestion process, built on NiFi, will look something like this:
- A process group to read the data from the shared drive
- A process group to read the data using FTP
- Another process group to take the input from the two process groups above and apply further processing, such as splitting records into individual transactions and ensuring all mandatory data elements are present (if not, route to error), and then do two things:
  - Feed a process group whose flow places the data on Kafka, for the Storm topology to pick up and apply the model to evaluate risk
  - Feed another process group whose flow stores the data in HDFS for archival, to support audit and compliance requirements
As you can see, you need to be able to support multiple input and output ports to support this flow. Why can't we just build the entire flow in one place? Technically you can, but won't that be messy, hard to manage, drastically less reusable, and less flexible and scalable? Hope this helps!
08-17-2016
03:21 PM
1 Kudo
I am using PutSyslog and need to pass in the content of the flow file. Q1 - Is there a way to reference the content of the flow file directly within the Message Body property of the PutSyslog processor? Q2 - If not, how do I extract the content of the flow file into an attribute so that I can pass that attribute to the Message Body?
Labels:
- Apache NiFi
05-25-2016
02:39 AM
3 Kudos
@Sri Bandaru In reference to Kerberos, it is better to create the Hadoop accounts locally, to avoid sending Hadoop-internal auth requests to AD and adding to the AD load. Setting up the Hadoop accounts in a local KDC and establishing a one-way trust between the KDC and AD is the way to go.
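(A minimal sketch of the one-way trust, not from the original answer; realm names HADOOP.LOCAL and AD.EXAMPLE.COM are hypothetical, and the cross-realm principal must be created with the same password and encryption types on both sides.)
# On the MIT KDC hosting the local Hadoop realm: create the cross-realm
# ticket-granting principal so AD users can get service tickets here.
kadmin.local -q "addprinc krbtgt/HADOOP.LOCAL@AD.EXAMPLE.COM"
# On the AD side, register the matching realm trust (exact commands
# depend on the Windows version, e.g. netdom/ksetup).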
05-18-2016
01:34 AM
+1 I am looking for the same information. Can someone also share the following pieces of information:
- Sample code using the Spark HBase Connector
- A link to the latest documentation
- Is this GA yet?
05-03-2016
01:38 PM
1 Kudo
The fix is to delete that link manually and recreate it correctly:
rm /usr/hdp/current/zookeeper-client/conf
ln -s /etc/zookeeper/2.3.2.0-2950/0 /usr/hdp/current/zookeeper-client/conf
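(A quick check afterwards, for anyone following along; version path as in this thread.)
# Confirm the symlink now resolves to the versioned config directory
ls -l /usr/hdp/current/zookeeper-client/conf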
05-03-2016
01:27 PM
4 Kudos
@Felix Karabanov
I recently experienced something similar during an upgrade. I am not sure what the root cause is within the software, but here is what Ambari expects, and the workaround that worked for me. Correct setup: the configuration directories are expected to be linked this way, starting with /etc/zookeeper/conf:
[root@sandbox zookeeper]# ll /etc/zookeeper/conf
lrwxrwxrwx 1 root root 38 2015-10-27 14:31 /etc/zookeeper/conf -> /usr/hdp/current/zookeeper-client/conf
[root@sandbox zookeeper]# ll /usr/hdp/current/zookeeper-client/conf
lrwxrwxrwx 1 root root 29 2015-10-27 14:31 /usr/hdp/current/zookeeper-client/conf -> /etc/zookeeper/2.3.2.0-2950/0
In your case, the link /usr/hdp/current/zookeeper-client/conf probably points back to /etc/zookeeper/conf, which causes the issue.
04-26-2016
07:59 PM
@Sunile Manjee The dependencies are derived from the entity definitions once you create those entities using Falcon (UI or CLI). So, for example, you define your cluster in the cluster entity XML and specify its name:
<cluster colo="location1" description="primaryDemoCluster" name="primaryCluster" xmlns="uri:falcon:cluster:0.1">
When you reference this cluster in a feed entity, the dependency gets created at the time you create the feed entity:
<feed description="Demo Input Data" name="demoEventData" xmlns="uri:falcon:feed:0.1">
<tags>externalSystem=eventData,classification=clinicalResearch</tags>
<groups>events</groups>
<frequency>minutes(3)</frequency>
<timezone>GMT+00:00</timezone>
<late-arrival cut-off="hours(4)"/>
<clusters>
<cluster name="primaryCluster" type="source">
<validity start="2015-08-10T08:00Z" end="2016-02-08T22:00Z"/>
<retention limit="days(5)" action="delete"/>
</cluster>
</clusters>
The same concept applies to process-to-feed dependencies (see the sketch below). Take a look at this example for a working set of Falcon entities: https://github.com/sainib/hadoop-data-pipeline/tree/master/falcon
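(For completeness, a sketch not taken from the post: a process entity references the feed by name the same way, which is what creates the process-to-feed dependency. The process name is hypothetical, and required elements such as parallel, order, frequency, and workflow are omitted for brevity.)
<process name="demoProcess" xmlns="uri:falcon:process:0.1">
  <clusters>
    <cluster name="primaryCluster">
      <validity start="2015-08-10T08:00Z" end="2016-02-08T22:00Z"/>
    </cluster>
  </clusters>
  <inputs>
    <!-- Referencing the feed by name is what creates the dependency -->
    <input name="input" feed="demoEventData" start="now(0,0)" end="now(0,0)"/>
  </inputs>
  <!-- parallel, order, frequency, timezone, workflow, etc. omitted -->
</process>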
04-13-2016
05:30 PM
Try the same command after performing kinit - kinit -kt <PATH_TO_KEYTAB> <YOUR-PRINCIPAL-ID>
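(A hypothetical concrete invocation; the keytab path and principal are examples only.)
# Authenticate with a keytab, then verify the ticket was obtained
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
klist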
03-24-2016
04:10 PM
1 Kudo
@Alex Raj
So it appears you're calling a shell action that is expected to produce some output (on the local file system or HDFS) and you want to see it, is that correct? Or are you actually trying to capture the output (echo statements) of the script in order to reference those values in subsequent steps of the Oozie workflow?
If it's the latter, see the response from @Benjamin Leonhardi.
If it's the former, which I believe you are asking, then the answer is (you won't be thrilled): it depends.
It depends on what the script is doing. I can imagine a few scenarios and will talk through them, but let us know if you are doing something different, in which case we can talk specifics. Here is what you MAY be doing in the script:
- Writing to a local file with an absolute path
- Writing to a local file with a relative path
- Writing to an HDFS file with an absolute path
Writing to a local file with an absolute path. Let's say the script does this: touch /tmp/a.txt. In this case, the output gets created on the local file system of the NodeManager where the task ran. There is really no way to tell which one, so you would have to check all nodes. The good thing is that you know the absolute path.
Writing to a local file with a relative path. Let's say the script does this: touch ./a.txt. In this case, the output gets created on the local file system of the NodeManager where the task ran, but relative to the temporary working directory where workflow files are created. There is no way to tell which node, and we may never even see the actual file, because the temporary files are usually cleaned up after the workflow executes. So if the file is within that subdirectory, it will most likely be deleted.
Writing to an HDFS file with an absolute path. This is the best way to set up the program, because you know where to look for output (see the sketch below). Let's say the script does this:
echo "my content" >> /tmp/a.txt
hdfs dfs -put /tmp/a.txt /tmp/a.txt
In this case, the output gets created on HDFS and you know the path, so it is easy to find. If you are not following this last approach, I would recommend it. Hope this helps.
02-12-2016
12:10 AM
1 Kudo
Is it possible to have multiple NFS Gateways on different nodes on a single cluster?
Labels:
- Apache Hadoop
02-02-2016
11:50 PM
1 Kudo
@khushi kalra As Neeraj and Artem pointed out, Apache Atlas is the right tool for managing metadata in Hadoop. Falcon is more for data pipeline and workflow management, which is a big part of overall data governance, but not metadata. In addition to the links and resources provided, here is an Apache Atlas presentation video by the Governance product manager, Andrew Ahn: https://www.youtube.com/watch?v=LZR4qhKJeSI
01-22-2016
08:36 PM
@Balu Back to the original error.
01-20-2016
04:23 AM
@Balu
Some more updates. I tried a couple of things but am still having the issue.
A) Tried changing
dataIn=org.apache.oozie.extensions.OozieELExtensions#ph3_dataIn,
=org.apache.oozie.coord.CoordELFunctions#ph3_coord_nominalTime,
to
dataIn=org.apache.oozie.extensions.OozieELExtensions#ph3_dataIn,
nominalTime=org.apache.oozie.coord.CoordELFunctions#ph3_coord_nominalTime,
B) Also added dataIn to "oozie.service.ELService.ext.functions.coord-job-submit-instances".
Still getting the error: "Caused by: E1004 : E1004: Expression language evaluation error, Unable to evaluate :${dataIn('eventData', 'null')}:"
01-20-2016
04:13 AM
@Balu Does this look okay to you? This is from the documentation - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/configuring_oozie_for_falcon.html
01-20-2016
04:10 AM
@Balu The following three properties were missing in the sandbox; I think they should be there, because we don't want folks using the Sandbox to get stuck on this issue. However, the issue is not yet resolved. Previously, I was missing the function "now", which got added with the properties detailed at that link, but now I am missing another function: dataIn.
[Screenshot of the missing properties not shown.]
New exception:
Caused by: E1004 : E1004: Expression language evaluation error, Unable to evaluate :${dataIn('eventData', 'null')}:
at org.apache.oozie.client.OozieClient.handleError(OozieClient.java:612)
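(For reference, a sketch of the kind of oozie-site.xml entry involved, based on the HDP "Configuring Oozie for Falcon" page linked earlier in this thread; treat the exact function list as an assumption to verify against that doc.)
<property>
  <name>oozie.service.ELService.ext.functions.coord-job-submit-instances</name>
  <value>
    now=org.apache.oozie.extensions.OozieELExtensions#ph1_now_echo,
    today=org.apache.oozie.extensions.OozieELExtensions#ph1_today_echo,
    yesterday=org.apache.oozie.extensions.OozieELExtensions#ph1_yesterday_echo
  </value>
</property>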
01-20-2016
03:48 AM
I agree that the coord EL function properties are missing, but I wasn't sure about the steps to add them. Let me try http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/configuring_oozie_for_falcon.html and I will update this thread.
01-20-2016
03:47 AM
Balu - I am using the latest build:
[root@sandbox ~]# cat sandbox.info
Sandbox information:
Created on: 27_10_2015_15_18_06 for vmware
Hadoop stack version: Hadoop 2.7.1.2.3.2.0-2950
Ambari Version: 2.1.2
Ambari Hash: 0ef0b7b62cf14eaaff3c5c3f416253f568f323f9
Ambari build: Release : 377
OS Version: CentOS release 6.7 (Final)
01-20-2016
03:41 AM
@niraj nagle I think you have to create those folders. Do the following from the command line as the root user:
su - hdfs
hdfs dfs -mkdir /user/admin
hdfs dfs -chmod 755 /user/admin
hdfs dfs -chown admin:hadoop /user/admin
01-20-2016
03:18 AM
@rmolina @Shivaji
01-20-2016
03:17 AM
2 Kudos
Unable to use expression language functions in an Oozie workflow with Falcon. It seems some jar files are missing, but I am unsure which. Here is the exception when following http://hortonworks.com/hadoop-tutorial/defining-processing-data-end-end-data-pipeline-apache-falcon/ The error occurs at this step:
falcon entity -type process -schedule -name rawEmailIngestProcess
2016-01-19 21:08:31,969 ERROR CoordSubmitXCommand:517 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] XException,
org.apache.oozie.command.CommandException: E1004: Expression language evaluation error, Unable to evaluate :${now(0,0)}:
at org.apache.oozie.command.coord.CoordSubmitXCommand.submitJob(CoordSubmitXCommand.java:259)
at org.apache.oozie.command.coord.CoordSubmitXCommand.submit(CoordSubmitXCommand.java:203)
at org.apache.oozie.command.SubmitTransitionXCommand.execute(SubmitTransitionXCommand.java:82)
at org.apache.oozie.command.SubmitTransitionXCommand.execute(SubmitTransitionXCommand.java:30)
at org.apache.oozie.command.XCommand.call(XCommand.java:286)
at org.apache.oozie.CoordinatorEngine.dryRunSubmit(CoordinatorEngine.java:561)
at org.apache.oozie.servlet.V1JobsServlet.submitCoordinatorJob(V1JobsServlet.java:228)
at org.apache.oozie.servlet.V1JobsServlet.submitJob(V1JobsServlet.java:95)
at org.apache.oozie.servlet.BaseJobsServlet.doPost(BaseJobsServlet.java:102)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.apache.oozie.servlet.JsonRestServlet.service(JsonRestServlet.java:304)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.oozie.servlet.AuthFilter$2.doFilter(AuthFilter.java:171)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:595)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:554)
at org.apache.oozie.servlet.AuthFilter.doFilter(AuthFilter.java:176)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:86)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:620)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.oozie.coord.CoordinatorJobException: E1004: Expression language evaluation error, Unable to evaluate :${now(0,0)}:
at org.apache.oozie.command.coord.CoordSubmitXCommand.resolveTagContents(CoordSubmitXCommand.java:1003)
at org.apache.oozie.command.coord.CoordSubmitXCommand.resolveIOEvents(CoordSubmitXCommand.java:889)
at org.apache.oozie.command.coord.CoordSubmitXCommand.resolveInitial(CoordSubmitXCommand.java:797)
at org.apache.oozie.command.coord.CoordSubmitXCommand.basicResolveAndIncludeDS(CoordSubmitXCommand.java:606)
at org.apache.oozie.command.coord.CoordSubmitXCommand.submitJob(CoordSubmitXCommand.java:229)
... 32 more
Caused by: java.lang.Exception: Unable to evaluate :${now(0,0)}:
at org.apache.oozie.coord.CoordELFunctions.evalAndWrap(CoordELFunctions.java:723)
at org.apache.oozie.command.coord.CoordSubmitXCommand.resolveTagContents(CoordSubmitXCommand.java:999)
... 36 more
Caused by: javax.servlet.jsp.el.ELException: No function is mapped to the name "now"
at org.apache.commons.el.Logger.logError(Logger.java:481)
at org.apache.commons.el.Logger.logError(Logger.java:498)
at org.apache.commons.el.Logger.logError(Logger.java:525)
at org.apache.commons.el.FunctionInvocation.evaluate(FunctionInvocation.java:150)
at org.apache.commons.el.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:263)
at org.apache.commons.el.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:190)
at org.apache.oozie.util.ELEvaluator.evaluate(ELEvaluator.java:204)
at org.apache.oozie.coord.CoordELFunctions.evalAndWrap(CoordELFunctions.java:714)
... 37 more
Labels:
- Apache Falcon
- Apache Oozie
01-14-2016
05:16 PM
Didn't realize the question was about NiFi. My bad.
01-14-2016
05:11 PM
Assuming you are okay with using Hive for this, you would just create a table with one column (named something like row), load the whole file into that table, and then run a query to split the columns and insert them into another table (see the sketch below). More details and a code snippet here: https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=21299205
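(A minimal sketch of that approach; table and column names are hypothetical, and a comma delimiter is assumed.)
-- Stage the raw file as single-line rows.
CREATE TABLE raw_lines (line STRING);
LOAD DATA INPATH '/tmp/input.txt' INTO TABLE raw_lines;
-- Split each line into columns and insert into the target table.
CREATE TABLE parsed (col1 STRING, col2 STRING, col3 STRING);
INSERT INTO TABLE parsed
SELECT split(line, ',')[0],
       split(line, ',')[1],
       split(line, ',')[2]
FROM raw_lines;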
01-14-2016
01:32 PM
1 Kudo
@Akshay Shingote See this question. This issue is not caused by how your workflow.xml is configured, but by the permissions on it. The root cause is that the user you are using to run the workflow does not have permission to read the workflow.xml. Change the permissions on workflow.xml to 777 or 755 and try again.
Also make sure that the directories in the absolute path containing workflow.xml have at least 755, so that the user is able to get to the file and then read it (see the example below). Here is the method that is generating this error.
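(A hedged example of the fix; the application path /user/someuser/app is hypothetical.)
# Open up the workflow definition and every directory on its path
hdfs dfs -chmod 755 /user/someuser/app/workflow.xml
hdfs dfs -chmod 755 /user/someuser/app /user/someuser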
01-13-2016
03:49 PM
Also make sure that the directories in the absolute path containing workflow.xml have at least 755, so that the user is able to get to the file and then read it.
01-13-2016
03:48 PM
2 Kudos
@Hefei Li The root cause of this issue is that the user you are using to run the workflow does not have permission to read the workflow.xml. Change the permissions on workflow.xml to 777 or 755 and try again. Here is the method that is generating this error.
01-13-2016
03:00 PM
Is there a way to see the list of unanswered questions, to be able to help those who haven't been helped?
Tags:
- hcc