Member since: 02-01-2022
Posts: 269
Kudos Received: 95
Solutions: 59
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1910 | 06-12-2024 06:43 AM |
| | 2666 | 04-12-2024 06:05 AM |
| | 1977 | 12-07-2023 04:50 AM |
| | 1177 | 12-05-2023 06:22 AM |
| | 2077 | 11-28-2023 10:54 AM |
02-22-2023
08:28 AM
1 Kudo
@merlioncurry You are light on details, so I am assuming you used the Ambari UI to upload to HDFS. In that case the files are in hdfs://users/maria_dev, not in the corresponding location on the machine's local filesystem, so you will need to use hdfs commands to view them. From the sandbox prompt: hdfs dfs -ls /users/ and hdfs dfs -ls /users/maria_dev. If those do not show the files, the path you uploaded to may be different.
02-22-2023
08:16 AM
1 Kudo
@fahed The HDFS service inside of the Data Lake supports the environment and its services, for example Atlas, Ranger, Solr, and HBase. Its size is based on the environment's scale. You are correct in assuming that your end-user HDFS service is part of the Data Hubs deployed around the environment. You should not try to use the environment's HDFS service for applications and workloads that belong in the Data Hubs.
02-09-2023
06:31 AM
Click into that doc and check out the other escape option. I think you need to handle the quotes too.
02-09-2023
06:22 AM
1 Kudo
@Techie123 Well, like I said, you have to learn the AWS side of providing access to a bucket. Starting from a public bucket will show you what you have to do inside the bucket configuration to allow other systems to access it, and from there you can move to whatever access control level you ultimately need. Getting lost in that space is not necessarily a "NiFi" thing, so my recommendation is to build the NiFi flow against a public bucket, THEN, once it works, start testing the deeper access requirements. The Controller Service configuration provides multiple ways to access a bucket along with a number of settings. Make sure you have working access key credentials tested directly in the processor before moving to the Controller Service.
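If it helps, here is a minimal sketch of checking those credentials outside of NiFi first. This is not part of NiFi itself; the bucket name and keys are placeholders, and the boto3 call just confirms the key pair can list the bucket before you paste the same values into the processor or Controller Service:

```python
import boto3
from botocore.exceptions import ClientError

# Placeholder values - substitute your real bucket and key pair.
BUCKET = "my-test-bucket"
ACCESS_KEY = "AKIA-EXAMPLE"
SECRET_KEY = "example-secret"

# Build an S3 client with the same credentials you plan to give NiFi.
s3 = boto3.client(
    "s3",
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
)

try:
    # Listing a few objects is enough to prove the credentials and bucket policy work.
    resp = s3.list_objects_v2(Bucket=BUCKET, MaxKeys=5)
    for obj in resp.get("Contents", []):
        print(obj["Key"])
    print("Credentials can read the bucket.")
except ClientError as err:
    print(f"Access problem to fix before configuring NiFi: {err}")
```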
02-09-2023
06:14 AM
@codiste_m By default Hive uses static partitioning. Hive can also do dynamic partitioning, but I am not sure how well that works with existing data in existing folders; I believe it creates the correct partitions based on the schema and creates those partition folders as the data is inserted into the storage path. It sounds like you will need to execute a load data command for each partition you want to query, as sketched below.
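A minimal sketch of that last step, shown here through PySpark's Hive support rather than the Hive CLI; the table name, partition column, values, and staging paths are all hypothetical:

```python
from pyspark.sql import SparkSession

# Hive support is needed so LOAD DATA and partitioned Hive tables are available.
spark = (
    SparkSession.builder
    .appName("load-existing-partitions")
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical partition values and staging paths - replace with your own.
partitions = ["2023-01-01", "2023-01-02", "2023-01-03"]

for dt in partitions:
    # One LOAD DATA statement per partition you want to be able to query.
    # Note: LOAD DATA INPATH moves the files into the table's storage location.
    spark.sql(
        f"LOAD DATA INPATH '/staging/sales/dt={dt}' "
        f"INTO TABLE sales PARTITION (dt='{dt}')"
    )
```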
02-09-2023
05:53 AM
@ShobhitSingh You need to handle the escape with another option: .option("escape", "\\"). You may need to experiment with the actual string you pass there (the "\\" above) to suit your needs. Be sure to check the Spark docs specific to your version, for example: https://spark.apache.org/docs/latest/sql-data-sources-csv.html
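A minimal PySpark sketch of where that option fits in a CSV read; the input path and the quote/escape characters are assumptions to adjust for your data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-escape-example").getOrCreate()

# Hypothetical input path - point this at your own CSV file.
df = (
    spark.read
    .option("header", "true")   # first line holds column names
    .option("quote", "\"")      # character that wraps quoted fields
    .option("escape", "\\")     # character that escapes quotes inside a field
    .csv("/data/input/sample.csv")
)

df.show(truncate=False)
```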
02-09-2023
05:44 AM
@Iwantkakao There are two things I see right off the bat:

Tue Jan 31 19:47:13 UTC 2023 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.

^^ Consider the recommendation: add useSSL=false to the JDBC URL in your sqoop command.

23/01/31 19:47:14 ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: Access denied for user ''@'localhost' (using password: NO) java.sql.SQLException: Access denied for user ''@'localhost' (using password: NO)

^^ This error is saying that your user does not have access to MySQL. You are going to need to provide a specific user, password, and host, with permissions and grants set accordingly. If your user is root, add the username and password to the command; a sketch of both changes follows below.

Last but not least, the Sqoop project is now "in the attic", which means it is no longer actively getting support and development from the open source community. I recommend that you learn other techniques to achieve the same outcome.
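A minimal sketch of a sqoop import with both fixes applied, assembled in a small Python wrapper only to keep the flags readable; the host, database, table, credentials, and target directory are all placeholders:

```python
import subprocess

# Placeholder connection details - replace host, database, user, and password.
# useSSL=false addresses the SSL warning from the MySQL driver.
jdbc_url = "jdbc:mysql://localhost:3306/mydb?useSSL=false"

cmd = [
    "sqoop", "import",
    "--connect", jdbc_url,
    "--username", "root",          # a user that actually has grants on the table
    "--password", "my-password",   # prefer -P or --password-file outside of testing
    "--table", "my_table",
    "--target-dir", "/user/me/my_table",
    "--num-mappers", "1",
]

# Runs sqoop exactly as you would from the shell, just assembled in one place.
subprocess.run(cmd, check=True)
```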
02-09-2023
05:34 AM
@Techie123 You are going to need to provide credentials for the NiFi calls against any S3 bucket with access controls. I would recommend working in lower NiFi dev environments and using a public S3 bucket to get comfortable. This will test the basics of your flow without access issues. It will also remove confusion around NiFi flow functionality versus AWS access issues and help you learn when and where to use the different ways (an access key, or a Controller Service with credentials) to provide access from NiFi to S3.
02-07-2023
05:02 AM
@Abdulrahmants If you need to talk to someone about getting those added, please reach out in a direct message. Another approach could be to create an API input endpoint in NiFi (HandleHttpRequest/HandleHttpResponse) and make a scripted (Python, Java, etc.) process to send the file to the NiFi endpoint, along the lines of the sketch below.
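For the scripted side, a minimal Python sketch, assuming a HandleHttpRequest processor is already listening in your flow; the host, port, path, file name, and content type are placeholders:

```python
import requests

# Placeholder endpoint - the HandleHttpRequest listening port and path in your flow.
NIFI_ENDPOINT = "http://nifi-host:9090/ingest"

# Send the file as the request body; HandleHttpRequest turns it into a FlowFile.
with open("data/report.csv", "rb") as f:
    resp = requests.post(
        NIFI_ENDPOINT,
        data=f,
        headers={"Content-Type": "text/csv"},
    )

# HandleHttpResponse in the flow determines the status code returned here.
print(resp.status_code, resp.text)
```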
02-06-2023
07:25 AM
1 Kudo
I believe this is a job for MiNiFi (https://nifi.apache.org/minifi/index.html). Basically, you create a small MiNiFi flow to run on the server/network with privileged access, and that flow sends its results to NiFi.