About wfloyd

wfloyd · ‎11-09-2015

There is also an article on "How to size memory only RDDs" which references setting spark.executor.memory and spark.yarn.executor.memoryOverhead. Should we use these as well in planning memory/RDD usage? https://www.altiscale.com/blog/tips-and-tricks-for-running-spark-on-hadoop-part-3-rdd-persistence/
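To make the relationship between those two settings concrete, here is a rough sizing sketch. It assumes the Spark 1.x defaults as I understand them from the documentation (overhead of max(384 MB, 10% of executor memory), spark.storage.memoryFraction = 0.6, spark.storage.safetyFraction = 0.9). The function names are hypothetical, and the numbers are estimates rather than guarantees, so please check them against your Spark version.

```python
# Rough sizing sketch for Spark executors on YARN (all sizes in MB).
# Assumed defaults from the Spark 1.x docs; verify for your version:
#   spark.yarn.executor.memoryOverhead = max(384, 10% of executor memory)
#   spark.storage.memoryFraction = 0.6, spark.storage.safetyFraction = 0.9

MIN_OVERHEAD_MB = 384
OVERHEAD_FRACTION = 0.10
STORAGE_FRACTION = 0.6
SAFETY_FRACTION = 0.9


def yarn_container_mb(executor_memory_mb, overhead_mb=None):
    """Memory YARN must grant per executor: heap plus off-heap overhead."""
    if overhead_mb is None:
        overhead_mb = max(MIN_OVERHEAD_MB,
                          int(executor_memory_mb * OVERHEAD_FRACTION))
    return executor_memory_mb + overhead_mb


def cache_capacity_mb(executor_memory_mb, num_executors):
    """Approximate heap available for cached RDD blocks across executors."""
    per_executor = executor_memory_mb * STORAGE_FRACTION * SAFETY_FRACTION
    return int(per_executor * num_executors)


if __name__ == "__main__":
    # Example: 10 executors with spark.executor.memory=4g (hypothetical job)
    print("YARN memory per executor (MB):", yarn_container_mb(4096))
    print("Approx. cache capacity (MB):", cache_capacity_mb(4096, 10))
```

The first figure is what counts against the YARN queue (YARN may round it up further to its allocation increment). The second is a rough ceiling on how much data MEMORY_ONLY persistence could hold before blocks start being dropped.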

wfloyd · ‎11-09-2015

1) How should we approach the choice between persist() and cache() when running Spark on YARN? E.g. how should the Spark developer know approximately how much memory will be available to their YARN queue, and use this number to guide their persist() choice? Or should they use some other technique? http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence

2) With Spark on YARN, does the "RDD" exist only as long as the Spark driver lives, as long as the RDD's related Spark worker containers live, or based on some other time frame? Thanks!

wfloyd · ‎10-30-2015

What is the maximum number of partitions allowed for a Hive table? E.g. 2k ... 10k? Are there any performance implications we should consider as we get close to this number?

wfloyd · ‎10-29-2015

When a single drive fails on a worker node in HDFS, can this adversely affect the performance of jobs running on this node? Or does the DataNode process quickly mark the drive and its HDFS blocks as "unusable"? If this could cause a performance impact, how can our customers monitor for these drive failures in order to take corrective action? Ambari Alerts?

wfloyd · ‎10-27-2015

Is there a way the LDAP password can be stored somewhere other than "main.ldapRealm.contextFactory.systemPassword" in the topology XML config file? Customer would like to store this password elsewhere for added security. Thanks!

wfloyd · ‎10-27-2015

Our customer is planning to take advantage of the new Apache Solr auditing capability in HDP 2.3. They would also like to keep their existing MySQL DB auditing in place. In HDP 2.3+ is the DB Auditing still supported (deprecated)? Or will we be dropping support for this going forward? Thank you!

References:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Ranger_Install_Guide/content/ranger_database_settings.html
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Ranger_Install_Guide/content/ch_install_solr.html

wfloyd · ‎10-21-2015

1) Does Knox support Active Directory searches using nested OUs? I'm reading in some of the documentation that it does not. The main.ldapRealm.userDnTemplate value we are trying to use is samaccountname={0},ou=corp,ou=associates,OU=MY_COMPANY Accounts,DC=amer,DC=qa_my_company,DC=com but the users are not being found.

2) Does Knox support multiple AD search strings? Not all users that need access to Knox protected services can be found using the single search string above. Would these require multiple Knox Topology files to be applied at once?
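For context on why a single template may miss users: as far as I understand the Shiro-style realm, userDnTemplate is plain string substitution, with {0} replaced by the login name to form one fixed DN. The sketch below only illustrates that substitution. The function name and the "jdoe" login are hypothetical, and this is not Knox's actual code.

```python
# Illustration only: how a userDnTemplate expands into a single bind DN.
# The template is the one from the post; "jdoe" is a made-up login name.

USER_DN_TEMPLATE = (
    "samaccountname={0},ou=corp,ou=associates,"
    "OU=MY_COMPANY Accounts,DC=amer,DC=qa_my_company,DC=com"
)


def build_user_dn(template, username):
    """Substitute the login name for the {0} placeholder."""
    return template.replace("{0}", username)


if __name__ == "__main__":
    print(build_user_dn(USER_DN_TEMPLATE, "jdoe"))
```

If that is how the realm behaves, every user resolves to a DN under the same OU path, so an account that lives under a different OU would not match this template.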

wfloyd · ‎10-20-2015

@Guilherme Braccialli Good point. I'm surprised the error shows WEBLOG in all caps, because our HBase and Phoenix table definitions are in all lower case. Do you suggest we create all our HBase and Phoenix tables in all upper case to get past the issue? What is your general rule of thumb for Phoenix/HBase table names?

wfloyd · ‎10-20-2015

No need for benchmarks or performance numbers. Between "fair share" and "tez persistent queues", I'm curious whether we should use both techniques in tandem, or understand when to choose one over the other. Perhaps the "fair share" approach is best when trying to reconcile many users sharing resources, while "tez persistent queues" are valuable when the absolute lowest query latency is the primary goal?

wfloyd · ‎10-20-2015

What is the proper way to map an existing HBase table to a new Phoenix table? The Phoenix documentation gives only a light example of how to do this. When we try it on an existing HBase table, the Phoenix "create table" command is accepted, but querying the new Phoenix table fails with a "table undefined" error. How could the create table syntax succeed, yet the "table undefined" error occur when we query it?

HBase table definition:

    tablename = weblog
    Columnfamily = clicks
    column=clicks:Compression_ratio
    column=clicks:Cookie1
    column=clicks:Cookie2
    column=clicks:Data_center
    column=clicks:Host
    column=clicks:Incoming_client_protocol
    column=clicks:Rqst_status
    column=clicks:Tran_dt
    column=clicks:Xforwarder
    column=clicks:fdx_cbid
    column=clicks:header_size
    column=clicks:input_bytes
    column=clicks:millisecs_to_serv_rqst
    column=clicks:output_bytes
    column=clicks:referring_ip_addr
    column=clicks:rqst_first_ln
    column=clicks:unknown
    column=clicks:user_agent
    column=clicks:web_user
    column=clicks:web_user2

View DDL used in Phoenix:

    CREATE VIEW "weblog" (
      pk VARCHAR PRIMARY KEY,
      "clicks".Compression_ratio VARCHAR,
      "clicks".Cookie1 VARCHAR,
      "clicks".Cookie2 VARCHAR,
      "clicks".Data_center VARCHAR,
      "clicks".Host VARCHAR,
      "clicks".Incoming_client_protocol VARCHAR,
      "clicks".Rqst_status VARCHAR,
      "clicks".Tran_dt VARCHAR,
      "clicks".Xforwarder VARCHAR,
      "clicks".fdx_cbid VARCHAR,
      "clicks".header_size VARCHAR,
      "clicks".input_bytes VARCHAR,
      "clicks".millisecs_to_serv_rqst VARCHAR,
      "clicks".output_bytes VARCHAR,
      "clicks".referring_ip_addr VARCHAR,
      "clicks".rqst_first_ln VARCHAR,
      "clicks".unknown VARCHAR,
      "clicks".user_agent VARCHAR,
      "clicks".web_user VARCHAR,
      "clicks".web_user2 VARCHAR)

List of tables from Phoenix:

    0: jdbc:phoenix:drh70018.bigdata.fedex.com:21> !tables
    +-----------+-------------+------------+
    | TABLE_CAT | TABLE_SCHEM | TABLE_NAME |
    +-----------+-------------+------------+
    | null      | SYSTEM      | CATALOG    |
    | null      | SYSTEM      | SEQUENCE   |
    | null      | SYSTEM      | STATS      |
    | null      | null        | CUSTOMERS  |
    | null      | null        | EXAMPLE    |
    | null      | null        | WEB_STAT   |
    | null      | null        | weblog

Error from Phoenix:

    0: jdbc:phoenix:drh70018.bigdata.fedex.com:21> select * from weblog;
    Error: ERROR 1012 (42M03): Table undefined. tableName=WEBLOG (state=42M03,code=1012)
    0: jdbc:phoenix:drh70018.bigdata.fedex.com:21>
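The tableName=WEBLOG in the error is consistent with how Phoenix treats identifiers: unquoted names are folded to upper case, while double-quoted names keep their case. The sketch below mimics that rule in Python purely as an illustration. The function is hypothetical and is not part of Phoenix.

```python
# Illustration of Phoenix-style identifier normalization (not Phoenix code):
# unquoted identifiers are upper-cased, double-quoted ones keep their case.


def normalize_identifier(name):
    """Return the name Phoenix would look up for a given SQL identifier."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]   # quoted: case preserved, quotes stripped
    return name.upper()     # unquoted: folded to upper case


if __name__ == "__main__":
    # The view was created as "weblog" (quoted), so it is stored lower case.
    print(normalize_identifier('"weblog"'))
    # The query used weblog (unquoted), so Phoenix looks for the upper-case name.
    print(normalize_identifier('weblog'))
```

Under that rule, a query written as select * from "weblog"; (with the double quotes) would look up the lower-case name that appears in the !tables listing. The same folding would presumably apply to the unquoted column names in the view DDL.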

Last Visited	‎04-24-2017 02:32 PM

Member Since	‎09-23-2015 09:15 PM
Posts	88
Kudos received	109

Cloudera Community

Re: Is there is any workaround to map csv columns ...

Re: Choosing RDD Persistence and Caching with Spar...

Choosing RDD Persistence and Caching with Spark on...

Maximum Hive Table Partitions allowed & recommende...

How to Alert for HDFS Disk Failures

Does Knox allow LDAP Password to be stored outside...

Ranger Audit Options - Is DB Audit still supported...

Does Knox support active directory searches using ...

Re: How to Map HBase Table to Phoenix ("Table unde...

Re: Hive User Concurrency - Reconciling YARN Capac...

How to Map HBase Table to Phoenix ("Table undefine...