Member since: 02-05-2016
Posts: 47
Kudos Received: 9
Solutions: 6
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3052 | 08-29-2017 01:21 PM
 | 2955 | 06-16-2017 11:19 AM
10-01-2017
08:12 PM
Looks like you don't have the Python development packages installed. On my Ubuntu system that's the libpythonX.Y-dev package (libpython2.7-dev for Python 2.7). On Red Hat or CentOS systems I think you want to install the python-devel package. On my test CentOS 6.6 system, python-devel provides /usr/lib64/libpython2.6.so.
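If it helps, here's a rough sketch of the install commands (exact package names can vary by release, so treat these as assumptions):
# Ubuntu (Python 2.7):
sudo apt-get install libpython2.7-dev
# Red Hat / CentOS:
sudo yum install python-devel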
09-19-2017
11:42 AM
You need to download/distribute/activate the KUDU parcel in addition to the CDH parcel. First, add http://archive.cloudera.com/kudu/parcels/5.11.0/ to your parcel repository list in Cloudera Manager. Then a KUDU parcel should show up in the list of available parcels.
09-18-2017
02:37 PM
Based on the error message, it sounds like the Cloudera Manager server timed out while trying to access the parcel repo URL you specified. Perhaps double check that you didn't introduce any typos, and that you can download <parcel repo URL>/manifest.json using curl or wget from the machine running the CM server? Since this is an issue with CM and parcel installation, if you can't solve the problem I suggest you use this forum: http://community.cloudera.com/t5/Cloudera-Manager-Installation/bd-p/CMInstall
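For example, something along these lines from the CM server host (substitute your real repo URL for the placeholder):
# should return the repo's manifest.json if the URL and connectivity are OK
curl -v "<parcel repo URL>/manifest.json"
# or
wget "<parcel repo URL>/manifest.json"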
08-29-2017
01:21 PM
No, --max_clock_sync_error_usec is a Kudu parameter, not an ntp parameter. You need to reconfigure Kudu to use it. See the Kudu documentation and the Cloudera Manager documentation to learn how to reconfigure Kudu.
08-29-2017
01:18 PM
--max_clock_sync_error_usec is a Kudu gflag parameter, which means it can be passed on the command line (e.g. 'kudu-tserver --fs_wal_dir=... --fs_data_dirs=... --max_clock_sync_error_usec=...'), or it can be included in a gflagfile (e.g. 'kudu-tserver --gflagfile=/path/to/file', where /path/to/file contains "--max_clock_sync_error_usec=..." along with any other parameters, each on its own line). If you're managing Kudu with CM, there's no way to feed this parameter into Kudu except via the gflagfile safety valve.
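As a rough illustration of a gflagfile, with made-up paths and values:
# /path/to/file -- one gflag per line
--fs_wal_dir=/data/kudu/wal
--fs_data_dirs=/data/kudu/data
--max_clock_sync_error_usec=11000000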
08-28-2017
03:47 PM
There's no workaround for this yet. For the time being, every Kudu table must have a primary key, and you'll need to populate it yourself when loading data from a table that doesn't have one.
08-28-2017
03:45 PM
The Kudu documentation on the subject suggests your ntp installation may need some troubleshooting. Interestingly, the max error you're seeing (10,000,016 microseconds or just past 10 seconds) is very close to the max error allowable (exactly 10 seconds). What you could do is configure your Kudu service with something like --max_clock_sync_error_usec=11000000 (11 seconds), and that should be good enough for your ntp installation. You'd need to put this in the Kudu service's safety valve configuration since --max_clock_sync_error_usec is not exposed as a first class configuration parameter in CM.
08-28-2017
03:36 PM
As you saw in the error message, the "LOAD DATA" statement doesn't work for Kudu tables. Its documentation says as much. I'm no Impala expert, but perhaps you can build an HDFS table around '/data/feature_matrix_100.csv' first, then SELECT that data into your Kudu table?
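A very rough sketch of that approach via impala-shell; the column names/types, the Kudu table name, and the staging directory are all made up, and this assumes the CSV has been moved into a directory of its own:
impala-shell -q "CREATE EXTERNAL TABLE feature_matrix_staging (id BIGINT, f1 DOUBLE, f2 DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/data/feature_matrix/'"
impala-shell -q "INSERT INTO my_kudu_table SELECT id, f1, f2 FROM feature_matrix_staging"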
08-28-2017
03:32 PM
Looking at the documentation, I think "STORED AS KUDU" needs to precede "tblproperties(...)". Also, I don't think you want "ROW FORMAT..." or "LOCATION ..."; those clauses aren't relevant for Kudu tables.
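For reference, a hedged sketch of the clause ordering (table, column, and master host names are placeholders; this uses the CDH 5.10+ syntax):
impala-shell -q "CREATE TABLE my_kudu_table (
    id BIGINT,
    val STRING,
    PRIMARY KEY (id))
  PARTITION BY HASH (id) PARTITIONS 3
  STORED AS KUDU
  TBLPROPERTIES ('kudu.master_addresses' = 'master-host:7051')"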
08-28-2017
03:29 PM
Bear in mind that this Python code merely connects to and operates on an existing Kudu cluster; it doesn't start one for you. If you haven't yet created and started a Kudu service, you'll need to do that first. If you have, the value for 'host' in kudu.connect() needs to be the hostname of the machine where the Kudu master is running.
08-28-2017
03:27 PM
Those warnings are likely harmless. Is there any actual failure during the upserts?
08-28-2017
03:19 PM
The "rolling extension" process you're suggesting looks reasonable. It should serve as a workaround for Kudu's current inability to add new data directories.
08-24-2017
02:18 PM
You first need to install the Kudu client system packages: the 'kudu-client0' package on RPM distros, or 'libkuduclient0' on DEB distros. Check out the documentation here: https://www.cloudera.com/documentation/kudu/latest/topics/kudu_installation.html#install_cmdline
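That is, something like (adjust for your distro):
sudo yum install kudu-client0        # RHEL/CentOS
sudo apt-get install libkuduclient0  # Debian/Ubuntu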
07-18-2017
11:36 AM
1 Kudo
Going from 1 to 3 masters requires running a specific workflow. Based on what you wrote I'm not sure whether your single master is working now or not, but that definitely needs to be fixed first. What is the value of the --master_addresses command line parameter for your master? Also, on the master machine, run the following command (adjusting the path as needed):
kudu pbc dump --oneline /var/lib/kudu/master/consensus-meta/00000000000000000000000000000000
There should be only one listed peer; if there is more than one, you'll need to use 'kudu local_replica cmeta rewrite_raft_config' to fix it (see the workflow link above for details on how that command works).
07-18-2017
11:15 AM
Not sure if you've resolved your problem or not, but as a general note, you're using a very old version of Impala, and that version may have interoperability problems with a modern Kudu. It'd be best if you switched to the Kudu and Impala from CDH 5.10 or newer. Beginning with CDH 5.10, the special "Impala-Kudu" release of Impala no longer exists, as regular CDH Impala can interoperate with Kudu.
07-18-2017
11:09 AM
1 Kudo
Based on the contents of kudu-master.ERROR, it looks like your deployment has multiple masters, or perhaps had multiple masters in the past. Is that the case? I suspect that the error only surfaced after the upgrade because this may be the first time in a long while that you've restarted the master. Anyway, please describe the topology of your cluster.
06-16-2017
11:19 AM
1 Kudo
Yes, what you're observing is Kudu preallocating one 64 MB write-ahead log segment for each partition. The space will be filled once you start writing to the partition. In Kudu 1.4 we dropped the segment size from 64 MB to 8 MB. If you'd like to make that change now, you can do so via the --log_segment_size_mb command line option. An alternative would be to disable preallocation via --log_async_preallocate_segments=false and/or --log_preallocate_segments=false, but that's not something we generally test so I would advise against it.
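For example, this one line in the gflagfile safety valve (or on the command line) would mirror the new Kudu 1.4 default:
--log_segment_size_mb=8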
06-15-2017
09:43 AM
1 Kudo
It would be interesting (and useful) to see what new files were created as part of the CREATE TABLE. I suspect the increase in consumption was due to new WAL segments. We create one per partition and in Kudu 1.3 I believe we preallocate it to 32M. Can you confirm this? Find Kudu's configured WAL directory and look at its contents before and after the CREATE TABLE.
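A quick way to check, as a sketch (the path is a placeholder for whatever --fs_wal_dir is set to):
# run before and after the CREATE TABLE and compare
du -sh /path/to/fs_wal_dir
ls -lRh /path/to/fs_wal_dir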
06-14-2017
11:35 AM
1 Kudo
What version of Kudu are you using? How exactly did you measure the new partition's space consumption?
06-11-2017
11:51 AM
You can avoid the dependency on ntpd by running Kudu with --use-hybrid-clock=false, but that has a serious effect on transactional consistency so it's not something we recommend. Instead, I'd focus your efforts on figuring out why your servers' time isn't synchronized. It may have to do with your ntp configuration. Unfortunately I don't know how ntp works; perhaps you can search across past forum posts? If you do manage to fix this, please post your findings here; if it's a general purpose fix (i.e. not particular to your site configuration), we'll include it in the Kudu documentation.
06-09-2017
02:39 PM
HDFS datanodes don't require clock synchronization in the way that Kudu does. Is NTP running on these nodes? What is the output of the 'ntptime' command? Are these nodes running on physical hardware, or something else?
04-10-2017
12:10 PM
Could you get by with the Kudu quickstart VM? I don't think it's as full-featured as the CDH quickstart VM (it may only have HDFS, Hive, Impala, and Kudu installed), but I know it's set up to use CDH 5.10. You can find documentation for installing it here: https://kudu.apache.org/docs/quickstart.html
03-28-2017
10:54 AM
1 Kudo
Kudu tables can be range partitioned on multiple columns, though the syntax is slightly different than the Parquet syntax. Take a look at https://kudu.apache.org/docs/kudu_impala_integration.html#_specifying_tablet_partitioning to see what I mean. https://kudu.apache.org/docs/kudu_impala_integration.html#partitioning_tables also has much more information.
03-21-2017
07:15 PM
I reformatted your errors so they're a bit easier to read:
W0322 09:40:06.534153 129361 negotiation.cc:240] Failed RPC negotiation. Trace:
0322 09:39:51.533481 (+ 0us) reactor.cc:378] Submitting negotiation task for server connection from 11.136.156.50:39299
0322 09:39:51.533517 (+ 36us) sasl_server.cc:230] Waiting for connection header
0322 09:39:57.673309 (+6139792us) sasl_server.cc:238] Connection header received
0322 09:39:57.673309 (+ 0us) sasl_server.cc:181] Waiting for next SASL message...
0322 09:40:06.534117 (+8860808us) negotiation.cc:231] Negotiation complete: Timed out: Server connection negotiation failed: server connection from 11.136.156.50:39299
Metrics: {"negotiator.queue_time_us":8}
W0322 09:40:16.245877 164604 connection.cc:464] server connection from 11.136.156.55:43035 recv error: Network error: recv error: Connection reset by peer (error 104)
W0322 09:40:16.246330 164604 connection.cc:378] Connection torn down before Call kudu.tserver.TabletServerService.Write from 11.136.156.55:43035 (ReqId={client: 574a7048c3824c869a1e553825845ba6, seq_no=467150, attempt_no=1}) could send its response
W0322 09:27:45.480054 101702 negotiation.cc:240] Failed RPC negotiation. Trace:
0322 09:27:31.439139 (+ 0us) reactor.cc:378] Submitting negotiation task for server connection from 11.131.120.17:41515
0322 09:27:31.439526 (+ 387us) sasl_server.cc:230] Waiting for connection header
0322 09:27:32.087993 (+648467us) sasl_server.cc:238] Connection header received
0322 09:27:32.087994 (+ 1us) sasl_server.cc:181] Waiting for next SASL message...
0322 09:27:45.479950 (+13391956us) sasl_server.cc:295] SASL Server: Received NEGOTIATE request from client
0322 09:27:45.479964 (+ 14us) sasl_server.cc:341] Sent NEGOTIATE response
0322 09:27:45.479967 (+ 3us) sasl_server.cc:181] Waiting for next SASL message...
0322 09:27:45.480007 (+ 40us) negotiation.cc:231] Negotiation complete: Network error: Server connection negotiation failed: server connection from 11.131.120.17:41515: BlockingRecv error: Recv() got EOF from remote (error 108)
Metrics: {"negotiator.queue_time_us":352,"thread_start_us":344,"threads_started":1} Still, that looks fairly innocuous to me, it's just a connection failure. Have you checked whether all of your data was written?
03-21-2017
02:02 PM
The error suggests that your Impala processes can't communicate with the Kudu tservers. Impala gets the addresses of the tservers from the Kudu Master. Look at the /tablet-servers page in the Kudu Master web UI; are the published tserver addresses/hostnames reasonable? Can you resolve them and connect to them from every machine in the cluster? Separately, look at the process log for the Kudu Master. If Impala can't connect to the tservers, the Kudu Master won't be able to either, and you may see errors here relating to creation of tablets.
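As a sketch of those checks (hostnames are placeholders; 8051 is the default Kudu Master web UI port, adjust if you've changed it):
# list the tservers the master knows about
curl http://<kudu-master-host>:8051/tablet-servers
# from each machine in the cluster, confirm a published tserver address resolves and is reachable
ping -c 1 <published-tserver-hostname>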
03-21-2017
11:49 AM
I'm not familiar with Storm; how exactly does it write to Kudu? The error sounds like one piece of the Storm job abruptly disconnected from one of the Kudu servers. It may be harmless (maybe Storm will retry), or it may indicate a failure in the job. Given that Storm didn't exhibit any errors, it's likely to be the former, though you could always verify that by reading the data out of the table and ensuring that it's all there. Is there any other associated logging in Kudu about this failure? Maybe a trace message of some kind?
03-21-2017
11:30 AM
By default, CM will warn when 50% of a process' FDs are in use. Also by default, Kudu's block manager will use 50% of the FDs available to the process. So, after accounting for some additional FDs for WALs, Kudu ends up using a little over 50% of the available FDs and CM warns about it. If this bothers you, you can:
- Reconfigure Kudu's block_manager_max_open_files to some fixed value below 16384. The default value of -1 means Kudu will use 50% of what's available (16384 in your case).
- Reconfigure CM to warn at a higher threshold than 50%.
- Wait for CDH 5.11, where Kudu's percent usage was dropped from 50% to 40%.
03-16-2017
01:33 PM
How exactly are you setting the flag in CM? Did you restart the Kudu service after setting it? Have you verified that it actually took effect? When a Kudu process starts up, it'll log its non-default command line arguments; you can see if maintenance_manager_num_threads is in that log output.
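For example (the log path is an assumption based on a typical CM-managed layout; adjust to wherever your tablet server logs actually go):
grep maintenance_manager_num_threads /var/log/kudu/kudu-tserver.INFO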