Member since: 10-19-2016
Posts: 151
Kudos Received: 59
Solutions: 17
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 998 | 03-22-2018 11:48 AM
 | 1499 | 01-12-2018 06:25 PM
 | 2630 | 01-12-2018 03:56 AM
 | 4211 | 01-12-2018 03:38 AM
 | 2290 | 01-02-2018 10:29 PM
07-06-2017
08:38 AM
1 Kudo
Mirroring Datasets Between Hadoop Clusters with Apache Falcon
Introduction
Apache Falcon is a framework to simplify data pipeline processing and management on Hadoop clusters.
It provides data management services such as retention, replication across clusters, archival, etc. It makes it much simpler to onboard new workflows/pipelines, with support for late data handling and retry policies. It allows you to easily define relationships between various data and processing elements and integrate with a metastore/catalog such as Hive/HCatalog. Finally, it also lets you capture lineage information for feeds and processes.
In this tutorial we are going to walk through the process of mirroring datasets between Hadoop clusters.
Prerequisites
Download Hortonworks Sandbox 2.5
Complete the Learning the Ropes of the Hortonworks Sandbox tutorial; you will need it for logging into Ambari as an administrator user.
Complete the Leveraging Apache Falcon with Your Hadoop Clusters tutorial to start the Falcon service, prepare the HDFS directories for the Falcon cluster, and create the Falcon cluster entities.
Outline
1. Create ambari-qa user
2. Preparing HDFS Directories
3. Setting up the Mirroring Job
4. Running the Job
Summary
1. Create ambari-qa user
After creating the cluster entities, let us go back to Ambari as the admin user. Click on the admin menu drop down and then Manage Ambari:
Click the blue Users button in the bottom box as shown below:
Click the Create Local User button at the top of the page. Enter ambari-qa as the user name and then set the password for it. Enter it again for confirmation and Save the user.
You can see the newly added ambari-qa user. Click on it to assign it a group so that it can access Ambari views.
Type "views" in the Local Group Membership box, select it, and then click the tick mark to add the ambari-qa user to the "views" group.
Now log out of Ambari as the admin user and log in to Ambari as the ambari-qa user.
2. Preparing HDFS Directories
Select the Files View; you will see the following default folders:
Navigate to /user/ambari-qa and create a new directory named falcon.
Click on the row of the falcon directory and then click on the Permissions button:
Add Write permission for both Group and Others and then click Save.
Now create the directories mirrorSrc and mirrorTgt under /user/ambari-qa/falcon as the source and target of the mirroring job we are about to create.
<!--
After creating cluster entities, let’s go back to the SSH terminal, switch the user to root and then to ambari-qa :
hadoop fs -mkdir /user/ambari-qa/falcon
hadoop fs -mkdir /user/ambari-qa/falcon/mirrorSrc
hadoop fs -mkdir /user/ambari-qa/falcon/mirrorTgt
Now we need to set permissions to allow access. You must be logged in as the owner of the directory /user/ambari-qa/falcon/
hadoop fs -chmod -R 777 /user/ambari-qa/falcon
-->
3. Setting up the Mirroring Job
To create the mirroring job, go back to the Falcon UI on your browser and click on the Create drop down.
Click Mirror from the drop down menu; you will see a page like this:
Provide a name of your choice. The name must be unique to the system. We named the mirror job MirrorTest.
Ensure the File System mirror type is selected, then select the appropriate Source and Target and type in the appropriate paths. In our case the source cluster is primaryCluster and the HDFS path on that cluster is /user/ambari-qa/falcon/mirrorSrc.
The target cluster is backupCluster and the HDFS path on that cluster is /user/ambari-qa/falcon/mirrorTgt.
Also set the validity of the job to your current time, so that when you attempt to run the job in a few minutes, the job is still within the validity period. Keep the default values in Advanced Options and then click Next.
Verify the summary information, then click Save :
4. Running the Job
Before we can run the job, we need some test data on HDFS.
<!--
Let's give ourselves permission to upload some data using the HDFS View in Ambari.
su - root
su hdfs
hadoop fs -chmod -R 775 /user/ambari-qa
Open Ambari from your browser at port 8080.
Then launch the HDFS view from the top right hand corner.
-->
Stay logged in as ambari-qa and, from the Files View on the Ambari console, navigate to the directory /user/ambari-qa/falcon/mirrorSrc.
Click the Upload button and upload any file you want to use.
Once uploaded, the file should appear in the directory.
Now navigate to the Falcon UI and search for the job we created. The name of the Mirror job we had created was MirrorTest .
Select the MirrorTest job by clicking the checkbox and then click on Schedule .
The state of the job should change from SUBMITTED to RUNNING .
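If you prefer the command line, you can also poll the job from the Falcon CLI. This is a hedged sketch: it assumes the falcon client script is on your PATH (on the Sandbox it lives under /usr/hdp/current/falcon-client/bin) and that the UI-created mirror job is backed by a process entity named MirrorTest, which can vary by Falcon version.
# Query the current status of the MirrorTest entity
falcon entity -type process -status -name MirrorTest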
After a few minutes, use the HDFS View in the Ambari console to check the /user/ambari-qa/falcon/mirrorTgt directory and you should see that your data is mirrored.
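The same check can be done from a terminal with the standard HDFS client; a minimal sketch, assuming you run it as a user with read access to the target path:
# List the mirror target; the file you uploaded to mirrorSrc should show up here once replication completes
hdfs dfs -ls /user/ambari-qa/falcon/mirrorTgt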
Summary
In this tutorial we walked through the process of mirroring datasets between two cluster entities.
Tags: Falcon, How-ToTutorial, mirroring, Sandbox & Learning
07-06-2017
08:34 AM
Create a Falcon Cluster
Introduction
Apache Falcon is a framework to simplify data pipeline processing and management on Hadoop clusters. It makes it much simpler to onboard new workflows/pipelines, with support for late data handling and retry policies. It allows you to easily define relationships between various data and processing elements and integrate with a metastore/catalog such as Hive/HCatalog. Finally, it also lets you capture lineage information for feeds and processes. In this tutorial, we are going to create a Falcon cluster by:
Preparing HDFS directories
Creating two cluster entities (primaryCluster and backupCluster)
Prerequisites
Download Hortonworks Sandbox 2.5
Complete the Learning the Ropes of the Hortonworks Sandbox tutorial; you will need it for logging into Ambari as an administrator user. Once you have downloaded the Hortonworks Sandbox and run the VM, navigate to the Ambari interface on port 8080 of the IP address of your Sandbox VM. Log in with the username admin and the password that you set for the Ambari admin user. You should see a screen similar to the image below.
Outline
1. Scenario
2. Starting Falcon
3. Create an Ambari falcon user
4. Preparing HDFS Directories
5. Creating Cluster Entities
5.1 Creating primaryCluster Entity using Wizard
5.2 Creating primaryCluster Entity using XML
5.3 Creating backupCluster Entity using Wizard
5.4 Creating backupCluster Entity using XML
Summary
Further Reading
1. Scenario
In this tutorial, we are going to create a Falcon cluster so that we can configure data pipelines and then perform feed management services such as feed retention, data replication across clusters, and archival. This tutorial is the starting point of all Falcon tutorials, where we create two cluster entities which define where the data and the processes for your data pipeline are stored. Allow yourself one quality hour to complete this tutorial.
2. Starting Falcon
By default, Falcon is not started on the Sandbox. You can start the Falcon service from Ambari by clicking on the Falcon icon in the left-hand pane. Then click on the Service Actions button on the top right, and click Start. Once Falcon starts, Ambari should clearly indicate, as below, that the service has started.
3. Create an Ambari falcon user
Click the button at the top of the Ambari screen with the admin menu drop down and click Manage Ambari. Click the blue Users button in the bottom box as shown below. Click the Create Local User button at the top of the page. Enter falcon as the user name and then set the password for it. Enter it again for confirmation and Save the user. You can see the newly added falcon user. Click on it to assign it a group so that it can access Ambari views.
Type "views" in the Local Group Membership box, select it, and then click the tick mark to add the falcon user to the "views" group. Now log out of Ambari as the admin user and log in to Ambari as the falcon user.
4. Preparing HDFS Directories
Select the Files View as shown below. The Files View interface will appear with the following default folders. We need to create the directories on HDFS representing the two clusters that we are going to define, namely primaryCluster and backupCluster.
Navigate to the /apps/falcon folder and click the New Folder button. An add-new-folder window appears; name the folder primaryCluster and press Enter or Add. Similarly, create another folder called backupCluster. You will see your new directories created successfully. Click on the row of the primaryCluster directory and then click the Permissions button. Add Write permission for both Group and Others and then click Save. Do the same for the backupCluster directory. Now navigate down into the primaryCluster directory and create two new directories: staging and working. Click on the row for the staging directory and add Write permission for both Group and Others. Refresh the page and then navigate to /apps/falcon/primaryCluster to see the changes. Repeat the same steps for backupCluster: create two directories, staging and working, and then assign Write permission on the staging directory for Group and Others.
<!--
First SSH into the Hortonworks Sandbox with the command:
ssh root@127.0.0.1 -p 2222
The default password is hadoop. If you have changed it earlier, then enter the new one. We need to create the directories on HDFS representing the two clusters that we are going to define, namely primaryCluster and backupCluster. First, from the command line, check whether the Falcon server is running or not.
Switch the user to falcon using:
su - falcon
Change the directory to your HDP version:
cd /usr/hdp/current/falcon-server
And run the below script to find the status of the Falcon server:
./bin/falcon-status
Next, use hdfs dfs -mkdir commands to create the directories /apps/falcon/primaryCluster and /apps/falcon/backupCluster on HDFS.
hdfs dfs -mkdir /apps/falcon/primaryCluster
hdfs dfs -mkdir /apps/falcon/backupCluster
Further create directories called staging inside each of the directories we created above:
hdfs dfs -mkdir /apps/falcon/primaryCluster/staging
hdfs dfs -mkdir /apps/falcon/backupCluster/staging
Next, create the working directories for primaryCluster and backupCluster:
hdfs dfs -mkdir /apps/falcon/primaryCluster/working
hdfs dfs -mkdir /apps/falcon/backupCluster/working
Finally, you need to set the proper permissions on the staging/working directories:
hdfs dfs -chmod 777 /apps/falcon/primaryCluster/staging
hdfs dfs -chmod 755 /apps/falcon/primaryCluster/working
hdfs dfs -chmod 777 /apps/falcon/backupCluster/staging
hdfs dfs -chmod 755 /apps/falcon/backupCluster/working
-->
5. Creating Cluster Entities
Let's open the Falcon Web UI. You can navigate to the Falcon Web UI directly in your browser by typing 127.0.0.1:15000. The Falcon UI runs on port 15000 by default. The default username is ambari-qa.
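Before opening the UI, it can be handy to confirm that the Falcon server is actually listening on that port. A minimal sketch, assuming the Sandbox address above and Falcon's default simple/pseudo authentication (adjust user.name if your setup differs):
# Ask the Falcon REST API for its version; a JSON response means the server is up
curl "http://127.0.0.1:15000/api/admin/version?user.name=ambari-qa"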
<!--
You can easily launch the Falcon Web UI from Ambari:
Navigate to the Falcon Summary page and click Quick Links>Falcon Web UI .
-->
This UI allows us to create and manage the various entities like Cluster, Feed, Process, and Mirror. Each of these entities is represented by an XML file that you either upload directly or generate by filling out the various fields.
You can also search for existing entities and then edit, change state, etc. Let's first create a couple of cluster entities. To create a cluster entity, click on the Create dropdown and then click Cluster at the top.
NOTE: If you want to create it from XML, skip the wizard section and move on to the next one.
5.1 Creating primaryCluster Entity using Wizard
A cluster entity defines the default access points for various resources on the cluster as well as default working directories to be used by Falcon jobs. To define a cluster entity, we must specify a unique name by which we can identify the cluster. In this tutorial, we use:
primaryCluster
Next, enter a data center name or location of the cluster and a description for the cluster. The data center name can be used by Falcon to improve performance of jobs that run locally or across data centers. Enter primaryColo in the Colo field and "this is primary cluster" in the description. All entities defined in Falcon can be grouped and located using tags. To clearly identify and locate entities, we assign the tag:
EntityType
with the value:
Cluster
Next, we enter the URIs for the various resources Falcon requires to manage data on the clusters. These include the NameNode dfs.http.address, the NameNode IPC address used for file system metadata operations, the YARN client IPC address used for executing jobs on YARN, the Oozie address used for running Falcon feeds and processes, and the Falcon messaging address. The values we will use are the defaults for the Hortonworks Sandbox; if you run this tutorial on your own test cluster, modify the addresses to match those defined in Ambari:
Namenode DFS Address - hftp://sandbox.hortonworks.com:50070
File System Default Address - hdfs://sandbox.hortonworks.com:8020
YARN Resource Manager Address - sandbox.hortonworks.com:8050
Workflow Address - http://sandbox.hortonworks.com:11000/oozie/
Message Broker Address - tcp://sandbox.hortonworks.com:61616?daemon=true
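If you are adapting these values to your own cluster rather than the Sandbox, one way to double-check the HDFS endpoints is to query the client configuration. A hedged sketch using standard HDFS commands; run it on a node with the Hadoop client configs installed:
# Print the default file system URI (the File System Default Address above)
hdfs getconf -confKey fs.defaultFS
# Print the NameNode HTTP address (backs the readonly/hftp endpoint)
hdfs getconf -confKey dfs.namenode.http-address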
You can also override cluster properties for a specific cluster. This can be useful for test or backup clusters which may have different physical configurations. In this tutorial, we’ll just use the properties defined in Ambari.
After the resources are defined, you must define default staging, temporary, and working directories for use by Falcon jobs, based on the HDFS directories you created earlier in the tutorial. These can be overridden by specific jobs, but will be used in the event no directories are defined at the job level. In the current version of the UI, these directories must exist, be owned by falcon, and have the proper permissions.
Staging* - /apps/falcon/primaryCluster/staging
Temp* - /tmp
Working* - /apps/falcon/primaryCluster/working
We then need to specify the owner and permissions for the cluster. Click on the Advanced Options drop down menu and enter:
Owner: ambari-qa
Group: users
Permissions: 755
Owner - Check box Read, Write and Execute
Group - Check box Read and Execute
Others - Check box Read and Execute
If you want to view the XML preview of the values you are entering, you can click on XML Preview. Click Next to view the summary. Click Save to persist the entity.
5.2 Creating primaryCluster Entity using XML
After clicking on the Create drop down menu, select the Cluster button and click the Edit XML button over the XML Preview area. Replace the XML content with the XML document below:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cluster name="primaryCluster" description="this is primary cluster" colo="primaryColo" xmlns="uri:falcon:cluster:0.1">
<tags>primaryKey=primaryValue</tags>
<interfaces>
<interface type="readonly" endpoint="hftp://sandbox.hortonworks.com:50070" version="2.2.0"/>
<interface type="write" endpoint="hdfs://sandbox.hortonworks.com:8020" version="2.2.0"/>
<interface type="execute" endpoint="sandbox.hortonworks.com:8050" version="2.2.0"/>
<interface type="workflow" endpoint="http://sandbox.hortonworks.com:11000/oozie/" version="4.0.0"/>
<interface type="messaging" endpoint="tcp://sandbox.hortonworks.com:61616?daemon=true" version="5.1.6"/>
</interfaces>
<locations>
<location name="staging" path="/apps/falcon/primaryCluster/staging"/>
<location name="temp" path="/tmp"/>
<location name="working" path="/apps/falcon/primaryCluster/working"/>
</locations>
<ACL owner="ambari-qa" group="users" permission="0x755"/>
<properties>
<property name="test" value="value1"/>
</properties>
</cluster>
Click Finish on top of the XML Preview area to save the XML. The Falcon UI should have automatically parsed the values from the XML and populated the right fields. Once you have verified that these are the correct values, press Next. Click Save to persist the entity. You should receive a notification that the operation was successful.
Falcon jobs require a source cluster and a destination, or target, cluster. For some jobs this may be the same cluster; for others, such as Mirroring and Disaster Recovery, the source and target clusters will be different.
NOTE: If you want to create it from XML, skip the wizard section and move on to the next one.
5.3 Creating backupCluster Entity using Wizard
Let's go ahead and create a second cluster with the name:
backupCluster
Enter backupColo in the Colo field and "this is backup cluster" in the description. Re-enter the same information you used above except for the directory information. For the directories, use the backupCluster directories created earlier in the tutorial:
Staging* - /apps/falcon/backupCluster/staging
Temp* - /tmp
Working* - /apps/falcon/backupCluster/working
Click Save to persist the backupCluster entity.
5.4 Creating backupCluster Entity using XML
Click on the Create drop down menu and click the Cluster button to open up the form to create the cluster entity.
Click on the Edit XML button over the XML Preview area. Replace the XML content with the XML document below:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cluster name="backupCluster" description="this is backup colo" colo="backupColo" xmlns="uri:falcon:cluster:0.1">
<tags>backupKey=backupValue</tags>
<interfaces>
<interface type="readonly" endpoint="hftp://sandbox.hortonworks.com:50070" version="2.2.0"/>
<interface type="write" endpoint="hdfs://sandbox.hortonworks.com:8020" version="2.2.0"/>
<interface type="execute" endpoint="sandbox.hortonworks.com:8050" version="2.2.0"/>
<interface type="workflow" endpoint="http://sandbox.hortonworks.com:11000/oozie/" version="4.0.0"/>
<interface type="messaging" endpoint="tcp://sandbox.hortonworks.com:61616?daemon=true" version="5.1.6"/>
</interfaces>
<locations>
<location name="staging" path="/apps/falcon/backupCluster/staging"/>
<location name="temp" path="/tmp"/>
<location name="working" path="/apps/falcon/backupCluster/working"/>
</locations>
<ACL owner="ambari-qa" group="users" permission="0x755"/>
<properties>
<property name="test2" value="value2"/>
</properties>
</cluster>
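As an aside, both cluster definitions can also be submitted and verified from the command line instead of the wizard. A hedged sketch, assuming the two XML documents above have been saved locally as primaryCluster.xml and backupCluster.xml (hypothetical file names) and that the falcon client is on your PATH and pointed at this Falcon server:
# Submit the two cluster entities
falcon entity -type cluster -submit -file primaryCluster.xml
falcon entity -type cluster -submit -file backupCluster.xml
# Confirm that both entities are now registered
falcon entity -type cluster -list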
Back in the UI, click Finish on top of the XML Preview area to save the XML, and then click the Next button to verify the values. Once you have verified that these are the correct values, press Next. Click Save to persist the backupCluster entity.
Summary
In this tutorial we learned how to create cluster entities in Apache Falcon using the Falcon UI. Now go ahead and start creating feeds and processes by exploring more Falcon tutorials.
Further Reading
You can go to the following links to explore other Falcon tutorials:
Mirroring Datasets between Hadoop Clusters with Apache Falcon
Define and Process Data Pipelines in Hadoop with Apache Falcon
Incremental Backup of data from HDP to Azure using Falcon for Disaster Recovery and Burst Capacity
Processing Data Pipeline using Apache Falcon
Tags: clustering, Falcon, How-ToTutorial, Sandbox & Learning
07-03-2017
02:09 PM
@arulraj rajamanickam First time I've seen this, I'll try reproducing and report back. Are you able to SSH in and check out log output at /var/log/ambari-server ? Edit: Just as a check: are you running HDP 2.5?
06-19-2017
09:24 PM
Hey @Ronny Lempel, thanks for reporting that! Getting that cleaned up as I type this. Here's the new tutorial that should be taking its place: https://hortonworks.com/tutorial/realtime-event-processing-in-hadoop-with-nifi-kafka-and-storm/
04-26-2017
07:54 PM
1 Kudo
I have "hive.server2.transport.mode" set to "binary", though when connecting to Hive via, for example, Tableau, I have to select "SASL" as the transport mode from within Tableau. Trying to connect with binary gives connection errors. Is the "hive.server2.transport.mode" config not in effect, or is there something else at work that mat not seem intuitive? Thanks!
Labels: Apache Hive
04-11-2017
09:19 PM
@Jose Johny Would it be possible to get a screenshot of what your "docker ps" looks like? Thanks!
04-11-2017
04:21 PM
@Greg Goleash Heya! 🙂 https://gist.github.com/orendain/8d05c5ac0eecf226a6fed24a79e5d71a Mind if I ask where you got the last link from? If it's still floating around, it may not have been touched up during one of the last updates.
04-10-2017
07:02 PM
@Saurabh D Thanks for sharing those screenshots. As a sanity check, could you see if, in spite of these errors, you're able to SSH into the Sandbox? (ssh -p 2222 root@localhost) Also, would you be able to delete and redeploy the sandbox to rule out a corrupt Virtualbox build?
04-10-2017
06:15 PM
@ssanthosh Perfect, thanks!
04-09-2017
01:35 AM
1 Kudo
When adding users to a policy, the values "{USER}" and "{OWNER}" show up, though they're not found on Ranger's user/groups list. Are they special values? The same goes for other users on the list (e.g. one named "keyadmin"). I've verified that they're not being synced from the host unix system. Where are these other users being pulled from?
Labels: Apache Ranger
04-08-2017
08:13 PM
@Saurabh D Could you provide a bit more info as to how the sandbox isn't starting? At what point in the process do things go wrong? In addition, I should point out that running the HDP 2.5 sandbox on less than 8 GB of RAM will be tough (an alternative is to deploy in the cloud), but we can first figure out where the boot process is stalling.
04-06-2017
07:08 PM
@Jay Goebel That tutorial was rewritten and expanded into nine sections (https://github.com/hortonworks/data-tutorials/tree/archive-hdp-2.3/tutorials/hortonworks/hello-hdp-an-introduction-to-hadoop), but it looks like that specific tutorial, before it was expanded, is hanging out at https://github.com/hortonworks/data-tutorials/blob/archive-hdp-2.3/tutorials/hortonworks/hello-hdp-an-introduction-to-hadoop/hello-world-an-introduction-to-hive-and-pig-2-3.md
04-06-2017
06:33 PM
@Jay Goebel Check out the archived-hdp-2.3 branch of the Data Tutorials repository. Direct link here: https://github.com/hortonworks/data-tutorials/tree/archive-hdp-2.3 Hope that helps, good luck!
04-06-2017
04:01 PM
1 Kudo
@spdvnz Here's a really good article on leveraging WebSockets in NiFi: https://community.hortonworks.com/articles/68378/nifi-websocket-support.html
04-06-2017
03:50 AM
@Alexey Grant Edit: Removed answer, as I had the same suggestion as @Jay SenSharma. Instead, I'll just leave the updated script below. Also, there were a couple of powershell script contributions for that script - here's a modified version that you're welcome to take for a spin 🙂 https://gist.github.com/orendain/32e03ff13d17ad1d72a3b253ba06c619
04-05-2017
07:31 PM
@Mothilal marimuthu That happens when local ports are already bound. Could you make sure that you don't already have an SSH connection open? Maybe in a different shell somewhere? Also, thanks for pointing out the .ssh/config issues! That would be a good note to add to the tutorials.
04-05-2017
05:29 PM
@Asimakis Nikolaou So you are able to SSH into the sandbox and access Ambari from your browser, but you aren't able to log into Ambari. You will have to set the Ambari admin password before logging in. Note that the username/password you set up for your Azure VM is not the same as the one you use to log into Ambari. After you set up an SSH tunnel (the tutorial covers that part), try SSH'ing into the sandbox using the following command:
ssh root@localhost -p 2222
The default password is: hadoop
Once you're logged in, run the following command to reset your Ambari password:
ambari-admin-password-reset
That command will reset your Ambari password and restart Ambari for you. Hope that helps, let me know how it goes! Edgar
04-05-2017
05:23 PM
@Rafael Gomez Absolutely - will get that updated and report back by today. https://github.com/hortonworks/data-tutorials/issues/232
04-04-2017
11:39 PM
@Timothy Currell I added a new section to the "Deploying Hortonworks Sandbox on Microsoft Azure" tutorial titled "Using PuTTY". Here's a direct link to that section on GitHub https://github.com/orendain/data-tutorials/blob/229-windows-tunneling/tutorials/hdp/hdp-2.5/deploying-hortonworks-sandbox-on-microsoft-azure/tutorial.md#using-putty I'm happy to make more additions to it, or elaborate/clarify wherever necessary. If you think this is clear, I'll make the change to the official Hortonworks tutorial as well.
04-04-2017
08:42 PM
@Timothy Currell I got you, amigo. I'll make some time this afternoon to write out some instructions on Windows and will see what you think. Will update soon.
04-03-2017
09:18 PM
@Dinesh Das You'll have to uninstall old versions of Docker first. The logs you posted show docker-selinux may currently be installed. Try this:
sudo yum remove docker \
  docker-common \
  container-selinux \
  docker-selinux \
  docker-engine
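After the removal completes, a quick way to confirm that nothing Docker-related is left behind (a hedged sketch; exact package names vary by distribution):
# List any remaining Docker-related packages; no output means the cleanup worked
sudo yum list installed 2>/dev/null | grep -i docker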
04-03-2017
09:15 PM
1 Kudo
@Neelabh Kashyap Currently, Azure takes a while to fully boot the VM. Like you, I'm able to SSH in but get the message about the system still booting and then have the SSH connection closed. Waiting a few minutes and reconnecting once the system is up and running (which can take a while even after the Azure portal shows "Running") should work fine, though. Have you experienced this message even after giving the VM some more time to boot up?
04-03-2017
09:10 PM
@Rohit Ravishankar Check out this answer to a similar question: https://community.hortonworks.com/answers/64633/view.html Hope that helps!
04-03-2017
08:57 PM
@Francisco Pires It looks like something may have become corrupt. Here's a related docker issue (more explanatory than a solution): https://github.com/docker/docker/issues/18010. A dmesg might help track things down.
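For example, something along these lines can surface recent kernel messages that often accompany corrupt Docker storage (a rough sketch; tune the grep pattern to your setup):
# Show recent kernel messages mentioning errors or the storage/overlay layers
dmesg | grep -i -E "error|docker|overlay|xfs|ext4" | tail -n 30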
03-30-2017
10:51 PM
1 Kudo
@Lucky_Luke Through PuTTY you can specify the port you want to connect to. This screenshot might be outdated, but note the "Port" field at the top right of the image.
03-30-2017
10:40 PM
@Lucky_Luke Heya, can you verify that through PuTTY you connected to port 2222? I suspect you may have connected to port 2122 or to something other than the sandbox container layer.
03-30-2017
10:25 PM
@Lucky_Luke Can you run the command `sandbox-version` to see which version of the sandbox you're on? I suspect you're on the VM hosting the sandbox and not the sandbox itself. What was the SSH command you ran to connect?
03-30-2017
10:11 PM
@Francisco Pires Ah yes, /root/start_scripts/start_sandbox.sh is the one I was referring to (sorry, was thinking of a different VM). Can you run this file and see where in the process it's failing? That'll tell us which component/service needs resetting or further inspection.
03-30-2017
09:38 PM
@elliot gimple I think any good suggestion would require knowing your ETL process, but a couple of general notes:
- Use a filter somewhere (c.filter(...), for(i <- c; if something) yield {}, etc.)
- Consider using monads (*gulp*, I know). Maybe something as simple as Option/Either.map would work in your case.