A convenient way to start working on Big Data concepts is to use the Hortonworks Data Platform (HDP) Sandbox; I am not aware of a similar sandbox from other Big Data vendors. The Sandbox packages all the major components of the Big Data ecosystem in a single virtual machine, which makes it easy to install. It is available in three formats - VirtualBox, VMware and Docker.
I started with the VirtualBox image, since VirtualBox is (in my opinion) a compact and simple virtualization option.
But I faced issues because the Sandbox is very large (over 10 GB) and needs at least 8 GB of RAM; the installation simply froze and I could not proceed.
I then decided to try it out in the AWS cloud, since I could spin up more powerful host machines. This worked. :)
In this post, I will explain how to get started.
What do we need to get started?
- AWS account
- Basic understanding of AWS concepts (I have glossed over the AWS instance setup to keep this post compact)
- An EC2 instance with at least 128 GB of disk storage and 16 GB of RAM (I used the m4.2xlarge instance type). The larger, the better.
How much time do we need?
The Hortonworks portal claims that you need just 15 minutes (!), but let's assume one hour.
Steps
Step 1 - Launch an EC2 instance with the Amazon Linux AMI. Ensure that the security group is configured to allow inbound access on ports 22 (SSH), 8080 (Ambari) and 8888 (the sandbox welcome page).
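If you prefer the AWS CLI, the setup looks roughly like this (a sketch only - the AMI ID, key name and security group ID below are placeholders you must replace with your own values):

aws ec2 authorize-security-group-ingress --group-id sg-YOURSGID --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-YOURSGID --protocol tcp --port 8080 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-YOURSGID --protocol tcp --port 8888 --cidr 0.0.0.0/0
aws ec2 run-instances --image-id ami-YOURAMI --instance-type m4.2xlarge --key-name YOUR-KEY --security-group-ids sg-YOURSGID --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":128}}]'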
Step 2 - Install Docker on the instance. This post from AWS will help - http://docs.aws.amazon.com/AmazonECS/latest/developerguide/docker-basics.html
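For Amazon Linux, the commands in that guide boil down to the following (double-check against the guide itself in case it has changed):

sudo yum update -y
sudo yum install -y docker
sudo service docker start
sudo usermod -a -G docker ec2-user    # then log out and back in so the group change takes effect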
Step 3 - Modify the Docker config file "/etc/sysconfig/docker-storage" to add the line:
DOCKER_STORAGE_OPTIONS="--storage-opt dm.basesize=20G"
This step is needed so that Docker can import the Sandbox: by default, Docker's devicemapper storage driver caps an image at 10 GB, and the Hortonworks Sandbox is about 13 GB. Restart Docker afterwards (on Amazon Linux, "sudo service docker restart").
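To verify that the new base size took effect after the restart (assuming Docker is using the devicemapper storage driver, as it does by default on Amazon Linux), check the Docker info output; you should see a value of about 21 GB instead of the default ~10.7 GB:

docker info | grep -i "base device size"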
Step 4 - Download, unzip, load and start the Docker sandbox image from Hortonworks. This link has the details - http://hortonworks.com/hadoop-tutorial/hortonworks-sandbox-guide/#section_4
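The overall flow looks like this, using the archive name from my download (the exact download URL and the full docker run command, with all its port mappings, are in the guide, so I have not reproduced them here):

wget <download URL from the Hortonworks guide>
gunzip HDP_2.5_docker.tar.gz    # if the archive is gzipped
docker load < HDP_2.5_docker.tar
# then start the container with the docker run command / start script from the guide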
While the Hortonworks guide is well written, it omits the change needed to increase the base device size (Step 3 above).
Please try out this installation and post your comments. Thanks for reading.
Common Issues
If Step 3 above was not run, you will see the following error when loading the Docker image:
# docker load < HDP_2.5_docker.tar
b1b065555b8a: Loading layer 202.2 MB/202.2 MB
3901568415a3: Loading layer 10.37 GB/13.85 GB
ApplyLayer exit status 1 stdout: stderr: write /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.111.x86_64/jre/lib/amd64/server/libjvm.so: no space left on device
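If you hit this, note that just fixing the config file may not be enough: the devicemapper base size is set when Docker first initializes its storage. On a fresh instance with no Docker data worth keeping, one way to recover (this wipes all existing Docker images and containers) is:

sudo service docker stop
sudo rm -rf /var/lib/docker
sudo service docker start
docker load < HDP_2.5_docker.tar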