It lets you see both the files on your local computer and those on your remote EC2 instance. Now download the latest stable version of HADOOP to your computer:
http://www.sai.msu.su/apache/hadoop/core/stable/
Then download CASSANDRA:
http://cassandra.apache.org/download/
“PIG” is a query language designed for Big Data. We will use it to query our Big Data dataset. Download it from:
http://www.sai.msu.su/apache/pig
Now copy these over to your new EC2 Linux Server:
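If you prefer the command line to a graphical client, the same copy can be sketched with scp. This is only a rough sketch: the key file name, the ec2-user login and the instance address below are placeholders (assumptions, not values from this tutorial), so substitute your own. By default the script only prints the commands it would run; set DRY_RUN=0 to actually copy.

```shell
#!/bin/sh
# Hypothetical values: replace with your own key pair file and instance address.
KEY_FILE="my-ec2-key.pem"
EC2_HOST="ec2-user@your-instance-address"
# Print the commands by default; set DRY_RUN=0 to really run scp.
DRY_RUN="${DRY_RUN:-1}"

# Copy one archive to the home directory of the remote login.
copy_to_ec2() {
    archive="$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "scp -i ${KEY_FILE} ${archive} ${EC2_HOST}:"
    else
        scp -i "$KEY_FILE" "$archive" "${EC2_HOST}:"
    fi
}

for f in hadoop-0.20.2.tar.gz apache-cassandra-1.2.1-bin.tar.gz pig-0.10.1.tar.gz; do
    copy_to_ec2 "$f"
done
```

A destination that ends in a bare colon means the home directory of the remote user, which is where the extraction commands in the next step are run.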
Once the files have been copied, type the following commands on the server:
» tar -xvf hadoop-0.20.2.tar.gz
» tar -xvf apache-cassandra-1.2.1-bin.tar.gz
» tar -xvf pig-0.10.1.tar.gz
Please also be sure to extract the tutorial archive inside the pig-0.10.1 directory by typing these commands:
» cd pig-0.10.1 (cd changes the directory)
» tar -xvf tutorial.tar (you can also use the gunzip utility)
This extracts the files, which are packed much like a ZIP file.
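If one of the downloads did not make it to the server, tar simply fails on that file. As a minimal sketch, assuming the three archive names used above, the extraction can be wrapped in a small helper (extract_all is a hypothetical name) that reports which archives are missing:

```shell
#!/bin/sh
# Extract every archive named on the command line that exists in the
# current directory, and report the ones that are missing.
extract_all() {
    for archive in "$@"; do
        if [ -f "$archive" ]; then
            # tar detects gzip compression on its own when reading a file
            tar -xf "$archive" && echo "extracted $archive"
        else
            echo "missing $archive" >&2
        fi
    done
}

extract_all hadoop-0.20.2.tar.gz apache-cassandra-1.2.1-bin.tar.gz pig-0.10.1.tar.gz
```

Run it from the directory the archives were copied to. The "missing" messages go to standard error, so they stay visible even if the normal output is redirected to a log file.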
It is possible to choose MS Windows Server as your preferred EC2 server. We installed Linux here (it is cheaper to run than Windows Server), so files have to be edited with the vi editor, which is a bit harder to use. Look up vi on the Internet: it is like a very powerful Windows Notepad, but it is driven entirely by keyboard commands in the terminal.
Getting The Linux Environment Set Up – Basic Steps
Type the following:
» cd (changes to your home directory)
» vi .bash_profile (vi is the editor, and you will be modifying a simple text configuration file)
Please see this helpful vi tutorial from the University of California, San Diego:
http://acms.ucsd.edu/info/vi_tutorial.html
Copy and paste the following into your file:
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi