Big Data on a Shoestring

Authors: Nicholas Bessmer
    /tmp/hadoop-samim/dfs/name has been successfully formatted.
    12/07/15 15:54:21 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at Shamim-2.local/192.168.0.103
    ************************************************************/
     
Start all Hadoop components:
     
    $ bin/hadoop-daemon.sh start namenode
    $ bin/hadoop-daemon.sh start jobtracker
    $ bin/hadoop-daemon.sh start datanode
    $ bin/hadoop-daemon.sh start tasktracker
    $ bin/hadoop-daemon.sh start secondarynamenode
     
    starting namenode, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-namenode-Shamim-2.local.out
    starting jobtracker, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-jobtracker-Shamim-2.local.out
    starting datanode, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-datanode-Shamim-2.local.out
    You can check the log files to make sure that everything went well.
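
One way to check the logs is to list the files that contain error-level lines. The following is a minimal sketch, not part of the original walkthrough: the function name find_log_problems is hypothetical, and it assumes the daemons write the usual log4j level names (ERROR, FATAL) into files directly under the logs directory.

```shell
#!/bin/sh
# Hypothetical helper: print the names of log files that contain
# ERROR or FATAL lines. The level names are an assumption about the
# log format; adjust the pattern if your logs differ.
find_log_problems() {
    log_dir="$1"
    # -l prints only the names of matching files; -E enables "|" alternation.
    # Messages about subdirectories or unreadable files are discarded.
    grep -lE 'ERROR|FATAL' "$log_dir"/* 2>/dev/null
}
```

Call it with the logs directory named in the startup messages, for example find_log_problems /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/logs. No output means that no file matched the pattern.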
     
Use the hadoop command-line tool to test the file system:
     
    $ hadoop dfs -ls /
    $ hadoop dfs -mkdir /test_dir
    $ echo "A few words to test" > /tmp/myfile
    $ hadoop dfs -copyFromLocal /tmp/myfile /test_dir
    $ hadoop dfs -cat /test_dir/myfile
    A few words to test
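
The same round trip can be wrapped in a small function that compares what comes back from HDFS with the local file. This is a sketch under assumptions: the name hdfs_roundtrip is hypothetical, the hadoop command must be on your PATH, and the copy step will fail if the file already exists in the target directory.

```shell
#!/bin/sh
# Hypothetical helper: copy a local file into HDFS, read it back,
# and succeed only if the content is unchanged.
hdfs_roundtrip() {
    local_file="$1"   # existing local file, e.g. /tmp/myfile
    hdfs_dir="$2"     # HDFS directory, e.g. /test_dir
    name=$(basename "$local_file")
    # Create the directory; ignore the error if it already exists.
    hadoop dfs -mkdir "$hdfs_dir" 2>/dev/null || true
    hadoop dfs -copyFromLocal "$local_file" "$hdfs_dir" || return 1
    # cmp reads the HDFS copy from standard input ("-") and compares it
    # byte for byte with the local file; -s suppresses output.
    hadoop dfs -cat "$hdfs_dir/$name" | cmp -s - "$local_file"
}
```

For example, hdfs_roundtrip /tmp/myfile /test_dir && echo "HDFS round trip OK" prints the message only when the two copies match.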
     
    And Hadoop is running! Remember the Linux tips:
     
    »         cd means change to a directory
    »         Linux uses forward slashes rather than backslashes
    »         Unless you set your path, you will need to change (cd) to this directory:
     
    /home/ec2-user/hadoop-0.20.2/bin
    or wherever you copied Hadoop to. You will then need to run the commands as follows:
     
    ./hadoop dfs -ls /
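
Setting your path, as mentioned in the tips above, could look like the following sketch. It assumes Hadoop was unpacked to the directory named above; the variable name HADOOP_BIN is only illustrative.

```shell
#!/bin/sh
# Directory taken from the text above; adjust it if Hadoop lives elsewhere.
HADOOP_BIN=/home/ec2-user/hadoop-0.20.2/bin
# Append the directory to PATH for the current shell session.
export PATH="$PATH:$HADOOP_BIN"
```

After this, the hadoop command can be run from any directory without the leading ./ prefix. The change lasts only for the current session unless you add the same lines to your shell startup file.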
     

Let’s Use Pig
     
    The Apache Software Foundation describes Pig as follows:
     
    Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph where each node represents an operation that transforms data. Operations are of two flavors: (1) relational-algebra style operations such as join, filter, project; (2) functional-programming style operators such as map, reduce.

Pig compiles these dataflow programs into (sequences of) map-reduce jobs and executes them using Hadoop. It is also possible to execute Pig Latin programs in a "local" mode (without Hadoop cluster), in which case all processing takes place in a single local JVM.
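
As a rough analogy only (this is plain shell, not Pig or Hadoop), a Unix pipeline shows the same idea of data flowing through a chain of transforming steps. The sketch below counts word frequencies; the function name word_counts and the labels "map", "shuffle", and "reduce" in the comments are illustrative, not how Pig actually plans its jobs.

```shell
#!/bin/sh
# Illustrative dataflow: each stage transforms the stream it receives.
word_counts() {
    tr -s '[:space:]' '\n' |        # "map": emit one word per line
        grep -v '^$' |              # drop the empty line left by leading whitespace
        sort |                      # "shuffle": bring identical words together
        uniq -c |                   # "reduce": count each group of identical words
        awk '{ print $2 "\t" $1 }'  # format as word<TAB>count
}
```

For example, echo "pig hadoop pig" | word_counts prints hadoop with a count of 1 and pig with a count of 2, separated by tabs.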

Change to the Pig Directory and Run the Sample Script from the Tutorial
     
    This script queries a search log file from the Excite search engine. It checks the frequency of search phrases and runs on Hadoop, so please be aware that it will take some time to run!
     
    The Query Phrase Popularity script (script1-local.pig or script1-hadoop.pig) processes a search query log file from the Excite search engine and finds search phrases that occur with particularly high frequency during certain times of the day.
    The script computes basic statistics, and the output file reports the following fields:
     
    hour, ngram, score, count, mean
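
Once you have a results file, the highest-scoring phrases can be pulled out with standard tools. The sketch below is an assumption-laden example: the function name top_phrases is hypothetical, you must supply the path to your own results file, and it assumes the five fields are tab-separated with score in the third column.

```shell
#!/bin/sh
# Hypothetical helper: show the n rows with the highest score.
# Assumes tab-separated fields in the order hour, ngram, score, count, mean.
top_phrases() {
    results_file="$1"
    n="${2:-5}"              # number of rows to show, default 5
    tab=$(printf '\t')
    # Sort numerically (n) in descending order (r) on the third field only.
    sort -t "$tab" -k3,3nr "$results_file" | head -n "$n"
}
```

For example, top_phrases my-results.txt 10 would list the ten rows with the highest score, where my-results.txt stands for whatever file your run produced.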
     
    Run the following command:
     
    cd /home/ec2-user/pig-0.10.1/tutorial/pigtmp
    Then run the script:
     
    pig ../scripts/script1-hadoop.pig
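
Because the tutorial ships both a local and a Hadoop version of the script, a small wrapper can switch between the two modes. This is a sketch: the function name run_tutorial_script is hypothetical, it assumes you are in the pigtmp directory shown above, and it relies on Pig's -x local option to select local mode.

```shell
#!/bin/sh
# Hypothetical wrapper: run the tutorial script in "local" or "hadoop" mode.
# Assumes the current directory is the tutorial's pigtmp directory.
run_tutorial_script() {
    mode="${1:-hadoop}"      # default to the Hadoop run used in the text
    case "$mode" in
        local)  pig -x local ../scripts/script1-local.pig ;;
        hadoop) pig ../scripts/script1-hadoop.pig ;;
        *)      echo "usage: run_tutorial_script [local|hadoop]" >&2
                return 2 ;;
    esac
}
```

Running run_tutorial_script local processes everything in a single local JVM, which can be a quicker way to confirm that Pig itself works before involving Hadoop.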
     
    You will see a lot of processing information printed to the screen, such as:
     
    2013-02-09 00:07:11,446 [Thread-5] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2013-02-09 00:07:11,454 [Thread-5] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2013-02-09 00:07:11,454 [Thread-5] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2013-02-09 00:07:11,456 [Thread-5] INFO 
    2013-02-09 00:07:16,614 [Thread-14] INFO  org.apache.hadoop.mapred.MapTask - kvstart = 0; kvend = 262144; length = 327680
    2013-02-09 00:07:20,808 [communication thread] INFO 
