Karan => PIG Clarifications needed


3 Answer(s)


Hi Ashley,

Probably Karan could answer better.

I can try answering the first two questions:

1. From my knowledge, like the way we did in the MapReduce NASDAQ assignment, where we checked that the column is not 'exchange' before processing a record, here we may have to use something like
FILTER X BY exchange != 'exchange'
(where exchange is the field you define in the schema) so that it skips the header record.
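As a rough sketch of that idea (the file path and field names here are assumed, not from the assignment itself):

```pig
-- load with an explicit schema; names are illustrative
stocks = LOAD 'nasdaq/NYSE_daily.csv' USING PigStorage(',')
         AS (exchange:chararray, symbol:chararray, date:chararray,
             close:float, volume:long);

-- the header row's exchange column literally contains the string 'exchange'
no_header = FILTER stocks BY exchange != 'exchange';
```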
2. For the top 5 records we may have to ORDER the relation BY some field ASC/DESC and then apply LIMIT 5 (note the syntax is LIMIT alias 5, with no BY).
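For example, a sketch of "top 5 by volume" (file path and schema are assumptions for illustration):

```pig
stocks = LOAD 'nasdaq/NYSE_daily.csv' USING PigStorage(',')
         AS (exchange:chararray, symbol:chararray, volume:long);

sorted = ORDER stocks BY volume DESC;  -- highest volume first
top5   = LIMIT sorted 5;               -- LIMIT alias n, not LIMIT BY n
DUMP top5;
```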

3. I tried using multiple commands but couldn't get it to work. I am not sure if it applies in every case, but for the wordcount program we did use FLATTEN(TOKENIZE($0)).
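For reference, the wordcount we did looks roughly like this (input path is just an example):

```pig
lines   = LOAD 'input.txt' AS (line:chararray);
-- TOKENIZE splits the line into a bag of words; FLATTEN turns that bag into rows
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS cnt;
DUMP counts;
```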

FYI - one thing I realised is that the built-in aggregate functions are case sensitive: use SUM, COUNT, AVG (uppercase).
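For instance (relation and field names here are made up), the uppercase forms resolve while lowercase ones would fail:

```pig
trades = LOAD 'trades.csv' USING PigStorage(',')
         AS (symbol:chararray, volume:long);
by_sym = GROUP trades BY symbol;
-- SUM/COUNT/AVG must be uppercase; sum/count/avg would not resolve
totals = FOREACH by_sym GENERATE group AS symbol,
                                 SUM(trades.volume) AS total_vol,
                                 COUNT(trades)      AS num_trades,
                                 AVG(trades.volume) AS avg_vol;
```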

1. Or, if we know the first row is the header, we could number the rows and FILTER the relation for row number > 1.
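One way to sketch that idea, assuming Pig 0.11+ where the RANK operator is available (file path and schema are illustrative):

```pig
raw    = LOAD 'nasdaq/NYSE_daily.csv' USING PigStorage(',')
         AS (exchange:chararray, symbol:chararray, volume:long);
-- RANK with no BY clause prepends a 1-based sequential row number as $0
ranked = RANK raw;
-- keep everything after the first row, i.e. skip the header
body   = FILTER ranked BY $0 > 1;
```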

Hi Ashley,

By default Pig does not ignore the first line: PigStorage loads every line, including the header, so you need to filter it out yourself or use a loader that can skip it.
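If you want the header skipped at load time, one option (assuming the piggybank jar is available on your cluster; the path and schema below are examples) is CSVExcelStorage:

```pig
REGISTER piggybank.jar;  -- adjust to wherever piggybank lives in your setup
data = LOAD 'nasdaq/NYSE_daily.csv'
       USING org.apache.pig.piggybank.storage.CSVExcelStorage(
           ',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
       AS (exchange:chararray, symbol:chararray, volume:long);
```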

You can start Pig in either HDFS (MapReduce) or local mode as shown below, and from the grunt shell you can execute HDFS commands like mkdir, ls, etc.

pig -x mapreduce (runs against the Hadoop cluster; plain `pig` does the same)
pig -x local (runs locally against the local filesystem)
Either command drops you into the grunt shell.
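Once inside grunt, HDFS commands can be run with the fs prefix (the paths here are just examples):

```pig
grunt> fs -mkdir /user/ashley/pigdata;
grunt> fs -ls /user/ashley;
grunt> fs -copyFromLocal data.csv /user/ashley/pigdata;
```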

Thanks