Task attempt .... failed to report status for 600 seconds.Killing!
Also in hadoop docs you can find the following:
By default 600000 is The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string. A value of 0 disables the timeout.
So if your mapper work takes long time to emit any output, you can change default timeout using option:
But it still didn't work for me, because my mapper task could take days or weeks to finish its work.
So I decided simply update status string every X seconds in my python script. Let's go back to hadoop docs again and read:
How do I update status in streaming applications? A streaming process can use the stderr to emit status information. To set a status, reporter:status:<message> should be sent to stderr.
In python code this can be done like this:
print >> sys.stderr, "reporter:status:" + "hello_hadoop"