isSplitable
protected boolean isSplitable(org.apache.hadoop.mapreduce.JobContext context,
org.apache.hadoop.fs.Path filename)
Checks to see if the file we are looking at is splittable.
A file is splittable if it is:
- Uncompressed.
- Compressed with the BGZFEnhancedGzipCodec _and_ the underlying stream is
a BGZF stream. BGZFEnhancedGzipCodec matches files with a .gz
extension, so the codec may be selected for an ordinary (non-block)
gzipped file, which is not splittable. To tell the two apart, we
use HTSJDK's built-in check for whether a stream is a BGZF stream.
- Compressed with any other splittable codec (e.g., .bgz/BGZFCodec, .bz2/BZip2Codec).
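The BGZF check in the second case can be illustrated with a minimal, standard-library-only sketch. It is not HTSJDK's implementation, and the class and method names (BgzfSniffer, looksLikeBgzf, plainGzip) are hypothetical. It relies on the BGZF block layout from the SAM specification: a gzip member header with the FEXTRA flag set and a "BC" extra subfield of length 2, and it assumes that subfield comes first in the extra field.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class BgzfSniffer {
    // Bytes in a BGZF block header, up to and including the BSIZE field.
    static final int BGZF_HEADER_LENGTH = 18;

    // Header of the standard 28-byte BGZF end-of-file marker block.
    static final byte[] BGZF_EOF_HEADER = {
        0x1f, (byte) 0x8b, 0x08, 0x04,   // gzip magic, CM = deflate, FLG = FEXTRA
        0x00, 0x00, 0x00, 0x00,          // MTIME
        0x00, (byte) 0xff,               // XFL, OS
        0x06, 0x00,                      // XLEN = 6
        0x42, 0x43, 0x02, 0x00,          // subfield "BC", SLEN = 2
        0x1b, 0x00                       // BSIZE = 27 (block size minus one)
    };

    /**
     * Returns true if the bytes begin with a gzip header whose first extra
     * subfield is the BGZF "BC" subfield. Assumption: "BC" is the first subfield.
     */
    static boolean looksLikeBgzf(byte[] h) {
        if (h == null || h.length < BGZF_HEADER_LENGTH) {
            return false;
        }
        return (h[0] & 0xFF) == 0x1f
            && (h[1] & 0xFF) == 0x8b
            && (h[2] & 0xFF) == 8                           // CM = deflate
            && (h[3] & 0x04) != 0                           // FLG.FEXTRA is set
            && h[12] == 'B' && h[13] == 'C'                 // BGZF subfield id
            && (h[14] & 0xFF) == 2 && (h[15] & 0xFF) == 0;  // SLEN = 2
    }

    /** Compresses text with ordinary (non-block) gzip, for comparison. */
    static byte[] plainGzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("BGZF header:  " + looksLikeBgzf(BGZF_EOF_HEADER));
        System.out.println("Plain gzip:   " + looksLikeBgzf(plainGzip("hello")));
    }
}
```

A plain gzip stream written by java.util.zip.GZIPOutputStream has no extra field, so the sniff rejects it. That is the situation in which a .gz file picked up by BGZFEnhancedGzipCodec must be treated as non-splittable.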
- Overrides:
isSplitable
in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<Void,org.apache.hadoop.io.Text>
- Parameters:
context
- The job context to get the configuration from.
filename
- The path the input file is saved at.
- Returns:
- true if the file is splittable as described above (uncompressed, or compressed with a splittable codec); false otherwise.