Hadoop consists of two main components: HDFS, a distributed filesystem built for high read throughput, and YARN, a resource manager. HDFS is not a POSIX filesystem, so normal command-line tools like “cp” and “mv” will not work on it. Most of the common tools have been reimplemented for HDFS and can be run as subcommands of “hdfs dfs”. Data must be in HDFS before jobs can read it.
Here are a few basic commands:
    # List the contents of your HDFS home directory
    hdfs dfs -ls

    # Copy local file data.csv to your HDFS home directory
    hdfs dfs -put data.csv data.csv

    # Copy HDFS file data.csv to the local file data2.csv in the current directory
    hdfs dfs -get data.csv data2.csv
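Because data has to be copied into HDFS before a job can read it, it can be convenient to wrap the upload steps in a small shell function. The following is a minimal sketch, not a required workflow: the function name `stage_input` and the directory name used in the usage note are hypothetical, and it relies only on the standard `-mkdir -p`, `-put -f`, and `-ls` subcommands of `hdfs dfs`.

```shell
# Hypothetical helper: upload a local file into an HDFS directory,
# creating the directory first if it does not already exist.
# Usage: stage_input <local_file> <hdfs_dir>
stage_input() {
    local_file="$1"
    hdfs_dir="$2"

    # -p creates missing parent directories and does not fail
    # if the directory is already there
    hdfs dfs -mkdir -p "$hdfs_dir" || return 1

    # -f overwrites an existing copy of the file in HDFS
    hdfs dfs -put -f "$local_file" "$hdfs_dir/" || return 1

    # List the directory to confirm the upload
    hdfs dfs -ls "$hdfs_dir"
}
```

For example, `stage_input data.csv input` would place the file at `input/data.csv` relative to your HDFS home directory. Note that `-f` replaces any existing file of the same name, so drop it if you would rather have the upload fail than overwrite.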
A complete reference of HDFS commands can be found on the Apache website.