Hadoop: HDFS find / recover corrupt blocks


1) Search for files on corrupt files:

A command like ‘hadoop fsck /’ will show the status of the filesystem and any corrupt files. This command will ignore lines with nothing but dots and lines talking about replication:

hadoop fsck / | egrep -v '^\.+$' | grep -v eplica

2) Determine the corrupt blocks:

hadoop fsck /path/to/corrupt/file -locations -blocks -files

(Use that output to determine where blocks might live. If the file is larger than your block size it might have multiple blocks.)

3) Try to copy the files to S3 with s3distcp or s3cmd. If that fails, you will have the option to run:

hadoop fsck -move

which will move what is left of the corrupt blocks into hdfs /lost+found

4) Delete the file:

hadoop fs -rm /path/to/file/with/permanently/missing/blocks

Check file system state again with step 1.

A more drastic command is:

hadoop fsck / -delete

that will search and delete all corrupted files.

Hadoop should not use corrupt blocks again unless the replication factor is low and it does not have enough replicas

References:

http://hadoop.apache.org/docs/r0.19.0/commands_manual.html#fsck

Anuncios

Acerca de hvivani

sysadmin, developer, RHCSA
Esta entrada fue publicada en Uncategorized y etiquetada , , . Guarda el enlace permanente.

Responder

Introduce tus datos o haz clic en un icono para iniciar sesión:

Logo de WordPress.com

Estás comentando usando tu cuenta de WordPress.com. Cerrar sesión / Cambiar )

Imagen de Twitter

Estás comentando usando tu cuenta de Twitter. Cerrar sesión / Cambiar )

Foto de Facebook

Estás comentando usando tu cuenta de Facebook. Cerrar sesión / Cambiar )

Google+ photo

Estás comentando usando tu cuenta de Google+. Cerrar sesión / Cambiar )

Conectando a %s