Attachment_fu sanitize filename, Regex and Unicode gotcha
Attachment_fu sanitizes the filenames of uploads by replacing any funky character (anything other than 0-9, a-z, A-Z, underscore, period or hyphen) with an underscore. This is done by the private sanitize_filename method in the attachment_fu.rb file:
    def sanitize_filename(filename)
      returning filename.strip do |name|
        # NOTE: File.basename doesn't work right with Windows paths on Unix
        # get only the filename, not the whole path
        name.gsub! /^.*(\\|\/)/, ''
        # Finally, replace everything except word characters, periods and hyphens with an underscore
        name.gsub! /[^\w\.\-]/, '_'
      end
    end
The shortcut \w is usually described as a letter, digit or underscore, the same as [0-9A-Za-z_]. However, since the Ruby regex engine has support for Unicode, "letter" can mean any Unicode letter, not just an ASCII one. So it will let characters like 爱与希望 remain. This can be a problem if you are passing a filename containing such characters to a Flash player: the player just won't play the file!
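Whether \w lets non-ASCII letters through depends on the Ruby version and encoding settings; on Ruby 1.9 and later, \w matches only ASCII word characters by default. The sketch below therefore uses the POSIX bracket [:word:] as a stand-in for the Unicode-aware matching described above, and compares it with an explicit ASCII class. The sample filename and variable names are made up for illustration.

```ruby
# Illustrative filename; not taken from attachment_fu.
name = "爱与希望 clip.flv"

# Unicode-aware word class: CJK letters count as word characters and survive.
unicode_aware = name.gsub(/[^[:word:].\-]/, '_')

# Explicit ASCII class: every non-ASCII character is replaced.
ascii_only = name.gsub(/[^0-9A-Za-z_.\-]/, '_')

puts unicode_aware  # 爱与希望_clip.flv
puts ascii_only     # _____clip.flv
```

Only the space is replaced in the first result, while the second loses all four CJK characters as well.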
A quick solution is to check specifically for 0-9A-Za-z instead of relying on \w. This can be done by changing the method to:
    def sanitize_filename(filename)
      returning filename.strip do |name|
        # NOTE: File.basename doesn't work right with Windows paths on Unix
        # get only the filename, not the whole path
        name.gsub! /^.*(\\|\/)/, ''
        # Original rule, which lets Unicode letters through:
        # name.gsub! /[^\w\.\-]/, '_'
        # Replace everything except ASCII letters, digits, periods and hyphens
        # with x, so a non-ASCII name doesn't turn into a run of underscores.
        # Note: underscore is not in this class, so underscores become x too.
        name.gsub!(/[^0-9A-Za-z.\-]/, 'x')
      end
    end
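To try the ASCII-only rule outside Rails, where the returning helper is not available, here is a minimal standalone sketch. The method name ascii_sanitize_filename and the sample paths are hypothetical; the two regexes are the ones from the method above.

```ruby
# Hypothetical standalone helper; not part of attachment_fu.
def ascii_sanitize_filename(filename)
  name = filename.strip
  # Keep only the last path component; handles both / and \ separators.
  name = name.gsub(/^.*(\\|\/)/, '')
  # Replace everything except ASCII letters, digits, periods and hyphens with x.
  name.gsub(/[^0-9A-Za-z.\-]/, 'x')
end

puts ascii_sanitize_filename('C:\videos\爱与希望.flv')  # xxxx.flv
puts ascii_sanitize_filename(' my_clip-1.flv ')          # myxclip-1.flv
```

The second call shows that an underscore is replaced as well, because it is not in the character class; add _ to the class if underscores should be kept.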
Finally, none of this is a problem if non-ASCII characters in filenames don't cause any issues on your site.