Monday, January 11, 2016

Solution to Ruby's REST Client File Corruption Error

One of my projects involves using a REST API to upload binary files and their associated metadata, and we write our programs in Ruby. There are several ways to work with a REST API in Ruby; some of the popular ones are:
1. Ruby's own HTTP client API Net::HTTP
2. REST Client gem
3. Faraday gem

Of these three, I found REST Client comparatively easy to use: its syntax is simpler, and its options are advanced enough to get the work done. But I ran into a strange file corruption error while using it: PDF files uploaded fine, but files in other formats were corrupted when uploaded with REST Client. Our system is a bit more complicated, in that it downloads the files from different sources and then ingests them into a DSpace repository through its REST API, so at first we were not sure at which step the files were getting corrupted. Our initial assumption was that the files were being sent to DSpace before they had fully downloaded and were therefore broken. But the downloads turned out to be fine; something was going wrong in the upload step with REST Client.
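One way to narrow down which step damages a file is to compare checksums of the copy before and after that step. The following is only a minimal sketch using Ruby's standard Digest library; the helper name sha256_of and the two temporary files (standing in for a source file and its downloaded copy) are made up for illustration and are not part of our actual system.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: SHA-256 hex digest of the file at the given path.
def sha256_of(path)
  Digest::SHA256.file(path).hexdigest
end

# Stand-in for the file at its source (made-up binary content).
source = Tempfile.new('source')
source.binmode
source.write("\x89PNG\r\n\x1A\n sample binary payload".b)
source.close

# Stand-in for the downloaded copy, written byte for byte.
copy = Tempfile.new('copy')
copy.binmode
copy.write(File.binread(source.path))
copy.close

same = sha256_of(source.path) == sha256_of(copy.path)
puts same ? 'download step is fine' : 'file changed during download'
```

If the digests match after the download but the file fetched back from the repository has a different digest, the upload step is the one to look at.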

Here is the method for POSTing a file given in the original REST Client documentation:

RestClient.post(url,
  {
    :transfer => {
      :path => '/foo/bar',
      :owner => 'that_guy',
      :group => 'those_guys'
    },
    :upload => {
      :file => File.new(path, 'rb')
    }
  })

After digging a little, I found that other people had run into a similar problem with REST Client's multipart POST (librelist archives: "uploaded pictures are in a bad shape"). According to one solution there, form encoding of the payload could be the main cause of the corruption. Although I was unable to open the other file formats, checking the content of an uploaded JSON-LD file showed me what was happening inside. This is what the corrupted file looks like when uploaded using multipart POST:

--653361
Content-Disposition: form-data; name="transfer[type]"

bitstream
--653361
Content-Disposition: form-data; name="upload[file]"; filename="Filetype Check.jsonld"
Content-Type: text/plain

[ Main content of the original JSON-LD]

--653361--
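To make the structure of the output above easier to see, here is a small sketch that builds a multipart/form-data body by hand. It is not REST Client's internal code: the helper multipart_body, the sample JSON content and the file name example.jsonld are all made up for illustration. It only shows that a multipart request body contains the file plus boundary lines and part headers, so a server that stores the whole body as the file ends up with extra text around the real content.

```ruby
# Hypothetical helper: builds a multipart/form-data body by hand,
# purely to show what ends up in the request body.
def multipart_body(boundary, fields, file_field, filename, content)
  # One part per ordinary form field.
  parts = fields.map do |name, value|
    "--#{boundary}\r\n" \
    "Content-Disposition: form-data; name=\"#{name}\"\r\n\r\n" \
    "#{value}\r\n"
  end
  # One part carrying the file content.
  parts << "--#{boundary}\r\n" \
           "Content-Disposition: form-data; name=\"#{file_field}\"; filename=\"#{filename}\"\r\n" \
           "Content-Type: text/plain\r\n\r\n" \
           "#{content}\r\n"
  # Closing boundary.
  parts.join + "--#{boundary}--\r\n"
end

original = '{"@context": "http://schema.org"}'   # made-up file content
wrapped  = multipart_body('653361', { 'transfer[type]' => 'bitstream' },
                          'upload[file]', 'example.jsonld', original)

puts wrapped
puts wrapped == original          # false: the body is not the file itself
puts wrapped.include?(original)   # true: the file is buried inside the wrapper
```

Sending the file as the raw request body, as in the two solutions further down, avoids this because nothing is added around the bytes.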

The multipart POST wraps the main content in the form fields supplied in the payload, following the original method, and it does the same to every other file format, which is what corrupts the files. Here are the two solutions I found: using REST Client without a multipart form, and using Net::HTTP.

1. Using REST Client without multipart form

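require 'rest-client'

# Open in binary mode so the bytes are sent unchanged.
# The block's return value is assigned so that response is visible after the block.
response = File.open('filepath', 'rb') do |fh|
  RestClient.post(
    "url/to/post/file", fh,
    { :content_type => 'application/json', :accept => 'application/json' })
end
puts response.body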

Or, simply -
RestClient.post("url/to/post/file", File.new('filepath', 'rb'), {:content_type => 'application/json', :accept => 'application/json'})

2. Using Net::HTTP

require 'net/http'
require 'uri'

data = File.binread('filepath')               # read in binary mode so the bytes are not altered
url = "url/to/post/file"
uri = URI.parse(url)
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = (uri.scheme == 'https')        # needed when the connection is secured, otherwise it fails

request = Net::HTTP::Post.new(uri.request_uri)
request["header"] = "header_info"             # if any additional header is required
request.body = data
request.content_type = 'image/jpeg'           # use the MIME type of the file being uploaded
res = http.request(request)
puts res.body

Both of these methods upload files without corrupting them with that strange wrapper. I hope this is helpful to any Ruby programmers struggling with the same problem.